[antlr-interest] 'filter' option in ANTLR 3.0
Jim Idle
jimi at intersystems.com
Wed Sep 20 16:01:57 PDT 2006
Ryan,
I suggest that you will have more traction specifying these words as individual keywords and not using spaces within the keywords. I presume that you want filter mode for some other reason though? You would not normally require it for ‘normal’ parsing. Then again maybe this is not a parser as such?
So: Try this, (with suitable mods of course)
grammar test;
test
: ( classDef | fieldDef | inlineDef)+
;
classDef: ID IS A CLASSDEFINITION ;
fieldDef: ID IS (A | AN) ID ;
inlineDef: ID IS (ALPHA | NUMERIC) ;
A : 'a' ;
ALPHA : 'Alpha' ;
AN : 'an';
CLASSDEFINITION : 'ClassDefinition' ;
IS : 'is' ;
NUMERIC : 'Numeric';
WS : (' ' | '\t' | 'n' | '\r') { channel=99;} ;
ID : IDCHARS IDCHARS*;
fragment
IDCHARS: ('a'..'z' | 'A'.. 'Z') ;
This works perfectly in ANTLRWorks 1.0b3 with the input:
Elephant is a ClassDefinition
Socrates is an Elephant
AllElephants is a Socrates
Ter – this reminds me that though we have debated the case insensitive lexing thing, I no longer remember what the conclusion was [meaning los in meta levels ;-) ]? Just overriding things in the end was it?
Jim
_____
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Ryan Hollom
Sent: Wednesday, September 20, 2006 2:17 PM
To: antlr-interest at antlr.org
Subject: [antlr-interest] 'filter' option in ANTLR 3.0
Greetings-
I have a grammar with several multi-word keywords, and I'm having trouble properly tokenizing the input. For example, I have the rules
classDef : ID 'is a ClassDefinition';
fieldDef: ID ('is a' | 'is an') ID
inlineDef : ID 'is' ('Alpha' | 'Numeric')
So the 'is'-prefixed keywords are 'is a ClassDefinition', 'is a Class', 'is a', 'is an', and 'is'. With these rules, the lexer chokes on input like:
MyClass is a ClassDefinition
MyNumericField is Numeric
with a no viable alt line 2:20; char='N'
It would seem to me that the lexer should try to match the longest multi-word keyword it can, and, in this case, should create the tokens <MyClass>, <'is a ClassDefinition'>, <MyNumericField>, <'is'>, and <'Numeric'>. I have tried to use the filter option to properly tokenize, but this forces me to list all of my keywords in the order in which they should be recognized (correct?), which seems like it would be a big issue when importing a different vocab/super grammar.
Am I missing an obvious solution here? I've tried many different permutations and can't seem to get it just right.
Thanks in advance,
Ryan
PS - Why is it that when the filter option is set to true, semantic actions are handled differently? For the rule
fieldDef: ID { printId(); } 'is a' ID;
generates to
if (backtracking == 1) { printId(); }
with filter=true vs
if (backtracking == 0) { printId(); }
when filter=false.
I am using antlr3.0 b4. Thanks again!
--
No virus found in this incoming message.
Checked by AVG Free Edition.
Version: 7.1.405 / Virus Database: 268.12.5/450 - Release Date: 9/18/2006
--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.405 / Virus Database: 268.12.5/450 - Release Date: 9/18/2006
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20060920/f9db6bb7/attachment-0001.html
More information about the antlr-interest
mailing list