[antlr-interest] 'filter' option in ANTLR 3.0

Ryan Hollom ryan.hollom at us.lawson.com
Wed Sep 20 14:16:34 PDT 2006


Greetings-
I have a grammar with several multi-word keywords, and I'm having trouble 
properly tokenizing the input.  For example, I have the rules 

classDef : ID 'is a ClassDefinition';
fieldDef : ID ('is a' | 'is an') ID;
inlineDef : ID 'is' ('Alpha' | 'Numeric');

So the 'is'-prefixed keywords are 'is a ClassDefinition', 'is a Class', 
'is a', 'is an', and 'is'.  With these rules, the lexer chokes on input 
like:

MyClass is a ClassDefinition
        MyNumericField is Numeric

with the error: no viable alt line 2:20; char='N'

It would seem to me that the lexer should try to match the longest 
multi-word keyword it can, and, in this case, should create the tokens 
<MyClass>, <'is a ClassDefinition'>, <MyNumericField>, <'is'>, and 
<'Numeric'>.  I have tried to use the filter option to properly tokenize, 
but this forces me to list all of my keywords in the order in which they 
should be recognized (correct?), which seems like it would be a big issue 
when importing a different vocab/super grammar.
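
The longest-match behavior described above can be sketched outside ANTLR. 
This is only a minimal Python illustration of the expected tokenization, not 
ANTLR's actual algorithm and not a grammar fix; the names KEYWORDS, TOKEN_RE, 
and tokenize are made up for the example, and the ID pattern is an assumption.

```python
import re

# Keywords from the post, deliberately listed in no particular order.
KEYWORDS = ["is", "is a", "is an", "is a Class", "is a ClassDefinition",
            "Alpha", "Numeric"]


def _keyword_pattern(keyword):
    """Build a regex for one keyword: words separated by whitespace, ending on a word boundary."""
    words = [re.escape(w) for w in keyword.split()]
    return r"\s+".join(words) + r"\b"


# Sort longest first so 'is a ClassDefinition' is tried before 'is a' and 'is'.
_alternatives = [_keyword_pattern(k)
                 for k in sorted(KEYWORDS, key=len, reverse=True)]

# Assumed ID shape: a letter or underscore followed by word characters.
TOKEN_RE = re.compile(
    r"(?P<kw>" + "|".join(_alternatives) + r")|(?P<id>[A-Za-z_]\w*)"
)


def tokenize(text):
    """Return keyword and ID tokens, always preferring the longest keyword."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace between tokens is skipped
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"no viable alternative at offset {pos}: {text[pos]!r}")
        if m.lastgroup == "kw":
            # Normalize internal whitespace of multi-word keywords.
            tokens.append(" ".join(m.group().split()))
        else:
            tokens.append(m.group())
        pos = m.end()
    return tokens


sample = "MyClass is a ClassDefinition\n        MyNumericField is Numeric"
print(tokenize(sample))
```

Sorting by length means the keyword list does not have to be maintained in 
recognition order, which is the property I am hoping to get from the lexer.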

Am I missing an obvious solution here?  I've tried many different 
permutations and can't seem to get it just right.

Thanks in advance, 
Ryan

PS - Why are semantic actions handled differently when the filter option is 
set to true?  The rule

fieldDef: ID { printId(); } 'is a' ID;

generates
if (backtracking == 1) { printId(); }
with filter=true, but
if (backtracking == 0) { printId(); }
with filter=false.

I am using ANTLR 3.0b4.  Thanks again!