[antlr-interest] 'filter' option in ANTLR 3.0

Terence Parr parrt at cs.usfca.edu
Wed Sep 20 14:27:00 PDT 2006


On Sep 20, 2006, at 2:16 PM, Ryan Hollom wrote:

>
> Greetings-
> I have a grammar with several multi-word keywords, and I'm having  
> trouble properly tokenizing the input.  For example, I have the rules
>
> classDef : ID 'is a ClassDefinition';
> fieldDef : ID ('is a' | 'is an') ID;
> inlineDef : ID 'is' ('Alpha' | 'Numeric');
>
> So the 'is'-prefixed keywords are 'is a ClassDefinition', 'is a  
> Class', 'is a', 'is an', and 'is'.  With these rules, the lexer  
> chokes on input like:
>
> MyClass is a ClassDefinition
>         MyNumericField is Numeric
>
> with a no viable alt line 2:20; char='N'
>
> It would seem to me that the lexer should try to match the longest  
> multi-word keyword it can, and, in this case, should create the  
> tokens <MyClass>, <'is a ClassDefinition'>,

Hi Ryan,

It is probably matching the longest it can, but there may also be a
rule that matches whitespace, or something similar, that is messing
this up.
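To make "matching the longest it can" concrete, here is a hand-written
Java sketch of longest-match tokenizing over the keywords in Ryan's
rules. It is not the code ANTLR generates; the class name
LongestMatchLexer, the word-boundary check, and treating every other
run of letters/digits as an identifier are assumptions for
illustration. Whitespace is skipped only between tokens, so the single
space inside a multi-word keyword is matched literally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical longest-match ("maximal munch") tokenizer, written by hand;
// it is NOT the lexer ANTLR generates from Ryan's grammar.
final class LongestMatchLexer {
    // Keywords from Ryan's rules; the single space inside a keyword is literal.
    private static final String[] KEYWORDS = {
        "is a ClassDefinition", "is a", "is an", "is", "Alpha", "Numeric"
    };

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (Character.isWhitespace(c)) { pos++; continue; } // WS between tokens is skipped

            // Longest keyword matching at pos that also ends on a word boundary
            // (assumption: so that, e.g., "is" does not match the start of "island").
            String best = null;
            for (String kw : KEYWORDS) {
                if (!input.startsWith(kw, pos)) continue;
                int end = pos + kw.length();
                boolean boundary = end == input.length()
                        || !Character.isLetterOrDigit(input.charAt(end));
                if (boundary && (best == null || kw.length() > best.length())) best = kw;
            }

            // End of an identifier (letters/digits) starting at pos.
            int idEnd = pos;
            while (idEnd < input.length() && Character.isLetterOrDigit(input.charAt(idEnd))) idEnd++;

            if (best != null && best.length() >= idEnd - pos) { // keyword wins ties with ID
                tokens.add("'" + best + "'");
                pos += best.length();
            } else if (idEnd > pos) {
                tokens.add(input.substring(pos, idEnd));
                pos = idEnd;
            } else {
                throw new IllegalArgumentException("no viable token at " + pos + ": '" + c + "'");
            }
        }
        return tokens;
    }
}
```

On Ryan's two-line input this yields MyClass, 'is a ClassDefinition',
MyNumericField, 'is', 'Numeric': when 'is a ClassDefinition' cannot
match, the shorter 'is' is still available as a candidate.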

> <MyNumericField>, <'is'>, and <'Numeric'>.  I have tried to use the  
> filter option to properly tokenize, but this forces me to list all  
> of my keywords in the order in which they should be recognized  
> (correct?), which seems like it would be a big issue when importing  
> a different vocab/super grammar.

Your problem does not seem like a filtering problem to me. Can you
explain why you are using the filter option?

>
> Am I missing an obvious solution here?  I've tried many different  
> permutations and can't seem to get it just right.
>
> Thanks in advance,
> Ryan
>
> PS - Why is it that when the filter option is set to true, semantic  
> actions are handled differently?  For the rule

Great question! If you look at the nextToken method in the generated
code, you will note that it turns on backtracking to try the rules
sequentially, in the order you specified, looking for a match. Because
of that, actions are executed while the lexer is technically backtracking.
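A rough Java model of that filter-mode loop, assuming each rule is
reduced to a literal keyword: rules are tried in the order listed,
consuming input as they go; a failed attempt is rewound to the saved
mark, the first rule that matches wins, and if none matches one
character is skipped. FilterLexer and matchLiteral are hypothetical
names, not ANTLR runtime API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a filter-mode nextToken() loop; not ANTLR's generated code.
// Each "rule" is reduced to a literal keyword and tried in the order listed.
final class FilterLexer {
    private final String input;
    private int pos = 0;
    int backtracking = 0; // set to 1 while a rule is being tried speculatively

    FilterLexer(String input) { this.input = input; }

    // Consumes characters as it goes, so a failed attempt leaves pos moved.
    private boolean matchLiteral(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (pos >= input.length() || input.charAt(pos) != s.charAt(i)) return false;
            pos++;
        }
        return true;
    }

    List<String> tokenize(List<String> rulesInOrder) {
        List<String> tokens = new ArrayList<>();
        while (pos < input.length()) {
            boolean matched = false;
            for (String rule : rulesInOrder) {
                int mark = pos;              // remember where this attempt started
                backtracking = 1;            // speculate, as filter mode does
                boolean ok = matchLiteral(rule);
                backtracking = 0;
                if (ok) {
                    tokens.add(input.substring(mark, pos));
                    matched = true;
                    break;                   // first rule in order wins
                }
                pos = mark;                  // rewind and try the next rule
            }
            if (!matched) pos++;             // no rule matched: skip one char
        }
        return tokens;
    }
}
```

With the rule order ["is", "is a ClassDefinition"], the input
"is a ClassDefinition" produces only "is", while the reverse order
produces the full keyword. That is the ordering sensitivity Ryan
describes, and why filter mode is a poor fit for this grammar.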

>
> fieldDef: ID { printId(); } 'is a' ID;
>
> generates
> if (backtracking == 1) { printId(); }
> with filter=true vs
> if (backtracking == 0) { printId(); }
> when filter=false.

Hence the generated guard checks for backtracking level 1 instead of
level 0.
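A small Java sketch of why the guard level moves, under a simplified
model: an action should fire only when its rule is matched at the
lexer's baseline depth, not during a deeper speculative check. With
filter=true that baseline is already 1, because nextToken turned
backtracking on, so the guard compares against 1. GuardedActionDemo
and its methods are hypothetical; only printId comes from Ryan's
example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the backtracking guard around an embedded action;
// not ANTLR's generated code.
final class GuardedActionDemo {
    private final boolean filter;   // mirrors the filter=true/false option
    int backtracking = 0;           // current speculation depth
    final List<String> log = new ArrayList<>();

    GuardedActionDemo(boolean filter) { this.filter = filter; }

    // Baseline depth at which a rule is matched "for real".
    private int baseline() { return filter ? 1 : 0; }

    // Stand-in for a rule with an embedded { printId(); } action.
    void fieldDef(String id) {
        if (backtracking == baseline()) {
            log.add("printId(" + id + ")");   // the guarded action
        }
    }

    // Stand-in for nextToken(): filter mode raises the depth to 1 first.
    void nextToken(String id) {
        backtracking = baseline();
        backtracking++;      // a deeper speculative check: action suppressed
        fieldDef(id);
        backtracking--;
        fieldDef(id);        // the real match at baseline: action runs
        backtracking = 0;
    }
}
```

In both modes the action runs exactly once per token; only the number
it is compared against changes.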

Ter

