[antlr-interest] 'filter' option in ANTLR 3.0

Thu Sep 21 07:46:59 PDT 2006

Thanks for the response. I have trimmed the grammar down to a (seemingly)
simple case, and I am still getting lexer errors with my input.
> Can you explain why you are using the filter option?
Because I was getting lexer errors, I turned the filter option on only to
suppress them while developing the grammar, so I have no specific reason to
use this option.

So, given the grammar:

grammar testgrammar;
options { language=Java; output=AST; }

bc_definition
        : ID 'is a ClassDefinition' NEWLINE
                persistentField* ;

persistentField : ID (inlineDef | fieldDef) NEWLINE ; 
fieldDef        : ('is an' | 'is a') ID;
inlineDef       : ('is' ('Alpha' | 'Numeric'));

NEWLINE :   (('\r')? '\n' )+ ;
ID      :       ( 'A' .. 'Z' | '0' .. '9') ( 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9')* ;
WS      :       (' '|'\t')+ {channel=99;};

and the input

MyClass is a ClassDefinition
        TestField is Numeric
        MyField is an AwesomeField

I get the error(s):

[]: line 3:11 1:1: Tokens : ( T7 | T8 | T9 | T10 | T11 | T12 | NEWLINE | ID | WS ); state 10 (decision=5) no viable alt line 3:11; char='i'
[]: line 3:12 1:1: Tokens : ( T7 | T8 | T9 | T10 | T11 | T12 | NEWLINE | ID | WS ); state 0 (decision=5) no viable alt line 3:12; char='s'
[bc_definition, persistentField]: line 3:14 state 0 (decision=2) no viable alt; token=[@4,55:61='Numeric',<12>,3:14]

Is this expected behavior?

Thanks again!
-Ryan

Terence Parr <parrt at cs.usfca.edu>
Sent by: antlr-interest-bounces at antlr.org
09/20/2006 04:27 PM

To: ANTLR Interest <antlr-interest at antlr.org>
Subject: Re: [antlr-interest] 'filter' option in ANTLR 3.0

On Sep 20, 2006, at 2:16 PM, Ryan Hollom wrote:

>
> Greetings-
> I have a grammar with several multi-word keywords, and I'm having 
> trouble properly tokenizing the input.  For example, I have the rules
>
> classDef : ID 'is a ClassDefinition';
> fieldDef: ID ('is a' | 'is an') ID;
> inlineDef : ID 'is' ('Alpha' | 'Numeric');
>
> So the 'is'-prefixed keywords are 'is a ClassDefinition', 'is a 
> Class', 'is a', 'is an', and 'is'.  With these rules, the lexer 
> chokes on input like:
>
> MyClass is a ClassDefinition
>         MyNumericField is Numeric
>
> with a no viable alt line 2:20; char='N'
>
> It would seem to me that the lexer should try to match the longest 
> multi-word keyword it can, and, in this case, should create the 
> tokens <MyClass>, <'is a ClassDefinition'>,

Hi Ryan,

It is probably matching the longest token it can, but there may also be
a rule that matches the whitespace for something, and that is messing
this up.

> <MyNumericField>, <'is'>, and <'Numeric'>.  I have tried to use the 
> filter option to properly tokenize, but this forces me to list all 
> of my keywords in the order in which they should be recognized 
> (correct?), which seems like it would be a big issue when importing 
> a different vocab/super grammar.

Your problem does not look like a filtering problem to me. Can you
explain why you are using the filter option?

>
> Am I missing an obvious solution here?  I've tried many different 
> permutations and can't seem to get it just right.
>
> Thanks in advance,
> Ryan
>
> PS - Why is it that when the filter option is set to true, semantic 
> actions are handled differently?  For the rule

Great question! If you look at the nextToken method in the generated
output, you will notice that it turns on backtracking so it can try the
rules sequentially, in the order you specified, looking for a match.
Because of that, actions are executed while the lexer is technically
backtracking.
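A minimal Java sketch of the loop described above, under loudly stated assumptions: it is NOT the code ANTLR generates, each lexer rule is reduced to a single literal keyword, and `FilterLexerSketch`, `Rule`, and `tokenize` are hypothetical names. It illustrates the two behaviors that matter here: rules are tried in declared order with the input rewound after each failed attempt, so the first rule that matches wins; and characters that no rule matches are simply skipped.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a filter-mode lexer loop (not ANTLR-generated code).
 * Assumption: every rule is simplified to one literal keyword.
 */
public class FilterLexerSketch {
    /** A lexer rule reduced to a name and a literal (illustration only). */
    record Rule(String name, String literal) {}

    private final String input;
    private final List<Rule> rules;
    private int pos = 0;

    FilterLexerSketch(String input, List<Rule> rules) {
        this.input = input;
        this.rules = rules;
    }

    /** Consumes one char at a time, so a mismatch can leave a partial match. */
    private boolean match(Rule r) {
        for (char c : r.literal().toCharArray()) {
            if (pos >= input.length() || input.charAt(pos) != c) {
                return false;
            }
            pos++;
        }
        return true;
    }

    /** Returns NAME='text' for the next token, or null at end of input. */
    String nextToken() {
        while (pos < input.length()) {
            int mark = pos;                 // where this attempt starts
            for (Rule r : rules) {          // declared order decides the winner
                if (match(r)) {
                    return r.name() + "='" + input.substring(mark, pos) + "'";
                }
                pos = mark;                 // rewind the partial match
            }
            pos++;                          // no rule matched: skip one char
        }
        return null;
    }

    static List<String> tokenize(String text, List<Rule> rules) {
        FilterLexerSketch lexer = new FilterLexerSketch(text, rules);
        List<String> tokens = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        Rule classDef = new Rule("CLASSDEF", "is a ClassDefinition");
        Rule isRule = new Rule("IS", "is");
        // Longer keyword declared first: it wins.
        System.out.println(tokenize("is a ClassDefinition", List.of(classDef, isRule)));
        // Shorter keyword declared first: 'is' wins and the rest is skipped.
        System.out.println(tokenize("is a ClassDefinition", List.of(isRule, classDef)));
    }
}
```

With the longer keyword declared first, the input yields a single CLASSDEF token; with 'is' declared first it yields only IS='is', because the remaining text matches no rule and is filtered out. This is the ordering dependence Ryan mentions. On input like `is Numeric`, the CLASSDEF attempt consumes `is ` before failing at `N`, and it is the rewind that lets IS match afterwards.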

>
> fieldDef: ID { printId(); } 'is a' ID;
>
> generates
> if (backtracking == 1) { printId(); }
> with filter=true vs
> if (backtracking == 0) { printId(); }
> when filter=false.

Hence you will see the backtracking check for level 1 instead of
level 0.
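The level-1 guard can be illustrated with another hypothetical sketch. It assumes (this is not taken from ANTLR's source) that filter-mode nextToken runs each attempt at backtracking level 1, and that checking whether a rule can match runs one level deeper and is then rewound. `ActionGuardSketch`, `mId`, `synpredId`, and `run` are illustrative names only. Under those assumptions, a `backtracking == 0` guard never fires in filter mode, while a `backtracking == 1` guard fires exactly once, during the real match.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of why filter mode guards actions on backtracking == 1.
 * Assumptions: attempts run at level 1; a "would this rule match?" check
 * (a syntactic predicate) runs at level 2 and is rewound afterwards.
 */
public class ActionGuardSketch {
    int backtracking = 0;
    final List<String> log = new ArrayList<>();
    private final String input;
    private int pos = 0;

    ActionGuardSketch(String input) {
        this.input = input;
    }

    /** Stand-in for an ID rule with an embedded action like { printId(); }. */
    private boolean mId(int guardLevel) {
        int start = pos;
        while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) {
            pos++;
        }
        if (pos == start) {
            return false;
        }
        if (backtracking == guardLevel) {    // the generated action guard
            log.add("printId: " + input.substring(start, pos));
        }
        return true;
    }

    /** Predicate: try the rule one level deeper, then rewind the input. */
    private boolean synpredId(int guardLevel) {
        int mark = pos;
        backtracking++;
        boolean ok = mId(guardLevel);
        backtracking--;
        pos = mark;
        return ok;
    }

    /** filter=false: the rule is simply invoked at level 0. */
    void normalNextToken(int guardLevel) {
        mId(guardLevel);
    }

    /** filter=true: attempt at level 1, predicate check at level 2. */
    void filterNextToken(int guardLevel) {
        backtracking = 1;
        if (synpredId(guardLevel)) {
            mId(guardLevel);                 // the real match, still at level 1
        }
        backtracking = 0;
    }

    static List<String> run(boolean filter, int guardLevel) {
        ActionGuardSketch lexer = new ActionGuardSketch("MyClass");
        if (filter) {
            lexer.filterNextToken(guardLevel);
        } else {
            lexer.normalNextToken(guardLevel);
        }
        return lexer.log;
    }

    public static void main(String[] args) {
        System.out.println(run(false, 0));   // guard == 0, filter off
        System.out.println(run(true, 0));    // guard == 0, filter on
        System.out.println(run(true, 1));    // guard == 1, filter on
    }
}
```

Running main prints `[printId: MyClass]`, then `[]`, then `[printId: MyClass]`: the level-0 guard only works without filtering, and in filter mode only the level-1 guard runs the action, and it skips the speculative level-2 check.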

Ter
