[antlr-interest] 'filter' option in ANTLR 3.0
Ryan Hollom
ryan.hollom at us.lawson.com
Thu Sep 21 07:46:59 PDT 2006
Thanks for the response. I have trimmed down the grammar to a (seemingly)
simple case, and am still getting a lexer error with my input.
> Can you explain why you are using the filter option?
Because I was getting lexer errors, I turned the filter option on to avoid
the errors as I developed, so there is no explicit reason for me to use
this option.
So, given the grammar:
grammar testgrammar;
options { language=Java; output=AST; }
bc_definition
: ID 'is a ClassDefinition' NEWLINE
persistentField* ;
persistentField : ID (inlineDef | fieldDef) NEWLINE ;
fieldDef : ('is an' | 'is a') ID;
inlineDef : ('is' ('Alpha' | 'Numeric'));
NEWLINE : (('\r')? '\n' )+ ;
ID : ( 'A' .. 'Z' | '0' .. '9') ( 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9')* ;
WS : (' '|'\t')+ {channel=99;};
and the input
MyClass is a ClassDefinition
TestField is Numeric
MyField is an AwesomeField
I get the error(s):
[]: line 3:11 1:1: Tokens : ( T7 | T8 | T9 | T10 | T11 | T12 | NEWLINE | ID | WS ); state 10 (decision=5) no viable alt line 3:11; char='i'
[]: line 3:12 1:1: Tokens : ( T7 | T8 | T9 | T10 | T11 | T12 | NEWLINE | ID | WS ); state 0 (decision=5) no viable alt line 3:12; char='s'
[bc_definition, persistentField]: line 3:14 state 0 (decision=2) no viable alt; token=[@4,55:61='Numeric',<12>,3:14]
Is this expected behavior?
Thanks again!
-Ryan
Terence Parr <parrt at cs.usfca.edu>
Sent by: antlr-interest-bounces at antlr.org
09/20/2006 04:27 PM
To: ANTLR Interest <antlr-interest at antlr.org>
Subject: Re: [antlr-interest] 'filter' option in ANTLR 3.0
On Sep 20, 2006, at 2:16 PM, Ryan Hollom wrote:
>
> Greetings-
> I have a grammar with several multi-word keywords, and I'm having
> trouble properly tokenizing the input. For example, I have the rules
>
> classDef : ID 'is a ClassDefinition';
> fieldDef: ID ('is a' | 'is an') ID;
> inlineDef : ID 'is' ('Alpha' | 'Numeric');
>
> So the 'is'-prefixed keywords are 'is a ClassDefinition',
> 'is a Class', 'is a', 'is an', and 'is'. With these rules, the lexer
> chokes on input like:
>
> MyClass is a ClassDefinition
> MyNumericField is Numeric
>
> with the error: no viable alt line 2:20; char='N'
>
> It would seem to me that the lexer should try to match the longest
> multi-word keyword it can, and, in this case, should create the
> tokens <MyClass>, <'is a ClassDefinition'>,
Hi Ryan,
It probably is matching the longest it can, but maybe there is a
rule that also matches the whitespace, or something similar, that is
messing this up.
> <MyNumericField>, <'is'>, and <'Numeric'>. I have tried to use the
> filter option to properly tokenize, but this forces me to list all
> of my keywords in the order in which they should be recognized
> (correct?), which seems like it would be a big issue when importing
> a different vocab/super grammar.
Your problem does not seem like a filtering problem to me. Can you
explain why you are using the filter option?
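The two strategies discussed above, recognizing keywords in the order they are listed versus matching the longest keyword that fits, can be contrasted with a small plain-Java sketch. It is a hypothetical illustration, not code generated by ANTLR: the class KeywordMatch and its methods are invented for this example, and a real ANTLR lexer predicts tokens with a DFA rather than scanning a keyword list. The keyword set comes from the thread and is deliberately listed shortest first.

```java
import java.util.List;

/**
 * Hypothetical illustration (not ANTLR output): how the matching strategy
 * decides which multi-word keyword is recognized at a given position.
 */
public class KeywordMatch {
    // Keyword literals from the thread, deliberately listed shortest first.
    static final List<String> KEYWORDS =
        List.of("is", "is a", "is an", "is a ClassDefinition");

    // True if kw occurs at pos and is not immediately followed by a letter or digit.
    static boolean matchesAt(String input, int pos, String kw) {
        if (!input.startsWith(kw, pos)) {
            return false;
        }
        int end = pos + kw.length();
        return end == input.length() || !Character.isLetterOrDigit(input.charAt(end));
    }

    // Ordered strategy: the first keyword in list order that matches wins.
    static String firstMatch(String input, int pos) {
        for (String kw : KEYWORDS) {
            if (matchesAt(input, pos, kw)) {
                return kw;
            }
        }
        return null;
    }

    // Longest-match strategy: the longest matching keyword wins, whatever the order.
    static String longestMatch(String input, int pos) {
        String best = null;
        for (String kw : KEYWORDS) {
            if (matchesAt(input, pos, kw) && (best == null || kw.length() > best.length())) {
                best = kw;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] lines = {
            "MyClass is a ClassDefinition",
            "MyNumericField is Numeric",
            "MyField is an AwesomeField"
        };
        for (String line : lines) {
            int pos = line.indexOf(" is") + 1; // start of the 'is...' phrase
            System.out.println(line);
            System.out.println("  first match:   '" + firstMatch(line, pos) + "'");
            System.out.println("  longest match: '" + longestMatch(line, pos) + "'");
        }
    }
}
```

Run as-is, the ordered strategy returns 'is' on all three lines because 'is' is listed first, while the longest-match strategy returns 'is a ClassDefinition', 'is', and 'is an'. An order-sensitive scheme therefore needs the keywords listed from most to least specific, which is the ordering burden Ryan describes with filter=true.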
>
> Am I missing an obvious solution here? I've tried many different
> permutations and can't seem to get it just right.
>
> Thanks in advance,
> Ryan
>
> PS - Why is it that when the filter option is set to true, semantic
> actions are handled differently? For the rule
Great question! If you look at the nextToken method in the generated
output, you will note that it turns on backtracking to try the rules
sequentially, in the order you have specified, looking for a match.
Because of that, actions are executed while the lexer is technically
backtracking.
>
> fieldDef: ID { printId(); } 'is a' ID;
>
> generates to
> if (backtracking == 1) { printId(); }
> with filter=true vs
> if (backtracking == 0) { printId(); }
> when filter=false.
Hence you will see the backtracking check test for level 1 instead of
level 0.
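Ter's explanation of the guard can be modeled with a toy Java class. This is a hypothetical sketch, not ANTLR's generated lexer: FilterSketch, ruleIsA, and guardLevel are invented names, and only a single rule is tried. It assumes, as described above, that filter mode raises the backtracking level to 1 before trying rules, so an action guarded by backtracking == 0 would never run there; that is why the generated code tests for level 1 instead.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy lexer (not ANTLR's generated code) mimicking the pattern
 * described in the thread: with filter=true, nextToken() tries rules one
 * backtracking level up, so action guards must test that level.
 */
public class FilterSketch {
    private final boolean filter;     // mimics the lexer's filter=true option
    private final int guardLevel;     // level tested by the action guard
    private int backtracking = 0;     // 0 = committed matching, >0 = speculative
    final List<String> actionLog = new ArrayList<>();

    FilterSketch(boolean filter, int guardLevel) {
        this.filter = filter;
        this.guardLevel = guardLevel;
    }

    // Hypothetical rule with an embedded action, in the spirit of { printId(); }.
    private boolean ruleIsA(String input, int pos) {
        boolean matched = input.startsWith("is a", pos);
        if (matched && backtracking == guardLevel) {
            actionLog.add("action ran at backtracking level " + backtracking);
        }
        return matched;
    }

    // In filter mode, every rule attempt happens one backtracking level up.
    boolean nextToken(String input, int pos) {
        if (filter) {
            backtracking++;
        }
        try {
            return ruleIsA(input, pos);
        } finally {
            if (filter) {
                backtracking--;
            }
        }
    }

    public static void main(String[] args) {
        String input = "is a ClassDefinition";
        FilterSketch filterGuard1 = new FilterSketch(true, 1);   // like filter=true output
        FilterSketch filterGuard0 = new FilterSketch(true, 0);   // mismatched guard
        FilterSketch normalGuard0 = new FilterSketch(false, 0);  // like filter=false output
        filterGuard1.nextToken(input, 0);
        filterGuard0.nextToken(input, 0);
        normalGuard0.nextToken(input, 0);
        System.out.println("filter=true,  guard==1: " + filterGuard1.actionLog);
        System.out.println("filter=true,  guard==0: " + filterGuard0.actionLog);
        System.out.println("filter=false, guard==0: " + normalGuard0.actionLog);
    }
}
```

The first and third lexers each log one action; the second, whose guard tests level 0 while filter mode runs its rules at level 1, logs nothing.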
Ter