[antlr-interest] Lexer problem

Thomas Brandon tbrandonau at gmail.com
Mon Mar 10 22:02:12 PDT 2008


On Tue, Mar 11, 2008 at 2:51 PM, Brent Yates <brent.yates at gmail.com> wrote:

> I need some help understanding syntactic predicates when used in the lexer.
>
> Here is a simple grammar that will run in AntlrWorks:
>
> grammar Simple;
>
> options
>     {
>     language= Java;
>     output=AST;
>     }
>
> start
>     :   TEST
>     ;
>
> POUND   :   '#' ;
> ID      :   'a'..'z'+ ;
> fragment DECIMAL_DIGIT
>     :   '0'..'9'
>     ;
>
> TEST
>     :   POUND WS?
>     (
>         ('aaa') => 'aaa' WS DECIMAL_DIGIT       {$channel=HIDDEN;$type=DECIMAL_DIGIT;}
>     |   ('bbb') => 'bbb' WS DECIMAL_DIGIT       {$channel=HIDDEN;$type=ID;}
>     |   ID
>     )
>     ;
>
> fragment SPACE_OR_TAB
>     :  (' '|'\t')+
>     ;
>
> WS
>     :   SPACE_OR_TAB+
>         {$channel=HIDDEN;}
>     ;
>
> NEWLINE
>     :   ('\r'? ('\u000C'|'\n') )
>         {$channel=HIDDEN;}
>     ;
>
> When fed this input:
>
> # aaa 4
> # bbb
> #hi
>
> I would expect the following:
>
> 1) the '# aaa 4' matches alt1 in TEST and should be set to HIDDEN and type DECIMAL_DIGIT.  And that does happen.
> 2) the '# bbb<nl>#hi' does not match alt2, however it does match the predicate.  I would expect a lexer error.  What happens is that the token type is set to HIDDEN and the rule actually matches the ID and returns a type of TEST.  That I don't understand.
>
> It looks like the actions of alt2 are being run even though only the predicate matches.  Also, if the predicate matches, why does the lexer later match alt3?

I would suggest you examine the generated code to better understand
what's happening. I think your problem is that syntactic predicates
only disambiguate syntactically ambiguous alternatives; they don't
supplant standard lookahead. Simply meeting the predicate is not enough
to select an alternative; the input also has to match the alternative
itself. As the second line doesn't fit alt2, it won't match that
alternative, regardless of the predicate, so the lexer takes alt3. Your
predicates are redundant here, because ordinary lookahead can already
tell the alternatives apart. If you changed 'aaa' and 'bbb' to ID,
leaving the predicates the same, then the predicates should
disambiguate the two otherwise ambiguous alternatives.
The actions of alt2 are not being run; the WS rule is setting
channel=HIDDEN.
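
To make the suggestion about ID concrete, here is a rough, untested
sketch of the TEST rule with the two literals replaced by ID and the
predicates left as they were. Everything else is assumed to stay as in
your grammar. The first two alternatives are now syntactically
identical, so the predicates should be what picks between them:

```antlr
// Sketch only: same TEST rule as in the original grammar, but alt1 and
// alt2 now both read ID WS DECIMAL_DIGIT, so lookahead alone cannot
// separate them and the syntactic predicates have real work to do.
TEST
    :   POUND WS?
    (
        ('aaa') => ID WS DECIMAL_DIGIT      {$channel=HIDDEN;$type=DECIMAL_DIGIT;}
    |   ('bbb') => ID WS DECIMAL_DIGIT      {$channel=HIDDEN;$type=ID;}
    |   ID
    )
    ;
```

I have not run this, so check the generated lexer to see where the
predicates end up in the decision. A line such as '# bbb' with no digit
after it should still fall through to the plain ID alternative, since
the predicate alone does not make alt2 match.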

Tom.
>
> Thanks for your help,
>
> Brent Yates
> brent.yates at gmail.com
>
>
>

