[antlr-interest] Lexer problem

Mon Mar 10 20:51:23 PDT 2008

I need some help understanding syntactic predicates when used in the lexer.

Here is a simple grammar that will run in ANTLRWorks:

grammar Simple;

options
    {
    language=Java;
    output=AST;
    }

start
    :   TEST
    ;

POUND   :   '#' ;
ID      :   'a'..'z'+ ;
fragment DECIMAL_DIGIT
    :   '0'..'9'
    ;

TEST
    :   POUND WS?
    (
        ('aaa') => 'aaa' WS DECIMAL_DIGIT       {$channel=HIDDEN;$type=DECIMAL_DIGIT;}
    |   ('bbb') => 'bbb' WS DECIMAL_DIGIT       {$channel=HIDDEN;$type=ID;}
    |   ID
    )
    ;

fragment SPACE_OR_TAB
    :  (' '|'\t')+
    ;

WS
    :   SPACE_OR_TAB+
        {$channel=HIDDEN;}
    ;

NEWLINE
    :   ('\r'? ('\u000C'|'\n') )
        {$channel=HIDDEN;}
    ;

When fed this input:

# aaa 4
# bbb
#hi

I would expect the following:

1) '# aaa 4' matches alt1 in TEST, so it should be put on the HIDDEN channel
with type DECIMAL_DIGIT.  And that does happen.
2) '# bbb<nl>#hi' does not match alt2, although it does match alt2's
predicate.  I would expect a lexer error.  What happens instead is that the
token's channel is set to HIDDEN, and the rule actually matches the ID and
returns a type of TEST.  That I don't understand.

It looks like the actions of alt2 are being run even though only the
predicate matches.  Also, if the predicate matches, why does the lexer later
match alt3?
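
To make the expectation concrete, here is a small hand-written Java model of
how I would expect TEST to behave: each predicate is pure lookahead that picks
an alternative, and once an alternative is picked it must match completely or
the lexer reports an error.  This is only a sketch of my mental model, not the
code ANTLR generates, and every name in it (ExpectedTest, matchTest, and so
on) is made up for illustration.

```java
// Hand-written model of the behaviour expected from TEST above.
// This is NOT the code ANTLR generates; every name here is hypothetical.
class ExpectedTest {
    private final String in;
    private int p = 0;  // index of the next unconsumed character

    ExpectedTest(String in) { this.in = in; }

    // Models a syntactic predicate: pure lookahead, consumes nothing.
    private boolean lookingAt(String s) { return in.startsWith(s, p); }

    private IllegalStateException error(String expected) {
        return new IllegalStateException(
            "lexer error at index " + p + ": expected " + expected);
    }

    private void match(String s) {
        if (!lookingAt(s)) throw error("'" + s + "'");
        p += s.length();
    }

    private boolean atWs() {
        return p < in.length() && (in.charAt(p) == ' ' || in.charAt(p) == '\t');
    }

    // WS : one or more spaces or tabs
    private void ws() {
        if (!atWs()) throw error("WS");
        while (atWs()) p++;
    }

    // DECIMAL_DIGIT : '0'..'9'
    private void digit() {
        if (p >= in.length() || in.charAt(p) < '0' || in.charAt(p) > '9') {
            throw error("DECIMAL_DIGIT");
        }
        p++;
    }

    private boolean atLower() {
        return p < in.length() && in.charAt(p) >= 'a' && in.charAt(p) <= 'z';
    }

    // ID : 'a'..'z'+
    private void id() {
        if (!atLower()) throw error("ID");
        while (atLower()) p++;
    }

    // Returns the alternative taken, or throws if the chosen alternative fails.
    String matchTest() {
        match("#");
        if (atWs()) ws();                                    // WS?
        if (lookingAt("aaa")) { match("aaa"); ws(); digit(); return "alt1"; }
        if (lookingAt("bbb")) { match("bbb"); ws(); digit(); return "alt2"; }
        id();                                                // no predicate held
        return "alt3";
    }

    public static void main(String[] args) {
        for (String line : new String[] {"# aaa 4", "# bbb\n#hi", "#hi"}) {
            try {
                System.out.println(new ExpectedTest(line).matchTest());
            } catch (IllegalStateException e) {
                System.out.println(e.getMessage());
            }
        }
    }
}
```

In this model '# aaa 4' takes alt1, '#hi' takes alt3, and '# bbb<nl>#hi' is
committed to alt2 by the predicate and then fails on the missing WS, which is
the lexer error I was expecting to see.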

Thanks for your help,

Brent Yates
brent,yates at gmail.com