[antlr-interest] Trouble with lexer ambiguity

Kevin J. Cummings cummings at kjchome.homeip.net
Sun May 30 16:47:36 PDT 2010


On 05/30/2010 06:37 PM, Michael Stover wrote:
> I'm making a grammar that knows how to parse coordinates, such as:
> 
> 4,5
> 
> 
> It knows about offsets, like:
> 
> +2,-8
> 
> It should parse adding coordinates together:
> 
> 4,5+5,9
> 
> However, it seems to see the '+' and from there predict the next token to be
> an offset, despite the fact there is no '+'|'-' before the '9'.
> 
> Here's the grammar I think should work (I made k large enough to show it's
> not helping, no matter how large):
> 
> grammar Test;
> 
> main    :  COORD '+' COORD
>            |    OFFSET
>            ;

When ANTLR lexes, it does so without regard to parser context.  Without
sufficient lookahead, it cannot tell that +5,9 is not a valid OFFSET,
since '+' is an OFFSET start character.  You would need to restructure
your grammar so that it can handle this case, perhaps by turning OFFSET
into a parser rule instead of a lexer rule.

> COORD options{k=7;} :    '0'..'9'+  ',' '0'..'9'+
>     ;
> 
> OFFSET options{k=7;} :    ('+'|'-') '0'..'9'+  ',' ('-'|'+') '0'..'9'+
>     ;

I would try the following:

OFFSET : (('+'|'-') '0'..'9'+ ',' ('+'|'-'))=>
          ('+'|'-') '0'..'9'+ ',' ('-'|'+') '0'..'9'+
       ;

and if ANTLR doesn't like that, I'd make a parser rule to construct
offsets from lesser token sequences and use the syntactic predicate
in the parser rule to help differentiate the possible alternatives in
the parser rules which use OFFSETs.
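
A rough, untested sketch of that parser-rule fallback, assuming ANTLR 3
syntax.  The rule names coord, offset and sign and the token INT are
made up for illustration; they are not in your grammar.  The lexer only
produces INT plus the literal tokens ',', '+' and '-', so it never has
to decide whether a '+' starts an OFFSET:

```
grammar Test;

// Coordinates and offsets are assembled in the parser from smaller
// tokens, so '+' is always lexed as its own token.
main    :   coord '+' coord
        |   offset
        ;

// e.g. 4,5
coord   :   INT ',' INT
        ;

// e.g. +2,-8
offset  :   sign INT ',' sign INT
        ;

sign    :   '+' | '-'
        ;

// Hypothetical replacement for the digit runs inside COORD and OFFSET.
INT     :   '0'..'9'+
        ;

WS      :   ( ' '
            | '\t'
            | '\r'
            | '\n'
            ) {$channel=HIDDEN;}
        ;
```

In this particular sketch the two alternatives of main start with
different tokens (INT versus a sign), so a syntactic predicate may not
even be needed there.  One side effect to be aware of: since WS is
hidden, whitespace inside a coordinate (such as "4 , 5") would now be
accepted.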

> 
> WS  :   ( ' '
>         | '\t'
>         | '\r'
>         | '\n'
>         ) {$channel=HIDDEN;}
>     ;
> 
> 
> What am I missing?

You should take a look at the example rule on the ANTLR wiki that shows
how to parse floating point numbers using selective look-ahead.  It's the
same principle that you need to help determine when you have an OFFSET
and when you don't.



-- 
Kevin J. Cummings
kjchome at rcn.com
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)

