[antlr-interest] Look-ahead problem parsing phrase?

Gavin Lambert antlr at mirality.co.nz
Sun Jun 28 14:41:09 PDT 2009


At 09:21 29/06/2009, Sean O'Dell wrote:
>Why should lexer rules not refer to other lexer rules without 
>being fragments?  I've read that doing so only prevented token 
>creation.  It affects logic, as well?

The moment you have one top-level lexer rule referring to another 
top-level rule, you introduce ambiguity: you're basically telling 
the lexer "given this input, produce one of these two tokens, but 
I don't care which", while the parser expects exactly one of those 
tokens.  Sometimes the lexer will happen to pick the right one and 
it'll parse.  Sometimes it won't.  Worse, sometimes the rules are 
sufficiently different that certain input produces one token and 
other input produces the other.  Then you're basically screwed.

It's important that given any particular input in isolation, there 
should be one and only one possible token that can be produced for 
it.  Doing anything else is just letting yourself in for a world 
of pain.
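
As a rough sketch of the difference (the rule names here are made up 
for illustration and are not taken from your grammar), compare these 
two versions in ANTLR 3 syntax:

```
// Ambiguous: both rules are top-level, so the input "a" could
// come out as either a LETTER token or a WORD token.
LETTER : 'a'..'z' ;
WORD   : LETTER+ ;
```

```
// Unambiguous: LETTER is only a helper and never becomes a token
// of its own, so "a" can only ever come out as a WORD.
fragment LETTER : 'a'..'z' ;
WORD            : LETTER+ ;
```

In the second version the parser only ever has to deal with WORD, so 
it doesn't matter how the lexer would have resolved the overlap.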


Also, your EOL rule was a top-level lexer rule that can 
successfully match zero characters.  Doing that creates infinite 
loops (the lexer can keep producing that token without ever 
consuming any input), and is something else that must be avoided.  
(Which is another reason why it should be a parser rule.)
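
To illustrate, here is a sketch that assumes an EOL rule in which 
every element is optional.  The original rule isn't quoted in this 
message, so the rule names and bodies below are hypothetical:

```
// Problematic as a lexer rule: both parts are optional, so the
// rule can succeed without consuming any input.
EOL : '\r'? '\n'? ;
```

```
// One possible alternative: a lexer rule that always consumes at
// least one character, plus a parser rule that accepts either a
// line break or the end of the input.
NEWLINE : '\r'? '\n' ;
eol     : NEWLINE | EOF ;
```

With this split, the optional part lives in the parser rule, and 
every token the lexer produces covers at least one character.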


