[antlr-interest] Lexer errors when looking for wrong token

Mon Oct 11 21:15:38 PDT 2010

Thanks for the responses.

Kevin,

  Yes, that helps. I'm using the C target, so I haven't been able to test it
yet, but I see the logic.

Joachim,

  Your explanation is very clear. My grammar is mostly working now, so I
don't think I'll be switching lexer generators. I'll probably first try
adding the extra prefix token definitions and emitting two tokens (e.g.
LBRACKET followed by MINUS) for those cases.
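A minimal sketch of that plan, written as a Python toy model rather than ANTLR C-target code. It assumes the lexer extends a lexeme only while it is still a prefix of some token and never backs up (the behaviour Joachim describes below). LBMINUS ('[-'), LBPLUS ('[+') and PLUS ('+') are hypothetical names not in the original grammar fragment, and the "emit two tokens" step is done here as a post-pass over the token list instead of inside the generated lexer.

```python
# Literal -> token name. LBMINUS, LBPLUS and PLUS are hypothetical additions:
# the prefix tokens '[-' and '[+' plus a PLUS token for the split below.
LITERALS = {
    "[->": "LBMINUSGT", "[*]": "LBASRB", "[*": "LBAST", "[=": "LBEQUALS",
    "[+]": "LBPLUSRB", "{": "LBRACE", "[": "LBRACKET", "-": "MINUS",
    "[-": "LBMINUS", "[+": "LBPLUS", "+": "PLUS",
}

# Each prefix token is replaced by the two tokens the parser expects.
SPLIT = {
    "LBMINUS": [("LBRACKET", "["), ("MINUS", "-")],
    "LBPLUS": [("LBRACKET", "["), ("PLUS", "+")],
}


def lex(text):
    """Non-backtracking scan: extend the lexeme while it is still a prefix
    of some literal, then require it to be a complete token."""
    tokens, i = [], 0
    while i < len(text):
        j = i
        while j < len(text) and any(lit.startswith(text[i:j + 1]) for lit in LITERALS):
            j += 1
        lexeme = text[i:j]
        if lexeme not in LITERALS:
            raise ValueError(f"no token matches {lexeme!r} at offset {i}")
        tokens.append((LITERALS[lexeme], lexeme))
        i = j
    return tokens


def split_prefix_tokens(tokens):
    """Emit two tokens for every prefix token; pass everything else through."""
    out = []
    for tok in tokens:
        out.extend(SPLIT.get(tok[0], [tok]))
    return out


print(split_prefix_tokens(lex("[-[->[+{")))
```

Because every prefix of every multi-character literal is now a token, the scan can never stop on a partial lexeme. An alternative that avoids splitting is to let the parser accept LBMINUS wherever it expects LBRACKET MINUS.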

Ad

On Mon, Oct 11, 2010 at 5:57 AM, Joachim Schrod <jschrod at acm.org> wrote:

> A Z wrote:
>
> > I have a lexer with the following rules:
> >
> > LBMINUSGT                  : '[->';
> > LBASRB                     : '[*]';
> > LBAST                      : '[*';
> > LBEQUALS                   : '[=';
> > LBPLUSRB                   : '[+]';
> > LBRACE                     : '{';
> > LBRACKET                   : '[';
> > MINUS                      : '-';
> >
> > The lexer fails (with an error message) when any string of '[-' or '[*' is
> > detected. I'm confused why ANTLR cannot tokenize '[-' correctly as
> > LBRACKET MINUS.
>
> Because ANTLR lexers cannot backtrack.
>
> '[-' starts the token LBMINUSGT and only that token. Thus, when '['
> and '-' arrive in the input, recognition of the token LBMINUSGT is
> started. When no '>' follows, the lexer cannot backtrack to the
> point before '-' was consumed and turn '[' into LBRACKET. Since no
> other token starts with '[-', an error is reported and error
> recovery takes place.
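A simplified Python model of the difference, using the token set from the grammar fragment quoted above. It is not ANTLR's generated prediction code: `lex_committing` assumes the lexer keeps consuming while the text is a prefix of some token and then must stop on a complete token, while `lex_backtracking` remembers the last position where a complete token ended and falls back to it (longest match with backtracking, as flex-style generators such as JFlex do). `LexError` is a hypothetical exception type for the sketch.

```python
# Token set from the quoted grammar fragment (literal -> token name).
LITERALS = {
    "[->": "LBMINUSGT", "[*]": "LBASRB", "[*": "LBAST", "[=": "LBEQUALS",
    "[+]": "LBPLUSRB", "{": "LBRACE", "[": "LBRACKET", "-": "MINUS",
}


class LexError(Exception):
    pass


def lex_committing(text, literals=LITERALS):
    """Consume characters while they still form a prefix of some literal,
    then demand that the consumed text is itself a token (no backtracking)."""
    tokens, i = [], 0
    while i < len(text):
        j = i
        while j < len(text) and any(lit.startswith(text[i:j + 1]) for lit in literals):
            j += 1
        lexeme = text[i:j]
        if lexeme not in literals:
            raise LexError(f"no token matches {lexeme!r} at offset {i}")
        tokens.append(literals[lexeme])
        i = j
    return tokens


def lex_backtracking(text, literals=LITERALS):
    """Same scan, but remember where the last complete token ended and
    back up to it when the longer candidate fails."""
    tokens, i = [], 0
    while i < len(text):
        j, last_end = i, None
        while j < len(text) and any(lit.startswith(text[i:j + 1]) for lit in literals):
            j += 1
            if text[i:j] in literals:
                last_end = j
        if last_end is None:
            raise LexError(f"no token matches at offset {i}")
        tokens.append(literals[text[i:last_end]])
        i = last_end
    return tokens


print(lex_committing("[->"))      # ['LBMINUSGT']
print(lex_backtracking("[-{"))    # ['LBRACKET', 'MINUS', 'LBRACE']
try:
    lex_committing("[-{")
except LexError as e:
    print(e)                      # no token matches '[-' at offset 0
```

The committing scan gets stuck on the partial lexeme '[-', exactly the situation described above; the backtracking scan recovers LBRACKET MINUS without any extra token definitions.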
>
> The canonical way to solve this problem is to create tokens that
> cover all prefixes of all existing tokens. In your cited grammar
> fragment, for example, you need additional tokens that match '[-'
> and '[+'.
>
> I hope this makes the problem more understandable,
>
>        Joachim
>
> PS: Actually, there is a non-canonical way to solve the problem:
> one can use a different tool to generate the lexer, one that can
> backtrack, and use ANTLR only for its great parser abilities.
> That's what I do: I use JFlex, after having fought with ANTLR lexer
> definition restrictions one time too often. ;-)
>
> --
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> Joachim Schrod                          Email: jschrod at acm.org
> Roedermark, Germany