[antlr-interest] Lexer matching non-matching rule
Jesper Larsson
antlr at avadeaux.net
Sun May 17 02:01:36 PDT 2009
> If you want the longest match, then left factor everything and let it
> do that:
>
> A ( B (C|) |) ;
>
> And set the token type at the appropriate points.
That is not always so easy, however. My original example, simplified
even further, was something like this:
FOO: 'foo';
BAR: 'bar';
FOOZ: 'foo'* 'z';
It might be possible to refactor this using emit() or something
similar; I'm not sure. It would be difficult, anyway. An alternative
would be to force backtracking using syntactic predicates, in the manner
Indhu suggested in a previous reply, but that means the lexer would scan
the same input more than once. Avoiding that is more or less why I use a
lexer generator in the first place, instead of just matching the input
with regexps.
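
To make the rescanning cost concrete, here is a rough sketch in Python of
a hand-written longest-match scanner for the three rules above. It is not
ANTLR output and does not reproduce how an ANTLR-generated lexer decides;
the names next_token and tokenize and the "rescanned" counter are
hypothetical, introduced only for illustration. It tries FOOZ first and
falls back to FOO or BAR when no 'z' follows the run of 'foo'.

```python
# Illustrative sketch only: a hand-written scanner for the rules
#   FOO: 'foo';   BAR: 'bar';   FOOZ: 'foo'* 'z';
# All names here are hypothetical; this is not ANTLR-generated code.

def next_token(text, pos):
    """Return (token_type, end, rescanned) for the longest match at pos.

    `rescanned` counts characters that were examined while trying FOOZ
    but lie beyond the end of the token finally returned, so they will
    have to be examined again on a later call.
    """
    # Speculatively try FOOZ: skip every 'foo', then require a 'z'.
    i = pos
    while text.startswith("foo", i):
        i += 3
    if text.startswith("z", i):
        return "FOOZ", i + 1, 0

    # FOOZ failed: back up to pos and try the fixed literals instead.
    for name, literal in (("FOO", "foo"), ("BAR", "bar")):
        if text.startswith(literal, pos):
            end = pos + len(literal)
            return name, end, max(0, i - end)

    raise ValueError(f"no token matches at offset {pos}")


def tokenize(text):
    """Return (list of token types, total characters scanned twice)."""
    tokens, rescanned, pos = [], 0, 0
    while pos < len(text):
        name, pos, extra = next_token(text, pos)
        tokens.append(name)
        rescanned += extra
    return tokens, rescanned


if __name__ == "__main__":
    for sample in ("foofooz", "foofoobar", "foofoofoobar"):
        print(sample, tokenize(sample))
```

On "foofooz" nothing is wasted, but on "foofoobar" the scanner runs past
both 'foo's looking for a 'z' before it can emit the first FOO, and the
wasted work grows with every additional 'foo' in the run.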
By the way, I got around my own problem with the URL/IDENT conflict by
folding the URL into the larger context where it appears, so that the
lexer returns one larger token which is split up later. This seemed to
be the most bearable inelegance in my situation.
J'