[antlr-interest] Lexer matching non-matching rule

Jesper Larsson antlr at avadeaux.net
Sun May 17 02:01:36 PDT 2009


> If you want the longest match, then left factor everything and let it
> do that: 
> 
> A ( B (C|) |) ;
> 
> And set the token type at the appropriate points.

That is not always so easy, however. Simplified even further, my
original example was something like this:

FOO:    'foo';
BAR:    'bar';
FOOZ:   'foo'* 'z';

It might be possible to left-factor this using emit() or something
similar, but I'm not sure, and it would be difficult in any case. An
alternative would be to force backtracking with syntactic predicates,
in the manner Indhu suggested in a previous reply, but then the lexer
would scan the same input more than once. Avoiding that is more or less
why I use a lexer generator in the first place, rather than just
matching the input with regexps.
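
To make the rescanning cost concrete, here is a rough sketch of the
"just use regexps" approach for the three rules above, written in Python
rather than ANTLR. The names RULES and tokenize are made up for
illustration and have nothing to do with ANTLR's API. At every position
it tries every rule and keeps the longest match, so on "foofoobar" the
FOOZ pattern reads past both 'foo's before failing, and the same
characters are then read again by FOO.

```python
import re

# Hypothetical rule table mirroring FOO, BAR and FOOZ from the grammar.
# List order only matters for breaking ties between equally long matches.
RULES = [
    ("FOO", re.compile(r"foo")),
    ("BAR", re.compile(r"bar")),
    ("FOOZ", re.compile(r"(?:foo)*z")),
]


def tokenize(text):
    """Longest-match tokenizer: try every rule at each position."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None  # (rule name, end offset) of the longest match so far
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Ignore failures and empty matches; keep the longest one.
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError("no rule matches at position %d" % pos)
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens


print(tokenize("foofoobar"))
print(tokenize("foofooz"))
```

This gives the tokens I want (FOO FOO BAR for the first input, a single
FOOZ for the second), but only by letting the regexp engine backtrack
over input it has already read, which is exactly the repeated scanning
described above.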

By the way, I got around my own URL/IDENT conflict by incorporating the
URL into the larger context where it appears, so the lexer returns one
larger token that is split up later. This seemed to be the most
bearable inelegance in my situation.

J'




