[antlr-interest] Re: Recommendation for Lexer

Wed Feb 8 10:17:57 PST 2006

> Here is a simple example: in Ruby, '/' can be the DIVIDE operator or
> the start of a regular expression (same syntax as in Perl), so you can
> have the following lexer rule:
> DIV_OR_REGEX
> : {expect_div()}? '/' {$setType(DIV);}
> | '/' REGEX_CONTENT '/' {$setType(REGEX);}
> ;

This works if you've got five or six isolated ambiguities. In my
language, every keyword (and there are about 50) is ambiguous because it
could also be an identifier. On top of that there are some unrelated
ambiguities, plus XML as an island language (directly embeddable, and
from inside it you can call out to XQuery again). I tried going with
ANTLR some time ago, but it all ended up in a mess. Your rule above also
only works for a fixed k == 1 without predicates (this can bite you
really badly!).
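
To make the keyword problem concrete, here is a rough sketch in plain
Java of the kind of state-driven decision I mean. Everything in it is
made up for illustration (the class name, the two states, the four
keywords, and splitting on whitespace instead of real scanning); it is
not my actual lexer and not what ANTLR or JFlex generate. The assumed
rule is simply: a word counts as a keyword only where an operator is
expected, otherwise it is an identifier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: the same word is a keyword or an identifier
// depending on the lexer state, not on its spelling alone.
final class StatefulLexerSketch {
    enum Type { IDENT, KEYWORD, NUMBER }
    enum State { OPERAND, OPERATOR }   // what the lexer expects next

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    // Assumed keyword set, only meaningful in OPERATOR state.
    static final Set<String> OPERATOR_KEYWORDS =
            new HashSet<>(Arrays.asList("div", "mod", "and", "or"));

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        State state = State.OPERAND;            // an expression starts with an operand
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) continue;       // empty input yields one empty string
            Type type;
            if (state == State.OPERATOR && OPERATOR_KEYWORDS.contains(word)) {
                type = Type.KEYWORD;
                state = State.OPERAND;          // after an operator, expect an operand
            } else {
                type = Character.isDigit(word.charAt(0)) ? Type.NUMBER : Type.IDENT;
                state = State.OPERATOR;         // after an operand, expect an operator
            }
            tokens.add(new Token(type, word));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The middle "div" is the operator; the outer words are plain names.
        System.out.println(tokenize("div div mod"));
    }
}
```

With 50 keywords and an embedded island language, the number of states
and transitions grows quickly, which is why I would rather have the
lexer generator manage them than encode each case as a predicate.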

JFlex looks good at the moment. It doesn't impose any class inheritance
on you and the generated lexer is completely standalone, so it should be
easy to integrate with ANTLR. It also has native support for the
issues I have. The only thing I'm missing is finer control over which
parts of a token end up in the token's text, but maybe I just haven't
found that yet.

Martin