[antlr-interest] greedy=false for lexers by default

Gavin Lambert antlr at mirality.co.nz
Sat May 24 05:32:32 PDT 2008


At 10:32 24/05/2008, Terence Parr wrote:
 > I'm thinking of changing lexers to use greedy=false by default so
 > that things like
 >
 > STRING : '"' ('\\' '"'|.)* '"' ;
 >
 > so I don't have to say
 >
 > STRING : '"' (options {greedy=false;}:'\\' '"'|.)* '"' ;
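To make the greedy/non-greedy distinction in the quoted STRING rule concrete, here is a rough analogy using Python's `re` module, where `*` is a greedy loop and `*?` a non-greedy one. It is only an analogy: ANTLR's generated lexer decides when to leave a loop by lookahead, not by regex-style backtracking, and `first_token` is a hypothetical helper introduced for illustration.

```python
import re

# Regex analogues of  '"' ('\\' '"' | .)* '"'  (analogy only, not ANTLR).
# The escaped-quote alternative is listed first, as in the quoted rule.
GREEDY = re.compile(r'"(?:\\"|.)*"')
NON_GREEDY = re.compile(r'"(?:\\"|.)*?"')


def first_token(pattern: re.Pattern, text: str) -> str:
    """Return the text matched at the start of `text`, or '' if none."""
    m = pattern.match(text)
    return m.group(0) if m else ""


src = '"a" + "b"'
print(first_token(GREEDY, src))      # "a" + "b"   (runs to the LAST quote)
print(first_token(NON_GREEDY, src))  # "a"         (stops at the first quote)
print(first_token(NON_GREEDY, '"say \\"hi\\"" + "x"'))  # "say \"hi\""
```

The greedy loop swallows the closing quote of the first string and keeps going, which is exactly the surprise that non-greedy-by-default is meant to avoid.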

Provided it only goes non-greedy when the loop is followed by a 
character to stop at (like the closing '"' above).  Otherwise I think 
it'd lead to too much change in behaviour.  (And I'm not sure what 
makes sense if that following character can be optional.)

But yeah, like Loring I almost never use a . in this sort of case; 
it's more common to do something along the lines of:

STRING : '"' ('\\' ('"' | '\\' | 'r' | 'n') | ~('\\' | '"'))* '"';
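A hand-written Python sketch of what the explicit rule above accepts may show why greediness stops mattering: at every position the current character selects exactly one path (closing quote, escape, or ordinary character), so there is never a choice about when to leave the loop. `match_strict_string` is a hypothetical name, and this is not the code ANTLR would generate.

```python
from typing import Optional

# Escapes the explicit rule lists: \" \\ \r \n
VALID_ESCAPES = {'"', '\\', 'r', 'n'}


def match_strict_string(text: str, pos: int = 0) -> Optional[int]:
    # Hand-written equivalent of:
    #   STRING : '"' ('\\' ('"' | '\\' | 'r' | 'n') | ~('\\' | '"'))* '"' ;
    # Returns the index just past the closing quote, or None on failure
    # (no opening quote, invalid escape, or unterminated string).
    if pos >= len(text) or text[pos] != '"':
        return None
    i = pos + 1
    while i < len(text):
        c = text[i]
        if c == '"':
            # Only the closing quote starts with '"', so the loop must exit.
            return i + 1
        if c == '\\':
            # Escape alternative: the next character must be a listed one.
            if i + 1 < len(text) and text[i + 1] in VALID_ESCAPES:
                i += 2
                continue
            return None  # invalid escape sequence: this form rejects it
        # ~('\\' | '"'): any other single character
        i += 1
    return None  # end of input before the closing quote


print(match_strict_string('"a\\"b" rest'))  # 6
print(match_strict_string('"bad \\q"'))     # None
```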

It's very explicit, and I think it makes greediness or non-greediness 
irrelevant too, since at each character only one alternative (or the 
closing quote) can match.  Alternatively, if I don't want the lexer to 
choke on an invalid escape sequence, I'll use the simpler form:

STRING : '"' ('\\' . | ~('\\' | '"'))* '"';
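The same kind of hand-written Python sketch for the simpler form: a backslash consumes whatever single character follows it, so an unknown escape such as `\q` does not make the recognizer fail. `match_lenient_string` is a hypothetical name used only for illustration, not generated ANTLR code.

```python
from typing import Optional


def match_lenient_string(text: str, pos: int = 0) -> Optional[int]:
    # Hand-written equivalent of the simpler form:
    #   STRING : '"' ('\\' . | ~('\\' | '"'))* '"' ;
    # Any character may follow a backslash, so unknown escapes are accepted.
    # Returns the index just past the closing quote, or None on failure.
    n = len(text)
    if pos >= n or text[pos] != '"':
        return None
    i = pos + 1
    while i < n:
        c = text[i]
        if c == '"':
            return i + 1  # closing quote
        if c == '\\':
            if i + 1 >= n:
                return None  # nothing left for '.' to match
            i += 2  # '\\' . : skip the backslash and whatever follows it
            continue
        i += 1  # ~('\\' | '"'): any other single character
    return None  # unterminated string


print(match_lenient_string('"bad \\q"'))  # 8
```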

Admittedly this approach does get a bit messy when the termination 
sequence consists of multiple characters (e.g. '*/' for a C-like 
block comment or ']]>' for XML CDATA).  That's when an 
auto-non-greedy approach might be beneficial.
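For a multi-character terminator, the non-greedy idea amounts to checking, before consuming each character, whether the terminator starts at the current position. Below is a minimal Python sketch assuming the terminator is a fixed string; `match_until` is a hypothetical helper, not an ANTLR API, and it only approximates what a `(options {greedy=false;} : .)*` loop expresses.

```python
from typing import Optional


def match_until(text: str, start: str, end: str, pos: int = 0) -> Optional[int]:
    # Non-greedy scan, roughly what
    #   start (options {greedy=false;} : .)* end
    # expresses: before consuming each character, check whether the
    # terminator begins here, and leave the loop at its first occurrence.
    # Returns the index just past `end`, or None if it never appears.
    if not text.startswith(start, pos):
        return None
    i = pos + len(start)
    while i < len(text):
        if text.startswith(end, i):
            return i + len(end)
        i += 1  # the '.' alternative: consume one character
    return None


src = '/* a * b */ x = 1; /* c */'
print(src[:match_until(src, '/*', '*/')])  # /* a * b */
cdata = '<![CDATA[x]]y]]>z'
print(cdata[:match_until(cdata, '<![CDATA[', ']]>')])  # <![CDATA[x]]y]]>
```

A lone `*` inside the comment, or `]]` not followed by `>` inside the CDATA section, is simply consumed, which is the part that gets awkward to spell out with explicit `~(...)` alternatives.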


