[antlr-interest] Making common prefixes work

Wed Dec 28 14:21:05 PST 2011

At 08:38 29/12/2011, none wrote:
 >the main problem is the lexer, I tried the following:
 >
 >HASH : '#';
 >SEMICOLON : ';';
 >TOPBEGIN : '#TOP ';
 >BLOCKBEGIN : '{';
 >BLOCKEND : '}';
 >
 >LINE : ~( HASH | SEMICOLON | NEWLINE | BLOCKBEGIN |
 >BLOCKEND | TAGBEGIN | TAGEND | '\n' | '\r' )+
 >;
 >
 >While this is working in general, it suddenly breaks if we have
 >metadata starting with a T:

The problem is that (at present, anyway) ANTLR lexers are a little 
too optimistic -- they assume that they can get away with minimal 
lookahead and don't do backtracking.  In your case, for example, the 
lexer commits to the "TOPBEGIN" rule as soon as it looks ahead and 
sees "#T", and then it has no way back to generate a HASH instead 
when it finds that the next character isn't an "O".

One solution for this is to set a fixed lookahead to the length of 
your longest possibly-ambiguous token, but the usual/better fix is 
to explicitly code the lookahead yourself via predicates and type 
changes.  For example:

fragment TOP: 'TOP';
HASH
   : '#'
   ( /* nothing -- just a HASH */
   | (TOP) => TOP { $type = TOP; }
   | (ANOTHER) => ANOTHER { $type = ANOTHER; } /* e.g. another fragment for a '#'-prefixed token */
   );

Note also that all of the tokens sharing the prefix have to be 
handled in a single rule like this -- predicates don't 
(consistently) work between rules, only within a rule.
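
Applied to the tokens in your grammar, that might look something like 
the sketch below.  It is untested and assumes ANTLR 3 syntax; it 
reuses your TOPBEGIN name as the fragment and drops the leading '#' 
from its literal, since HASH has already consumed it.  Your separate 
TOPBEGIN : '#TOP '; rule would have to be removed.

```antlr
/* Sketch only: TOPBEGIN becomes a fragment reached through HASH. */
fragment TOPBEGIN : 'TOP ';   /* trailing space kept from the original '#TOP ' */

HASH
   : '#'
   ( /* nothing -- just a HASH */
   | (TOPBEGIN) => TOPBEGIN { $type = TOPBEGIN; }  /* full "#TOP " seen: retype the token */
   );
```

Because the predicate checks the whole 'TOP ' literal (trailing space 
included), input such as "#Title" or "#TOPIC" should fall through to 
the empty alternative and come out as a plain HASH, leaving the 
remaining characters for the rules that follow.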