[antlr-interest] Making common prefixes work
Gavin Lambert
antlr at mirality.co.nz
Wed Dec 28 14:21:05 PST 2011
At 08:38 29/12/2011, none < wrote:
>the main problem is the lexer, I tried the following:
>
>HASH : '#';
>SEMICOLON : ';';
>TOPBEGIN : '#TOP ';
>BLOCKBEGIN : '{';
>BLOCKEND : '}';
>
>LINE : ~( HASH | SEMICOLON | NEWLINE | BLOCKBEGIN |
>BLOCKEND | TAGBEGIN | TAGEND | '\n' | '\r' )+
>;
>
>While this is working in general, it suddenly breaks if we have
>metadata starting with a T:
The problem is that (at present, anyway) ANTLR lexers are a little
too optimistic: they assume that they can get away with minimal
lookahead, and they don't backtrack. In your case, for example, the
TOPBEGIN rule is chosen as soon as the lexer looks ahead and sees
"#T", and it then has no way back to produce a HASH instead when the
next character turns out not to be an "O".
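
To make that failure mode concrete, here is a small hand-written model in Python. It is only a sketch of the behaviour described above, not ANTLR's generated code; the function name next_token_committing and the LexError class are made up for illustration.

```python
class LexError(Exception):
    """Raised when the model lexer has committed to a rule that then fails."""


def next_token_committing(text, pos):
    """Return (token_type, new_pos), deciding on minimal lookahead.

    Hypothetical model: seeing '#T' is treated as enough evidence to
    pick TOPBEGIN, and there is no fallback once that choice is made.
    """
    if text.startswith("#T", pos):
        literal = "#TOP "
        if text.startswith(literal, pos):
            return "TOPBEGIN", pos + len(literal)
        # Already committed to TOPBEGIN; the model cannot go back to HASH.
        raise LexError("committed to TOPBEGIN but the input does not match")
    if text.startswith("#", pos):
        return "HASH", pos + 1
    raise LexError("no rule starts here")
```

With this model, "#TOP x" lexes as TOPBEGIN and "#abc" as HASH, but "#Title" raises LexError even though a plain HASH would have been a valid match.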
One solution for this is to set a fixed lookahead to the length of
your longest possibly-ambiguous token, but the usual/better fix is
to explicitly code the lookahead yourself via predicates and type
changes. For example:

    fragment TOP: 'TOP';

    HASH
      : '#'
        ( /* nothing -- just a HASH */
        | (TOP) => TOP { $type = TOP; }
        | (ANOTHER) => ANOTHER { $type = ANOTHER; } /* eg. */
        )
      ;

Note also that the alternatives have to be in a single rule like
this: predicates don't (consistently) work between rules, only within
a rule.
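
As a rough illustration of what the single-rule approach does, the following Python sketch matches the '#' first and only then checks each suffix with explicit lookahead before consuming it. It is a hand-written model under my own assumptions, not what ANTLR generates; SUFFIXES and next_hash_token are hypothetical names.

```python
# Hypothetical table of (literal text, resulting token type) pairs,
# standing in for the predicated alternatives inside the HASH rule.
SUFFIXES = [("TOP", "TOP")]


def next_hash_token(text, pos):
    """Return (token_type, new_pos) for a token starting with '#' at pos."""
    if not text.startswith("#", pos):
        raise ValueError("expected '#' at position %d" % pos)
    pos += 1
    for literal, token_type in SUFFIXES:
        # Plays the role of the predicate: look ahead, consume nothing yet.
        if text.startswith(literal, pos):
            # The whole suffix is present, so consume it and change the type.
            return token_type, pos + len(literal)
    # No suffix matched: this is a plain HASH, and only '#' was consumed.
    return "HASH", pos
```

Here "#Title" comes back as a HASH of length one, leaving "Title" for the following rules, while "#TOP" is returned with the TOP type.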