[antlr-interest] Can antlr v3 lex star | tristar properly?
Guntis Ozols
guntiso at latnet.lv
Wed Nov 21 07:14:04 PST 2007
Is it a bug or a feature that
    TRISTAR : ('***')=>'***';
does not work?

Is it a bug or a feature that
    STAR : '*' ('**' {type = TRISTAR;})?;
does not work?

Can it be lexed with only syntactic predicates?

How can the following be lexed:

    DCOLON : '::';
    NS_TEST : NCName ':*';
    PrefixedName : NCName ':' NCName;
    NCName : ('a'..'z' | 'A'..'Z' | '_')
             ('a'..'z' | 'A'..'Z' | '.' | '-' | '_' | '0'..'9')*;

> The problem is basically that ANTLR doesn't do longest-match matching.
> It predicts the next rule that can possibly match based on a minimal
> number of lookahead symbols (characters, tokens or tree nodes).
>
> After seeing two '*' characters as lookahead, it concludes that the only
> thing that makes sense should be TRISTAR. This behavior is probably
> not terribly intuitive, but as ANTLR doesn't backtrack like lex does
> (lex can simply backtrack in the internal state machine, ANTLR would
> have to do that across method calls...) it's pretty much unavoidable.
> In these cases you need to have some kind of predicate to help ANTLR.
> This should only apply to prefix problems like this, though.
>
> Here's my solution to the problem:
>
> stars : (STAR | TRISTAR)* EOF;
>
> TRISTAR : {input.LA(3) == '*'}? => '*' '*' '*';
> STAR : '*';
>
> Works like a charm. Try it with five '*' chars in ANTLRWorks :)
> You only have to help out at one place here, to force it to match the
> longer token first. Pretty good tradeoff if you ask me.
>
> cheers,
> -k
> --
> Kay Röpke
> http://classdump.org/
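
To make the quoted explanation concrete, here is a small Python sketch of the two decision strategies. It is only a simplified model of the behaviour described above, not the code ANTLR generates: the helper names (la, lex_naive, lex_gated) are made up for illustration, and the input is assumed to consist of '*' characters only.

```python
def la(text, pos, k):
    """Return the k-th lookahead character (1-based), or None past the end."""
    i = pos + k - 1
    return text[i] if i < len(text) else None


def lex_naive(text):
    """Model of prediction without a predicate: two '*' in a row are taken
    to mean TRISTAR, and there is no backtracking if the third is missing."""
    tokens, pos = [], 0
    while pos < len(text):
        if la(text, pos, 1) != '*':
            raise ValueError("unexpected character at %d" % pos)
        if la(text, pos, 2) == '*':
            # Committed to TRISTAR after two characters of lookahead.
            if la(text, pos, 3) != '*':
                raise ValueError("mismatched character at %d" % (pos + 2))
            tokens.append("TRISTAR")
            pos += 3
        else:
            tokens.append("STAR")
            pos += 1
    return tokens


def lex_gated(text):
    """Model of the gated predicate {input.LA(3) == '*'}?=> on TRISTAR:
    TRISTAR is only considered when the third lookahead character is '*'."""
    tokens, pos = [], 0
    while pos < len(text):
        if la(text, pos, 1) != '*':
            raise ValueError("unexpected character at %d" % pos)
        if la(text, pos, 3) == '*':
            # Predicate holds, so the rule must now match '*' '*' '*'.
            if la(text, pos, 2) != '*':
                raise ValueError("mismatched character at %d" % (pos + 1))
            tokens.append("TRISTAR")
            pos += 3
        else:
            tokens.append("STAR")
            pos += 1
    return tokens
```

With five '*' characters, lex_gated yields one TRISTAR followed by two STAR tokens, while lex_naive commits to a second TRISTAR at the fourth character and fails because no backtracking happens. Note that the predicate only inspects the third character, so on input containing other characters (which this sketch deliberately rejects) it could still pick TRISTAR too eagerly.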