[antlr-interest] Understanding Lexer rules

Johannes Luber jaluber at gmx.de
Wed Feb 20 06:50:57 PST 2008


Mark Volkmann wrote:
> Let's see if I can summarize the rules from this wiki section.
> 
> If the upcoming characters in the stream match a non-imaginary token
> defined in the tokens { ... } section, then that token is used. The
> tokens { ... } section comes first.

Correct.

> After that, lexer rules are evaluated in the order in which they are
> specified. The first one that matches the upcoming characters in the
> stream is used, not the one that matches the greatest number of
> characters.

No, longer matches win. Rereading the wiki text, I can see how one could 
be confused about that, as it isn't as clear as it could be. Would 
adding the following be enough? "Longer matches are preferred over 
shorter matches. If one has two tokens KEY='key'; and 
KEYWORD='keyword';, then the input 'keyword' will match KEYWORD, even if 
KEY comes first."
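
To make the proposed wording concrete, here is a minimal sketch of the
"longest match wins, definition order only breaks ties" rule in plain
Python. It models the rule as stated above and is not the code ANTLR
generates; the TOKEN_SPECS table, the WS entry, and the tokenize()
helper are names made up for this illustration.

```python
import re

# Hypothetical token table. KEY is deliberately listed before KEYWORD.
# Order is used only to break ties between equally long matches.
TOKEN_SPECS = [
    ("KEY", re.compile(r"key")),
    ("KEYWORD", re.compile(r"keyword")),
    ("WS", re.compile(r"\s+")),
]


def tokenize(text):
    """Return (name, lexeme) pairs, always taking the longest match."""
    pos = 0
    tokens = []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in TOKEN_SPECS:
            m = pattern.match(text, pos)
            # Strictly greater: an earlier rule keeps a tie.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no token matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

With this table, tokenize("keyword") yields a single KEYWORD token even
though KEY is defined first, while tokenize("key") still yields KEY.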

> After that, literals specified in parser rules are considered. This
> means that parser rules containing literals will not match the input
> if there is a lexer rule that matches the same input.
> 
> Does all that sound correct?
> 
> At the end of the "How to define tokens" section in the wiki it says
> that "lexer rules will greedily match the maximum of applicable
> characters". There is an exception to this. When the patterns ".*" or
> ".+" appear in a lexer rule, they do not match greedily.

I'll add that.
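
The difference between greedy and non-greedy matching can be shown with
Python regular expressions. This is only an analogy, assuming a
comment-like input invented for the example; it is not ANTLR syntax. The
".*?" pattern stops at the first closing delimiter, which is the
behaviour described above for ".*" inside a lexer rule, whereas the
greedy ".*" runs on to the last one.

```python
import re

# Made-up input containing two comment-like spans.
text = "/* one */ x /* two */"

# Greedy: ".*" consumes as much as possible, so the match runs
# to the LAST "*/" in the input.
greedy = re.match(r"/\*.*\*/", text).group()

# Non-greedy: ".*?" consumes as little as possible, so the match
# stops at the FIRST "*/".
non_greedy = re.match(r"/\*.*?\*/", text).group()

print(repr(greedy))
print(repr(non_greedy))
```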

Johannes



