[antlr-interest] Understanding Lexer rules

Wed Feb 20 05:37:51 PST 2008

On Feb 20, 2008 2:17 AM, Gavin Lambert <antlr at mirality.co.nz> wrote:
> At 09:44 20/02/2008, Darien Hager wrote:
> >1) It helps to consider lexer rules (token definitions) to be
> >separate from the parser rules, even though they're in the same
> >file.
>
> Yes.  In fact my opinion is that it's best not to use character
> literals in the parser at all (since this helps to reinforce the
> separation between lexer and parser).  There was a big discussion
> on this last week.
>
> >2) Unlike parser rules, the order of appearance matters. (The
> >auto-named tokens generated by literals in parser rules are
> >appended.)
>
> I believe they're prepended, actually.  And the order only sort-of
> matters; a rule that consumes more input will usually win against
> one that consumes less input regardless of order.

This really bothers me. Every time this subject comes up it seems that
people talk about what they *think* happens, using words like "I
believe", "sort-of" and "usually". Gavin, I'm not faulting you. You're
supplying as much information as you have, which is helpful. I just
wish that someone who knows for sure would speak up and put the
definitive information in the wiki so we can stop speculating on how
the lexer chooses the next rule to apply.

> >3) The lexer seeks to match the first viable token.
>
> Sort of (see above).
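
To make the two competing descriptions concrete, here is a minimal
Python sketch of both policies: "first viable rule in declaration
order" and "longest match wins, with declaration order as the
tie-breaker". It is only an illustration of what the two claims would
mean if taken literally. The rule table and the function names are
hypothetical, and nothing here is confirmed to be how ANTLR's
generated lexer actually chooses a rule.

```python
import re

# Hypothetical token rules in declaration order (not from any real grammar).
RULES = [
    ("IF", re.compile(r"if")),
    ("ID", re.compile(r"[a-z]+")),
    ("INT", re.compile(r"[0-9]+")),
    ("WS", re.compile(r"\s+")),
]


def next_token_first_viable(text, pos):
    """Policy A: the first rule (in declaration order) that matches wins."""
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        if m and m.end() > pos:
            return name, m.group()
    raise ValueError(f"no rule matches at position {pos}")


def next_token_longest(text, pos):
    """Policy B: the longest match wins; ties go to the earlier rule."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Strict '>' keeps the earlier rule when two matches are equally long.
        if m and m.end() > pos and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    if best is None:
        raise ValueError(f"no rule matches at position {pos}")
    return best


def tokenize(text, policy):
    """Apply one policy repeatedly, dropping whitespace tokens."""
    pos, tokens = 0, []
    while pos < len(text):
        name, lexeme = policy(text, pos)
        if name != "WS":
            tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens


if __name__ == "__main__":
    print(tokenize("iffy if", next_token_first_viable))
    # [('IF', 'if'), ('ID', 'fy'), ('IF', 'if')]
    print(tokenize("iffy if", next_token_longest))
    # [('ID', 'iffy'), ('IF', 'if')]
```

On the input "iffy if" the two policies disagree about the first word
but agree about the second, which is one way to read "the order only
sort-of matters". Which of these (if either) describes ANTLR is
exactly the open question in this thread.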

-- 
R. Mark Volkmann
Object Computing, Inc.