[antlr-interest] Understanding Lexer rules

Mark Volkmann r.mark.volkmann at gmail.com
Wed Feb 20 06:35:24 PST 2008


On Feb 20, 2008 8:22 AM, Johannes Luber <jaluber at gmx.de> wrote:
> Micke Hovmöller wrote:
>
> > On 2/20/08, Mark Volkmann <r.mark.volkmann at gmail.com> wrote:
> >> This really bothers me. Every time this subject comes up it seems that
> >> people talk about what they *think* happens, using words like "I
> >> believe", "sort-of" and "usually". Gavin, I'm not faulting you. You're
> >> supplying as much information as you have which is helpful. I just
> >> wish that someone who knows for sure would speak up and put the
> >> definitive information in the wiki so we can stop speculating on how
> >> the lexer chooses the next rule to apply.
> >
> > I'd like to second this motion, and just add that this information
> > will have to include some quite detailed and specific examples. This
> > seems like the sort of thing that is just too complex for most people
> > to understand without examples.
> >
> > /Micke
> >
> Isn't
> <http://www.antlr.org/wiki/display/ANTLR3/Quick+Starter+on+Parser+Grammars+-+No+Past+Experience+Required>
> including this information in the "How to define tokens" section?

Let's see if I can summarize the rules from this wiki section.

If the upcoming characters in the stream match a non-imaginary token
defined in the tokens { ... } section, then that token is used. The
tokens section is considered first.

After that, lexer rules are evaluated in the order in which they are
specified. The first one that matches the upcoming characters in the
stream is used, not the one that matches the greatest number of
characters.
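
To make the difference between "first rule that matches" and "rule that
matches the most characters" concrete, here is a minimal Python sketch.
It is not ANTLR and makes no claim about how ANTLR's generated lexer is
implemented. The rule table and the names first_match and longest_match
are hypothetical, chosen only to show how the two selection strategies
can disagree on the same input.

```python
import re

# Hypothetical rule table in declaration order (not ANTLR's real API).
RULES = [
    ("KEYWORD", re.compile(r"if")),
    ("ID", re.compile(r"[a-z]+")),
]

def first_match(text, pos=0):
    """Return (rule, lexeme) for the first rule, in order, that matches at pos."""
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        if m:
            return name, m.group()
    return None

def longest_match(text, pos=0):
    """Return (rule, lexeme) for the longest match at pos; ties go to the earlier rule."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

print(first_match("iffy"))    # ('KEYWORD', 'if')
print(longest_match("iffy"))  # ('ID', 'iffy')
```

With first-match selection the input "iffy" is split after "if", while
longest-match selection treats the whole word as one ID. That is the
kind of difference the summary above is trying to pin down.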

After that, literals specified in parser rules are considered. This
means that parser rules containing literals will not match the input
if there is a lexer rule that matches the same input.

Does all that sound correct?

At the end of the "How to define tokens" section in the wiki it says
that "lexer rules will greedily match the maximum of applicable
characters". There is an exception to this. When the patterns ".*" or
".+" appear in a lexer rule, they do not match greedily.

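The greedy versus non-greedy distinction for ".*" can be illustrated
with Python's re module. This is only an analogy under the assumption
that the two notions of greediness behave alike; it does not run an
ANTLR lexer. The helper name match_comment and the sample input are
made up for the illustration.

```python
import re

def match_comment(text, greedy):
    """Match a C-style comment at the start of text, greedily or not."""
    # ".*" takes as much as it can; ".*?" stops at the first possible end.
    pattern = r"/\*.*\*/" if greedy else r"/\*.*?\*/"
    m = re.match(pattern, text)
    return m.group() if m else None

sample = "/* one */ x /* two */"

print(match_comment(sample, greedy=True))   # /* one */ x /* two */
print(match_comment(sample, greedy=False))  # /* one */
```

A non-greedy ".*" is what makes a comment rule stop at the first "*/"
rather than swallowing everything up to the last one.
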
-- 
R. Mark Volkmann
Object Computing, Inc.

