[antlr-interest] newbie: lexer rules vs parser rules

Vidar Håkestad vidar at hawkis.com
Sat May 20 10:18:19 PDT 2006


ANTLR just has rules, whether parser rules or lexer rules; there are no
separate literals. A rule in ANTLR grammar syntax is a production modeled
on EBNF notation for specifying sequences of language constructs.

What is lexing, and what is parsing?
Lexing is interpretation of character sequences.
Parsing is interpretation of token sequences.
Given a sequence of characters, the lexer rules create tokens, which the
parser then consumes one at a time (via nextToken()).
In this respect, neither the lexer nor the parser has any built-in notion of
literals. You may write character sequences in both parts of a grammar file,
but the context differs: in a lexer rule they match characters, while in a
parser rule they stand for tokens. A literal becomes a literal only because
you define it that way in your lexer or parser rules.
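
To make the character/token split concrete, here is a minimal hand-written
sketch in Python (not ANTLR-generated code). It assumes a toy language of
integers joined by `+`; the names `Token`, `Lexer`, `Parser` and `next_token`
are hypothetical and only loosely mirror ANTLR's nextToken(). The lexer reads
characters and emits tokens; the parser never sees a character, only tokens it
pulls on demand.

```python
from dataclasses import dataclass


@dataclass
class Token:
    type: str  # token type, e.g. "INT", "PLUS", "EOF"
    text: str  # the characters the lexer matched


class Lexer:
    """Interprets a sequence of characters and produces tokens."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def next_token(self) -> Token:
        text, n = self.text, len(self.text)
        while self.pos < n and text[self.pos].isspace():  # skip whitespace
            self.pos += 1
        if self.pos >= n:
            return Token("EOF", "")
        ch = text[self.pos]
        if "0" <= ch <= "9":
            start = self.pos  # consume the whole run of digits as one token
            while self.pos < n and "0" <= text[self.pos] <= "9":
                self.pos += 1
            return Token("INT", text[start:self.pos])
        if ch == "+":
            self.pos += 1
            return Token("PLUS", "+")
        raise SyntaxError(f"unexpected character {ch!r} at position {self.pos}")


class Parser:
    """Interprets a sequence of tokens; it never looks at raw characters."""

    def __init__(self, lexer: Lexer) -> None:
        self.lexer = lexer
        self.current = lexer.next_token()  # one token of lookahead

    def match(self, token_type: str) -> Token:
        tok = self.current
        if tok.type != token_type:
            raise SyntaxError(f"expected {token_type}, got {tok.type}")
        self.current = self.lexer.next_token()  # pull the next token on demand
        return tok

    def expr(self) -> int:
        # expr : INT (PLUS INT)* EOF
        total = int(self.match("INT").text)
        while self.current.type == "PLUS":
            self.match("PLUS")
            total += int(self.match("INT").text)
        self.match("EOF")
        return total
```

For example, `print(Parser(Lexer("1 + 22 + 3")).expr())` prints 26. The token
type names ("INT", "PLUS", "EOF") are the only thing the two halves share.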

What Terence is suggesting is that when a rule name starts with a capital
letter, ANTLR (i.e. the ANTLR grammar tool) treats it as a lexer rule; when it
starts with a lowercase letter, ANTLR treats it as a parser rule.
This is a strict syntactic rule of ANTLR grammars, so if you want to use the
generator, you have to abide by it.
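
The convention is mechanical enough to state in a few lines. Below is a
hypothetical Python helper, `rule_kind` (not part of ANTLR or its API), that
classifies a rule name the same way, by its first letter only. It assumes rule
names start with an ASCII letter. By this convention `Digit` and
`NonZeroDigit` in the grammar quoted further down would be read as lexer
rules, as would `FLOAT`.

```python
import string


def rule_kind(name: str) -> str:
    """Classify a rule name by the case of its first letter.

    Hypothetical helper: uppercase first letter -> lexer rule,
    lowercase first letter -> parser rule. Assumes names start
    with an ASCII letter and rejects anything else.
    """
    if not name or name[0] not in string.ascii_letters:
        raise ValueError(f"not a valid rule name: {name!r}")
    return "lexer" if name[0] in string.ascii_uppercase else "parser"


for rule in ["FLOAT", "Digit", "expr", "digits"]:
    print(f"{rule:7} -> {rule_kind(rule)} rule")
```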

It is also important to know that the lexer is always created before the
parser, so the lexer definitions have to 'know' what tokens the parser will
expect.

The general answer to your general question is:
try to partition your language into chunks of character sequences that are as
large as possible. Those chunks go into your lexer as lexer rules. The rest of
your language's logic goes into your parser rules.
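
As an illustration of that advice applied to the Digits/Digit/NonZeroDigit
grammar quoted further down, here is a toy Python sketch (not ANTLR output)
comparing two ways to split the work. In the fine-grained version the lexer
emits one hypothetical DIGIT token per character and the parser would have to
rebuild each number; in the coarse version a single hypothetical NUMBER rule
swallows the whole digit run, so the parser sees one token per number.

```python
from __future__ import annotations

import re

# Fine-grained: every digit is its own token; a parser rule such as
# digits : DIGIT+ ; would then be needed to rebuild each number.
FINE = re.compile(r"\s*(?P<DIGIT>[0-9])")

# Coarse: one lexer rule matches the whole run of digits, which is what
# Digits : Digit | Digits Digit describes (leading zeros allowed).
COARSE = re.compile(r"\s*(?P<NUMBER>[0-9]+)")


def tokenize(pattern: re.Pattern, text: str) -> list[tuple[str, str]]:
    """Return (token_type, text) pairs, failing on any unmatched input."""
    tokens = []
    text = text.rstrip()  # trailing whitespace produces no token
    pos = 0
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at {pos}: {text[pos:]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


print(tokenize(FINE, "12 345"))    # five DIGIT tokens
print(tokenize(COARSE, "12 345"))  # two NUMBER tokens
```

With the coarse lexer the parser never needs to know about individual digits;
pieces like Digit and NonZeroDigit can live on as helper lexer rules that do
not produce tokens of their own (`protected` rules in ANTLR 2, `fragment`
rules in ANTLR 3).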

regards
Hawkis

On Saturday 20 May 2006 16:47, Sam Barnett-Cormack wrote:
> Dieter Frej wrote:
> > OK, even though I might look like a total newbie, I have to ask this:
> > Is there any rule of thumb for deciding what a literal is and what
> > a rule is? (That is, what goes into the parser and what into the
> > lexer?)
> >
> > Digits:
> > Digit
> > Digits Digit
> >
> > Digit:
> > 0
> > NonZeroDigit
> >
> > NonZeroDigit: one of
> > 1 2 3 4 5 6 7 8 9
> >
> > I would say NonZeroDigit is a literal and goes into the lexer, right?
> > What about the other two? Should both go into the parser?
>
> On further thinking, your questions seem to suggest (to me) that you
> might do well to read a book/take a course on languages and grammars and
> so on. You seem to be unfamiliar with a lot of the terms, or at best not
> using them in the way they are normally used.
>
> I mean no offence by this, just suggesting a profitable course of action.
>
> Sam
>
> > Terence Parr wrote:
> >> On May 18, 2006, at 12:54 AM, <JConner at ssp-uk.com> wrote:
> >>> Hi All,
> >>>
> >>> I've started to get my feet wet with ANTLR a little, and I've come
> >>> across a few things that I thought would be handled by lexer rules, but
> >>> seem to be handled in general by parser rules.  For example, most of the
> >>> examples I've seen handle numbers (floating, exponents, sign, etc) with
> >>> parser rules,
> >>
> >> Those should be lexer rules...most places I've seen.  Remember FLOAT
> >> means lexer rule :)
> >>
> >> Ter

