[antlr-interest] newbie: lexer rules vs parser rules
Micheal J
open.zone at virgin.net
Sat May 20 07:42:59 PDT 2006
Dieter,
> ok, even though I might look like a total newbie I have to
> ask that: Are there any rule of thumb on how to decide what a
> literal is and what a rule is? (respectively what goes into
> the parser and what into the
> lexer?)
Your use of "literal" and "rule" is ambiguous, so I'm not sure what you
mean.
As for what goes into the lexer vs the parser: a lexer tokenizes the input
character stream. It should identify all substrings that are classed as
"tokens" in the language you are lexing/parsing.
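To make "tokenizing" concrete, here is a minimal sketch in Python rather than ANTLR. The token names (NUMBER, IDENT, OP, WS) and the toy input are my own assumptions for illustration, not anything from your grammar:

```python
import re

# Hypothetical token classes for a toy language; the names are illustrative only.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),   # whole numbers and simple decimals
    ("IDENT",  r"[A-Za-z_]\w*"),    # identifiers
    ("OP",     r"[=+\-*/]"),        # single-character operators
    ("WS",     r"\s+"),             # whitespace, recognized but discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token_type, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("x = 134.8 + y"))
```

The parser then only ever sees whole tokens such as NUMBER or IDENT and never individual characters.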
In your example:
> Digits:
> Digit
> Digits Digit
>
> Digit:
> 0
> NonZeroDigit
>
> NonZeroDigit: one of
> 1 2 3 4 5 6 7 8 9
>
> I would say NonZeroDigit is a literal and goes into the
> lexer, right? What about the other two? Should both go into
> the parser?
I can't comment on your particular application, but I can say that for most
programming languages, none of these would likely be classed as a token.
A parser is much happier seeing:
FLOAT(134.8) or
FLOAT_LITERAL(134.8) or
NUMBER(134.8)
than this:
DIGIT(1) DIGIT(3) DIGIT(4) DOT(.) DIGIT(8)
in its input token stream.
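The difference can be sketched by running the same input through two lexer definitions. This is a rough Python illustration, not ANTLR; the helper lex() and the two rule tables FINE and COARSE are hypothetical names I made up for the comparison:

```python
import re

def lex(text, spec):
    """Tokenize text with an ordered list of (name, regex) pairs."""
    master = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in spec))
    out, pos = [], 0
    while pos < len(text):
        m = master.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        out.append(f"{m.lastgroup}({m.group()})")
        pos = m.end()
    return out

# Digit-level tokens: the parser would have to reassemble the number itself.
FINE = [("DIGIT", r"\d"), ("DOT", r"\.")]

# Number-level tokens: the lexer hands over the finished literal.
# FLOAT is listed first so that "134.8" is not split after "134".
COARSE = [("FLOAT", r"\d+\.\d+"), ("INT", r"\d+")]

fine_tokens = lex("134.8", FINE)
coarse_tokens = lex("134.8", COARSE)
print(fine_tokens)
print(coarse_tokens)
```

With the FINE table the parser receives five tokens for one number; with the COARSE table it receives a single FLOAT token, which is usually what the parser rules want to work with.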
Micheal