[antlr-interest] newbie: lexer rules vs parser rules

Sat May 20 07:42:59 PDT 2006

Dieter,

> ok, even though I might look like a total newbie I have to
> ask that: Are there any rules of thumb on how to decide what a
> literal is and what a rule is? (respectively what goes into
> the parser and what into the lexer?)

Your use of "literal" and "rule" is quite ambiguous, so I'm not sure what you
mean.

As for what goes into the lexer vs. the parser: the lexer tokenizes the input
character stream. It should identify every substring that is classed as a
"token" in the language you are lexing/parsing.

In your example:

> Digits:
> Digit
> Digits Digit
> 
> Digit:
> 0
> NonZeroDigit
> 
> NonZeroDigit: one of
> 1 2 3 4 5 6 7 8 9
> 
> I would say NonZeroDigit is a literal and goes into the 
> lexer, right? What about the other two? Should both go into 
> the parser?

I can't comment on your particular application, but I can say that for most
programming languages none of these would be classed as a token on its own.

A parser is much happier seeing:
	FLOAT(134.8) or
	FLOAT_LITERAL(134.8) or
	NUMBER(134.8)
than this:
	DIGIT(1) DIGIT(3) DIGIT(4) DOT(.) DIGIT(8) 
in its input token stream.
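To make the difference concrete, here is a small Python sketch (not ANTLR, and not anyone's actual grammar) of two hypothetical lexers. lex_chars emits one token per digit or dot, which is what you get if DIGIT is a token; lex_numbers matches the whole literal with a regular expression and emits a single NUMBER token. The token names and the tiny OP rule are assumptions made up for illustration.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (token type, matched text)


def lex_chars(text: str) -> List[Token]:
    """Naive lexer: one token per digit or dot (DIGIT treated as a token)."""
    tokens: List[Token] = []
    for ch in text:
        if ch in "0123456789":
            tokens.append(("DIGIT", ch))
        elif ch == ".":
            tokens.append(("DOT", ch))
        elif not ch.isspace():
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens


# Hypothetical token spec: the lexer recognizes the whole number literal,
# so the digit-by-digit structure never reaches the parser.
NUMBER_LEXER = re.compile(r"\s*(?:(?P<NUMBER>\d+(?:\.\d+)?)|(?P<OP>[-+*/()]))")


def lex_numbers(text: str) -> List[Token]:
    """Lexer that emits one NUMBER token per literal, plus simple operators."""
    tokens: List[Token] = []
    text = text.rstrip()  # trailing whitespace would otherwise fail to match
    pos = 0
    while pos < len(text):
        m = NUMBER_LEXER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
        kind = m.lastgroup  # name of the alternative that matched
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    print(lex_chars("134.8"))    # five tokens: DIGIT DIGIT DIGIT DOT DIGIT
    print(lex_numbers("134.8"))  # one token: NUMBER
```

With lex_numbers the parser sees [('NUMBER', '134.8')] and never has to know how a number is spelled. In an ANTLR grammar the same idea is usually expressed by keeping DIGIT as a helper rule that only NUMBER refers to (`protected` in ANTLR 2, `fragment` in ANTLR 3), so it never shows up as a token of its own.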

Micheal