[antlr-interest] newbie: lexer rules vs parser rules

Sam Barnett-Cormack sdb at geekworld.co.uk
Sat May 20 05:25:39 PDT 2006


Dieter Frej wrote:
> ok, even though I might look like a total newbie I have to ask that:
> Is there any rule of thumb on how to decide what a literal is and what
> a rule is? (respectively what goes into the parser and what into the
> lexer?)
> 
> Digits:
> Digit
> Digits Digit
> 
> Digit:
> 0
> NonZeroDigit
> 
> NonZeroDigit: one of
> 1 2 3 4 5 6 7 8 9
> 
> I would say NonZeroDigit is a literal and goes into the lexer, right?
> What about the other two? Should both go into the parser?

Ignoring the term 'literal', which is ambiguous unless we know whether
you mean it from the perspective of ANTLR or from the perspective of the
language being parsed...

All of those should be lexer rules. Of course, that means that they
ought to be named DIGIT, DIGITS, and NONZERODIGIT, since by ANTLR
convention lexer rule names are written in upper case...
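
To illustrate why these three rules sit naturally at the character level, here is a minimal sketch in Python rather than ANTLR. It writes each rule as a regular-expression fragment and builds the next rule out of the previous one. The helper name match_digits is made up for this example, and the regex translation is only an assumption about how the grammar is meant to be read.

```python
import re

# The three grammar rules written as character-level patterns.
NONZERODIGIT = r"[1-9]"                 # NonZeroDigit: one of 1..9
DIGIT = rf"(?:0|{NONZERODIGIT})"        # Digit: 0 | NonZeroDigit
DIGITS = rf"(?:{DIGIT})+"               # Digits: Digit | Digits Digit


def match_digits(text):
    """Return the longest DIGITS prefix of text, or None if there is none."""
    m = re.match(DIGITS, text)
    return m.group(0) if m else None


print(match_digits("2006;"))
print(match_digits("x1"))
```

Nothing here needs to know about surrounding syntax: the rules only describe which characters may follow which, and that is the kind of work a lexer does.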

Think about it in terms of tokens. The lexer produces tokens, which the
parser then interprets. So all the parser ought to know is that it's
getting, say, an INT followed by EQUALS followed by a FLOAT followed by
a TERMINATOR.
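
A rough sketch of that division of labour, again in Python rather than ANTLR: a small regex-based tokenizer turns characters into tokens, and a stand-in "parser rule" looks only at the token types. The token patterns, the WS token, and the function names tokenize and parse_statement are all assumptions made for this example, not anything ANTLR generates.

```python
import re

# Hypothetical token definitions. Order matters: FLOAT is tried before INT
# so that "3.14" is not split into INT "3" followed by an error.
TOKEN_SPEC = [
    ("FLOAT", r"[0-9]+\.[0-9]+"),
    ("INT", r"[0-9]+"),
    ("EQUALS", r"="),
    ("TERMINATOR", r";"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Turn characters into (type, text) tokens, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group(0)))
        pos = m.end()
    return tokens


def parse_statement(tokens):
    """Stand-in for a parser rule: INT EQUALS FLOAT TERMINATOR."""
    return [kind for kind, _ in tokens] == ["INT", "EQUALS", "FLOAT", "TERMINATOR"]


tokens = tokenize("42 = 3.14;")
print(tokens)
print(parse_statement(tokens))
```

The point of the split is that parse_statement never inspects individual characters; it would accept "7 = 0.5;" just as readily, because all it sees is the sequence of token types.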

Literals, on the other hand, are a more complex subject, and I can't be
bothered to explain it now... I'd probably mess up, as I haven't had my
narcolepsy meds yet.

Sam

> Terence Parr wrote:
> 
>>
>> On May 18, 2006, at 12:54 AM, <JConner at ssp-uk.com>
>> <JConner at ssp-uk.com> wrote:
>>
>>> Hi All,
>>>
>>> I've started to get my feet wet with ANTLR a little, and I've come across a
>>> few things that I thought would be handled by lexer rules, but seem to be
>>> handled in general by parser rules.  For example, most of the examples I've
>>> seen handle numbers (floating, exponents, sign, etc) with parser rules,
>>
>>
>> Those should be lexer rules...most places I've seen.  Remember FLOAT
>> means lexer rule :)
>>
>> Ter
>>
>>
> 


