[antlr-interest] newbie: lexer rules vs parser rules

Dieter Frej dieter_frej at gmx.net
Tue May 30 05:42:39 PDT 2006


Thank you all, guys.

I am not a DAU (German shorthand for "dümmster anzunehmender User", the
dumbest user one can assume); I am just not an expert in the community of
languages and parsing ;)

Modifying a grammar and understanding the concepts is one thing, but 
creating a new grammar from scratch is something different. That was the 
reason why I asked.

Coming back to the example I copied & pasted from the Java Language
Specification:

Digits:
    Digit
    Digits Digit

Digit:
    0
    NonZeroDigit

NonZeroDigit: one of
    1 2 3 4 5 6 7 8 9

If I understand you right: NonZeroDigit should go into the lexer and
both other productions should go into the parser, right?

- Didi
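
A note on the example above: the "as big chunks as possible" advice quoted
below suggests a different split, with all three digit productions folded
into a single lexer rule and only token-level structure left to the parser.
A minimal sketch of that split follows, in plain Python rather than ANTLR.
The token names DIGITS, PLUS and WS, the functions tokenize and parse_sum,
and the little "sum" language are all made up for illustration; none of them
come from the thread or from ANTLR itself.

```python
import re

# Hypothetical token table: Digits, Digit and NonZeroDigit collapse into
# the single lexer rule DIGITS. Names are illustrative, not ANTLR's.
TOKEN_SPEC = [
    ("DIGITS", r"[0-9]+"),
    ("PLUS", r"\+"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Lexer: interpret a character sequence as a list of (type, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "WS":  # whitespace is skipped, not passed on
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse_sum(tokens):
    """Parser: sum : DIGITS (PLUS DIGITS)* ; it sees tokens, never characters."""
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise ValueError(f"expected {kind} at token index {pos}")
        text = tokens[pos][1]
        pos += 1
        return text

    total = int(expect("DIGITS"))
    while pos < len(tokens):
        expect("PLUS")
        total += int(expect("DIGITS"))
    return total
```

With this sketch, tokenize("120 + 7") yields the three tokens DIGITS "120",
PLUS "+" and DIGITS "7", and parse_sum on that list returns 127. The parser
never looks at an individual digit, which is the point of keeping the digit
productions in the lexer.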



Vidar Håkestad wrote:
> ANTLR just has rules, whether they are parser or lexer ones. No literals.
> A rule in the ANTLR grammar syntax is a production modeled after the EBNF 
> method of specifying sequences of language constructs.
> 
> What is lexing, and what is parsing?
> Lexing is interpretation of character sequences.
> Parsing is interpretation of token sequences.
> On a supplied sequence of characters, the lexer definitions create tokens, 
> which the parser then consumes one at a time (nextToken()).
> In this respect, neither the lexer nor the parser has any knowledge of the 
> concept of literals. You may specify character sequences in both parts of the 
> grammar file, but the context is different. A literal becomes a literal 
> because you define it that way in either lexer or parser rules.
> 
> What Terence is suggesting is that when a rule name starts with a capital 
> letter, it is interpreted in ANTLR (i.e. in the ANTLR grammar interpreter) as 
> a lexer rule. When it starts with a lower-case letter, it is interpreted as a 
> parser rule.
> These are strict syntactic rules of the ANTLR grammar interpreter, so if you 
> want to use the generator, you have to abide by those rules.
> 
> It is also important to know that the lexer is always created before the 
> parser, so the lexer definitions have to 'know' what the parser will expect.
> 
> The general answer to your general question will be:
> Try to partition your language into chunks of character sequences that are as 
> big as possible. Those chunks will go into your lexer as lexer rules. The rest 
> of the logic of your language will go into your parser rules.
> 
> regards
> Hawkis
> 
> On Saturday 20 May 2006 16:47, Sam Barnett-Cormack wrote:
>> Dieter Frej wrote:
>>> OK, even though I might look like a total newbie, I have to ask this:
>>> Is there any rule of thumb for deciding what is a literal and what is
>>> a rule (that is, what goes into the parser and what into the
>>> lexer)?
>>>
>>> Digits:
>>>     Digit
>>>     Digits Digit
>>>
>>> Digit:
>>>     0
>>>     NonZeroDigit
>>>
>>> NonZeroDigit: one of
>>>     1 2 3 4 5 6 7 8 9
>>>
>>> I would say NonZeroDigit is a literal and goes into the lexer, right?
>>> What about the other two? Should both go into the parser?
>> On further thinking, your questions seem to suggest (to me) that you
>> might do well to read a book/take a course on languages and grammars and
>> so on. You seem to be unfamiliar with a lot of the terms, or at best not
>> using them in the way they are normally used.
>>
>> I mean no offence by this, just suggesting a profitable course of action.
>>
>> Sam
>>
>>> Terence Parr wrote:
>>>> On May 18, 2006, at 12:54 AM, <JConner at ssp-uk.com> wrote:
>>>>> Hi All,
>>>>>
>>>>> I've started to get my feet wet with ANTLR a little, and I've come
>>>>> across a few things that I thought would be handled by lexer rules,
>>>>> but seem to be handled in general by parser rules.  For example, most
>>>>> of the examples I've seen handle numbers (floating, exponents, sign,
>>>>> etc) with parser rules,
>>>> Those should be lexer rules...most places I've seen.  Remember FLOAT
>>>> means lexer rule :)
>>>>
>>>> Ter
> 
> 


