Re: [antlr-interest] newbie: lexer rules vs parser rules

Koehne Kai Kai.Koehne at student.hpi.uni-potsdam.de
Tue May 30 07:20:08 PDT 2006


Hi,
 
I would say it depends on the rest of your grammar. If you only use Digit and NonZeroDigit for the definition of Digits, and nowhere else, just put everything in the lexer. The reason is that in most cases you want to keep your parser as simple as possible, so everything that can easily be done in the lexer should be done in the lexer. You might leave Digits in the parser, though, if Digit is also used in other parser rules which cannot be put in the lexer. However, in general I agree with Hawkins on his rule of thumb: Whenever you can, put it in the lexer.
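
To make the trade-off concrete, here is a minimal Python sketch of the lexer/parser split. It is not ANTLR-generated code, and the names `TOKEN_SPEC`, `tokenize`, `parse_sum` and the token kinds are made up for illustration. With Digits handled in the lexer, the parser side sees a single DIGITS token per number rather than one token per character, so its own logic stays small.

```python
import re

# Hypothetical token table: DIGITS is recognised entirely by the lexer.
TOKEN_SPEC = [
    ("DIGITS", r"[0-9]+"),
    ("PLUS", r"\+"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Lexer side: turn a character sequence into (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "WS":  # whitespace is skipped, not passed on
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse_sum(tokens):
    """Parser side: sees tokens only; accepts DIGITS (PLUS DIGITS)*."""
    kinds = [kind for kind, _ in tokens]
    values = [int(lexeme) for kind, lexeme in tokens if kind == "DIGITS"]
    expected = ["DIGITS"] + ["PLUS", "DIGITS"] * (len(values) - 1)
    if kinds != expected:
        raise ValueError("expected DIGITS (PLUS DIGITS)*")
    return sum(values)
```

For example, `tokenize("12 + 345")` produces three tokens, and `parse_sum` never has to look at individual characters.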

Regards
 
Kai Koehne
 
BTW: If you put Digits in the lexer, remember to mark Digit and NonZeroDigit as protected rules!
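
As a rough analogy for what "protected" buys you, here is a small Python sketch (again not ANTLR output; `NON_ZERO_DIGIT`, `DIGIT`, `TOKEN_RULES` and `next_token` are hypothetical names). The helper patterns are only building blocks for DIGITS and are deliberately left out of the table of rules that may produce a token. Without that separation, a single character such as 7 could match all three rules, and the lexer would not know which token to hand to the parser.

```python
import re

# Helper patterns, playing the role of protected rules: they are building
# blocks only and are deliberately absent from TOKEN_RULES below.
NON_ZERO_DIGIT = r"[1-9]"
DIGIT = rf"(?:0|{NON_ZERO_DIGIT})"

# Only the rules listed here can produce a token for the parser.
TOKEN_RULES = {"DIGITS": re.compile(rf"{DIGIT}+")}


def next_token(text, pos=0):
    """Return (kind, lexeme) for the token starting at pos, or None."""
    for kind, pattern in TOKEN_RULES.items():
        match = pattern.match(text, pos)
        if match:
            return kind, match.group()
    return None
```

Here `next_token("7")` can only ever report a DIGITS token, never a bare DIGIT or NON_ZERO_DIGIT.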

________________________________

From: antlr-interest-bounces at antlr.org on behalf of Dieter Frej
Sent: Tue 30.05.2006 14:42
To: Vidar Håkestad
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] newbie: lexer rules vs parser rules



thank you all, guys.

I am not a DAU (German slang for the "dumbest assumable user"), I am just
not an expert in the field of languages and parsing ;)

Modifying a grammar and understanding the concepts is one thing, but
creating a new grammar from scratch is something different. That was the
reason why I asked.

Coming back to the example I copied & pasted from the Java language
specification:

Digits:
Digit
Digits Digit

Digit:
0
NonZeroDigit

NonZeroDigit: one of
1 2 3 4 5 6 7 8 9

If I understand you right: NonZeroDigit should go into the lexer and
both other productions should go into the parser, right?

- Didi



Vidar Håkestad wrote:
> Antlr just has rules, be they parser or lexer ones. No literals.
> A rule in the Antlr grammar syntax is a production modeled after the EBNF
> method of specifying sequences of language constructs.
>
> What is lexing, and what is parsing?
> Lexing is interpretation of character sequences.
> Parsing is interpretation of token sequences.
> On a supplied sequence of characters, the lexer definitions create tokens,
> on which the parser then feeds (nextToken()).
> In this respect, neither the lexer nor the parser have any knowledge of the
> concept of literals. You may specify character sequences in both parts of the
> grammar files, but the context is different. A literal becomes a literal
> because you define it that way in either lexer or parser rules.
>
> What Terence is suggesting is that when a rule starts with a capital letter
> it is interpreted in Antlr (i.e. in the Antlr grammar interpreter) as a lexer
> rule. When it starts with a lower case letter it is interpreted as a parser
> rule.
> These are strict Antlr grammar interpreter syntactic rules, so if you want to
> use the generator, you have to abide by those rules.
>
> It is also important to know that the lexer is always created before a parser,
> so that lexer definitions have to 'know' what the parser will expect.
>
> The general answer to your general question will be:
> Try to partition your language into as big chunks of character sequences as
> possible. Those partitions will go into your lexer as lexer rules. The rest
> of the logic of your language will go into your parser rules.
>
> regards
> Hawkis
>
> On Saturday 20 May 2006 16:47, Sam Barnett-Cormack wrote:
>> Dieter Frej wrote:
>>> ok, even though I might look like a total newbie I have to ask that:
>>> Is there any rule of thumb on how to decide what a literal is and what
>>> a rule is? (respectively what goes into the parser and what into the
>>> lexer?)
>>>
>>> Digits:
>>> Digit
>>> Digits Digit
>>>
>>> Digit:
>>> 0
>>> NonZeroDigit
>>>
>>> NonZeroDigit: one of
>>> 1 2 3 4 5 6 7 8 9
>>>
>>> I would say NonZeroDigit is a literal and goes into the lexer, right?
>>> What about the other two? Should both go into the parser?
>> On further thinking, your questions seem to suggest (to me) that you
>> might do well to read a book/take a course on languages and grammars and
>> so on. You seem to be unfamiliar with a lot of the terms, or at best not
>> using them in the way they are normally used.
>>
>> I mean no offence by this, just suggesting a profitable course of action.
>>
>> Sam
>>
>>> Terence Parr wrote:
>>>> On May 18, 2006, at 12:54 AM, <JConner at ssp-uk.com> wrote:
>>>>> Hi All,
>>>>>
>>>>> I've started to get my feet wet with ANTLR a little, and I've come
>>>>> across a
>>>>> few things that I thought would be handled by lexer rules, but seem
>>>>> to be
>>>>> handled in general by parser rules.  For example, most of the
>>>>> examples I've
>>>>> seen handle numbers (floating, exponents, sign, etc) with parser rules,
>>>> Those should be lexer rules...most places I've seen.  Remember FLOAT
>>>> means lexer rule :)
>>>>
>>>> Ter
>
>




