[antlr-interest] All say that literal strings in parser rules are doing harm. Why?

Loring Craymer lgcraymer at yahoo.com
Tue Feb 28 07:31:40 PST 2012


To add to Eric's comments: string literals commit the lexer to a DFA path. If you have a typical IDENT rule and a keyword, say 'default', then any IDENT that partially matches the keyword, say 'define', will confuse the lexer: 'd' 'e' 'f' is the path to 'default', so the lexer has trouble matching 'define'. To get around that, you eventually fold the keyword specifications into the IDENT rule.
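
A minimal sketch of what folding keywords into the IDENT rule amounts to, written in Python rather than ANTLR purely for illustration. The names KEYWORDS, IDENT_RE and tokenize are made up for this example and are not ANTLR API. The idea is to match every identifier-shaped word with a single pattern and only then decide, by table lookup, whether it is a keyword:

```python
import re

# Hypothetical keyword table; 'default' is the keyword from the example above.
KEYWORDS = {"default": "DEFAULT"}

# One pattern for every identifier-shaped word, keywords included.
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(text):
    """Return (token_type, lexeme) pairs for the identifier-shaped words in text."""
    tokens = []
    for match in IDENT_RE.finditer(text):
        word = match.group(0)
        # Keyword vs. identifier is decided only after the whole word has been
        # matched, so 'define' is never committed to a 'default' path.
        tokens.append((KEYWORDS.get(word, "IDENT"), word))
    return tokens


print(tokenize("default define defaults"))
```

Because the lexeme is classified only after it has been read in full, a shared prefix such as 'def' cannot send the match down the wrong path. In an ANTLR lexer the same effect would presumably come from an action in the IDENT rule that changes the token type, but that part is an assumption on my side and not shown here.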

--Loring



>________________________________
> From: Stefan Mätje <Stefan.Maetje at esd-electronics.com>
>To: antlr-interest <antlr-interest at antlr.org> 
>Sent: Tuesday, February 28, 2012 5:37 AM
>Subject: Re: [antlr-interest] All say that literal strings in parser rules are doing harm. Why?
> 
>Hi Eric,
>
>thanks for that information. I added my comments below.
>
>To everyone else: are there further drawbacks to expect from using literals in 
>parser rules?
>
>Thanks in advance,
>    Stefan
>
>
>On 28.02.2012 13:37:39, Eric wrote:
>> Hi Stefan,
>> 
>> As I only use the tools and do not do formal proofs on them, there may be
>> more to this than what I present here.
>>
>> If you are using string and/or char literals in parser rules, then ANTLR
>> must create a new set of lexer rules that include all of the string and/or
>> char literals in the parser rules. Remember that the parser can only see
>> tokens and not raw text. So string and/or char literals cannot be passed to
>> the parser.
>
>That's clear so far.
>
>> To see the new set of lexer rules, use org.antlr.Tool -Xsavelexer, and then
>> open the created grammar file. The name may be like <grammar>__.g. If you
>> have string and/or char literals in your parser rules, you will see lexer
>> rules with names starting with T__.
>
>That is a valuable hint to see how the real lexer will be implemented by 
>ANTLR.
>
>> The T__ names make it harder to debug because you don't know what they
>> mean. 
>
>I have always used the generated *.tokens file to map T__xxx names to the 
>strings they stand for, but I have almost never needed to do that.
>
>> Also, because ANTLR adds them at the top, they may cause problems for
>> other lexer rules.
>
>Since I only use keywords directly in the parser rules (punctuation symbols 
>have lexer rules), the keywords, somewhat to my surprise, appear in the 
>generated intermediate lexer grammar at the point where I would have written 
>them myself.
>
>> 
>> Eric
>
>Thank you so far,
>    Stefan
>
>
>
>> On Tue, Feb 28, 2012 at 6:08 AM, Stefan Mätje <
>> Stefan.Maetje at esd-electronics.com> wrote:
>> 
>> > Dear list members,
>> >
>> > I often read on this list that including literal strings in parser rules
>> > is not recommended. Doing this would provoke problems and make error
>> > reporting more difficult.
>> >
>> > Could somebody explain the possible problems and drawbacks to me? All
>> > postings I found on the list so far sound a little bit vague to me.
>> >
>> > Can somebody please point me to a discussion or example grammar where the
>> > pros and cons are displayed more thoroughly?
>> >
>> > At the moment I have a somewhat mixed grammar file (around 1800 lines),
>> > partly using lexer tokens and partly using string literals in the parser
>> > rules. I use literals especially when the keyword exists only in a single
>> > rule.
>> >
>> > Regards,
>> >        Stefan Mätje
>> >
>> >
>> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> > Unsubscribe:
>> > http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>> >
>> 
>

