[antlr-interest] All say that literal strings in parser rules are doing harm. Why?

Jan Finis finis at in.tum.de
Tue Feb 28 07:38:23 PST 2012


I have never seen keyword specs in an IDENT rule and it would be very 
confusing. To be honest, I do not understand a single word of your mail; 
a DFA is surely able to distinguish words with identical prefixes. If 
not, lexer design would be very hard.
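
To make the prefix point concrete, here is a minimal Python sketch of a
longest-match tokenizer. It is only an illustration, not what ANTLR generates:
the rule table and the tokenize helper are made up for this example, and it
assumes the usual convention that the longest match wins and ties go to the
rule listed first.

```python
import re

# Hypothetical token rules in priority order (keyword before IDENT).
RULES = [
    ("DEFAULT", re.compile(r"default")),
    ("IDENT", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("WS", re.compile(r"\s+")),
]


def tokenize(text):
    """Longest-match tokenizer; on equal length the earlier rule wins."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so earlier rules keep priority on ties.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if best_name != "WS":  # whitespace is skipped
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


if __name__ == "__main__":
    print(tokenize("default define defaults"))
```

With these rules, the input "default define defaults" should come out as one
DEFAULT token followed by two IDENT tokens, because the keyword rule only wins
when it matches the whole word.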

I use literals in parser code and have never had any problems with it so far.

On 02/28/2012 04:31 PM, Loring Craymer wrote:
> Add to Eric's comments that strings commit a DFA path. If you have a typical IDENT rule and a keyword, say 'default', then any IDENT that partially matches the keyword, say 'define', will confuse the lexer: 'd' 'e' 'f' is the path to 'default', so it has trouble matching 'define'. To get around that, you eventually fold the keyword specifications into the IDENT rule.
>
> --Loring
>
>
>
>> ________________________________
>> From: Stefan Mätje <Stefan.Maetje at esd-electronics.com>
>> To: antlr-interest <antlr-interest at antlr.org>
>> Sent: Tuesday, February 28, 2012 5:37 AM
>> Subject: Re: [antlr-interest] All say that literal strings in parser rules are doing harm. Why?
>>
>> Hi Eric,
>>
>> thanks for that information. I added my comments below.
>>
>> But to all the others: Are there more drawbacks to expect using literals in
>> parser rules?
>>
>> Thanks in advance,
>>      Stefan
>>
>>
>> On 28.02.2012 13:37:39, Eric wrote:
>>> Hi Stefan,
>>>
>>> As I only use the tools and do not do formal proofs on them, there may be
>>> more to this than what I present here.
>>>
>>> If you are using string and/or char literals in parser rules, then ANTLR
>>> must create a new set of lexer rules that include all of the string and/or
>>> char literals in the parser rules. Remember that the parser can only see
>>> tokens and not raw text. So string and/or char literals cannot be passed to
>>> the parser.
>> That's clear so far.
>>
>>> To see the new set of lexer rules, use org.antlr.Tool -Xsavelexer, and then
>>> open the created grammar file. The name may be like <grammar>__.g. If you
>>> have string and/or char literals in your parser rules, you will see lexer
>>> rules whose names start with T__.
>> That is a valuable hint to see how the real lexer will be implemented by
>> ANTLR.
>>
>>> The T__ names make it harder to debug because you don't know what they
>>> mean.
>> I always used the generated *.tokens file to match T__xxx names to the strings
>> they stand for. But I almost never needed to do that.
>>
>>> Also because ANTLR added them at the top, it may cause other problems
>>> for other lexer rules.
>> As I only used the keywords directly in the parser rules (punctuation symbols
>> have lexer rules), the keywords, surprisingly, appear in the generated
>> intermediate lexer grammar at the point where I would have written them myself.
>>
>>> Eric
>> Thank you so far,
>>      Stefan
>>
>>
>>
>>> On Tue, Feb 28, 2012 at 6:08 AM, Stefan Mätje
>>> <Stefan.Maetje at esd-electronics.com> wrote:
>>>
>>>> Dear list members,
>>>>
>>>> I often read on this list that including literal strings in parser rules
>>>> is not recommended. Doing so is said to cause problems and make error
>>>> reporting more difficult.
>>>>
>>>> Could somebody explain the possible problems and drawbacks to me? All
>>>> postings I found on the list so far sound a little bit vague to me.
>>>>
>>>> Can somebody please point me to a discussion or example grammar where the
>>>> pros and cons are laid out more thoroughly?
>>>>
>>>> At the moment I have a somewhat mixed grammar file (around 1800 lines)
>>>> that partly uses lexer tokens and partly uses string literals in the
>>>> parser rules. I do the latter especially if the keyword exists only in a
>>>> single rule.
>>>>
>>>> Regards,
>>>>          Stefan Mätje
>>>>
>>>>
>>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>>> Unsubscribe:
>>>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>>>
>>


