[antlr-interest] All say that literal strings in parser rules are doing harm. Why?
finis at in.tum.de
Tue Feb 28 07:38:23 PST 2012
I have never seen keyword specs in an IDENT rule and it would be very
confusing. To be honest, I do not understand a single word of your mail;
a DFA is surely able to distinguish words with identical prefixes. If
not, lexer design would be very hard.
I use literals in parser code and never had any problems with it until now.
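To illustrate the point about identical prefixes, here is a minimal sketch of generic longest-match tokenization in Python. It is not ANTLR's generated lexer and makes no claim about how ANTLR resolves this case internally. The rule names and the tokenize helper are hypothetical, chosen only for this example.

```python
import re

# Hypothetical rule set: one keyword literal plus a typical IDENT rule.
# The keyword is listed first so that it wins a tie against IDENT.
RULES = [
    ("DEFAULT", re.compile(r"default")),
    ("IDENT", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("WS", re.compile(r"\s+")),
]

def tokenize(text):
    """Split text into (rule_name, lexeme) pairs using longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # A strictly longer match replaces the current best;
            # on equal length the earlier rule is kept.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if best_name != "WS":  # whitespace is skipped
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens

print(tokenize("default define defaults"))
```

With this sketch, 'default' comes out as DEFAULT, while 'define' and 'defaults' both come out as IDENT, even though all three share the prefix 'def'.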
On 02/28/2012 04:31 PM, Loring Craymer wrote:
> Add to Eric's comments that strings commit a DFA path: if you have a typical IDENT rule and a keyword, say 'default', then any IDENT that partially matches the keyword, say 'define', will confuse the lexer. 'd' 'e' 'f' is the path to 'default', so the lexer has trouble matching 'define'. To get around that, you eventually fold keyword specifications into the IDENT rule.
>> From: Stefan Mätje <Stefan.Maetje at esd-electronics.com>
>> To: antlr-interest <antlr-interest at antlr.org>
>> Sent: Tuesday, February 28, 2012 5:37 AM
>> Subject: Re: [antlr-interest] All say that literal strings in parser rules are doing harm. Why?
>> Hi Eric,
>> thanks for that information. I added my comments below.
>> But to all the others: Are there more drawbacks to expect using literals in
>> parser rules?
>> Thanks in advance,
>> On 28.02.2012 at 13:37:39, Eric wrote:
>>> Hi Stefan,
>>> As I only use the tools and do not do formal proofs on them, there may be
>>> more to this than what I present here.
>>> If you are using string and/or char literals in parser rules, then ANTLR
>>> must create a new set of lexer rules that include all of the string and/or
>>> char literals in the parser rules. Remember that the parser can only see
>>> tokens and not raw text. So string and/or char literals cannot be passed to
>>> the parser.
>> That's clear so far.
>>> To see the new set of lexer rules, use org.antlr.Tool -Xsavelexer, and then
>>> open the created grammar file. The name may be like <grammar>__.g . If you
>>> have string and/or char literals in your parser rules you will see lexer
>>> rules with name starting with T__ .
>> That is a valuable hint to see how the real lexer will be implemented by
>>> The T__ names make it harder to debug because you don't know what they
>> I always used the generated *.tokens file to match T__xxx names to the strings
>> they mean. But I almost never needed to do that.
>>> Also because ANTLR added them at the top, it may cause other problems
>>> for other lexer rules.
>> As I only used the keywords directly in the parser rules (punctuation symbols
>> have lexer rules) the keywords surprisingly appear in the generated lexer
>> intermediate grammar at the point I myself would have written them down.
>> Thank you so far,
>>> On Tue, Feb 28, 2012 at 6:08 AM, Stefan Mätje <Stefan.Maetje at esd-electronics.com> wrote:
>>>> Dear list members,
>>>> often I read on this list that including literal strings in parser rules is
>>>> not recommended. Doing this would provoke problems and make error
>>>> more difficult.
>>>> Could somebody explain the possible problems and drawbacks to me. All
>>>> I found on the list so far sound a little bit vague to me.
>>>> Can somebody please point me to a discussion or example grammar where the pros
>>>> and cons are displayed more thoroughly?
>>>> At the moment I have a somewhat mixed grammar file (around 1800 lines),
>>>> in part using lexer tokens and in part using string literals in the parser
>>>> rules. I do that especially if the keyword exists only in a single rule.
>>>> Stefan Mätje
>>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address