[antlr-interest] Invalid parser generation

Jesse McGrew jmcgrew at gmail.com
Thu Sep 6 05:54:51 PDT 2012


You can't have two lexer rules that match the same input. When the
lexer sees a string like "foo", how is it supposed to know whether it
should return DIMENSION or ITEM (or ID)? You should probably be using
parser rules instead.
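
A minimal sketch of what that could look like, assuming ANTLR 3 syntax
and reusing the ID rule quoted below. The grammar name and the lowercase
rule names dimension and item are placeholders for illustration, not
taken from your grammar:

```
grammar Sketch;  // hypothetical grammar name

// Parser rules (lowercase names): the DIMENSION/ITEM distinction lives
// here and is decided by context, not by the characters matched.
dimension : ID ;
item      : ID ;

// One lexer rule (uppercase name): "foo" is always tokenized as ID.
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
```

The parser can only tell the two apart by where they are referenced, so
the rules that call dimension or item supply the context. If the two
later need different spellings, the lexer rules can be split at that point.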

Jesse

On Thu, Sep 6, 2012 at 2:44 AM, mark4 at voila.fr <mark4 at voila.fr> wrote:
> Hi Stefan,
>
> I wanted to come back to your post. You recommended putting the most specific lexer rules first. But what should I do if two rules are close, or even identical?
>
> For instance:
> DIMENSION : ID;
> ITEM : ID;
>
> They automatically generate an error in ANTLR. Of course, this duplication looks pointless right now, but in the future I may modify these rules and make them different. That's why I'd like to distinguish them in the grammar file.
>
> Thanks in advance,
> Mark
>
>> Message of 04/09/12 at 15:40
>> From: "Stefan Mätje"
>> To: antlr-interest at antlr.org
>> Cc: "mark4 at voila.fr"
>> Subject: Re: [antlr-interest] Invalid parser generation
>>
>> On 04.09.2012 14:35, mark4 at voila.fr wrote:
>> > Hi Stefan,
>> >
>> > Thanks for your reply. I didn't understand the difference between
>> > lexer rules and parser rules because,
>> > in the end, a parser rule will always resolve into a series of lexer
>> > rules...
>>
>> Please don't mix the lexer and the parser phase in your mind. The lexer
>> deals with single characters and groups them into tokens.
>>
>> The parser doesn't know anything about single characters and deals only
>> with tokens.
>>
>> > Anyway, I applied the modification but I now get an error:
>> >
>> > COMPTE : ('0'..'9')+;
>> >
>> > ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
>> >
>> > The following token definitions can never be matched because prior
>> > tokens match the same input: COMPTE,ID
>>
>> You have rules in your grammar before COMPTE and ID that define a
>> superset of the character sequences that COMPTE and ID can match.
>>
>> > Well, I have several entities in my grammar that have different
>> > encoding forms, so how can I specify them one after the other?
>>
>> If a single type of token should be produced in the end, all the
>> patterns it needs have to go into one rule.
>>
>> > Thanks,
>> > Mark
>> >
>>
>> As a rule of thumb, write the most specific lexer rules first and then
>> follow them with the less specific rules. The lexer gives rules that
>> are written earlier a higher precedence.
>>
>> So put your keywords first (which are fixed strings). Then follow them
>> with something like operators (also fixed strings). Rules that can match
>> many different strings, like ID and COMPTE, come last.
>>
>> See what ANTLRWorks tells you about multiple matches and which rules are
>> involved.
>>
>> I don't know if this helps, but the rule that matches both COMPTE and ID
>> would be the most interesting one to look at.
>>
>> Best regards,
>> Stefan
>>
>> PS.: Please reply also to the list.
>>
>>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address

