[antlr-interest] Invalid parser generation

mark4 at voila.fr mark4 at voila.fr
Tue Sep 4 07:00:07 PDT 2012


Hi Stefan,

Thanks for the info. The antlr.org website is down, so I couldn't look up the details of lexer and parser rules. What I understood from trial and error is that ALL the lexer rules in the grammar are matched against the input. But the COMPTE and ID rules are only parts of other rules. Marking them as "fragment" seems to do the job: it tells ANTLR not to match those rules on their own. Now the grammar validates and the generated code compiles.
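
A minimal sketch of the "fragment" approach described above, written as a small ANTLR 3 grammar. The grammar name FragmentDemo, the parser rule refs and the composite token ACCOUNT_REF are hypothetical names chosen for illustration. Only COMPTE and ID come from the actual grammar, and the real rules that use them may look quite different.

```antlr
grammar FragmentDemo;          // hypothetical grammar name

// Hypothetical parser rule: one or more composite tokens, then end of input.
refs : ACCOUNT_REF+ EOF ;

// Hypothetical composite token built from the two helper rules below,
// e.g. "abc-123". It is the only token type this lexer can emit.
ACCOUNT_REF : ID '-' COMPTE ;

// "fragment" rules never produce a token on their own; they can only be
// referenced from other lexer rules.
fragment COMPTE : ('0'..'9')+ ;

fragment ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
```

Without the fragment keyword, COMPTE and ID would become token types of their own and would compete with the other lexer rules for the same input.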

I'm using an ANTLRStringStream to feed a string to be matched against the grammar. Now, when I run the program, whatever string I supply, I never end up in the RecognitionException handler. I've asked Terence whether the expression() method is the right method to start the parsing process (in the ANTLR 3 C# tutorial the method expr() was used, but it's deprecated), but I have not received a reply yet. Note that I also had to change "HIDDEN" to "Hidden" for the code to compile. It seems that the tutorials on the website are not up to date.

Regards,
Mark

> Message of 04/09/12 at 15:40
> From: "Stefan Mätje"
> To: antlr-interest at antlr.org
> Cc: "mark4 at voila.fr"
> Subject: Re: [antlr-interest] Invalid parser generation
>
> On 04.09.2012 14:35, mark4 at voila.fr wrote:
> > Hi Stefan,
> >
> > Thanks for your reply. I didn't understand the difference between
> > lexer rules and parser rules because,
> > ultimately, a parser rule will always resolve into a series of lexer
> > rules...
>
> Please don't mix the lexer and the parser phase in your mind. The lexer
> deals with single characters and groups them into tokens.
>
> The parser doesn't know anything about single characters and deals only
> with tokens.
>
> > Anyway, I applied the modification but I now get an error:
> >
> > COMPTE : ('0'..'9')+;
> >
> > ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
> >
> > The following token definitions can never be matched because prior
> > tokens match the same input: COMPTE,ID
>
> You have rules in your grammar before COMPTE and ID that define a
> superset of the character sequences that COMPTE and ID can match.
>
> > Well, I have several entities in my grammar that have different
> > encoding forms, so how can I specify them one after the other?
>
> If a single type of token should be produced in the end, all the
> needed regular expressions have to go into one rule.
>
> > Thanks,
> > Mark
> >
>
> As a rule of thumb, write the most specific lexer rules first and then
> follow them with the less specific rules. The lexer gives rules
> written earlier a higher precedence.
>
> So put your keywords first (which are fixed strings). Then follow them
> with something like operators (also fixed strings). Last come the
> rules that can match varying strings, like ID and COMPTE.
>
> See what ANTLRWorks tells you about multiple matches and which rules are
> involved.
>
> I don't know if this helps, but it would be most interesting to see
> the rule that matches both COMPTE and ID.
>
> Best regards,
> Stefan
>
> PS.: Please reply also to the list.
>
>
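
The ordering advice quoted above could look like this in an ANTLR 3 lexer grammar. This is only a sketch: the grammar name OrderDemo and the IF and PLUS rules are hypothetical stand-ins for a keyword and an operator, while COMPTE and ID are taken from the quoted rules.

```antlr
lexer grammar OrderDemo;       // hypothetical grammar name

// 1. Keywords (fixed strings) come first, so that "if" is reported as IF
//    even though ID could match the same characters.
IF : 'if' ;

// 2. Operators (also fixed strings).
PLUS : '+' ;

// 3. Rules that match varying strings come last.
COMPTE : ('0'..'9')+ ;

ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
```

If ID were listed before IF, ANTLR would be expected to complain that IF can never be matched, which is the same kind of message as the one quoted above for COMPTE and ID.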
