[antlr-interest] simple lexical analysis question

John B. Brodie jbb at acm.org
Wed Dec 16 06:47:16 PST 2009


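Try guarding the optional part with a syntactic predicate, so that the
lexer only commits to 'FIN-' when the whole keyword is actually there:

Moins : '-' ( ('FIN-')=>'FIN-' { $type = FIN; } )?;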
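
For readers who want to see the decision spelled out, here is a minimal
hand-written sketch in Java of what the rule is meant to do. It is not the
code ANTLR generates and it uses no ANTLR API; the class name
PredicateLexerSketch and the "TYPE:text" token strings are made up for
illustration. The point is only that the complete 'FIN-' suffix is tested
before any of it is consumed, so '-FOO' falls back to Moins followed by Idf.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the generated lexer; names are illustrative only.
class PredicateLexerSketch {

    // Returns tokens as "TYPE:text" strings; whitespace is skipped,
    // playing the role of the HIDDEN channel in the grammar.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n') {
                i++;
            } else if (c == '-') {
                // Counterpart of ('FIN-')=> : look at the whole suffix first,
                // and consume it only if it matches completely.
                if (input.startsWith("FIN-", i + 1)) {
                    tokens.add("FIN:-FIN-");
                    i += 5;
                } else {
                    tokens.add("Moins:-");
                    i++;
                }
            } else if (c >= 'A' && c <= 'Z') {
                // Idf : ('A'..'Z')+
                int start = i;
                while (i < input.length()
                        && input.charAt(i) >= 'A' && input.charAt(i) <= 'Z') {
                    i++;
                }
                tokens.add("Idf:" + input.substring(start, i));
            } else {
                throw new IllegalArgumentException(
                        "unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("-FOO -FIN-"));
        System.out.println(tokenize("VLEG-XLEG-FCINFZU"));
    }
}
```

With this sketch, '-FOO -FIN-' yields Moins, Idf(FOO), FIN, and
'VLEG-XLEG-FCINFZU' yields Idf, Moins, Idf, Moins, Idf(FCINFZU), which is
the token stream the grammar is supposed to produce.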

On Wed, 2009-12-16 at 11:39 +0100, Jean-Claude Durand wrote:
> Thanks for your answers; I now understand the strategy of lexers.
> The left factoring you propose does not work any better: because of
> the letter 'F' at the start of the identifier following the minus sign,
> the problem remains the same in the example '-FOO -FIN-'!
> 
> ~/Soft/Antlr/LexJava: java Main test
> line 1:2 mismatched character 'O' expecting 'I'
>   --> [@-1,3:3='O',<6>,1:3]
>   --> [@-1,4:4='\n',<7>,channel=99,1:4]
>   --> [@-1,5:9='-FIN-',<5>,2:0]
>   --> [@-1,10:30='                    \n',<7>,channel=99,2:5]
> 
> Jean-Claude Durand
> 
> LIG, équipe GETALP
> 385, rue de la Bibliothèque
> BP 53
> 38041 Grenoble cedex 9
> France
> 
> Jean-Claude.Durand at imag.fr
> tél: +33 (0)4 76 51 43 81
> fax: +33 (0)4 76 63 56 86
> 
> 
> On 14 Dec 2009, at 19:35, John B. Brodie wrote:
> 
> > Greetings!
> > On Mon, 2009-12-14 at 19:18 +0100, Jean-Claude Durand wrote:
> >> My lexical grammar (I use antlr v3.2):
> >>
> >> lexer grammar Lex;
> >> options
> >> { language=Java; }
> >>
> >>
> >> WS: ( ' ' | '\t' | '\n' )+ { $channel=HIDDEN; } ;
> >>
> >>
> >> FIN : '-FIN-' ;
> >> Moins : '-' ;
> >>
> >>
> >> // Identifiers:
> >> Idf : ('A'..'Z')+ ;
> >>
> >>
> >> I want to enumerate the tokens for the following example
> >> (Main.java is in the archive):
> >>
> >>
> >> VLEG-XLEG-FCINFZU
> >>
> >>
> >> And the output is:
> >>
> >>
> >> ~/Soft/Antlr/LexJava: java Main test
> >> --> [@-1,0:3='VLEG',<7>,1:0]
> >> --> [@-1,4:4='-',<6>,1:4]
> >> --> [@-1,5:8='XLEG',<7>,1:5]
> >> line 1:11 mismatched character 'C' expecting 'I'
> >> --> [@-1,12:16='INFZU',<7>,1:12]
> >> --> [@-1,17:36='                    ',<4>,channel=99,1:17]
> >> ~/Soft/Antlr/LexJava:
> >>
> >>
> >> The lexer is looking for the keyword -FIN- and not for a minus sign
> >> followed by an identifier (which begins with an F).
> >
> > This is a well-known "feature" of ANTLR lexers: once the lexer sees
> > the left prefix of a token, it commits itself to only that token and
> > will not back up and consider other possibilities.
> >
> > You need to left-factor your FIN and Moins rules. Something like the
> > following (off the top of my head, untested, but it gives the general
> > idea):
> >
> > lexer grammar Lex;
> > options { language=Java; }
> > tokens { FIN; }
> >
> > WS: ( ' ' | '\t' | '\n' )+ { $channel=HIDDEN; } ;
> >
> > Moins : '-' ( 'FIN-' { $type = FIN; } )?;
> >
> > // Identifiers:
> > Idf : ('A'..'Z')+ ;
> >
> >
> 



