[antlr-interest] simple lexical analysis question
Jean-Claude Durand
Jean-Claude.Durand at imag.fr
Wed Dec 16 02:39:01 PST 2009
Thanks for your answers, I now understand the strategy of lexers.
The left factoring you propose does not solve the problem: because the
identifier following the minus sign begins with the letter 'F', the
same error occurs with the example '-FOO -FIN-':
~/Soft/Antlr/LexJava: java Main test
line 1:2 mismatched character 'O' expecting 'I'
--> [@-1,3:3='O',<6>,1:3]
--> [@-1,4:4='\n',<7>,channel=99,1:4]
--> [@-1,5:9='-FIN-',<5>,2:0]
--> [@-1,10:30=' \n',<7>,channel=99,2:5]
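To illustrate the behaviour I was expecting, here is a minimal hand-written sketch in plain Java. It is not ANTLR-generated code and does not reflect how ANTLR's lexer works internally. It checks for the whole '-FIN-' keyword before committing to it, and otherwise falls back to a lone minus sign. The class name LookaheadLexerSketch, the method tokenize, and the token labels are hypothetical names chosen for this example only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer, for illustration only.
class LookaheadLexerSketch {
    static final String FIN = "-FIN-";

    /** Splits the input into tokens, testing the complete "-FIN-" keyword before choosing it. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n') {
                i++; // whitespace is dropped here (the grammar sends it to the hidden channel)
            } else if (input.startsWith(FIN, i)) {
                // The whole keyword is present, so it is safe to emit FIN.
                tokens.add("FIN(" + FIN + ")");
                i += FIN.length();
            } else if (c == '-') {
                // Not the keyword: emit a lone minus sign and let the next round handle the rest.
                tokens.add("Moins(-)");
                i++;
            } else if (c >= 'A' && c <= 'Z') {
                int start = i;
                while (i < input.length() && input.charAt(i) >= 'A' && input.charAt(i) <= 'Z') {
                    i++;
                }
                tokens.add("Idf(" + input.substring(start, i) + ")");
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Moins(-), Idf(FOO), FIN(-FIN-)]
        System.out.println(tokenize("-FOO\n-FIN-"));
        // prints [Idf(VLEG), Moins(-), Idf(XLEG), Moins(-), Idf(FCINFZU)]
        System.out.println(tokenize("VLEG-XLEG-FCINFZU"));
    }
}
```

The point of the sketch is only the order of the tests: the full keyword is compared first, so a '-' followed by an identifier starting with 'F' is never mistaken for the beginning of '-FIN-'.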
Jean-Claude Durand
LIG, équipe GETALP
385, rue de la Bibliothèque
BP 53
38041 Grenoble cedex 9
France
Jean-Claude.Durand at imag.fr
tel: +33 (0)4 76 51 43 81
fax: +33 (0)4 76 63 56 86
On 14 Dec 2009, at 19:35, John B. Brodie wrote:
> Greetings!
> On Mon, 2009-12-14 at 19:18 +0100, Jean-Claude Durand wrote:
>> My lexical grammar (I use antlr v3.2):
>>
>> lexer grammar Lex;
>> options
>> { language=Java; }
>>
>>
>> WS: ( ' ' | '\t' | '\n' )+ { $channel=HIDDEN; } ;
>>
>>
>> FIN : '-FIN-' ;
>> Moins : '-' ;
>>
>>
>> // Identifiers:
>> Idf : ('A'..'Z')+ ;
>>
>>
>> I want to enumerate the tokens for the following example (Main.java is
>> in the archive):
>>
>>
>> VLEG-XLEG-FCINFZU
>>
>>
>> And the output is:
>>
>>
>> ~/Soft/Antlr/LexJava: java Main test
>> --> [@-1,0:3='VLEG',<7>,1:0]
>> --> [@-1,4:4='-',<6>,1:4]
>> --> [@-1,5:8='XLEG',<7>,1:5]
>> line 1:11 mismatched character 'C' expecting 'I'
>> --> [@-1,12:16='INFZU',<7>,1:12]
>> --> [@-1,17:36=' ',<4>,channel=99,1:17]
>> ~/Soft/Antlr/LexJava:
>>
>>
>> The lexer is looking for the keyword -FIN- and not for minus sign
>> followed by an identifier (which begins with an F).
>
> This is a well-known "feature" of ANTLR lexers: once the lexer sees the
> left prefix of a token, it commits itself to that token alone and will
> not back up to consider other possibilities.
>
> You need to left-factor your FIN and Moins rules. Something like the
> following (off the top of my head, untested, but it gives the general
> idea):
>
> lexer grammar Lex;
> options { language=Java; }
> tokens { FIN; }
>
> WS: ( ' ' | '\t' | '\n' )+ { $channel=HIDDEN; } ;
>
> Moins : '-' ( 'FIN-' { $type = FIN; } )?;
>
> // Identifiers:
> Idf : ('A'..'Z')+ ;
>
>