[antlr-interest] simple lexical analysis question

Mon Dec 14 10:35:49 PST 2009

Greetings!
On Mon, 2009-12-14 at 19:18 +0100, Jean-Claude Durand wrote:
> My lexical grammar (I use antlr v3.2):
> 
> lexer grammar Lex; 
> options 
> { language=Java; }
> 
> 
> WS: ( ' ' | '\t' | '\n' )+ { $channel=HIDDEN; } ;
> 
> 
> FIN : '-FIN-' ;
> Moins : '-' ;
> 
> 
> // Identifiers:
> Idf : ('A'..'Z')+ ;
> 
> 
> I want to enumerate the tokens for the following example (Main.java is
> in the archive):
> 
> 
> VLEG-XLEG-FCINFZU
> 
> 
> And the output is:
> 
> 
> ~/Soft/Antlr/LexJava: java Main test
>  --> [@-1,0:3='VLEG',<7>,1:0]
>  --> [@-1,4:4='-',<6>,1:4]
>  --> [@-1,5:8='XLEG',<7>,1:5]
> line 1:11 mismatched character 'C' expecting 'I'
>  --> [@-1,12:16='INFZU',<7>,1:12]
>  --> [@-1,17:36='                    ',<4>,channel=99,1:17]
> ~/Soft/Antlr/LexJava: 
> 
> 
> The lexer is looking for the keyword -FIN-  and not for minus sign
> followed by an identifier (which begins with an F).

This is a well-known "feature" of ANTLR lexers: once the lexer sees the
left prefix of a token, it commits itself to that token alone and will
not back up to consider other possibilities. Here, after "-F" it has
committed to FIN, so the 'C' that follows is reported as a mismatch.
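
To make the commitment concrete, here is a hand-written toy model in
Java. It is not what ANTLR generates; the class name and the assumption
that the choice is made on the two characters "-F" are mine, picked only
because they are consistent with the output you posted.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a lexer that commits on a prefix. NOT ANTLR-generated code.
class CommittingLexerSketch {
    static final String KEYWORD = "-FIN-";

    static List<String> tokenize(String in) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n') {
                i++; // WS: skipped
            } else if (c == '-' && i + 1 < in.length() && in.charAt(i + 1) == 'F') {
                // Assumed decision: "-F" selects FIN, with no way back to Moins.
                int k = 0;
                while (k < KEYWORD.length() && i + k < in.length()
                        && in.charAt(i + k) == KEYWORD.charAt(k)) {
                    k++;
                }
                if (k == KEYWORD.length()) {
                    out.add("FIN");
                    i += k;
                } else {
                    String got = i + k < in.length() ? "'" + in.charAt(i + k) + "'" : "<EOF>";
                    out.add("error@" + (i + k) + ": mismatched character " + got
                            + " expecting '" + KEYWORD.charAt(k) + "'");
                    // Drop the partial match and the offending character, then resume.
                    i = Math.min(in.length(), i + k + 1);
                }
            } else if (c == '-') {
                out.add("Moins");
                i++;
            } else if (c >= 'A' && c <= 'Z') {
                int start = i;
                while (i < in.length() && in.charAt(i) >= 'A' && in.charAt(i) <= 'Z') i++;
                out.add("Idf:" + in.substring(start, i));
            } else {
                out.add("error@" + i + ": unexpected character '" + c + "'");
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : tokenize("VLEG-XLEG-FCINFZU")) System.out.println(" --> " + t);
    }
}
```

On your example this model flags 'C' at index 11 and resumes with INFZU,
the same shape as the output above.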

You need to left-factor your FIN and Moins rules. Something like the
following (off the top of my head, untested, but it gives the general
idea):

lexer grammar Lex; 
options { language=Java; }
tokens { FIN; }

WS: ( ' ' | '\t' | '\n' )+ { $channel=HIDDEN; } ;

// A lone '-' is Moins; '-FIN-' is matched by the same rule and retyped as FIN.
Moins : '-' ( 'FIN-' { $type = FIN; } )? ;

// Identifiers:
Idf : ('A'..'Z')+ ;
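
For reference, the tokenization you are after can be sketched by hand in
Java as below. This is only an illustration of the intended behaviour,
not ANTLR output: the class name is made up, and the lexer checks for
the whole keyword "-FIN-" before choosing FIN, falling back to Moins
otherwise. If the left-factored rule alone still commits on "-F" in
v3.2, a syntactic predicate on the full keyword may be needed to get the
same effect; I have not tested that either.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch of the desired behaviour. NOT ANTLR-generated code.
class LookaheadLexerSketch {
    static List<String> tokenize(String in) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n') {
                i++; // WS: skipped
            } else if (c == '-') {
                // Look at the whole keyword before deciding; otherwise it is a plain minus.
                if (in.startsWith("-FIN-", i)) {
                    out.add("FIN");
                    i += 5;
                } else {
                    out.add("Moins");
                    i++;
                }
            } else if (c >= 'A' && c <= 'Z') {
                int start = i;
                while (i < in.length() && in.charAt(i) >= 'A' && in.charAt(i) <= 'Z') i++;
                out.add("Idf:" + in.substring(start, i));
            } else {
                out.add("error@" + i + ": unexpected character '" + c + "'");
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : tokenize("VLEG-XLEG-FCINFZU")) System.out.println(" --> " + t);
    }
}
```

With this logic VLEG-XLEG-FCINFZU comes out as Idf, Moins, Idf, Moins,
Idf, while an input that really contains -FIN- still yields a FIN token.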