[antlr-interest] antlr 3 lexer question

Tue Nov 16 07:52:41 PST 2010

Greetings!

On Tue, 2010-11-16 at 11:37 +0100, Philippe Frankson wrote:
> Hi,
> 
> I have spent quite some time on the following problem without finding
> a suitable solution, so any help would be very much appreciated.
> 
> When I have the following input:
> row1.subrow1.subsubrow1..row1.subrow1.subsubrow5
> I would like the lexer to return the following tokens: NAME RANGE NAME
> Where RANGE is '..', the first NAME would be 'row1.subrow1.subsubrow1'
> and the second one 'row1.subrow1.subsubrow5'.
> For info, the dot is not mandatory (we can have row1 alone, for
> example).
> Let's assume that we allow any alpha characters (apart from the dot) ->
> fragment ALPHA 	: ('a'..'z'|'A'..'Z');
> 
> Note: it is important to me to have a solution on the lexer side (I
> know it is possible to solve this in the parser, but I would like to
> avoid that).
> 

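Sometimes syntactic predicates can be Good (but be careful, since they
make the lexer do extra speculative lookahead). Here the predicate lets
NAME consume a '.' only when a letter follows it, so the first dot of
'..' is left alone and the two dots can be matched as RANGE.

Try this: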

NAME : ID ( ('.' ALPHA)=> '.' ID )* ;

RANGE : '..' ;

fragment ID : ALPHA (ALPHA|DIGIT)* ;
fragment ALPHA : ('a'..'z')|('A'..'Z') ;
fragment DIGIT : '0'..'9' ;
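
To see what the ('.' ALPHA)=> predicate buys you, here is a rough
hand-written sketch in Python of the scanning behaviour I would expect
from the rules above. It is not ANTLR output and not how the generated
lexer is implemented; the names scan_id and tokenize are made up for
illustration, and whitespace handling is left out because the grammar
fragment above does not define any.

```python
import string

ALPHA = set(string.ascii_letters)      # fragment ALPHA
ALNUM = ALPHA | set(string.digits)     # ALPHA | DIGIT


def scan_id(text, pos):
    """Match ID : ALPHA (ALPHA|DIGIT)* at pos; return the end index, or pos if no match."""
    if pos >= len(text) or text[pos] not in ALPHA:
        return pos
    end = pos + 1
    while end < len(text) and text[end] in ALNUM:
        end += 1
    return end


def tokenize(text):
    """Return (type, text) pairs for NAME and RANGE tokens (hypothetical helper)."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] in ALPHA:
            end = scan_id(text, pos)
            # Stand-in for ('.' ALPHA)=> : peek two characters ahead and
            # continue the NAME only if the dot is followed by a letter.
            while end + 1 < len(text) and text[end] == "." and text[end + 1] in ALPHA:
                end = scan_id(text, end + 1)
            tokens.append(("NAME", text[pos:end]))
            pos = end
        elif text.startswith("..", pos):
            tokens.append(("RANGE", ".."))
            pos += 2
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
    return tokens
```

Calling tokenize on the input from the question should give NAME,
RANGE, NAME. If you drop the two-character check and continue on any
single '.', the first NAME swallows the first dot of '..' and then has
no letter to match, which is the failure the predicate is there to
avoid.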

Hope this helps...
   -jbb