[antlr-interest] antlr 3 lexer question

Tue Nov 16 10:33:21 PST 2010

Thanks a lot John!
This is what I needed.

Rgds
Philippe Frankson

-----Original Message-----
From: John B. Brodie [mailto:jbb at acm.org] 
Sent: 16 November 2010 16:53
To: Philippe Frankson
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] antlr 3 lexer question

Greetings!

On Tue, 2010-11-16 at 11:37 +0100, Philippe Frankson wrote:
> Hi,
> 
> I spent quite some time looking for a solution to the following problem
> but could not find a suitable one, so any help would be very much
> appreciated.
> 
> When I have the following input:
> row1.subrow1.subsubrow1..row1.subrow1.subsubrow5
> I would like the lexer to return the following tokens: NAME RANGE NAME
> where RANGE is '..', the first NAME is 'row1.subrow1.subsubrow1'
> and the second one is 'row1.subrow1.subsubrow5'.
> For info, the dot is not mandatory (we can have row1 alone, for
> example).
> Let's assume that we allow any alpha characters (apart from the dot):
> fragment ALPHA : ('a'..'z'|'A'..'Z');
> 
> Note: it is important to me to have a solution on the lexer side (I know
> it is possible to solve this in the parser but I would like to avoid
> it).
> 

sometimes syntactic predicates can be Good (but be careful!)

try this:

NAME : ID ( ('.' ALPHA)=> '.' ID )* ;

RANGE : '..' ;

fragment ID : ALPHA (ALPHA|DIGIT)* ;
fragment ALPHA : ('a'..'z')|('A'..'Z') ;
fragment DIGIT : '0'..'9' ;
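
The predicate ('.' ALPHA)=> tells the lexer to consume a '.' as part of NAME only when a letter follows it, so the two dots of '..' are left over for RANGE. The same idea can be tried outside ANTLR with a regular-expression lookahead. What follows is only a rough Python sketch of that behaviour, not ANTLR output: the names ID, TOKEN_RE and tokenize are made up for illustration, and the regex stands in for the generated lexer.

```python
import re

# Regex emulation of the lexer rules above (an illustration only; ANTLR is not used).
# ID mirrors: fragment ID : ALPHA (ALPHA|DIGIT)* ;
ID = r"[A-Za-z][A-Za-z0-9]*"

TOKEN_RE = re.compile(
    # NAME: a '.' is consumed only when a letter follows it, which is
    # what the ('.' ALPHA)=> predicate checks before matching '.' ID.
    rf"(?P<NAME>{ID}(?:\.(?=[A-Za-z]){ID})*)"
    # RANGE: the two dots left over between two names.
    r"|(?P<RANGE>\.\.)"
)


def tokenize(text):
    """Return (token_type, lexeme) pairs; raise ValueError on unmatched input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for token in tokenize("row1.subrow1.subsubrow1..row1.subrow1.subsubrow5"):
        print(token)
```

Run on the sample input, the sketch yields NAME, RANGE, NAME, and a lone name such as row1 still comes back as a single NAME. Without the lookahead, the first '.' of '..' would be tried as part of NAME and the match would fail at the second dot.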

hope this helps...
   -jbb