[antlr-interest] Lexer bug?

Jim Idle jimi at temporal-wave.com
Tue Oct 23 10:19:38 PDT 2007


Here are two possibilities for the number vs range question, which should be
fairly self-explanatory. Again, whether ANTLR should deal with the input as
specified in previous posts is of course a different subject from how to get
a working lexer at the moment :-)

The first returns the range as a single token, which may or may not be
useful to people, and means that you need to cater for malformed ranges
in the lexer (but you generally already have to do this kind of thing for
unterminated strings and so on, so that the error messages are closer to the
source of the problem).

The second is arranged to return three tokens for the range, without
resorting to multiple tokens per lexer rule (though I think that we need to
make this happen more easily, as it does make some things easier). Note that
the predicate also stops '99.' from being lexed as a single NUMBER, which is
probably valid in some languages, so watch for this if you need it to be a
valid number. I think it would be easiest to accept NUMBER DOT in the parser,
but it could be handled in the lexer.

// Single token range, explicitly inlined separation of NUMBER and RANGE
//
grammar fred;

tokens
{
	RANGE;
}

start
	:	(number | range)+
	;
	
number
	: NUMBER
	;

range
	: RANGE
	;
	
NUMBER
	: ('0'..'9')+
		(
			 '.'
			 	(
			 		 '.' ('0'..'9')+ { $type = RANGE; }
			 		| ('0'..'9')+
			 		| // ERROR
			 	)
			| // Just an integer
		)
	;
	
OTHER
	: . { $channel = HIDDEN; }
	;
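
For anyone who wants to trace the decisions in the NUMBER rule above without
firing up ANTLRWorks, here is a rough hand-written analogue in Python. It is
only a sketch and not what ANTLR generates: the function names are made up,
hidden-channel characters are simply dropped, and reporting the error
alternatives as an "ERROR" token kind is my own assumption about how you
might want to surface them.

```python
DIGITS = "0123456789"


def scan_digits(text, i):
    """Return the index just past the run of digits starting at i."""
    while i < len(text) and text[i] in DIGITS:
        i += 1
    return i


def lex_single_token(text):
    """Approximate the NUMBER rule of grammar fred: a range is one token."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        if text[i] not in DIGITS:
            i += 1  # OTHER: goes to the hidden channel, dropped here
            continue
        start = i
        i = scan_digits(text, i)
        kind = "NUMBER"  # just an integer unless a '.' follows
        if i < n and text[i] == ".":
            i += 1
            if i < n and text[i] == ".":
                # '..' must be followed by digits to form a range
                i += 1
                end = scan_digits(text, i)
                kind = "RANGE" if end > i else "ERROR"  # assumption
                i = end
            elif i < n and text[i] in DIGITS:
                i = scan_digits(text, i)  # decimal number
            else:
                kind = "ERROR"  # '99.' with nothing after (assumption)
        tokens.append((kind, text[start:i]))
    return tokens


print(lex_single_token("12 3.5 7..9"))
```

The point to notice is that the range check lives inside the number scan, so
any malformed range has to be diagnosed right there in the lexer.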


// Return decimal numeric as one token but 888..888 as NUMBER RANGE NUMBER
// Also shows that '.' can still be recognized on its own, just for kicks.
// Try the input 999.88 . . 666..667
//
grammar harry;

start 
	:	(number_range)+
	;
	
number_range
	: NUMBER ( RANGE NUMBER)?
	| DOT
	;
	
NUMBER
	: ('0'..'9')+
		(
			 ('.' '0'..'9')=> ('.' ('0'..'9')+)
			|// Just an integer
		)
	;

RANGE
	:	 '..'
	;
	
DOT
	: '.'
	;
	
OTHER
	: . {$channel = HIDDEN; }
	;
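
The effect of the syntactic predicate in NUMBER can also be sketched by hand.
The Python below is an illustration only, with a made-up function name: it
mimics the two-character lookahead ('.' followed by a digit) and drops
anything that the grammar would send to the hidden channel.

```python
DIGITS = "0123456789"


def lex_three_tokens(text):
    """Approximate grammar harry: NUMBER, RANGE and DOT are separate tokens."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in DIGITS:
            start = i
            while i < n and text[i] in DIGITS:
                i += 1
            # Analogue of the predicate ('.' '0'..'9')=> : only consume the
            # '.' when a digit follows it, otherwise stop at the integer.
            if i + 1 < n and text[i] == "." and text[i + 1] in DIGITS:
                i += 1
                while i < n and text[i] in DIGITS:
                    i += 1
            tokens.append(("NUMBER", text[start:i]))
        elif text.startswith("..", i):
            tokens.append(("RANGE", ".."))
            i += 2
        elif ch == ".":
            tokens.append(("DOT", "."))
            i += 1
        else:
            i += 1  # OTHER: hidden channel, dropped here
    return tokens


print(lex_three_tokens("999.88 . . 666..667"))
```

With this sketch, '99.' comes out as NUMBER followed by DOT, which is the
case where accepting NUMBER DOT in the parser would come in.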


I hope that these are helpful in some small way :-) I tested these in
ANTLRWorks 1.1.3.

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Gavin Lambert
> Sent: Tuesday, October 23, 2007 4:47 AM
> To: Clifford Heath; antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Lexer bug?
> 
> At 13:47 23/10/2007, Clifford Heath wrote:
>  >With regard to the suggestions offered, I'm not sure I
>  >understand all of them, and if I do, I'm not sure I want
>  >to implement that way. For example, it seemed that one
>  >suggestion would have it that I should recognize
>  >the string "0.12 ..  3.5" as a single token... and I'm
>  >*sure* I don't want to do that!
> 
> True, although you could combine it with the method shown in the
> wiki regarding how to emit multiple tokens from a single lexer
> rule.  Which admittedly is a little messy too, but that's mostly
> glue code.  Once that's in place the actual procedure is pretty
> straightforward.
> 
> 
