[antlr-interest] Number tokenizer vs. number grammar

Sat Nov 15 22:30:35 PST 2008

At 09:50 16/11/2008, Todd O'Bryan wrote:
 >Assume that both 2 * 3+2i and 2*3+2i should lex as NUMBER OP
 >NUMBER. What does that determine about my possible approaches? :-)

It implies that you're going to experience pain with "2+3+2i" (or 
"2/3+2i", for that matter, given that you've already said that 
this ought to be a single NUMBER).  :)

If you can require that whitespace is significant (i.e. "2 / 3+2i" 
is two NUMBERs and a division, but "2/3+2i" is a single NUMBER, 
and "2 /3+2i" is simply illegal), then probably the simplest way 
to deal with this (and avoid duplication) is to define NUMBER as 
any whitespace-free sequence that starts with a digit (optionally 
preceded by a sign and a decimal point) and continues with any 
combination of digits and operators:

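fragment DIGIT : '0'..'9';
// Optional sign, optional leading '.', one digit, then any run of
// digits, '+', '-', '/', '.' and 'i' (no whitespace allowed).
NUMBER : '-'? '.'? DIGIT (DIGIT | '+' | '-' | '/' | '.' | 'i')* ;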

This rule will of course match invalid constructs as well (such 
as "2++3"), but you can reject those at the parser / tree parser / 
driver code level (which permits better error messages anyway).
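
As a rough illustration of that last step, here is a minimal sketch in 
Python of driver-level validation of the text matched by NUMBER. It is 
not generated ANTLR code. The names PERMISSIVE_NUMBER, STRICT_NUMBER 
and check_number are hypothetical, and the "strict" shape (a real part 
that is an integer, decimal or fraction, optionally followed by an 
imaginary part) is only an assumption about what a valid number looks 
like in your language.

```python
import re

# Mirrors the permissive lexer rule from the grammar above:
#   NUMBER : '-'? '.'? DIGIT (DIGIT | '+' | '-' | '/' | '.' | 'i')* ;
PERMISSIVE_NUMBER = re.compile(r"-?\.?[0-9][0-9+\-/.i]*")

# ASSUMPTION: a real component is a fraction, a decimal, or an integer.
_REAL = r"(?:[0-9]+/[0-9]+|[0-9]+(?:\.[0-9]*)?|\.[0-9]+)"

# ASSUMPTION: a valid number is a real, a pure imaginary (real + 'i'),
# or real (+|-) real 'i'.
STRICT_NUMBER = re.compile(rf"-?{_REAL}(?:i|[+-]{_REAL}i)?")


def check_number(text):
    """Return None if text is a well-formed number, else an error message."""
    if not PERMISSIVE_NUMBER.fullmatch(text):
        # The lexer rule itself would not have produced this token.
        return f"not a NUMBER token: {text!r}"
    if not STRICT_NUMBER.fullmatch(text):
        # The lexer accepted it, so a specific message can be given here.
        return f"malformed number {text!r}: expected a form like 2/3+2i"
    return None


if __name__ == "__main__":
    for sample in ["2/3+2i", "3+2i", "-.5", "2++3", "2+3+2i"]:
        print(sample, "->", check_number(sample) or "ok")
```

With these assumptions "2/3+2i" is accepted, while "2++3" and 
"2+3+2i" pass the permissive rule but are rejected by the stricter 
check, which is where a helpful message can be produced.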