[antlr-interest] Identifiers starting with numbers

Martin Probst mail at martin-probst.com
Wed Jul 19 13:58:52 PDT 2006


Hi,

> In our scripting language, the identifier can start with numbers. We
> changed IDENT to be like this

So what is the lexical difference between an identifier and a number?
How do you decide whether the single token '5' is a number or an
identifier? Or 55123, for that matter?

If there is no lexical difference, you will have to make the decision
within the parser, e.g. have the lexer return '5' as a number and then
add a parser rule
identifier: IDENT | INT_NUM;
This will probably give you ambiguities, but they should be
resolvable within the parser. If they are not, it's most likely that
your language itself is ambiguous, e.g. in this case:

var 5 := 12;
var foo := 5; // is foo = 5 or foo = 12?
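
To make the lexer/parser split above concrete, here is a rough, untested
sketch in ANTLR 2 syntax. It assumes INT_NUM is declared in the lexer's
tokens section and that an all-digit token should come out of the lexer as
INT_NUM; the token name and the exact character sets are assumptions taken
from this thread, not a definitive grammar.

```
// Parser fragment: an all-digit token is also accepted where a name is expected.
identifier
	:	IDENT
	|	INT_NUM
	;

// Lexer fragment: assumes "tokens { INT_NUM; }" is declared in the lexer.
IDENT
	options {testLiterals=true;}
	:	('0'..'9')+
		(	('a'..'z'|'_') ('a'..'z'|'0'..'9'|'_')*	// digits followed by a letter or '_': stays IDENT
		|	{ $setType(INT_NUM); }			// digits only: retype the token as a number
		)
	|	('a'..'z'|'_') ('a'..'z'|'0'..'9'|'_')*		// starts with a letter or '_': IDENT
	;
```

With this split the lexer never has to guess: '5' and 55123 always arrive as
INT_NUM, and only the parser decides, from context, whether a number is
being used as a name.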

> IDENT
> 	options {testLiterals=true;}
> 	:	('a'..'z'|'0'..'9'|'_')*   //pspsps
> 	;

This rule allows empty identifiers, because '*' also matches zero
characters. You should use '+' instead.
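
For reference, this is the quoted rule with '+' in place of '*' and nothing
else changed (untested):

```
IDENT
	options {testLiterals=true;}
	:	('a'..'z'|'0'..'9'|'_')+	// '+' requires at least one character
	;
```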

Regards,
Martin



