[antlr-interest] Lexer rules that handle numeric locale?

Jim Idle jimi at temporal-wave.com
Thu Apr 30 16:09:01 PDT 2009


Chuck wrote:
> Does anyone have lexer rules that can be used to recognize numbers in the default Java locale?
>
> For example:
>   Locale.US         12,345,678.9
>   Locale.FRENCH     12 345 678,9
>   Locale.ITALIAN    12.345.678,9
>   
Well, an ANTLR lexer of course matches specific separator characters
rather than being driven by locale. That said, I think you could
construct a set of lexer rules to do this, for the Java and C# targets
at least.

If you take the standard rules from, say, Java.g, then where the rule
looks for '.', you would need to use input.LA(1) and test for the
separator. Because French uses a space as a separator, and a space can
also simply end the number, you would need to use a gated semantic
predicate that accepts the separator only when a digit follows it:

{input.LA(1) == currentSep && input.LA(2) >= '0' && input.LA(2) <= '9'}?=> .

Then for decimals  {input.LA(1) == currentDec}?=> .

Then you would need to set up the lexer member variables currentDec and
currentSep before starting the lexer.
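
One way to obtain those two characters is to ask the JDK for the locale's
symbols. The following is only a minimal sketch: the class name
LocaleSeparators is made up for illustration and is not part of ANTLR;
the field names match the variables used in the predicates above.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper (not an ANTLR class): looks up the two characters
// that the lexer predicates compare against input.LA(1).
final class LocaleSeparators {
    final char currentSep; // grouping separator, e.g. ',' for Locale.US
    final char currentDec; // decimal separator, e.g. '.' for Locale.US

    LocaleSeparators(Locale locale) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(locale);
        currentSep = symbols.getGroupingSeparator();
        currentDec = symbols.getDecimalSeparator();
    }

    public static void main(String[] args) {
        // Print the code points, since some separators are not visible.
        for (Locale locale : new Locale[] {Locale.US, Locale.FRENCH, Locale.ITALIAN}) {
            LocaleSeparators s = new LocaleSeparators(locale);
            System.out.printf("%s: sep=U+%04X dec=U+%04X%n",
                    locale, (int) s.currentSep, (int) s.currentDec);
        }
    }
}
```

You would copy the two values into the lexer's members before lexing. As
far as I know, the JDK reports the French grouping separator as a
no-break space rather than a plain ASCII space (the exact code point
seems to depend on the JDK version), so input typed with an ordinary
space may need normalizing first.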

Of course, perhaps an easier way is to just look for a digit, then ask
the standard java.util.Scanner to pick out the number, work out how many
characters it had to consume to do so, and then call input.consume() in
a loop to gather the text of the formatted number :-) This might help:

http://java.sun.com/docs/books/tutorial/essential/io/scanning.html
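
The "work out how many characters it consumed" step can be sketched as
below. Note that this swaps in java.text.NumberFormat with a
ParsePosition instead of java.util.Scanner, because ParsePosition
reports directly where parsing stopped. The class and method names are
hypothetical, and how you get the remaining input as a String out of the
lexer's character stream is left as an assumption.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper (not an ANTLR class).
final class LocaleNumberLength {
    // Returns how many characters, beginning at index start, the locale's
    // number format accepts as one number; returns 0 if none parses there.
    static int numberLength(String text, int start, Locale locale) {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        ParsePosition pos = new ParsePosition(start);
        Number parsed = format.parse(text, pos);
        // On failure parse returns null and leaves the index unchanged.
        return parsed == null ? 0 : pos.getIndex() - start;
    }

    public static void main(String[] args) {
        String text = "12,345,678.9 + 1";
        int n = numberLength(text, 0, Locale.US);
        System.out.println("consume " + n + " chars: " + text.substring(0, n));
    }
}
```

Inside the lexer rule you would then call input.consume() that many
times. Bear in mind that NumberFormat parsing is fairly lenient about
where grouping separators appear, so this accepts some strings that a
strict grammar rule would reject.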

Jim



