[antlr-interest] Lexer rules that handle numeric locale?
Jim Idle
jimi at temporal-wave.com
Thu Apr 30 16:09:01 PDT 2009
Chuck wrote:
> Does anyone have lexer rules that can be used to recognize numbers in the default Java locale?
>
> For example:
> Locale.US 12,345,678.9
> Locale.FRENCH 12 345 678,9
> Locale.ITALIAN 12.345.678,9
>
Well, ANTLR lexer rules match specific separator characters rather than
being driven by locale. That said, I think you could construct a set of
lexer rules to do this, for Java and C# at least.
If you take the standard rules from, say, Java.g, then wherever a rule
looks for '.', you would need to use input.LA(1) and test for the
separator instead. Because French uses a space as its separator, you
would need to use a gated semantic predicate that also checks that a
digit follows:
    {input.LA(1) == currentSep && input.LA(2) >= '0' && input.LA(2) <= '9'}?=> .
Then for decimals:
    {input.LA(1) == currentDec}?=> .
Then you would need to set up the lexer member variables currentDec and
currentSep before starting the lexer.
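
One way to fill in currentSep and currentDec is to ask the JDK for the locale's symbols. This is only a rough sketch: the class name LocaleSeparators and the helper atGroupingSeparator are made up for illustration, and the helper simply mirrors the first predicate above on a plain string instead of an ANTLR input stream.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper, not part of ANTLR: holds the two separator
// characters that the predicates compare against.
class LocaleSeparators {
    final char currentSep; // grouping (thousands) separator
    final char currentDec; // decimal separator

    LocaleSeparators(Locale locale) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(locale);
        currentSep = symbols.getGroupingSeparator();
        currentDec = symbols.getDecimalSeparator();
    }

    // Same test as the grouping predicate: the character at pos is the
    // separator and the character after it is an ASCII digit.
    boolean atGroupingSeparator(CharSequence text, int pos) {
        if (pos < 0 || pos + 1 >= text.length()) {
            return false;
        }
        char next = text.charAt(pos + 1);
        return text.charAt(pos) == currentSep && next >= '0' && next <= '9';
    }

    public static void main(String[] args) {
        LocaleSeparators seps = new LocaleSeparators(Locale.getDefault());
        System.out.println("grouping: U+" + Integer.toHexString(seps.currentSep));
        System.out.println("decimal:  U+" + Integer.toHexString(seps.currentDec));
    }
}
```

In the grammar you would copy the two values into fields declared in @lexer::members before the first token is requested. One thing to check: as far as I know the JDK reports the French grouping separator as a no-break space variant rather than an ASCII space, so print the code point for your JDK before relying on it.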
Of course, perhaps an easier way is to just look for a digit, then ask
the standard java.util.Scanner to pick out the number, work out how many
characters it had to consume to do so, then use input.consume() in a
loop to gather the text of the formatted number :-) This might help:
http://java.sun.com/docs/books/tutorial/essential/io/scanning.html
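
Here is a minimal sketch of the "let the library find the number" idea. Note that it swaps in java.text.NumberFormat with a ParsePosition in place of Scanner, only because ParsePosition reports directly how many characters were consumed. The class and method names (LocaleNumberProbe, numberLength) are invented for the example.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper, not part of ANTLR.
class LocaleNumberProbe {
    // Returns how many characters, starting at 'start', the locale's number
    // format accepts as one number, or 0 if the text there does not begin
    // with a digit or cannot be parsed.
    static int numberLength(String text, int start, Locale locale) {
        if (start >= text.length() || !Character.isDigit(text.charAt(start))) {
            return 0;
        }
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        ParsePosition pos = new ParsePosition(start);
        Number value = format.parse(text, pos); // returns null on failure
        return value == null ? 0 : pos.getIndex() - start;
    }

    public static void main(String[] args) {
        String input = "12.345.678,9 + 1";
        int n = numberLength(input, 0, Locale.ITALIAN);
        System.out.println(input.substring(0, n));
    }
}
```

Inside a lexer rule you would then call input.consume() that many times. Bear in mind that the default parse is lenient about where grouping separators fall, so it may accept text that a stricter hand-written rule would reject.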
Jim