[antlr-interest] Number tokenizer vs. number grammar

Sat Nov 15 08:36:23 PST 2008

(One warning: I'm new to ANTLR, so I haven't wrapped my brain around
syntactic/semantic predicates yet and may have missed something obvious
there that would make this trivial.)

I'm parsing a language that allows a wide range of number constructs:
1, -3, 3.14, -5.8, 2/3, -3/-5, 3+2i, -4+i, 5-i, -3/5+2/3i, 3/5-2/3i,
7+2.3i, and some other things are all legal numbers.

For the language grammar itself, it would be very nice to have just a
NUMBER token, but it would also be nice to be able to parse numbers
into their component parts, since creating a complex number involves
calling a constructor with its two real-valued parts. The other tricky
thing is that whitespace is irrelevant in the language grammar but
significant inside numbers. For example, 3+ 2i (with a space before
the 2) is not a legal number.
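
To make the requirement concrete, here is a rough sketch in Python
(not ANTLR) of the kind of decomposition meant above. It only covers
the example forms listed earlier, not the "other things", and every
name in it (UNSIGNED, REAL, parse_number, and so on) is made up for
illustration. It assumes the real and imaginary parts can be held as
exact fractions.

```python
import re
from fractions import Fraction

# Hypothetical building blocks; the real language's syntax may differ.
UNSIGNED = r"\d+(?:\.\d+)?"                     # 3, 3.14
SIGNED = rf"-?{UNSIGNED}"                       # -3, -5.8
REAL = rf"{SIGNED}(?:/{SIGNED})?"               # 2/3, -3/-5
IMAG_MAGNITUDE = rf"{UNSIGNED}(?:/{UNSIGNED})?"  # the 2/3 in +2/3i

# No whitespace is allowed anywhere inside the pattern.
NUMBER = re.compile(
    rf"(?P<real>{REAL})(?:(?P<sign>[+-])(?P<imag>{IMAG_MAGNITUDE})?i)?"
)


def to_fraction(text):
    """Convert '3', '3.14' or '-3/-5' to a Fraction."""
    numerator, _, denominator = text.partition("/")
    if denominator:
        return Fraction(numerator) / Fraction(denominator)
    return Fraction(numerator)


def parse_number(text):
    """Return (real, imaginary) for a number, or raise ValueError."""
    match = NUMBER.fullmatch(text)
    if match is None:
        raise ValueError(f"not a legal number: {text!r}")
    real = to_fraction(match.group("real"))
    if match.group("sign") is None:
        return real, Fraction(0)
    # A bare 'i' (as in -4+i or 5-i) means a magnitude of 1.
    imag = to_fraction(match.group("imag") or "1")
    if match.group("sign") == "-":
        imag = -imag
    return real, imag
```

With these assumptions, parse_number("-3/5+2/3i") returns
(Fraction(-3, 5), Fraction(2, 3)), a pair that could be handed to a
complex-number constructor, while parse_number("3+ 2i") raises
ValueError because of the embedded space.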

I've thought about recognizing a NUMBER token in the grammar and then
running a separate number parser on that token's text during
processing, but I'm worried about keeping the number parser and the
NUMBER token definition in sync. What I feel I need are rules that act
as token fragments in the language grammar but as actual tokens in the
number grammar.
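
As an illustration of that "one definition, two uses" idea, here is a
small Python sketch (again not ANTLR, and not a claim about how ANTLR
would do it). The fragment strings are written once and then reused
both by a language-level scanner, which sees a whole NUMBER and skips
whitespace, and by a number-level splitter, which exposes the parts.
The token kinds (name, punct) and all identifiers are hypothetical.

```python
import re

# Hypothetical fragments, written once. Names and syntax are assumptions.
REAL = r"-?\d+(?:\.\d+)?(?:/-?\d+(?:\.\d+)?)?"
IMAG = r"[+-](?:\d+(?:\.\d+)?(?:/\d+(?:\.\d+)?)?)?i"
NUMBER = rf"(?P<real>{REAL})(?P<imag>{IMAG})?"

# Language-level view: NUMBER is one token; whitespace separates tokens.
TOKEN = re.compile(
    rf"(?P<number>{NUMBER})|(?P<name>[A-Za-z_]\w*)"
    rf"|(?P<punct>[()+])|(?P<ws>\s+)"
)

# Number-level view: the same fragments, now exposed as components.
NUMBER_PARTS = re.compile(NUMBER)


def tokenize(source):
    """Return (kind, text) pairs, dropping whitespace between tokens."""
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        for kind in ("number", "name", "punct"):
            if match.group(kind) is not None:
                tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


def split_number(text):
    """Return the (real, imaginary) substrings of a NUMBER token."""
    match = NUMBER_PARTS.fullmatch(text)
    if match is None:
        raise ValueError(f"not a legal number: {text!r}")
    return match.group("real"), match.group("imag")
```

Because both patterns are built from the same strings, a change to
REAL or IMAG shows up in the scanner and the splitter together. In
this sketch "3+2i" scans as a single number token, whereas "3+ 2i"
falls apart into several tokens.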

Am I missing a simple way to deal with this or is this just a nasty problem?

Todd