[antlr-interest] [Antlr3 grammar] how to specify alpha token, numeric token and mix of both

Gavin Lambert antlr at mirality.co.nz
Fri Oct 23 02:39:13 PDT 2009


At 16:45 23/10/2009, Hieu Phung wrote:
>Alpha   = %x41-5A;
>Numeric = %x30-39;
>Decimal = %x30-39 / ".";
>Mixed   = Alpha / Numeric;
>Text    = %x41-5A / %x30-39 / "." / "-" / " ";   <--- this is my MIX token
>
>This format can be written in ABNF easily... but in Antlr, once I 
>introduce the MIX token, everything which is mixed of numeric and 
>alpha is returned as a MIX. Currently I have to use Java code in 
>action to split the MIX string. I wonder if there's a better way 
>to define tokens because my grammar now is full of Java code :(!

If you don't want to continue down that path, then I think the 
only other options are:

1. Eliminate the MIX token and accept that the lexer will emit 
several sub-tokens in contexts where a mixed value is expected; 
then, at the parser level, examine the token sequences and 
determine which tokens really belong to a single value.

2. Eliminate all the other tokens and produce only MIXes (i.e. the 
lexer does nothing more than separate whitespace, non-whitespace 
and SLANTs), then have the parser figure out which MIXes consist 
entirely of digits or entirely of letters, and therefore whether 
each one is valid in the position where you find it.
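
To make option 2 a bit more concrete, here is a minimal sketch of the 
check the parser side would need, written as a plain Java helper that 
could be called from a semantic predicate or an action. The class 
name, the Kind enum and both method names are made up for 
illustration, and I am assuming that each ABNF rule in the question 
is meant to repeat one or more times; adjust the patterns if your 
real grammar differs.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR: decides what a MIX token's text
// really is, so the parser can check it against what the context expects.
final class MixClassifier {
    // Categories mirroring the ABNF rules in the question (assumed names).
    enum Kind { ALPHA, NUMERIC, DECIMAL, MIXED, TEXT, INVALID }

    // Assumption: every rule repeats one or more times.
    private static final Pattern ALPHA_RE   = Pattern.compile("[A-Z]+");
    private static final Pattern NUMERIC_RE = Pattern.compile("[0-9]+");
    private static final Pattern DECIMAL_RE = Pattern.compile("[0-9.]+");
    private static final Pattern MIXED_RE   = Pattern.compile("[A-Z0-9]+");
    private static final Pattern TEXT_RE    = Pattern.compile("[A-Z0-9. -]+");

    // Returns the narrowest category that the whole text belongs to.
    static Kind classify(String text) {
        if (ALPHA_RE.matcher(text).matches())   return Kind.ALPHA;
        if (NUMERIC_RE.matcher(text).matches()) return Kind.NUMERIC;
        if (DECIMAL_RE.matcher(text).matches()) return Kind.DECIMAL;
        if (MIXED_RE.matcher(text).matches())   return Kind.MIXED;
        if (TEXT_RE.matcher(text).matches())    return Kind.TEXT;
        return Kind.INVALID;
    }

    // True if the text is acceptable where the given category is expected,
    // e.g. an all-digit string is still acceptable where MIXED is expected.
    static boolean fits(String text, Kind expected) {
        switch (expected) {
            case ALPHA:   return ALPHA_RE.matcher(text).matches();
            case NUMERIC: return NUMERIC_RE.matcher(text).matches();
            case DECIMAL: return DECIMAL_RE.matcher(text).matches();
            case MIXED:   return MIXED_RE.matcher(text).matches();
            case TEXT:    return TEXT_RE.matcher(text).matches();
            default:      return false;
        }
    }
}
```

With this, MixClassifier.classify("A1") gives MIXED, while 
MixClassifier.fits("123", MixClassifier.Kind.MIXED) is true because 
an all-digit string is also a legal mixed value. Keeping the check in 
one helper at least moves the Java out of the grammar itself.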

Also note that if you're generating a tree for later use by a tree 
parser, you can have the parser convert the tokens once it figures 
out what type they should really be from their context.
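
As a rough illustration of that conversion, the sketch below retypes 
a token once the context says what it should be. It deliberately does 
not use the ANTLR runtime classes: Tok, Type and retype are 
hypothetical stand-ins, and the character ranges are taken from the 
ABNF in the question on the assumption that each rule repeats one or 
more times.

```java
// Hypothetical stand-in for context-driven token conversion; none of these
// names come from ANTLR.
final class ContextRetyper {
    enum Type { MIX, ALPHA, NUMERIC }

    // Minimal immutable token: just a type and the matched text.
    static final class Tok {
        final Type type;
        final String text;

        Tok(Type type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Returns a copy of the token carrying the type the context expects,
    // or throws if the text is not valid for that type.
    static Tok retype(Tok tok, Type expected) {
        boolean ok;
        switch (expected) {
            case ALPHA:   ok = tok.text.matches("[A-Z]+"); break;
            case NUMERIC: ok = tok.text.matches("[0-9]+"); break;
            default:      ok = true; // MIX accepts whatever the lexer produced
        }
        if (!ok) {
            throw new IllegalArgumentException(
                "'" + tok.text + "' is not valid where " + expected + " is expected");
        }
        return new Tok(expected, tok.text);
    }
}
```

A tree parser that later walks the result then sees ALPHA or NUMERIC 
nodes directly and never needs to look inside the text again.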


