[antlr-interest] "Context Sensitive" Tokens

Wed Dec 17 17:20:59 PST 2008

Hi,

I have a fairly straightforward grammar that, unlike most mainstream formal
languages, doesn't quote strings. It also features two alphanumeric strings
(TODAY and NOW) with a special meaning as dates:

query:    field '=' value;
field:    (DIGIT | ALPHA | '_')+;
value:    string | date;
date:     isoDate | 'TODAY' | 'NOW';
string:   (DIGIT | ALPHA)+;
isoDate:  DIGIT DIGIT '-' DIGIT DIGIT '-' DIGIT DIGIT;
DIGIT:    '0'..'9';
ALPHA:    'a'..'z' | 'A'..'Z';

The problem with this grammar is that TODAY and NOW become their own tokens
and can't be used as string literals or as field names. Using them as dates
works (field=TODAY, field=NOW), but these cases don't: TODAY=string (TODAY
should be a valid field name) and field=TODAY where TODAY is meant as a plain
string (TODAY should be a valid string).
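
To make the failure concrete, here is a rough Python sketch of what I believe
the lexer is doing. It is a hand-written approximation, not ANTLR output: the
functions tokenize and is_field, and the token names EQ, DASH and UNDERSCORE,
are made up for illustration, and it assumes the keyword literals always win
over the single-character tokens.

```python
import re

# Hypothetical token table mirroring the grammar above. The keyword literals
# come first, so they take priority over the single-character tokens.
TOKEN_SPEC = [
    ("TODAY", r"TODAY"),
    ("NOW", r"NOW"),
    ("DIGIT", r"[0-9]"),
    ("ALPHA", r"[a-zA-Z]"),
    ("EQ", r"="),
    ("DASH", r"-"),
    ("UNDERSCORE", r"_"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return the list of token names for text, scanning left to right."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.lastgroup)
        pos = match.end()
    return tokens


def is_field(tokens):
    """field: (DIGIT | ALPHA | '_')+ ; keyword tokens are not accepted."""
    allowed = {"DIGIT", "ALPHA", "UNDERSCORE"}
    return len(tokens) > 0 and all(tok in allowed for tok in tokens)


print(tokenize("TODAY=abc"))          # TODAY arrives as one keyword token
print(is_field(tokenize("TODAY")))    # rejected: not made of DIGIT/ALPHA/'_'
print(is_field(tokenize("field_1")))  # accepted
```

In this model the input TODAY never reaches the field rule as a run of ALPHA
tokens, which matches the behaviour I am seeing.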

The nasty solution is to extend the field and string rules to match these
tokens:

query:    field '=' value;
field:    (DIGIT | ALPHA | '_')+ | TODAY | NOW;
value:    string | date;
date:     isoDate | TODAY | NOW;
string:   (DIGIT | ALPHA)+ | TODAY | NOW;
isoDate:  DIGIT DIGIT '-' DIGIT DIGIT '-' DIGIT DIGIT;
DIGIT:    '0'..'9';
ALPHA:    'a'..'z' | 'A'..'Z';
TODAY:    'TODAY';
NOW:      'NOW';
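
As a rough model of what this workaround accepts, here is a small Python
sketch. It is only an illustration under my own assumptions: the token names
(including UNDERSCORE) and the functions is_field and is_string are
hypothetical, and the token lists are written by hand the way I expect the
lexer to emit them.

```python
# Token kinds as the lexer is assumed to emit them (names are illustrative).
CHAR_TOKENS = {"DIGIT", "ALPHA", "UNDERSCORE"}
KEYWORDS = {"TODAY", "NOW"}


def is_field(tokens):
    """field: (DIGIT | ALPHA | '_')+ | TODAY | NOW"""
    if len(tokens) == 1 and tokens[0] in KEYWORDS:
        return True
    return len(tokens) > 0 and all(tok in CHAR_TOKENS for tok in tokens)


def is_string(tokens):
    """string: (DIGIT | ALPHA)+ | TODAY | NOW"""
    if len(tokens) == 1 and tokens[0] in KEYWORDS:
        return True
    return len(tokens) > 0 and all(tok in {"DIGIT", "ALPHA"} for tok in tokens)


# A bare keyword is now accepted as a field name and as a string.
print(is_field(["TODAY"]))
print(is_string(["NOW"]))

# Assumed token stream for the input TODAY_1: the keyword token followed by
# other characters still does not match the extended field rule.
print(is_field(["TODAY", "UNDERSCORE", "DIGIT"]))
```

If the lexer really does emit a TODAY token for a longer name that merely
starts with TODAY, the extended rules cover only the bare keywords, which is
part of why I find this approach unsatisfying.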

But this is nasty and I'd rather not use it. Fragment rules didn't seem to
work for me. What is the standard solution to this problem, if any?

I realize I could rewrite the grammar to use longer tokens like
STRING_OR_NUMBER, but that would pose the same problem; moreover, using
these would reduce the readability of the grammar even further.

Thanks,
Mihai