[antlr-interest] "Context Sensitive" Tokens
Mihai Danila
viridium at gmail.com
Wed Dec 17 17:20:59 PST 2008
Hi,
I have a fairly straightforward grammar that, unlike most mainstream formal
languages, doesn't quote strings. It also features two alphanumeric strings
(TODAY and NOW) with a special meaning as dates:
query: field '=' value;
field: (DIGIT | ALPHA | '_')+;
value: string | date;
date: isoDate | 'TODAY' | 'NOW';
string: (DIGIT | ALPHA)+;
isoDate: DIGIT DIGIT '-' DIGIT DIGIT '-' DIGIT DIGIT;
DIGIT: '0'..'9';
ALPHA: 'a'..'z' | 'A'..'Z';
The problem with this grammar is that TODAY and NOW become their own tokens
and can't be used as string literals or as field names. field=TODAY and
field=NOW work when the value is meant as a date, but TODAY=string fails
(TODAY should be a valid field name), and in field=TODAY the value can never
be read as a string (TODAY should also be a valid string).
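To illustrate the conflict, here is a minimal Python sketch of a tokenizer that
tries the keyword literals before the single-character rules. It is a
simplified model and not ANTLR's actual lexer algorithm; the names TOKEN_SPEC,
tokenize, is_field, UNDERSCORE, EQ and DASH are made up for the illustration.

```python
import re

# Hypothetical token table mirroring the grammar above: the keyword literals
# are tried before the single-character DIGIT and ALPHA rules.
TOKEN_SPEC = [
    ("TODAY", r"TODAY"),
    ("NOW", r"NOW"),
    ("DIGIT", r"[0-9]"),
    ("ALPHA", r"[a-zA-Z]"),
    ("UNDERSCORE", r"_"),
    ("EQ", r"="),
    ("DASH", r"-"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return the list of token type names for text, ignoring parser context."""
    types = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        types.append(m.lastgroup)
        pos = m.end()
    return types


def is_field(types):
    """Model of the parser rule  field: (DIGIT | ALPHA | '_')+  ."""
    return len(types) > 0 and all(
        t in ("DIGIT", "ALPHA", "UNDERSCORE") for t in types
    )
```

In this model, tokenize("TODAY") yields a single TODAY token instead of five
ALPHA tokens, so is_field rejects it even though the letters on their own
would form a valid field name.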
The nasty solution is to extend the field and string rules to match these
tokens:
query: field '=' value;
field: (DIGIT | ALPHA | '_')+ | TODAY | NOW;
value: string | date;
date: isoDate | TODAY | NOW;
string: (DIGIT | ALPHA)+ | TODAY | NOW;
isoDate: DIGIT DIGIT '-' DIGIT DIGIT '-' DIGIT DIGIT;
DIGIT: '0'..'9';
ALPHA: 'a'..'z' | 'A'..'Z';
TODAY: 'TODAY';
NOW: 'NOW';
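The effect of this workaround can be sketched in Python on lists of token type
names. This is only a model of the parser rules above, under the assumption
that the lexer has already produced the token types; FIELD_TYPES, STRING_TYPES,
KEYWORDS, UNDERSCORE and matches are hypothetical names.

```python
# Token types accepted by the repeated part of each rule (hypothetical names).
FIELD_TYPES = {"DIGIT", "ALPHA", "UNDERSCORE"}
STRING_TYPES = {"DIGIT", "ALPHA"}
# Keyword tokens that each rule now has to list explicitly.
KEYWORDS = {"TODAY", "NOW"}


def matches(types, base_types):
    """Model of  rule: (base tokens)+ | TODAY | NOW  on a list of token types."""
    if len(types) == 1 and types[0] in KEYWORDS:
        return True
    return len(types) > 0 and all(t in base_types for t in types)
```

With the extra alternatives, matches(["TODAY"], FIELD_TYPES) succeeds, but
every rule that should accept plain text has to repeat the keyword list, and
a keyword token followed by further letters is still not covered in this model.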
But these are nasty and I'd rather not use them. Fragments didn't seem to
work for me. What is the standard solution to this problem, if any?
I realize I could rewrite the grammar to use longer tokens like
STRING_OR_NUMBER, but that would pose the same problem; moreover, using
these would reduce the readability of the grammar even further.
Thanks,
Mihai