[antlr-interest] "Context Sensitive" Tokens

Wed Dec 17 18:15:15 PST 2008

I guess, in a more formal formulation, my problem boils down to using a
keyword as an ordinary identifier in a context-free grammar, something that
is apparently handled by semantic predicates.

One solution, according to the book, would be to drop the 'TODAY' and 'NOW'
tokens, make the date rule match any alphanumeric string, and use semantic
predicates to restrict the set of values that rule accepts. Something akin
to:

query:    field '=' value;
field:    (DIGIT | ALPHA | '_')+;
value:    string | date;
date:     isoDate
          | { "NOW".equals(input.LT(1).getText()) ||
              "TODAY".equals(input.LT(1).getText()) }? string;
string:   (DIGIT | ALPHA)+;
isoDate:  DIGIT DIGIT '-' DIGIT DIGIT '-' DIGIT DIGIT;
DIGIT:    '0'..'9';
ALPHA:    'a'..'z' | 'A'..'Z';

However, according to the book, "the DFA will evaluate the predicates only
upon ambiguous sequences", which means the date rule can now potentially
match any alphanumeric input; that is certainly not what is intended.
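The intent behind the predicate approach (no TODAY/NOW token types, with the text check confined to the place where a date may appear) can be sketched outside ANTLR in plain Java. This is a hypothetical hand-written analogue, not ANTLR-generated code: it assumes the lexer emits one generic word token per alphanumeric run (rather than the single-character DIGIT and ALPHA tokens in the grammar above), and the names `ContextualKeywords`, `Query`, `parse` and `isDateKeyword` are invented for illustration. It needs Java 16+ for the record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written analogue of the semantic-predicate approach:
// no TODAY/NOW token types exist; the date check runs only in the value position.
public class ContextualKeywords {

    // What "field=value" parsed into; isDate tells which value alternative matched.
    public record Query(String field, String value, boolean isDate) {}

    // ASCII letters, digits, '_' (field names) and '-' (ISO dates) form words.
    private static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || c == '_' || c == '-';
    }

    // Lexer: one generic word token per run of word characters, plus '='.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '=') {
                tokens.add("=");
                i++;
            } else if (isWordChar(c)) {
                int start = i;
                while (i < input.length() && isWordChar(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    // Plays the role of the { "NOW".equals(...) || "TODAY".equals(...) }? predicate.
    private static boolean isDateKeyword(String text) {
        return "TODAY".equals(text) || "NOW".equals(text);
    }

    // query: field '=' value
    static Query parse(String input) {
        List<String> t = tokenize(input);
        if (t.size() != 3 || !t.get(1).equals("=")) {
            throw new IllegalArgumentException("expected field=value: " + input);
        }
        String field = t.get(0);
        String value = t.get(2);
        // field: (DIGIT | ALPHA | '_')+ ; TODAY and NOW are ordinary names here
        if (!field.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("bad field: " + field);
        }
        // date: isoDate | predicate-guarded word (tried before string)
        if (value.matches("[0-9]{2}-[0-9]{2}-[0-9]{2}") || isDateKeyword(value)) {
            return new Query(field, value, true);
        }
        // string: (DIGIT | ALPHA)+
        if (value.matches("[A-Za-z0-9]+")) {
            return new Query(field, value, false);
        }
        throw new IllegalArgumentException("bad value: " + value);
    }

    public static void main(String[] args) {
        System.out.println(parse("TODAY=abc"));
        System.out.println(parse("field=NOW"));
    }
}
```

With this sketch, `TODAY=abc` yields a field named TODAY with a plain string value, while `field=NOW` yields a date. Trying the date alternative before the string alternative is a choice made for this sketch: a bare TODAY in the value position satisfies both readings, so some tie-break is needed either way.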

Are there any options, guys?

On Wed, Dec 17, 2008 at 8:20 PM, Mihai Danila <viridium at gmail.com> wrote:

>
> Hi,
>
> I have a fairly straightforward grammar that, unlike most mainstream formal
> languages, doesn't quote strings. It also features two alphanumeric strings
> (TODAY and NOW) with a special meaning as dates:
>
> query:    field '=' value;
> field:    (DIGIT | ALPHA | '_')+;
> value:    string | date;
> date:     isoDate | 'TODAY' | 'NOW';
> string:   (DIGIT | ALPHA)+;
> isoDate:  DIGIT DIGIT '-' DIGIT DIGIT '-' DIGIT DIGIT;
> DIGIT:    '0'..'9';
> ALPHA:    'a'..'z' | 'A'..'Z';
>
> The problem with this grammar is that TODAY and NOW become their own tokens
> and can't be used as string literals or as field names. These work:
> field=TODAY and field=NOW (read as dates), but these don't: TODAY=string
> (TODAY is a valid field name) and field=TODAY read as a plain string (TODAY
> is also a valid string).
>
> The nasty solution is to extend the field and string rules to match these
> tokens:
>
> query:    field '=' value;
> field:    (DIGIT | ALPHA | '_')+ | TODAY | NOW;
> value:    string | date;
> date:     isoDate | TODAY | NOW;
> string:   (DIGIT | ALPHA)+ | TODAY | NOW;
> isoDate:  DIGIT DIGIT '-' DIGIT DIGIT '-' DIGIT DIGIT;
> DIGIT:    '0'..'9';
> ALPHA:    'a'..'z' | 'A'..'Z';
> TODAY:    'TODAY';
> NOW:      'NOW';
>
> But this is nasty and I'd rather not use it. Fragments didn't seem to
> work for me. What is the standard solution to this problem, if any?
>
> I realize I could rewrite the grammar to use longer tokens like
> STRING_OR_NUMBER, but that would pose the same problem; moreover, it
> would reduce the readability of the grammar even further.
>
>
> Thanks,
> Mihai
>
>
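The difference between the two grammars in the quoted message can be sketched in plain Java. This is a hypothetical illustration rather than ANTLR-generated code: it treats each alphanumeric run as a single word (instead of single-character DIGIT and ALPHA tokens), and the class and method names are invented. `fieldStrict` mirrors the first grammar, where the reserved TODAY token can never be a field name; the widened methods mirror the second grammar, which accepts TODAY and NOW as extra alternatives. Switch expressions need Java 14+.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not ANTLR output): TODAY and NOW are reserved token
// types chosen by spelling alone, like the 'TODAY' and 'NOW' literals.
public class ReservedKeywords {

    enum Type { WORD, TODAY, NOW }

    // Lexer side: the surrounding context is never consulted.
    static Type classify(String text) {
        return switch (text) {
            case "TODAY" -> Type.TODAY;
            case "NOW" -> Type.NOW;
            default -> Type.WORD;
        };
    }

    // First grammar, field: (DIGIT | ALPHA | '_')+ ; keyword tokens never match.
    static boolean fieldStrict(String text) {
        return classify(text) == Type.WORD && text.matches("[A-Za-z0-9_]+");
    }

    // Widened rule, field: (DIGIT | ALPHA | '_')+ | TODAY | NOW
    static boolean fieldWidened(String text) {
        return classify(text) != Type.WORD || text.matches("[A-Za-z0-9_]+");
    }

    // Widened rule, string: (DIGIT | ALPHA)+ | TODAY | NOW
    static boolean string(String text) {
        return classify(text) != Type.WORD || text.matches("[A-Za-z0-9]+");
    }

    // date: isoDate | TODAY | NOW
    static boolean date(String text) {
        return classify(text) != Type.WORD
            || text.matches("[0-9]{2}-[0-9]{2}-[0-9]{2}");
    }

    // value: string | date ; list every alternative that accepts the text.
    static List<String> valueAlternatives(String text) {
        List<String> alts = new ArrayList<>();
        if (string(text)) {
            alts.add("string");
        }
        if (date(text)) {
            alts.add("date");
        }
        return alts;
    }

    public static void main(String[] args) {
        System.out.println(fieldStrict("TODAY"));        // false
        System.out.println(fieldWidened("TODAY"));       // true
        System.out.println(valueAlternatives("abc"));    // [string]
        System.out.println(valueAlternatives("TODAY"));  // [string, date]
    }
}
```

The last line shows what the widened grammar leaves behind: in the value position both `string` and `date` accept TODAY, so the parser still has to pick one reading of field=TODAY.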