[antlr-interest] "Context Sensitive" Tokens

Mihai Danila viridium at gmail.com
Thu Dec 18 06:20:51 PST 2008


Hi Gavin,
In fact, this is what I ended up doing. I did create an additional umbrella
rule for all these tokens, so that tokens like TODAY and NOW are listed in
one place instead of being repeated in every rule that accepts them:

alphanumericToken: ALPHA | DIGIT | TODAY | NOW | ... ;
field: ('_' | alphanumericToken)+;
string: alphanumericToken+;
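
To make the effect of the umbrella rule concrete, here is a minimal sketch in
plain Python rather than ANTLR. It is not generated by or equivalent to an
ANTLR parser; the helper names (tokenize, is_field, is_string) are hypothetical
and the "..." in alphanumericToken is left out, so only the four tokens named
above are modelled. It only shows that once TODAY and NOW are their own token
types, listing them in the umbrella set lets them serve as field names and
strings again.

```python
import re

# Keyword tokens from the thread; the lexer emits them as their own types.
KEYWORDS = {"TODAY", "NOW"}

# Mirrors: alphanumericToken: ALPHA | DIGIT | TODAY | NOW | ... ;
# (the elided alternatives are not modelled here)
ALPHANUMERIC_TOKEN = {"ALPHA", "DIGIT", "TODAY", "NOW"}

# Keywords are tried first, then single letters, digits and punctuation.
TOKEN_RE = re.compile(r"TODAY|NOW|[A-Za-z]|[0-9]|_|=|-")


def tokenize(text):
    """Split text into (token_type, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        lexeme = match.group()
        if lexeme in KEYWORDS:
            kind = lexeme
        elif lexeme.isalpha():
            kind = "ALPHA"
        elif lexeme.isdigit():
            kind = "DIGIT"
        else:
            kind = lexeme  # '_', '=' and '-' are their own types
        tokens.append((kind, lexeme))
        pos = match.end()
    return tokens


def is_field(tokens):
    """Mirrors: field: ('_' | alphanumericToken)+;"""
    return bool(tokens) and all(
        kind == "_" or kind in ALPHANUMERIC_TOKEN for kind, _ in tokens
    )


def is_string(tokens):
    """Mirrors: string: alphanumericToken+;"""
    return bool(tokens) and all(kind in ALPHANUMERIC_TOKEN for kind, _ in tokens)


print(is_field(tokenize("TODAY")))   # True: the keyword is accepted as a field name
print(is_string(tokenize("a_b")))    # False: '_' is not part of alphanumericToken
```

If TODAY and NOW were removed from ALPHANUMERIC_TOKEN, is_field(tokenize("TODAY"))
would become False, which is the original problem in miniature.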

I understand your observation about what the rule can match, but the rules
as written no longer read the way one would naturally describe them:
date - matches ISO dates or the string TODAY or the string NOW.
string - matches any alphanumeric string.

Here, there is no mention of the string TODAY, the string NOW, or AND, NOT,
and OR (which also happen to be tokens in the actual grammar) in the context
of a string rule. A reader of the formal grammar will have to stop and think
about why the string rule includes TODAY and NOW; are these special strings?

As another disadvantage, note that this approach forces stricter
management of the tokens. One can no longer add new tokens such as
'CURRENT_TIME' in the body of other rules without extending
alphanumericToken accordingly; I don't see this as a problem for my current
grammar, as I expect little maintenance, but in general the method adds
maintenance overhead.
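
One way to keep that overhead in check would be a small consistency test
that fails whenever a token is added without a decision about the umbrella
rule. The following Python sketch assumes the token names are available as
plain lists; missing_from_umbrella is a hypothetical helper, not part of
ANTLR, and the token lists are written out by hand for illustration.

```python
def missing_from_umbrella(lexer_tokens, umbrella, excluded=frozenset()):
    """Return token names that are neither in the umbrella rule nor
    explicitly excluded from it (for example punctuation tokens)."""
    return sorted(set(lexer_tokens) - set(umbrella) - set(excluded))


# Token names as they might be listed after adding CURRENT_TIME to the lexer.
lexer_tokens = ["ALPHA", "DIGIT", "TODAY", "NOW", "CURRENT_TIME"]

# Alternatives currently listed in alphanumericToken.
umbrella = ["ALPHA", "DIGIT", "TODAY", "NOW"]

# CURRENT_TIME was added to the lexer but not to the umbrella rule.
print(missing_from_umbrella(lexer_tokens, umbrella))  # ['CURRENT_TIME']
```

The check does not remove the extra step; it only makes forgetting it visible.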


Mihai


On Thu, Dec 18, 2008 at 2:44 AM, Gavin Lambert <antlr at mirality.co.nz> wrote:

> At 14:20 18/12/2008, Mihai Danila wrote:
>
>> The problem with this grammar is that TODAY and NOW become their own
>> tokens and can't be used as string literals or as field names. These work:
>> field=TODAY, field=NOW (as dates), but these don't: TODAY=string (TODAY is
>> a valid field name) and field=TODAY read as a string (TODAY is a valid
>> string).
>>
>> The nasty solution is to extend the field and string rules to match these
>> tokens:
>>
>> query:    field '=' value;
>> field:    (DIGIT | ALPHA | '_')+ | TODAY | NOW;
>> value:    string | date;
>> date:     isoDate | TODAY | NOW;
>> string:   (DIGIT | ALPHA)+ | TODAY | NOW;
>> isoDate:  DIGIT DIGIT '-' DIGIT DIGIT '-' DIGIT DIGIT;
>> DIGIT:    '0'..'9';
>> ALPHA:    'a'..'z' | 'A'..'Z';
>> TODAY:    'TODAY';
>> NOW:      'NOW';
>>
>> But these are nasty and I'd rather not use them. Fragments didn't seem to
>> work for me. What is the standard solution to this problem, if any?
>>
>
> My standard solution is to do exactly that (although normally I would try
> to consolidate DIGIT and ALPHA into single multi-digit and alphanumeric
> tokens).  If, in the context of a "field", you can match either a DIGIT or
> an ALPHA or a TODAY then that's what the rule should express.  (If you like,
> when you match a TODAY you can convert it to a different token type [eg.
> multiple ALPHAs] when constructing an AST.  If you *are* constructing an
> AST, of course.)
>
>