[antlr-interest] IDENTifier rule not working for some tokens

Wed Oct 29 17:34:15 PDT 2008

On Thu, Oct 23, 2008 at 8:32 AM, Gavin Lambert <antlr at mirality.co.nz> wrote:
> At 11:27 23/10/2008, brainstorm wrote:
>>> options
>>> {
>>>         output = AST;
>>> //      backtrack = true;
>>>
>>> Don't use this unless there is no other readable way.
>>
>>What do you mean by that? By the way, it looks like it's the
>>preferred way for ANTLR if output is not defined:
>
> I think he was referring to the backtrack option.  While it can sometimes be
> useful, it can significantly slow performance of the parser, so it's better
> to avoid it if possible.
>
>>In fact, I hit a problem when defining those tokens:
>>
>>tokens {
>>(... other tokens defined...)
>>INT = 'INT';
>>}
>>
>>If I just declare "INT" (only LHS), ANTLR complains:
>>
>>warning(105): CL.g:120:14: no lexer rule corresponding to token:
>>INT
>>
>>I have to keep writing redundant statements like INT = 'INT'; why
>>is that?
>
> Using INT by itself defines what's called an "imaginary token" -- one that
> cannot match any input by itself, but can be emitted from either the lexer
> or parser via explicit code.
>
> Using INT='INT' defines a real token that matches that literal text in the
> input -- it's exactly identical to defining the following rule at the top of
> your grammar:
>
> INT: 'INT';
>
> So it's neither redundant nor a duplication -- one is defining the name of the
> token while the other is defining the text that it matches.

I understand the concept, but I would prefer the following convention:

1) Declaring a token without defining it makes it a real token that
matches its own name *by default* (INT would match 'INT').
2) If you want to override behaviour 1 (the matched text differs from
the token name), you actually *define* the token, for instance:
ASSIG = ':=';

I think this could lead to much cleaner grammars, getting rid of
fragments and other hacks. I favour the "convention over
configuration"[1] ideal applied to code: the less verbose, the
simpler, the better.

[1] http://en.wikipedia.org/wiki/Convention_over_configuration
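
To make the proposal concrete, here is a rough sketch. The first
tokens block is what ANTLR 3 needs today; the second is the convention
I am suggesting, which is NOT how ANTLR currently behaves (today the
bare names would be imaginary tokens). BEGIN and END are made-up
keywords for illustration only.

```antlr
// Today: every keyword repeats its own name as the literal.
tokens {
    BEGIN = 'BEGIN';
    END   = 'END';
    INT   = 'INT';
    ASSIG = ':=';
}

// Proposed (hypothetical behaviour): a bare name would match its own
// text, and only tokens whose text differs would need a definition.
tokens {
    BEGIN;
    END;
    INT;
    ASSIG = ':=';
}
```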

So I think I'm sticking with the warnings for now :_/ Is there any
way of silencing them?

By the way, I've been trying the NoCaseFileStream, and cannot get it
working right for jp0:

PROGRAM

yields:

line 1:0 no viable alternative at character 'P'
line 1:2 no viable alternative at character 'O'
line 1:2 no viable alternative at character 'O'
line 1:3 no viable alternative at character 'G'
line 1:5 no viable alternative at character 'A'
line 1:5 no viable alternative at character 'A'
line 1:6 no viable alternative at character 'M'

Does anyone know why this is happening? Also, having to use a special
FileStream means there is no practical way to use ANTLRWorks'
excellent debugger, which means going back to the hard, dark days of
PCCTS and C++. Is there any easy workaround for more visual debugging
that does not involve setting up Eclipse or a similar external IDE?
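
The only grammar-only fallback I know of is the per-letter fragment
trick, which is exactly the kind of hack I would like to avoid, but it
needs no custom stream and so should still run in the ANTLRWorks
debugger. A minimal sketch, assuming a PROGRAM keyword rule (the rule
and fragment names here are just for illustration):

```antlr
// Case-insensitive keyword built from one fragment per letter.
PROGRAM : P R O G R A M ;

// Each fragment accepts either case of a single letter.
fragment P : 'p' | 'P' ;
fragment R : 'r' | 'R' ;
fragment O : 'o' | 'O' ;
fragment G : 'g' | 'G' ;
fragment A : 'a' | 'A' ;
fragment M : 'm' | 'M' ;
```

With this, PROGRAM, program and Program should all lex as the same
token, at the cost of one fragment per letter used in the keywords.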

I don't want to be the one to restart the case insensitivity & i18n
thread, but I definitely advocate a simple ASCII case-insensitive
FileStream *by default*, which could be extended for harder cases
such as non-ASCII languages. Or perhaps a caseInsensitive option?
Please, Terence, please, bring it back ;P

Thanks in advance!
Roman

>
> If you did want to create an imaginary token for use in the lexer, there is
> however one somewhat annoying quirk where it also generates the warning you
> mentioned above.  You can either choose to ignore this warning (which is why
> it's a warning, not an error), or remove it from the tokens section and
> declare it as a rule like this instead:
>
> fragment INT: '0';
>
> The important points here are that it should be a fragment rule (since you
> don't want ANTLR to try to generate it itself, you just want to create a
> token id that you can refer to from other rules), and unless you're actually
> using it within the matching side of another lexer rule then its actual
> contents don't really matter (but they can't be empty or you'll get another
> warning).
>
>
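
My reading of this workaround, as a sketch. The NUMBER rule and its
digit pattern are my own illustration and not something from Gavin's
mail, and I am assuming the Java target. The point is only that INT
gets a token type without ever being matched directly, and that
another lexer rule assigns that type in an action:

```antlr
// INT is never matched on its own; the '0' body is a placeholder
// so that the rule is not empty.
fragment INT : '0' ;

// A real lexer rule that re-labels whatever it matched as INT.
NUMBER : ('0'..'9')+ { $type = INT; } ;
```

This is contrived (you would normally just name the rule INT), but it
shows the shape: a fragment supplies the token id, and some other rule
decides when to emit it.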