[antlr-interest] Lexical rules calling lexical rules

Mon Jul 23 11:34:33 PDT 2007

On 7/24/07, mitchellch <mitchellch at comcast.net> wrote:
>
> Why does the following generate
> (MismatchedTokenException(0!=0)) when I reference either
> C1ID or C2ID?
>
> fragment ID :
> ('a'..'z'|'A'..'Z'|'_')('a'..'z'|'A'..'Z'|'_'|'0'..'9')* ;
>
> C1ID        :     ID;
>
> C2ID        :     ID;
>
> For now I want C1 and C2 IDs to be generic IDs, but I eventually plan to
> evolve them to be more specific.
>
Lexing is done independently of parsing, so you can't have multiple
lexer rules that match the same characters and expect parser context
to decide which one applies. In the above grammar ANTLR will match
C1ID for every occurrence of ID (C1ID comes first in the grammar), so
any reference to C2ID results in a mismatched token.
Assuming there will be some overlap between the final C1ID and C2ID
rules (if there isn't, specialise them now), you are probably best off
splitting this into three lexer rules: one for C1ID-only IDs, one for
C2ID-only IDs and one for the IDs common to both. Then use parser
rules to combine the common and specific tokens.
For instance, if C1IDs could include digits while C2IDs couldn't
(letter-only tokens being valid as either a C1ID or a C2ID):
c1id: ID|C1ID;
c2id: ID;
ID: 'a'..'z'+;
C1ID: 'a'..'z' ('a'..'z'|'0'..'9')+;

Then use c1id and c2id instead of C1ID and C2ID.
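
To make the reasoning concrete, here is a rough Python sketch of a lexer
that, like ANTLR's, never sees parser context. It is only a toy model and
not ANTLR's actual algorithm: the tokenize helper, the regex translations
of the rules, and the "longest match, first rule wins ties" policy are all
assumptions made for illustration.

```python
import re

# Toy model (an assumption, not ANTLR's real algorithm): the lexer sees only
# characters, tries every rule, keeps the longest match, and breaks ties in
# favour of the rule listed first. Parser context is never consulted.
def tokenize(rules, text):
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # skip whitespace between identifiers
            continue
        best = None  # (rule name, end offset) of the best match so far
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # A strictly longer match replaces the current one; an equally
            # long match does not, so the earlier rule wins ties.
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens

# Original grammar: C1ID and C2ID match exactly the same characters.
ID_CHARS = r"[A-Za-z_][A-Za-z_0-9]*"
original = [("C1ID", ID_CHARS), ("C2ID", ID_CHARS)]

# Split grammar from this reply: a shared ID token plus a C1ID-only token.
split = [("ID", r"[a-z]+"), ("C1ID", r"[a-z][a-z0-9]+")]

# Parser-level combination, mirroring  c1id: ID|C1ID;  and  c2id: ID;
C1ID_TOKENS = {"ID", "C1ID"}
C2ID_TOKENS = {"ID"}

print(tokenize(original, "foo bar"))  # [('C1ID', 'foo'), ('C1ID', 'bar')]
print(tokenize(split, "abc abc1"))    # [('ID', 'abc'), ('C1ID', 'abc1')]
```

With the original rules the model never emits C2ID, which is why a parser
rule that expects C2ID cannot match. With the split rules, "abc" comes out
as ID (accepted by both c1id and c2id) and "abc1" as C1ID (accepted only
by c1id).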

Tom.
>
> Thanks.
>
> -Mitch