[antlr-interest] Matching keywords in the lexer
Micheal J
open.zone at virgin.net
Fri Aug 11 00:38:16 PDT 2006
Hi,
> I defined this in my lexer to distinguish between the
> keyword "include" with other identifiers.
>
> protected
> ID options{testLiterals=true;/*I have other keywords
> defined in the parser*/}
> :
> ('_'|'a'..'z'|'A'..'Z')('_'|'a'..'z'|'A'..'Z'|'0'..'9')*
> ;
> protected
> INCLUDE : ("include" STRING SEMI) {/*I will add the
> code to deal w/ this later*/}
> ;
> ID_OR_INCLUDE
> : (ID)=>ID {$setType(ID);}
> | INCLUDE
> ;
>
> Is this correct? If so, it doesn't seem to work. What am I
> doing wrong?
>
> Any help would be greatly appreciated. Thanks.
That is one option. Using the tokens {...} construct is another option. See
the KCSParse/csharp_v1 C# grammar sample for an example of how keywords can
be defined using the tokens {...} construct. That sample actually shows both
options as it defines non-keyword (but nevertheless significant) literals
using the technique above.
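As a rough illustration of the tokens {...} option, here is a minimal sketch in ANTLR 2 syntax. It is not taken from KCSParse; the class name SketchLexer and the token name INCLUDE are assumptions made for the example.

```antlr
class SketchLexer extends Lexer;

// Literals declared here are entered into the lexer's literals table.
tokens {
    INCLUDE = "include";
}

// With testLiterals=true the matched text is looked up in the literals
// table, so the text "include" is reported as INCLUDE rather than ID.
ID options { testLiterals = true; }
    : ('_'|'a'..'z'|'A'..'Z') ('_'|'a'..'z'|'A'..'Z'|'0'..'9')*
    ;
```

Note that ID is not protected in this sketch, so the lexer can return it as a token in its own right.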
Regardless of which option you take, I'd advise losing the ID_OR_INCLUDE
lexer rule. Let the parser decide whether "include" is a keyword or an ID
in a given context. For instance, the parser in KCSParse includes these rules:
nonKeywordLiterals
    :   "add"
        ....
        ....
    |   "type"
    ;

identifier
    :   IDENTIFIER
    |   n:nonKeywordLiterals { #n.setType(IDENTIFIER); }
    ;
You can see how the 'identifier' rule decides which tokens represent valid
identifiers.
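Applied to the grammar in the question, that advice might look roughly like this. It is a sketch in ANTLR 2 syntax, not taken from KCSParse: the class names and the includeDirective rule are hypothetical, STRING and SEMI are assumed to be ordinary (non-protected) lexer rules defined elsewhere, and both classes are assumed to live in one grammar file so that they share a token vocabulary.

```antlr
class SketchParser extends Parser;

// The quoted literal is entered into the literals table that the lexer
// consults, so "include" is a keyword only because the parser names it.
includeDirective
    : "include" STRING SEMI
    ;

class SketchLexer extends Lexer;

// No longer protected, and no ID_OR_INCLUDE wrapper: ID is returned
// directly, with its type switched when the text matches a literal.
ID options { testLiterals = true; }
    : ('_'|'a'..'z'|'A'..'Z') ('_'|'a'..'z'|'A'..'Z'|'0'..'9')*
    ;

// STRING and SEMI rules are omitted here (assumed defined elsewhere).
```

The lexer then only classifies text, and the decision about where an include directive is legal sits in the parser rule.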
Micheal