[antlr-interest] Matching keywords in the lexer

Micheal J open.zone at virgin.net
Fri Aug 11 00:38:16 PDT 2006


Hi,

>  I defined this in my lexer to distinguish between the 
> keyword "include" with other identifiers.
> 
>      protected
>      ID options{testLiterals=true;/*I have other keywords defined in the parser*/}
>               : ('_'|'a'..'z'|'A'..'Z')('_'|'a'..'z'|'A'..'Z'|'0'..'9')*
>               ;
>      protected
>      INCLUDE  : ("include" STRING SEMI) {/*I will add the code to deal w/ this later*/}
>               ;
>      ID_OR_INCLUDE
>               : (ID)=>ID  {$setType(ID);}
>               | INCLUDE
>               ;
> 
>  Is this correct? If so, it doesn't seem to work. What am I 
> doing wrong?
> 
>  Any help would be greatly appreciated. Thanks.

That is one option. Another is the tokens {...} construct. See the
KCSParse/csharp_v1 C# grammar sample for an example of keywords defined that
way. That sample actually shows both options, since it defines its
non-keyword (but nevertheless significant) literals using the technique
above.
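
As a rough sketch of the tokens {...} route (this is not copied from
KCSParse; the name IncludeLexer is made up, and I'm assuming ANTLR 2.x syntax
with ID as an ordinary, non-protected rule), the lexer might look something
like this:

class IncludeLexer extends Lexer;
options {
	testLiterals = false;	// only rules that ask for it check the literals table
}
tokens {
	INCLUDE = "include";	// enters "include" in the literals table as type INCLUDE
}

// Not protected, so the lexer can return it as a token in its own right.
ID	options { testLiterals = true; }
	:	('_'|'a'..'z'|'A'..'Z') ('_'|'a'..'z'|'A'..'Z'|'0'..'9')*
	;

With that in place, the text "include" should come out of the ID rule typed
as INCLUDE, and any other name as a plain ID.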

Regardless of which option you take, I'd advise losing the ID_OR_INCLUDE
lexer rule. Let the parser decide where "include" is a keyword and where it
is an ID. For instance, the parser in KCSParse includes these rules:

nonKeywordLiterals
	:	"add"
	....
	....
	|	"type"
	;
	
identifier
	:	IDENTIFIER
	|	n:nonKeywordLiterals { #n.setType(IDENTIFIER); }
	;

You can see how the 'identifier' rule decides which tokens represent valid
identifiers.
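
Applied to the original question, a minimal sketch might look like the
following. It assumes STRING and SEMI are ordinary lexer rules, that ID is no
longer protected and still has testLiterals=true, and that buildAST is turned
on (the #n syntax needs it). IncludeParser and includeDirective are names
made up for illustration:

class IncludeParser extends Parser;
options {
	buildAST = true;
}

// "include" used as a keyword. Assumption: lexer and parser share a
// vocabulary, so this string literal lands in the lexer's literals table.
includeDirective
	:	"include" STRING SEMI
	;

// Only needed if "include" should also be legal where an identifier is expected.
identifier
	:	ID
	|	n:"include" { #n.setType(ID); }
	;

The lexer then only has to match ID, STRING and SEMI. The decision about
what "include" means in a given position stays in the parser.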


Micheal


