[antlr-interest] Lexer not pulling in fragments?

Jim Idle jimi at temporal-wave.com
Thu Apr 2 07:27:27 PDT 2009


Joseph Klumpp wrote:
> I'm trying to create tokens for the guards of C header files (with
> filter=true), e.g. '#define __hello_h_' => <GUARD, #define
> __hello_h_>, and have the following rules defined:
>
> GUARD	:	'#' LETTER+ WS+ IDPART '_';
> ID	:	IDPART;
>
> WS	: 	(' ' | '\n')+	{$channel = HIDDEN;};
>
> fragment
> IDPART	:	LETTER ( LETTER | DIGIT )*;
>
> fragment
> LETTER
> 	:	'$'
> 	|	'\u0041'..'\u005a'
> 	|	'\u0061'..'\u007a'
> 	|	'_'
> 	;
> 	
> fragment
> DIGIT	: 	'0'..'9';
>
> Using these rules GUARD will never appear in the token stream. If I
> change it to:
> GUARD	:	'#' LETTER+ WS+ LETTER (LETTER | DIGIT)* '_';
> the rule lexes correctly. I have two questions:
> 1. Why does it not lex correctly when I lex with IDPART?
>   
You have WS+, but the WS rule is already a + loop, so you just need WS. This is 
probably screwing with the analysis in some way. You should be getting a 
warning about this though, are you not?
> 2. Is there a way to set the value of token GUARD to be just the
> IDPART portion of the lexem?
>   

GUARD	:	'#' LETTER+ WS idp=IDPART '_'
		{ $text = $idp.text; } // Should work
	;
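
If the $text assignment is not accepted by the ANTLR 3 version in use, one
alternative is to call the lexer's setText() method directly in the action.
This is a minimal, untested sketch that assumes the Java target; only the
action differs from the rule above:

```
GUARD	:	'#' LETTER+ WS idp=IDPART '_'
		// Assumption: Java target, where the generated lexer inherits setText(String).
		// Replaces the token's text with just the IDPART match.
		{ setText($idp.text); }
	;
```

Either way, the token type stays GUARD and only its text changes, so a parser
rule that matches GUARD sees just the identifier part.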




