[antlr-interest] Lexer not pulling in fragments?
Jim Idle
jimi at temporal-wave.com
Thu Apr 2 07:27:27 PDT 2009
Joseph Klumpp wrote:
> I'm trying to create tokens for the guards of C header files (with
> filter=true), e.g. '#define __hello_h_' => <GUARD, #define
> __hello_h_>, and have the following rules defined:
>
> GUARD : '#' LETTER+ WS+ IDPART '_';
> ID : IDPART;
>
> WS : (' ' | '\n')+ {$channel = HIDDEN;};
>
> fragment
> IDPART : LETTER ( LETTER | DIGIT )*;
>
> fragment
> LETTER
> : '$'
> | '\u0041'..'\u005a'
> | '\u0061'..'\u007a'
> | '_'
> ;
>
> fragment
> DIGIT : '0'..'9';
>
> Using these rules GUARD will never appear in the token stream. If I
> change it to:
> GUARD : '#' LETTER+ WS+ LETTER (LETTER | DIGIT)* '_';
> the rule lexes correctly. I have two questions:
> 1. Why does it not lex correctly when I lex with IDPART?
>
You have WS+, but the WS rule already ends in +, so you only need WS. This
is probably screwing with the analysis in some way. You should be getting
a warning about this though, are you not?
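As a minimal sketch of that change, assuming the rest of the grammar stays
exactly as posted, the only difference is the single WS reference. Whether
this alone makes GUARD show up in the token stream is not verified here:

```antlr
// WS is defined as (' ' | '\n')+, so one reference already matches a whole run of whitespace
GUARD : '#' LETTER+ WS IDPART '_';
```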
> 2. Is there a way to set the value of token GUARD to be just the
> IDPART portion of the lexeme?
>
GUARD : '#' LETTER+ WS idp=IDPART '_'
{ $text = $idp.text; } // Should work
;
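If your ANTLR version or target does not accept the assignment to $text,
here is a sketch of the same idea using the lexer runtime's setText() call.
It assumes ANTLR 3 with the Java target, so treat it as something to try
rather than a confirmed fix:

```antlr
GUARD : '#' LETTER+ WS idp=IDPART '_'
        { setText($idp.text); } // assumption: Java target; replaces the token text with the IDPART match
      ;
```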