[antlr-interest] C-style includes: problem with parser vs. lexer rules

Mon Aug 27 05:32:04 PDT 2007

Bjoern Doebel wrote:
> Hi,
> 
> I want to parse C-style #include statements and got a working version like
> this:
> 
> fragment DIGIT  : '0'..'9';
> fragment CHAR : 'a'..'z' | 'A'..'Z';
> 
> IMPORT : '#include' ;
> GT : '>' ;
> LT : '<' ;
> WORD : CHAR (CHAR|DIGIT|'_'|'-')*;
> WS     : (' '|'\t'|'\n'|'\r')+ { self.skip(); } ;
> 
> filename : WORD ('/' WORD)* '.' WORD ;
> 
> import_r : IMPORT LT filename GT ;
> 
> 
> This works, but now I'd like to transfer the filename rule into a lexer
> rule, so I get only one single token from it. Therefore, I change the last
> two rules:
> 
> FNAME : WORD ('/' WORD)* '.' WORD ;
> 
> import_r : IMPORT LT FNAME GT;
> 
> But when I run it with e.g., "#include <foo/bar/baz.h>", I get an error:
> line 1:8 mismatched input 'foo/baz/bar.h' expecting FNAME
> 
> What am I doing wrong and why does the lexer not recognize the filename as
> FNAME?
> 
> Regards,
> Bjoern
> 

My guess is that FNAME should stay a parser rule rather than become a lexer
rule. Alternatively, WORD has to be changed into a fragment rule.
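
A minimal, untested sketch of the fragment variant, reusing the rules from
the quoted grammar. It assumes that nothing else in the grammar needs WORD
as a token of its own, because a fragment rule can only be called from other
lexer rules and never reaches the parser:

```
fragment DIGIT : '0'..'9';
fragment CHAR  : 'a'..'z' | 'A'..'Z';

// WORD is now a fragment: a helper for FNAME, not a token type.
fragment WORD  : CHAR (CHAR|DIGIT|'_'|'-')*;

IMPORT : '#include' ;
GT     : '>' ;
LT     : '<' ;

// The whole path is matched as one token.
FNAME  : WORD ('/' WORD)* '.' WORD ;

WS     : (' '|'\t'|'\n'|'\r')+ { self.skip(); } ;

import_r : IMPORT LT FNAME GT ;
```

If plain words are needed as tokens elsewhere, this variant will not work
as is, and keeping filename as a parser rule is probably the simpler choice.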

Best regards,
Johannes Luber