[antlr-interest] syntactic predicates in the lexer

Matt Barringer mbarringer at suse.de
Sat Aug 11 17:34:22 PDT 2007


Hi,

I'm trying to parse some strange syntax that looks like this:

# Comment
#Comment
#include <file>
include <file>
# include (this is a valid comment)

Lines 1, 2, and 5 should be COMMENT tokens (they need to remain on the
main token stream with all the others), and lines 3 and 4 should be
INCLUDE tokens.
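
To make the intended classification concrete, here is a rough sketch in Python rather than ANTLR. It is not a lexer, only a statement of the behavior I'm after. The classify() helper, the regular expression, and the OTHER label are made up for illustration. The rule (an optional '#' immediately followed by "include" and then a space or '<') is inferred from the five sample lines above and may not cover every case in the real input.

```python
import re

# Hypothetical rule, inferred from the sample lines: an optional '#'
# directly followed by "include" and then a space or '<' is an include.
INCLUDE_RE = re.compile(r"#?include[ <]")


def classify(line):
    """Return 'INCLUDE', 'COMMENT', or 'OTHER' for one input line."""
    if INCLUDE_RE.match(line):
        return "INCLUDE"
    if line.startswith("#"):
        # Any other line starting with '#' is a comment,
        # including "# include ..." with a space after the '#'.
        return "COMMENT"
    # Placeholder for everything the rest of the lexer would handle.
    return "OTHER"


sample = [
    "# Comment",
    "#Comment",
    "#include <file>",
    "include <file>",
    "# include (this is a valid comment)",
]

for line in sample:
    print(classify(line), "<-", line)
```

The only thing separating '#include' from '# include' is the character right after the '#', which is why I expected a short lookahead in the lexer to be enough.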

With ANTLR2, I used a predicate like this, which worked fine:

COMMENT_OR_INCLUDE
	: ( '#' "include" (' '|'<') )=> INCLUDE
		{ $setType(INCLUDE); }
	| ( COMMENT { $setType(COMMENT); } )
	;

Trying that predicate with the C target of ANTLR 3 causes a compiler
error about a missing REWINDFULL() function (or something similar), so I
tried the following instead, without success; only COMMENT tokens are
ever found:

COMMENT_OR_INCLUDE
	: '#' ('include')=>INCLUDE
		{ $type = INCLUDE; }
	| '#' COMMENT
		{ $type=COMMENT; }
	;

fragment
COMMENT
	: (~('\n'|'\r'))* ('\n'|'\r'('\n')?)
	;

Variations on this, such as the following, didn't work either:

COMMENT_OR_INCLUDE
	: '#'
	( INCLUDE
	| COMMENT )
	;

Does the lexer no longer support syntactic predicates?  Is there a better 
way to distinguish '# include' from '#include' in the lexer?

Thanks,
Matt

