[antlr-interest] syntactic predicates in the lexer

Jim Idle jimi at temporal-wave.com
Sat Aug 11 22:02:44 PDT 2007


You need the latest and greatest to get rid of that macro error; however,
the answer to this lies not specifically with the C target but with
lexing in general. The thing to do is to consider how, or if, you can
write the rules without the predicates (which, although they are 'always
execute', can be discarded in the analysis phase if there appears to be
just one decision).

So, here you have a number of tokens that start with '#' (well, 2 I
guess), and the trick is to key one rule off the '#' and then set the
token type once you discover what comes next:

// Make some fragments to define the tokens and avoid
// the warning about tokens not being defined if they
// are only specified as imaginary in the tokens{} section
//
fragment
COMMENT: (~('\n'|'\r'))*;
fragment
INCLUDE: 'include';

C_OR_I: '#'
		(
			  (('include')=>'include')	{ $type = INCLUDE; }
			| COMMENT			{ $type = COMMENT; }
		)
	;

Should do what you wish (you can obviously modify the COMMENT rule if
you wish it to consume the EOL sequence).
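
As a minimal, untested sketch of that modification, assuming you want the
same EOL handling as in your own COMMENT fragment, the rule could become
something like the following. Making the EOL group optional is an
assumption on top of your version, so that a comment on the last line of
a file with no trailing newline still matches:

fragment
COMMENT: (~('\n'|'\r'))*            // comment text up to the line end
         ('\r' ('\n')? | '\n')?     // optional EOL: \r, \r\n, or \n
       ;

The '#' rule that references COMMENT does not need to change. The
trade-off is that the token text then includes the line terminator, so
anything that prints or compares the comment text has to allow for it.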

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Matt Barringer
> Sent: Saturday, August 11, 2007 5:34 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] syntactic predicates in the lexer
> 
> Hi,
> 
> I'm trying to parse some strange syntax that looks like this:
> 
> # Comment
> #Comment
> #include <file>
> include <file>
> # include (this is a valid comment)
> 
> Where lines 1, 2, and 5 should be COMMENT tokens, as they need to remain
> on the main token stream with all the others, and lines 3 & 4 need to be
> INCLUDE tokens.
> 
> With ANTLR2, I used a predicate like this, which worked fine:
> 
> COMMENT_OR_INCLUDE
> 	:
> ( '#' "include" (' '|'<'))=>INCLUDE
> 	{ $setType(INCLUDE); }
> | ( COMMENT{ $setType(COMMENT); } )
>         ;
> 
> Trying that predicate using the C target of ANTLR 3 causes a compiler
> error about a missing REWINDFULL() function or something, so I tried this
> with no success, as COMMENT tokens are all that are found:
> 
> COMMENT_OR_INCLUDE
> 	: '#' ('include')=>INCLUDE
> 		{ $type = INCLUDE; }
> 	| '#' COMMENT
> 		{ $type=COMMENT; }
> 	;
> 
> fragment
> COMMENT
> 	: (~('\n'|'\r'))* ('\n'|'\r'('\n')?)
> 	;
> 
> Trying variations on this didn't work, either:
> 
> COMMENT_OR_INCLUDE
> 	: '#'
> 	( INCLUDE
> 	| COMMENT )
> 	;
> 
> Does the lexer no longer support syntactic predicates?  Is there a better
> way to distinguish '# include' from '#include' in the lexer?
> 
> Thanks,
> Matt

