[antlr-interest] Lexing nested comments

Jim Idle jimi at temporal-wave.com
Wed Feb 10 11:58:54 PST 2010


Have you listed your BOOL rule before your NESTED rule? That would probably work as is.

However I would do this:

fragment BOOL : ;

NESTED : '/*'
         ( options {greedy=false;} :
             (':bool:*/')=> ':bool:' { $type = BOOL; }
           | ('/*')=> NESTED
           | .
         )*
         '*/'
       ;
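
For readers who want to see the intended behaviour outside ANTLR, here is a minimal hand-written sketch in Java. It is not the code ANTLR generates and not the rule above. It follows the counter idea from the original question instead: '/*:bool:*/' is a BOOL token only outside a comment, and a depth counter handles the nesting. The class name NestedCommentScan, the method names, and the token kinds ID and NUMBER are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a tiny scanner that skips nested
// C-style comments and treats "/*:bool:*/" as a visible BOOL token.
class NestedCommentScan {
    static final String BOOL_MARK = "/*:bool:*/";

    /** Returns the kinds of the visible tokens in src; comments are skipped. */
    static List<String> scan(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (src.startsWith(BOOL_MARK, i)) {
                // Tested before the generic "/*" case so the longer marker wins.
                out.add("BOOL");
                i += BOOL_MARK.length();
            } else if (src.startsWith("/*", i)) {
                i = skipComment(src, i);
            } else if (Character.isLetter(c)) {
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                out.add("ID");
            } else if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                out.add("NUMBER");
            } else {
                // Any other character is a one-character token of its own kind.
                out.add(String.valueOf(c));
                i++;
            }
        }
        return out;
    }

    /** Skips a possibly nested comment starting at pos; returns the index just past it. */
    static int skipComment(String src, int pos) {
        int depth = 0;
        int i = pos;
        while (i < src.length()) {
            if (src.startsWith("/*", i)) {
                depth++;            // entering one more level of comment
                i += 2;
            } else if (src.startsWith("*/", i)) {
                depth--;            // leaving one level
                i += 2;
                if (depth == 0) return i;
            } else {
                i++;
            }
        }
        throw new IllegalArgumentException("unterminated comment at " + pos);
    }

    public static void main(String[] args) {
        System.out.println(scan("x = 3 /* /* this is ignored */ as is this */ ;"));
        System.out.println(scan("x /*:bool:*/ = 1 ;"));
    }
}
```

With this sketch the first input yields ID = NUMBER ; and the second yields ID BOOL = NUMBER ;. A '/*:bool:*/' that appears inside a comment is simply counted as one more nested comment and produces no token.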

I must say, however, that whoever thought that was a good idea really did not think it through. What language are you parsing that someone thought it was clever to do that?

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Michael Siff
> Sent: Wednesday, February 10, 2010 11:35 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Lexing nested comments
> 
> Hi, I am an ANTLR newbie, so I apologize if the answer to this question
> ends up being trivial.
> 
> I am trying to write an ANTLR lexer for a language that ignores nested
> C-style comments. So, something like:
> 
>   x = 3 /* /* this is ignored */ as is this */ ;
> 
> should just produce four non-hidden tokens: ID = NUMBER ;
> 
> I know there are several ways to approach this, including using recursive
> definitions for the comment tokens as in something like:
> 
>  NESTED : '/*' (NESTED | .)* '*/' { $channel = HIDDEN; } ;
> 
> However, the language in question has the need to consider tokens like:
> 
>  /*:bool:*/
> 
> as a way of specifying explicit type information. Currently, what I have
> gets the nested comments correctly, but then ignores the /*:bool:*/ as
> if it is a comment even though I have a separate rule like:
> 
>   BOOL : '/*:bool:*/' ;
> 
> Is there an easy way around this problem?
> 
> Years ago I accomplished something very similar using lex/flex, and
> then, later, in SableCC using explicit lexer states where I used a
> separate token '/*' to mark the beginning of a comment and then to enter
> the "comment" state (and as a side effect bumped up a nested-comment
> counter). Since '/*' is shorter than '/*:bool:*/' it did not prevent the
> BOOL token from being discovered; explicit states were used to indicate
> that the BOOL token should only be scanned if in the "normal" (not the
> "comment") state.
> 
> It seems to me that possibly ANTLR's semantic predicates could be used
> to solve this problem, but whenever I try as in:
> 
>   BOOL : { n == 0 }? '/*:bool:*/' ;
> 
> if n > 0 it just throws an exception rather than ignoring that rule.
> 
> Any light that can be shed on this will be greatly appreciated.
> 
> Thanks in advance,
> 
> - Michael