[antlr-interest] Lexing nested comments

Michael Siff siff.michael at gmail.com
Wed Feb 10 12:24:33 PST 2010


Jim, first of all, thank you very much for the prompt reply. What you
sent seems to do the trick quite nicely. I had tried listing BOOL first,
but, for whatever reason, NESTED still seemed to take precedence.

You are right, the source language in question is not something I would
recommend programming in. What I sent was a simplification of a
pedagogical language I have designed called php-- that is essentially a
subset (and a much less powerful subset at that) of PHP. The idea is to
add explicit types to an implicitly typed language and still let the
explicitly typed version sneak through the original language's parser
(so that the new language remains essentially a subset). This way
students can try out simple php-- programs using command-line php. (The
same idea can be applied to almost any language that has multiline
comments.) Of course, the nested comment ability is not strictly
necessary - but as a pedagogical language it demonstrates to
introductory compilers students some of the lexical-analysis challenges
that compiler writers have historically faced.
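
As an aside, the explicit-state approach described in the original
message quoted below (a "normal" state, a "comment" state, and a
nested-comment counter) can be sketched outside ANTLR as a small
hand-written scanner. This is only a minimal Python illustration, not
the grammar Jim sent: the function name tokenize, the token names, and
the tiny token set (ID, NUMBER, '=', ';') are assumptions made up for
the example.

```python
import re

# Hypothetical token patterns for the simplified language in this thread.
# Alternatives are tried in order, so BOOL is listed before OPEN
# (flex would get the same effect through its longest-match rule).
TOKEN_RE = re.compile(r"""
    (?P<BOOL>/\*:bool:\*/)      # type annotation, recognised only outside comments
  | (?P<OPEN>/\*)               # start of a (possibly nested) comment
  | (?P<NUMBER>\d+)
  | (?P<ID>[A-Za-z_]\w*)
  | (?P<OP>[=;])
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Return (type, lexeme) pairs, skipping whitespace and nested comments."""
    tokens = []
    pos = 0
    depth = 0  # nested-comment counter: 0 means the "normal" state
    while pos < len(text):
        if depth > 0:
            # "comment" state: only '/*' and '*/' change anything here
            if text.startswith("/*", pos):
                depth += 1
                pos += 2
            elif text.startswith("*/", pos):
                depth -= 1
                pos += 2
            else:
                pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind = m.lastgroup
        if kind == "OPEN":
            depth = 1          # enter the "comment" state
        elif kind != "WS":
            tokens.append((kind, m.group()))
        pos = m.end()
    if depth > 0:
        raise ValueError("unterminated comment")
    return tokens
```

Because the BOOL pattern is only tried when the counter is zero, a
/*:bool:*/ that appears inside a comment is treated as just another
nested comment, while the same text in the "normal" state yields a BOOL
token.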

- Michael



On 2/10/10 2:58 PM, Jim Idle wrote:
> Have you listed your BOOL rule before your NESTED rule? That would probably work as is.
> 
> However I would do this:
> 
> fragment BOOL :;
> NESTED : '/*'
>          ( options {greedy=false;} :
>              ('bool*/')=> 'bool' { $type = BOOL; }
>            | ('/*')=> NESTED
>            | .
>          )*
>          '*/'
>        ;
> 
> I must say however that whoever thought that was a good idea really did not think it through. What language are you parsing that someone thought it was clever to do that?
> 
> Jim
> 
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Michael Siff
>> Sent: Wednesday, February 10, 2010 11:35 AM
>> To: antlr-interest at antlr.org
>> Subject: [antlr-interest] Lexing nested comments
>>
>> Hi, I am an ANTLR newbie, so I apologize if the answer to this question
>> ends up being trivial.
>>
>> I am trying to write an ANTLR lexer for a language that ignores nested
>> C-style comments. So, something like:
>>
>>   x = 3 /* /* this is ignored */ as is this */ ;
>>
>> should just produce four non-hidden tokens: ID = NUMBER ;
>>
>> I know there are several ways to approach this including using
>> recursive definitions for the comment tokens as in something like:
>>
>>  NESTED : '/*' (NESTED | .)* '*/' { $channel = HIDDEN; } ;
>>
>> However, the language in question has the need to consider tokens like:
>>
>>  /*:bool:*/
>>
>> as a way of specifying explicit type information. Currently, what I
>> have gets the nested comments correctly, but then ignores the
>> /*:bool:*/ as if it is a comment even though I have a separate rule
>> like:
>>
>>   BOOL : '/*:bool:*/' ;
>>
>> Is there an easy way around this problem?
>>
>> Years ago I accomplished something very similar using lex/flex, and
>> then, later, in SableCC using explicit lexer states where I used a
>> separate token '/*' to mark the beginning of a comment and then to
>> enter the "comment" state (and as a side effect bumped up a
>> nested-comment counter). Since '/*' is shorter than '/*:bool:*/' it
>> did not prevent the BOOL token from being discovered; explicit states
>> were used to indicate that the BOOL token should only be scanned if in
>> the "normal" (not the "comment") state.
>>
>> It seems to me that possibly ANTLR's semantic predicates could be used
>> to solve this problem, but whenever I try as in:
>>
>>   BOOL : { n == 0 }? '/*:bool:*/' ;
>>
>> if n > 0 it just throws an exception rather than ignoring that rule.
>>
>> Any light that can be shed on this will be greatly appreciated.
>>
>> Thanks in advance,
>>
>> - Michael
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address

