[antlr-interest] Lexing nested comments

Michael Siff siff.michael at gmail.com
Wed Feb 10 13:16:21 PST 2010


Thanks again, Jim and Gavin, for your very helpful responses.

Unfortunately, there is one more twist, which I did not explain because
I had not realized it was part of the problem. As mentioned, I do want
nested multiline comments of the form /* ... */. But the competing
non-comment tokens are not really of the form:

  /*:bool:*/

but instead an opening token,

  /*:bool

followed later by a TYPE_END token, which is ':*/'.

The reason is that the language uses comment-like syntax to specify
explicit types, which can be constructed recursively to allow for
arrays and even multidimensional arrays. So a syntactically valid
assignment might be:

  /*:bool[]:*/ $a = $c ;

which should produce the tokens:
  BOOL [ ] TYPE_END ID = ID ;
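
To make the intended token stream concrete, independently of ANTLR,
here is a minimal hand-written scanner sketch in Python. It is only an
illustration of the desired behaviour, not an ANTLR solution. The
function names tokenize and skip_comment are made up for this example,
and it assumes that identifiers are a '$' followed by letters, digits,
or underscores, and that bool is the only base type.

```python
def skip_comment(src, start):
    """Skip a (possibly nested) /* ... */ comment beginning at `start`.

    Returns the index just past the matching closing '*/'.
    """
    depth = 0
    i = start
    while i < len(src):
        if src.startswith("/*", i):
            depth += 1          # entering one more nesting level
            i += 2
        elif src.startswith("*/", i):
            depth -= 1          # leaving one nesting level
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated comment")


def tokenize(src):
    """Return token names for a php-- fragment (hypothetical helper)."""
    tokens = []
    i, n = 0, len(src)
    while i < n:
        if src.startswith("/*:bool", i):
            # Type opener: must be checked before the plain comment case.
            tokens.append("BOOL")
            i += len("/*:bool")
        elif src.startswith(":*/", i):
            tokens.append("TYPE_END")
            i += 3
        elif src.startswith("/*", i):
            i = skip_comment(src, i)   # ordinary comment, emits no token
        elif src[i].isspace():
            i += 1
        elif src[i] == "$":
            # Assumed identifier shape: '$' then letters/digits/underscores.
            j = i + 1
            while j < n and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append("ID")
            i = j
        elif src[i] in "[]=;":
            tokens.append(src[i])
            i += 1
        else:
            raise ValueError("unexpected character %r at %d" % (src[i], i))
    return tokens


print(" ".join(tokenize("/*:bool[]:*/ $a = $c ;")))
```

The key point in the sketch is the ordering: the type opener is tried
before the generic '/*' case, and it ends as soon as 'bool' has been
read instead of scanning ahead to a closing '*/'.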

The problem with the solutions that Jim and Gavin have just proposed is
that they now give me just:

  BOOL ID = ID ;

because the multiline-comment rule matches everything from /*:bool
through the closing */ as a single token. Here is what I am using
(based on what Jim sent):

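  // Empty fragment rule: only declares the BOOL token type so that
  // another rule can assign it via $type.
  fragment BOOL :  ;
  TYPE_END : ':*/' ;
  MULTILINE_COMMENT : '/*'
    ( options {greedy=false;} :
       // Type opener: retype this token as BOOL. The loop nevertheless
       // keeps consuming input until the closing '*/'.
       (':bool')=> ':bool' { $type = BOOL; }
       // Nested comment: recurse into this rule and use the hidden channel.
    | ('/*')=> MULTILINE_COMMENT { $channel = HIDDEN; }
    | .
    )*
    '*/'
    ;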

Any further suggestions for how to properly scan /*:bool[]:*/?

Thanks!

- Michael


On 2/10/10 3:27 PM, Jim Idle wrote:
> 
> 
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
>> bounces at antlr.org] On Behalf Of Michael Siff
>> Sent: Wednesday, February 10, 2010 12:25 PM
>> To: antlr-interest at antlr.org
>> Subject: Re: [antlr-interest] Lexing nested comments
>>
>> Jim, first of all, thank you very much for the prompt reply. What you
>> sent seems to do the trick quite nicely. I had tried listing BOOL
>> first,
>> but, for whatever reason, NESTED still seems to take precedence.
>>
>> You are right, the source language in question is not something I would
>> recommend programming in. What I sent was a simplification of a
>> pedagogical language I have designed called php-- that is essentially a
>> subset (and much less powerful subset at that) of PHP. The idea is to
>> add explicit types to an implicitly typed language and still let the
>> explicitly typed version sneak through the original language's parser
>> (so to have the new language remain essentially a subset). This way
>> students can try out simple php-- programs using command-line php. (The
>> same idea can be applied to most any language that has multiline
>> comments.) Of course, the nested comment ability is not strictly
>> necessary - but as a pedagogical language it demonstrates to
>> introductory compilers students some of the lexical-analysis challenges
>> that compiler writers have historically faced.
> 
> Ah OK - I see the point of it now :-)
> 
> Jim
> 