[antlr-interest] Lexing nested comments
Michael Siff
siff.michael at gmail.com
Wed Feb 10 13:16:21 PST 2010
Thanks again, Jim and Gavin, for your very helpful responses.
Unfortunately, there's one more twist which I did not explain because I
had not realized it was part of the problem. To explain in more detail,
I do want nested multiline comments of the form /* ... */ as
mentioned. But the competing non-comment tokens are not really:
/*:bool:*/
but instead,
/*:bool
and then, later, a TYPE_END token, which is ':*/'.
The reason for this is that the language uses comment-like syntax to specify
explicit types which can actually be recursively constructed to allow
for arrays and even multidimensional arrays. So a syntactically valid
assignment might be:
/*:bool[]:*/ $a = $c ;
which should produce tokens:
BOOL [ ] TYPE_END ID = ID ;
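To make the intended behaviour concrete, here is a minimal hand-written
sketch in Python (not ANTLR, and not a fix for the grammar itself). The
names tokenize, skip_comment and TYPE_NAMES are hypothetical, and the
sketch assumes that bool is the only type name and that '/*:' followed
by a known type name always opens a type annotation rather than a comment.

```python
import re

# Assumption: only "bool" appears in the message; other types would be added here.
TYPE_NAMES = {"bool": "BOOL"}

TYPE_NAME = re.compile(r"[a-z]+")
SIMPLE = re.compile(r"""
    (?P<TYPE_END>:\*/)
  | (?P<ID>\$[A-Za-z_]\w*)
  | (?P<PUNCT>[\[\]=;])
  | (?P<WS>\s+)
""", re.VERBOSE)


def skip_comment(src, i):
    """src[i:] starts with '/*'; return the index just past its matching '*/'."""
    depth = 0
    while i < len(src):
        if src.startswith("/*", i):
            depth += 1          # nested opener
            i += 2
        elif src.startswith("*/", i):
            depth -= 1          # closer for the innermost open comment
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise SyntaxError("unterminated comment")


def tokenize(src):
    tokens = []
    i = 0
    while i < len(src):
        if src.startswith("/*:", i):
            # Type opener: emit only the type token and stop right after the
            # type name, so '[', ']' and ':*/' are scanned as ordinary tokens.
            m = TYPE_NAME.match(src, i + 3)
            if m and m.group() in TYPE_NAMES:
                tokens.append(TYPE_NAMES[m.group()])
                i = m.end()
                continue
        if src.startswith("/*", i):
            i = skip_comment(src, i)    # ordinary (possibly nested) comment
            continue
        m = SIMPLE.match(src, i)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {i}")
        if m.lastgroup == "PUNCT":
            tokens.append(m.group())
        elif m.lastgroup != "WS":
            tokens.append(m.lastgroup)
        i = m.end()
    return tokens


print(" ".join(tokenize("/*:bool[]:*/ $a = $c ;")))
```

The key point is that the type opener consumes only '/*:bool' and then
returns control to the main loop, instead of scanning ahead to a closing '*/'.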
The problem with the solutions that Jim and Gavin have just proposed is
that the lexer now gives me just:
BOOL ID = ID ;
because it matches everything from /*:bool through the closing */ as part
of the multiline-comment rule. Here is what I am using (based on what Jim
sent):
fragment BOOL : ;

TYPE_END : ':*/' ;

MULTILINE_COMMENT
    : '/*'
      ( options {greedy=false;} :
          (':bool')=> ':bool' { $type = BOOL; }
        | ('/*')=> MULTILINE_COMMENT { $channel = HIDDEN; }
        | .
      )*
      '*/'
    ;
Any further suggestions for how to properly scan /*:bool[]:*/?
Thanks!
- Michael
On 2/10/10 3:27 PM, Jim Idle wrote:
>
>
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
>> bounces at antlr.org] On Behalf Of Michael Siff
>> Sent: Wednesday, February 10, 2010 12:25 PM
>> To: antlr-interest at antlr.org
>> Subject: Re: [antlr-interest] Lexing nested comments
>>
>> Jim, first of all, thank you very much for the prompt reply. What you
>> sent seems to do the trick quite nicely. I had tried listing BOOL
>> first,
>> but, for whatever reason, NESTED still seems to take precedence.
>>
>> You are right, the source language in question is not something I would
>> recommend programming in. What I sent was a simplification of a
>> pedagogical language I have designed called php-- that is essentially a
>> subset (and a much less powerful subset at that) of PHP. The idea is to
>> add explicit types to an implicitly typed language and still let the
>> explicitly typed version sneak through the original language's parser
>> (so as to have the new language remain essentially a subset). This way
>> students can try out simple php-- programs using command-line php. (The
>> same idea can be applied to most any language that has multiline
>> comments.) Of course, the nested comment ability is not strictly
>> necessary - but as a pedagogical language it demonstrates to
>> introductory compilers students some of the lexical-analysis challenges
>> that compiler writers have historically faced.
>
> Ah OK - I see the point of it now :-)
>
> Jim