[antlr-interest] Parsing whole-line comments?

Sun Jun 6 15:49:51 PDT 2010

You can, of course, do

COMMENT : '\n' 'C' (~'\n')+ ;

NEWLINE: '\n' ;

(the ordering matters for ANTLR 3's DFA construction), but the approach Brodie suggested is the more common idiom: it is cheaper at run time and does not depend on the quirks of ANTLR's DFA construction.  "Start of line" is a semantic notion, whereas '\n' 'C' specifies syntax.
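To make the purely syntactic reading concrete, here is a rough sketch outside ANTLR, in plain Python with the standard re module. It is not generated lexer code and not anything from the ANTLR runtime; the names TOKEN_SPEC, MASTER and tokenize are made up for illustration, and the ID and WS patterns are assumptions. It treats newline-then-'C' as an ordinary token pattern and prepends one newline to the input (the wrapper idea from Brodie's message quoted below) so that the first line is handled like every other line. The alternation order here plays the role that rule ordering plays in the grammar.

```python
import re

# Patterns are tried left to right, so COMMENT is listed before NEWLINE;
# otherwise "\nC ..." would be split into a NEWLINE and an identifier.
TOKEN_SPEC = [
    ("COMMENT", r"\nC[^\n]+"),   # mirrors: '\n' 'C' (~'\n')+
    ("NEWLINE", r"\n"),          # mirrors: '\n'
    ("ID", r"[A-Za-z]+"),        # assumption: identifiers are letters only
    ("WS", r"[ \t]+"),           # assumption: blanks and tabs are skipped
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return (type, text) pairs; whitespace tokens are dropped."""
    # Prepend a newline so a comment on the very first line also matches.
    source = "\n" + text
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for token in tokenize("C a comment\nx foo\ny C z"):
        print(token)
```

Note that a 'C' in the middle of a line comes out as a plain ID, while any line whose first character is 'C' (including one starting with CALL) is swallowed as a comment. That is what the COMMENT rule above says, so whether it is what you want depends on the language.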

--Loring

----- Original Message ----
> From: Christian Convey <christian.convey at gmail.com>
> To: John B. Brodie <jbb at acm.org>
> Cc: antlr-interest at antlr.org
> Sent: Sun, June 6, 2010 12:09:02 PM
> Subject: Re: [antlr-interest] Parsing whole-line comments?
> 
> Hi John,

Thanks for the ideas.  The "{ $type = ..." approach sounds viable.
But it still seems like a messier solution than I was hoping for when
I decided to take ANTLR for a test drive.

Do you know why ANTLR lacks regular expressions that can match the
beginning-of-line?  It seems to me like it would go a long way to
making line-oriented languages easier to describe.  I can't think of
any good reason for ANTLR to not support this, at least as an option.

- C

On Sun, Jun 6, 2010 at 2:16 PM, John B. Brodie <jbb at acm.org> wrote:
> 
> Greetings!
>
> On Sun, 2010-06-06 at 12:19 -0400, Christian Convey wrote:
>> > Alternatively, you can apply semantic predicate to lexer rules like this:
>> > ------------------------
>> >
>> > C:  { $pos == 0 }?=> 'C' ;
>> >
>> > ------------------------
>> >
>> > It should only match "C" at the beginning of the line, but I found (in
>> > my noob experiences) semantic predicate can be pretty tricky due to
>> > "hoisting out" business and how it affects prediction DFA construction -
>> > I'm sure more experienced hands can tell you better.
>>
>> Thanks.  But I'm actually pretty against intermixing lexical,
>> grammatical, and semantic rules.  At that point (at least in my
>> particular project) I've given up most of the clarity that I was
>> hoping to gain by using ANTLR as opposed to a hand-written recursive
>> descent parser.
>>
>> I think at this point I'm just going to hand-write the parser for my
>> DSL.  Thanks very much for the help.
>>
>
> you might look at the Python lexer in the examples. It seems to me the
> Python lexer would have a similar problem to yours --- identifying white
> space at the beginning of a line --- your case seems a little simpler
> because you seem to care about just the first letter at the beginning of
> the line.
>
> also perhaps realizing that the first character of a line must be
> preceded by a new-line character (except the very first line).
>
> so:
>
> tokens { C; E; }
>
> 
> ......
>
> NEWLINE : ( '\r' | '\n' )+  // for the last line....
>   ( 'C' { $type = C; }
>   | 'E' { $type = E; }
>   //..... other first-char possibilities go here
>   )
>   ;
>
> CALL : 'CALL' ;
> ID : ('a'..'z'|'A'..'Z')+ ; // or whatever
>
> and of course create a wrapper around the input stream in order to
> supply a new-line as the very first character and then the actual input
> text as the rest of the stream. (in effect prepend a new-line to the
> input)
>
> just a thought.....
>   -jbb
>
>
>

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address