[antlr-interest] Parsing whole-line comments?

Christian Convey christian.convey at gmail.com
Sun Jun 6 12:09:02 PDT 2010


Hi John,

Thanks for the ideas.  The "{ $type = ..." approach sounds viable.
But it still seems like a messier solution than I was hoping for when
I decided to take ANTLR for a test drive.

Do you know why ANTLR lacks regular expressions that can match the
beginning-of-line?  It seems to me like it would go a long way to
making line-oriented languages easier to describe.  I can't think of
any good reason for ANTLR to not support this, at least as an option.

- C
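
For comparison, here is the kind of beginning-of-line matching being asked about, as a conventional regex engine provides it. In Python's `re` module with `re.MULTILINE`, `^` matches at the start of the input and just after every newline. The sample input is illustrative, and so is the assumption that a comment line is a `C` followed by a non-word character (so `CALL` is not a comment). Neither reflects the actual DSL.

```python
import re

# Illustrative input: lines beginning with a lone 'C' are whole-line comments.
SOURCE = """C this whole line is a comment
CALL foo
x = 1 C not a comment, this 'C' is mid-line
C another comment
"""

# With re.MULTILINE, '^' matches at every beginning-of-line and '$' before
# every newline. '\b' after 'C' keeps 'CALL' from being taken as a comment
# (an assumption about the DSL, not something stated in the thread).
COMMENT_LINE = re.compile(r"^C\b.*$", re.MULTILINE)

comments = COMMENT_LINE.findall(SOURCE)
print(comments)  # ['C this whole line is a comment', 'C another comment']
```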

On Sun, Jun 6, 2010 at 2:16 PM, John B. Brodie <jbb at acm.org> wrote:
> Greetings!
>
> On Sun, 2010-06-06 at 12:19 -0400, Christian Convey wrote:
>> > Alternatively, you can apply a semantic predicate to a lexer rule like this:
>> > ------------------------
>> >
>> > C:  { $pos == 0 }?=> 'C' ;
>> >
>> > ------------------------
>> >
>> > It should only match "C" at the beginning of the line, but I found (in
>> > my noob experiences) semantic predicates can be pretty tricky due to
>> > "hoisting out" business and how it affects prediction DFA construction -
>> > I'm sure more experienced hands can tell you better.
>>
>> Thanks.  But I'm actually pretty against intermixing lexical,
>> grammatical, and semantic rules.  At that point (at least in my
>> particular project) I've given up most of the clarity that I was
>> hoping to gain by using ANTLR as opposed to a hand-written recursive
>> descent parser.
>>
>> I think at this point I'm just going to hand-write the parser for my
>> DSL.  Thanks very much for the help.
>>
>
> you might look at the Python lexer in the examples. It seems to me the
> Python lexer would have a similar problem to yours --- identifying white
> space at the beginning of a line --- your case seems a little simpler
> because you seem to care about just the first letter at the beginning of
> the line.
>
> also, perhaps use the fact that the first character of a line must be
> preceded by a new-line character (except on the very first line).
>
> so:
>
> tokens { C; E; }
>
> ......
>
> NEWLINE : ( '\r' | '\n' )+
>   ( 'C' { $type = C; }
>   | 'E' { $type = E; }
> //..... other first-char possibilities go here
>   )?  // optional, so a newline before EOF or any other character is still a NEWLINE
>   ;
>
> CALL : 'CALL' ;
> ID : ('a'..'z'|'A'..'Z')+ ; // or whatever
>
> and of course create a wrapper around the input stream in order to
> supply a new-line as the very first character and then the actual input
> text as the rest of the stream (in effect, prepending a new-line to the
> input).
>
> just a thought.....
>   -jbb

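The column-0 predicate quoted above (`C: { $pos == 0 }?=> 'C' ;`) amounts to asking "which column am I in?" before allowing the `C` alternative. Below is a minimal hand-written sketch of that check in Python, not ANTLR's generated code. The token names mirror the grammar fragments in this thread. The whitespace handling and the letters-only identifier rule are assumptions.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (token type, text)


def lex(text: str) -> List[Token]:
    """Hypothetical lexer emulating the gated predicate
    C : { $pos == 0 }?=> 'C' ;  i.e. a lone 'C' is a C token only at column 0."""
    tokens: List[Token] = []
    i = 0
    col = 0  # character position within the current line (like ANTLR's $pos)
    while i < len(text):
        ch = text[i]
        if ch == "\n":
            tokens.append(("NEWLINE", ch))
            i += 1
            col = 0  # the next character starts a new line
        elif ch in " \t\r":
            i += 1
            col += 1  # blanks are skipped but still advance the column
        elif ch.isalpha():
            start = i
            while i < len(text) and text[i].isalpha():
                i += 1
            word = text[start:i]  # longest match, as an ANTLR lexer would take
            if word == "CALL":
                kind = "CALL"
            elif word == "C" and col == 0:
                kind = "C"  # the "predicate": viable only at beginning of line
            else:
                kind = "ID"
            tokens.append((kind, word))
            col += i - start
        else:
            raise ValueError(f"unexpected character {ch!r} at column {col}")
    return tokens
```

For example, `lex("C note\nCALL C\n")` yields a `C` token for the first `C` but an `ID` for the `C` after `CALL`, because that one is not at column 0.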

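The NEWLINE trick plus the leading-newline wrapper can be sketched the same way. A run of newlines absorbs the next character when it is one of the first-character tokens, and a generator supplies one extra newline so the first line is handled like all the others. This is a hypothetical Python emulation of the grammar above, not ANTLR's runtime. `with_leading_newline` and `FIRST_CHAR_TYPES` are illustrative names, and skipping blanks is an assumption (the grammar fragment has no whitespace rule).

```python
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, str]  # (token type, text)

# First-character token types, mirroring the tokens { C; E; } section.
FIRST_CHAR_TYPES = {"C": "C", "E": "E"}


def with_leading_newline(chars: Iterable[str]) -> Iterator[str]:
    """Hypothetical input wrapper: yields a newline first, then the real
    input, so the very first line is also preceded by a newline."""
    yield "\n"
    yield from chars


def lex(chars: Iterable[str]) -> List[Token]:
    text = "".join(chars)  # buffer the stream so the lexer can peek ahead
    tokens: List[Token] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in "\r\n":
            # NEWLINE : ('\r' | '\n')+ ( 'C' {$type = C;} | 'E' {$type = E;} )? ;
            start = i
            while i < len(text) and text[i] in "\r\n":
                i += 1
            kind = "NEWLINE"
            if i < len(text) and text[i] in FIRST_CHAR_TYPES:
                kind = FIRST_CHAR_TYPES[text[i]]  # retype the token, like $type = C
                i += 1  # the first character becomes part of this token
            tokens.append((kind, text[start:i]))
        elif ch.isalpha():
            # roughly CALL : 'CALL' ;  ID : letters+ ;
            start = i
            while i < len(text) and text[i].isalpha():
                i += 1
            word = text[start:i]
            tokens.append(("CALL" if word == "CALL" else "ID", word))
        elif ch in " \t":
            i += 1  # blanks are skipped (assumed)
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens
```

Without the wrapper, `lex("C hello")` starts with `("ID", "C")`. With it, `lex(with_leading_newline("C hello"))` starts with `("C", "\nC")`. The sketch also makes a caveat visible: because the first-character match is greedy, a line beginning with `CALL` becomes a `C` token followed by `ID` `"ALL"`. Keywords that share a first letter with a line marker would therefore need additional handling.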
More information about the antlr-interest mailing list