[antlr-interest] Parsing whole-line comments?

Christian Convey christian.convey at gmail.com
Sun Jun 6 04:14:14 PDT 2010


>> That is, <beginning of line> <the letter C> <zero or more
>> non-end-of-line characters> <end-of-line>
>>
>> My problem is, to my knowledge ANTLR won't let me define tokens that
>> match on the beginning of a line ('^').
>>
>> Any suggestions?
>
>
> There is no need to match such positions: when you match a certain line (a
> token that ends with a line break), the next character will be the first in
> a (new) line.
> Something like this should do the trick:
>
> grammar Test;
> parse
>   : (Comment | Line)+ EOF
>   ;
> Comment
>   :  'C' ~('\r' | '\n')* (NewLine | EOF)
>   ;
> Line
>   :  ~'C' ~('\r' | '\n')* (NewLine | EOF)
>   ;
> fragment
> NewLine
>   :  '\r'? '\n'
>   |  '\r'
>   ;

Thanks, that may work for my particular language, because I may have
no other tokens that begin with a capital letter 'C'.

But let me wax hypothetical for a minute.  Suppose that in other,
non-comment lines, I need to support another token that begins
with a capital C, for example 'CALL'.  So my DSL might have a
program like this:

C My test
E CALL FOO
CALL This is a comment because 'C' is in first column.

Any suggestions for how an ANTLR lexer/grammar should handle this?
My impression is that something like Flex, whose token regexes can
match the beginning-of-line imaginary character, would just let me do
this:

CommentToken ::= ^C.*$
CallToken ::= ~(^)CALL
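
To make the behaviour I am after concrete, here is a minimal Python
sketch (neither ANTLR nor Flex) of a tokenizer whose comment rule is
anchored at the beginning of a line. The token names COMMENT, CALL and
WORD and the helper tokenize() are made up for illustration. It assumes
lines end in '\n', because with re.MULTILINE the '^' anchor only matches
at the start of the input and just after a '\n'.

```python
import re

# Alternatives are tried left to right, so the anchored comment rule
# gets the first chance at every position.
TOKEN_RE = re.compile(
    r"(?P<COMMENT>^C[^\r\n]*)"   # 'C' in column 0, then the rest of the line
    r"|(?P<CALL>CALL\b)"         # keyword CALL anywhere else
    r"|(?P<WORD>\S+)"            # any other run of non-whitespace
    r"|(?P<NEWLINE>\r?\n|\r)"    # line break
    r"|(?P<WS>[^\S\r\n]+)",      # horizontal whitespace
    re.MULTILINE,                # lets '^' match after each '\n'
)


def tokenize(text):
    """Return (kind, lexeme) pairs, dropping whitespace and line breaks."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind in ("WS", "NEWLINE"):
            continue
        tokens.append((kind, match.group()))
    return tokens


if __name__ == "__main__":
    program = (
        "C My test\n"
        "E CALL FOO\n"
        "CALL This is a comment because 'C' is in first column.\n"
    )
    for kind, lexeme in tokenize(program):
        print(kind, repr(lexeme))
```

On the three-line program above, the first and third lines each come
back as a single COMMENT token, while the CALL in the second line comes
back as a CALL token because it does not start in column 0.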

