[antlr-interest] Parsing whole-line comments?
Junkman
j at junkwallah.org
Sun Jun 6 07:39:34 PDT 2010
It's probably better to keep the lexer simple, so that it only converts
the character stream into a token stream, and to push contextual
constraints like "beginning of the line" into the parsing rules, like this:
----------------
/* Tokens */
NEWLINE: '\n' ;
E: 'E';
C: 'C';
CALL: 'CALL';
// default greediness ensures "CALL" is matched as CALL instead of C.
/* Parsing rules */
stmt : E ... NEWLINE
| C ... NEWLINE
| CALL ... NEWLINE
;
----------------
Use stmt as the start rule for the parser. By defining which stmts are
valid, you impose the contextual rules on the tokens.
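
To illustrate the same division of labour outside ANTLR, here is a small
hand-written Python sketch (it is not ANTLR output). The token names mirror
the grammar above. WORD, WS, the \b keyword boundaries, and the function
names tokenize and parse_stmts are assumptions added so the example runs,
and the elided "..." is treated as "any tokens up to NEWLINE".

```python
import re

# Context-free lexer: it knows nothing about line positions.
# Keywords are listed before WORD; \b keeps "CALL" from matching inside a longer word.
# WORD and WS are hypothetical additions, not part of the grammar above.
TOKEN_RE = re.compile(
    r"(?P<NEWLINE>\n)"
    r"|(?P<CALL>CALL\b)"
    r"|(?P<E>E\b)"
    r"|(?P<C>C\b)"
    r"|(?P<WS>[^\S\n]+)"
    r"|(?P<WORD>\S+)"
)


def tokenize(text):
    """Turn the character stream into (type, text) pairs, dropping whitespace."""
    return [
        (m.lastgroup, m.group())
        for m in TOKEN_RE.finditer(text)
        if m.lastgroup != "WS"
    ]


def parse_stmts(tokens):
    """Apply 'stmt : (E | C | CALL) ... NEWLINE' repeatedly.

    The "beginning of the line" constraint lives here, not in the lexer:
    each stmt must start with E, C, or CALL. Returns the leading token
    type of every stmt.
    """
    kinds = []
    i = 0
    while i < len(tokens):
        kind, text = tokens[i]
        if kind not in ("E", "C", "CALL"):
            raise SyntaxError(f"stmt cannot start with {text!r}")
        kinds.append(kind)
        # Assumption: the elided "..." accepts any tokens up to NEWLINE.
        while i < len(tokens) and tokens[i][0] != "NEWLINE":
            i += 1
        if i == len(tokens):
            raise SyntaxError("stmt is missing its terminating NEWLINE")
        i += 1  # consume NEWLINE
    return kinds


if __name__ == "__main__":
    print(parse_stmts(tokenize("C My test\nE CALL FOO\nCALL BAR\n")))
```

In this sketch the CALL inside "E CALL FOO" is lexed exactly like a CALL at
the start of a line. Only the parser distinguishes the two positions.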
Christian Convey wrote:
>>> That is, <beginning of line> <the letter C> <zero or more
>>> non-end-of-line characters> <end-of-line>
>>>
>>> My problem is, to my knowledge ANTLR won't let me define tokens that
>>> match on the beginning of a line ('^').
>>>
>>> Any suggestions?
>>
>> There is no need to match such positions: when you match a certain line (a
>> token that ends with a line break), the next character will be the first in
>> a (new) line.
>> Something like this should do the trick:
>>
>> grammar Test;
>> parse
>> : (Comment | Line)+ EOF
>> ;
>> Comment
>> : 'C' ~('\r' | '\n')* (NewLine | EOF)
>> ;
>> Line
>> : ~'C' ~('\r' | '\n')* (NewLine | EOF)
>> ;
>> fragment
>> NewLine
>> : '\r'? '\n'
>> | '\r'
>> ;
>
> Thanks, that may work for my particular language, because I may have
> no other tokens that begin with a capital letter 'C'.
>
> But let me wax hypothetical for a minute. Suppose that in other,
> non-comment lines, I have need to support another token that begins
> with a capital C. For example, 'CALL'. So my DSL might have a
> program like this:
>
> C My test
> E CALL FOO
> CALL This is a comment because 'C' is in first column.
>
> Any suggestions for how an ANTLR lexer/grammar should handle this?
> My impression is that something like Flex, whose token regex's can
> match the beginning-of-line imaginary character, would just let me do
> this:
>
> CommentToken ::= ^C.*$
> CallToken ::= ~(^)CALL
>