[antlr-interest] Parsing whole-line comments?

Junkman j at junkwallah.org
Sun Jun 6 07:39:34 PDT 2010


It's probably better to keep the lexer simple - just convert the character
stream into a token stream - and push contextual constraints like
"beginning of the line" into the parser rules, like this:

----------------
/* Tokens */
NEWLINE: '\n' ;
E:  'E';
C:  'C';
CALL: 'CALL';
// default greediness ensures "CALL" is matched as CALL instead of C.


/* Parsing rules */
stmt : E ... NEWLINE
     | C ... NEWLINE
     | CALL ... NEWLINE
     ;
----------------

Use stmt as the start rule for the parser. The contextual constraints on
tokens are then imposed by how you define a valid stmt.
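
To make the split concrete outside of ANTLR, here is a rough sketch of the
same idea in Python. It is only an illustration, not generated ANTLR code:
the names lex, parse and TOKEN_RE, and the extra WORD and WS token types,
are made up for this example. The lexer knows nothing about columns, and
the "parser" classifies each line purely by its first token. What each
alternative then means (for example, treating C- and CALL-led lines as
comments) is left open, just like the "..." in the rules above.

```python
import re
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, str]  # (token type, matched text)

# Context-free lexer: no rule knows or cares about column position.
# CALL is listed before C so the longer keyword is tried first.
# WORD and WS are hypothetical extras so arbitrary text can be tokenized.
TOKEN_RE = re.compile(r"""
    (?P<NEWLINE>\r?\n|\r)
  | (?P<CALL>CALL\b)
  | (?P<C>C\b)
  | (?P<E>E\b)
  | (?P<WORD>\S+)
  | (?P<WS>[^\S\r\n]+)
""", re.VERBOSE)


def lex(text: str) -> Iterator[Token]:
    """Turn the character stream into a token stream, dropping whitespace."""
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup != "WS":
            yield (m.lastgroup, m.group())


def parse(tokens: Iterable[Token]) -> List[Tuple[str, List[str]]]:
    """stmt : (E | C | CALL) <rest of line> NEWLINE, applied line by line."""
    stmts: List[Tuple[str, List[str]]] = []
    line: List[Token] = []
    # A synthetic NEWLINE lets the last line end at EOF.
    for tok in list(tokens) + [("NEWLINE", "")]:
        if tok[0] != "NEWLINE":
            line.append(tok)
            continue
        if line:  # blank lines are skipped
            kind, first_text = line[0]
            if kind not in ("E", "C", "CALL"):
                raise SyntaxError(f"unexpected {first_text!r} at start of line")
            stmts.append((kind, [text for _, text in line[1:]]))
        line = []
    return stmts


if __name__ == "__main__":
    program = "C My test\nE CALL FOO\nCALL This is a comment\n"
    for kind, rest in parse(lex(program)):
        print(kind, rest)
```

Run on the three-line program from the quoted message, this reports a
C-led line, an E-led line whose remainder contains a CALL token, and a
CALL-led line. The same CALL token ends up in different roles only because
of where the parser sees it, not because the lexer tracked the column.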

Christian Convey wrote:
>>> That is, <beginning of line> <the letter C> <zero or more
>>> non-end-of-line characters> <end-of-line>
>>>
>>> My problem is, to my knowledge ANTLR won't let me define tokens that
>>> match on the beginning of a line ('^').
>>>
>>> Any suggestions?
>>
>> There is no need to match such positions: when you match a certain line (a
>> token that ends with a line break), the next character will be the first in
>> a (new) line.
>> Something like this should do the trick:
>>
>> grammar Test;
>> parse
>>   : (Comment | Line)+ EOF
>>   ;
>> Comment
>>   :  'C' ~('\r' | '\n')* (NewLine | EOF)
>>   ;
>> Line
>>   :  ~'C' ~('\r' | '\n')* (NewLine | EOF)
>>   ;
>> fragment
>> NewLine
>>   :  '\r'? '\n'
>>   |  '\r'
>>   ;
> 
> Thanks, that may work for my particular language, because I may have
> no other tokens that begin with a capital letter 'C'.
> 
> But let me wax hypothetical for a minute.  Suppose that in other,
> non-comment lines, I have need to support another token that begins
> with a capital C.  For example, 'CALL'.   So my DSL might have a
> program like this:
> 
> C My test
> E CALL FOO
> CALL This is a comment because 'C' is in first column.
> 
> Any suggestions for how an ANTLR lexeme/grammar should handle this?
>  My impression is that something like Flex, whose token regex's can
> match the beginning-of-line imaginary character, would just let me do
> this:
> 
> CommentToken ::= ^C.*$
> CallToken ::= ~(^)CALL
> 