[antlr-interest] Parsing whole-line comments?

Junkman j at junkwallah.org
Sun Jun 6 08:19:29 PDT 2010


Christian Convey wrote:
>> ----------------
>> /* Tokens */
>> NEWLINE: '\n' ;
>> E:  'E';
>> C:  'C';
>> CALL: 'CALL';
>> // default greediness ensures "CALL" is matched as CALL instead of C.
> 
> Thanks, but 'C' can also be the name of a variable, as long as it's
> not in the first column.  So I don't think greediness solves the whole
> problem.
> 

I wonder if this would work better in that case:
---------------------------
/* Tokens */
NEWLINE: '\n' ;

/* Parsing rules */
stmt : 'E' ... NEWLINE
     | 'C' ... NEWLINE
     | 'CALL'  ... NEWLINE
     ;
---------------------------

Not sure, since I don't know how explicitly defined tokens are treated
differently from tokens implicitly defined in parsing rules.

Alternatively, you can apply a gated semantic predicate to a lexer rule,
like this:
------------------------

C:  { $pos == 0 }?=> 'C' ;

------------------------

It should match "C" only at the beginning of a line, but I've found (in
my noob experience) that semantic predicates can be pretty tricky because
of the "hoisting" business and how it affects construction of the
prediction DFA. I'm sure more experienced hands can tell you more.
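
To make the intent of the column check concrete, here is a rough sketch
in plain Python rather than ANTLR. It is not generated lexer code and
says nothing about how ANTLR hoists the predicate; the tokenize function,
the ID token kind and the whitespace-based word splitting are all
assumptions made up for illustration. The only point is that "C" becomes
the C token when it starts in column 0 and an ordinary name otherwise.

```python
import re

# Hypothetical helper: whitespace-delimited words stand in for real lexing.
WORD = re.compile(r"\S+")

def tokenize(source):
    """Return (kind, text) pairs for each word, plus a NEWLINE per line."""
    tokens = []
    for line in source.split("\n"):
        for match in WORD.finditer(line):
            text = match.group()
            pos = match.start()  # column of the token's first character
            if text == "C" and pos == 0:
                kind = "C"       # same idea as the predicate: pos == 0
            elif text == "CALL":
                kind = "CALL"
            else:
                kind = "ID"      # includes "C" used as a variable name
            tokens.append((kind, text))
        # One NEWLINE per input line, including the last one.
        tokens.append(("NEWLINE", "\n"))
    return tokens

if __name__ == "__main__":
    for token in tokenize("C old code\n  C = 1\nCALL foo"):
        print(token)
```

Because whole words are compared here, "CALL" never collides with "C";
in a real ANTLR lexer that part depends on how the rules are predicted,
which this sketch does not model.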

Good luck.

