[antlr-interest] Matching Last Line in ANTLR?

David-Sarah Hopwood david-sarah at jacaranda.org
Tue Aug 18 14:22:06 PDT 2009


consiliens at gmail.com wrote:
> On 09-08-18 12:43 PM, David-Sarah Hopwood wrote:
>> consiliens at gmail.com wrote:
>>> The last line, b., doesn't match the MC_INCORRECT token because there's
>>> no newline after it. Is there an easy way to match this in ANTLR?
>> Yes. I had the same problem when matching the end of a //-style comment,
>> and solved it like this:
>>
>> fragment ENDOFLINE
>>    : NEWLINE
>>    | { input.LA(1) == EOF }?
>>    ;
>>
>> (If this were a non-fragment rule, it would be a problem that it can
>> sometimes match no characters, but since it's a fragment, that's OK.)
> 
> I want to use your solution, however it throws errors about "The 
> following alternatives can never be matched: 1" for MC_QUESTION and 
> MC_INCORRECT. Shouldn't the below work?
> 
> MC_QUESTION  : INT ('.'|')') .* ENDOFLINE;
> MC_INCORRECT : LETTER '.' .* ENDOFLINE;
> MC_CORRECT   : '*' MC_INCORRECT;
> 
> fragment ENDOFLINE : NEWLINE | { input.LA(1) == EOF }?;
> fragment NEWLINE : '\r'? '\n';
> fragment LETTER  : ('a'..'z'|'A'..'Z');
> fragment INT     : '0'..'9'+;

ANTLR is correctly warning that .* will always match to the end of the
input (since . includes '\r' and '\n'), so the NEWLINE will never be
matched. This problem occurs in both the MC_QUESTION and MC_INCORRECT
rules.

Normally you would be able to use "(options { greedy=false; } : .)*" in
place of ".*" to correct a problem like this. However, that doesn't appear
to work in this particular case (in my experience the greedy=false option
can be a bit fragile), so I would suggest:

MC_QUESTION  : INT ('.' | ')') NOTNEWLINE* ENDOFLINE;
MC_INCORRECT : LETTER '.' NOTNEWLINE* ENDOFLINE;

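// ~NEWLINE won't work in this case: ~ negates a set of single characters,
// and NEWLINE can match the two-character sequence '\r' '\n'.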
fragment NOTNEWLINE
  : ~('\r' | '\n')
  | { input.LA(2) != '\n' }? '\r'
  ;

// ENDOFLINE, NEWLINE, LETTER and INT unchanged
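
To make the lookahead logic of NOTNEWLINE* ENDOFLINE concrete, here is a
rough hand-written equivalent in Python. It is only a sketch of the intended
matching behaviour, not what ANTLR generates; the names la,
match_rest_of_line and the EOF stand-in are hypothetical and chosen just for
this illustration.

```python
EOF = ""  # stand-in for ANTLR's EOF sentinel (an assumption of this sketch)

def la(text, pos, k):
    """Return the k-th lookahead character (1-based), or EOF past the end."""
    i = pos + k - 1
    return text[i] if i < len(text) else EOF

def match_rest_of_line(text, pos):
    """Consume NOTNEWLINE* ENDOFLINE starting at pos; return the new position."""
    # NOTNEWLINE*: any character except '\r' and '\n',
    # or a '\r' that is not immediately followed by '\n'.
    while True:
        c = la(text, pos, 1)
        if c == EOF or c == "\n":
            break
        if c == "\r" and la(text, pos, 2) == "\n":
            break
        pos += 1
    # ENDOFLINE: NEWLINE ('\r'? '\n'), or the zero-width alternative at EOF.
    if la(text, pos, 1) == "\r":
        pos += 1  # only reachable when the next character is '\n'
    if la(text, pos, 1) == "\n":
        pos += 1
    return pos
```

Calling match_rest_of_line repeatedly from the returned position walks the
input one line at a time, and a final line with no trailing newline is
still consumed in full.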


Alternatively, you might want to treat a bare '\r' as a newline (the
convention in classic Mac OS). In that case change MC_QUESTION and
MC_INCORRECT as above, but use:

fragment NOTNEWLINE : ~('\r' | '\n');
fragment NEWLINE : '\r' '\n'? | '\n';
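
As a quick way to check what this variant accepts, the same line structure
can be approximated with a Python regular expression. This is only an
analogy under the assumption that a line is NOTNEWLINE* followed by NEWLINE
or end of input; the names LINE and split_lines are hypothetical.

```python
import re

# NOTNEWLINE* followed by NEWLINE ('\r' '\n'? | '\n') or end of input
LINE = re.compile(r"[^\r\n]*(?:\r\n?|\n|\Z)")

def split_lines(text):
    """Return each line with its terminator; the last line may have none."""
    lines, pos = [], 0
    while pos < len(text):
        m = LINE.match(text, pos)
        lines.append(m.group())
        pos = m.end()
    return lines
```

For example, split_lines("1. Q\ra. x\r\nb. y\nc. z") yields four lines,
ending in '\r', '\r\n', '\n' and nothing respectively.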

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com


