[antlr-interest] Matching Last Line in ANTLR?
David-Sarah Hopwood
david-sarah at jacaranda.org
Tue Aug 18 14:22:06 PDT 2009
consiliens at gmail.com wrote:
> On 09-08-18 12:43 PM, David-Sarah Hopwood wrote:
>> consiliens at gmail.com wrote:
>>> The last line, b., doesn't match the MC_INCORRECT token because there's
>>> no newline after it. Is there an easy way to match this in ANTLR?
>> Yes. I had the same problem when matching the end of a //-style comment,
>> and solved it like this:
>>
>> fragment ENDOFLINE
>> : NEWLINE
>> | { input.LA(1) == EOF }?
>> ;
>>
>> (If this were a non-fragment rule, it would be a problem that it can
>> sometimes match no characters, but since it's a fragment, that's OK.)
>
> I want to use your solution, however it throws errors about "The
> following alternatives can never be matched: 1" for MC_QUESTION and
> MC_INCORRECT. Shouldn't the below work?
>
> MC_QUESTION : INT ('.'|')') .* ENDOFLINE;
> MC_INCORRECT : LETTER '.' .* ENDOFLINE;
> MC_CORRECT : '*' MC_INCORRECT;
>
> fragment ENDOFLINE : NEWLINE | { input.LA(1) == EOF }?;
> fragment NEWLINE : '\r'? '\n';
> fragment LETTER : ('a'..'z'|'A'..'Z');
> fragment INT : '0'..'9'+;

ANTLR is correctly warning that .* will always match to the end of the
input (since . includes '\r' and '\n'), so the NEWLINE will never be
matched. This problem occurs in both the MC_QUESTION and MC_INCORRECT
rules.

Normally you would be able to use "(options { greedy=false; } : .)*" in
place of ".*" to correct a problem like this. However, that doesn't appear
to work in this particular case (in my experience the greedy=false option
can be a bit fragile), so I would suggest:

  MC_QUESTION  : INT ('.' | ')') NOTNEWLINE* ENDOFLINE;
  MC_INCORRECT : LETTER '.' NOTNEWLINE* ENDOFLINE;

  // ~NEWLINE won't work in this case.
  fragment NOTNEWLINE
    : ~('\r' | '\n')
    | { input.LA(2) != '\n' }? '\r'
    ;

  // ENDOFLINE, NEWLINE, LETTER and INT unchanged
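
To make the intended behaviour of NOTNEWLINE* ENDOFLINE concrete, here is a
rough Python sketch of the same matching logic written by hand. It is only an
illustration, not what ANTLR generates: the function names match_line and
split_lines are made up for this example. It assumes NEWLINE is '\r'? '\n',
so a lone '\r' is treated as an ordinary character, and that reaching the end
of the input ends the line just as a newline would.

```python
def match_line(text, pos=0):
    """Emulate NOTNEWLINE* ENDOFLINE from pos; return the index just past the match.

    Hypothetical helper for illustration only (not ANTLR-generated code).
    """
    n = len(text)
    while pos < n:
        ch = text[pos]
        if ch == '\n':
            return pos + 1  # NEWLINE: bare '\n'
        if ch == '\r' and pos + 1 < n and text[pos + 1] == '\n':
            return pos + 2  # NEWLINE: '\r' '\n'
        pos += 1            # NOTNEWLINE, including a '\r' not followed by '\n'
    return pos              # ENDOFLINE via the end-of-input alternative


def split_lines(text):
    """Apply match_line repeatedly, collecting each matched line as a string."""
    tokens, pos = [], 0
    while pos < len(text):
        end = match_line(text, pos)
        tokens.append(text[pos:end])
        pos = end
    return tokens
```

With this logic, the input "a. one\r\nb. two" yields two lines, and the
second one is matched even though nothing follows it.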

Alternatively, you might want to treat a bare '\r' as a newline (the
convention in classic Mac OS). In that case change MC_QUESTION and
MC_INCORRECT as above, but use:

  fragment NOTNEWLINE : ~('\r' | '\n');
  fragment NEWLINE    : '\r' '\n'? | '\n';

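
For comparison, a hand-written Python sketch of this variant might look like
the following. Again this is only an illustration with a made-up function
name (match_line_any_newline). It assumes the three line endings '\r\n',
'\r' and '\n' are all accepted, and that end of input also ends the line.

```python
def match_line_any_newline(text, pos=0):
    """Emulate NOTNEWLINE* ENDOFLINE where NEWLINE is '\\r' '\\n'? | '\\n'.

    Hypothetical helper for illustration only (not ANTLR-generated code).
    Returns the index just past the matched line.
    """
    n = len(text)
    while pos < n:
        ch = text[pos]
        if ch == '\n':
            return pos + 1      # NEWLINE: bare '\n'
        if ch == '\r':
            if pos + 1 < n and text[pos + 1] == '\n':
                return pos + 2  # NEWLINE: '\r' '\n'
            return pos + 1      # NEWLINE: bare '\r'
        pos += 1                # NOTNEWLINE: anything except '\r' and '\n'
    return pos                  # ENDOFLINE via the end-of-input alternative
```

The only difference from the stricter version is that a '\r' always ends the
line, whether or not a '\n' follows it.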
--
David-Sarah Hopwood ⚥ http://davidsarah.livejournal.com