[antlr-interest] Matching Last Line in ANTLR?

Tue Aug 18 14:05:11 PDT 2009

On 09-08-18 02:22 PM, Gavin Lambert wrote:
> At 08:08 19/08/2009, consiliens at gmail.com wrote:
>  >I want to use your solution; however, it throws errors about "The
>  >following alternatives can never be matched: 1" for MC_QUESTION
>  >and MC_INCORRECT. Shouldn't the following work?
>  >
>  >MC_QUESTION : INT ('.'|')') .* ENDOFLINE;
>  >MC_INCORRECT : LETTER '.' .* ENDOFLINE;
>  >MC_CORRECT : '*' MC_INCORRECT;
>  >
>  >fragment ENDOFLINE : NEWLINE | { input.LA(1) == EOF }?;
>
> No. You can't use a .* wildcard loop without (a) always having at least
> one termination character and (b) specifying it inline rather than in a
> subrule.
>
> If you remove the .* (or make it more specific, e.g. WS*), then it should
> work.
>
>

For testing, I removed the .* and, although there are no errors now, it
still doesn't match "b." as an MC_INCORRECT token unless a newline
follows it. In this grammar, the .* is meant to match the text between
the line identifier and the end of the line, so the input could be:
1. Is ANTLR useful?
*a. True
b. False

The existing regular-expression-based parser handles many of these issues
elegantly; however, I want to use a different tool for language
recognition. I'm hoping this ANTLR grammar can at least reach feature
parity with it.
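For comparison, a regex version of these three line types might look like the rough Python sketch below. The patterns are assumptions modelled on the grammar rules below, not the existing parser's actual code. [^\r\n]* plays the role of .* (it takes the rest of the line but cannot cross a terminator), and in re.MULTILINE mode "$" matches either just before a newline or at the very end of the text, which is the NEWLINE-or-EOF choice that ENDOFLINE is meant to express.

```python
import re

# Hypothetical patterns modelled on MC_QUESTION, MC_CORRECT and MC_INCORRECT.
# [^\r\n]* takes the rest of the line but can never cross a line terminator.
# With re.MULTILINE, "^" matches at the start of every line and "$" matches
# just before a "\n" or at the very end of the text, so the last line does
# not need a trailing newline.
LINE_RE = re.compile(
    r"^(?:"
    r"(?P<MC_QUESTION>[0-9]+[.)][^\r\n]*)"
    r"|(?P<MC_CORRECT>\*[a-zA-Z]\.[^\r\n]*)"
    r"|(?P<MC_INCORRECT>[a-zA-Z]\.[^\r\n]*)"
    r")\r?$",
    re.MULTILINE,
)


def classify(text):
    """Return (token_type, line_text) pairs for every recognised line."""
    return [(m.lastgroup, m.group(m.lastgroup)) for m in LINE_RE.finditer(text)]


sample = "1. Is ANTLR useful?\n*a. True\nb. False"  # no trailing newline
for kind, line in classify(sample):
    print(kind, repr(line))
# MC_QUESTION '1. Is ANTLR useful?'
# MC_CORRECT '*a. True'
# MC_INCORRECT 'b. False'
```

Unlike a lexer, this sketch silently skips lines that fit none of the patterns instead of reporting an error.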

Sample Input:
1.
*a.
b.

MC_QUESTION  : INT ('.'|')') ENDOFLINE;
MC_INCORRECT : LETTER '.' ENDOFLINE;
MC_CORRECT   : '*' MC_INCORRECT;

fragment ENDOFLINE : NEWLINE | { input.LA(1) == EOF }?;
fragment NEWLINE : '\r'? '\n';
fragment LETTER  : ('a'..'z'|'A'..'Z');
fragment INT     : '0'..'9'+;
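To make the intended ENDOFLINE behaviour concrete, here is a hand-written Python scanner that follows the same rule structure. It is only a model, not what ANTLR generates: LineScanner, la(), tokenize() and the EOF sentinel are hypothetical names, with la(1) standing in for input.LA(1). The rest of the line is consumed by a greedy loop over anything except '\r' or '\n' (what ANTLR writes as the negated set ~('\r'|'\n')) rather than a wildcard, so when the loop stops the next character is either a line terminator or EOF, and ENDOFLINE can choose between NEWLINE and the EOF alternative with one character of lookahead.

```python
EOF = None  # hypothetical sentinel standing in for ANTLR's EOF


def is_digit(c):
    return c is not EOF and "0" <= c <= "9"


def is_letter(c):
    return c is not EOF and ("a" <= c <= "z" or "A" <= c <= "Z")


class LineScanner:
    """Hand-written model of the MC_* rules; not ANTLR-generated code."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def la(self, k):
        # Counterpart of input.LA(k): the k-th character ahead, or EOF.
        i = self.pos + k - 1
        return self.text[i] if i < len(self.text) else EOF

    def expect(self, ok, what):
        # Consume one character if ok(char) holds, otherwise fail.
        c = self.la(1)
        if c is EOF or not ok(c):
            raise SyntaxError(f"expected {what} at offset {self.pos}, got {c!r}")
        self.pos += 1

    def rest_of_line(self):
        # Greedy loop over anything except '\r' or '\n', so it always
        # stops at a line terminator or at EOF.
        while self.la(1) not in ("\r", "\n", EOF):
            self.pos += 1

    def end_of_line(self):
        # ENDOFLINE : NEWLINE | { input.LA(1) == EOF }? ;
        if self.la(1) == "\r":                  # NEWLINE : '\r'? '\n'
            self.pos += 1
            self.expect(lambda c: c == "\n", "'\\n'")
        elif self.la(1) == "\n":
            self.pos += 1
        # Otherwise la(1) is EOF, because rest_of_line() only stops at
        # '\r', '\n' or EOF: the predicate alternative matches and
        # consumes nothing.

    def next_token(self):
        start = self.pos
        if is_digit(self.la(1)):                # MC_QUESTION : INT ('.'|')') ...
            while is_digit(self.la(1)):
                self.pos += 1
            self.expect(lambda c: c in ".)", "'.' or ')'")
            kind = "MC_QUESTION"
        else:                                   # MC_INCORRECT, or MC_CORRECT after '*'
            kind = "MC_INCORRECT"
            if self.la(1) == "*":
                self.pos += 1
                kind = "MC_CORRECT"
            self.expect(is_letter, "a letter")
            self.expect(lambda c: c == ".", "'.'")
        self.rest_of_line()
        text = self.text[start:self.pos]        # token text, terminator excluded
        self.end_of_line()
        return kind, text


def tokenize(text):
    scanner = LineScanner(text)
    tokens = []
    while scanner.la(1) is not EOF:
        tokens.append(scanner.next_token())
    return tokens


print(tokenize("1.\n*a.\nb."))
# [('MC_QUESTION', '1.'), ('MC_CORRECT', '*a.'), ('MC_INCORRECT', 'b.')]
```

Because the loop never consumes a terminator, a final line such as "b." with nothing after it still ends cleanly at EOF.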