[antlr-interest] Adding a Space Leads to Mismatch

Fri Feb 10 01:09:32 PST 2012

I have simplified the grammar further, to the version below.

Changing the input from "If " to "If" gives a clean match. With the
trailing space, ALPHANUMERICSPACE is still predicted, which results in this error:
line 1:3 required (...)+ loop did not match anything at character '<EOF>'

The lexer keeps predicting the wrong token. I have read through a lot of
documentation and still cannot see how to fix this while keeping
ALPHANUMERICSPACE (which I need in order to match multi-word tokens).

grammar DebugA;

@members {
  // Debug driver: dump every token the lexer produces for the input "If ".
  public static void main(String[] args) throws Exception {
    DebugALexer lex = new DebugALexer(new ANTLRStringStream("If "));
    Token token;
    // Stop on the EOF token type rather than comparing the token text.
    while ((token = lex.nextToken()).getType() != Token.EOF) {
      System.out.println("Token: " + token.getType() + "/" + token.getText());
    }
  }
}

ruleExpression
  : IF NEWLINE?
    EOF
  ;

IF
  : 'If';

ALPHANUMERICSPACE
  : ('a'..'z' | 'A'..'Z' | '0'..'9')+ (' '+ ('a'..'z' | 'A'..'Z' | '0'..'9')+)*
  ;

WS
  : (' '|'\t')+ {skip();}
  ;

NEWLINE
  : '\r'? '\n'
  ;
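
To make the failure concrete, here is how I currently understand it, as a
plain-Java sketch rather than ANTLR-generated code. It assumes that the lexer
enters the (' '+ alnum+)* loop as soon as it sees a space and cannot back out
afterwards; that is my reading of the error, not something I have confirmed in
the generated DFA. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written illustration of the two loop-entry strategies; hypothetical names throughout. */
final class SpaceLookaheadSketch {

    static boolean isAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    /** Returns the index just past the run of alphanumerics starting at i. */
    static int skipAlnum(String s, int i) {
        while (i < s.length() && isAlnum(s.charAt(i))) i++;
        return i;
    }

    /** Returns the index just past the run of spaces starting at i. */
    static int skipSpaces(String s, int i) {
        while (i < s.length() && s.charAt(i) == ' ') i++;
        return i;
    }

    /** Assumed failing behaviour: a single space is enough to commit to another word. */
    static int matchWordsGreedy(String s, int start) {
        int i = skipAlnum(s, start);
        if (i == start) throw new IllegalStateException("no alphanumeric at index " + start);
        while (i < s.length() && s.charAt(i) == ' ') {
            int afterSpaces = skipSpaces(s, i);
            int afterWord = skipAlnum(s, afterSpaces);
            if (afterWord == afterSpaces) {
                // Committed to the loop, but no word follows the spaces.
                throw new IllegalStateException(
                    "required (...)+ loop did not match anything at index " + afterSpaces);
            }
            i = afterWord;
        }
        return i;
    }

    /** Guarded behaviour: look past the spaces first, and only consume them if a word follows. */
    static int matchWordsGuarded(String s, int start) {
        int i = skipAlnum(s, start);
        if (i == start) throw new IllegalStateException("no alphanumeric at index " + start);
        while (true) {
            int afterSpaces = skipSpaces(s, i);
            boolean wordFollows = afterSpaces > i
                && afterSpaces < s.length()
                && isAlnum(s.charAt(afterSpaces));
            if (!wordFollows) break;
            i = skipAlnum(s, afterSpaces);
        }
        return i;
    }

    /** Tokenizes with the guarded matcher; spaces and tabs between tokens are skipped. */
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == ' ' || c == '\t') { i++; continue; }
            int end = matchWordsGuarded(s, i);
            String text = s.substring(i, end);
            out.add(text.equals("If") ? "IF" : "ALPHANUMERICSPACE(" + text + ")");
            i = end;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("If "));
        try {
            matchWordsGreedy("If ", 0);
        } catch (IllegalStateException e) {
            System.out.println("greedy: " + e.getMessage());
        }
    }
}
```

On the input "If ", the greedy matcher throws at index 3, which lines up with
the 1:3 in the error above, while the guarded matcher stops after "If" and the
trailing space is skipped. In ANTLR 3 the guard would presumably be written as
a syntactic predicate in front of the ' '+ alternative; I have not verified
that against this grammar. Note that even the guarded version folds an input
such as "If x" into a single ALPHANUMERICSPACE token.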

Quintin Beukes

On Fri, Feb 10, 2012 at 10:17 AM, Quintin Beukes
<quintin.beukes at signio.co.za> wrote:
> I have tried to skip whitespace and have used tokens. The above
> grammar is mostly just in debug state.
>
> I can narrow the problem down even further: the lexer keeps
> predicting "If " to be ALPHANUMERICSPACE, so lexing fails. I
> cannot see why it would do that, because this string
> can never match ALPHANUMERICSPACE.
>
> Input:
> (If )
>
> grammar DebugA;
>
> tokens {
>  IF = 'If';
>  OB = '(';
>  CB = ')';
> }
>
> fieldRules
>  : rule
>    EOF
>  ;
>
> rule
>  : OB ruleExpression CB NEWLINE
>  ;
>
> ruleExpression
>  : IF ALPHANUMERIC
>  ;
>
> ALPHANUMERIC
>  : ('a'..'z' | 'A'..'Z' | '0'..'9')+
>  ;
>
> ALPHANUMERICSPACE
>  : ('a'..'z' | 'A'..'Z' | '0'..'9')+ (' '+ ('a'..'z' | 'A'..'Z' | '0'..'9')+)*
>  ;
>
> WS
>  : (' '|'\t')+ {skip();}
>  ;
>
> NEWLINE
>  : '\r'? '\n'
>  ;
>
>
> Quintin Beukes
>
> On Thu, Feb 9, 2012 at 9:30 PM, Jim Idle <jimi at temporal-wave.com> wrote:
>> Don't use 'strings' in your parser, create real tokens and list the
>> keywords and punctuation in the lexer before the generic rule. Also, it
>> does not look like you need the spaces, so try skipping them:
>>
>> LPAREN: '(' ;
>> ...
>> KEYWORD: 'keyword';
>> ....
>> ALPHANUMERICSPACE: 'A'..'Z'+ ... etc
>>
>> WS: (' '|'\t')+ { skip(); } ;  // Then remove WS refs in your parser
>>
>>
>> Jim
>>
>>> -----Original Message-----
>>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
>>> bounces at antlr.org] On Behalf Of Quintin Beukes
>>> Sent: Thursday, February 09, 2012 11:20 AM
>>> To: antlr-interest at antlr.org
>>> Subject: Re: [antlr-interest] Adding a Space Leads to Mismatch
>>>
>>> I debugged the lexer, and its prediction for the next
>>> token always seems to come out as ALPHANUMERICSPACE.
>>>
>>> How can I resolve such a prediction error? Even a pointer to
>>> the wiki would help.
>>>
>>> thanks,
>>> Quintin Beukes
>>>
>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
>>> email-address