[antlr-interest] Adding a Space Leads to Mismatch

Jim Idle jimi at temporal-wave.com
Fri Feb 10 14:14:22 PST 2012


Logically, this is never going to work, regardless of what ANTLR
predicts. As soon as the lexer enters the ALPHANUMERICSPACE rule it will
consume the following space plus whatever comes after it, whether that is
a keyword or anything else. You are expecting the lexer to 'know' what you
mean, and it cannot do that.

You do not need to consume the space and the next word; you just need:

ALPHANUMERIC
  : ('a'..'z' | 'A'..'Z' | '0'..'9')+ ;

and

words: ALPHANUMERIC+ ;


You can get the whole text of the words rule easily enough if you need it.
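
For instance, a minimal sketch of that, assuming ANTLR 3 with the Java
target (the grammar name Words and the entry rule phrase are made up for
illustration and are not part of the original grammar):

```antlr
grammar Words;

// Hypothetical entry rule: returns the matched words as one string.
phrase returns [String value]
  : IF w=words EOF { $value = $w.text; }
  ;

words
  : ALPHANUMERIC+
  ;

// Keywords are listed before the generic rule, as suggested earlier
// in this thread.
IF
  : 'If'
  ;

ALPHANUMERIC
  : ('a'..'z' | 'A'..'Z' | '0'..'9')+
  ;

// Hidden channel rather than skip(): the parser never sees the spaces,
// but they stay in the token stream so $w.text can include them.
WS
  : (' ' | '\t')+ { $channel = HIDDEN; }
  ;
```

Here $w.text covers everything from the first to the last token matched
by words. With skip() the lexer discards the whitespace tokens entirely,
so I would expect the text to come back with the words run together;
sending WS to the hidden channel should preserve the original spacing.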

However, if there are cases where words like 'If' are not always
keywords, then you will need a parser rule that allows for that
(basically, keywords as identifiers).
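
One common shape for such a rule, as a sketch only (the rule name word is
hypothetical, and it assumes IF and ALPHANUMERIC exist as lexer rules):

```antlr
// Accept the keyword token wherever an ordinary word is allowed.
word
  : ALPHANUMERIC
  | IF
  ;

// The words rule then repeats word instead of ALPHANUMERIC directly.
words
  : word+
  ;
```

Whether this stays unambiguous depends on where words is used in the
rest of the grammar, so treat it as a starting point.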

I am not sure what you are trying to achieve here, but perhaps you are
oversimplifying your problem?

Jim


> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Quintin Beukes
> Sent: Friday, February 10, 2012 1:10 AM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Adding a Space Leads to Mismatch
>
> I have further simplified the grammar to the following.
>
> Changing the "If " to "If" causes a perfectly fine match. Still
> ALPHANUMERICSPACE is predicted as the input. It results in this error:
> line 1:3 required (...)+ loop did not match anything at character
> '<EOF>'
>
> It keeps predicting the wrong input. I have read through tons of
> documents and am not seeing how to fix this whilst keeping
> ALPHANUMERICSPACE (which is needed to match multiword tokens).
>
> grammar DebugA;
>
> @members {
>   public static void main(String[] args) throws Exception {
>     DebugALexer lex = new DebugALexer(new ANTLRStringStream("If "));
>     Token token;
>     while ((token = lex.nextToken())!=null) {
>       if ("<EOF>".equals(token.getText())) break;
>       System.out.println("Token: " + token.getType() + "/" +
> token.getText());
>     }
>   }
> }
>
> ruleExpression
>   : IF NEWLINE?
>     EOF
>   ;
>
> IF
>   : 'If';
>
> ALPHANUMERICSPACE
>   : ('a'..'z' | 'A'..'Z' | '0'..'9')+ (' '+ ('a'..'z' | 'A'..'Z' |
> '0'..'9')+)*
>   ;
>
> WS
>   : (' '|'\t')+ {skip();}
>   ;
>
> NEWLINE
>   : '\r'? '\n'
>   ;
>
> Quintin Beukes
>
> On Fri, Feb 10, 2012 at 10:17 AM, Quintin Beukes
> <quintin.beukes at signio.co.za> wrote:
> > I have tried to skip whitespace and have used tokens. The above
> > grammar is mostly just in debug state.
> >
> > If I can narrow down the problem even further. The lexer keeps
> > predicting the "If " to be ALPHANUMERICSPACE, so the lexer fails. I
> > can actually not see why it would even do that, because this string
> > can never even match ALPHANUMERICSPACE.
> >
> > Input:
> > (If )
> >
> > grammar DebugA;
> >
> > tokens {
> >  IF = 'If';
> >  OB = '(';
> >  CB = ')';
> > }
> >
> > fieldRules
> >  : rule
> >    EOF
> >  ;
> >
> > rule
> >  : OB ruleExpression CB NEWLINE
> >  ;
> >
> > ruleExpression
> >  : IF ALPHANUMERIC
> >  ;
> >
> > ALPHANUMERIC
> >  : ('a'..'z' | 'A'..'Z' | '0'..'9')+
> >  ;
> >
> > ALPHANUMERICSPACE
> >  : ('a'..'z' | 'A'..'Z' | '0'..'9')+ (' '+ ('a'..'z' | 'A'..'Z' |
> > '0'..'9')+)*
> >  ;
> >
> > WS
> >  : (' '|'\t')+ {skip();}
> >  ;
> >
> > NEWLINE
> >  : '\r'? '\n'
> >  ;
> >
> >
> > Quintin Beukes
> >
> > On Thu, Feb 9, 2012 at 9:30 PM, Jim Idle <jimi at temporal-wave.com>
> wrote:
> >> Don't use 'strings' in your parser, create real tokens and list the
> >> keywords and punctuation in the lexer before the generic rule. Also,
> >> it does not look like you need the spaces, so try skipping them:
> >>
> >> LPAREN: '(' ;
> >> ...
> >> KEYWORD: 'keyword';
> >> ....
> >> ALPHANUMERICSPACE: 'A'..'Z'+ ... etc
> >>
> >> WS: (' '|'\t')+ { skip(); } ;  // Then remove WS refs in your parser
> >>
> >>
> >> Jim
> >>
> >>> -----Original Message-----
> >>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> >>> bounces at antlr.org] On Behalf Of Quintin Beukes
> >>> Sent: Thursday, February 09, 2012 11:20 AM
> >>> To: antlr-interest at antlr.org
> >>> Subject: Re: [antlr-interest] Adding a Space Leads to Mismatch
> >>>
> >>> I debugged the lexer, and it seems that its predictions for the
> >>> next token always match ALPHANUMERICSPACE.
> >>>
> >>> How can I resolve such a prediction error? Even if just pointing me
> >>> to the wiki.
> >>>
> >>> thanks,
> >>> Quintin Beukes
> >>>
> >>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >>> Unsubscribe:
> >>> http://www.antlr.org/mailman/options/antlr-interest/your-
> >>> email-address
> >>
>

