[antlr-interest] Adding a Space Leads to Mismatch

Sat Feb 11 00:01:24 PST 2012

Hi,

Thanks for your input.

I eventually settled on matching the words of multiword keywords as
ALPHANUMERIC tokens and then using the parser to group them, which is
very similar to the solution you proposed.

Quintin Beukes
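The approach described above (lexing each word as ALPHANUMERIC and letting the parser assemble multiword keywords) can be sketched in plain Java as a longest-match grouping over the token texts. This is an illustration only: the keyword table (`Greater Than`, `Is`, `Is Not`), the `KW(...)` labels and the method names are hypothetical, and in an actual ANTLR grammar the grouping would be expressed as parser rules rather than a Java loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch: grouping ALPHANUMERIC token texts into multiword keywords. */
public class MultiwordKeywordDemo {

    // Hypothetical multiword keywords (not taken from the thread).
    static final List<List<String>> KEYWORDS = Arrays.asList(
        Arrays.asList("Greater", "Than"),
        Arrays.asList("Is"),
        Arrays.asList("Is", "Not"));

    /** Groups words left to right, preferring the longest keyword that fits at each position. */
    static List<String> group(List<String> words) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < words.size()) {
            List<String> best = null;
            for (List<String> kw : KEYWORDS) {
                boolean fits = i + kw.size() <= words.size()
                    && words.subList(i, i + kw.size()).equals(kw);
                if (fits && (best == null || kw.size() > best.size())) {
                    best = kw;
                }
            }
            if (best != null) {
                out.add("KW(" + String.join(" ", best) + ")");  // one grouped keyword
                i += best.size();
            } else {
                out.add(words.get(i));                          // an ordinary word
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Amount", "Is", "Not", "Greater", "Than", "Limit");
        System.out.println(group(words));  // [Amount, KW(Is Not), KW(Greater Than), Limit]
    }
}
```

Preferring the longest fit matters when one keyword is a prefix of another: here `Is Not` wins over `Is` when both words are present.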

On Sat, Feb 11, 2012 at 12:14 AM, Jim Idle <jimi at temporal-wave.com> wrote:
>
> Logically, this is never going to work, regardless of what ANTLR is
> predicting. As soon as you enter the ALPHANUMERICSPACE rule, you will
> consume the following space and whatever comes after it, whether that is
> a keyword or anything else. You are somehow expecting the LEXER to 'know'
> what you mean, and it cannot do that.
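To make the failure concrete, here is a small hand-written Java approximation (not ANTLR-generated code) of the ALPHANUMERICSPACE rule quoted in this thread. It assumes single-line input and models only the part that matters: the optional `(' '+ ...)*` loop is entered as soon as a space is seen, and once inside it a word must follow. The class and method names are made up for the example.

```java
/**
 * Hand-written approximation (not ANTLR-generated) of
 *   ALPHANUMERICSPACE : ALNUM+ (' '+ ALNUM+)* ;
 * with ALNUM = ('a'..'z' | 'A'..'Z' | '0'..'9'), assuming single-line input.
 */
public class GreedyLoopDemo {

    static boolean isAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    /** ALNUM+ starting at pos; returns the index just past the match or throws. */
    static int matchAlnumPlus(String in, int pos) {
        if (pos >= in.length() || !isAlnum(in.charAt(pos))) {
            String at = pos >= in.length() ? "'<EOF>'" : "'" + in.charAt(pos) + "'";
            throw new IllegalStateException(
                "line 1:" + pos + " required (...)+ loop did not match anything at character " + at);
        }
        while (pos < in.length() && isAlnum(in.charAt(pos))) {
            pos++;
        }
        return pos;
    }

    /** Returns the end index of an ALPHANUMERICSPACE token that starts at pos. */
    static int matchAlphanumericSpace(String in, int pos) {
        pos = matchAlnumPlus(in, pos);
        // One character of lookahead decides the (...)* loop: a ' ' means "go round again".
        while (pos < in.length() && in.charAt(pos) == ' ') {
            while (pos < in.length() && in.charAt(pos) == ' ') {
                pos++;                       // ' '+
            }
            pos = matchAlnumPlus(in, pos);   // committed: another word must follow
        }
        return pos;
    }

    public static void main(String[] args) {
        System.out.println(matchAlphanumericSpace("If", 0));    // 2
        System.out.println(matchAlphanumericSpace("If x", 0));  // 4
        try {
            matchAlphanumericSpace("If ", 0);
        } catch (IllegalStateException e) {
            // line 1:3 required (...)+ loop did not match anything at character '<EOF>'
            System.out.println(e.getMessage());
        }
    }
}
```

The third call fails at character index 3 with the same message as in the report: the space commits the lexer to the loop, and the loop then finds end of input where at least one alphanumeric character is required.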
>
> You do not need to consume the space and the next word; you just need:
>
> ALPHANUMERIC
>  : ('a'..'z' | 'A'..'Z' | '0'..'9')+ ;
>
> and
>
> words: ALPHANUMERIC+ ;
>
>
> You can get the whole text of the words rule easily enough if you need it.
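A plain-Java sketch of the split suggested above: a lexer that emits one ALPHANUMERIC token per word and skips blanks, plus a `words : ALPHANUMERIC+ ;` step that groups consecutive words and returns their text. The `Tok` class, the method names and joining the words with single spaces are assumptions made for illustration; this is not the ANTLR runtime API.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch (not the ANTLR runtime API) of single-word tokens plus a words rule. */
public class WordsRuleDemo {

    /** Minimal token: rule name plus matched text (hypothetical helper type). */
    static final class Tok {
        final String type;
        final String text;

        Tok(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    static boolean isAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    /** ALPHANUMERIC : ALNUM+ ;   WS : (' '|'\t')+ {skip();} ; */
    static List<Tok> lex(String in) {
        List<Tok> toks = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == ' ' || c == '\t') {
                i++;                                  // skipped: never reaches the parser
            } else if (isAlnum(c)) {
                int start = i;
                while (i < in.length() && isAlnum(in.charAt(i))) {
                    i++;
                }
                toks.add(new Tok("ALPHANUMERIC", in.substring(start, i)));
            } else {
                throw new IllegalArgumentException("no viable token at index " + i);
            }
        }
        return toks;
    }

    /** words : ALPHANUMERIC+ ; returns the grouped words joined by single spaces. */
    static String words(List<Tok> toks) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : toks) {
            if (!t.type.equals("ALPHANUMERIC")) {
                break;                                // the rule ends at the first non-word token
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.text);
        }
        if (sb.length() == 0) {
            throw new IllegalStateException("words: expected at least one ALPHANUMERIC");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(lex("If ").size());                  // 1: the trailing blank is just skipped
        System.out.println(words(lex("Hello   big  world ")));  // Hello big world
    }
}
```

Because blanks are skipped rather than kept, the original spacing cannot be rebuilt from these tokens; the sketch simply normalizes each run of blanks to one space.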
>
> However, if there will be cases where words like 'If' are not always
> keywords, then you will need a parser rule that allows them (basically,
> keywords as identifiers).
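If "If" is lexed as its own IF token, a `words : ALPHANUMERIC+ ;` rule stops as soon as it reaches one. A plain-Java sketch of the "keywords as identifiers" idea: the parser-side rule also accepts IF wherever a word is allowed, roughly `words : (ALPHANUMERIC | IF)+ ;`. That rule, the input sentence and the method names are hypothetical, chosen only to show the difference.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch: letting a keyword token also count as a word in the parser. */
public class KeywordAsWordDemo {

    static boolean isAlnum(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    /** Lexer: exactly "If" becomes IF, any other word ALPHANUMERIC; blanks are skipped. */
    static List<String> lexTypes(String in) {
        List<String> types = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == ' ' || c == '\t') {
                i++;
                continue;
            }
            if (!isAlnum(c)) {
                throw new IllegalArgumentException("no viable token at index " + i);
            }
            int start = i;
            while (i < in.length() && isAlnum(in.charAt(i))) {
                i++;
            }
            types.add(in.substring(start, i).equals("If") ? "IF" : "ALPHANUMERIC");
        }
        return types;
    }

    /**
     * Number of leading tokens a words rule consumes:
     *   keywordsAsWords == false  ->  words : ALPHANUMERIC+ ;
     *   keywordsAsWords == true   ->  words : (ALPHANUMERIC | IF)+ ;
     */
    static int wordsLength(List<String> types, boolean keywordsAsWords) {
        int n = 0;
        while (n < types.size()) {
            String t = types.get(n);
            boolean isWord = t.equals("ALPHANUMERIC") || (keywordsAsWords && t.equals("IF"));
            if (!isWord) {
                break;
            }
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> types = lexTypes("What If Later");
        System.out.println(types);                      // [ALPHANUMERIC, IF, ALPHANUMERIC]
        System.out.println(wordsLength(types, false));  // 1
        System.out.println(wordsLength(types, true));   // 3
    }
}
```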
>
> I am not sure what you are trying to achieve here, but perhaps you are
> oversimplifying your problem?
>
> Jim
>
>
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Quintin Beukes
> > Sent: Friday, February 10, 2012 1:10 AM
> > To: antlr-interest at antlr.org
> > Subject: Re: [antlr-interest] Adding a Space Leads to Mismatch
> >
> > I have further simplified the grammar to the following.
> >
> > Changing the input "If " to "If" gives a perfectly fine match, but with
> > the trailing space ALPHANUMERICSPACE is still predicted, which results
> > in this error:
> > line 1:3 required (...)+ loop did not match anything at character '<EOF>'
> >
> > It keeps predicting the wrong token. I have read through tons of
> > documentation and do not see how to fix this whilst keeping
> > ALPHANUMERICSPACE (which is needed to match multiword tokens).
> >
> > grammar DebugA;
> >
> > @members {
> >   public static void main(String[] args) throws Exception {
> >     DebugALexer lex = new DebugALexer(new ANTLRStringStream("If "));
> >     Token token;
> >     // nextToken() never returns null; at end of input it returns a token of type Token.EOF
> >     while ((token = lex.nextToken()).getType() != Token.EOF) {
> >       System.out.println("Token: " + token.getType() + "/" + token.getText());
> >     }
> >   }
> > }
> >
> > ruleExpression
> >   : IF NEWLINE?
> >     EOF
> >   ;
> >
> > IF
> >   : 'If';
> >
> > ALPHANUMERICSPACE
> >   : ('a'..'z' | 'A'..'Z' | '0'..'9')+ (' '+ ('a'..'z' | 'A'..'Z' | '0'..'9')+)*
> >   ;
> >
> > WS
> >   : (' '|'\t')+ {skip();}
> >   ;
> >
> > NEWLINE
> >   : '\r'? '\n'
> >   ;
> >
> > Quintin Beukes
> >
> > On Fri, Feb 10, 2012 at 10:17 AM, Quintin Beukes
> > <quintin.beukes at signio.co.za> wrote:
> > > I have tried skipping whitespace and have used tokens. The above
> > > grammar is mostly just in a debugging state.
> > >
> > > Let me narrow the problem down even further. The lexer keeps
> > > predicting "If " as ALPHANUMERICSPACE, so lexing fails. I actually
> > > cannot see why it would do that, because this string can never
> > > match ALPHANUMERICSPACE.
> > >
> > > Input:
> > > (If )
> > >
> > > grammar DebugA;
> > >
> > > tokens {
> > >  IF = 'If';
> > >  OB = '(';
> > >  CB = ')';
> > > }
> > >
> > > fieldRules
> > >  : rule
> > >    EOF
> > >  ;
> > >
> > > rule
> > >  : OB ruleExpression CB NEWLINE
> > >  ;
> > >
> > > ruleExpression
> > >  : IF ALPHANUMERIC
> > >  ;
> > >
> > > ALPHANUMERIC
> > >  : ('a'..'z' | 'A'..'Z' | '0'..'9')+
> > >  ;
> > >
> > > ALPHANUMERICSPACE
> > >  : ('a'..'z' | 'A'..'Z' | '0'..'9')+ (' '+ ('a'..'z' | 'A'..'Z' | '0'..'9')+)*
> > >  ;
> > >
> > > WS
> > >  : (' '|'\t')+ {skip();}
> > >  ;
> > >
> > > NEWLINE
> > >  : '\r'? '\n'
> > >  ;
> > >
> > >
> > > Quintin Beukes
> > >
> > > On Thu, Feb 9, 2012 at 9:30 PM, Jim Idle <jimi at temporal-wave.com> wrote:
> > >> Don't use 'strings' in your parser; create real tokens and list the
> > >> keywords and punctuation in the lexer before the generic rule. Also,
> > >> it does not look like you need the spaces, so try skipping them:
> > >>
> > >> LPAREN: '(' ;
> > >> ...
> > >> KEYWORD: 'keyword';
> > >> ....
> > >> ALPHANUMERICSPACE: 'A'..'Z'+ ... etc
> > >>
> > >> WS: (' '|'\t')+ { skip(); } ;  // Then remove WS refs in your parser
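The advice to list keywords and punctuation before the generic rule depends on how a lexer chooses between rules that match at the same position. The sketch below uses the textbook maximal-munch strategy (the longest match wins; on a tie, the rule listed first wins), which agrees with ANTLR 3 in simple cases like `'If'` versus a generic word rule, but it is a plain-Java illustration built on `java.util.regex`, not ANTLR's own prediction. Mapping KEYWORD to `If` and using a single-word ALPHANUMERIC as the generic rule are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Plain-Java sketch of rule ordering: longest match wins, ties go to the earlier rule. */
public class RuleOrderDemo {

    // Rules in declaration order: specific tokens first, the generic rule after them.
    static final String[] NAMES = {"LPAREN", "RPAREN", "KEYWORD", "ALPHANUMERIC", "WS"};
    static final Pattern[] PATTERNS = {
        Pattern.compile("\\("),
        Pattern.compile("\\)"),
        Pattern.compile("If"),
        Pattern.compile("[a-zA-Z0-9]+"),
        Pattern.compile("[ \\t]+")
    };

    static List<String> lex(String in) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < in.length()) {
            int best = -1;
            int bestLen = 0;
            for (int r = 0; r < PATTERNS.length; r++) {
                Matcher m = PATTERNS[r].matcher(in);
                m.region(pos, in.length());
                // Only a strictly longer match replaces the current best, so ties keep the earlier rule.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = r;
                    bestLen = m.end() - pos;
                }
            }
            if (best < 0) {
                throw new IllegalArgumentException("no viable token at index " + pos);
            }
            if (!NAMES[best].equals("WS")) {     // WS is skipped, as in {skip();}
                out.add(NAMES[best] + "/" + in.substring(pos, pos + bestLen));
            }
            pos += bestLen;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("(If x)"));  // [LPAREN/(, KEYWORD/If, ALPHANUMERIC/x, RPAREN/)]
        System.out.println(lex("Iffy"));    // [ALPHANUMERIC/Iffy]
    }
}
```

In this sketch, moving ALPHANUMERIC ahead of KEYWORD would turn `If` into `ALPHANUMERIC/If` and KEYWORD could never match, which is why the specific rules go first.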
> > >>
> > >>
> > >> Jim
> > >>
> > >>> -----Original Message-----
> > >>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Quintin Beukes
> > >>> Sent: Thursday, February 09, 2012 11:20 AM
> > >>> To: antlr-interest at antlr.org
> > >>> Subject: Re: [antlr-interest] Adding a Space Leads to Mismatch
> > >>>
> > >>> I debugged the lexer, and its prediction for the next token always
> > >>> seems to be ALPHANUMERICSPACE.
> > >>>
> > >>> How can I resolve such a prediction error? Even just a pointer to the
> > >>> wiki would help.
> > >>>
> > >>> thanks,
> > >>> Quintin Beukes
> > >>>
> > >>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > >>> Unsubscribe:
> > >>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> > >>
> >
>