[antlr-interest] Fwd: Fwd: Rule precedence works differently when using a predicate?

Thu Oct 27 12:22:54 PDT 2011

Apologies Jim, forgot to send to the list...

---------- Forwarded message ----------
From: Bart Kiers <bkiers at gmail.com>
Date: Thu, Oct 27, 2011 at 9:21 PM
Subject: Re: [antlr-interest] Fwd: Rule precedence works differently when
using a predicate?
To: Jim Idle <jimi at temporal-wave.com>

On Thu, Oct 27, 2011 at 8:54 PM, Jim Idle <jimi at temporal-wave.com> wrote:

> As I said earlier you need more predicates:
>
>
Sorry Jim, I had not seen that you had already replied to my message below.

> But you also need to not use .+, which essentially matches anything anyway
> once it is triggered.
>

Err, no, not with a predicate, AFAIK (see the rule ANY_EXEPT_B in my example
below, which does not match just anything: it stops as soon as a 'B' is ahead).
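
A hand-written loop with the same intent as ({noBAhead()}?=> . )+ may help show
why the predicated form stops short of matching everything. This is only a
plain-Java sketch, not ANTLR-generated code; the class GatedLoopSketch and the
method matchAnyExceptB are hypothetical names introduced for illustration.

```java
public class GatedLoopSketch {

  // Returns the index just past the run of characters consumed from 'start'.
  // If nothing is consumed, 'start' itself is returned; a real ( ... )+ loop
  // would fail to match in that case.
  static int matchAnyExceptB(String input, int start) {
    int pos = start;
    // Gate: consume the next character only while one exists and it is not 'B'.
    while (pos < input.length() && input.charAt(pos) != 'B') {
      pos++;
    }
    return pos;
  }

  public static void main(String[] args) {
    String input = "aaaBaa";
    int end = matchAnyExceptB(input, 0);
    System.out.println(input.substring(0, end)); // prints "aaa"
  }
}
```

The gate is checked before every single character, so the loop ends at the
first 'B' instead of running to the end of the input.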

> Try something like this.
> fragment KEY : ;
>
> ANY
>   : ({!test()}?=> 'KEY')
>   | ({test()}?=> . )
>   ;
>
>
> But once you take out .+ , then it might just work as it was anyway.
>
> Jim
>

Thanks for your suggestion, but I know how to make it work. My question was
more about why, when two rules match the same number of characters, the rule
defined later in the grammar is the one used to create the token.
Let me give another example grammar:

grammar T;

@parser::members {
  public static void main(String[] args) throws Exception {
    TLexer lexer = new TLexer(new ANTLRStringStream("aaaBaa"));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    parser.parse();
  }
}

@lexer::members {
  private boolean noBAhead() {
    return input.LA(1) != 'B';
  }
}

parse
  :  (t=. {System.out.printf("\%-15s \%s\n", tokenNames[$t.type], $t.text);})+ EOF
  ;

MANY_A
  :  'a'+
  ;

B
  :  'B'
  ;

ANY_EXEPT_B
  :  ({noBAhead()}?=> . )+
  ;

If you run the TParser class, you will see the following output when parsing
"aaaBaa":

ANY_EXEPT_B     aaa
B               B
ANY_EXEPT_B     aa

In other words, although MANY_A also matches both "aaa" and "aa", the tokens
are created by ANY_EXEPT_B, whereas I expected the rule defined first (MANY_A)
to win.
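
The expectation described above (the longest match wins, and a tie goes to the
rule defined first) can be sketched in plain Java, independent of ANTLR. This
only models that expectation and makes no claim about how ANTLR's generated
lexer actually decides. The class ExpectedPrecedence, the method tokenize and
the regular expressions standing in for the three lexer rules are assumptions
made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExpectedPrecedence {

  // Rules in definition order; the patterns approximate MANY_A, B and ANY_EXEPT_B.
  static final Map<String, Pattern> RULES = new LinkedHashMap<>();
  static {
    RULES.put("MANY_A", Pattern.compile("a+"));
    RULES.put("B", Pattern.compile("B"));
    RULES.put("ANY_EXEPT_B", Pattern.compile("[^B]+"));
  }

  // Tokenizes by longest match; on equal length the earlier rule is kept.
  static List<String> tokenize(String input) {
    List<String> tokens = new ArrayList<>();
    int pos = 0;
    while (pos < input.length()) {
      String bestRule = null;
      int bestLen = 0;
      for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
        Matcher m = rule.getValue().matcher(input);
        m.region(pos, input.length());
        // Strictly greater: a later rule matching the same length never wins.
        if (m.lookingAt() && m.end() - pos > bestLen) {
          bestRule = rule.getKey();
          bestLen = m.end() - pos;
        }
      }
      if (bestRule == null) {
        throw new IllegalStateException("no rule matches at index " + pos);
      }
      tokens.add(bestRule + " " + input.substring(pos, pos + bestLen));
      pos += bestLen;
    }
    return tokens;
  }

  public static void main(String[] args) {
    for (String token : tokenize("aaaBaa")) {
      System.out.println(token);
    }
  }
}
```

Run on "aaaBaa", this model yields MANY_A for "aaa" and "aa" and B for "B",
which is the result expected above, in contrast to the ANY_EXEPT_B tokens that
the generated lexer actually produces.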

Regards,

Bart.

> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org
> > [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Bart Kiers
> > Sent: Thursday, October 27, 2011 10:56 AM
> > To: antlr-interest at antlr.org interest
> > Subject: [antlr-interest] Fwd: Rule precedence works differently when
> > using a predicate?
> >
> > Just a little bump, in case it got buried under some of the newer
> > posts.
> > And in case my previous grammar wasn't entirely clear, the following
> > grammar:
> >
> > grammar T;
> >
> > @lexer::members {
> >   private boolean test() {
> >     return true;
> >   }
> > }
> >
> > parse
> >   :  KEY EOF
> >   ;
> >
> > KEY
> >   :  'key'
> >   ;
> >
> > ANY
> >   :  ({test()}?=> . )+
> >   ;
> >
> >
> > with the test class:
> >
> > import org.antlr.runtime.*;
> >
> > public class Main {
> >   public static void main(String[] args) throws Exception {
> >     TLexer lexer = new TLexer(new ANTLRStringStream("key"));
> >     TParser parser = new TParser(new CommonTokenStream(lexer));
> >     parser.parse();
> >   }
> > }
> >
> >
> > Produces the following error:
> >
> > line 1:0 mismatched input 'key' expecting KEY
> >
> >
> > In other words, 'key' is being tokenized as ANY instead of KEY.
> > Is this expected behavior or a bug? And if it's expected behavior,
> > could someone point me to the documentation (book) or wiki-link that
> > explains this?
> >
> > Cheers & regards,
> >
> > Bart.
> >
> > -------------------
> >
> > From: Bart Kiers <bkiers at gmail.com>
> > Date: Mon, Oct 24, 2011 at 11:46 AM
> > Subject: Rule precedence works differently when using a predicate?
> > To: "antlr-interest at antlr.org interest" <antlr-interest at antlr.org>
> >
> >
> > Hi all,
> >
> > As I understand it, ANTLR's lexer matches rules from top to bottom in
> > the .g grammar file and when two rules match the same number of
> > characters, the one that is defined first has precedence over the later
> > one(s).
> >
> > However, take the following grammar:
> >
> > grammar T;
> >
> > @lexer::members {
> >   private boolean test() {
> >     return true;
> >   }
> > }
> >
> > parse
> >   :  (t=. {System.out.println(tokenNames[$t.type] + " :: " + $t.text);})* EOF
> >   ;
> >
> > KEY
> >   :  'key'
> >   ;
> >
> > ANY
> >   :  ({test()}?=> . )+
> >   ;
> >
> >
> > And the test class:
> >
> > import org.antlr.runtime.*;
> >
> >
> > public class Main {
> >   public static void main(String[] args) throws Exception {
> >     TLexer lexer = new TLexer(new ANTLRStringStream("key"));
> >     TParser parser = new TParser(new CommonTokenStream(lexer));
> >     parser.parse();
> >   }
> > }
> >
> >
> > I'd expected "KEY :: key" to be printed to the console; however,
> > "ANY :: key" is printed instead. So the last rule is matched, even though
> > the KEY rule matches the same input and is defined before ANY. Why?
> >
> > Kind regards,
> >
> > Bart.
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address