[antlr-interest] Fwd: Rule precedence works differently when using a predicate?

Bart Kiers bkiers at gmail.com
Thu Oct 27 14:02:55 PDT 2011


Hi Jim, others,

Sorry, but I'd appreciate it if you (or someone else) could answer my
question with a bit more detail because I really don't understand you (Jim).

You say `.+` matches forever, but in my example, there is a predicate in
front of the `.` causing it _not_ to match forever as you can see yourself.
The input "aaaBaa" is tokenized into 3 different tokens: "aaa", "B" and "aa"
and _not_ into one single token by the rule that has the `.+` and the
predicate in it. Your last comment suggests to me that you imply that
"aaaBaa" will be tokenized as a single token (which, again, is not the
case).
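
To make the point about the predicate concrete, here is a minimal plain-Java
sketch of a loop that is gated by a one-character lookahead check. It is only
a hand-written model of `({noBAhead()}?=> . )+`, not the code ANTLR
generates, and the names GatedLoopSketch and matchAnyExceptB are made up for
illustration:

```java
class GatedLoopSketch {

    // Hypothetical model of ({noBAhead()}?=> . )+ : starting at 'start',
    // consume characters for as long as the next character is not 'B'.
    // Returns the index just past the last consumed character.
    static int matchAnyExceptB(String input, int start) {
        int pos = start;
        while (pos < input.length() && input.charAt(pos) != 'B') {
            pos++;
        }
        return pos;
    }

    public static void main(String[] args) {
        String input = "aaaBaa";
        int end = matchAnyExceptB(input, 0);
        // The loop stops in front of the 'B', so only "aaa" is consumed.
        System.out.println(input.substring(0, end));
    }
}
```

In this model the loop cannot run past the 'B', which is why the rule does
not swallow the whole input.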

My question therefore remains the same: why are "aaa" and "aa" from the input
"aaaBaa" being tokenized as ANY_EXEPT_B instead of MANY_A, when MANY_A is
defined before ANY_EXEPT_B and matches exactly the same number of characters
as ANY_EXEPT_B does?

To me, it's as if input "while" would be matched by the ID rule instead of
the WHILE rule in:

WHILE : 'while';
ID : 'a'..'z'+;

(which is not the case, of course!)
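
The tie-breaking I have in mind can be sketched in plain Java as follows. This
is only a model of the expectation "longest match wins, and on a tie the rule
defined first wins"; it does not claim to reproduce what ANTLR's generated
lexer does internally. The class name FirstRuleWinsSketch, the method
chooseRule and the use of java.util.regex are assumptions made for the sake
of the illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstRuleWinsSketch {

    // Rules in definition order, mirroring WHILE and ID above.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("WHILE", Pattern.compile("while"));
        RULES.put("ID", Pattern.compile("[a-z]+"));
    }

    // Returns the name of the rule that matches the longest prefix of
    // 'input'. On a tie the rule defined first is kept. Returns null if
    // no rule matches at all.
    static String chooseRule(String input) {
        String best = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Strictly greater: a later rule must match MORE to win.
            if (m.lookingAt() && m.end() > bestLength) {
                best = rule.getKey();
                bestLength = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(chooseRule("while"));   // WHILE
        System.out.println(chooseRule("whiles"));  // ID
    }
}
```

Under this model, "aaa" would go to MANY_A rather than ANY_EXEPT_B, which is
exactly what I do not observe.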

Regards,

Bart.


On Thu, Oct 27, 2011 at 10:34 PM, Jim Idle <jimi at temporal-wave.com> wrote:

> .+ matches forever
>
>
>
> Jim
>
>
>
> *From:* Bart Kiers [mailto:bkiers at gmail.com]
> *Sent:* Thursday, October 27, 2011 12:22 PM
> *To:* Jim Idle
> *Subject:* Re: [antlr-interest] Fwd: Rule precedence works differently
> when using a predicate?
>
>
>
> On Thu, Oct 27, 2011 at 8:54 PM, Jim Idle <jimi at temporal-wave.com> wrote:
>
> As I said earlier you need more predicates:
>
>
>
> Sorry Jim, I did not know you replied to my message below before.
>
>
>
>
>
> But you also need to avoid .+, which essentially matches anything anyway
> once it is triggered.
>
>
>
> Err, no, not with a predicate, AFAIK (see the rule ANY_EXEPT_B in my
> example below which does not match anything).
>
>
>
>
>
> Try something like this.
> fragment KEY : ;
>
> ANY
>   : ({!test()}?=> 'KEY')
>   | ({test()}?=> . )
>   ;
>
>
> But once you take out .+ , then it might just work as it was anyway.
>
> Jim
>
>
>
> Thanks for your suggestion, but I know how to make it work. My question was
> more about why, when two rules match the same number of characters, the rule
> defined later in the grammar is used to create a token.
>
> Let me give another example grammar:
>
>
>
> grammar T;
>
> @parser::members {
>   public static void main(String[] args) throws Exception {
>     TLexer lexer = new TLexer(new ANTLRStringStream("aaaBaa"));
>     TParser parser = new TParser(new CommonTokenStream(lexer));
>     parser.parse();
>   }
> }
>
> @lexer::members {
>   private boolean noBAhead() {
>     return input.LA(1) != 'B';
>   }
> }
>
> parse
>   :  (t=. {System.out.printf("\%-15s \%s\n", tokenNames[$t.type], $t.text);})+ EOF
>   ;
>
> MANY_A
>   :  'a'+
>   ;
>
> B
>   :  'B'
>   ;
>
> ANY_EXEPT_B
>   :  ({noBAhead()}?=> . )+
>   ;
>
>
>
> If you run the TParser class, you will see the following output when
> parsing "aaaBaa":
>
>
>
> ANY_EXEPT_B     aaa
> B               B
> ANY_EXEPT_B     aa
>
>
>
> I.e., although the rule MANY_A also matches both "aaa" and
> "aa", ANY_EXEPT_B matches them where I thought the rule defined first
> (MANY_A) would match them.
>
>
>
> Regards,
>
>
>
> Bart.
>
>
>
>
>
>
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> > bounces at antlr.org] On Behalf Of Bart Kiers
> > Sent: Thursday, October 27, 2011 10:56 AM
> > To: antlr-interest at antlr.org interest
> > Subject: [antlr-interest] Fwd: Rule precedence works differently when
> > using a predicate?
> >
> > Just a little bump, in case it got buried under some of the newer
> > posts.
> > And in case my previous grammar wasn't entirely clear, the following
> > grammar:
> >
> > grammar T;
> >
> > @lexer::members {
> >   private boolean test() {
> >     return true;
> >   }
> > }
> >
> > parse
> >   :  KEY EOF
> >   ;
> >
> > KEY
> >   :  'key'
> >   ;
> >
> > ANY
> >   :  ({test()}?=> . )+
> >   ;
> >
> >
> > with the test class:
> >
> > import org.antlr.runtime.*;
> >
> > public class Main {
> >   public static void main(String[] args) throws Exception {
> >     TLexer lexer = new TLexer(new ANTLRStringStream("key"));
> >     TParser parser = new TParser(new CommonTokenStream(lexer));
> >     parser.parse();
> >   }
> > }
> >
> >
> > Produces the following error:
> >
> > line 1:0 mismatched input 'key' expecting KEY
> >
> >
> > In other words, 'key' is being tokenized as ANY instead of KEY.
> > Is this expected behavior or a bug? And if it's expected behavior,
> > could someone point me to the documentation (book) or wiki-link that
> > explains this?
> >
> > Cheers & regards,
> >
> > Bart.
> >
> > -------------------
> >
> > From: Bart Kiers <bkiers at gmail.com>
> > Date: Mon, Oct 24, 2011 at 11:46 AM
> > Subject: Rule precedence works differently when using a predicate?
> > To: "antlr-interest at antlr.org interest" <antlr-interest at antlr.org>
> >
> >
> > Hi all,
> >
> > As I understand it, ANTLR's lexer matches rules from top to bottom in
> > the .g grammar file and when two rules match the same number of
> > characters, the one that is defined first has precedence over the later
> > one(s).
> >
> > However, take the following grammar:
> >
> > grammar T;
> >
> > @lexer::members {
> >   private boolean test() {
> >     return true;
> >   }
> > }
> >
> > parse
> >   :  (t=. {System.out.println(tokenNames[$t.type] + " :: " + $t.text);})* EOF
> >   ;
> >
> > KEY
> >   :  'key'
> >   ;
> >
> > ANY
> >   :  ({test()}?=> . )+
> >   ;
> >
> >
> > And the test class:
> >
> > import org.antlr.runtime.*;
> >
> >
> > public class Main {
> >   public static void main(String[] args) throws Exception {
> >     TLexer lexer = new TLexer(new ANTLRStringStream("key"));
> >     TParser parser = new TParser(new CommonTokenStream(lexer));
> >     parser.parse();
> >   }
> > }
> >
> >
> > I'd expected "KEY :: key" to be printed to the console, but
> > "ANY :: key" is printed instead. So the last rule is matched, even
> > though the KEY rule matches the same input and is defined before ANY.
> > Why?
> >
> > Kind regards,
> >
> > Bart.
> >
>
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> > email-address
>
>
>
>

