[antlr-interest] Fwd: Rule precedence works differently when using a predicate?
Bart Kiers
bkiers at gmail.com
Thu Oct 27 14:02:55 PDT 2011
Hi Jim, others,
Sorry, but I'd appreciate it if you (or someone else) could answer my
question in a bit more detail, because I really don't understand what you
mean (Jim). You say `.+` matches forever, but in my example there is a
predicate in front of the `.` that stops it from matching forever, as you can
see for yourself. The input "aaaBaa" is tokenized into 3 separate tokens,
"aaa", "B" and "aa", and _not_ into a single token by the rule that contains
the `.+` and the predicate. Your last comment seems to imply that "aaaBaa"
will be tokenized as a single token (which, again, is not the case).
My question therefore remains the same: why are "aaa" and "aa" from the input
"aaaBaa" tokenized as ANY_EXEPT_B instead of MANY_A, when MANY_A is defined
before ANY_EXEPT_B and matches exactly the same number of characters?
To me, it's as if the input "while" were matched by the ID rule instead of
the WHILE rule in:
WHILE : 'while';
ID : 'a'..'z'+;
(which is not the case, of course!)
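The tie-breaking policy this thread assumes (the longest match wins and, on equal length, the rule defined first wins) can be modelled in a few lines of Java. This is a minimal sketch, not ANTLR's generated lexer: the class `TieBreakModel`, its `Rule` type and the matcher methods are hypothetical names introduced for illustration, and predicates are deliberately left out.

```java
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical model (not ANTLR-generated code) of the policy described in
// this thread: try every token rule at the current position, keep the longest
// match, and on a tie keep the rule defined first. Predicates are ignored.
public class TieBreakModel {

    // A token rule: its name plus a matcher that returns how many characters
    // the rule matches at a given position (0 means "no match").
    static final class Rule {
        final String name;
        final BiFunction<String, Integer, Integer> matcher;

        Rule(String name, BiFunction<String, Integer, Integer> matcher) {
            this.name = name;
            this.matcher = matcher;
        }
    }

    // WHILE : 'while';
    static int matchWhile(String s, int pos) {
        return s.startsWith("while", pos) ? 5 : 0;
    }

    // ID : 'a'..'z'+;
    static int matchId(String s, int pos) {
        int i = pos;
        while (i < s.length() && s.charAt(i) >= 'a' && s.charAt(i) <= 'z') {
            i++;
        }
        return i - pos;
    }

    // Rules listed in grammar order: WHILE is defined before ID.
    static final List<Rule> RULES = List.of(
            new Rule("WHILE", TieBreakModel::matchWhile),
            new Rule("ID", TieBreakModel::matchId));

    // Returns the name of the winning rule at pos, or null if no rule matches.
    static String pick(List<Rule> rules, String s, int pos) {
        String best = null;
        int bestLen = 0;
        for (Rule r : rules) {
            int len = r.matcher.apply(s, pos);
            // Only a strictly longer match replaces the current winner, so an
            // equal-length match keeps the rule that was defined earlier.
            if (len > bestLen) {
                best = r.name;
                bestLen = len;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(pick(RULES, "while", 0));  // WHILE: both match 5 chars, WHILE is first
        System.out.println(pick(RULES, "whilex", 0)); // ID: 6 chars beats WHILE's 5
    }
}
```

Under this model "while" goes to WHILE, and by the same reasoning "aaa" would go to MANY_A rather than ANY_EXEPT_B; the behaviour reported in this thread differs once the later rule carries a gated predicate.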
Regards,
Bart.
On Thu, Oct 27, 2011 at 10:34 PM, Jim Idle <jimi at temporal-wave.com> wrote:
> .+ matches forever
>
> Jim
>
> *From:* Bart Kiers [mailto:bkiers at gmail.com]
> *Sent:* Thursday, October 27, 2011 12:22 PM
> *To:* Jim Idle
> *Subject:* Re: [antlr-interest] Fwd: Rule precedence works differently
> when using a predicate?
>
> On Thu, Oct 27, 2011 at 8:54 PM, Jim Idle <jimi at temporal-wave.com> wrote:
>
> As I said earlier, you need more predicates:
>
> Sorry Jim, I hadn't noticed that you had already replied to my message below.
>
> But you also need to avoid .+, which essentially matches anything
> once it is triggered.
>
> Err, no, not with a predicate, AFAIK (see the rule ANY_EXEPT_B in my
> example below, which does not match just anything).
>
> Try something like this:
>
> fragment KEY : ;
>
> ANY
>  : ({!test()}?=> 'KEY')
>  | ({test()}?=> . )
>  ;
>
> But once you take out the .+, it might just work as it was anyway.
>
> Jim
>
> Thanks for your suggestion, but I know how to make it work. My question was
> more about why, when two rules match the same number of characters, the rule
> defined later in the grammar is used to create the token.
>
> Let me give another example grammar:
>
> grammar T;
>
> @parser::members {
>   public static void main(String[] args) throws Exception {
>     TLexer lexer = new TLexer(new ANTLRStringStream("aaaBaa"));
>     TParser parser = new TParser(new CommonTokenStream(lexer));
>     parser.parse();
>   }
> }
>
> @lexer::members {
>   private boolean noBAhead() {
>     return input.LA(1) != 'B';
>   }
> }
>
> parse
>  : (t=. {System.out.printf("\%-15s \%s\n", tokenNames[$t.type],
>          $t.text);})+ EOF
>  ;
>
> MANY_A
>  : 'a'+
>  ;
>
> B
>  : 'B'
>  ;
>
> ANY_EXEPT_B
>  : ({noBAhead()}?=> . )+
>  ;
>
> If you run the TParser class, you will see the following output when
> parsing "aaaBaa":
>
> ANY_EXEPT_B     aaa
> B               B
> ANY_EXEPT_B     aa
>
> I.e., although the rule MANY_A also matches both "aaa" and "aa", they are
> tokenized as ANY_EXEPT_B, whereas I expected the rule defined first
> (MANY_A) to match them.
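To make the comparison concrete, here is a hand-written Java approximation of the three lexer rules (not ANTLR output; `MatchLengths`, `lengthsAt` and the per-rule methods are hypothetical names). It only reports how many characters each rule could match at each token start in "aaaBaa", assuming the gate `noBAhead()` is re-checked before every iteration of the `( ... )+` loop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hand-written approximation (not ANTLR output) of the three lexer rules in
// grammar T. It only reports how many characters each rule could match at
// each token start; it does not decide between the rules.
public class MatchLengths {

    // MANY_A : 'a'+ ;
    static int manyA(String s, int pos) {
        int i = pos;
        while (i < s.length() && s.charAt(i) == 'a') {
            i++;
        }
        return i - pos;
    }

    // B : 'B' ;
    static int b(String s, int pos) {
        return pos < s.length() && s.charAt(pos) == 'B' ? 1 : 0;
    }

    // ANY_EXEPT_B : ({noBAhead()}?=> . )+ ;
    // Assumption: the gate noBAhead() (LA(1) != 'B') is checked before every
    // iteration, so the loop stops in front of a 'B' or at end of input.
    static int anyExceptB(String s, int pos) {
        int i = pos;
        while (i < s.length() && s.charAt(i) != 'B') {
            i++;
        }
        return i - pos;
    }

    // Match length of every rule at pos, in grammar order.
    static Map<String, Integer> lengthsAt(String s, int pos) {
        Map<String, Integer> lengths = new LinkedHashMap<>();
        lengths.put("MANY_A", manyA(s, pos));
        lengths.put("B", b(s, pos));
        lengths.put("ANY_EXEPT_B", anyExceptB(s, pos));
        return lengths;
    }

    public static void main(String[] args) {
        String input = "aaaBaa";
        int pos = 0;
        while (pos < input.length()) {
            Map<String, Integer> lengths = lengthsAt(input, pos);
            int longest = 0;
            for (int len : lengths.values()) {
                longest = Math.max(longest, len);
            }
            if (longest == 0) {
                break; // no rule matches here; a real lexer would report an error
            }
            System.out.println(input.substring(pos, pos + longest) + " -> " + lengths);
            pos += longest;
        }
    }
}
```

For the first token it prints `aaa -> {MANY_A=3, B=0, ANY_EXEPT_B=3}`: MANY_A and ANY_EXEPT_B tie at three characters, so a "first rule wins" policy would choose MANY_A, whereas the output above shows ANY_EXEPT_B.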
>
> Regards,
>
> Bart.
>
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Bart Kiers
> > Sent: Thursday, October 27, 2011 10:56 AM
> > To: antlr-interest at antlr.org interest
> > Subject: [antlr-interest] Fwd: Rule precedence works differently when
> > using a predicate?
> >
> > Just a little bump, in case it got buried under some of the newer
> > posts.
> > And in case my previous grammar wasn't entirely clear, the following
> > grammar:
> >
> > grammar T;
> >
> > @lexer::members {
> > private boolean test() {
> > return true;
> > }
> > }
> >
> > parse
> > : KEY EOF
> > ;
> >
> > KEY
> > : 'key'
> > ;
> >
> > ANY
> > : ({test()}?=> . )+
> > ;
> >
> >
> > with the test class:
> >
> > import org.antlr.runtime.*;
> >
> > public class Main {
> > public static void main(String[] args) throws Exception {
> > TLexer lexer = new TLexer(new ANTLRStringStream("key"));
> > TParser parser = new TParser(new CommonTokenStream(lexer));
> > parser.parse();
> > }
> > }
> >
> >
> > produces the following error:
> >
> > line 1:0 mismatched input 'key' expecting KEY
> >
> >
> > In other words, 'key' is being tokenized as ANY instead of KEY.
> > Is this expected behavior or a bug? And if it's expected behavior,
> > could someone point me to the documentation (book) or wiki-link that
> > explains this?
> >
> > Cheers & regards,
> >
> > Bart.
> >
> > -------------------
> >
> > From: Bart Kiers <bkiers at gmail.com>
> > Date: Mon, Oct 24, 2011 at 11:46 AM
> > Subject: Rule precedence works differently when using a predicate?
> > To: "antlr-interest at antlr.org interest" <antlr-interest at antlr.org>
> >
> >
> > Hi all,
> >
> > As I understand it, ANTLR's lexer matches rules from top to bottom in
> > the .g grammar file, and when two rules match the same number of
> > characters, the one that is defined first has precedence over the later
> > one(s).
> >
> > However, take the following grammar:
> >
> > grammar T;
> >
> > @lexer::members {
> > private boolean test() {
> > return true;
> > }
> > }
> >
> > parse
> > : (t=. {System.out.println(tokenNames[$t.type] + " :: " +
> > $t.text);})* EOF
> > ;
> >
> > KEY
> > : 'key'
> > ;
> >
> > ANY
> > : ({test()}?=> . )+
> > ;
> >
> >
> > And the test class:
> >
> > import org.antlr.runtime.*;
> >
> >
> > public class Main {
> > public static void main(String[] args) throws Exception {
> > TLexer lexer = new TLexer(new ANTLRStringStream("key"));
> > TParser parser = new TParser(new CommonTokenStream(lexer));
> > parser.parse();
> > }
> > }
> >
> >
> > I'd expected "KEY :: key" to be printed to the console; however, "ANY :: key"
> > is printed instead. So the last rule is matched, even though the KEY rule
> > also matches the same input and is defined before ANY. Why?
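For the KEY/ANY grammar, a similar hypothetical sketch (again not ANTLR output; `GatedLoop`, `gatedPlus` and `key` are illustrative names) shows how many characters each rule can match. Because `test()` always returns true, the gated loop in ANY never stops early: on "key" it ties with KEY at three characters, and on longer input it keeps consuming, which may be what the ".+ matches forever" remark refers to.

```java
import java.util.function.IntPredicate;

// Hypothetical approximation (not ANTLR output) of
//   KEY : 'key' ;
//   ANY : ({test()}?=> . )+ ;
// Each method returns how many characters the rule can match at pos.
public class GatedLoop {

    // KEY : 'key' ;
    static int key(String s, int pos) {
        return s.startsWith("key", pos) ? 3 : 0;
    }

    // ( {gate}?=> . )+ : consume one character at a time while the gate,
    // evaluated on the next character (LA(1)), holds.
    static int gatedPlus(String s, int pos, IntPredicate gate) {
        int i = pos;
        while (i < s.length() && gate.test(s.charAt(i))) {
            i++;
        }
        return i - pos;
    }

    public static void main(String[] args) {
        // Mirrors test() in the grammar, which ignores the input and always returns true.
        IntPredicate test = c -> true;
        for (String input : new String[] {"key", "key key"}) {
            System.out.println(input + ": KEY=" + key(input, 0)
                    + ", ANY=" + gatedPlus(input, 0, test));
        }
    }
}
```

It prints `key: KEY=3, ANY=3` and `key key: KEY=3, ANY=7`.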
> >
> > Kind regards,
> >
> > Bart.
> >
>
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address