From kferrio at gmail.com Fri Jan 1 11:10:54 2010 From: kferrio at gmail.com (Kyle Ferrio) Date: Fri, 1 Jan 2010 12:10:54 -0700 Subject: [antlr-interest] Repost: ANTLRworks: Why do these rules behave differently in the embedded interpreter? Message-ID: <4608cec11001011110w6f5d7401pc8f0f9a5a730e566@mail.gmail.com> Hi, I originally posted the question below on 13 December. I'm guessing I didn't get any replies because it rolled off the end of everyone's inbox during the holiday seasons. So please excuse the repost; I'd be grateful if someone could tell me whether I'm on the right track. Since posting this question, I have observed similar (not identical) behavior in the ANTLR IDE for Eclipse. My guess (please confirm or debunk) is that the built-in interpreters build the concrete syntax tree by (correctly) pursuing the first viable alternative at each decision point but (unfortunately) not necessarily rewinding the input stream upon encountering an exception. Since posting this question, at least one other person has independently encountered the same problem, in connection with Scott Stanchfield's excellent ANTLR 3 video tutorials [ http://javadude.com/articles/antlr3xtut/index.html ]. I've been using ANTLR for a little over a year, almost exclusively by running the ANTLR tool from teh command line. I'm just a CLI guy. So I'm encountering questions with ANTLRworks perhaps later than I should. Now, here's my previous post, with new comments indicated in square brackets: This question is so rudimentary that I am almost embarrassed to ask. But since I almost never try to use ANTLRWorks for my parsers, I'll risk injuring my pride in exchange for learning something. If I paste the Expr.g *verbatim* from http://www.antlr.org/works/help/tutorial/content/Expr.g into ANTLRWorks 1.3.1 and feed it the following test input: 3+1 3-1 both run (via the Run menu) fine and produce the expected numerical outputs. But for the same test input, the ANTLRWorks interpreter produces the expected parse tree for only 3+1 and gives a MisMatchedTokenException on the '-' in 3-1. If I reverse the '+' and '-' alternatives in rule expr, the results are also reversed: it's the second alternative that goes bad in the ANTLRWorks interpreter. Thinking this might have something to do with the embedded actions which the interpreter does not understand, I stripped them all out. That leaves us with the following rule, for which the interpreter runs without error on our test input: expr : multExpr ( ( '+' multExpr | '-' multExpr ) )* ; [This is potentially ambiguous. Does a token bind more tightly to another token, or to the binary operator '|' for alternatives? Yes, we know the official ANTLR answer, but I'm questioning my understanding of the specific implementation embodied in ANTLRworks. See next rule.] So I figured [maybe wrongly?] I was right about actions causing problems. But wait. Let's dig deeper. This second rule expr : multExpr ( ( '+' multExpr ) | ( '-' multExpr ) )* ; works in the interpreter as expected for the first alternative (used for 3+1) but produces a MisMatchedTokenException for the second alternative (used for 3-1). And better yet, this third rule expr : multExpr ( ( ( '+' multExpr ) | ( '-' multExpr ) ) )* ; works great in the interpreter for both 3+1 and 3-1, just like the first rule does. All three rules actually run (from the Run menu) as expected. Of course, running them isn't very interesting with the actions stripped out, but they do run without error. So I suspect that they would all produce equally viable parsers outside ANTLRWorks, but I have not checked. Have I stumbled onto an issue with the interpreter embedded in ANTLRWorks, or have I done something silly? (Or both?) Thanks [and Happy New Year], Kyle From jimi at temporal-wave.com Fri Jan 1 11:20:01 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Fri, 01 Jan 2010 11:20:01 -0800 Subject: [antlr-interest] Repost: ANTLRworks: Why do these rules behave differently in the embedded interpreter? In-Reply-To: <4608cec11001011110w6f5d7401pc8f0f9a5a730e566@mail.gmail.com> Message-ID: <02af690b17664b429cb288d6615f2638@temporal-wave.com> The interpreter is just a quick testing device and is easily fooled by grammar rules, use the debugger and not the interpreter and all will be fine. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Kyle Ferrio > Sent: Friday, January 01, 2010 11:11 AM > To: antlr-interest at antlr.org > Subject: [antlr-interest] Repost: ANTLRworks: Why do these rules behave > differently in the embedded interpreter? > > Hi, > > I originally posted the question below on 13 December. I'm guessing I > didn't get any replies because it rolled off the end of everyone's > inbox during the holiday seasons. So please excuse the repost; I'd be > grateful if someone could tell me whether I'm on the right track. > Since posting this question, I have observed similar (not identical) > behavior in the ANTLR IDE for Eclipse. My guess (please confirm or > debunk) is that the built-in interpreters build the concrete syntax > tree by (correctly) pursuing the first viable alternative at each > decision point but (unfortunately) not necessarily rewinding the input > stream upon encountering an exception. Since posting this question, > at least one other person has independently encountered the same > problem, in connection with Scott Stanchfield's excellent ANTLR 3 > video tutorials [ http://javadude.com/articles/antlr3xtut/index.html > ]. I've been using ANTLR for a little over a year, almost exclusively > by running the ANTLR tool from teh command line. I'm just a CLI guy. > So I'm encountering questions with ANTLRworks perhaps later than I > should. > > Now, here's my previous post, with new comments indicated in square > brackets: > > This question is so rudimentary that I am almost embarrassed to ask. > But since I almost never try to use ANTLRWorks for my parsers, I'll > risk injuring my pride in exchange for learning something. > > If I paste the Expr.g *verbatim* from > http://www.antlr.org/works/help/tutorial/content/Expr.g into > ANTLRWorks 1.3.1 and feed it the following test input: > > 3+1 > 3-1 > > both run (via the Run menu) fine and produce the expected numerical > outputs. But for the same test input, the ANTLRWorks interpreter > produces the expected parse tree for only 3+1 and gives a > MisMatchedTokenException on the '-' in 3-1. If I reverse the '+' and > '-' alternatives in rule expr, the results are also reversed: it's the > second alternative that goes bad in the ANTLRWorks interpreter. > > Thinking this might have something to do with the embedded actions > which the interpreter does not understand, I stripped them all out. > That leaves us with the following rule, for which the interpreter runs > without error on our test input: > > expr > : multExpr ( ( '+' multExpr | '-' multExpr ) )* > ; > > [This is potentially ambiguous. Does a token bind more tightly to > another token, or to the binary operator '|' for alternatives? Yes, > we know the official ANTLR answer, but I'm questioning my > understanding of the specific implementation embodied in ANTLRworks. > See next rule.] > > So I figured [maybe wrongly?] I was right about actions causing > problems. But wait. Let's dig deeper. This second rule > > expr > : multExpr ( ( '+' multExpr ) | ( '-' multExpr ) )* > ; > > works in the interpreter as expected for the first alternative (used > for 3+1) but produces a MisMatchedTokenException for the second > alternative (used for 3-1). > > And better yet, this third rule > > expr > : multExpr ( ( ( '+' multExpr ) | ( '-' multExpr ) ) )* > ; > > works great in the interpreter for both 3+1 and 3-1, just like the > first rule does. > > All three rules actually run (from the Run menu) as expected. Of > course, running them isn't very interesting with the actions stripped > out, but they do run without error. So I suspect that they would all > produce equally viable parsers outside ANTLRWorks, but I have not > checked. Have I stumbled onto an issue with the interpreter embedded > in ANTLRWorks, or have I done something silly? (Or both?) > > Thanks [and Happy New Year], > Kyle > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From david-sarah at jacaranda.org Fri Jan 1 11:45:22 2010 From: david-sarah at jacaranda.org (David-Sarah Hopwood) Date: Fri, 01 Jan 2010 19:45:22 +0000 Subject: [antlr-interest] Repost: ANTLRworks: Why do these rules behave differently in the embedded interpreter? In-Reply-To: <02af690b17664b429cb288d6615f2638@temporal-wave.com> References: <02af690b17664b429cb288d6615f2638@temporal-wave.com> Message-ID: <4B3E50D2.6030301@jacaranda.org> Jim Idle wrote: > The interpreter is just a quick testing device and is easily fooled by grammar rules, use the debugger and not the interpreter and all will be fine. Yes, but what Kyle pointed out seems like an obvious bug in the interpreter in a case that it is supposed to be able to handle. Either bugs like this should be fixed, or there is no point in having the interpreter and all and it should be removed, with its functionality being replaced by the debugger. >> -----Original Message----- >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- >> bounces at antlr.org] On Behalf Of Kyle Ferrio >> Sent: Friday, January 01, 2010 11:11 AM >> To: antlr-interest at antlr.org >> Subject: [antlr-interest] Repost: ANTLRworks: Why do these rules behave >> differently in the embedded interpreter? >> [...] >> That leaves us with the following rule, for which the interpreter runs >> without error on our test input: >> >> expr >> : multExpr ( ( '+' multExpr | '-' multExpr ) )* >> ; >> >> [This is potentially ambiguous. Does a token bind more tightly to >> another token, or to the binary operator '|' for alternatives? Yes, >> we know the official ANTLR answer, but I'm questioning my >> understanding of the specific implementation embodied in ANTLRworks. >> See next rule.] >> >> So I figured [maybe wrongly?] I was right about actions causing >> problems. But wait. Let's dig deeper. This second rule >> >> expr >> : multExpr ( ( '+' multExpr ) | ( '-' multExpr ) )* >> ; >> >> works in the interpreter as expected for the first alternative (used >> for 3+1) but produces a MisMatchedTokenException for the second >> alternative (used for 3-1). >> >> And better yet, this third rule >> >> expr >> : multExpr ( ( ( '+' multExpr ) | ( '-' multExpr ) ) )* >> ; >> >> works great in the interpreter for both 3+1 and 3-1, just like the >> first rule does. -- David-Sarah Hopwood ? http://davidsarah.livejournal.com -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 292 bytes Desc: OpenPGP digital signature Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20100101/3721e965/attachment.bin From parrt at cs.usfca.edu Fri Jan 1 12:31:37 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Fri, 1 Jan 2010 12:31:37 -0800 Subject: [antlr-interest] Repost: ANTLRworks: Why do these rules behave differently in the embedded interpreter? In-Reply-To: <4B3E50D2.6030301@jacaranda.org> References: <02af690b17664b429cb288d6615f2638@temporal-wave.com> <4B3E50D2.6030301@jacaranda.org> Message-ID: <6C54C56B-42D4-44D2-8994-77157A5BB472@cs.usfca.edu> On Jan 1, 2010, at 11:45 AM, David-Sarah Hopwood wrote: > Jim Idle wrote: >> The interpreter is just a quick testing device and is easily fooled by grammar rules, use the debugger and not the interpreter and all will be fine. > > Yes, but what Kyle pointed out seems like an obvious bug in the > interpreter in a case that it is supposed to be able to handle. > > Either bugs like this should be fixed, or there is no point in having > the interpreter and all and it should be removed, with its functionality > being replaced by the debugger. yup. it's on my to-do list. T From parrt at cs.usfca.edu Fri Jan 1 13:02:15 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Fri, 1 Jan 2010 13:02:15 -0800 Subject: [antlr-interest] change to list text/html processing Message-ID: Hi, Graham W pointed out that "there has been a sharp increase in HTML-only emails this past year", which yield empty messages with links to an attachment. I think we should auto convert those to text. I.e., should Mailman convert text/html parts to plain text? This conversion happens after MIME attachments have been stripped. I'll be turning on filtering and who knows what else this will mess up. Anybody care if i try this feature out? Ter From kferrio at gmail.com Fri Jan 1 13:58:39 2010 From: kferrio at gmail.com (Kyle Ferrio) Date: Fri, 1 Jan 2010 14:58:39 -0700 Subject: [antlr-interest] Repost: ANTLRworks: Why do these rules behave differently in the embedded interpreter? In-Reply-To: <6C54C56B-42D4-44D2-8994-77157A5BB472@cs.usfca.edu> References: <02af690b17664b429cb288d6615f2638@temporal-wave.com> <4B3E50D2.6030301@jacaranda.org> <6C54C56B-42D4-44D2-8994-77157A5BB472@cs.usfca.edu> Message-ID: <4608cec11001011358n116b0135q1fb8442fb827ecfb@mail.gmail.com> Thanks, folks. David-Sarah makes a good point, which I would make perhaps just a bit differently, and this is I see the interpreter as a kind of learning tool, a crutch if you will. If it goes wonky in a way that makes it obvious to a knave that it's at fault, fine. But if it goes wonky in a way which causes the student to wonder, "Is this me, or is this the tool?" then learning is impeded at a critical juncture, even if the ultimate resolution (assuming a persistent student) does produce deeper insight. Note: I manage a commercial software development group serving highly specialized engineering customers. More than one customer has told me that he or she would rather have "a tool that fails in an obvious way" more often than "a tool which fails in an ambiguous way less often." {Quotes added to assist parsing. :) } It's not about being right or wrong. It's about knowing when you can trust your tools, and when you shouldn't. Kyle On Fri, Jan 1, 2010 at 1:31 PM, Terence Parr wrote: > > On Jan 1, 2010, at 11:45 AM, David-Sarah Hopwood wrote: > >> Jim Idle wrote: >>> The interpreter is just a quick testing device and is easily fooled by grammar rules, use the debugger and not the interpreter and all will be fine. >> >> Yes, but what Kyle pointed out seems like an obvious bug in the >> interpreter in a case that it is supposed to be able to handle. >> >> Either bugs like this should be fixed, or there is no point in having >> the interpreter and all and it should be removed, with its functionality >> being replaced by the debugger. > > yup. it's on my to-do list. > T > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > From parrt at cs.usfca.edu Fri Jan 1 16:28:35 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Fri, 1 Jan 2010 16:28:35 -0800 Subject: [antlr-interest] Repost: ANTLRworks: Why do these rules behave differently in the embedded interpreter? In-Reply-To: <4608cec11001011358n116b0135q1fb8442fb827ecfb@mail.gmail.com> References: <02af690b17664b429cb288d6615f2638@temporal-wave.com> <4B3E50D2.6030301@jacaranda.org> <6C54C56B-42D4-44D2-8994-77157A5BB472@cs.usfca.edu> <4608cec11001011358n116b0135q1fb8442fb827ecfb@mail.gmail.com> Message-ID: yeah, the interp has never been quite right...just haven't had time to fix. now that book is done (printed just about now I guess) I can get back to ANTLR. Ter On Jan 1, 2010, at 1:58 PM, Kyle Ferrio wrote: > Thanks, folks. > > David-Sarah makes a good point, which I would make perhaps just a bit > differently, and this is I see the interpreter as a kind of learning > tool, a crutch if you will. If it goes wonky in a way that makes it > obvious to a knave that it's at fault, fine. But if it goes wonky in > a way which causes the student to wonder, "Is this me, or is this the > tool?" then learning is impeded at a critical juncture, even if the > ultimate resolution (assuming a persistent student) does produce > deeper insight. > > Note: I manage a commercial software development group serving highly > specialized engineering customers. More than one customer has told me > that he or she would rather have "a tool that fails in an obvious way" > more often than "a tool which fails in an ambiguous way less often." > {Quotes added to assist parsing. :) } It's not about being right or > wrong. It's about knowing when you can trust your tools, and when you > shouldn't. > > Kyle > > > > On Fri, Jan 1, 2010 at 1:31 PM, Terence Parr > wrote: >> >> On Jan 1, 2010, at 11:45 AM, David-Sarah Hopwood wrote: >> >>> Jim Idle wrote: >>>> The interpreter is just a quick testing device and is easily >>>> fooled by grammar rules, use the debugger and not the interpreter >>>> and all will be fine. >>> >>> Yes, but what Kyle pointed out seems like an obvious bug in the >>> interpreter in a case that it is supposed to be able to handle. >>> >>> Either bugs like this should be fixed, or there is no point in >>> having >>> the interpreter and all and it should be removed, with its >>> functionality >>> being replaced by the debugger. >> >> yup. it's on my to-do list. >> T >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address >> From parrt at cs.usfca.edu Fri Jan 1 16:52:54 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Fri, 1 Jan 2010 16:52:54 -0800 Subject: [antlr-interest] change to list text/html processing In-Reply-To: References: Message-ID: <51F86334-6BCE-4710-8CB2-C745206023C9@cs.usfca.edu> Ok, I just ran: for f in */*.html; do sed -i 's///g' $f; done to fix all the old port numbers. that helps as we can see the attachments from before I changed it. I'll update mailman too. Thanks to Graham for fnding this and making correct suggestions. Ter On Jan 1, 2010, at 1:02 PM, Terence Parr wrote: > Hi, Graham W pointed out that "there has been a sharp increase in > HTML-only emails this past year", which yield empty messages with > links to an attachment. > > I think we should auto convert those to text. I.e., should Mailman > convert text/html parts to plain text? This conversion happens after > MIME attachments have been stripped. I'll be turning on filtering > and who knows what else this will mess up. Anybody care if i try > this feature out? > > Ter > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From parrt at cs.usfca.edu Fri Jan 1 17:07:13 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Fri, 1 Jan 2010 17:07:13 -0800 Subject: [antlr-interest] ok, set mailman to convert text/html to text/plain Message-ID: I'm also trying to send this as HTML by using a bold font. This is MONACO. Ter From parrt2000 at yahoo.com Fri Jan 1 17:13:21 2010 From: parrt2000 at yahoo.com (Terence Parr) Date: Fri, 1 Jan 2010 17:13:21 -0800 (PST) Subject: [antlr-interest] testing from yahoo Message-ID: <769637.13438.qm@web81007.mail.mud.yahoo.com> I hope it sends some complicated stuff to the list. 1. ________________________________ a 2. b 3. c Ter From parrt at cs.usfca.edu Fri Jan 1 17:16:22 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Fri, 1 Jan 2010 17:16:22 -0800 Subject: [antlr-interest] images scrubbed out test Message-ID: Hmm...missing my images i think. here's a picture of booboo the kitten: -------------- next part -------------- Ter From parrt at cs.usfca.edu Fri Jan 1 17:17:40 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Fri, 1 Jan 2010 17:17:40 -0800 Subject: [antlr-interest] images scrubbed out test In-Reply-To: References: Message-ID: <1AC859E1-A4A1-416D-ADC2-F0CE397B2363@cs.usfca.edu> ah ha! they we're removed. ok, trying again. here's booboo: -------------- next part -------------- A non-text attachment was scrubbed... Name: booboo-kitten.jpg Type: image/jpeg Size: 17021 bytes Desc: not available Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20100101/05b14eec/attachment.jpg -------------- next part -------------- Ter On Jan 1, 2010, at 5:16 PM, Terence Parr wrote: > Hmm...missing my images i think. here's a picture of booboo the > kitten: > > > > > Ter > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From parrt at cs.usfca.edu Fri Jan 1 17:23:15 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Fri, 1 Jan 2010 17:23:15 -0800 Subject: [antlr-interest] and more attachments (I have to specify mime types to allow) Message-ID: <5377988E-9C02-4072-A67F-B02B3C93431C@cs.usfca.edu> Test java text attachment and PDF (same image again). Ter -------------- next part -------------- A non-text attachment was scrubbed... Name: MyForm.java Type: application/octet-stream Size: 305 bytes Desc: not available Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20100101/c33641b1/attachment.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: booboo-kitten.pdf Type: application/pdf Size: 19228 bytes Desc: not available Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20100101/c33641b1/attachment.pdf From kferrio at gmail.com Fri Jan 1 17:53:24 2010 From: kferrio at gmail.com (Kyle Ferrio) Date: Fri, 1 Jan 2010 18:53:24 -0700 Subject: [antlr-interest] Repost: ANTLRworks: Why do these rules behave differently in the embedded interpreter? In-Reply-To: References: <02af690b17664b429cb288d6615f2638@temporal-wave.com> <4B3E50D2.6030301@jacaranda.org> <6C54C56B-42D4-44D2-8994-77157A5BB472@cs.usfca.edu> <4608cec11001011358n116b0135q1fb8442fb827ecfb@mail.gmail.com> Message-ID: <4608cec11001011753w51e7ad7aoab78301f146fda96@mail.gmail.com> On Fri, Jan 1, 2010 at 5:28 PM, Terence Parr wrote: > yeah, the interp has never been quite right...just haven't had time to > fix. now that book is done (printed just about now I guess) I can get > back to ANTLR. > Ter Cool. Having not looked at the code, I might have guessed that Jean Bovet was the guy to talk to about the AW interp. I'm glad I posted to the list. Yep, looks like your book will ship at just the right time to maximize the number of questions you get during midterms. lol. There is no rest for the creative mind. I had a crazy (read: probably deeply flawed) idea while playing with ANTLRWorks and the ANTLR IDE. As I tried to black-box about what might be going on inside the interp, I thought about how the java-targeted output always ran fine. And so I began to appreciate all over again some of the challenges faced by anyone trying to write a fault-tolerant interpreter. I realized that much of the work is probably redundant with what has already been done for the target codegen. So, I thought, why not just build and run the target? Sure, codegen takes a second, and compiling to bytecode takes another second. So what? Small price for knowing it's right. Ok, but what about drawing concrete syntax trees? No problem, just insert actions. Ok, but what about debugging with single stepping and peeking into state variables? No problem, just insert a callback to the GUI at each decision point. In fact, it might even be possible to make predicates work in such an interp, by either "gating off" the callbacks or just "marking in the debugger" when we're processing a predicate. Sure, an "instrumented parser" may be an ugly way to implement an interp. But if fidelity to the final product is a goal, as in emulators, then speed and beauty may be negotiable. How far out in left field am I? I realize that the objective of this line of reasoning may be to solve a problem outside the intended scope of ANTLRworks. I assumed without justification that the interp would "tell me how my grammar would perform." But that is not at all the same as "being a tool for demonstrating simple cases." So I have no basis for critiquing the interp, and I'm surely not suggesting a course of action. And before I dig a deeper hole for myself, I preemptively apologize for not having time to implement any of this. But maybe there's the germ of a class project in this for someone. Kind Regards, Kyle From antlr at mirality.co.nz Sat Jan 2 02:11:11 2010 From: antlr at mirality.co.nz (Gavin Lambert) Date: Sat, 02 Jan 2010 23:11:11 +1300 Subject: [antlr-interest] Repost: ANTLRworks: Why do these rules behave differently in the embedded interpreter? In-Reply-To: <4608cec11001011753w51e7ad7aoab78301f146fda96@mail.gmail.co m> References: <02af690b17664b429cb288d6615f2638@temporal-wave.com> <4B3E50D2.6030301@jacaranda.org> <6C54C56B-42D4-44D2-8994-77157A5BB472@cs.usfca.edu> <4608cec11001011358n116b0135q1fb8442fb827ecfb@mail.gmail.com> <4608cec11001011753w51e7ad7aoab78301f146fda96@mail.gmail.com> Message-ID: <20100102101120.5DA603418431@www.antlr.org> At 14:53 2/01/2010, Kyle Ferrio wrote: >So, I thought, why not just build and run the target? >Sure, codegen takes a second, and compiling to bytecode takes >another second. So what? Small price for knowing it's right. >Ok, but what about drawing concrete syntax trees? No problem, >just insert actions. Either I'm misinterpreting what you're talking about, or you're describing what the ANTLRWorks debugger already does. >In fact, it might even be possible to make predicates work in >such an interp, by either "gating off" the callbacks or just >"marking in the debugger" when we're processing a predicate. The problem with predicates is that they're arbitrary target language code; ANTLR simply doesn't have enough information to emulate their functionality (for semantic predicates, at least; syntactic predicates could be dealt with correctly). But that's what the Debug Remote feature is for. From kferrio at gmail.com Sat Jan 2 11:25:08 2010 From: kferrio at gmail.com (kferrio at gmail.com) Date: Sat, 2 Jan 2010 19:25:08 +0000 Subject: [antlr-interest] Repost: ANTLRworks: Why do these rules behave differently in the embedded interpreter? Message-ID: <653987599-1262460308-cardhu_decombobulator_blackberry.rim.net-2078986118-@bda428.bisx.prod.on.blackberry> Thanks, Gavin. Sorry to top-post, but I think it might be clearer than multiply interspersed remarks. I think you understood my logic. If anyone is unclear, it's me. I take your point about what the debugger already does, which invites the question 'why does the interp need to be distinct from the debugger?' Your point about remote debugging is also well taken. So maybe my question should be, when would I want the interp? Do you guys really use it? Since I challenged the correctness of the interp, the implication was that it needs fixing. But if it's not the right tool for me in the first place then it does not need fixing. My bad. It's more about expectations (soft requirements) than about correctness. I know of two people in addition to myself who have/had perfectly plausible but perhaps inappropriate expectations of the interpreter. If we are not uniquely misguided then the question may be more than an academic curiosity. I'm sure people who live and breath antlr would not be confused. Occasional users like me may not be so expert. Thanks for setting me straight about the debugger. I'm slow but deliberate. :) Kyle ------Original Message------ From: Gavin Lambert To: Kyle Ferrio To: Terence Parr Cc: ANTLR Interest Mailing List Subject: Re: [antlr-interest] Repost: ANTLRworks: Why do these rules behave differently in the embedded interpreter? Sent: Jan 2, 2010 3:11 AM At 14:53 2/01/2010, Kyle Ferrio wrote: >So, I thought, why not just build and run the target? >Sure, codegen takes a second, and compiling to bytecode takes >another second. So what? Small price for knowing it's right. >Ok, but what about drawing concrete syntax trees? No problem, >just insert actions. Either I'm misinterpreting what you're talking about, or you're describing what the ANTLRWorks debugger already does. >In fact, it might even be possible to make predicates work in >such an interp, by either "gating off" the callbacks or just >"marking in the debugger" when we're processing a predicate. The problem with predicates is that they're arbitrary target language code; ANTLR simply doesn't have enough information to emulate their functionality (for semantic predicates, at least; syntactic predicates could be dealt with correctly). But that's what the Debug Remote feature is for. Sent from my Verizon Wireless BlackBerry From parrt at cs.usfca.edu Mon Jan 4 16:32:35 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Mon, 4 Jan 2010 16:32:35 -0800 Subject: [antlr-interest] Consequence of Bug ANTLR-413 in GUnit In-Reply-To: <200912311445.41271.kaleb.pederson@gmail.com> References: <200912311445.41271.kaleb.pederson@gmail.com> Message-ID: fixed. :) It will go out in next release. thanks, Kaleb. Ter On Dec 31, 2009, at 2:45 PM, Kaleb Pederson wrote: > I just got around to debugging an issue in GUnit's tree walking capabilities only to find that it's caused by a bug I reported previously: > > ANTLR-413. > > The failure of the CommonTreeNodeStream to pass in the adaptor to the tree iterator results in numerous ClassCastException's being thrown in GUnit's tree walking tests. > > Since the fix is: > > - it = new TreeIterator(root); > + it = new TreeIterator(adaptor, root); > > in CommonTreeNodeStream.java... I hope it can be fixed for the next release :). > > Thanks. > > -- > Kaleb Pederson > > Blog - http://kalebpederson.com > Twitter - http://twitter.com/kalebpederson > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From wclodius at los-alamos.net Mon Jan 4 20:49:25 2010 From: wclodius at los-alamos.net (William B. Clodius) Date: Mon, 4 Jan 2010 21:49:25 -0700 Subject: [antlr-interest] Errors associated with target languages Message-ID: I am experimenting with generating parsers and lexers for a complicated grammar using as many available target languages as possible mostly to see how legible the code is as a possible guide to my hobby language syntax. For Java, Python, and C I have no obvious problems. For Delphi I am consistently getting the error (even for a very simple lexer) error(10): internal error: Exprtoken.g : java.lang.IllegalArgumentException: Can't find template actionGate.st; group hierarchy is [Delphi] I have also been experimenting with creating the infrastructure for a target language. I have made a mistake somewhere and am getting a different error for the simple lexer error(10): internal error: Class org.antlr.tool.Grammar has no such attribute: recognizername in template context [outputFile lexer] : java.lang.NoSuchFieldException: recognizername Are there any suggestions as to how to fix these errors? From jp.raven at worldonline.fr Tue Jan 5 06:52:10 2010 From: jp.raven at worldonline.fr (Jean-Pierre LAMBERT) Date: Tue, 05 Jan 2010 15:52:10 +0100 Subject: [antlr-interest] Parser generation takes hours Message-ID: <4B43521A.6000501@worldonline.fr> Hello everybody, I'm currently rewriting a LR parser to be used for ANTLR. As a result, ANTLR works literaly for hours before it outputs errors about my grammar. My work is not finished; I have removed all left-recursions but I still have to do left-factorisations. The problem being that since ANTLR works for hours before I get the errors, it isn't very practical for me to fix the grammar. Do you have any suggestions in this case? What could be done so that ANTLR would take only dozen of minutes? Is there something capital that I missed about ANTLR and LL grammars? How should be written ANTLR rules to avoid such a problem? Thanks in advance, any adice will be welcome. JP From parrt at cs.usfca.edu Tue Jan 5 09:22:24 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Tue, 5 Jan 2010 09:22:24 -0800 Subject: [antlr-interest] Parser generation takes hours In-Reply-To: <4B43521A.6000501@worldonline.fr> References: <4B43521A.6000501@worldonline.fr> Message-ID: <7B529C9C-6516-4DD1-8E78-1A8B518BCAD4@cs.usfca.edu> very strange. antlr has a fail-safe so it cannot do that. what command line options do you use? command line or ANTLWorks? Ter On Jan 5, 2010, at 6:52 AM, Jean-Pierre LAMBERT wrote: > Hello everybody, > > I'm currently rewriting a LR parser to be used for ANTLR. As a result, > ANTLR works literaly for hours before it outputs errors about my > grammar. > > My work is not finished; I have removed all left-recursions but I > still > have to do left-factorisations. The problem being that since ANTLR > works > for hours before I get the errors, it isn't very practical for me to > fix > the grammar. > > Do you have any suggestions in this case? What could be done so that > ANTLR would take only dozen of minutes? Is there something capital > that > I missed about ANTLR and LL grammars? How should be written ANTLR > rules > to avoid such a problem? > > Thanks in advance, any adice will be welcome. > > JP > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From jimi at temporal-wave.com Tue Jan 5 10:47:38 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Tue, 05 Jan 2010 10:47:38 -0800 Subject: [antlr-interest] Errors associated with target languages In-Reply-To: Message-ID: <34aa83daaa8b1247b9c3559e4e725358@temporal-wave.com> I don't think that the Delphi target is being maintained to be honest. Perhaps the original author will comment? For your purposes I think that if you looked at C, Java and C#, then you would have all the information you needed as essentially the generated code will follow the same patterns regardless of the language, but the implementation will be oriented toward the target language. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of William B. Clodius > Sent: Monday, January 04, 2010 8:49 PM > To: antlr-interest at antlr.org > Subject: [antlr-interest] Errors associated with target languages > > I am experimenting with generating parsers and lexers for a complicated > grammar using as many available target languages as possible mostly to > see how legible the code is as a possible guide to my hobby language > syntax. For Java, Python, and C I have no obvious problems. For Delphi > I am consistently getting the error (even for a very simple lexer) > > error(10): internal error: Exprtoken.g : > java.lang.IllegalArgumentException: Can't find template actionGate.st; > group hierarchy is [Delphi] > > I have also been experimenting with creating the infrastructure for a > target language. I have made a mistake somewhere and am getting a > different error for the simple lexer > > error(10): internal error: Class org.antlr.tool.Grammar has no such > attribute: recognizername in template context [outputFile lexer] : > java.lang.NoSuchFieldException: recognizername > > Are there any suggestions as to how to fix these errors? > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From jimi at temporal-wave.com Tue Jan 5 11:04:07 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Tue, 05 Jan 2010 11:04:07 -0800 Subject: [antlr-interest] Parser generation takes hours In-Reply-To: <7B529C9C-6516-4DD1-8E78-1A8B518BCAD4@cs.usfca.edu> Message-ID: Perhaps you could send us your grammar too? You might find that you just need to comment out one or two rules until you get to reworking them. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Terence Parr > Sent: Tuesday, January 05, 2010 9:22 AM > To: Jean-Pierre LAMBERT > Cc: antlr-interest at antlr.org > Subject: Re: [antlr-interest] Parser generation takes hours > > very strange. antlr has a fail-safe so it cannot do that. what > command line options do you use? command line or ANTLWorks? > Ter > On Jan 5, 2010, at 6:52 AM, Jean-Pierre LAMBERT wrote: > > > Hello everybody, > > > > I'm currently rewriting a LR parser to be used for ANTLR. As a > result, > > ANTLR works literaly for hours before it outputs errors about my > > grammar. > > > > My work is not finished; I have removed all left-recursions but I > > still > > have to do left-factorisations. The problem being that since ANTLR > > works > > for hours before I get the errors, it isn't very practical for me to > > fix > > the grammar. > > > > Do you have any suggestions in this case? What could be done so that > > ANTLR would take only dozen of minutes? Is there something capital > > that > > I missed about ANTLR and LL grammars? How should be written ANTLR > > rules > > to avoid such a problem? > > > > Thanks in advance, any adice will be welcome. > > > > JP > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > > Unsubscribe: http://www.antlr.org/mailman/options/antlr- > interest/your-email-address > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From gokul007 at gmail.com Tue Jan 5 22:42:08 2010 From: gokul007 at gmail.com (Gokulakannan Somasundaram) Date: Wed, 6 Jan 2010 12:12:08 +0530 Subject: [antlr-interest] Parser generation takes hours In-Reply-To: <4B43521A.6000501@worldonline.fr> References: <4B43521A.6000501@worldonline.fr> Message-ID: <9362e74e1001052242s192e7ae7u4beef375108e297d@mail.gmail.com> Hi Jean, I faced up with a similar issue, when i tried the migration of a LR parser. But it's definitely because of recursion stuffs. The way i removed is sort of layman stuff, but thought of just informing you. Try to split the grammar into multiple sections(group of rules) and try to add them one-by-one. You don't need to wait till the errors are emitted. As soon as the parser generation takes more than 3-4 mins, just stop the generation. The last section, which resulted in the increase most probably contains the problematic code. Bear with me, if this approach looks very awkward. Thanks, Gokul. On Tue, Jan 5, 2010 at 8:22 PM, Jean-Pierre LAMBERT wrote: > Hello everybody, > > I'm currently rewriting a LR parser to be used for ANTLR. As a result, > ANTLR works literaly for hours before it outputs errors about my grammar. > > My work is not finished; I have removed all left-recursions but I still > have to do left-factorisations. The problem being that since ANTLR works > for hours before I get the errors, it isn't very practical for me to fix > the grammar. > > Do you have any suggestions in this case? What could be done so that > ANTLR would take only dozen of minutes? Is there something capital that > I missed about ANTLR and LL grammars? How should be written ANTLR rules > to avoid such a problem? > > Thanks in advance, any adice will be welcome. > > JP > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: > http://www.antlr.org/mailman/options/antlr-interest/your-email-address > From gokul007 at gmail.com Tue Jan 5 22:46:01 2010 From: gokul007 at gmail.com (Gokulakannan Somasundaram) Date: Wed, 6 Jan 2010 12:16:01 +0530 Subject: [antlr-interest] Resetting the Lexer and Parser in C-Target Message-ID: <9362e74e1001052246x5d1394acg3400f424cc5dff3d@mail.gmail.com> Hi, I have a grammar with close to 1000 rules, because of which the size of the parser in C-Target is close to 8k. I was looking at the parser and it has a function pointer for each of my rule. This portion is not going to change for ever. So i was wondering, if there is a way to reset the parser and re-use it, instead of allocating and initializing it from scratch. I am trying to form something more specific to my project. In the meanwhile, i thought of asking, whether there is a easy way to do the same. Thanks, Gokul. From jimi at temporal-wave.com Tue Jan 5 23:46:03 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Tue, 05 Jan 2010 23:46:03 -0800 Subject: [antlr-interest] Resetting the Lexer and Parser in C-Target In-Reply-To: <9362e74e1001052246x5d1394acg3400f424cc5dff3d@mail.gmail.com> Message-ID: <463de4cfabd70245bece05e1894cb50d@temporal-wave.com> Yes, the next release [of the C runtime] generates a reuse() method for all components of the sequence and reuses all memory allocations. This is a big performance win if you have many inputs to parse. Also, the next release has a universal input stream that deals with UTFxx (with or without BOM) and EBCDIC. This release is a good few weeks away yet though and is tied to ANTLR v3 using ANTLR v3 for the various recognizers. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Gokulakannan Somasundaram > Sent: Tuesday, January 05, 2010 10:46 PM > To: antlr-interest at antlr.org > Subject: [antlr-interest] Resetting the Lexer and Parser in C-Target > > Hi, > I have a grammar with close to 1000 rules, because of which the > size of > the parser in C-Target is close to 8k. I was looking at the parser and > it > has a function pointer for each of my rule. This portion is not going > to > change for ever. So i was wondering, if there is a way to reset the > parser > and re-use it, instead of allocating and initializing it from scratch. > I am > trying to form something more specific to my project. In the meanwhile, > i > thought of asking, whether there is a easy way to do the same. > > Thanks, > Gokul. > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From christian.kihm at googlemail.com Wed Jan 6 02:36:01 2010 From: christian.kihm at googlemail.com (Christian Kihm) Date: Wed, 6 Jan 2010 11:36:01 +0100 Subject: [antlr-interest] each keyword allowed as Identifier Message-ID: Hi, I try to parse a log file which probably was never intented to be parsed. It is an log file of an poker client. My problem is that there are nearly no constraints are existing for playernames. A playername could be a sequens of any charactor of the full unicode range. The only contraints are: min length = 4 max length = 12 no leading or trailing white space white spaces in between are allowed, but never more than one in a row Here are some examples: INPUT: Seat 9: The Player ( ($76 in chips) Where the Playername is "The Player (" INPUT: posts small:: posts small blind $2 Where the Playername is "posts small:" I have no glue how to solve this problem. I already tried some stuff I found in the FAQs like: - syncing to the follow set (Article Custom Syntax Error Recovery) which dosnt work if a token of the follow set is also part of the name - non greedy matching ( .+ to match the name) - a list of all tokens in the rule playername which dosnt work because the playername can consist not just of one token but an sequense of tokens Generelly it must be possible because out ther are severeal commercial tools which are able to parse these log files. So I hope somebody of you has an Idea. Thanks and regards, Christian From ttmrichter at gmail.com Wed Jan 6 03:33:23 2010 From: ttmrichter at gmail.com (Michael Richter) Date: Wed, 6 Jan 2010 19:33:23 +0800 Subject: [antlr-interest] Issue with antlrworks 1.3.1 and JDK 1.6 update 17? Message-ID: I did a recent round of upgrading software on my machines (real and virtual) and somewhere in the process I've got ANTLRworks in unusable shape. (I tried reporting this through the antlr.org web site but it doesn't seem to have taken.) On *every* machine I have access to (both real and virtual, running Windows XP or Linux) I get the following pretty nasty behaviour: 1. *java -jar antlrworks.jar* (I can also use javaw on Windows for a similar, more annoying effect.) 2. *The splash screen pops up briefly.* 3. *The "New Document" dialogue replaces it.* 4. I hit "Cancel" (or alternatively press "Esc" on the keyboard). At this point, no matter the platform, no matter what I try, I have a dead executable until I hit Ctrl+C (or, if I used javaw, I kill it in the task manager). I've tried this on Ubuntu 9.04, on Slackware 13.0 (virtualized), on Windows XP (four different machines, one virtualized) and get this behaviour consistently. Whatever's supposed to happen when I cancel the new document dialogue freezes and can only unfreeze through lethal injection of Ctrl+C. (There are, of course, no messages on the console that could tell me what's going on.) The behaviour on Windows after this if I choose "OK" is acceptable. Up comes the wizard for a new project which works normally and, more importantly, can be cancelled and gets me into the ANTLRworks GUI. It's a bit obnoxious having to go that route, but it works. If I choose to use the wizard everything works as expected. The behaviour on Linux is less acceptable. The new project wizard pops up but the text input focus is on ANTLRworks' editor window and CANNOT be put into the wizard at all on any spot. I have to cancel the wizard to get to the main window (which then works as expected). This also happens if I go File -> New from the main window: I simply cannot get text input into any field of the new project wizard. The last time I did anything with ANTLRworks was v1.3.0 using JDK 1.6 update 16. I did not see this behaviour then at all, so something has happened between then and now. Any advice for debugging this further? From jp.raven at worldonline.fr Wed Jan 6 03:44:35 2010 From: jp.raven at worldonline.fr (Jean-Pierre LAMBERT) Date: Wed, 06 Jan 2010 12:44:35 +0100 Subject: [antlr-interest] Parser generation takes hours In-Reply-To: <7B529C9C-6516-4DD1-8E78-1A8B518BCAD4@cs.usfca.edu> References: <4B43521A.6000501@worldonline.fr> <7B529C9C-6516-4DD1-8E78-1A8B518BCAD4@cs.usfca.edu> Message-ID: <4B4477A3.7050908@worldonline.fr> I'm using command-line. The last time I used these options but they do not seem to change anything : -report -Xmultithreaded -verbose Originally I did not use any of these options. I was just experimenting, I should remove them by now. Le 05/01/2010 18:22, Terence Parr a ?crit : > very strange. antlr has a fail-safe so it cannot do that. what command > line options do you use? command line or ANTLWorks? > Ter > On Jan 5, 2010, at 6:52 AM, Jean-Pierre LAMBERT wrote: > >> Hello everybody, >> >> I'm currently rewriting a LR parser to be used for ANTLR. As a result, >> ANTLR works literaly for hours before it outputs errors about my grammar. >> >> My work is not finished; I have removed all left-recursions but I still >> have to do left-factorisations. The problem being that since ANTLR works >> for hours before I get the errors, it isn't very practical for me to fix >> the grammar. >> >> Do you have any suggestions in this case? What could be done so that >> ANTLR would take only dozen of minutes? Is there something capital that >> I missed about ANTLR and LL grammars? How should be written ANTLR rules >> to avoid such a problem? >> >> Thanks in advance, any adice will be welcome. >> >> JP >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> Unsubscribe: >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > > From jp.raven at worldonline.fr Wed Jan 6 03:47:06 2010 From: jp.raven at worldonline.fr (Jean-Pierre LAMBERT) Date: Wed, 06 Jan 2010 12:47:06 +0100 Subject: [antlr-interest] Parser generation takes hours In-Reply-To: References: Message-ID: <4B44783A.8040409@worldonline.fr> Sorry but I'm unable to send you my grammar. My boss doesn't want this grammar to get out of the company. If I'm able to narrow the problem to a small subset of my grammar I may share it with everybody, however. Le 05/01/2010 20:04, Jim Idle a ?crit : > Perhaps you could send us your grammar too? You might find that you just need to comment out one or two rules until you get to reworking them. > > Jim > >> -----Original Message----- >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- >> bounces at antlr.org] On Behalf Of Terence Parr >> Sent: Tuesday, January 05, 2010 9:22 AM >> To: Jean-Pierre LAMBERT >> Cc: antlr-interest at antlr.org >> Subject: Re: [antlr-interest] Parser generation takes hours >> >> very strange. antlr has a fail-safe so it cannot do that. what >> command line options do you use? command line or ANTLWorks? >> Ter >> On Jan 5, 2010, at 6:52 AM, Jean-Pierre LAMBERT wrote: >> >>> Hello everybody, >>> >>> I'm currently rewriting a LR parser to be used for ANTLR. As a >> result, >>> ANTLR works literaly for hours before it outputs errors about my >>> grammar. >>> >>> My work is not finished; I have removed all left-recursions but I >>> still >>> have to do left-factorisations. The problem being that since ANTLR >>> works >>> for hours before I get the errors, it isn't very practical for me to >>> fix >>> the grammar. >>> >>> Do you have any suggestions in this case? What could be done so that >>> ANTLR would take only dozen of minutes? Is there something capital >>> that >>> I missed about ANTLR and LL grammars? How should be written ANTLR >>> rules >>> to avoid such a problem? >>> >>> Thanks in advance, any adice will be welcome. >>> >>> JP >>> >>> List: http://www.antlr.org/mailman/listinfo/antlr-interest >>> Unsubscribe: http://www.antlr.org/mailman/options/antlr- >> interest/your-email-address >> >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- >> email-address > > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > From jp.raven at worldonline.fr Wed Jan 6 03:52:08 2010 From: jp.raven at worldonline.fr (Jean-Pierre LAMBERT) Date: Wed, 06 Jan 2010 12:52:08 +0100 Subject: [antlr-interest] Parser generation takes hours In-Reply-To: <9362e74e1001052242s192e7ae7u4beef375108e297d@mail.gmail.com> References: <4B43521A.6000501@worldonline.fr> <9362e74e1001052242s192e7ae7u4beef375108e297d@mail.gmail.com> Message-ID: <4B447968.9070806@worldonline.fr> Thank you for the feedback. If find very interesting that basically we've done the same kind of task (translating a LR parser to ANTLR) and then we encounter the same problem doing it. Furthermore it's very encouraging to know that you could overcome it. I have already started to remove parts of the grammar and the problem is still there. Your advice is very helpful, thanks again. JP Le 06/01/2010 07:42, Gokulakannan Somasundaram a ?crit : > Hi Jean, > I faced up with a similar issue, when i tried the migration > of a LR parser. But it's definitely because of recursion stuffs. The > way i removed is sort of layman stuff, but thought of just informing you. > Try to split the grammar into multiple sections(group of > rules) and try to add them one-by-one. You don't need to wait till the > errors are emitted. As soon as the parser generation takes more than 3-4 > mins, just stop the generation. The last section, which resulted in the > increase most probably contains the problematic code. Bear with me, if > this approach looks very awkward. > > Thanks, > Gokul. > > On Tue, Jan 5, 2010 at 8:22 PM, Jean-Pierre LAMBERT > > wrote: > > Hello everybody, > > I'm currently rewriting a LR parser to be used for ANTLR. As a result, > ANTLR works literaly for hours before it outputs errors about my > grammar. > > My work is not finished; I have removed all left-recursions but I still > have to do left-factorisations. The problem being that since ANTLR works > for hours before I get the errors, it isn't very practical for me to fix > the grammar. > > Do you have any suggestions in this case? What could be done so that > ANTLR would take only dozen of minutes? Is there something capital that > I missed about ANTLR and LL grammars? How should be written ANTLR rules > to avoid such a problem? > > Thanks in advance, any adice will be welcome. > > JP > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: > http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > From jp.raven at worldonline.fr Wed Jan 6 06:17:38 2010 From: jp.raven at worldonline.fr (Jean-Pierre LAMBERT) Date: Wed, 06 Jan 2010 15:17:38 +0100 Subject: [antlr-interest] Parser generation takes hours In-Reply-To: <4B447968.9070806@worldonline.fr> References: <4B43521A.6000501@worldonline.fr> <9362e74e1001052242s192e7ae7u4beef375108e297d@mail.gmail.com> <4B447968.9070806@worldonline.fr> Message-ID: <4B449B82.5050102@worldonline.fr> After investigating the problem further, it looks like I have rounded up the faulty rules. In my grammar I have four sets of productions who are mutually (indirectly) left-recursive. After removing left-recursion, I have the "3 hours parser generation" problem. If I remove from the grammar any one of these four sets, after removing left-recursion the parser generation takes less than 5 minutes, which is the expected behavior. I will try tackling the other problems of the grammar (namely left factorisation for start) and I will see later if that changes anything when I include back all the four sets of mutually left-recursive rules. Thanks everybody. JP Le 06/01/2010 12:52, Jean-Pierre LAMBERT a ?crit : > I have already started to remove parts of the grammar and the problem is > still there. > > > Le 06/01/2010 07:42, Gokulakannan Somasundaram a ?crit : >> Hi Jean, >> I faced up with a similar issue, when i tried the migration >> of a LR parser. But it's definitely because of recursion stuffs. The >> way i removed is sort of layman stuff, but thought of just informing you. >> Try to split the grammar into multiple sections(group of >> rules) and try to add them one-by-one. You don't need to wait till the >> errors are emitted. As soon as the parser generation takes more than 3-4 >> mins, just stop the generation. The last section, which resulted in the >> increase most probably contains the problematic code. Bear with me, if >> this approach looks very awkward. >> >> Thanks, >> Gokul. >> >> On Tue, Jan 5, 2010 at 8:22 PM, Jean-Pierre LAMBERT >> > wrote: >> >> Hello everybody, >> >> I'm currently rewriting a LR parser to be used for ANTLR. As a result, >> ANTLR works literaly for hours before it outputs errors about my >> grammar. >> >> My work is not finished; I have removed all left-recursions but I still >> have to do left-factorisations. The problem being that since ANTLR works >> for hours before I get the errors, it isn't very practical for me to fix >> the grammar. >> >> Do you have any suggestions in this case? What could be done so that >> ANTLR would take only dozen of minutes? Is there something capital that >> I missed about ANTLR and LL grammars? How should be written ANTLR rules >> to avoid such a problem? >> >> Thanks in advance, any adice will be welcome. >> >> JP >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> Unsubscribe: >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address >> >> > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > From gokul007 at gmail.com Wed Jan 6 07:59:27 2010 From: gokul007 at gmail.com (Gokulakannan Somasundaram) Date: Wed, 6 Jan 2010 21:29:27 +0530 Subject: [antlr-interest] Parser generation takes hours In-Reply-To: <4B449B82.5050102@worldonline.fr> References: <4B43521A.6000501@worldonline.fr> <9362e74e1001052242s192e7ae7u4beef375108e297d@mail.gmail.com> <4B447968.9070806@worldonline.fr> <4B449B82.5050102@worldonline.fr> Message-ID: <9362e74e1001060759i3a58cbccmbad1ae04b971e3c6@mail.gmail.com> Hi JP, One of most tough problem in the migration for me was to resolve the left factoring. I couldn't find any easy way of doing it. If your project is a big one, then definitely the solution for left factoring discussed in the antlr website may not work for you. If your parser is not performance critical, then you can use syntactic predicates heavily and solve the issue. But then the compilation(atleast if you keep a higher value for k) and runtime will be more for that. My approach was approximately like this. if there are a set of rules like a: b|c; b : d f | X; c : d e | Y; I rewrote the rules like this a: d ( b_minus_d | c_minus_d ) | b_without_d | c_without_d; b: d f | X; b_minus_d : f; b_without_d : X; c : d e | Y; c_minus_d : e; c_without_d : Y; This helped me to keep the relevant rules together and with minimal code repetition in the actions. If you find out any other elegant way to resolve left factoring(without using syntactic predicates), please do let me know. Thanks, Gokul. On Wed, Jan 6, 2010 at 7:47 PM, Jean-Pierre LAMBERT wrote: > After investigating the problem further, it looks like I have rounded up > the faulty rules. > > > In my grammar I have four sets of productions who are mutually > (indirectly) left-recursive. After removing left-recursion, I have the > "3 hours parser generation" problem. > > If I remove from the grammar any one of these four sets, after removing > left-recursion the parser generation takes less than 5 minutes, which is > the expected behavior. > > > I will try tackling the other problems of the grammar (namely left > factorisation for start) and I will see later if that changes anything > when I include back all the four sets of mutually left-recursive rules. > > > Thanks everybody. > > > JP > > > > Le 06/01/2010 12:52, Jean-Pierre LAMBERT a ?crit : > > I have already started to remove parts of the grammar and the problem is > > still there. > > > > > > Le 06/01/2010 07:42, Gokulakannan Somasundaram a ?crit : > >> Hi Jean, > >> I faced up with a similar issue, when i tried the migration > >> of a LR parser. But it's definitely because of recursion stuffs. The > >> way i removed is sort of layman stuff, but thought of just informing > you. > >> Try to split the grammar into multiple sections(group of > >> rules) and try to add them one-by-one. You don't need to wait till the > >> errors are emitted. As soon as the parser generation takes more than 3-4 > >> mins, just stop the generation. The last section, which resulted in the > >> increase most probably contains the problematic code. Bear with me, if > >> this approach looks very awkward. > >> > >> Thanks, > >> Gokul. > >> > >> On Tue, Jan 5, 2010 at 8:22 PM, Jean-Pierre LAMBERT > >> > wrote: > >> > >> Hello everybody, > >> > >> I'm currently rewriting a LR parser to be used for ANTLR. As a > result, > >> ANTLR works literaly for hours before it outputs errors about my > >> grammar. > >> > >> My work is not finished; I have removed all left-recursions but I > still > >> have to do left-factorisations. The problem being that since ANTLR > works > >> for hours before I get the errors, it isn't very practical for me > to fix > >> the grammar. > >> > >> Do you have any suggestions in this case? What could be done so > that > >> ANTLR would take only dozen of minutes? Is there something capital > that > >> I missed about ANTLR and LL grammars? How should be written ANTLR > rules > >> to avoid such a problem? > >> > >> Thanks in advance, any adice will be welcome. > >> > >> JP > >> > >> List: http://www.antlr.org/mailman/listinfo/antlr-interest > >> Unsubscribe: > >> > http://www.antlr.org/mailman/options/antlr-interest/your-email-address > >> > >> > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > > Unsubscribe: > http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: > http://www.antlr.org/mailman/options/antlr-interest/your-email-address > From gokul007 at gmail.com Wed Jan 6 08:14:07 2010 From: gokul007 at gmail.com (Gokulakannan Somasundaram) Date: Wed, 6 Jan 2010 21:44:07 +0530 Subject: [antlr-interest] Request for preinclude_c option Message-ID: <9362e74e1001060814k7a28abd3tf1213a25e8bbfe25@mail.gmail.com> Hi Jim, One more request that would help people, who would develop parsers for C++. As you might know, there is a requirement to include C++ Headers(atleast the ones with templates) before the C Headers, in order to avoid lot of cumbersome errors. Currently we have the following options a) to include something before the antlr headers in .h file (preinclude b) to include something after the antlr headers in .h file c) to include something after the headers in the .cpp file So the fourth permutation might help people who develop with C++ and not make the headers heavy. Thanks, Gokul. From jimi at temporal-wave.com Wed Jan 6 08:32:59 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Wed, 06 Jan 2010 08:32:59 -0800 Subject: [antlr-interest] each keyword allowed as Identifier In-Reply-To: Message-ID: <2d115e12737faf4c9cfb151f2d512c43@temporal-wave.com> Please search antlr.markmail.org for using keywords as identifiers and for the word 'poker' as there must now be about 50 people who have written ANTLR parsers for this! If someone would donate a parser to the ANTLR grammar list it would save a lot of people a lot of time, but I suggest it would not save most people money ;-) Anyway: id: ID | MIN | MAX | ...... etc ; The use this instead of ID. I suspect though that these log files are easier to 'parse' in a manual fashion where you lex in context. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Christian Kihm > Sent: Wednesday, January 06, 2010 2:36 AM > To: antlr-interest at antlr.org > Subject: [antlr-interest] each keyword allowed as Identifier > > Hi, > > I try to parse a log file which probably was never intented to be > parsed. It is an log file of an poker client. My problem is that there > are nearly no constraints are existing for playernames. > > A playername could be a sequens of any charactor of the full unicode > range. The only contraints are: > > min length = 4 > max length = 12 > no leading or trailing white space > white spaces in between are allowed, but never more than one in a row > > Here are some examples: > > INPUT: > Seat 9: The Player ( ($76 in chips) > > Where the Playername is "The Player (" > > INPUT: > > posts small:: posts small blind $2 > > > Where the Playername is "posts small:" > > > I have no glue how to solve this problem. I already tried some stuff I > found in the FAQs like: > > - syncing to the follow set (Article Custom Syntax Error Recovery) > which dosnt work if a token of the follow set is also part of the name > - non greedy matching ( .+ to match the name) > - a list of all tokens in the rule playername which dosnt work because > the playername can consist not just of one token but an sequense of > tokens > > Generelly it must be possible because out ther are severeal commercial > tools which are able to parse these log files. So I hope somebody of > you has an Idea. > > Thanks and regards, > Christian > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From jimi at temporal-wave.com Wed Jan 6 09:08:32 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Wed, 06 Jan 2010 09:08:32 -0800 Subject: [antlr-interest] Parser generation takes hours In-Reply-To: <4B44783A.8040409@worldonline.fr> Message-ID: OK - just try it without any options then and if the behavior changes add back each option in turn and see which one affects it. If you can pin it down a bit, then it can be fixed (assuming that there is a bug here). Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Jean-Pierre LAMBERT > Sent: Wednesday, January 06, 2010 3:47 AM > To: antlr-interest at antlr.org > Subject: Re: [antlr-interest] Parser generation takes hours > > Sorry but I'm unable to send you my grammar. My boss doesn't want this > grammar to get out of the company. > > If I'm able to narrow the problem to a small subset of my grammar I may > share it with everybody, however. > > > Le 05/01/2010 20:04, Jim Idle a ?crit : > > Perhaps you could send us your grammar too? You might find that you > just need to comment out one or two rules until you get to reworking > them. > > > > Jim From jimi at temporal-wave.com Wed Jan 6 09:22:15 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Wed, 06 Jan 2010 09:22:15 -0800 Subject: [antlr-interest] Request for preinclude_c option In-Reply-To: <9362e74e1001060814k7a28abd3tf1213a25e8bbfe25@mail.gmail.com> Message-ID: <10e2f2e0de8ec14dac09a612afa82f66@temporal-wave.com> Guess I am not quite following this - would not using the @header section solve this? All headers should protect themselves against multiple #include of course. I can add an @preinclude easily enough but I don't want to clutter the options unless I must of course. @header is inserted before the #include of the generated header file. Also, I am not sure that you really need to do this. You should place any code using C++ templates and headers etc in external files and create an API that you call from action code. That API should have a header and I can't see that including that header after .h should be a problem. That doesn't mean that there isn't one, just that I am not seeing why. Can you post an example to the list? If @header won't do it and there is a valid reason, then I will certainly add another @option to fix it. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Gokulakannan Somasundaram > Sent: Wednesday, January 06, 2010 8:14 AM > To: antlr-interest at antlr.org > Subject: [antlr-interest] Request for preinclude_c option > > Hi Jim, > One more request that would help people, who would develop > parsers > for C++. As you might know, there is a requirement to include C++ > Headers(atleast the ones with templates) before the C Headers, in order > to > avoid lot of cumbersome errors. Currently we have the following options > a) to include something before the antlr headers in .h file (preinclude > b) to include something after the antlr headers in .h file > c) to include something after the headers in the .cpp file > > So the fourth permutation might help people who develop with C++ and > not > make the headers heavy. > > Thanks, > Gokul. > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From laurie at holoweb.net Wed Jan 6 11:58:54 2010 From: laurie at holoweb.net (Laurie Harper) Date: Wed, 6 Jan 2010 14:58:54 -0500 Subject: [antlr-interest] tree rewrite: breaking apart subtrees Message-ID: I'm trying to construct a parser/translator that will transform an extended version of a C-like language 'X' into standard 'X'. I can't figure out quite what I need in my tree grammar to get the result I want... For example, I have an input AST that looks something like this: (VARDECL integer (VARIABLE ivar1 (LITERAL 1)) (VARIABLE ivar2 (LITERAL 2)) (VARIABLE ivar3)) (VARDECL integer (VARIABLE ivar4)) I need to rewrite it to look like this: (VARDECL integer (VARIABLE ivar1 (LITERAL 1))) (VARDECL integer (VARIABLE ivar2 (LITERAL 2))) (VARDECL integer (VARIABLE ivar3)) (VARDECL integer (VARIABLE ivar)) My tree grammar contains a rule like this: vars : ^(VARDECL type (^(VARIABLE ID literal?))+) -> ^(VARDECL type)+ ^(VARIABLE ID literal)+; but that's not giving a result that's even close to right :-) I've tried all sorts of variations as I try to puzzle out the tree rewrite syntax, to no avail. Can anyone offer any insight? Thanks, L. From kaleb.pederson at gmail.com Wed Jan 6 12:51:03 2010 From: kaleb.pederson at gmail.com (Kaleb Pederson) Date: Wed, 6 Jan 2010 12:51:03 -0800 Subject: [antlr-interest] Issue with antlrworks 1.3.1 and JDK 1.6 update 17? In-Reply-To: References: Message-ID: On Wed, Jan 6, 2010 at 3:33 AM, Michael Richter wrote: > I did a recent round of upgrading software on my machines (real and virtual) > and somewhere in the process I've got ANTLRworks in unusable shape. ?(I > tried reporting this through the antlr.org web site but it doesn't seem to > have taken.) > > On *every* machine I have access to (both real and virtual, running Windows > XP or Linux) I get the following pretty nasty behaviour: [...snip...] > The behaviour on Linux is less acceptable. ?The new project wizard pops up > but the text input focus is on ANTLRworks' editor window and CANNOT be put > into the wizard at all on any spot. ?I have to cancel the wizard to get to > the main window (which then works as expected). My AW preferences were set load the last file on each invocation, which seems to work fine. I changed my preferences to go to use the wizard after which I started seeing some problems. I started up AW, the 'New Document' dialog showed up. I hit Cancel. The UI disappeared but the application kept running. I did a 'kill -QUIT $AW_PID' and received the attached dump (I know Ter's been playing with the mailing list filters and things, so we'll see if it actually goes through). The dump shows that AW is awaiting feedback, but with no GUI present, it will never receive it. This happens with both 1.3 and 1.3.1, although the dump is for the 1.3.1. >?This also happens if I go > File -> New from the main window: I simply cannot get text input into any > field of the new project wizard. I can replicate this behavior on Linux. Does the following workaround work for you: a) Click OK (using an empty grammar name) b) Dismiss the dialog that says you used an empty grammar name c) Left click in the grammar name input field to give it focus d) Now type in the wizard as usual? A related note, I've seen this behavior on many different Java applications, so I'm not sure if it's Java related, or if it's just an error that is easy to make when writing the application using Java. > Any advice for debugging this further? I also tried removing the AW preferences and disabling focus-stealing prevention in my window manager, but neither of those helped either. Looks like a couple of real bugs to me. -- Kaleb Pederson Blog - http://kalebpederson.com Twitter - http://twitter.com/kalebpederson -------------- next part -------------- 2010-01-06 12:22:44 Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.3-b01 mixed mode): "Timer-1" prio=10 tid=0x0000000040e40800 nid=0x69a1 in Object.wait() [0x00007f06eb31b000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00007f0721c6d7c8> (a java.util.TaskQueue) at java.util.TimerThread.mainLoop(Timer.java:509) - locked <0x00007f0721c6d7c8> (a java.util.TaskQueue) at java.util.TimerThread.run(Timer.java:462) "DestroyJavaVM" prio=10 tid=0x00007f06ec3ec000 nid=0x698d waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "Timer-0" daemon prio=10 tid=0x00007f06ec8f7800 nid=0x699f in Object.wait() [0x00007f06eb41c000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00007f072148e720> (a java.util.TaskQueue) at java.util.TimerThread.mainLoop(Timer.java:509) - locked <0x00007f072148e720> (a java.util.TaskQueue) at java.util.TimerThread.run(Timer.java:462) "AWT-XAWT" daemon prio=10 tid=0x00007f06ec2d2000 nid=0x699b runnable [0x00007f06f0288000] java.lang.Thread.State: RUNNABLE at sun.awt.X11.XToolkit.waitForEvents(Native Method) at sun.awt.X11.XToolkit.run(XToolkit.java:548) at sun.awt.X11.XToolkit.run(XToolkit.java:523) at java.lang.Thread.run(Thread.java:619) "Java2D Disposer" daemon prio=10 tid=0x0000000040b04800 nid=0x699a in Object.wait() [0x00007f06f0389000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00007f072226ddf8> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) - locked <0x00007f072226ddf8> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) at sun.java2d.Disposer.run(Disposer.java:125) at java.lang.Thread.run(Thread.java:619) "Low Memory Detector" daemon prio=10 tid=0x0000000040b1c000 nid=0x6998 runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE "CompilerThread1" daemon prio=10 tid=0x0000000040b19000 nid=0x6997 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "CompilerThread0" daemon prio=10 tid=0x0000000040b16000 nid=0x6996 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "Signal Dispatcher" daemon prio=10 tid=0x0000000040b14000 nid=0x6995 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "Finalizer" daemon prio=10 tid=0x0000000040af6800 nid=0x6994 in Object.wait() [0x00007f06f1fde000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00007f072226e3f0> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) - locked <0x00007f072226e3f0> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) "Reference Handler" daemon prio=10 tid=0x0000000040aef000 nid=0x6993 in Object.wait() [0x00007f06f20df000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00007f072226e540> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:485) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) - locked <0x00007f072226e540> (a java.lang.ref.Reference$Lock) "VM Thread" prio=10 tid=0x0000000040ae8800 nid=0x6992 runnable "GC task thread#0 (ParallelGC)" prio=10 tid=0x00000000409cf800 nid=0x698e runnable "GC task thread#1 (ParallelGC)" prio=10 tid=0x00000000409d1800 nid=0x698f runnable "GC task thread#2 (ParallelGC)" prio=10 tid=0x00000000409d3000 nid=0x6990 runnable "GC task thread#3 (ParallelGC)" prio=10 tid=0x00000000409d5000 nid=0x6991 runnable "VM Periodic Task Thread" prio=10 tid=0x0000000040b1e800 nid=0x6999 waiting on condition JNI global references: 1281 Heap PSYoungGen total 18496K, used 13856K [0x00007f07212e0000, 0x00007f0722780000, 0x00007f0735ce0000) eden space 15872K, 70% used [0x00007f07212e0000,0x00007f0721ddf770,0x00007f0722260000) from space 2624K, 98% used [0x00007f0722260000,0x00007f07224e88a0,0x00007f07224f0000) to space 2624K, 0% used [0x00007f07224f0000,0x00007f07224f0000,0x00007f0722780000) PSOldGen total 42240K, used 488K [0x00007f06f7ee0000, 0x00007f06fa820000, 0x00007f07212e0000) object space 42240K, 1% used [0x00007f06f7ee0000,0x00007f06f7f5a000,0x00007f06fa820000) PSPermGen total 21248K, used 17747K [0x00007f06f2ae0000, 0x00007f06f3fa0000, 0x00007f06f7ee0000) object space 21248K, 83% used [0x00007f06f2ae0000,0x00007f06f3c34f10,0x00007f06f3fa0000) From jp.raven at worldonline.fr Wed Jan 6 13:31:39 2010 From: jp.raven at worldonline.fr (Jean-Pierre LAMBERT) Date: Wed, 06 Jan 2010 22:31:39 +0100 Subject: [antlr-interest] Parser generation takes hours In-Reply-To: <4B449B82.5050102@worldonline.fr> References: <4B43521A.6000501@worldonline.fr> <9362e74e1001052242s192e7ae7u4beef375108e297d@mail.gmail.com> <4B447968.9070806@worldonline.fr> <4B449B82.5050102@worldonline.fr> Message-ID: <4B45013B.8070600@worldonline.fr> Looks like I finally hit the nail on the head! After doing some crucial left-factorizations, I can put all four sets of mutually left-recursive productions all together, and the parser generation takes only a couple of minutes. It seems even faster than with only three sets without left-factorization. It definitely looks like ANTLR is *very* sensible to left-factorization in rules. In some way, it is quite normal since LL parsers requires it. A big thank you for the help. It was greatly appreciated. JP Le 06/01/2010 15:17, Jean-Pierre LAMBERT a ?crit : > After investigating the problem further, it looks like I have rounded up > the faulty rules. > > > In my grammar I have four sets of productions who are mutually > (indirectly) left-recursive. After removing left-recursion, I have the > "3 hours parser generation" problem. > > If I remove from the grammar any one of these four sets, after removing > left-recursion the parser generation takes less than 5 minutes, which is > the expected behavior. > > > I will try tackling the other problems of the grammar (namely left > factorisation for start) and I will see later if that changes anything > when I include back all the four sets of mutually left-recursive rules. > > > Thanks everybody. > > > JP > > > > Le 06/01/2010 12:52, Jean-Pierre LAMBERT a ?crit : >> I have already started to remove parts of the grammar and the problem is >> still there. >> >> >> Le 06/01/2010 07:42, Gokulakannan Somasundaram a ?crit : >>> Hi Jean, >>> I faced up with a similar issue, when i tried the migration >>> of a LR parser. But it's definitely because of recursion stuffs. The >>> way i removed is sort of layman stuff, but thought of just informing you. >>> Try to split the grammar into multiple sections(group of >>> rules) and try to add them one-by-one. You don't need to wait till the >>> errors are emitted. As soon as the parser generation takes more than 3-4 >>> mins, just stop the generation. The last section, which resulted in the >>> increase most probably contains the problematic code. Bear with me, if >>> this approach looks very awkward. >>> >>> Thanks, >>> Gokul. >>> >>> On Tue, Jan 5, 2010 at 8:22 PM, Jean-Pierre LAMBERT >>> > wrote: >>> >>> Hello everybody, >>> >>> I'm currently rewriting a LR parser to be used for ANTLR. As a result, >>> ANTLR works literaly for hours before it outputs errors about my >>> grammar. >>> >>> My work is not finished; I have removed all left-recursions but I still >>> have to do left-factorisations. The problem being that since ANTLR works >>> for hours before I get the errors, it isn't very practical for me to fix >>> the grammar. >>> >>> Do you have any suggestions in this case? What could be done so that >>> ANTLR would take only dozen of minutes? Is there something capital that >>> I missed about ANTLR and LL grammars? How should be written ANTLR rules >>> to avoid such a problem? >>> >>> Thanks in advance, any adice will be welcome. >>> >>> JP >>> >>> List: http://www.antlr.org/mailman/listinfo/antlr-interest >>> Unsubscribe: >>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address >>> >>> >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address >> >> > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > From jp.raven at worldonline.fr Wed Jan 6 13:42:35 2010 From: jp.raven at worldonline.fr (Jean-Pierre LAMBERT) Date: Wed, 06 Jan 2010 22:42:35 +0100 Subject: [antlr-interest] Parser generation takes hours In-Reply-To: References: Message-ID: <4B4503CB.20005@worldonline.fr> Well, now that I succeeded to get passed the problem... I don't know if there is a bug or not. Looking at LL parser theory, it may not be that surprising -- combinatory explosions when building an LL parser who needs more left-factorizations. Besides I'm working on a quite big grammar and that probably plays a role here. ANTLR probably handles such rules quite well on not-so big grammars. Finally, all this mess occured on a non-working grammar. So I have the feeling that it's not that big an issue for the user. Well, one entry in some FAQ for people migrating LR parsers to ANTLR would have done the trick, I'd say. If I have just been reassured that fixing the left-factorization would have solved the problem, I'd simply worked on left-factorizing my grammar and stopped worrying. In absence of any advice on the matter I kind of panicked instead. :-) JP Le 06/01/2010 18:08, Jim Idle a ?crit : > OK - just try it without any options then and if the behavior changes add back each option in turn and see which one affects it. If you can pin it down a bit, then it can be fixed (assuming that there is a bug here). > > Jim > >> -----Original Message----- >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- >> bounces at antlr.org] On Behalf Of Jean-Pierre LAMBERT >> Sent: Wednesday, January 06, 2010 3:47 AM >> To: antlr-interest at antlr.org >> Subject: Re: [antlr-interest] Parser generation takes hours >> >> Sorry but I'm unable to send you my grammar. My boss doesn't want this >> grammar to get out of the company. >> >> If I'm able to narrow the problem to a small subset of my grammar I may >> share it with everybody, however. >> >> >> Le 05/01/2010 20:04, Jim Idle a ?crit : >>> Perhaps you could send us your grammar too? You might find that you >> just need to comment out one or two rules until you get to reworking >> them. >>> >>> Jim > > > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > From parrt at cs.usfca.edu Wed Jan 6 16:28:05 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Wed, 6 Jan 2010 16:28:05 -0800 Subject: [antlr-interest] printed finally: Language implementation patterns Message-ID: <7275E965-B513-43DB-8E90-338B78F6EE2E@cs.usfca.edu> Hiya, Just to let you know that Language implementation patterns is now available as a physically printed book. hooray! If you would like to put up a review at Amazon (whatever your honest opinion is, good or bad), here's the link: http://tinyurl.com/y8r9kts Apparently it's important to have a lot of different reviews/comments in terms of marketing. Thanks, Terence From bios.bob.frankel at gmail.com Wed Jan 6 17:24:20 2010 From: bios.bob.frankel at gmail.com (Bob Frankel) Date: Wed, 06 Jan 2010 17:24:20 -0800 Subject: [antlr-interest] tracking token position when original file is pre-processed Message-ID: <4B4537C4.3000001@gmail.com> my language has a simple pre-processor that expands text of the form ${} as a first phase of translation; the expanded stream is then input to my ANTLRInputStream, where it proceeds onward to the lexer/parser in the usual fashion. said another way, neither the lexer nor the parser is aware of the ${...} construct. needless to say, character-position information (eg., token start/stop) are relative to the expanded stream and not the original file; this creates an problem, of course, when error indicators are not correctly positioned in the original source file (as i'm doing through some editor integration inside eclipse). is there some pattern and/or (simple!) example that illustrates a technique for managing this situation; is there some way (say) i might embedded the equivalent of #line directives in the expanded stream which are then stripped further downstream while adjusting token offsets??? From antlr at mirality.co.nz Wed Jan 6 17:29:02 2010 From: antlr at mirality.co.nz (Gavin Lambert) Date: Thu, 07 Jan 2010 14:29:02 +1300 Subject: [antlr-interest] Parser generation takes hours In-Reply-To: <4B4503CB.20005@worldonline.fr> References: <4B4503CB.20005@worldonline.fr> Message-ID: <20100107012920.415513418415@www.antlr.org> At 10:42 7/01/2010, Jean-Pierre LAMBERT wrote: >Well, now that I succeeded to get passed the problem... I don't >know if there is a bug or not. > >Looking at LL parser theory, it may not be that surprising -- >combinatory explosions when building an LL parser who needs more >left-factorizations. It's probably not a bug that it choked on it, but it might be a bug that it didn't *detect* that it was choking on it and give you an error message instead... :) But error detection in general in ANTLR is fairly rudimentary at the moment. Hopefully that'll get better once ANTLR v3 is self-hosted. (Which'll be in 3.3, isn't it?) From jimi at temporal-wave.com Wed Jan 6 18:14:34 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Wed, 06 Jan 2010 18:14:34 -0800 Subject: [antlr-interest] tracking token position when original file is pre-processed In-Reply-To: <4B4537C4.3000001@gmail.com> Message-ID: <2e55fea51941eb4bb139c5413deb901a@temporal-wave.com> Easiest is to have the preprocessor mark the input stream like cpp does: # 555 "myfile.c" And then add a reference to the file into the token or similar. You can also incorporate the preprocessor in to your lexer and stack input streams if it isn't in need of a parser to do the pre-processing. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Bob Frankel > Sent: Wednesday, January 06, 2010 5:24 PM > To: antlr-interest at antlr.org > Subject: [antlr-interest] tracking token position when original file is > pre-processed > > my language has a simple pre-processor that expands text of the form > ${} as a first phase of translation; the expanded stream > is then input to my ANTLRInputStream, where it proceeds onward to the > lexer/parser in the usual fashion. said another way, neither the lexer > nor the parser is aware of the ${...} construct. > > needless to say, character-position information (eg., token start/stop) > are relative to the expanded stream and not the original file; this > creates an problem, of course, when error indicators are not correctly > positioned in the original source file (as i'm doing through some > editor > integration inside eclipse). > > is there some pattern and/or (simple!) example that illustrates a > technique for managing this situation; is there some way (say) i might > embedded the equivalent of #line directives in the expanded stream > which > are then stripped further downstream while adjusting token offsets??? > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From jsrs701 at yahoo.com Wed Jan 6 18:42:20 2010 From: jsrs701 at yahoo.com (J. Stephen Riley Silber) Date: Wed, 6 Jan 2010 18:42:20 -0800 (PST) Subject: [antlr-interest] printed finally: Language implementation patterns In-Reply-To: <7275E965-B513-43DB-8E90-338B78F6EE2E@cs.usfca.edu> References: <7275E965-B513-43DB-8E90-338B78F6EE2E@cs.usfca.edu> Message-ID: <456784.77255.qm@web33306.mail.mud.yahoo.com> Here's how my evening went: 1. Finish up work stuff at the office. 2. Check ANTLR email--oh look! An exhortation to write an Amazon review! 3. Write Amazon review. (I can't remember, is one star good or bad?) 4. Go home and check snail mail. 5. Do a happy dance, since there's the book!And it looks glorious! (Though it feels so familiar... :-) Congrats, Ter, it looks great! And I love having it in dead tree format, too! ________________________________ From: Terence Parr To: "antlr-interest at antlr.org interest" ; stringtemplate-interest List Sent: Wed, January 6, 2010 4:28:05 PM Subject: [antlr-interest] printed finally: Language implementation patterns Hiya, Just to let you know that Language implementation patterns is now available as a physically printed book. hooray! If you would like to put up a review at Amazon (whatever your honest opinion is, good or bad), here's the link: http://tinyurl.com/y8r9kts Apparently it's important to have a lot of different reviews/comments in terms of marketing. Thanks, Terence List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From manharg at yahoo.com Wed Jan 6 19:21:12 2010 From: manharg at yahoo.com (Manhar Goindi) Date: Wed, 6 Jan 2010 19:21:12 -0800 (PST) Subject: [antlr-interest] Fw: ANTLR Parser Queries Message-ID: <716065.77567.qm@web57208.mail.re3.yahoo.com> --- On Wed, 1/6/10, Manhar Goindi wrote: > From: Manhar Goindi > Subject: ANTLR Parser Queries > To: antlr-interest at antlr.org > Date: Wednesday, January 6, 2010, 7:18 PM > Hi, > > We are using the ANTLR Parser and found it to be useful in > generating C# code.? However, we would like to know the > following about this parser?s capabilities: > > 1.??? Is it possible to advance the parser > from some input text position to skip some portion of the > text and let it resume parsing from some new position in the > input text? > 2.??? Is it possible to delink the default > ANTLR lexical analyzer from ANTLR Parser and link the ANTLR > Parser to some custom Lexical Analyzer? > > > Thanks & Best Regards, > Manhar Goindi > > > > From wclodius at los-alamos.net Wed Jan 6 21:04:07 2010 From: wclodius at los-alamos.net (William B. Clodius) Date: Wed, 6 Jan 2010 22:04:07 -0700 Subject: [antlr-interest] Undesirable ANTLRWorks behavior Message-ID: ANTLRWorks 3.2 on a Mac OS X 10.6.2, regular download from the ANTLR site so I don't believe it is Eclipse hoasted, is showing different odd behaviors on a large grammar file and a .stg file I am editing, that I suspect are related. First, the syntax checking runs after every keystroke. As most changes in words or strings etc. result in an invalid token the console gets flooded with errors. This would be greatly reduced if the checking were performed only after any carriage return, or better yet, when the code is generated. Second, it will sometimes give messages that it is running short of memory that can be temporarily fixed by closing other applications, particularly applications that run Java code. Third it will sometimes slow down to a crawl with no messages. I suspect, but don't know how to prove, that this is a symptom of stressed garbage collection due to short memory. It would not surprise me that both of the last two are indirect symptoms of the first problem. In particular I suspect it is keeping track of edits to an unnecessary level of detail. From parrt at cs.usfca.edu Wed Jan 6 21:37:38 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Wed, 6 Jan 2010 21:37:38 -0800 Subject: [antlr-interest] printed finally: Language implementation patterns In-Reply-To: <456784.77255.qm@web33306.mail.mud.yahoo.com> References: <7275E965-B513-43DB-8E90-338B78F6EE2E@cs.usfca.edu> <456784.77255.qm@web33306.mail.mud.yahoo.com> Message-ID: <7461F5FA-9B6D-4BA3-AEE9-77561F8753BC@cs.usfca.edu> On Jan 6, 2010, at 6:42 PM, J. Stephen Riley Silber wrote: > Here's how my evening went: > Finish up work stuff at the office. > Check ANTLR email--oh look! An exhortation to write an Amazon review! > Write Amazon review. (I can't remember, is one star good or bad?) > Go home and check snail mail. > Do a happy dance, since there's the book! > And it looks glorious! (Though it feels so familiar... :-) > > Congrats, Ter, it looks great! And I love having it in dead tree > format, too! Dang it! I haven't even gotten my copy yet! ;) Can't wait to hold in my paws. Ter From gokul007 at gmail.com Wed Jan 6 21:38:18 2010 From: gokul007 at gmail.com (Gokulakannan Somasundaram) Date: Thu, 7 Jan 2010 11:08:18 +0530 Subject: [antlr-interest] Request for preinclude_c option In-Reply-To: <10e2f2e0de8ec14dac09a612afa82f66@temporal-wave.com> References: <9362e74e1001060814k7a28abd3tf1213a25e8bbfe25@mail.gmail.com> <10e2f2e0de8ec14dac09a612afa82f66@temporal-wave.com> Message-ID: <9362e74e1001062138t38e8a636s16f3deaefe5864e0@mail.gmail.com> Jim, I have tried to put forward my argument. Guess I am not quite following this - would not using the @header section > solve this? All headers should protect themselves against multiple #include > of course. > @header section places it in both .h and .c, This makes the headers heavy. of-course multiple #include is protected. My request is to place the section only in .c, before placing the ANTLR headers(#include). > > Also, I am not sure that you really need to do this. You should place any > code using C++ templates and headers etc in external files and create an API > that you call from action code. That API should have a header and I can't > see that including that header after .h should be a problem. That > doesn't mean that there isn't one, just that I am not seeing why. Can you > post an example to the list? If @header won't do it and there is a valid > reason, then I will certainly add another @option to fix it. > > Well, atleast we have done it in a way, which uses STL and std::bitset in the action part. Sometimes we are even returning a std::bitset and boost::variant, which are all template based. Sometimes to decide on which token to be issued in lexer, we are using the hashmap. I think ANTLR somewhere uses winsock.h and including winsock2.h after that causes some issues for us. Basically we are not facing any issues, if we are including the ANTLR headers after our headers. But there is no way to do that currently without making the generated header files heavy. So i had to resort to using @preincludes option. This is the problem by making the headers heavy. Say i have two headers, one for CplusplusLexer.h and CplusplusParser.h. Say inside the lexer header, i have included a C++ library that has templates. Now this should get placed before the C headers. So CplusplusLexer.h looks like this #include #include Similarly i have CplusplusParser.h, which looks like this #include #include Now in the .cpp file, if i have to do parsing, i have to include both lexer.h and parser.h. Now there is no way template files can be placed before the antlr header, unless i do something like this by again re-declaring the headers before the antlr files #include #include #include "CplusplusLexer.h" #include "Cplusplusparser.h" While the fix is straight forward, identifying that this is the problem, will take sometime. The code organization will be more better, if i don't include them in the CplusplusParser.h and CplusplusLexer.h and the round about fixes may not be required. There is just one stuff to be kept in mind - to include the ANTLR headers after the C++headers(with templates). Hope i was able to put forward a case. Thanks, Gokul. From Heiko.Folkerts at david-bs.de Wed Jan 6 22:09:35 2010 From: Heiko.Folkerts at david-bs.de (Heiko Folkerts) Date: Thu, 7 Jan 2010 07:09:35 +0100 Subject: [antlr-interest] Using paraphrase option when using the C target in ANTLR Message-ID: <93FCBF72DCE7634481C5DF1654D8FF13035A80C8@DC2> Hi all, I am currently trying to improve the quality of our error messages generated bvy our ANTLR generated parser. Since our error messages are generally in german I'd like to take advance of the paraphrase option for rules and tokens to assign a clear name to those things. Unfortunately I get errors from ANTLR when using the following token definition: ALPHASTRING options { paraphrase="Zeichenkette";} : ('a'..'z' | 'A'..'Z' | '0'..'9' | '/' | '-' | '\u00c0' .. '\u00d6' | '\u00d8' .. '\u00fc')+; ANTLR reports: "unexpected token "Zeichenkette" So can't I use paraphrases in the C target? I am using antlr3.2. Is there a workarround for the paraphrases? Regards Heiko Mit freundlichem Gru? Heiko Folkerts Systementwicklung und -design -- ______________________________________________ DAVID GmbH ? Wendenring 1 ? 38114 Braunschweig Tel.: +49 531 24379-14 Fax.: +49 531 24379-79 E-Mail: mailto:Heiko.Folkerts at david-bs.de WWW: http://www.david-bs.de? Eintragung: Amtsgericht Braunschweig, HRB 3167 Gesch?ftsf?hrer: Frank Ptok ______________________________________________ From sandworm87 at yahoo.se Thu Jan 7 01:12:45 2010 From: sandworm87 at yahoo.se (=?iso-8859-1?Q?Christer_L=F6fving?=) Date: Thu, 7 Jan 2010 09:12:45 +0000 (GMT) Subject: [antlr-interest] First project ? Message-ID: <755474.29082.qm@web24715.mail.ird.yahoo.com> Hi all!I am an experienced software developer, but relative new to antlr.Do any of you have an idea for a suitable "middle-sized" firstproject to work with, and by that way getting started with the whole thing ? BR/Christer __________________________________________________________ L?na pengar utan s?kerhet. J?mf?r vilkor online hos Kelkoo. http://www.kelkoo.se/c-100390123-lan-utan-sakerhet.html?partnerId=96915014 From jklumpp at harmonia.com Thu Jan 7 06:15:28 2010 From: jklumpp at harmonia.com (Jared Klumpp) Date: Thu, 7 Jan 2010 06:15:28 -0800 Subject: [antlr-interest] tree rewrite: breaking apart subtrees References: Message-ID: See "Rewrite rule element cardinality" in the Definitive Antlr Reference (pg. 184), it seems you want something like: vars : VARDECL type (VARIABLE ID literal?)+ -> ^(VARDECL type ^(VARIABLE ID literal))+; -J Date: Wed, 6 Jan 2010 14:58:54 -0500 From: Laurie Harper Subject: [antlr-interest] tree rewrite: breaking apart subtrees To: antlr-interest at antlr.org Message-ID: Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes I'm trying to construct a parser/translator that will transform an extended version of a C-like language 'X' into standard 'X'. I can't figure out quite what I need in my tree grammar to get the result I want... For example, I have an input AST that looks something like this: (VARDECL integer (VARIABLE ivar1 (LITERAL 1)) (VARIABLE ivar2 (LITERAL 2)) (VARIABLE ivar3)) (VARDECL integer (VARIABLE ivar4)) I need to rewrite it to look like this: (VARDECL integer (VARIABLE ivar1 (LITERAL 1))) (VARDECL integer (VARIABLE ivar2 (LITERAL 2))) (VARDECL integer (VARIABLE ivar3)) (VARDECL integer (VARIABLE ivar)) My tree grammar contains a rule like this: vars : ^(VARDECL type (^(VARIABLE ID literal?))+) -> ^(VARDECL type)+ ^(VARIABLE ID literal)+; but that's not giving a result that's even close to right :-) I've tried all sorts of variations as I try to puzzle out the tree rewrite syntax, to no avail. Can anyone offer any insight? Thanks, L. From marcin.rzeznicki at gmail.com Thu Jan 7 08:16:18 2010 From: marcin.rzeznicki at gmail.com (=?UTF-8?Q?Marcin_Rze=C5=BAnicki?=) Date: Thu, 7 Jan 2010 17:16:18 +0100 Subject: [antlr-interest] Problem with AST tree with heterogeneous nodes Message-ID: <14799bf61001070816g6c418b23r370598cdb8befee7@mail.gmail.com> Hi all, I have a curious problem when populating AST tree with custom nodes, or to be more precise, with their constructors. If, in the tree grammar (I basically construct AST tree in parser and in the next step I rewrite it using tree walker), I am using: STORE[$ID, expressionResolver] then constructor UnresolvedLocal(int ttype, CommonTree id, ExpressionResolver expressionResolver) is picked up as expected But, if I am using the following form: STORE $lhs expression then it gets transformed to: new LValueError(stream_STORE.nextNode()), where nextNode() returns Object, where I expected integer carrying token type to be used Is this a bug of some kind? Can you explain it to me? Thank you very much in advance -- Greetings Marcin Rze?nicki From jsrs701 at yahoo.com Thu Jan 7 09:21:50 2010 From: jsrs701 at yahoo.com (J. Stephen Riley Silber) Date: Thu, 7 Jan 2010 09:21:50 -0800 (PST) Subject: [antlr-interest] First project ? In-Reply-To: <755474.29082.qm@web24715.mail.ird.yahoo.com> References: <755474.29082.qm@web24715.mail.ird.yahoo.com> Message-ID: <850996.41243.qm@web33308.mail.mud.yahoo.com> Hi Crister, Have you read this article? "Humans Should Not Have to Grok XML" http://www.ibm.com/developerworks/xml/library/x-sbxml.html I really got to know ANTLR3 when I built a system to translate a simple declarative scripting language (like the "{{8, 17, 1964}, instructor}" example in the article, though much richer) into XML. (I had a large sample of XML files, which represented a scripting language used in a company I worked for--but XML is lousy for coding, so I wanted a scripting language that would compile into that XML.) It was a good project that really taught me the in's and out's of ANTLR3. Easy in concept, but tough enough that I had to do some thinking. Have fun! Stephen ________________________________ From: Christer L?fving To: antlr-interest at antlr.org Sent: Thu, January 7, 2010 1:12:45 AM Subject: [antlr-interest] First project ? Hi all!I am an experienced software developer, but relative new to antlr.Do any of you have an idea for a suitable "middle-sized" firstproject to work with, and by that way getting started with the whole thing ? BR/Christer __________________________________________________________ L?na pengar utan s?kerhet. J?mf?r vilkor online hos Kelkoo. http://www.kelkoo.se/c-100390123-lan-utan-sakerhet.html?partnerId=96915014 List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From antlr at mirality.co.nz Thu Jan 7 12:36:16 2010 From: antlr at mirality.co.nz (Gavin Lambert) Date: Fri, 08 Jan 2010 09:36:16 +1300 Subject: [antlr-interest] Using paraphrase option when using the C target in ANTLR In-Reply-To: <93FCBF72DCE7634481C5DF1654D8FF13035A80C8@DC2> References: <93FCBF72DCE7634481C5DF1654D8FF13035A80C8@DC2> Message-ID: <20100107203621.C890A3418383@www.antlr.org> At 19:09 7/01/2010, Heiko Folkerts wrote: >I am currently trying to improve the quality of our error messages >generated bvy our ANTLR generated parser. Since our error messages >are generally in german I'd like to take advance of the paraphrase >option for rules and tokens to assign a clear name to those things. >Unfortunately I get errors from ANTLR when using the following >token definition: >ALPHASTRING >options { paraphrase="Zeichenkette";} >: ('a'..'z' | 'A'..'Z' | '0'..'9' | '/' | '-' | '\u00c0' .. >'\u00d6' | '\u00d8' .. '\u00fc')+; > >ANTLR reports: "unexpected token "Zeichenkette" > >So can't I use paraphrases in the C target? I am using antlr3.2. The paraphrase option is a v2 option; there is no equivalent in v3. If you want to change the text of the error messages then you will need to alter the exception text yourself, using the error reporting hooks (see the wiki). From cross at kojeware.com Thu Jan 7 13:06:57 2010 From: cross at kojeware.com (Cameron Ross) Date: Thu, 07 Jan 2010 16:06:57 -0500 Subject: [antlr-interest] An ANTLR-based XMI translator Message-ID: <4B464CF1.3070704@kojeware.com> Hi, I need to construct a program that will translate UML models specified in XMI into language 'X'. I already have an ANTLR-based parser for language X that generates a CommonTree as an intermediate form. I also have a TreeWalker that uses StringTemplate to emit valid X given an AST in this intermediate form. I currently use these two components to implement a pretty-printer for language X. I was thinking that I could implement an ANTLR parser that would take XMI models as input and generate some (different) intermediate form AST. I would then implement a TreeWalker to convert this AST into the intermediate form AST for language X. This would allow me to use my existing emitter to output the model in language X. 1) Does this sound like a reasonable strategy? 2) Is anyone aware of an existing ANLTR3 grammar for XMI? 3) Is there a better way? Thanks, Cameron. From Sanus at gmx.de Thu Jan 7 14:04:16 2010 From: Sanus at gmx.de (Christian Hoffmann) Date: Thu, 7 Jan 2010 23:04:16 +0100 Subject: [antlr-interest] c-target tree creation Message-ID: <1445674875.20100107230416@gmx.de> Hi, I'm stumbling over a unhappy circumstance. Normaly a tree is built witch a nil node and the children. But if the parser regognises just one line, the nil node is not used (probably there are no children). Example 1 - generated tree has just one node Source: int a; AST: , > Example 2 - generated tree has two nodes Source: int a; int b; AST: , >, , > > I think it would be easier for walking in a loop if the nil-node is always created. Is this possible in future versions? Regards, Christian -- Christian Hoffmann ?tzenkamp 4 38118 Braunschweig Tel: 0171/7300609 Web: www.c-hoffmann.de www.logical-arts.de From jimi at temporal-wave.com Thu Jan 7 14:24:27 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Thu, 07 Jan 2010 14:24:27 -0800 Subject: [antlr-interest] c-target tree creation In-Reply-To: <1445674875.20100107230416@gmx.de> Message-ID: <638ecb884e29654ab45d785286fd7743@temporal-wave.com> Actually the nil node should never be there so there must be something awry with your grammar. Try making sure that your tope rule looks like: top : myrule EOF! ; Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Christian Hoffmann > Sent: Thursday, January 07, 2010 2:04 PM > To: antlr-interest at antlr.org > Subject: [antlr-interest] c-target tree creation > > Hi, > > I'm stumbling over a unhappy circumstance. Normaly a tree is built > witch a > nil node and the children. But if the parser regognises just one line, > the nil node is not used (probably there are no children). > > Example 1 - generated tree has just one node > Source: > int a; > AST: > , > > > > > > Example 2 - generated tree has two nodes > Source: > int a; > int b; > AST: > , > > >, > , > > > > > > > > I think it would be easier for walking in a loop if the nil-node is > always created. Is this possible in future versions? > > Regards, > Christian > > -- > Christian Hoffmann > ?tzenkamp 4 > 38118 Braunschweig > Tel: 0171/7300609 > Web: www.c-hoffmann.de > www.logical-arts.de > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From jimi at temporal-wave.com Thu Jan 7 14:49:59 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Thu, 07 Jan 2010 14:49:59 -0800 Subject: [antlr-interest] c-target tree creation In-Reply-To: <1166891626.20100107233333@gmx.de> Message-ID: <2c1f92302f6e8a40abdae0162e37e119@temporal-wave.com> You should always rewrite the top node so you have a single root node and that is your problem as the nil node is created to hold the children and then as you don't rewrite it, it stays there but it is not created when there is just a single node. So just do this: Try: translation_unit @init{ _pParser->m_bError = false; _pParser->m_ScopeDelimiter = "@"; } : telement EOF ->^(TUNIT telement) ; telement : ( pragma | expression_statement )* ; And you will be all set. Jim PS: Please use the list rather than emailing me directly :-) > -----Original Message----- > From: Christian Hoffmann [mailto:Sanus at gmx.de] > Sent: Thursday, January 07, 2010 2:34 PM > To: Jim Idle > Subject: Re: [antlr-interest] c-target tree creation > > Hi Jim, > > this is my top-rule... > > translation_unit > @init{ _pParser->m_bError = false; _pParser->m_ScopeDelimiter = > "@"; } > : ( pragma > | expression_statement > )* EOF! > ; > > Can you see a problem? > > Thx > Chris > > > > JI> Actually the nil node should never be there so there must be > JI> something awry with your grammar. Try making sure that your tope > rule looks like: > > JI> top : myrule EOF! ; > > JI> Jim > > >> -----Original Message----- > >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > >> bounces at antlr.org] On Behalf Of Christian Hoffmann > >> Sent: Thursday, January 07, 2010 2:04 PM > >> To: antlr-interest at antlr.org > >> Subject: [antlr-interest] c-target tree creation > >> > >> Hi, > >> > >> I'm stumbling over a unhappy circumstance. Normaly a tree is built > >> witch a > >> nil node and the children. But if the parser regognises just one > line, > >> the nil node is not used (probably there are no children). > >> > >> Example 1 - generated tree has just one node > >> Source: > >> int a; > >> AST: > >> >> , > >> > >> > > >> > >> > >> Example 2 - generated tree has two nodes > >> Source: > >> int a; > >> int b; > >> AST: > >> >> >> , > >> > >> >, > >> >> , > >> > >> > > >> > > >> > >> > >> I think it would be easier for walking in a loop if the nil-node is > >> always created. Is this possible in future versions? > >> > >> Regards, > >> Christian > >> > >> -- > >> Christian Hoffmann > >> ?tzenkamp 4 > >> 38118 Braunschweig > >> Tel: 0171/7300609 > >> Web: www.c-hoffmann.de > >> www.logical-arts.de > >> > >> > >> List: http://www.antlr.org/mailman/listinfo/antlr-interest > >> Unsubscribe: http://www.antlr.org/mailman/options/antlr- > interest/your- > >> email-address > > > > > JI> List: http://www.antlr.org/mailman/listinfo/antlr-interest > JI> Unsubscribe: > JI> http://www.antlr.org/mailman/options/antlr-interest/your-email- > address > > > > -- > Mit freundlichen Gr??en > Christian Hoffmann > > mailto:Sanus at gmx.de > From denis.debarbieux at ateji.com Fri Jan 8 03:05:18 2010 From: denis.debarbieux at ateji.com (Denis Debarbieux) Date: Fri, 08 Jan 2010 12:05:18 +0100 Subject: [antlr-interest] Parser generation takes hours In-Reply-To: <9362e74e1001052242s192e7ae7u4beef375108e297d@mail.gmail.com> References: <4B43521A.6000501@worldonline.fr> <9362e74e1001052242s192e7ae7u4beef375108e297d@mail.gmail.com> Message-ID: <4B47116E.9000407@ateji.com> Hi everybody, > One of most tough problem in the migration for me was to resolve the > left factoring. I am surprised by this discussion. I thought that there are algorithms that automatically removes left recursions and left factorizations. Did I learn those algorithms at school but they are never used in real problem? Why ANTLR does not use them? Regards Denis Gokulakannan Somasundaram a ?crit : > Hi Jean, > I faced up with a similar issue, when i tried the migration of a > LR parser. But it's definitely because of recursion stuffs. The way i > removed is sort of layman stuff, but thought of just informing you. > Try to split the grammar into multiple sections(group of rules) and > try to add them one-by-one. You don't need to wait till the errors are > emitted. As soon as the parser generation takes more than 3-4 mins, just > stop the generation. The last section, which resulted in the increase most > probably contains the problematic code. Bear with me, if this approach looks > very awkward. > > Thanks, > Gokul. > From Heiko.Folkerts at david-bs.de Fri Jan 8 05:32:56 2010 From: Heiko.Folkerts at david-bs.de (Heiko Folkerts) Date: Fri, 8 Jan 2010 14:32:56 +0100 Subject: [antlr-interest] Doxygen errors when using the C Target with ANTLR Message-ID: <93FCBF72DCE7634481C5DF1654D8FF13035A819E@DC2> Hi all, When I run doxygen over the code generated by ANTLR using the C target, I get the following error message in doxygen.log: C:/Projekte/modelisar/trunk/src/TFSSParser/grammar/TFSSBaseParser.h:303: Warning: argument 'POinter' of command @param is not found in the argument list of TFSSBaseParser_tfs_SCOPE_struct::void(ANTLR3_CDECL *free) C:/Projekte/modelisar/trunk/src/TFSSParser/grammar/TFSSBaseParser.h:303: Warning: The following parameters of TFSSBaseParser_tfs_SCOPE_struct::void(ANTLR3_CDECL *free) are not documented: parameter 'free' I found a fitting place in c.stg and tried to fix the template to solve the problem, but in the generated code the errorneous code still exists. What have I made wrong? Any solution how to fix it? The Code for the mentioned file TFSSBase.h is: /** Function that the user may provide to be called when the * scope is destroyed (so you can free pANTLR3_HASH_TABLES and so on) * * \param POinter to an instance of this typedef/struct */ Thx Heiko Mit freundlichem Gru? Heiko Folkerts Systementwicklung und -design -- ______________________________________________ DAVID GmbH ? Wendenring 1 ? 38114 Braunschweig Tel.: +49 531 24379-14 Fax.: +49 531 24379-79 E-Mail: mailto:Heiko.Folkerts at david-bs.de WWW: http://www.david-bs.de? Eintragung: Amtsgericht Braunschweig, HRB 3167 Gesch?ftsf?hrer: Frank Ptok ______________________________________________ From fridi70 at gmx.de Fri Jan 8 07:14:35 2010 From: fridi70 at gmx.de (fridi) Date: Fri, 08 Jan 2010 16:14:35 +0100 Subject: [antlr-interest] Match anything until a specific phrase Message-ID: <4B474BDB.4030105@gmx.de> Hello all, maybe someone can help me to get this done with ANTLR 3.2 My file has a header starting with 'test', some comments and then several blocks named 'Page 1', 'Page 2' etc. with integers, i.e. test This is a comment and we are not interested in. Today is friday. Page 1: 123 456 789 I want to have a rule that consumes everything of the header until the word 'Page'. 'Page' should not be consumed by the header, it be consumed by another rule. So I tried the following: grammar TestNot; options { language = Java; } rule : file; file : header PAGE INT ':' INT+ EOF; header : 'test' ~PAGE; PAGE : 'Page'; INT : DIGIT+; fragment DIGIT : '0'..'9'; Any idea? Thanks in advance. From steel at kryas.com Fri Jan 8 07:35:49 2010 From: steel at kryas.com (Stanley Steel) Date: Fri, 08 Jan 2010 08:35:49 -0700 Subject: [antlr-interest] Binary Message Parsing Message-ID: <4B4750D5.8050700@kryas.com> Is ANTLR suitable to build a binary message parser? From KLPauba at west.com Fri Jan 8 07:50:05 2010 From: KLPauba at west.com (Pauba, Kevin L) Date: Fri, 8 Jan 2010 09:50:05 -0600 Subject: [antlr-interest] Can ST be used to generate binary output? In-Reply-To: <4B4750D5.8050700@kryas.com> References: <4B4750D5.8050700@kryas.com> Message-ID: <226316B3E1F749498E28ACA66321D5BA01315F2CDC@oma00cexmbx03.corp.westworlds.com> I would like to use my ANTLR-based DSL compiler to generate pretty-printed source, documentation (similar to javadocs) and bytecode output using StringTemplate. The interpreter for this DSL needs bytecode in binary form. Is ST able to generate binary output (I know it can do the gruntwork for pretty-printing and documentation)? If so, might you have some pointers on how to do it? Thanks! From jimi at temporal-wave.com Fri Jan 8 10:05:25 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Fri, 08 Jan 2010 10:05:25 -0800 Subject: [antlr-interest] Doxygen errors when using the C Target with ANTLR In-Reply-To: <93FCBF72DCE7634481C5DF1654D8FF13035A819E@DC2> Message-ID: Well to be honest I started adding doxygen to the generated code but after looking at what you get from it I decided that it wasn't really of much help. In the next version of ANTLR doc comments of rules will be passed through to code gen and that should help. To change the template you need to either rebuild ANTLR or set your class path up so that it finds your version of C.stg before mine. I suspect though that what you will get is not really that useful. Better to document the grammar than the generated code. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Heiko Folkerts > Sent: Friday, January 08, 2010 5:33 AM > To: antlr-interest at antlr.org > Subject: [antlr-interest] Doxygen errors when using the C Target with > ANTLR > > Hi all, > When I run doxygen over the code generated by ANTLR using the C target, > I get the following error message in doxygen.log: > > C:/Projekte/modelisar/trunk/src/TFSSParser/grammar/TFSSBaseParser.h:303 > : Warning: argument 'POinter' of command @param is not found in the > argument list of TFSSBaseParser_tfs_SCOPE_struct::void(ANTLR3_CDECL > *free) > C:/Projekte/modelisar/trunk/src/TFSSParser/grammar/TFSSBaseParser.h:303 > : Warning: The following parameters of > TFSSBaseParser_tfs_SCOPE_struct::void(ANTLR3_CDECL *free) are not > documented: > parameter 'free' > > I found a fitting place in c.stg and tried to fix the template to solve > the problem, but in the generated code the errorneous code still > exists. What have I made wrong? Any solution how to fix it? > The Code for the mentioned file TFSSBase.h is: > /** Function that the user may provide to be called when the > * scope is destroyed (so you can free pANTLR3_HASH_TABLES and so > on) > * > * \param POinter to an instance of this typedef/struct > */ > > Thx > Heiko > > Mit freundlichem Gru? > Heiko Folkerts > Systementwicklung und -design > -- > ______________________________________________ > DAVID GmbH ? Wendenring 1 ? 38114 Braunschweig > Tel.: +49 531 24379-14 > Fax.: +49 531 24379-79 > E-Mail: mailto:Heiko.Folkerts at david-bs.de > WWW: http://www.david-bs.de > Eintragung: Amtsgericht Braunschweig, HRB 3167 > Gesch?ftsf?hrer: Frank Ptok > ______________________________________________ > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From jimi at temporal-wave.com Fri Jan 8 10:09:09 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Fri, 08 Jan 2010 10:09:09 -0800 Subject: [antlr-interest] Match anything until a specific phrase In-Reply-To: <4B474BDB.4030105@gmx.de> Message-ID: <880718c24120444aa9e21ed91a6b5f01@temporal-wave.com> Why don't you just remove the header before sending it to the lexer? Or write a function/method to do input.consume() until you find 'P' then check for 'Page', stop consuming if found, carry on consuming if not. Trigger the method as appropriate in action code for tokens or at lexer start up. I would remove the 'literals' from your parser and make real lexer rules. Remember that the lexer runs, then the parser runs, you cannot direct the lexer from the parser. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of fridi > Sent: Friday, January 08, 2010 7:15 AM > To: antlr-interest at antlr.org > Subject: [antlr-interest] Match anything until a specific phrase > > Hello all, > maybe someone can help me to get this done with ANTLR 3.2 > > My file has a header starting with 'test', some comments and then > several blocks named 'Page 1', 'Page 2' etc. with integers, i.e. > > test This is a comment and > we are not interested in. > Today is friday. > > Page 1: > 123 > 456 > 789 > > > I want to have a rule that consumes everything of the header until the > word 'Page'. > 'Page' should not be consumed by the header, it be consumed by another > rule. > > So I tried the following: > > grammar TestNot; > > options { > language = Java; > } > > rule : > file; > > file : > header PAGE INT ':' INT+ EOF; > > header : > 'test' ~PAGE; > > PAGE : > 'Page'; > > INT : > DIGIT+; > > fragment > DIGIT : > '0'..'9'; > > > Any idea? Thanks in advance. > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From jimi at temporal-wave.com Fri Jan 8 12:30:55 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Fri, 08 Jan 2010 12:30:55 -0800 Subject: [antlr-interest] Can ST be used to generate binary output? In-Reply-To: <226316B3E1F749498E28ACA66321D5BA01315F2CDC@oma00cexmbx03.corp.westworlds.com> Message-ID: It is usually better to produce an intermediate assembler representation of your byte code then have a parser that can assemble that in to byte code. You will for instance need to resolve the targets of 'jmp' and things like that and having an assembly language listing of the 'byte code' lets you be more productive when debugging and so on. Such assembly/intermediate languages are also good for optimizing phases. When I need multiple different outputs like this then I create an Abstract class with common functionality for code generation and derive generators for each target from that. I have the code generator create StringTemplates when this is what is needed and also have a code generator that produces the byte code. Then I have the tree walker call the code generation methods rather than create templates directly in the tree grammar. You can then do multiple walks with different code generators. If you are trying to generate Java byte code then write a code generator that interfaces to the ASM package: http://asm.ow2.org/ which is very good. There is also LLVM of course. Finally I think that you would benefit greatly from reading the new book: http://pragprog.com/titles/tpdsl/language-implementation-patterns Which will guide you through some working of examples of all of this stuff. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Pauba, Kevin L > Sent: Friday, January 08, 2010 7:50 AM > To: antlr-interest at antlr.org > Subject: [antlr-interest] Can ST be used to generate binary output? > > I would like to use my ANTLR-based DSL compiler to generate pretty- > printed source, documentation (similar to javadocs) and bytecode output > using StringTemplate. The interpreter for this DSL needs bytecode in > binary form. > > Is ST able to generate binary output (I know it can do the gruntwork > for pretty-printing and documentation)? If so, might you have some > pointers on how to do it? > > Thanks! > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From KLPauba at west.com Fri Jan 8 13:53:01 2010 From: KLPauba at west.com (Pauba, Kevin L) Date: Fri, 8 Jan 2010 15:53:01 -0600 Subject: [antlr-interest] Can ST be used to generate binary output? In-Reply-To: References: <226316B3E1F749498E28ACA66321D5BA01315F2CDC@oma00cexmbx03.corp.westworlds.com> Message-ID: <226316B3E1F749498E28ACA66321D5BA01315F2EE5@oma00cexmbx03.corp.westworlds.com> Thanks (again!) Jim. I'll take that under advisement. I've had the LIPs ebook for some time now (just received the hardcopy this week) but haven't dug too deep in it. I'll do that now. -----Original Message----- From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle Sent: Friday, January 08, 2010 2:31 PM To: antlr-interest at antlr.org Subject: Re: [antlr-interest] Can ST be used to generate binary output? It is usually better to produce an intermediate assembler representation of your byte code then have a parser that can assemble that in to byte code. You will for instance need to resolve the targets of 'jmp' and things like that and having an assembly language listing of the 'byte code' lets you be more productive when debugging and so on. Such assembly/intermediate languages are also good for optimizing phases. When I need multiple different outputs like this then I create an Abstract class with common functionality for code generation and derive generators for each target from that. I have the code generator create StringTemplates when this is what is needed and also have a code generator that produces the byte code. Then I have the tree walker call the code generation methods rather than create templates directly in the tree grammar. You can then do multiple walks with different code generators. If you are trying to generate Java byte code then write a code generator that interfaces to the ASM package: http://asm.ow2.org/ which is very good. There is also LLVM of course. Finally I think that you would benefit greatly from reading the new book: http://pragprog.com/titles/tpdsl/language-implementation-patterns Which will guide you through some working of examples of all of this stuff. Jim From ttmrichter at gmail.com Fri Jan 8 20:32:05 2010 From: ttmrichter at gmail.com (Michael Richter) Date: Sat, 9 Jan 2010 12:32:05 +0800 Subject: [antlr-interest] Question about idiom. Message-ID: I keep coming across a pattern in a grammar I'm working on. This pattern looks something like this: - A production can be *A*. - A production can be *B*. - A production can be *A B.* In the grammar I'm transcribing this from, the notation used is *(A & B)*. Is there some convenient way to code that in ANTLR's EBNF notation? I keep having to do *(A | B | A B)*. As is that isn't all that onerous as-is, I admit, but imagine if A is five tokens long and B is also five tokens long and then imagine this kind of pattern happening about twenty times in the grammar. Is there a way to concisely do this? From parrt at cs.usfca.edu Fri Jan 8 21:36:40 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Fri, 8 Jan 2010 21:36:40 -0800 Subject: [antlr-interest] Parser generation takes hours In-Reply-To: <4B47116E.9000407@ateji.com> References: <4B43521A.6000501@worldonline.fr> <9362e74e1001052242s192e7ae7u4beef375108e297d@mail.gmail.com> <4B47116E.9000407@ateji.com> Message-ID: <7D9F6A30-7A5C-4EB6-B77A-89BDE539F8B1@cs.usfca.edu> ANTLRWorks can do some left-factoring automatically. Ter On Jan 8, 2010, at 3:05 AM, Denis Debarbieux wrote: > Hi everybody, >> One of most tough problem in the migration for me was to resolve the >> left factoring. > > I am surprised by this discussion. > > I thought that there are algorithms that automatically removes left > recursions and left factorizations. Did I learn those algorithms at > school but they are never used in real problem? Why ANTLR does not > use > them? > > Regards > > Denis > > Gokulakannan Somasundaram a ?crit : >> Hi Jean, >> I faced up with a similar issue, when i tried the migration >> of a >> LR parser. But it's definitely because of recursion stuffs. The way i >> removed is sort of layman stuff, but thought of just informing you. >> Try to split the grammar into multiple sections(group of >> rules) and >> try to add them one-by-one. You don't need to wait till the errors >> are >> emitted. As soon as the parser generation takes more than 3-4 mins, >> just >> stop the generation. The last section, which resulted in the >> increase most >> probably contains the problematic code. Bear with me, if this >> approach looks >> very awkward. >> >> Thanks, >> Gokul. >> > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From fridi70 at gmx.de Sat Jan 9 02:04:05 2010 From: fridi70 at gmx.de (fridi) Date: Sat, 09 Jan 2010 11:04:05 +0100 Subject: [antlr-interest] Match anything until a specific phrase In-Reply-To: <880718c24120444aa9e21ed91a6b5f01@temporal-wave.com> References: <880718c24120444aa9e21ed91a6b5f01@temporal-wave.com> Message-ID: <4B485495.3050803@gmx.de> Jim Idle wrote: > Why don't you just remove the header before sending it to the lexer? Yes, that is a good idea, too. I thought it should be possible to get this done with ANTLR > Or write a function/method to do input.consume() until you find 'P' then check for 'Page', stop consuming if found, carry on consuming if not. Trigger the method as appropriate in action code for tokens or at lexer start up. > Do you have any simple example or hint how to do that? > I would remove the 'literals' from your parser and make real lexer rules. Yes, that is what I have done in my real grammar, this one here was just an example. Thanks a lot - Fridi > Remember that the lexer runs, then the parser runs, you cannot direct the lexer from the parser. > > Jim > > >> -----Original Message----- >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- >> bounces at antlr.org] On Behalf Of fridi >> Sent: Friday, January 08, 2010 7:15 AM >> To: antlr-interest at antlr.org >> Subject: [antlr-interest] Match anything until a specific phrase >> >> Hello all, >> maybe someone can help me to get this done with ANTLR 3.2 >> >> My file has a header starting with 'test', some comments and then >> several blocks named 'Page 1', 'Page 2' etc. with integers, i.e. >> >> test This is a comment and >> we are not interested in. >> Today is friday. >> >> Page 1: >> 123 >> 456 >> 789 >> >> >> I want to have a rule that consumes everything of the header until the >> word 'Page'. >> 'Page' should not be consumed by the header, it be consumed by another >> rule. >> >> So I tried the following: >> >> grammar TestNot; >> >> options { >> language = Java; >> } >> >> rule : >> file; >> >> file : >> header PAGE INT ':' INT+ EOF; >> >> header : >> 'test' ~PAGE; >> >> PAGE : >> 'Page'; >> >> INT : >> DIGIT+; >> >> fragment >> DIGIT : >> '0'..'9'; >> >> >> Any idea? Thanks in advance. >> >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- >> email-address >> > > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > From kroepke at classdump.org Sat Jan 9 06:41:33 2010 From: kroepke at classdump.org (=?iso-8859-1?Q?Kay_R=F6pke?=) Date: Sat, 9 Jan 2010 15:41:33 +0100 Subject: [antlr-interest] Question about idiom. In-Reply-To: References: Message-ID: <53C75D1D-B191-42E5-BF37-7E5E50BA35D9@classdump.org> On Jan 9, 2010, at 5:32 AM, Michael Richter wrote: > I keep coming across a pattern in a grammar I'm working on. This pattern > looks something like this: > > - A production can be *A*. > - A production can be *B*. > - A production can be *A B.* > > In the grammar I'm transcribing this from, the notation used is *(A & B)*. > Is there some convenient way to code that in ANTLR's EBNF notation? I keep > having to do *(A | B | A B)*. As is that isn't all that onerous as-is, I > admit, but imagine if A is five tokens long and B is also five tokens long > and then imagine this kind of pattern happening about twenty times in the > grammar. Is there a way to concisely do this? What is the restriction on the parts of the production? I.e. what differentiates a valid production from an invalid one? I'll take a wild guess, maybe I'm right ;) Given the tokens A, B, C, D, i suspect that the allowed combination is any permutation of these tokens, i.e. A B C D, C B A, D, A, B etc are all valid inputs? Then the question is, how do you a) make it easy to write in the grammar and b) still ensure no repeated element in the production. One way to do it is to use semantic predicates (turning off or validating parts of the grammar depending on semantic infomation). Depending on whether you want the FailedPredicateException or not, you would use a gated sempred ( {}?=> ) or a non-gated one ( {}? ). Gated sempreds "turn off" parts of the grammar, while regular validating predicates do not. Disclaimer: written in mail, assuming Java target, not enough coffee yadda yadda: primaryOne @init { Map seenToken = new HashMap(); } : ( {! seenToken.containsKey(input.LT(1).getText()) }? prim=primaryOneToken { seenToken.put($prim.start.getText(), Boolean.TRUE); } )+ ; primaryOneToken : 'A' | 'B' | 'C' | 'D' ; expr : primaryOne '&' primaryOne 'A' /* the 'A' is just to demonstrate that ANTLR will carry on matching input correctly */ ; That should allow lists of non-repeated A, B, C, D in any order. Maybe there is a more clever way of writing that, but it eludes me right now. Try it in ANTLRWorks on input like: A B C & A A and see what it matches where and what changes if you change the the sempred to a gated one. cheers, -k From pureza at gmail.com Sat Jan 9 09:03:33 2010 From: pureza at gmail.com (Luis Pureza) Date: Sat, 9 Jan 2010 17:03:33 +0000 Subject: [antlr-interest] Unexpected behavior while using += in a tree grammar In-Reply-To: <3e1533501001090858le8e6d05m43327c6be60ec561@mail.gmail.com> References: <3e1533501001090858le8e6d05m43327c6be60ec561@mail.gmail.com> Message-ID: <3e1533501001090903g70140323jfb08fed7984ab76d@mail.gmail.com> Hi, I've started using antlr a few days ago, so let me begin by thanking everyone that contributed for creating this fantastic project. Unfortunately, I think I ran into a bug and I'm hoping you might help me. I'm using a tree grammar where I have the following rule: expr returns [Expr value] ? ?| ID ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?{ $value = new Var($ID.text); } ? ?| ^(APP fn=expr (args+=expr)+ { $value = new App($fn.value, $args); } ? ?... Surprisingly, $args is a list of CommonTrees, and not a list of Expr as I was hoping it would be. Is this a bug or a feature? If it's the latter, is there any way to "convert" the tree into an Expr? For now, I'm collecting args manually, with the following workaround: expr returns [Expr value] @init { ?List ops = new ArrayList(); } ? ?| ^(APP fn=expr (op=expr { ops.add($op.value); })+) { ... } ? ?| ID ? ? ? ? ? ? ? ? ? ?{ ... } Thanks! Lu?s Pureza From antonio.petrelli at gmail.com Sat Jan 9 11:38:33 2010 From: antonio.petrelli at gmail.com (Antonio Petrelli) Date: Sat, 9 Jan 2010 20:38:33 +0100 Subject: [antlr-interest] Problems with Maven plugin Message-ID: Hi all Sorry for being an ANTLR newbie. I would like to use the Maven plugin. When I try to generate (through mvn compile) sources of the Java.g 1.6 parser, the plugin gives me an error (full log below in the mail): error(7): cannot find or open file: null/Java.g However, if I copy the same file under the "null" directory, it generates the code! I am using Maven 2.2.1 under Linux Kubuntu 9.10 amd64, OpenJDK 1.6 b16 You can check it live at this address: http://svn.eu.apache.org/repos/asf/tiles/sandbox/trunk/tiles-autotag/tiles-autotag-core/ Thanks in advance Antonio Petrelli ----------------- Full Log mvn clean compile -e + Error stacktraces are turned on. [INFO] Scanning for projects... [INFO] ------------------------------------------------------------------------ [INFO] Building Autotag - Core [INFO] task-segment: [clean, compile] [INFO] ------------------------------------------------------------------------ [INFO] [clean:clean {execution: default-clean}] [INFO] Deleting directory /home/antonio/javadev/workspace-sandbox/tiles-autotag/tiles-autotag-core/target [INFO] [antlr3:antlr {execution: default}] [INFO] ANTLR: Processing source directory /home/antonio/javadev/workspace-sandbox/tiles-autotag/tiles-autotag-core/src/main/antlr3 ANTLR Parser Generator Version 3.2 Sep 23, 2009 14:05:07 error(7): cannot find or open file: null/Java.g [INFO] ------------------------------------------------------------------------ [ERROR] BUILD ERROR [INFO] ------------------------------------------------------------------------ [INFO] ANTLR caught 1 build errors. [INFO] ------------------------------------------------------------------------ [INFO] Trace org.apache.maven.lifecycle.LifecycleExecutionException: ANTLR caught 1 build errors. at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:719) at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalWithLifecycle(DefaultLifecycleExecutor.java:556) at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:535) at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:387) at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:348) at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:180) at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:328) at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:138) at org.apache.maven.cli.MavenCli.main(MavenCli.java:362) at org.apache.maven.cli.compat.CompatibleMain.main(CompatibleMain.java:60) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315) at org.codehaus.classworlds.Launcher.launch(Launcher.java:255) at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430) at org.codehaus.classworlds.Launcher.main(Launcher.java:375) Caused by: org.apache.maven.plugin.MojoExecutionException: ANTLR caught 1 build errors. at org.antlr.mojo.antlr3.Antlr3Mojo.execute(Antlr3Mojo.java:397) at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:490) at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:694) ... 17 more [INFO] ------------------------------------------------------------------------ [INFO] Total time: 1 second [INFO] Finished at: Sat Jan 09 20:36:42 CET 2010 [INFO] Final Memory: 9M/67M [INFO] ------------------------------------------------------------------------ ------------- Maven configuration: 4.0.0 tiles-autotag org.apache.tiles 1.0-SNAPSHOT org.apache.tiles tiles-autotag-core 1.0-SNAPSHOT Autotag - Core Core classes for Autotag. org.antlr antlr3-maven-plugin 3.2 Java.g antlr org.antlr antlr-runtime 3.2 jar compile From antonio.petrelli at gmail.com Sat Jan 9 11:50:00 2010 From: antonio.petrelli at gmail.com (Antonio Petrelli) Date: Sat, 9 Jan 2010 20:50:00 +0100 Subject: [antlr-interest] Problems with Maven plugin In-Reply-To: References: Message-ID: I forgot to say that I am using the 3.2 version of the plugin and of ANTLR. Moreover if I move the Java.g file under another subdirectory (say 'foo'), it is created under the 'foo' directory but the "package" instruction is not included in the Java code. Thanks Antonio 2010/1/9 Antonio Petrelli : > Hi all > Sorry for being an ANTLR newbie. I would like to use the Maven plugin. > When I try to generate (through mvn compile) sources of the Java.g 1.6 > parser, the plugin gives me an error (full log below in the mail): > > error(7): ?cannot find or open file: null/Java.g > > However, if I copy the same file under the "null" directory, it > generates the code! > > I am using Maven 2.2.1 under Linux Kubuntu 9.10 amd64, OpenJDK 1.6 b16 > > You can check it live at this address: > http://svn.eu.apache.org/repos/asf/tiles/sandbox/trunk/tiles-autotag/tiles-autotag-core/ > > Thanks in advance > Antonio Petrelli > > ----------------- > Full Log > > mvn clean compile -e > + Error stacktraces are turned on. > [INFO] Scanning for projects... > [INFO] ------------------------------------------------------------------------ > [INFO] Building Autotag - Core > [INFO] ? ?task-segment: [clean, compile] > [INFO] ------------------------------------------------------------------------ > [INFO] [clean:clean {execution: default-clean}] > [INFO] Deleting directory > /home/antonio/javadev/workspace-sandbox/tiles-autotag/tiles-autotag-core/target > [INFO] [antlr3:antlr {execution: default}] > [INFO] ANTLR: Processing source directory > /home/antonio/javadev/workspace-sandbox/tiles-autotag/tiles-autotag-core/src/main/antlr3 > ANTLR Parser Generator ?Version 3.2 Sep 23, 2009 14:05:07 > error(7): ?cannot find or open file: null/Java.g > [INFO] ------------------------------------------------------------------------ > [ERROR] BUILD ERROR > [INFO] ------------------------------------------------------------------------ > [INFO] ANTLR caught 1 build errors. > [INFO] ------------------------------------------------------------------------ > [INFO] Trace > org.apache.maven.lifecycle.LifecycleExecutionException: ANTLR caught 1 > build errors. > ? ? ? ?at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:719) > ? ? ? ?at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalWithLifecycle(DefaultLifecycleExecutor.java:556) > ? ? ? ?at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:535) > ? ? ? ?at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:387) > ? ? ? ?at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:348) > ? ? ? ?at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:180) > ? ? ? ?at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:328) > ? ? ? ?at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:138) > ? ? ? ?at org.apache.maven.cli.MavenCli.main(MavenCli.java:362) > ? ? ? ?at org.apache.maven.cli.compat.CompatibleMain.main(CompatibleMain.java:60) > ? ? ? ?at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ? ? ? ?at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > ? ? ? ?at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ? ? ? ?at java.lang.reflect.Method.invoke(Method.java:616) > ? ? ? ?at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315) > ? ? ? ?at org.codehaus.classworlds.Launcher.launch(Launcher.java:255) > ? ? ? ?at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430) > ? ? ? ?at org.codehaus.classworlds.Launcher.main(Launcher.java:375) > Caused by: org.apache.maven.plugin.MojoExecutionException: ANTLR > caught 1 build errors. > ? ? ? ?at org.antlr.mojo.antlr3.Antlr3Mojo.execute(Antlr3Mojo.java:397) > ? ? ? ?at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:490) > ? ? ? ?at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:694) > ? ? ? ?... 17 more > [INFO] ------------------------------------------------------------------------ > [INFO] Total time: 1 second > [INFO] Finished at: Sat Jan 09 20:36:42 CET 2010 > [INFO] Final Memory: 9M/67M > [INFO] ------------------------------------------------------------------------ > > ------------- > > Maven configuration: > > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" > ? ?xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 > http://maven.apache.org/maven-v4_0_0.xsd"> > ? ?4.0.0 > ? ? > ? ? ? ?tiles-autotag > ? ? ? ?org.apache.tiles > ? ? ? ?1.0-SNAPSHOT > ? ? > ? ?org.apache.tiles > ? ?tiles-autotag-core > ? ?1.0-SNAPSHOT > ? ?Autotag - Core > ? ?Core classes for Autotag. > ? ? > ? ? ? ? > ? ? ? ? ? ? > ? ? ? ? ? ? ? ?org.antlr > ? ? ? ? ? ? ? ?antlr3-maven-plugin > ? ? ? ? ? ? ? ?3.2 > ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ?Java.g > ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? ? ? ? ?antlr > ? ? ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? ? ? > ? ? ? ? > ? ? > ? ? > ? ? ? ? > ? ? ? ? ? ?org.antlr > ? ? ? ? ? ?antlr-runtime > ? ? ? ? ? ?3.2 > ? ? ? ? ? ?jar > ? ? ? ? ? ?compile > ? ? ? ? > ? ? > > From michael.guyver at gmail.com Sat Jan 9 15:11:04 2010 From: michael.guyver at gmail.com (Michael Guyver) Date: Sat, 9 Jan 2010 23:11:04 +0000 Subject: [antlr-interest] antlr3-maven-plugin (v3.2): "error(7): cannot find or open file: null/MyGrammar.g" In-Reply-To: References: Message-ID: Hi there, There's a a bug in the Antlr4Mojo class where the grammar files are stored in the src/main/antlr3 root (for example src/main/antlr3/MyGrammar.g). Despite scanning and finding the grammar file (and reporting its location nicely), it results in a 'null' value being passed back from findSourceSubdir(File,String) such that the following error occurs: error(7): ?cannot find or open file: null/MyGrammar.g and results in the following exception trace: Caused by: org.apache.maven.plugin.MojoExecutionException: ANTLR caught 2 build errors. ? ? ? ?at org.antlr.mojo.antlr3.Antlr3Mojo.execute(Antlr3Mojo.java:397) ? ? ? ?at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:451) ? ? ? ?at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:558) ? ? ? ?... 16 more I had formerly been using the codehaus 1.0 release and been setting the output directory to target/generated-sources/antlr/my/full/package/path/ so that the generated files arrived in the right place. Happily the new plugin does this for you so simply moving the grammar to src/main/antlr3/my/full/package/path/MyGrammar.g solved the problem and meant I didn't have to specify the output directory either \:D/ Hope this helps any other people perplexed by the issue and that it might result in a fix (not that I'm dependent on it any longer;)? Best wishes, Michael From ttmrichter at gmail.com Sat Jan 9 18:04:06 2010 From: ttmrichter at gmail.com (Michael Richter) Date: Sun, 10 Jan 2010 10:04:06 +0800 Subject: [antlr-interest] Question about idiom. In-Reply-To: <53C75D1D-B191-42E5-BF37-7E5E50BA35D9@classdump.org> References: <53C75D1D-B191-42E5-BF37-7E5E50BA35D9@classdump.org> Message-ID: 2010/1/9 Kay R?pke > > On Jan 9, 2010, at 5:32 AM, Michael Richter wrote: > > > I keep coming across a pattern in a grammar I'm working on. This pattern > > looks something like this: > > > > - A production can be *A*. > > - A production can be *B*. > > - A production can be *A B.* > > > > In the grammar I'm transcribing this from, the notation used is *(A & > B)*. > > Is there some convenient way to code that in ANTLR's EBNF notation? I > keep > > having to do *(A | B | A B)*. As is that isn't all that onerous as-is, I > > admit, but imagine if A is five tokens long and B is also five tokens > long > > and then imagine this kind of pattern happening about twenty times in the > > grammar. Is there a way to concisely do this? > > What is the restriction on the parts of the production? > I.e. what differentiates a valid production from an invalid one? > The restriction is exactly as I put it: You can have A (where A is a multi-token set of specified order), B (where B is a multi-token set of specified order) or A B. It *must* be in the order provided and A and B are fixed token sets. Think of it this way: you're declaring a variable. You have a token for the variable, then an optional type specification (A -- multiple tokens) and an optional initializer (B -- multiple tokens). Both parts are optional, but you *must* have at least one and the declarations *must* be in the order of type then initializer if both are present. The only way I've found to do it is (A | B | A B), but this is painful when A and B are more than one token in length and I've got about 20 of these things in the grammar. This is just begging for typos. From jbb at acm.org Sat Jan 9 18:40:04 2010 From: jbb at acm.org (John B. Brodie) Date: Sat, 09 Jan 2010 21:40:04 -0500 Subject: [antlr-interest] Question about idiom. In-Reply-To: References: <53C75D1D-B191-42E5-BF37-7E5E50BA35D9@classdump.org> Message-ID: <1263091204.3473.20.camel@gecko.home.org> Greetings! On Sun, 2010-01-10 at 10:04 +0800, Michael Richter wrote: > 2010/1/9 Kay R?pke > > > > > On Jan 9, 2010, at 5:32 AM, Michael Richter wrote: > > > > > I keep coming across a pattern in a grammar I'm working on. This pattern > > > looks something like this: > > > > > > - A production can be *A*. > > > - A production can be *B*. > > > - A production can be *A B.* > > > > > > In the grammar I'm transcribing this from, the notation used is *(A & > > B)*. > > > Is there some convenient way to code that in ANTLR's EBNF notation? I > > keep > > > having to do *(A | B | A B)*. As is that isn't all that onerous as-is, I > > > admit, but imagine if A is five tokens long and B is also five tokens > > long > > > and then imagine this kind of pattern happening about twenty times in the > > > grammar. Is there a way to concisely do this? > > > > What is the restriction on the parts of the production? > > I.e. what differentiates a valid production from an invalid one? > > > > The restriction is exactly as I put it: You can have A (where A is a > multi-token set of specified order), B (where B is a multi-token set of > specified order) or A B. It *must* be in the order provided and A and B are > fixed token sets. > 1) make a parser rule to recognize the sequence of Tokens (and/or other parser rules) comprising A; and call it, say, as: recognize_A. 2) make a parser rule to recognize the sequence of Tokens(and/or other parser rules) comprising B; and call it, say, as: recognize_B. 3) make a parser rule of the form: an_A_or_B_or_AB : recognize_A ( recognize_B )? | recognize_B ; observe the proper left-factoring in the above... 4) use the above parser rule `an_A_or_B_or_AB` from 3) everywhere you have the (A|B|A B) stuff. note that if A and B share a common prefix (e.g. a common left-factor) you will probably experience issues with the above 4 steps. > Think of it this way: you're declaring a variable. You have a token for the > variable, then an optional type specification (A -- multiple tokens) and an > optional initializer (B -- multiple tokens). Both parts are optional, but > you *must* have at least one and the declarations *must* be in the order of > type then initializer if both are present. The only way I've found to do it > is (A | B | A B), but this is painful when A and B are more than one token > in length and I've got about 20 of these things in the grammar. This is > just begging for typos. this example REALLY FAILS for me. It is hard for me to envision a language the can initialize a variable (e.g. B) without any declaration of that variable (e.g. A). So having a bare naked B under the above example makes no sense to me. Maybe you meant something like: (A B? C?) where A is the var decl, B is its type and C is its initial value... Hope this helps.... -jbb From ttmrichter at gmail.com Sat Jan 9 23:17:50 2010 From: ttmrichter at gmail.com (Michael Richter) Date: Sun, 10 Jan 2010 15:17:50 +0800 Subject: [antlr-interest] Question about idiom. In-Reply-To: <1263091204.3473.20.camel@gecko.home.org> References: <53C75D1D-B191-42E5-BF37-7E5E50BA35D9@classdump.org> <1263091204.3473.20.camel@gecko.home.org> Message-ID: 2010/1/10 John B. Brodie > > Think of it this way: you're declaring a variable. You have a token for > the > > variable, then an optional type specification (A -- multiple tokens) and > an > > optional initializer (B -- multiple tokens). Both parts are optional, > but > > you *must* have at least one and the declarations *must* be in the order > of > > type then initializer if both are present. The only way I've found to do > it > > is (A | B | A B), but this is painful when A and B are more than one > token > > in length and I've got about 20 of these things in the grammar. This is > > just begging for typos. > > this example REALLY FAILS for me. It is hard for me to envision a > language the can initialize a variable (e.g. B) without any declaration > of that variable (e.g. A). So having a bare naked B under the above > example makes no sense to me. Maybe you meant something like: (A B? C?) > where A is the var decl, B is its type and C is its initial value... > That's what I said. A token for the variable THEN an optional type specification (A) and an optional initializer (B). Three elements in total with only two of them named. I'll look over your other possible solutions there. Having (A B? | B) looks good enough especially since there's no left-commonality with A and B in ... I think in any case, actually. From christian.schladetsch at gmail.com Sun Jan 10 01:41:09 2010 From: christian.schladetsch at gmail.com (Christian Schladetsch) Date: Sun, 10 Jan 2010 20:41:09 +1100 Subject: [antlr-interest] New Example Project: HLSL Parser + Tree walker using C# Target Message-ID: <6442c4ae1001100141v2f069f9fk67f2e2e33313eef4@mail.gmail.com> Hi All, I just spent a few hours tidying up an FX Parser and adding it to my *GoogleCode *depot. It uses ANTLR 3 to parse HLSL files to an AST, then a Tree Walker and StringTemplate to write out the HLSL again. The target language is C#. I think its all there now, including all dependencies and custom build rules for VS 2008. To try it, you will need to checkout my repository. For example: svn checkout http://schladetsch.googlecode.com/svn/trunk/ . start Effects\Tools\FXParser\FXParser.sln Regards, Christian. From ttmrichter at gmail.com Sun Jan 10 02:00:59 2010 From: ttmrichter at gmail.com (Michael Richter) Date: Sun, 10 Jan 2010 18:00:59 +0800 Subject: [antlr-interest] What is going on here? Message-ID: Here's a snippet from a grammar I'm working on that's just failing in the most bizarre ways. grammar junk; tokens { INTERFACE = 'INTERFACE'; IDENT = 'IDENT'; END = 'END'; } interface : INTERFACE IDENT import* declaration* END '.' ; import : ; When I run antlr on it I get the following output: error(100): junk.g:10:9: syntax error: antlr: junk.g:10:9: unexpected token: INTERFACE error(100): junk.g:10:19: syntax error: antlr: junk.g:10:19: unexpected token: IDENT error(100): junk.g:10:50: syntax error: antlr: junk.g:10:50: unexpected token: '.' error(100): junk.g:13:1: syntax error: antlr: junk.g:13:1: unexpected token: import error(150): grammar file junk.g has no rules error(100): junk.g:0:0: syntax error: assign.types: :0:0: unexpected end of subtree error(100): junk.g:0:0: syntax error: define: :0:0: unexpected end of subtree error(10): internal error: junk.g : java.lang.NullPointerException org.antlr.grammar.v2.DefineGrammarItemsWalker.trimGrammar(DefineGrammarItemsWalker.java:94) org.antlr.grammar.v2.DefineGrammarItemsWalker.finish(DefineGrammarItemsWalker.java:77) org.antlr.grammar.v2.DefineGrammarItemsWalker.grammar(DefineGrammarItemsWalker.java:206) org.antlr.tool.Grammar.defineGrammarSymbols(Grammar.java:702) org.antlr.tool.CompositeGrammar.defineGrammarSymbols(CompositeGrammar.java:351) org.antlr.Tool.process(Tool.java:451) org.antlr.Tool.main(Tool.java:91) I have tried renaming the INTERFACE token, the IDENT token, the interface production, etc. in various combinations and none of it works. What incredibly obvious thing am I overlooking? From andy at andymcm.com Sun Jan 10 03:21:46 2010 From: andy at andymcm.com (Andy McMullan) Date: Sun, 10 Jan 2010 11:21:46 +0000 Subject: [antlr-interest] What is going on here? In-Reply-To: References: Message-ID: Did you try renaming 'import'? From stevenraemaekers at gmail.com Sun Jan 10 03:34:55 2010 From: stevenraemaekers at gmail.com (Steven Raemaekers) Date: Sun, 10 Jan 2010 12:34:55 +0100 Subject: [antlr-interest] ANTLR compile problem Message-ID: <46450b021001100334t738a7304ma9c8d2b4cf611ddd@mail.gmail.com> Hello, A project i'm working on includes an ANTLR parser, it worked fine a couple of days ago but now I get the following error message: Exception in thread "main" java.lang.NoSuchMethodException: stevenr.yali.antlr.LogoParser.(org.antlr.runtime.TokenStream, org.antlr.runtime.debug.DebugEventListener) at java.lang.Class.getConstructor0(Class.java:2706) at java.lang.Class.getDeclaredConstructor(Class.java:1985) at org.deved.antlride.runtime.LaunchParser.launch(LaunchParser.java:118) at org.deved.antlride.runtime.LaunchParser.main(LaunchParser.java:228) It seems like ANTLR runtime is trying to call a constructor that is not there, namely a constructor with a tokenstream and a debugeventlistener as arguments. Why would ANTLR want to do this? Previously this problem never occurred. Did it accidentaly go in some kind of "debug" mode that it is trying to attach an event listener? Why doesn't it just call the instructor it created itself (without a debugeventlistener)? Maybe there is some kind of debug option i turned on somewhere? Eclipse says that the java output file that ANTLR generates does not contain any errors. Can somebody please help me? Thanks. -- Regards, Steven Raemaekers From r66092 at freescale.com Sun Jan 10 19:11:28 2010 From: r66092 at freescale.com (Chen Hongjun-R66092) Date: Mon, 11 Jan 2010 11:11:28 +0800 Subject: [antlr-interest] An error occurs in template example Message-ID: <3A45394FD742FA419B760BB8D398F9ED011E1A07@zch01exm26.fsl.freescale.net> Hi, I am new to ANTLR, and am reading the book The Definitive ANTLR Reference. When I tried the template example 'template/generator/2pass' without any modification, and met an error as below: Exception in thread "main" java.util.NoSuchElementException: no such attribute: init in template context [jasminFile] at org.antlr.stringtemplate.StringTemplate.checkNullAttributeAgainstFormalA rguments(StringTemplate.java:1311) at org.antlr.stringtemplate.StringTemplate.getAttribute(StringTemplate.java :684) at org.antlr.stringtemplate.language.ActionEvaluator.attribute(ActionEvalua tor.java:360) at org.antlr.stringtemplate.language.ActionEvaluator.expr(ActionEvaluator.j ava:136) at org.antlr.stringtemplate.language.ActionEvaluator.action(ActionEvaluator .java:84) at org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:149) at org.antlr.stringtemplate.StringTemplate.write(StringTemplate.java:705) at org.antlr.stringtemplate.StringTemplate.toString(StringTemplate.java:167 0) at org.antlr.stringtemplate.StringTemplate.toString(StringTemplate.java:166 1) at Test.main(Test.java:45) I appreciate your any suggestions or ideas! Thanks, Hongjun From aurelien.larive at 4dconcept.fr Mon Jan 11 03:51:57 2010 From: aurelien.larive at 4dconcept.fr (=?ISO-8859-1?Q?Aur=E9lien_LARIVE?=) Date: Mon, 11 Jan 2010 12:51:57 +0100 Subject: [antlr-interest] Problems writing a searchbar language Message-ID: <4B4B10DD.6050200@4dconcept.fr> Hi, I'm currently writing a small grammar to parse a searchbar language and I'm failing at making whitespaces behave like the AND keyword. Here is my grammar : grammar SearchBar; options { output=AST; } WS : ( ' ' | '\t' ) { skip(); } ; AND : 'AND' ; OR : 'OR' ; NOT : 'NOT' ; LEFT_PAREN : '(' ; RIGHT_PAREN : ')' ; TERM : ~(' '|'\t'|'"'|RIGHT_PAREN|LEFT_PAREN|NOT|OR|AND)* ; QUOTEDTERM : '"' ~('"')* '"' ; orexpression : andexpression ( OR^ andexpression )* ; andexpression : notexpression ( (AND^)? notexpression )* ; notexpression : (NOT^)? searchterm ; searchterm : TERM | QUOTEDTERM | LEFT_PAREN! orexpression RIGHT_PAREN! ; And here is my tree grammar : tree grammar SearchBarEval; options { ASTLabelType=CommonTree; tokenVocab=SearchBar; } prog : expr+ ; expr returns [XMSExpression expression] : ^(OR a=expr b=expr) { $expression = new Or($a.expression, $b.expression); } | ^(AND a=expr b=expr) { $expression = new And($a.expression, $b.expression); } | ^(NOT a=expr) { $expression = new Not($a.expression); } | TERM { $expression = new Term($TERM.text); } | QUOTEDTERM { $expression = new QuotedTerm($QUOTEDTERM.text); } ; When I try to evaluate, for example, the input 'apples bananas tomatos', I only get the Term 'apples'. I understand why I'm having this problem but I was unable to find a good solution. Thanks in advance, -- Aur?lien From espina.edgar at gmail.com Mon Jan 11 04:04:16 2010 From: espina.edgar at gmail.com (Edgar Espina) Date: Mon, 11 Jan 2010 09:04:16 -0300 Subject: [antlr-interest] Problems writing a searchbar language In-Reply-To: <4B4B10DD.6050200@4dconcept.fr> References: <4B4B10DD.6050200@4dconcept.fr> Message-ID: <92b42db61001110404j44d65070yd7f053f457590ea7@mail.gmail.com> Hi, try this: WS : ( ' ' | '\t' ) { $channel=HIDDEN; } ; Regards On Mon, Jan 11, 2010 at 8:51 AM, Aur?lien LARIVE < aurelien.larive at 4dconcept.fr> wrote: > Hi, > > I'm currently writing a small grammar to parse a searchbar language and > I'm failing at making whitespaces behave like the AND keyword. > > Here is my grammar : > > grammar SearchBar; > > options { > output=AST; > } > > WS : ( ' ' | '\t' ) { skip(); } ; > AND : 'AND' ; > OR : 'OR' ; > NOT : 'NOT' ; > LEFT_PAREN : '(' ; > RIGHT_PAREN : ')' ; > TERM : ~(' '|'\t'|'"'|RIGHT_PAREN|LEFT_PAREN|NOT|OR|AND)* ; > QUOTEDTERM : '"' ~('"')* '"' ; > > orexpression > : andexpression ( OR^ andexpression )* > ; > > andexpression > : notexpression ( (AND^)? notexpression )* > ; > > notexpression > : (NOT^)? searchterm > ; > > searchterm > : TERM > | QUOTEDTERM > | LEFT_PAREN! orexpression RIGHT_PAREN! > ; > > And here is my tree grammar : > > tree grammar SearchBarEval; > > options { > ASTLabelType=CommonTree; > tokenVocab=SearchBar; > } > > prog > : expr+ ; > > expr returns [XMSExpression expression] > : ^(OR a=expr b=expr) { > $expression = new Or($a.expression, $b.expression); > } > | ^(AND a=expr b=expr) { > $expression = new And($a.expression, $b.expression); > } > | ^(NOT a=expr) { > $expression = new Not($a.expression); > } > | TERM { > $expression = new Term($TERM.text); > } > | QUOTEDTERM { > $expression = new QuotedTerm($QUOTEDTERM.text); > } > ; > > When I try to evaluate, for example, the input 'apples bananas tomatos', > I only get the Term 'apples'. I understand why I'm having this problem > but I was unable to find a good solution. > > Thanks in advance, > > -- > Aur?lien > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: > http://www.antlr.org/mailman/options/antlr-interest/your-email-address > -- edgar From aurelien.larive at 4dconcept.fr Mon Jan 11 05:20:36 2010 From: aurelien.larive at 4dconcept.fr (=?ISO-8859-1?Q?Aur=E9lien_LARIVE?=) Date: Mon, 11 Jan 2010 14:20:36 +0100 Subject: [antlr-interest] Problems writing a searchbar language In-Reply-To: <92b42db61001110404j44d65070yd7f053f457590ea7@mail.gmail.com> References: <4B4B10DD.6050200@4dconcept.fr> <92b42db61001110404j44d65070yd7f053f457590ea7@mail.gmail.com> Message-ID: <4B4B25A4.5020600@4dconcept.fr> That does not seem to change anything. Did I miss something ? Edgar Espina a ?crit : > Hi, > > try this: > > WS : ( ' ' | '\t' ) { $channel=HIDDEN; } ; > > Regards > > On Mon, Jan 11, 2010 at 8:51 AM, Aur?lien LARIVE > > > wrote: > > Hi, > > I'm currently writing a small grammar to parse a searchbar > language and > I'm failing at making whitespaces behave like the AND keyword. > > Here is my grammar : > > grammar SearchBar; > > options { > output=AST; > } > > WS : ( ' ' | '\t' ) { skip(); } ; > AND : 'AND' ; > OR : 'OR' ; > NOT : 'NOT' ; > LEFT_PAREN : '(' ; > RIGHT_PAREN : ')' ; > TERM : ~(' '|'\t'|'"'|RIGHT_PAREN|LEFT_PAREN|NOT|OR|AND)* ; > QUOTEDTERM : '"' ~('"')* '"' ; > > orexpression > : andexpression ( OR^ andexpression )* > ; > > andexpression > : notexpression ( (AND^)? notexpression )* > ; > > notexpression > : (NOT^)? searchterm > ; > > searchterm > : TERM > | QUOTEDTERM > | LEFT_PAREN! orexpression RIGHT_PAREN! > ; > > And here is my tree grammar : > > tree grammar SearchBarEval; > > options { > ASTLabelType=CommonTree; > tokenVocab=SearchBar; > } > > prog > : expr+ ; > > expr returns [XMSExpression expression] > : ^(OR a=expr b=expr) { > $expression = new Or($a.expression, $b.expression); > } > | ^(AND a=expr b=expr) { > $expression = new And($a.expression, $b.expression); > } > | ^(NOT a=expr) { > $expression = new Not($a.expression); > } > | TERM { > $expression = new Term($TERM.text); > } > | QUOTEDTERM { > $expression = new QuotedTerm($QUOTEDTERM.text); > } > ; > > When I try to evaluate, for example, the input 'apples bananas > tomatos', > I only get the Term 'apples'. I understand why I'm having this problem > but I was unable to find a good solution. > > Thanks in advance, > > -- > Aur?lien > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: > http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > > > > -- > edgar From aurelien.larive at 4dconcept.fr Mon Jan 11 07:46:39 2010 From: aurelien.larive at 4dconcept.fr (=?ISO-8859-1?Q?Aur=E9lien_LARIVE?=) Date: Mon, 11 Jan 2010 16:46:39 +0100 Subject: [antlr-interest] Operators and rewrite rules equivalence Message-ID: <4B4B47DF.1070400@4dconcept.fr> Hi, I successfully buit an AST using the operators notation (^) but I need to customize a bit my AST construction. Could someone tell me what's the rewrite rules version of the following rule ? andexpression : notexpression ( AND^ notexpression )* ; I found a similar example at http://www.antlr.org/wiki/display/ANTLR3/Tree+construction but I failed to apply this to my problem. Thanks in advance, -- Aur?lien From aurelien.larive at 4dconcept.fr Mon Jan 11 08:31:20 2010 From: aurelien.larive at 4dconcept.fr (=?UTF-8?B?QXVyw6lsaWVuIExBUklWRQ==?=) Date: Mon, 11 Jan 2010 17:31:20 +0100 Subject: [antlr-interest] Problems writing a searchbar language In-Reply-To: <4B4B10DD.6050200@4dconcept.fr> References: <4B4B10DD.6050200@4dconcept.fr> Message-ID: <4B4B5258.8060506@4dconcept.fr> Below is the e-mail John B. Brodie sent to me, which solved my problem. John B. Brodie wrote : Greetings! (I tried to send this to the mail-list, but the list seems to be rejecting my e-mail at the moment.... sigh) When you have an implicit AND (e.g. whitespace), your andexpression sub-tree will not have any root. It will be just a list of notexpression sub-trees, which your tree walker is not prepared to handle. More below..... On Mon, 2010-01-11 at 12:51 +0100, Aur?lien LARIVE wrote: > Hi, > > I'm currently writing a small grammar to parse a searchbar language and > I'm failing at making whitespaces behave like the AND keyword. > > Here is my grammar : > > grammar SearchBar; > > options { > output=AST; > } > > WS : ( ' ' | '\t' ) { skip(); } ; > AND : 'AND' ; > OR : 'OR' ; > NOT : 'NOT' ; > LEFT_PAREN : '(' ; > RIGHT_PAREN : ')' ; > TERM : ~(' '|'\t'|'"'|RIGHT_PAREN|LEFT_PAREN|NOT|OR|AND)* ; > QUOTEDTERM : '"' ~('"')* '"' ; > > orexpression > : andexpression ( OR^ andexpression )* > ; > > andexpression > : notexpression ( (AND^)? notexpression )* > ; when the AND is absent e.g. an implied AND via whitespace there will be no root. so (I THINK) you will just end up with a simple list of notexpression sub-trees. suggest these parsing rules instead (tested!): andexpression : notexpression ( and_operator^ notexpression )* ; and_operator : AND | (/*empty*/->AND["implicit_AND"]) ; NOTE!!! The token spawned for "implicit_AND" above may not contain meaningful location information (e.g. line number, column, ...whatever). If that information is important to your application (usually for error messages), you may need to dig into the details of the "X[...]" ANTLR meta-notation for token insertion.... > > notexpression > : (NOT^)? searchterm > ; > > searchterm > : TERM > | QUOTEDTERM > | LEFT_PAREN! orexpression RIGHT_PAREN! > ; > > And here is my tree grammar : > > tree grammar SearchBarEval; > > options { > ASTLabelType=CommonTree; > tokenVocab=SearchBar; > } > > prog > : expr+ ; > > expr returns [XMSExpression expression] > : ^(OR a=expr b=expr) { > $expression = new Or($a.expression, $b.expression); > } > | ^(AND a=expr b=expr) { > $expression = new And($a.expression, $b.expression); > } > | ^(NOT a=expr) { > $expression = new Not($a.expression); > } > | TERM { > $expression = new Term($TERM.text); > } > | QUOTEDTERM { > $expression = new QuotedTerm($QUOTEDTERM.text); > } if you would rather not apply the above suggested parser changes, you might be able to alter the tree grammar as follows (UNTESTED!): add an alternative to the expr rule (i think it has to be at the end, not sure...): | implicit_and > ; > and then add an implicit_and rule (UNTESTED!): implicit_and returns [XMSExpression expression] : a=expr {$expression = $a.expression;} ( b=implicit_and { $expression = new And($a.expression, $b.expression); } )? ; > When I try to evaluate, for example, the input 'apples bananas tomatos', > I only get the Term 'apples'. I understand why I'm having this problem > but I was unable to find a good solution. > > Thanks in advance, Hope this helps.... -jbb From aurelien.larive at 4dconcept.fr Mon Jan 11 08:37:30 2010 From: aurelien.larive at 4dconcept.fr (=?UTF-8?B?QXVyw6lsaWVuIExBUklWRQ==?=) Date: Mon, 11 Jan 2010 17:37:30 +0100 Subject: [antlr-interest] Operators and rewrite rules equivalence In-Reply-To: <4B4B47DF.1070400@4dconcept.fr> References: <4B4B47DF.1070400@4dconcept.fr> Message-ID: <4B4B53CA.7000001@4dconcept.fr> Below is the message John B. Brodie sent to me : (again tried to send a copy of this to the list, but failed) On Mon, 2010-01-11 at 16:46 +0100, Aur?lien LARIVE wrote: > Hi, > > I successfully buit an AST using the operators notation (^) but I need > to customize a bit my AST construction. Could someone tell me what's the > rewrite rules version of the following rule ? > > andexpression > : notexpression ( AND^ notexpression )* > ; > > I found a similar example at > http://www.antlr.org/wiki/display/ANTLR3/Tree+construction but I failed > to apply this to my problem. > off the top of my head: andexpression : l=notexpression ( AND r=andexpression -> ^(AND $l $r) )? ; but i have a vague memory that the associativity of these two are different. would need to look into that if associativity matters in your application. From jimi at temporal-wave.com Mon Jan 11 09:36:34 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Mon, 11 Jan 2010 09:36:34 -0800 Subject: [antlr-interest] Problems writing a searchbar language In-Reply-To: <4B4B10DD.6050200@4dconcept.fr> Message-ID: <203633f967209c439c19322dff181ddc@temporal-wave.com> You need to rewrite the absence of AND as the AND keyword for a start as your SPACE becomes the binary operator AND, and so should not just be ignored. andexpression : notexpression ( andWord^ notexpression )* ; andWord : a=AND -> $a | -> AND ; Then you probably want a root node and a rule that consumes to EOF: search: orexpression EOF -> ^(QUERY orexpression) ; And tree: prog : ^(QUERY expr) ; > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Aur?lien LARIVE > Sent: Monday, January 11, 2010 3:52 AM > To: antlr-interest at antlr.org > Subject: [antlr-interest] Problems writing a searchbar language > > Hi, > > I'm currently writing a small grammar to parse a searchbar language and > I'm failing at making whitespaces behave like the AND keyword. > > Here is my grammar : > > grammar SearchBar; > > options { > output=AST; > } > > WS : ( ' ' | '\t' ) { skip(); } ; > AND : 'AND' ; > OR : 'OR' ; > NOT : 'NOT' ; > LEFT_PAREN : '(' ; > RIGHT_PAREN : ')' ; > TERM : ~(' '|'\t'|'"'|RIGHT_PAREN|LEFT_PAREN|NOT|OR|AND)* ; > QUOTEDTERM : '"' ~('"')* '"' ; > > orexpression > : andexpression ( OR^ andexpression )* > ; > > andexpression > : notexpression ( (AND^)? notexpression )* > ; > > notexpression > : (NOT^)? searchterm > ; > > searchterm > : TERM > | QUOTEDTERM > | LEFT_PAREN! orexpression RIGHT_PAREN! > ; > > And here is my tree grammar : > > tree grammar SearchBarEval; > > options { > ASTLabelType=CommonTree; > tokenVocab=SearchBar; > } > > prog > : expr+ ; > > expr returns [XMSExpression expression] > : ^(OR a=expr b=expr) { > $expression = new Or($a.expression, $b.expression); > } > | ^(AND a=expr b=expr) { > $expression = new And($a.expression, $b.expression); > } > | ^(NOT a=expr) { > $expression = new Not($a.expression); > } > | TERM { > $expression = new Term($TERM.text); > } > | QUOTEDTERM { > $expression = new QuotedTerm($QUOTEDTERM.text); > } > ; > > When I try to evaluate, for example, the input 'apples bananas > tomatos', > I only get the Term 'apples'. I understand why I'm having this problem > but I was unable to find a good solution. > > Thanks in advance, > > -- > Aur?lien > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From felix_do at web.de Mon Jan 11 11:57:46 2010 From: felix_do at web.de (Felix Dorner) Date: Mon, 11 Jan 2010 20:57:46 +0100 Subject: [antlr-interest] =?windows-1252?q?R=E9f=2E_=3A__Re=3A__Maven_prob?= =?windows-1252?q?lems_with_ANTLR_3=2E2?= In-Reply-To: <41dcd1c202dbd44b842b489d1a12d052@temporal-wave.com> References: <41dcd1c202dbd44b842b489d1a12d052@temporal-wave.com> Message-ID: <4B4B82BA.3010102@web.de> Jim Idle wrote: > > Cool ? thanks for that Lo?c ? I will update the build with this once I > have read the article. > > Jim > Hi I cloned antlr from github yesterday and run into the same issue Adding Lo?c's tags seems to help indeed. btw, will antlr's git repository on github persist in the future or is this just an experiment? Cheers From parrt at cs.usfca.edu Mon Jan 11 12:18:54 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Mon, 11 Jan 2010 12:18:54 -0800 Subject: [antlr-interest] =?windows-1252?q?R=E9f=2E_=3A__Re=3A__Maven_prob?= =?windows-1252?q?lems_with_ANTLR_3=2E2?= In-Reply-To: <4B4B82BA.3010102@web.de> References: <41dcd1c202dbd44b842b489d1a12d052@temporal-wave.com> <4B4B82BA.3010102@web.de> Message-ID: <407CF246-3671-4A6E-AF58-F50FE7FCAF71@cs.usfca.edu> On Jan 11, 2010, at 11:57 AM, Felix Dorner wrote: > btw, will antlr's git repository on github persist in the future or is > this just an experiment? it should persist as it's pulled automagically. T From felix_do at web.de Mon Jan 11 11:57:18 2010 From: felix_do at web.de (Felix Dorner) Date: Mon, 11 Jan 2010 20:57:18 +0100 Subject: [antlr-interest] =?windows-1252?q?R=E9f=2E_=3A__Re=3A__Maven_prob?= =?windows-1252?q?lems_with_ANTLR_3=2E2?= In-Reply-To: <41dcd1c202dbd44b842b489d1a12d052@temporal-wave.com> References: <41dcd1c202dbd44b842b489d1a12d052@temporal-wave.com> Message-ID: <4B4B829E.40200@web.de> Jim Idle wrote: > > Cool ? thanks for that Lo?c ? I will update the build with this once I > have read the article. > > Jim > Hi I cloned antlr from github yesterday and run into the same issue Adding Lo?c's tags seems to help indeed. btw, will antlr's git repository on github persist in the future or is this just an experiment? Cheers From zep_mailinglist at bahj.com Mon Jan 11 16:46:13 2010 From: zep_mailinglist at bahj.com (Zachary Palmer) Date: Mon, 11 Jan 2010 19:46:13 -0500 Subject: [antlr-interest] ANTLR Errors on Line Zero In-Reply-To: <4B4BC4CC.6030001@bahj.com> References: <4B4BC4CC.6030001@bahj.com> Message-ID: <4B4BC655.1010803@bahj.com> Hello, all. I have what I expected to be a fairly common problem but couldn't find a FAQ or Google result that addressed it. Most of the errors coming out of my grammar appear to be for line number zero. For example: [antlr:antlr3] error(117): .../compiler/grammar/Bsj.g:0:0: missing attribute access on rule scope: primary The "..." was my own edit to eliminate a very long path. Can anyone recommend how I can get line numbers for these errors? They become very difficult to track down after a while. I have gotten some errors with line numbers from various positions in my file; I'm unable to discern a pattern. Any suggestions? Thanks much! Cheers, Zachary Palmer From egrimm at dds.nl Tue Jan 12 08:06:24 2010 From: egrimm at dds.nl (Olaf Keijsers) Date: Tue, 12 Jan 2010 17:06:24 +0100 Subject: [antlr-interest] Using own ASTLabelType and quantification Message-ID: Greetings, I am trying to make a treewalker for my grammar in order to check if it contains nondeterminism. I would like to be able to set some properties for every node I encounter, so I figured it would be a good idea to use my own ASTLabelType. I have set "ASTLabelType=GrooveTree" in my options, and my grammar uses this labeltype now, but I get the following exception when trying to use the checker: java.lang.ClassCastException: org.antlr.runtime.tree.CommonTree cannot be cast to groove.control.parse.GrooveTree at groove.control.parse.GCLDeterminismChecker.program(GCLDeterminismChecker.java:139) This line contains: root_0 = (GrooveTree)adaptor.nil(); and is part of the program() method. Somehow I think this is a beginner's error, but I cannot find the solution. I have tried to work around it by using the default ASTLabelType and keeping a Map to keep track of the property I would like, but this seems cumbersome. Could anyone point me in a good direction? Thanks! Olaf Keijsers From jbb at acm.org Tue Jan 12 10:09:41 2010 From: jbb at acm.org (John B. Brodie) Date: Tue, 12 Jan 2010 13:09:41 -0500 Subject: [antlr-interest] Using own ASTLabelType and quantification In-Reply-To: References: Message-ID: <1263319781.769.27.camel@gecko.home.org> Greetings! On Tue, 2010-01-12 at 17:06 +0100, Olaf Keijsers wrote: > Greetings, > > I am trying to make a treewalker for my grammar in order to check if it > contains nondeterminism. I would like to be able to set some properties for > every node I encounter, so I figured it would be a good idea to use my own > ASTLabelType. > > I have set "ASTLabelType=GrooveTree" in my options, and my grammar uses this > labeltype now, but I get the following exception when trying to use the > checker: > java.lang.ClassCastException: org.antlr.runtime.tree.CommonTree cannot be > cast to groove.control.parse.GrooveTree > at > groove.control.parse.GCLDeterminismChecker.program(GCLDeterminismChecker.java:139) > > This line contains: > root_0 = (GrooveTree)adaptor.nil(); > > and is part of the program() method. Somehow I think this is a beginner's > error, but I cannot find the solution. I have tried to work around it by > using the default ASTLabelType and keeping a Map to keep > track of the property I would like, but this seems cumbersome. Could anyone > point me in a good direction? You need to setup a tree adaptor so that the runtime knows how to construct your nodes. These are the things I had to do in order to get my own ASTLabelType, note that my AST is called ExprAST -- so replace all occurrances of that string below with yours. also note that I did this over a year ago using an earlier version of ANTLR v3, so altho this still works, just re-ran my tests, today's version of ANTLR may make some of my steps simpler and/or entirely un-necessary... YMMV 1) in the grammar add the ASTLabelType= option (as you have already done) 2) create your new tree node class, ensuring that it extends CommonTree. Here is my ExprAST (note that Type is also one of my classes): //----begin ExprAST here.... import org.antlr.runtime.Token; import org.antlr.runtime.tree.*; public class ExprAST extends CommonTree { public Type type; public ExprAST() { super(); type = null; } public ExprAST(Token tok) { super(tok); type = null; } public ExprAST(ExprAST tree) { super(tree); this.type = tree.type; } public ExprAST(Token tok, Type type) { super(tok); this.type = type; } @Override public Tree dupNode() { return new ExprAST(this); } @Override public String toString() { final String result; if (type==null) { result = super.toString(); } else { result = String.format("%s[%s]", super.toString(),type.nickName()); } return result; } } //----end ExprAST 3) copy org.antlr.runtime.tree.CommonErrorNode from the ANTLR run-time sources. I called mine ExprASTErrorNode. Edit your copy so that is extends your new tree node class rather than CommonTree. 4) create an instance of the adaptor class, i do this in my main: //---begin adaptor code here... // Custom adaptor to create ExprAST node type private static final TreeAdaptor adaptor = new CommonTreeAdaptor() { @Override public Object create(Token payload) { return new ExprAST(payload); } @Override public Object dupNode(Object old) { return (old==null)? null : ((ExprAST)old).dupNode(); } @Override public Object errorNode(TokenStream input, Token start, Token stop, RecognitionException e) { return new ExprASTErrorNode(input, start, stop, e); } }; //----end adaptor code. 5) call the parser's setAdaptor method with the above adaptor. I invoke my parser with something similar to this: //----begin parser invocation code here... ExprLexer lexer = new ExprLexer(...whatever....); CommonTokenStream tokens = new CommonTokenStream(lexer); ExprParser parser = new ExprParser(tokens); parser.setTreeAdaptor(adaptor); ExprParser.program_return p_result = parser.program(); ast = p_result.tree; //----end parser invocation code. > > Thanks! Hope this helps... -jbb From antonio.petrelli at gmail.com Tue Jan 12 12:42:10 2010 From: antonio.petrelli at gmail.com (Antonio Petrelli) Date: Tue, 12 Jan 2010 21:42:10 +0100 Subject: [antlr-interest] antlr3-maven-plugin (v3.2): "error(7): cannot find or open file: null/MyGrammar.g" In-Reply-To: References: Message-ID: Just right now I noticed that I did not answer to the mailing list but directly to the poster, sorry :-) 2010/1/10 Michael Guyver : > I had formerly been using the codehaus 1.0 release and been setting > the output directory to > > target/generated-sources/antlr/my/full/package/path/ so that the > > generated files arrived in the right place. Happily the new plugin > does this for you so simply moving the grammar to > > src/main/antlr3/my/full/package/path/MyGrammar.g > > solved the problem and meant I didn't have to specify the output > directory either \:D/ It does not work wit Java.g, the package is still the default! http://openjdk.java.net/projects/compiler-grammar/antlrworks/Java.g This is definitely a double bug I think. I would file a bug myself, but I can for a strange policy (never seen anywhere else!) that Antlr team have about bugs. Thanks anyway Antonio P.S. Luckily I noticed that I don't need Antlr anymore, thanks to the Compiler Tree API of JDK 6, so, well, who cares :-D From wwilbur3 at yahoo.com Tue Jan 12 14:58:13 2010 From: wwilbur3 at yahoo.com (Warren Wilbur) Date: Tue, 12 Jan 2010 14:58:13 -0800 (PST) Subject: [antlr-interest] Issue with antlrworks 1.3.1 and JDK 1.6 update 17? In-Reply-To: Message-ID: <212800.78090.qm@web65606.mail.ac4.yahoo.com> Here are a few debugging ideas as I've seen each of these issues before... 1. Try increasing the heap memory for Java on the command line. e.g. to increase to 1GB use: java -Xmx1024M -jar antlrworks-1.3.1.jar 2. Check if you are really using the Sun Java JRE/JDK on Ubuntu Linux (this will give you the right idea: http://www.cyberciti.biz/faq/howto-ubuntu-linux-install-configure-jdk-jre) . If multiple alternatives are installed you might not be... Using another JRE/JDK could be the cause of your problems. 3. Run antlrworks by command line from a terminal. If you have any 'out of memory' errors you will see console messages in the Ubuntu terminal you executed it from. Date: Wed, 6 Jan 2010 19:33:23 +0800 From: Michael Richter Subject: [antlr-interest] Issue with antlrworks 1.3.1 and JDK 1.6 ??? update 17? To: antlr-interest at antlr.org Message-ID: ??? Content-Type: text/plain; charset=UTF-8 I did a recent round of upgrading software on my machines (real and virtual) and somewhere in the process I've got ANTLRworks in unusable shape.? (I tried reporting this through the antlr.org web site but it doesn't seem to have taken.) On *every* machine I have access to (both real and virtual, running Windows XP or Linux) I get the following pretty nasty behaviour: ???1. *java -jar antlrworks.jar* (I can also use javaw on Windows for a ???similar, more annoying effect.) ???2. *The splash screen pops up briefly.* ???3. *The "New Document" dialogue replaces it.* ???4. I hit "Cancel" (or alternatively press "Esc" on the keyboard). At this point, no matter the platform, no matter what I try, I have a dead executable until I hit Ctrl+C (or, if I used javaw, I kill it in the task manager).? I've tried this on Ubuntu 9.04, on Slackware 13.0 (virtualized), on Windows XP (four different machines, one virtualized) and get this behaviour consistently.? Whatever's supposed to happen when I cancel the new document dialogue freezes and can only unfreeze through lethal injection of Ctrl+C.? (There are, of course, no messages on the console that could tell me what's going on.) The behaviour on Windows after this if I choose "OK" is acceptable.? Up comes the wizard for a new project which works normally and, more importantly, can be cancelled and gets me into the ANTLRworks GUI.? It's a bit obnoxious having to go that route, but it works.? If I choose to use the wizard everything works as expected. The behaviour on Linux is less acceptable.? The new project wizard pops up but the text input focus is on ANTLRworks' editor window and CANNOT be put into the wizard at all on any spot.? I have to cancel the wizard to get to the main window (which then works as expected).? This also happens if I go File -> New from the main window: I simply cannot get text input into any field of the new project wizard. The last time I did anything with ANTLRworks was v1.3.0 using JDK 1.6 update 16.? I did not see this behaviour then at all, so something has happened between then and now. Any advice for debugging this further? From jimi at temporal-wave.com Tue Jan 12 15:46:35 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Tue, 12 Jan 2010 15:46:35 -0800 Subject: [antlr-interest] Issue with antlrworks 1.3.1 and JDK 1.6 update 17? In-Reply-To: <212800.78090.qm@web65606.mail.ac4.yahoo.com> Message-ID: <0482aaf0e226ef43abff16e2f78c9db2@temporal-wave.com> I have never had much success with the OpenJDK/JRE it is better to use Sun's JDK (installed from their Web Site). Ubuntu was nothing but trouble for me too but it was 64 Bit Ubuntu. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Warren Wilbur > Sent: Tuesday, January 12, 2010 2:58 PM > To: antlr-interest at antlr.org > Subject: Re: [antlr-interest] Issue with antlrworks 1.3.1 and JDK 1.6 > update 17? > > Here are a few debugging ideas as I've seen each of these issues > before... > > 1. Try increasing the heap memory for Java on the command line. e.g. to > increase to 1GB use: java -Xmx1024M -jar antlrworks-1.3.1.jar > > 2. Check if you are really using the Sun Java JRE/JDK on Ubuntu Linux > (this will give you the right idea: http://www.cyberciti.biz/faq/howto- > ubuntu-linux-install-configure-jdk-jre) . If multiple alternatives are > installed you might not be... Using another JRE/JDK could be the cause > of your problems. > > manager).? I've tried this on Ubuntu 9.04, on Slackware 13.0 > (virtualized), From nikmd23 at gmail.com Tue Jan 12 17:52:48 2010 From: nikmd23 at gmail.com (Nik Molnar) Date: Tue, 12 Jan 2010 20:52:48 -0500 Subject: [antlr-interest] Noob Question Message-ID: Hello all, I am rather new to ANTLR and seem to be running into a small issue I can't figure out. I'm writing a very simple grammar based on many tutorials online, the calculator. This grammar generates C# code that compiles perfectly, and works for the most part in ANTLRWorks Interpreter, Debugger and in a sample app I made in .NET to call the generated Parser/Lexer. The problem I run into is what I put in invalid syntax, expecting an error. Output like so: Valid Syntax: "3+3" => Works in interpreter, debugger and compiled .net code. Invalid Syntax: "3+/3" => Gives error in interpreter, debugger and compiled .net code, as expected. Invalid Syntax: "3_3" => The interpreter shows nothing, the debugger cannot connect and the .net code hangs for a while then throws an out of memory exception. I'm sure I'm doing something wrong in my grammar but don't know what. I've included it below. Please help me! Thanks, grammar Test; /*options { language = 'CSharp2'; }*/ expression : amExpression; amExpression :mdExpression ((PLUS|DASH) mdExpression)* ; mdExpression :INT ((STAR|SLASH) INT)* ; DASH :'-' ; SLASH :'/' ; WS : (' ' | '\t' | '\n' | '\r')* { $channel = HIDDEN; } ; STAR : '*' ; PLUS : '+' ; fragment DIGIT : '0'..'9' ; INT : (DIGIT)+ ; From jbb at acm.org Tue Jan 12 18:21:03 2010 From: jbb at acm.org (John B. Brodie) Date: Tue, 12 Jan 2010 21:21:03 -0500 Subject: [antlr-interest] Noob Question In-Reply-To: References: Message-ID: <1263349263.8618.17.camel@gecko.home.org> Greetings! Your WS lexer rule can recognize the empty string, this is VERY bad. Because WS can recognize the empty string your lexer will enter an infinite loop when encountering a character it can not deal with - like the '_' in your example - you have no lexer rule that can handle a '_'. More below... On Tue, 2010-01-12 at 20:52 -0500, Nik Molnar wrote: > Hello all, > > I am rather new to ANTLR and seem to be running into a small issue I can't > figure out. > > I'm writing a very simple grammar based on many tutorials online, the > calculator. > > This grammar generates C# code that compiles perfectly, and works for the > most part in ANTLRWorks Interpreter, Debugger and in a sample app I made in > .NET to call the generated Parser/Lexer. > > The problem I run into is what I put in invalid syntax, expecting an error. > Output like so: > > Valid Syntax: "3+3" => Works in interpreter, debugger and compiled .net > code. > Invalid Syntax: "3+/3" => Gives error in interpreter, debugger and compiled > .net code, as expected. > Invalid Syntax: "3_3" => The interpreter shows nothing, the debugger cannot > connect and the .net code hangs for a while then throws an out of memory > exception. Your lexer will correctly identify the first '3' as an INT. Next your lexer will see the '_' which it is unable to deal with. BUT since your WS rule says that the empty string - the non-stuff between the first '3' and the '_' - is legal, your lexer accepts that empty string as a WS token and deposits it into the HIDDEN channel. Now the lexer is still looking at the '_' which it is unable to deal with. BUT since your WS rule says that the empty string - the non-stuff between the first '3' and the '_' - is legal, your lexer accepts that empty string as a WS token and deposits it into the HIDDEN channel. Now the lexer is still looking at the '_' which it is unable to deal with. BUT since your WS rule says that the empty string - the non-stuff between the first '3' and the '_' - is legal, your lexer accepts that empty string as a WS token and deposits it into the HIDDEN channel. Now the lexer is still looking at the '_' .... and so nothing good results. Your .NET app runs out of memory because the infinite sequence of empty WS tokens appended onto the HIDDEN channel just gobbles up all memory. The debugger can not connect because the connections happens after the lexer has finished tokenizing the input text. Your lexer never finishes so the debugger won't connect. I bet if you waited long enuf you would eventually run out of memory in this case too. Same drill for the interpreter.... > > I'm sure I'm doing something wrong in my grammar but don't know what. > > I've included it below. Please help me! > > Thanks, > > grammar Test; > > /*options > { > language = 'CSharp2'; > }*/ > > expression > : amExpression; > > amExpression > :mdExpression ((PLUS|DASH) mdExpression)* > ; > > mdExpression > :INT ((STAR|SLASH) INT)* > ; > > DASH > :'-' > ; > > SLASH > :'/' > ; > > WS > : (' ' > | '\t' > | '\n' > | '\r')* > { $channel = HIDDEN; } > ; the * above should really be a + be VERY careful with rules that can recognize the empty string, e.g. have just a * or ? operator. I have NEVER found an instance where a lexer rule that accepts nothing (the empty string) does anything that helps. On RARE occasions, a parser rule that accepts the empty string can be appropriate, but needs to be examined VERY closely. > > STAR > : '*' > ; > > PLUS > : '+' > ; > > fragment DIGIT > : '0'..'9' > ; > > INT > : (DIGIT)+ > ; Hope this helps... -jbb From ttmrichter at gmail.com Tue Jan 12 18:30:50 2010 From: ttmrichter at gmail.com (Michael Richter) Date: Wed, 13 Jan 2010 10:30:50 +0800 Subject: [antlr-interest] Issue with antlrworks 1.3.1 and JDK 1.6 update 17? In-Reply-To: <212800.78090.qm@web65606.mail.ac4.yahoo.com> References: <212800.78090.qm@web65606.mail.ac4.yahoo.com> Message-ID: Off the top of my head 2 and 3 are confirmed. I run my own copy of the Sun JDK in my user directory precisely because of the whole GNU "java" compiler fiasco. gjc is not on my path anywhere and java points to ~/software/jdk/bin/java. And I only ever actually use antlrworks from the command line (as an alias, to be fair) so it's the only way I can show you how it's running. ;) I'll test the heap memory thing now, though. ....And we're back. The behaviour is identical with 1GB of heap memory on both Windows and Linux. 2010/1/13 Warren Wilbur > Here are a few debugging ideas as I've seen each of these issues before... > > 1. Try increasing the heap memory for Java on the command line. e.g. to > increase to 1GB use: java -Xmx1024M -jar antlrworks-1.3.1.jar > > 2. Check if you are really using the Sun Java JRE/JDK on Ubuntu Linux (this > will give you the right idea: > http://www.cyberciti.biz/faq/howto-ubuntu-linux-install-configure-jdk-jre) > . If multiple alternatives are installed you might not be... Using another > JRE/JDK could be the cause of your problems. > > 3. Run antlrworks by command line from a terminal. If you have any 'out of > memory' errors you will see console messages in the Ubuntu terminal you > executed it from. > > Date: Wed, 6 Jan 2010 19:33:23 +0800 > From: Michael Richter > Subject: [antlr-interest] Issue with antlrworks 1.3.1 and JDK 1.6 > update 17? > To: antlr-interest at antlr.org > Message-ID: > > Content-Type: text/plain; charset=UTF-8 > > I did a recent round of upgrading software on my machines (real and > virtual) > and somewhere in the process I've got ANTLRworks in unusable shape. (I > tried reporting this through the antlr.org web site but it doesn't seem to > have taken.) > > On *every* machine I have access to (both real and virtual, running Windows > XP or Linux) I get the following pretty nasty behaviour: > > 1. *java -jar antlrworks.jar* (I can also use javaw on Windows for a > similar, more annoying effect.) > 2. *The splash screen pops up briefly.* > 3. *The "New Document" dialogue replaces it.* > 4. I hit "Cancel" (or alternatively press "Esc" on the keyboard). > > At this point, no matter the platform, no matter what I try, I have a dead > executable until I hit Ctrl+C (or, if I used javaw, I kill it in the task > manager). I've tried this on Ubuntu 9.04, on Slackware 13.0 (virtualized), > on Windows XP (four different machines, one virtualized) and get this > behaviour consistently. Whatever's supposed to happen when I cancel the > new > document dialogue freezes and can only unfreeze through lethal injection of > Ctrl+C. (There are, of course, no messages on the console that could tell > me what's going on.) > > The behaviour on Windows after this if I choose "OK" is acceptable. Up > comes the wizard for a new project which works normally and, more > importantly, can be cancelled and gets me into the ANTLRworks GUI. It's a > bit obnoxious having to go that route, but it works. If I choose to use > the > wizard everything works as expected. > > The behaviour on Linux is less acceptable. The new project wizard pops up > but the text input focus is on ANTLRworks' editor window and CANNOT be put > into the wizard at all on any spot. I have to cancel the wizard to get to > the main window (which then works as expected). This also happens if I go > File -> New from the main window: I simply cannot get text input into any > field of the new project wizard. > > The last time I did anything with ANTLRworks was v1.3.0 using JDK 1.6 > update > 16. I did not see this behaviour then at all, so something has happened > between then and now. > > Any advice for debugging this further? > > > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: > http://www.antlr.org/mailman/options/antlr-interest/your-email-address > From nikmd23 at gmail.com Tue Jan 12 18:32:50 2010 From: nikmd23 at gmail.com (Nik Molnar) Date: Tue, 12 Jan 2010 21:32:50 -0500 Subject: [antlr-interest] Noob Question In-Reply-To: <1263349263.8618.17.camel@gecko.home.org> References: <1263349263.8618.17.camel@gecko.home.org> Message-ID: JOHN! THANK YOU! You don't know how long I've been struggling with this - and now that you explain it, it makes perfect sense! I will heed your warning about * and ? - I see how they match empty strings now. Thanks, Nik On Tue, Jan 12, 2010 at 9:21 PM, John B. Brodie wrote: > Greetings! > > Your WS lexer rule can recognize the empty string, this is VERY bad. > > Because WS can recognize the empty string your lexer will enter an > infinite loop when encountering a character it can not deal with - like > the '_' in your example - you have no lexer rule that can handle a '_'. > > More below... > > On Tue, 2010-01-12 at 20:52 -0500, Nik Molnar wrote: > > Hello all, > > > > I am rather new to ANTLR and seem to be running into a small issue I > can't > > figure out. > > > > I'm writing a very simple grammar based on many tutorials online, the > > calculator. > > > > This grammar generates C# code that compiles perfectly, and works for the > > most part in ANTLRWorks Interpreter, Debugger and in a sample app I made > in > > .NET to call the generated Parser/Lexer. > > > > The problem I run into is what I put in invalid syntax, expecting an > error. > > Output like so: > > > > Valid Syntax: "3+3" => Works in interpreter, debugger and compiled .net > > code. > > Invalid Syntax: "3+/3" => Gives error in interpreter, debugger and > compiled > > .net code, as expected. > > Invalid Syntax: "3_3" => The interpreter shows nothing, the debugger > cannot > > connect and the .net code hangs for a while then throws an out of memory > > exception. > > Your lexer will correctly identify the first '3' as an INT. Next your > lexer will see the '_' which it is unable to deal with. BUT since your > WS rule says that the empty string - the non-stuff between the first '3' > and the '_' - is legal, your lexer accepts that empty string as a WS > token and deposits it into the HIDDEN channel. Now the lexer is still > looking at the '_' which it is unable to deal with. BUT since your WS > rule says that the empty string - the non-stuff between the first '3' > and the '_' - is legal, your lexer accepts that empty string as a WS > token and deposits it into the HIDDEN channel. Now the lexer is still > looking at the '_' which it is unable to deal with. BUT since your WS > rule says that the empty string - the non-stuff between the first '3' > and the '_' - is legal, your lexer accepts that empty string as a WS > token and deposits it into the HIDDEN channel. Now the lexer is still > looking at the '_' .... and so nothing good results. > > Your .NET app runs out of memory because the infinite sequence of empty > WS tokens appended onto the HIDDEN channel just gobbles up all memory. > > The debugger can not connect because the connections happens after the > lexer has finished tokenizing the input text. Your lexer never finishes > so the debugger won't connect. I bet if you waited long enuf you would > eventually run out of memory in this case too. > > Same drill for the interpreter.... > > > > > I'm sure I'm doing something wrong in my grammar but don't know what. > > > > I've included it below. Please help me! > > > > Thanks, > > > > grammar Test; > > > > /*options > > { > > language = 'CSharp2'; > > }*/ > > > > expression > > : amExpression; > > > > amExpression > > :mdExpression ((PLUS|DASH) mdExpression)* > > ; > > > > mdExpression > > :INT ((STAR|SLASH) INT)* > > ; > > > > DASH > > :'-' > > ; > > > > SLASH > > :'/' > > ; > > > > WS > > : (' ' > > | '\t' > > | '\n' > > | '\r')* > > { $channel = HIDDEN; } > > ; > > the * above should really be a + > > be VERY careful with rules that can recognize the empty string, e.g. > have just a * or ? operator. > > I have NEVER found an instance where a lexer rule that accepts nothing > (the empty string) does anything that helps. > > On RARE occasions, a parser rule that accepts the empty string can be > appropriate, but needs to be examined VERY closely. > > > > > STAR > > : '*' > > ; > > > > PLUS > > : '+' > > ; > > > > fragment DIGIT > > : '0'..'9' > > ; > > > > INT > > : (DIGIT)+ > > ; > > Hope this helps... > -jbb > > > From r66092 at freescale.com Tue Jan 12 18:35:40 2010 From: r66092 at freescale.com (Chen Hongjun-R66092) Date: Wed, 13 Jan 2010 10:35:40 +0800 Subject: [antlr-interest] An error occurs in template example Message-ID: <3A45394FD742FA419B760BB8D398F9ED011E1D2A@zch01exm26.fsl.freescale.net> Hi, I am new to ANTLR, and am reading the book The Definitive ANTLR Reference. When I tried the template example 'template/generator/2pass' without any modification, and met an error as below: Exception in thread "main" java.util.NoSuchElementException: no such attribute: init in template context [jasminFile] at org.antlr.stringtemplate.StringTemplate.checkNullAttributeAgainstFormalA rguments(StringTemplate.java:1311) at org.antlr.stringtemplate.StringTemplate.getAttribute(StringTemplate.java :684) at org.antlr.stringtemplate.language.ActionEvaluator.attribute(ActionEvalua tor.java:360) at org.antlr.stringtemplate.language.ActionEvaluator.expr(ActionEvaluator.j ava:136) at org.antlr.stringtemplate.language.ActionEvaluator.action(ActionEvaluator .java:84) at org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:149) at org.antlr.stringtemplate.StringTemplate.write(StringTemplate.java:705) at org.antlr.stringtemplate.StringTemplate.toString(StringTemplate.java:167 0) at org.antlr.stringtemplate.StringTemplate.toString(StringTemplate.java:166 1) at Test.main(Test.java:45) I appreciate your any suggestions or ideas! Thanks, Hongjun From parrt at cs.usfca.edu Tue Jan 12 18:52:28 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Tue, 12 Jan 2010 18:52:28 -0800 Subject: [antlr-interest] An error occurs in template example In-Reply-To: <3A45394FD742FA419B760BB8D398F9ED011E1D2A@zch01exm26.fsl.freescale.net> References: <3A45394FD742FA419B760BB8D398F9ED011E1D2A@zch01exm26.fsl.freescale.net> Message-ID: the error says you don't have an "init" parameter to the template. do you have one? Ter On Jan 12, 2010, at 6:35 PM, Chen Hongjun-R66092 wrote: > Hi, > > I am new to ANTLR, and am reading the book The Definitive ANTLR > Reference. When I tried the template example 'template/generator/2pass' > without any modification, and met an error as below: > > Exception in thread "main" java.util.NoSuchElementException: no such > attribute: init in template context [jasminFile] > at > org.antlr.stringtemplate.StringTemplate.checkNullAttributeAgainstFormalA > rguments(StringTemplate.java:1311) > at > org.antlr.stringtemplate.StringTemplate.getAttribute(StringTemplate.java > :684) > at > org.antlr.stringtemplate.language.ActionEvaluator.attribute(ActionEvalua > tor.java:360) > at > org.antlr.stringtemplate.language.ActionEvaluator.expr(ActionEvaluator.j > ava:136) > at > org.antlr.stringtemplate.language.ActionEvaluator.action(ActionEvaluator > .java:84) > at > org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:149) > at > org.antlr.stringtemplate.StringTemplate.write(StringTemplate.java:705) > at > org.antlr.stringtemplate.StringTemplate.toString(StringTemplate.java:167 > 0) > at > org.antlr.stringtemplate.StringTemplate.toString(StringTemplate.java:166 > 1) > at Test.main(Test.java:45) > > I appreciate your any suggestions or ideas! > > Thanks, > Hongjun > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From r66092 at freescale.com Tue Jan 12 19:12:16 2010 From: r66092 at freescale.com (Chen Hongjun-R66092) Date: Wed, 13 Jan 2010 11:12:16 +0800 Subject: [antlr-interest] An error occurs in template example In-Reply-To: References: <3A45394FD742FA419B760BB8D398F9ED011E1D2A@zch01exm26.fsl.freescale.net> Message-ID: <3A45394FD742FA419B760BB8D398F9ED011E1D45@zch01exm26.fsl.freescale.net> Hi Terence, Thanks for your response. For the example 'templates/generator/2pass', I used the following commands to try it out: # java org.antlr.Tool *.g # javac *.java # java Test < input Do I miss anything? What is the "init" parameter needed by template? How to provide this "init" parameter for template? Thanks again, Hongjun > -----Original Message----- > From: Terence Parr [mailto:parrt at cs.usfca.edu] > Sent: Wednesday, January 13, 2010 10:52 AM > To: Chen Hongjun-R66092 > Cc: antlr-interest at antlr.org > Subject: Re: [antlr-interest] An error occurs in template example > > the error says you don't have an "init" parameter to the > template. do you have one? > Ter > On Jan 12, 2010, at 6:35 PM, Chen Hongjun-R66092 wrote: > > > Hi, > > > > I am new to ANTLR, and am reading the book The Definitive ANTLR > > Reference. When I tried the template example > 'template/generator/2pass' > > without any modification, and met an error as below: > > > > Exception in thread "main" java.util.NoSuchElementException: no such > > attribute: init in template context [jasminFile] > > at > > > org.antlr.stringtemplate.StringTemplate.checkNullAttributeAgainstForma > > lA > > rguments(StringTemplate.java:1311) > > at > > > org.antlr.stringtemplate.StringTemplate.getAttribute(StringTemplate.ja > > va > > :684) > > at > > > org.antlr.stringtemplate.language.ActionEvaluator.attribute(ActionEval > > ua > > tor.java:360) > > at > > > org.antlr.stringtemplate.language.ActionEvaluator.expr(ActionEvaluator > > .j > > ava:136) > > at > > > org.antlr.stringtemplate.language.ActionEvaluator.action(ActionEvaluat > > or > > .java:84) > > at > > org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:149) > > at > > > org.antlr.stringtemplate.StringTemplate.write(StringTemplate.java:705) > > at > > > org.antlr.stringtemplate.StringTemplate.toString(StringTemplate.java:1 > > 67 > > 0) > > at > > > org.antlr.stringtemplate.StringTemplate.toString(StringTemplate.java:1 > > 66 > > 1) > > at Test.main(Test.java:45) > > > > I appreciate your any suggestions or ideas! > > > > Thanks, > > Hongjun > > > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > > Unsubscribe: > > > http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > > From scott at javadude.com Wed Jan 13 10:11:42 2010 From: scott at javadude.com (Scott Stanchfield) Date: Wed, 13 Jan 2010 13:11:42 -0500 Subject: [antlr-interest] ANTLR 3.x Video Tutorial Message-ID: Hey all! I've posted the first parts of my new ANTLR 3.x video tutorial (in Eclipse) at http://javadude.com/articles/antlr3xtut I plan to do vids on all phases of the sample compiler. Right now it builds a recognizer and did examples of interpreting an expression (in the parser grammar, using GoF Interpreter Pattern and an ANTLR tree parser - good demonstration of how much simpler the tree parser is) I'd love to hear any comments/suggestions/errors on the tutorials. They're in 10-30 minute chunks, so if I royally screwed something up I can redo parts ;) Note that I did each of these with very little rehearsal, and there are some spots where I make a mistake and walk through correcting it. I like doing the tuts this way as they feel more "human" and get to show a bit more thought process. -- Scott ---------------------------------------- Scott Stanchfield http://javadude.com From espina.edgar at gmail.com Wed Jan 13 10:39:22 2010 From: espina.edgar at gmail.com (Edgar Espina) Date: Wed, 13 Jan 2010 15:39:22 -0300 Subject: [antlr-interest] ANTLR 3.x Video Tutorial In-Reply-To: References: Message-ID: <92b42db61001131039x357eb39flf462ea461a93536b@mail.gmail.com> Hi Scott, All the videos are really awesome. Thanks you for choosing ANTLR IDE. Regards, edgar On Wed, Jan 13, 2010 at 3:11 PM, Scott Stanchfield wrote: > Hey all! > > I've posted the first parts of my new ANTLR 3.x video tutorial (in Eclipse) > at > > http://javadude.com/articles/antlr3xtut > > I plan to do vids on all phases of the sample compiler. Right now it > builds a recognizer and did examples of interpreting an expression (in > the parser grammar, using GoF Interpreter Pattern and an ANTLR tree > parser - good demonstration of how much simpler the tree parser is) > > I'd love to hear any comments/suggestions/errors on the tutorials. > They're in 10-30 minute chunks, so if I royally screwed something up I > can redo parts ;) > > Note that I did each of these with very little rehearsal, and there > are some spots where I make a mistake and walk through correcting it. > I like doing the tuts this way as they feel more "human" and get to > show a bit more thought process. > > -- Scott > > ---------------------------------------- > Scott Stanchfield > http://javadude.com > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: > http://www.antlr.org/mailman/options/antlr-interest/your-email-address > -- edgar From parrt at cs.usfca.edu Wed Jan 13 11:05:17 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Wed, 13 Jan 2010 11:05:17 -0800 Subject: [antlr-interest] ANTLR 3.x Video Tutorial In-Reply-To: References: Message-ID: <741CD6EA-9CF9-47CB-BDE8-CFF3E45683D7@cs.usfca.edu> Thanks Scott. great stuff. I took a peek. Ter On Jan 13, 2010, at 10:11 AM, Scott Stanchfield wrote: > Hey all! > > I've posted the first parts of my new ANTLR 3.x video tutorial (in Eclipse) at > > http://javadude.com/articles/antlr3xtut > > I plan to do vids on all phases of the sample compiler. Right now it > builds a recognizer and did examples of interpreting an expression (in > the parser grammar, using GoF Interpreter Pattern and an ANTLR tree > parser - good demonstration of how much simpler the tree parser is) > > I'd love to hear any comments/suggestions/errors on the tutorials. > They're in 10-30 minute chunks, so if I royally screwed something up I > can redo parts ;) > > Note that I did each of these with very little rehearsal, and there > are some spots where I make a mistake and walk through correcting it. > I like doing the tuts this way as they feel more "human" and get to > show a bit more thought process. > > -- Scott > > ---------------------------------------- > Scott Stanchfield > http://javadude.com > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From parrt at cs.usfca.edu Wed Jan 13 11:06:52 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Wed, 13 Jan 2010 11:06:52 -0800 Subject: [antlr-interest] An error occurs in template example In-Reply-To: <3A45394FD742FA419B760BB8D398F9ED011E1D45@zch01exm26.fsl.freescale.net> References: <3A45394FD742FA419B760BB8D398F9ED011E1D2A@zch01exm26.fsl.freescale.net> <3A45394FD742FA419B760BB8D398F9ED011E1D45@zch01exm26.fsl.freescale.net> Message-ID: <63ACB2EF-E813-4B3F-BDE1-941BF6C77C2B@cs.usfca.edu> weird. and youdidn't alter the software at all? Ter On Jan 12, 2010, at 7:12 PM, Chen Hongjun-R66092 wrote: > Hi Terence, > > Thanks for your response. For the example 'templates/generator/2pass', I > used the following commands to try it out: > > # java org.antlr.Tool *.g > # javac *.java > # java Test < input > > Do I miss anything? What is the "init" parameter needed by template? How > to provide this "init" parameter for template? > > Thanks again, > Hongjun > >> -----Original Message----- >> From: Terence Parr [mailto:parrt at cs.usfca.edu] >> Sent: Wednesday, January 13, 2010 10:52 AM >> To: Chen Hongjun-R66092 >> Cc: antlr-interest at antlr.org >> Subject: Re: [antlr-interest] An error occurs in template example >> >> the error says you don't have an "init" parameter to the >> template. do you have one? >> Ter >> On Jan 12, 2010, at 6:35 PM, Chen Hongjun-R66092 wrote: >> >>> Hi, >>> >>> I am new to ANTLR, and am reading the book The Definitive ANTLR >>> Reference. When I tried the template example >> 'template/generator/2pass' >>> without any modification, and met an error as below: >>> >>> Exception in thread "main" java.util.NoSuchElementException: no such >>> attribute: init in template context [jasminFile] >>> at >>> >> org.antlr.stringtemplate.StringTemplate.checkNullAttributeAgainstForma >>> lA >>> rguments(StringTemplate.java:1311) >>> at >>> >> org.antlr.stringtemplate.StringTemplate.getAttribute(StringTemplate.ja >>> va >>> :684) >>> at >>> >> org.antlr.stringtemplate.language.ActionEvaluator.attribute(ActionEval >>> ua >>> tor.java:360) >>> at >>> >> org.antlr.stringtemplate.language.ActionEvaluator.expr(ActionEvaluator >>> .j >>> ava:136) >>> at >>> >> org.antlr.stringtemplate.language.ActionEvaluator.action(ActionEvaluat >>> or >>> .java:84) >>> at >>> org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:149) >>> at >>> >> org.antlr.stringtemplate.StringTemplate.write(StringTemplate.java:705) >>> at >>> >> org.antlr.stringtemplate.StringTemplate.toString(StringTemplate.java:1 >>> 67 >>> 0) >>> at >>> >> org.antlr.stringtemplate.StringTemplate.toString(StringTemplate.java:1 >>> 66 >>> 1) >>> at Test.main(Test.java:45) >>> >>> I appreciate your any suggestions or ideas! >>> >>> Thanks, >>> Hongjun >>> >>> >>> List: http://www.antlr.org/mailman/listinfo/antlr-interest >>> Unsubscribe: >>> >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address >> >> >> From felix_do at web.de Wed Jan 13 12:23:04 2010 From: felix_do at web.de (Felix Dorner) Date: Wed, 13 Jan 2010 21:23:04 +0100 Subject: [antlr-interest] =?iso-8859-1?q?Building_the_=DCberjar_fails?= Message-ID: <4B4E2BA8.7090000@web.de> Hi, I do: mvn -Dmaven.test.skip=true package assembly:assembly which fails with the output shown below. Any help welcome. [...] [INFO] [antlr3:antlr {execution: default}] ANTLR installation corrupted; cannot find ANTLR messages format file org/antlr/tool/templates/messages/formats/antlr.stg [INFO] ------------------------------------------------------------------------ [ERROR] FATAL ERROR [INFO] ------------------------------------------------------------------------ [INFO] ANTLR ErrorManager panic [INFO] ------------------------------------------------------------------------ [INFO] Trace java.lang.Error: ANTLR ErrorManager panic at org.antlr.tool.ErrorManager.panic(ErrorManager.java:955) at org.antlr.tool.ErrorManager.setFormat(ErrorManager.java:465) at org.antlr.Tool.setMessageFormat(Tool.java:1222) at org.antlr.mojo.antlr3.Antlr3Mojo.execute(Antlr3Mojo.java:336) at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:490) at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:694) at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalWithLifecycle(DefaultLifecycleExecutor.java:556) at org.apache.maven.lifecycle.DefaultLifecycleExecutor.forkProjectLifecycle(DefaultLifecycleExecutor.java:1205) at org.apache.maven.lifecycle.DefaultLifecycleExecutor.forkLifecycle(DefaultLifecycleExecutor.java:1033) at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:643) at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeStandaloneGoal(DefaultLifecycleExecutor.java:569) at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:539) at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:387) at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:284) at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:180) at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:328) at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:138) at org.apache.maven.cli.MavenCli.main(MavenCli.java:362) at org.apache.maven.cli.compat.CompatibleMain.main(CompatibleMain.java:60) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315) at org.codehaus.classworlds.Launcher.launch(Launcher.java:255) at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430) at org.codehaus.classworlds.Launcher.main(Launcher.java:375) [INFO] ------------------------------------------------------------------------ [INFO] Total time: 18 seconds [INFO] Finished at: Wed Jan 13 21:20:27 CET 2010 [INFO] Final Memory: 36M/89M [INFO] ------------------------------------------------------------------------ From jimi at temporal-wave.com Wed Jan 13 12:29:50 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Wed, 13 Jan 2010 12:29:50 -0800 Subject: [antlr-interest] =?iso-8859-1?q?Building_the_=DCberjar_fails?= In-Reply-To: <4B4E2BA8.7090000@web.de> Message-ID: It's a Maven bug. Do a clean then try again and it will eventually work. I believe that I mention this in the BUILD.txt file, which you should read to the end before trying to build. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Felix Dorner > Sent: Wednesday, January 13, 2010 12:23 PM > To: antlr-interest at antlr.org > Subject: [antlr-interest] Building the ?berjar fails > > Hi, > > I do: > > mvn -Dmaven.test.skip=true package assembly:assembly > > which fails with the output shown below. Any help welcome. > > [...] > [INFO] [antlr3:antlr {execution: default}] > ANTLR installation corrupted; cannot find ANTLR messages format file > org/antlr/tool/templates/messages/formats/antlr.stg > [INFO] > ----------------------------------------------------------------------- > - > [ERROR] FATAL ERROR > [INFO] > ----------------------------------------------------------------------- > - > [INFO] ANTLR ErrorManager panic > [INFO] > ----------------------------------------------------------------------- > - > [INFO] Trace From r66092 at freescale.com Wed Jan 13 17:35:27 2010 From: r66092 at freescale.com (Chen Hongjun-R66092) Date: Thu, 14 Jan 2010 09:35:27 +0800 Subject: [antlr-interest] An error occurs in template example In-Reply-To: <63ACB2EF-E813-4B3F-BDE1-941BF6C77C2B@cs.usfca.edu> References: <3A45394FD742FA419B760BB8D398F9ED011E1D2A@zch01exm26.fsl.freescale.net> <3A45394FD742FA419B760BB8D398F9ED011E1D45@zch01exm26.fsl.freescale.net> <63ACB2EF-E813-4B3F-BDE1-941BF6C77C2B@cs.usfca.edu> Message-ID: <3A45394FD742FA419B760BB8D398F9ED011E1E91@zch01exm26.fsl.freescale.net> No, I didn't modify anything in this example. Best Regards, Hongjun > -----Original Message----- > From: Terence Parr [mailto:parrt at cs.usfca.edu] > Sent: Thursday, January 14, 2010 3:07 AM > To: Chen Hongjun-R66092 > Cc: antlr-interest at antlr.org > Subject: Re: [antlr-interest] An error occurs in template example > > weird. and youdidn't alter the software at all? > Ter > On Jan 12, 2010, at 7:12 PM, Chen Hongjun-R66092 wrote: > > > Hi Terence, > > > > Thanks for your response. For the example > 'templates/generator/2pass', > > I used the following commands to try it out: > > > > # java org.antlr.Tool *.g > > # javac *.java > > # java Test < input > > > > Do I miss anything? What is the "init" parameter needed by > template? > > How to provide this "init" parameter for template? > > > > Thanks again, > > Hongjun > > > >> -----Original Message----- > >> From: Terence Parr [mailto:parrt at cs.usfca.edu] > >> Sent: Wednesday, January 13, 2010 10:52 AM > >> To: Chen Hongjun-R66092 > >> Cc: antlr-interest at antlr.org > >> Subject: Re: [antlr-interest] An error occurs in template example > >> > >> the error says you don't have an "init" parameter to the > template. do > >> you have one? > >> Ter > >> On Jan 12, 2010, at 6:35 PM, Chen Hongjun-R66092 wrote: > >> > >>> Hi, > >>> > >>> I am new to ANTLR, and am reading the book The Definitive ANTLR > >>> Reference. When I tried the template example > >> 'template/generator/2pass' > >>> without any modification, and met an error as below: > >>> > >>> Exception in thread "main" > java.util.NoSuchElementException: no such > >>> attribute: init in template context [jasminFile] > >>> at > >>> > >> > org.antlr.stringtemplate.StringTemplate.checkNullAttributeAgainstForm > >> a > >>> lA > >>> rguments(StringTemplate.java:1311) > >>> at > >>> > >> > org.antlr.stringtemplate.StringTemplate.getAttribute(StringTemplate.j > >> a > >>> va > >>> :684) > >>> at > >>> > >> > org.antlr.stringtemplate.language.ActionEvaluator.attribute(ActionEva > >> l > >>> ua > >>> tor.java:360) > >>> at > >>> > >> > org.antlr.stringtemplate.language.ActionEvaluator.expr(ActionEvaluato > >> r > >>> .j > >>> ava:136) > >>> at > >>> > >> > org.antlr.stringtemplate.language.ActionEvaluator.action(ActionEvalua > >> t > >>> or > >>> .java:84) > >>> at > >>> org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:149) > >>> at > >>> > >> > org.antlr.stringtemplate.StringTemplate.write(StringTemplate.java:705 > >> ) > >>> at > >>> > >> > org.antlr.stringtemplate.StringTemplate.toString(StringTemplate.java: > >> 1 > >>> 67 > >>> 0) > >>> at > >>> > >> > org.antlr.stringtemplate.StringTemplate.toString(StringTemplate.java: > >> 1 > >>> 66 > >>> 1) > >>> at Test.main(Test.java:45) > >>> > >>> I appreciate your any suggestions or ideas! > >>> > >>> Thanks, > >>> Hongjun > >>> > >>> > >>> List: http://www.antlr.org/mailman/listinfo/antlr-interest > >>> Unsubscribe: > >>> > >> > http://www.antlr.org/mailman/options/antlr-interest/your-email-addres > >> s > >> > >> > >> > > > From lord.of.board at gmx.de Thu Jan 14 01:10:24 2010 From: lord.of.board at gmx.de (lord.of.board at gmx.de) Date: Thu, 14 Jan 2010 10:10:24 +0100 Subject: [antlr-interest] parsing boolean expressions: not not or abc Message-ID: <20100114091024.175140@gmx.net> Hello, I am trying to build a grammar which accepts boolean expressions for filtering. I found some interesting articles on the web, but now I got stuck. I try to parse something like this: not not or abc The first "not" is the boolean operator and the second is a text. Or even worse not not and not or and not and My grammar look like this: grammar TextFilterGrammar; options { output=AST; } content : orexpression ; orexpression : andexpression (OR^ andexpression)* ; andexpression : expression (AND^ expression)* ; expression : (NOT^)? term ; term : WORD ; NOT : 'not' ; AND : 'and' ; OR : 'or' ; WORD : ('a'..'z' | '0'..'9' | '%' | '_')+ ; WS : (' ' | '\r' | '\n' | '\t') { skip(); } ; In ANTLRWorks I always get a MismatchedTokenException when trying to parse "not not or ljsdf". Parsing e.g. "not noti or ljsdf" works fine. I managed to get it working with quotation marks, but I would prefer to have a solution without. Best regards, Lordi -- GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 From Heiko.Folkerts at david-bs.de Thu Jan 14 05:00:59 2010 From: Heiko.Folkerts at david-bs.de (Heiko Folkerts) Date: Thu, 14 Jan 2010 14:00:59 +0100 Subject: [antlr-interest] Tree pattern maching using the C target Message-ID: <93FCBF72DCE7634481C5DF1654D8FF13035A8401@DC2> Hi all, I wrote al litle tree pattern matcher for a specific validation we need in our grammar. ANTLR and the C compiler compile it all well but there is now "downup" mehtod for running the matcher. Instead I only see our own rules in the generated parser. So, is the method to run when using a tree pattern macher in the C target different than ^"downup"? How to run the matcher? I tried to find an answer in the C examples but there was only a treeparser and no tree pattern matcher. Thx+ Heiko Mit freundlichem Gru? Heiko Folkerts Systementwicklung und -design -- ______________________________________________ DAVID GmbH ? Wendenring 1 ? 38114 Braunschweig Tel.: +49 531 24379-14 Fax.: +49 531 24379-79 E-Mail: mailto:Heiko.Folkerts at david-bs.de WWW: http://www.david-bs.de? Eintragung: Amtsgericht Braunschweig, HRB 3167 Gesch?ftsf?hrer: Frank Ptok ______________________________________________ From cummings at kjchome.homeip.net Thu Jan 14 08:20:28 2010 From: cummings at kjchome.homeip.net (Kevin J. Cummings) Date: Thu, 14 Jan 2010 11:20:28 -0500 Subject: [antlr-interest] parsing boolean expressions: not not or abc In-Reply-To: <20100114091024.175140@gmx.net> References: <20100114091024.175140@gmx.net> Message-ID: <4B4F444C.10103@kjchome.homeip.net> On 01/14/2010 04:10 AM, lord.of.board at gmx.de wrote: > Hello, > > I am trying to build a grammar which accepts boolean expressions for filtering. I found some interesting articles on the web, but now I got stuck. > I try to parse something like this: > > not not or abc > > The first "not" is the boolean operator and the second is a text. NOT term OR term > Or even worse > > not not and not or and not and Gawk! NOT term AND NOT term AND NOT term ???? It took me a couple of seconds to figure out how this would be legal! B^) The parser is *definitely* going to need help figuring out when "not" is a NOT and when it is a term! > My grammar look like this: > > grammar TextFilterGrammar; > options { > output=AST; > } > content : orexpression > ; > orexpression > : andexpression (OR^ andexpression)* > ; > andexpression > : expression (AND^ expression)* > ; > expression > : (NOT^)? term > ; > term : WORD > ; > > NOT : 'not' > ; > AND : 'and' > ; > OR : 'or' > ; So, NOT, AND, and OR are reserved words in your grammar. > WORD : ('a'..'z' | '0'..'9' | '%' | '_')+ > ; > WS : (' ' | '\r' | '\n' | '\t') { skip(); } > ; > > In ANTLRWorks I always get a MismatchedTokenException when trying to parse "not not or ljsdf". Parsing e.g. "not noti or ljsdf" works fine. > > I managed to get it working with quotation marks, but I would prefer to have a solution without. "not" will always match your TOKEN named NOT. It will never be a WORD. If you wish to allow it as a term, you might want to change your term production to be: term : WORD | NOT | AND | OR ; This should effectively allow "not", "and", and "or" to be keywords instead of reserved words. But then, how do you want the parser to handle the sequence "not not"? Is that a NOT WORD or NOT NOT? Given that you are only allowing one optional NOT in your expression production, adding the operators to your term production should work. But, you'll be in a world of hurt if you change (NOT)? term to (NOT)* term, as then there is no way to know if a following "not" is a term or a NOT.... [gawk! the puns are getting bad!] You may need to add a syntactic predicate to your grammar around the NOT stuff: expression : (NOT term)=> (NOT^) term | term ; should help you out here.... > Best regards, > Lordi -- Kevin J. Cummings kjchome at rcn.com cummings at kjchome.homeip.net cummings at kjc386.framingham.ma.us Registered Linux User #1232 (http://counter.li.org) From lord.of.board at gmx.de Thu Jan 14 08:24:19 2010 From: lord.of.board at gmx.de (lord.of.board at gmx.de) Date: Thu, 14 Jan 2010 17:24:19 +0100 Subject: [antlr-interest] parsing boolean expressions: not not or abc Message-ID: <20100114162419.142900@gmx.net> I received an email describing a working solution. See below: ----------- Looks like it should work if you change term to term : WORD | NOT | AND | OR ; I tried it with a few examples and it seems to do ok. (I added in ASTLabelType=CommonTree; in the options and printed toStringTree() on the resulting tree and things look good.) The problem is going to be showing syntax errors - because the keywords are non-reserved it's more likely that something the user didn't intend will acually parse. If you can't stop people from using the keywords as terms, you should at least discourage it. Remember PL/I : IF IF = THEN THEN THEN = ELSE ELSE ELSE = IF; Sigh... You can do it, but no one really did (if I recall my PL/I syntax... been a while) and it was highly discouraged. Good luck! -- Scott > Hello, > > I am trying to build a grammar which accepts boolean expressions for filtering. I found some interesting articles on the web, but now I got stuck. > I try to parse something like this: > > ?not not or abc > > The first "not" is the boolean operator and the second is a text. > > Or even worse > > ?not not and not or and not and > > My grammar look like this: > > grammar TextFilterGrammar; > options { > ? ? ? ?output=AST; > } > content : ? ? ? orexpression > ? ? ? ?; > orexpression > ? ? ? ?: ? ? ? andexpression (OR^ andexpression)* > ? ? ? ?; > andexpression > ? ? ? ?: ? ? ? expression (AND^ expression)* > ? ? ? ?; > expression > ? ? ? ?: ? ? ? (NOT^)? term > ? ? ? ?; > term ? ?: ? ? ? WORD > ? ? ? ?; > > NOT ? ? : ? ? ? 'not' > ? ? ? ?; > AND ? ? : ? ? ? 'and' > ? ? ? ?; > OR ? ? ?: ? ? ? 'or' > ? ? ? ?; > WORD ? ?: ? ? ? ('a'..'z' | '0'..'9' | '%' | '_')+ > ? ? ? ?; > WS ? ? ?: ? ? ? (' ' | '\r' | '\n' | '\t') ?{ skip(); } > ? ? ? ?; > > In ANTLRWorks I always get a MismatchedTokenException when trying to parse "not not or ljsdf". Parsing e.g. "not noti or ljsdf" works fine. > > I managed to get it working with quotation marks, but I would prefer to have a solution without. > -- Jetzt kostenlos herunterladen: Internet Explorer 8 und Mozilla Firefox 3.5 - sicherer, schneller und einfacher! http://portal.gmx.net/de/go/atbrowser From jimi at temporal-wave.com Thu Jan 14 08:59:57 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Thu, 14 Jan 2010 08:59:57 -0800 Subject: [antlr-interest] parsing boolean expressions: not not or abc In-Reply-To: <20100114091024.175140@gmx.net> Message-ID: Change your grammar to: grammar T; options { output=AST; } tokens { EXPR; } content : orexpression EOF ->^(EXPR orexpression) ; orexpression : andexpression (OR^ andexpression)* ; andexpression : expression (AND^ expression)* ; expression : (NOT^)? term ; term : ( t=WORD | t=AND | t=OR | t=NOT ) { $t.setType(WORD); } ; NOT : 'not' ; AND : 'and' ; OR : 'or' ; WORD : ('a'..'z' | '0'..'9' | '%' | '_')+ ; WS : (' ' | '\r' | '\n' | '\t') { skip(); } However note that the grammar has to make some assumptions here such as the word 'not' on its own is a term and not (pun not intended) a syntax error where the not is the operator and should expect a term. Also I suspect that your not processing rule should actually be: expression : NOT^ expression | term ; But this would eat not not not as a repeated not as in NOT NOT WORD If the expression rule gets more complicated then ANTLR may not be able to predict properly. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of lord.of.board at gmx.de > Sent: Thursday, January 14, 2010 1:10 AM > To: antlr-interest at antlr.org > Subject: [antlr-interest] parsing boolean expressions: not not or abc > > Hello, > > I am trying to build a grammar which accepts boolean expressions for > filtering. I found some interesting articles on the web, but now I got > stuck. > I try to parse something like this: > > not not or abc > > The first "not" is the boolean operator and the second is a text. > > Or even worse > > not not and not or and not and > > My grammar look like this: > > grammar TextFilterGrammar; > options { > output=AST; > } > content : orexpression > ; > orexpression > : andexpression (OR^ andexpression)* > ; > andexpression > : expression (AND^ expression)* > ; > expression > : (NOT^)? term > ; > term : WORD > ; > > NOT : 'not' > ; > AND : 'and' > ; > OR : 'or' > ; > WORD : ('a'..'z' | '0'..'9' | '%' | '_')+ > ; > WS : (' ' | '\r' | '\n' | '\t') { skip(); } > ; > > In ANTLRWorks I always get a MismatchedTokenException when trying to > parse "not not or ljsdf". Parsing e.g. "not noti or ljsdf" works fine. > > I managed to get it working with quotation marks, but I would prefer to > have a solution without. > > Best regards, > Lordi > > -- > GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! > Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From jimi at temporal-wave.com Thu Jan 14 09:02:02 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Thu, 14 Jan 2010 09:02:02 -0800 Subject: [antlr-interest] Tree pattern maching using the C target In-Reply-To: <93FCBF72DCE7634481C5DF1654D8FF13035A8401@DC2> Message-ID: Pattern matcher or normal tree walker? The pattern stuff is not implemented in the C target yet. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Heiko Folkerts > Sent: Thursday, January 14, 2010 5:01 AM > To: antlr-interest at antlr.org > Subject: [antlr-interest] Tree pattern maching using the C target > > Hi all, > I wrote al litle tree pattern matcher for a specific validation we need > in our grammar. ANTLR and the C compiler compile it all well but there > is now "downup" mehtod for running the matcher. Instead I only see our > own rules in the generated parser. So, is the method to run when using > a tree pattern macher in the C target different than ^"downup"? How to > run the matcher? > > I tried to find an answer in the C examples but there was only a > treeparser and no tree pattern matcher. > > Thx+ > Heiko > > > Mit freundlichem Gru? > Heiko Folkerts > Systementwicklung und -design > -- > ______________________________________________ > DAVID GmbH ? Wendenring 1 ? 38114 Braunschweig > Tel.: +49 531 24379-14 > Fax.: +49 531 24379-79 > E-Mail: mailto:Heiko.Folkerts at david-bs.de > WWW: http://www.david-bs.de > Eintragung: Amtsgericht Braunschweig, HRB 3167 > Gesch?ftsf?hrer: Frank Ptok > ______________________________________________ > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From scott at javadude.com Thu Jan 14 09:47:55 2010 From: scott at javadude.com (Scott Stanchfield) Date: Thu, 14 Jan 2010 12:47:55 -0500 Subject: [antlr-interest] parsing boolean expressions: not not or abc In-Reply-To: References: <20100114091024.175140@gmx.net> Message-ID: Good catch on changing the type of the token; I had forgotten to do that on the note I sent... -- Scott ---------------------------------------- Scott Stanchfield http://javadude.com On Thu, Jan 14, 2010 at 11:59 AM, Jim Idle wrote: > Change your grammar to: > > grammar T; > options { > ? ? ? ?output=AST; > } > tokens { > ? ? ? ?EXPR; > } > > content : ? ? ? orexpression EOF > ? ? ? ? ? ? ? ?->^(EXPR orexpression) > ? ? ? ?; > > orexpression > ? ? ? ?: ? ? ? andexpression (OR^ andexpression)* > ? ? ? ?; > andexpression > ? ? ? ?: ? ? ? expression (AND^ expression)* > ? ? ? ?; > expression > ? ? ? ?: ? ? ? (NOT^)? term > ? ? ? ?; > term ? ?: ( > ? ? ? ? ? ? ? ? ?t=WORD > ? ? ? ? ? ? ? ?| t=AND > ? ? ? ? ? ? ? ?| t=OR > ? ? ? ? ? ? ? ?| t=NOT > ? ? ? ? ?) > ? ? ? ? ?{ > ? ? ? ? ? ? ? ?$t.setType(WORD); > ? ? ? ? ?} > ? ? ? ?; > > NOT ? ? : ? ? ? 'not' > ? ? ? ?; > AND ? ? : ? ? ? 'and' > ? ? ? ?; > OR ? ? ?: ? ? ? 'or' > ? ? ? ?; > WORD ? ?: ? ? ? ('a'..'z' | '0'..'9' | '%' | '_')+ > ? ? ? ?; > WS ? ? ?: ? ? ? (' ' | '\r' | '\n' | '\t') ?{ skip(); } > > > However note that the grammar has to make some assumptions here such as the word 'not' on its own is a term and not (pun not intended) a syntax error where the not is the operator and should expect a term. > > Also I suspect that your not processing rule should actually be: > > expression > ? ? ? ?: ? ? ? NOT^ expression > ? ? ? ?| ? ? ? term > ? ? ? ?; > > But this would eat not not not as a repeated not as in NOT NOT WORD > > If the expression rule gets more complicated then ANTLR may not be able to predict properly. > > Jim > >> -----Original Message----- >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- >> bounces at antlr.org] On Behalf Of lord.of.board at gmx.de >> Sent: Thursday, January 14, 2010 1:10 AM >> To: antlr-interest at antlr.org >> Subject: [antlr-interest] parsing boolean expressions: not not or abc >> >> Hello, >> >> I am trying to build a grammar which accepts boolean expressions for >> filtering. I found some interesting articles on the web, but now I got >> stuck. >> I try to parse something like this: >> >> ? not not or abc >> >> The first "not" is the boolean operator and the second is a text. >> >> Or even worse >> >> ? not not and not or and not and >> >> My grammar look like this: >> >> grammar TextFilterGrammar; >> options { >> ? ? ? output=AST; >> } >> content : ? ? orexpression >> ? ? ? ; >> orexpression >> ? ? ? : ? ? ? andexpression (OR^ andexpression)* >> ? ? ? ; >> andexpression >> ? ? ? : ? ? ? expression (AND^ expression)* >> ? ? ? ; >> expression >> ? ? ? : ? ? ? (NOT^)? term >> ? ? ? ; >> term ?: ? ? ? WORD >> ? ? ? ; >> >> NOT ? : ? ? ? 'not' >> ? ? ? ; >> AND ? : ? ? ? 'and' >> ? ? ? ; >> OR ? ?: ? ? ? 'or' >> ? ? ? ; >> WORD ?: ? ? ? ('a'..'z' | '0'..'9' | '%' | '_')+ >> ? ? ? ; >> WS ? ?: ? ? ? (' ' | '\r' | '\n' | '\t') ?{ skip(); } >> ? ? ? ; >> >> In ANTLRWorks I always get a MismatchedTokenException when trying to >> parse "not not or ljsdf". Parsing e.g. "not noti or ljsdf" works fine. >> >> I managed to get it working with quotation marks, but I would prefer to >> have a solution without. >> >> Best regards, >> Lordi >> >> -- >> GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! >> Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- >> email-address > > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > From parrt at cs.usfca.edu Thu Jan 14 16:47:05 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Thu, 14 Jan 2010 16:47:05 -0800 Subject: [antlr-interest] ANTLRWorks plugin for intellij Message-ID: hi. how many people use AW as a plugin *inside* intellij? it really complicates the code and I'm thinking of dumping it; might make it easier for eclipse plugins too if theyr'e not worried about intellij plugin code intermingled in AW. just getting an idea of how many people use it that way. It's not the best integration with intellij so I use AW standalone personally. Thanks, Ter From bkiers at gmail.com Thu Jan 14 23:28:20 2010 From: bkiers at gmail.com (Bart Kiers) Date: Fri, 15 Jan 2010 08:28:20 +0100 Subject: [antlr-interest] ANTLRWorks plugin for intellij In-Reply-To: References: Message-ID: I first used the plug-in with IntelliJ (v9), but found it a bit buggy: quite a few error messages (sorry, not too concrete...). I use the stand alone ANTLRWorks (much to my liking!). Regards, Bart. On Fri, Jan 15, 2010 at 1:47 AM, Terence Parr wrote: > hi. how many people use AW as a plugin *inside* intellij? it really > complicates the code and I'm thinking of dumping it; might make it easier > for eclipse plugins too if theyr'e not worried about intellij plugin code > intermingled in AW. > > just getting an idea of how many people use it that way. It's not the best > integration with intellij so I use AW standalone personally. > > Thanks, > Ter > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: > http://www.antlr.org/mailman/options/antlr-interest/your-email-address > From arne.schroeder at gmail.com Fri Jan 15 00:57:24 2010 From: arne.schroeder at gmail.com (=?ISO-8859-1?Q?Arne_Schr=F6der?=) Date: Fri, 15 Jan 2010 09:57:24 +0100 Subject: [antlr-interest] Missing error when tokens are left to parse Message-ID: Hello, I am trying to write a parser for an initialization-file. This file is devided in sections which are not embraced but have a keyword to start them. Unfortunately the parser stops when encountering a problem and just ends the parsing-process, not even reporting an error. For demostration of the problem I wrote the following example-grammar: file : section1 section2? ; section1: 'Section1' ; section2: 'Section2' ; ID : ('a'..'z'|'A'..'Z')+ ; SPACE : ' ' {$channel = HIDDEN;} ; Now using the input "Section1 bla Section2", I would expect the parser to stop at "bla", throw an UnwantedTokenException, do a SingleTokenDeletion as error-recovery and just continue parsing "Section2". What happens is that it stops at "bla", does not recognize it as section2 and just terminates, leaving the two tokens unparsed and not reporting any error. So my question is: How can I avoid my parser doing stuff like that without changing my files' syntax? Thanks in advance Arne From arne.schroeder at gmail.com Fri Jan 15 01:43:28 2010 From: arne.schroeder at gmail.com (=?ISO-8859-1?Q?Arne_Schr=F6der?=) Date: Fri, 15 Jan 2010 10:43:28 +0100 Subject: [antlr-interest] [il-antlr-interest: 27542] Missing error when tokens are left to parse In-Reply-To: <1ec078df1001150127r753cb368p3e70c1039d59101d@mail.gmail.com> References: <1ec078df1001150127r753cb368p3e70c1039d59101d@mail.gmail.com> Message-ID: Thank you for your quick help. It might work in that case but does not help me with my real problem. So I will alter the example to have it closer to my real problem: file : section1 section2? ; section1: 'Section1' method* ; section2: 'Section2' method* ; method : ID LPARENT RPARENT ; ID : ('a'..'z'|'A'..'Z')+ ; LPARENT : '(' ; RPARENT : ')' ; SPACE : ' ' {$channel = HIDDEN;} ; If I now try to parse "Section1 bla()) Section2" something similar happens: It parses up to the second ")" and then decides to skip the rest. And I definitely do not want the second ")" to be there i.e. want it to throw a recognition-error and recover itself. On Fri, Jan 15, 2010 at 10:27 AM, Akira Akira wrote: > I am not sure if this is what you want, but what about changing to > something like the following? (the parts I added are in bold) > > > file : section1 section2? > ; > > section1: 'Section1' *CONTENTS* > ; > > section2: 'Section2' *CONTENTS* > > ; > > ID : ('a'..'z'|'A'..'Z')+ > ; > > *CONTENTS : ('a'..'z'|'A'..'Z')* > ;* > > SPACE : ' ' {$channel = HIDDEN;} > ; > > > > 2010/1/15 Arne Schr?der > >> Hello, >> >> I am trying to write a parser for an initialization-file. This file is >> devided in sections which are not embraced but have a keyword to start >> them. >> >> Unfortunately the parser stops when encountering a problem and just ends >> the >> parsing-process, not even reporting an error. >> >> For demostration of the problem I wrote the following example-grammar: >> >> file : section1 section2? >> ; >> >> section1: 'Section1' >> ; >> >> section2: 'Section2' >> ; >> >> ID : ('a'..'z'|'A'..'Z')+ >> ; >> >> SPACE : ' ' {$channel = HIDDEN;} >> ; >> >> Now using the input "Section1 bla Section2", I would expect the parser to >> stop at "bla", throw an UnwantedTokenException, do a SingleTokenDeletion >> as >> error-recovery and just continue parsing "Section2". >> What happens is that it stops at "bla", does not recognize it as section2 >> and just terminates, leaving the two tokens unparsed and not reporting any >> error. >> >> So my question is: How can I avoid my parser doing stuff like that without >> changing my files' syntax? >> >> >> Thanks in advance >> >> Arne >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> Unsubscribe: >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address >> >> -- >> You received this message because you are subscribed to the Google Groups >> "il-antlr-interest" group. >> To post to this group, send email to il-antlr-interest at googlegroups.com. >> To unsubscribe from this group, send email to >> il-antlr-interest+unsubscribe at googlegroups.com >> . >> For more options, visit this group at >> http://groups.google.com/group/il-antlr-interest?hl=en. >> >> >> >> > From antlr at mirality.co.nz Fri Jan 15 03:10:05 2010 From: antlr at mirality.co.nz (Gavin Lambert) Date: Sat, 16 Jan 2010 00:10:05 +1300 Subject: [antlr-interest] Missing error when tokens are left to parse In-Reply-To: References: <1ec078df1001150127r753cb368p3e70c1039d59101d@mail.gmail.com> Message-ID: <20100115111014.230933418446@www.antlr.org> At 22:43 15/01/2010, Arne Schr?der wrote: >file : section1 section2? > ; [...] >If I now try to parse "Section1 bla()) Section2" something similar >happens: >It parses up to the second ")" and then decides to skip the rest. >And I definitely do not want the second ")" to be there i.e. want >it to throw a recognition-error and recover itself. Try adding EOF to the end of your top-level rule. Without that, ANTLR assumes that it is not required to parse all the input, so if it successfully parses a section1 it will just decide that the section2 has been omitted (since it's optional). From m.y.speyer at inter.nl.net Fri Jan 15 03:57:12 2010 From: m.y.speyer at inter.nl.net (Marc Speyer) Date: Fri, 15 Jan 2010 12:57:12 +0100 Subject: [antlr-interest] Tree pattern maching using the C target Message-ID: <000901ca95d9$df6dd740$9e4985c0$@y.speyer@inter.nl.net> Hi all, I have a similar issue using the C# target. Using the Cymbol.g example of pattern 17 Symbol Table for Nested Scopes of the Language Implementation Patterns book I could not get it to work because there is now downup method. According to the documentation this method walks the AST code using ANTLR's built-in downup( ) strategy. Am I correct assuming that this has not been implemented yet for the C# target (as Jim implies in his response). Is it difficult to implement it myself? I guess it would involve implementing the tree pattern matching stuff. Marc P.S. Hope this email files under the proper subject thread, and apologies in advance if it isn't (Just subscribed to the mailing list but I could not find out how to get previous posts from it) > Pattern matcher or normal tree walker? The pattern stuff is not implemented in the C target yet. > > Jim > >> -----Original Message----- >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- >> bounces at antlr.org] On Behalf Of Heiko Folkerts >> Sent: Thursday, January 14, 2010 5:01 AM >> To: antlr-interest at antlr.org >> Subject: [antlr-interest] Tree pattern maching using the C target >> >> Hi all, >> I wrote al litle tree pattern matcher for a specific validation we need >> in our grammar. ANTLR and the C compiler compile it all well but there >> is now "downup" mehtod for running the matcher. Instead I only see our >> own rules in the generated parser. So, is the method to run when using >> a tree pattern macher in the C target different than ^"downup"? How to >> run the matcher? >> >> I tried to find an answer in the C examples but there was only a >> treeparser and no tree pattern matcher. >> >> Thx+ >> Heiko >> >> >> -- From JALuber at gmx.de Fri Jan 15 04:58:33 2010 From: JALuber at gmx.de (Johannes Luber) Date: Fri, 15 Jan 2010 13:58:33 +0100 Subject: [antlr-interest] Tree pattern maching using the C target In-Reply-To: <000901ca95d9$df6dd740$9e4985c0$@y.speyer@inter.nl.net> References: <000901ca95d9$df6dd740$9e4985c0$@y.speyer@inter.nl.net> Message-ID: <20100115125833.242280@gmx.net> > Hi all, > > I have a similar issue using the C# target. Using the Cymbol.g example of > pattern 17 Symbol Table for Nested Scopes of the Language Implementation > Patterns book I could not get it to work because there is now downup > method. > According to the documentation this method walks the AST code using > ANTLR's > built-in downup( ) strategy. > > Am I correct assuming that this has not been implemented yet for the C# > target (as Jim implies in his response). Is it difficult to implement it > myself? I guess it would involve implementing the tree pattern matching > stuff. > > Marc You are correct - there is no official version yet, which implements tree pattern matching. I haven't gotten around to the API changes yet (will work on that next week), though I have checked in some untested changes. It would be the easieast if you'd base your own code on that for now. Johannes > P.S. Hope this email files under the proper subject thread, and apologies > in > advance if it isn't (Just subscribed to the mailing list but I could not > find out how to get previous posts from it) > > > Pattern matcher or normal tree walker? The pattern stuff is not > implemented in the C target yet. > > > > Jim > > > >> -----Original Message----- > >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > >> bounces at antlr.org] On Behalf Of Heiko Folkerts > >> Sent: Thursday, January 14, 2010 5:01 AM > >> To: antlr-interest at antlr.org > >> Subject: [antlr-interest] Tree pattern maching using the C target > >> > >> Hi all, > >> I wrote al litle tree pattern matcher for a specific validation we need > >> in our grammar. ANTLR and the C compiler compile it all well but there > >> is now "downup" mehtod for running the matcher. Instead I only see our > >> own rules in the generated parser. So, is the method to run when using > >> a tree pattern macher in the C target different than ^"downup"? How to > >> run the matcher? > >> > >> I tried to find an answer in the C examples but there was only a > >> treeparser and no tree pattern matcher. > >> > >> Thx+ > >> Heiko > >> > >> > >> -- > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: > http://www.antlr.org/mailman/options/antlr-interest/your-email-address -- GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 From frogery at voila.fr Fri Jan 15 05:53:25 2010 From: frogery at voila.fr (frogery at voila.fr) Date: Fri, 15 Jan 2010 14:53:25 +0100 (CET) Subject: [antlr-interest] Overriding the emit function to use custom tokens Message-ID: <17030639.865681263563605671.JavaMail.www@wwinf4613> Hello, I wanted to create a custom token object, so I have seen in the FAQ that I had to "override" the lexer emit function. So I did that this way: ... pLexer = antlrLexerNew(pInput); pLexer->pLexer->emit = customEmit; ... but it was not working. The customEmit function was never called. So I have debugged and I think there is a bug in antlr3lexer.c. In the nextTokenStr function, shouldn't "emit(lexer)" be replaced by "lexer->emit(lexer);"? What do you think? Thanks, Yann ____________________________________________________ Vous n?avez pas encore adress? vos voeux ? Retrouvez nos cartes sur http://carte-de-voeux.voila.fr ____________________________________________________ Vous n?avez pas encore adress? vos voeux?? Retrouvez nos cartes sur http://carte-de-voeux.voila.fr From Gordon.Tyler at quest.com Fri Jan 15 06:48:49 2010 From: Gordon.Tyler at quest.com (Gordon Tyler) Date: Fri, 15 Jan 2010 06:48:49 -0800 Subject: [antlr-interest] ANTLRWorks plugin for intellij In-Reply-To: References: Message-ID: <1FE9A296676737419A8912A6FD22AE1D01E0479ED4@alvxmbw04.prod.quest.corp> I tried it but I found the ANTLRworks editor too different to the IDEA editor to be comfortable. I was hoping for ANTLR syntax support in the normal IDEA editor. I haven't tried the standalone ANTLRworks editor. -----Original Message----- From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Terence Parr Sent: January 14, 2010 7:47 PM To: antlr-interest at antlr.org interest Subject: [antlr-interest] ANTLRWorks plugin for intellij hi. how many people use AW as a plugin *inside* intellij? it really complicates the code and I'm thinking of dumping it; might make it easier for eclipse plugins too if theyr'e not worried about intellij plugin code intermingled in AW. just getting an idea of how many people use it that way. It's not the best integration with intellij so I use AW standalone personally. Thanks, Ter List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From jimi at temporal-wave.com Fri Jan 15 09:05:41 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Fri, 15 Jan 2010 09:05:41 -0800 Subject: [antlr-interest] Missing error when tokens are left to parse In-Reply-To: References: Message-ID: <1606CC72-9CC1-4F8B-B12B-4DFE70460DCA@temporal-wave.com> This is an FAQ I think. Your start rule does not end in EOF and so ANTLR stops parsing when the next token is not predicted. Jim On Jan 15, 2010, at 0:57, Arne Schr?der wrote: > Hello, > > I am trying to write a parser for an initialization-file. This file is > devided in sections which are not embraced but have a keyword to > start them. > > Unfortunately the parser stops when encountering a problem and just > ends the > parsing-process, not even reporting an error. > > For demostration of the problem I wrote the following example-grammar: > > file : section1 section2? > ; > > section1: 'Section1' > ; > > section2: 'Section2' > ; > > ID : ('a'..'z'|'A'..'Z')+ > ; > > SPACE : ' ' {$channel = HIDDEN;} > ; > > Now using the input "Section1 bla Section2", I would expect the > parser to > stop at "bla", throw an UnwantedTokenException, do a > SingleTokenDeletion as > error-recovery and just continue parsing "Section2". > What happens is that it stops at "bla", does not recognize it as > section2 > and just terminates, leaving the two tokens unparsed and not > reporting any > error. > > So my question is: How can I avoid my parser doing stuff like that > without > changing my files' syntax? > > > Thanks in advance > > Arne > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From parrt at cs.usfca.edu Fri Jan 15 09:23:14 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Fri, 15 Jan 2010 09:23:14 -0800 Subject: [antlr-interest] ANTLRWorks plugin for intellij In-Reply-To: <1FE9A296676737419A8912A6FD22AE1D01E0479ED4@alvxmbw04.prod.quest.corp> References: <1FE9A296676737419A8912A6FD22AE1D01E0479ED4@alvxmbw04.prod.quest.corp> Message-ID: Yeah, Jean graduated long before he could work on plugin I think. he was doing the plugin for "free" and it was an afterthought. Ok, I'll talk to Jean. Ter On Jan 15, 2010, at 6:48 AM, Gordon Tyler wrote: > I tried it but I found the ANTLRworks editor too different to the IDEA editor to be comfortable. I was hoping for ANTLR syntax support in the normal IDEA editor. > > I haven't tried the standalone ANTLRworks editor. > > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Terence Parr > Sent: January 14, 2010 7:47 PM > To: antlr-interest at antlr.org interest > Subject: [antlr-interest] ANTLRWorks plugin for intellij > > hi. how many people use AW as a plugin *inside* intellij? it really complicates the code and I'm thinking of dumping it; might make it easier for eclipse plugins too if theyr'e not worried about intellij plugin code intermingled in AW. > > just getting an idea of how many people use it that way. It's not the best integration with intellij so I use AW standalone personally. > > Thanks, > Ter > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From jimi at temporal-wave.com Fri Jan 15 09:50:03 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Fri, 15 Jan 2010 09:50:03 -0800 Subject: [antlr-interest] Overriding the emit function to use custom tokens In-Reply-To: <17030639.865681263563605671.JavaMail.www@wwinf4613> Message-ID: <97280a5c20eb8b4788d15c30e61120a0@temporal-wave.com> No, you have to override nextToken too it calls emit directly for performance reasons. However, no one really needs to do this. There is a user defined pointer built in to every token and a function pointer that is called when the token is released (if it is not NULL). So you can just add your custom token stuff there and rely on the default runtime. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of frogery at voila.fr > Sent: Friday, January 15, 2010 5:53 AM > To: antlr-interest at antlr.org > Subject: [antlr-interest] Overriding the emit function to use custom > tokens > > Hello, > > I wanted to create a custom token object, so I have seen in the FAQ > that I had to "override" the lexer emit function. So I did that this > way: > > ... > pLexer = antlrLexerNew(pInput); > pLexer->pLexer->emit = customEmit; > ... > > but it was not working. > > The customEmit function was never called. So I have debugged and I > think there is a bug in antlr3lexer.c. In the nextTokenStr function, > shouldn't "emit(lexer)" be replaced by "lexer->emit(lexer);"? What do > you think? > > Thanks, > Yann > > ____________________________________________________ > > Vous n?avez pas encore adress? vos voeux ? Retrouvez nos cartes sur > http://carte-de-voeux.voila.fr > > > ____________________________________________________ > > Vous n?avez pas encore adress? vos voeux?? Retrouvez nos cartes sur > http://carte-de-voeux.voila.fr > > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From yurushkin at rambler.ru Fri Jan 15 09:50:49 2010 From: yurushkin at rambler.ru (=?koi8-r?B?4NLV28vJziDtycjBycw=?=) Date: Fri, 15 Jan 2010 20:50:49 +0300 Subject: [antlr-interest] Fortran lexer problem Message-ID: Good day, I want to add comments of Fortran 77: "c xxxxx"; First symbol in column is 'c' - it means that the following line is a line of comment. but I also have NAME token, that will conflict with such COMMENT rule. ('c' can be a name). Is it possible to select rule by my own predicate? Are there any other more clear solvings of this problem? -- Best regards, Michael From jimi at temporal-wave.com Fri Jan 15 10:20:46 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Fri, 15 Jan 2010 10:20:46 -0800 Subject: [antlr-interest] Fortran lexer problem In-Reply-To: Message-ID: <6482427cf4e64b4f8ad286ed88e1f2c4@temporal-wave.com> I think Fortran comments that start with C have to have the C in character position 0 (or 1 in Fortran I guess ;-). So your comment rule can be predicated by checking for line position 0 in ANTLR terms. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of ??????? ?????? > Sent: Friday, January 15, 2010 9:51 AM > To: antlr-interest at antlr.org > Subject: [antlr-interest] Fortran lexer problem > > Good day, > > I want to add comments of Fortran 77: > > "c xxxxx"; > First symbol in column is 'c' - it means that the following line is a > line > of comment. > > but I also have NAME token, that will conflict with such COMMENT rule. > ('c' can be a name). > > Is it possible to select rule by my own predicate? Are there any other > more clear solvings of > this problem? > > > -- > Best regards, > Michael > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From yurushkin at rambler.ru Fri Jan 15 10:27:22 2010 From: yurushkin at rambler.ru (=?utf-8?B?0K7RgNGD0YjQutC40L0g0JzQuNGF0LDQuNC7?=) Date: Fri, 15 Jan 2010 21:27:22 +0300 Subject: [antlr-interest] Fortran lexer problem In-Reply-To: <6482427cf4e64b4f8ad286ed88e1f2c4@temporal-wave.com> References: <6482427cf4e64b4f8ad286ed88e1f2c4@temporal-wave.com> Message-ID: Excuse me, but how can I specify this condition (is it a first symbol and symbol='c')? Could you send me a piece of lexer grammar? Jim Idle ?????(?) ? ????? ?????? Fri, 15 Jan 2010 21:20:46 +0300: > I think Fortran comments that start with C have to have the C in > character position 0 (or 1 in Fortran I guess ;-). So your comment rule > can be predicated by checking for line position 0 in ANTLR terms. > > Jim > >> -----Original Message----- >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- >> bounces at antlr.org] On Behalf Of ??????? ?????? >> Sent: Friday, January 15, 2010 9:51 AM >> To: antlr-interest at antlr.org >> Subject: [antlr-interest] Fortran lexer problem >> >> Good day, >> >> I want to add comments of Fortran 77: >> >> "c xxxxx"; >> First symbol in column is 'c' - it means that the following line is a >> line >> of comment. >> >> but I also have NAME token, that will conflict with such COMMENT rule. >> ('c' can be a name). >> >> Is it possible to select rule by my own predicate? Are there any other >> more clear solvings of >> this problem? >> >> >> -- >> Best regards, >> Michael >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- >> email-address > > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: > http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > __________ Information from ESET Smart Security, version of virus > signature database 4775 (20100115) __________ > > The message was checked by ESET Smart Security. > > http://www.esetnod32.ru > > > -- Best regards, Michael From yurushkin at rambler.ru Fri Jan 15 10:56:21 2010 From: yurushkin at rambler.ru (=?utf-8?B?0K7RgNGD0YjQutC40L0g0JzQuNGF0LDQuNC7?=) Date: Fri, 15 Jan 2010 21:56:21 +0300 Subject: [antlr-interest] Fortran lexer problem In-Reply-To: References: <6482427cf4e64b4f8ad286ed88e1f2c4@temporal-wave.com> Message-ID: I have the following term LINE_COMMENT : ({blabla}? ('c' | 'C' | '*') | '!' ) ~('\n')* { $channel = HIDDEN; } ; but it only pasts the following code at the end: switch (alt31) { case 1: { if ( !((blabla)) ) { CONSTRUCTEX(); EXCEPTION->type = ANTLR3_FAILED_PREDICATE_EXCEPTION; EXCEPTION->message = (void *)"blabla"; EXCEPTION->ruleName = (void *)"LINE_COMMENT"; } if "blabla" is false, an error is occured... but it's not right. ??????? ?????? ?????(?) ? ????? ?????? Fri, 15 Jan 2010 21:27:22 +0300: > Excuse me, but how can I specify this condition (is it a first symbol and > symbol='c')? > Could you send me a piece of lexer grammar? > > > Jim Idle ?????(?) ? ????? ?????? Fri, 15 Jan > 2010 > 21:20:46 +0300: > >> I think Fortran comments that start with C have to have the C in >> character position 0 (or 1 in Fortran I guess ;-). So your comment rule >> can be predicated by checking for line position 0 in ANTLR terms. >> >> Jim >> >>> -----Original Message----- >>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- >>> bounces at antlr.org] On Behalf Of ??????? ?????? >>> Sent: Friday, January 15, 2010 9:51 AM >>> To: antlr-interest at antlr.org >>> Subject: [antlr-interest] Fortran lexer problem >>> >>> Good day, >>> >>> I want to add comments of Fortran 77: >>> >>> "c xxxxx"; >>> First symbol in column is 'c' - it means that the following line is a >>> line >>> of comment. >>> >>> but I also have NAME token, that will conflict with such COMMENT rule. >>> ('c' can be a name). >>> >>> Is it possible to select rule by my own predicate? Are there any other >>> more clear solvings of >>> this problem? >>> >>> >>> -- >>> Best regards, >>> Michael >>> >>> List: http://www.antlr.org/mailman/listinfo/antlr-interest >>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- >>> email-address >> >> >> >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> Unsubscribe: >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address >> >> __________ Information from ESET Smart Security, version of virus >> signature database 4775 (20100115) __________ >> >> The message was checked by ESET Smart Security. >> >> http://www.esetnod32.ru >> >> >> > > -- Best regards, Michael From zep_antlr at bahj.com Fri Jan 15 14:02:40 2010 From: zep_antlr at bahj.com (Zachary Palmer) Date: Fri, 15 Jan 2010 17:02:40 -0500 Subject: [antlr-interest] First and Last Token of a Rule Message-ID: <4B50E600.6090005@bahj.com> All, I think this is a pretty simple operation, but I have no idea how to execute it. Suppose I'm in some action code and have a reference to the parser. Is there a way for me to obtain the most recently used token? How about the token that started the most recent grammar rule? For instance, consider the following grammar (using a Java target language): foo: 'a' bar* 'd' { doStuff(); }; bar: ('b' | 'c') { doStuff(); }; Let's assume we are feeding this grammar the string "abcd". In that case, doStuff is called three times: once after the token 'b' is matched in the bar rule, once after the token 'c' is matched in the bar rule, and once after the tokens 'a' through 'd' are matched in the foo rule. I would like, from within the body of the doStuff method, to obtain the first and last token of each rule matched. So, for instance, if my doStuff method looked like this: void doStuff() { Token first = ...; // first token of the current rule Token last = ...; // token most recently used System.out.println(first.getText() + ", " + last.getText()); } then the output to the above grammar when provided the input "abcd" should be b,b c,c a,d This is, of course, a representative example; the real situation is a bit more complicated. The catch is that I don't want to add any arguments to the doStuff method or do anything else that would require me to change every rule in this 3,000 line grammar. Is there a way that I can get the first token of the current rule and the most recently used token without tweaking every single grammar rule? Many thanks for reading! Zachary Palmer From jimi at temporal-wave.com Fri Jan 15 15:41:56 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Fri, 15 Jan 2010 15:41:56 -0800 Subject: [antlr-interest] First and Last Token of a Rule In-Reply-To: <4B50E600.6090005@bahj.com> Message-ID: <5e608006169f3d4494c1b7c337411109@temporal-wave.com> The upcoming token at any point is returned by input.LT(1), the previous token by input.LT(-1) So: foo @init { CommonToken sToken = input.LT(1); } : A bar* D { doStuff(sToken, input.LT(-1)); } ; And so on. Also look at things like $start depending on what the output is etc. However, you will be much better off building an AST then walking the tree to do your actions. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Zachary Palmer > Sent: Friday, January 15, 2010 2:03 PM > To: antlr-interest at antlr.org > Subject: [antlr-interest] First and Last Token of a Rule > > All, > > I think this is a pretty simple operation, but I have no idea how to > execute it. Suppose I'm in some action code and have a reference to > the > parser. Is there a way for me to obtain the most recently used token? > How about the token that started the most recent grammar rule? > > For instance, consider the following grammar (using a Java target > language): > > foo: 'a' bar* 'd' { doStuff(); }; > bar: ('b' | 'c') { doStuff(); }; > > Let's assume we are feeding this grammar the string "abcd". In that > case, doStuff is called three times: once after the token 'b' is > matched > in the bar rule, once after the token 'c' is matched in the bar rule, > and once after the tokens 'a' through 'd' are matched in the foo rule. > I would like, from within the body of the doStuff method, to obtain the > first and last token of each rule matched. So, for instance, if my > doStuff method looked like this: > > void doStuff() { > Token first = ...; // first token of the current rule > Token last = ...; // token most recently used > System.out.println(first.getText() + ", " + last.getText()); > } > > then the output to the above grammar when provided the input "abcd" > should be > > b,b > c,c > a,d > > This is, of course, a representative example; the real situation is a > bit more complicated. The catch is that I don't want to add any > arguments to the doStuff method or do anything else that would require > me to change every rule in this 3,000 line grammar. Is there a way > that > I can get the first token of the current rule and the most recently > used > token without tweaking every single grammar rule? > > Many thanks for reading! > > Zachary Palmer > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From zep_antlr at bahj.com Fri Jan 15 16:09:32 2010 From: zep_antlr at bahj.com (Zachary Palmer) Date: Fri, 15 Jan 2010 19:09:32 -0500 Subject: [antlr-interest] First and Last Token of a Rule In-Reply-To: <5e608006169f3d4494c1b7c337411109@temporal-wave.com> References: <5e608006169f3d4494c1b7c337411109@temporal-wave.com> Message-ID: <4B5103BC.5030003@bahj.com> Jim, Thanks for the reply. :) That's good to know. Any idea about how to get the first token in a given rule? With the information you've given me, I could always stick something in an @init and an @after in every rule, but I'd definitely like to avoid that. I guess what I'm really wanting is an @allrulesinit and an @allrulesafter (to occur before and after the @init and @after, respectively), but it doesn't seem like those exist. In fact, I am building an AST. The actions I mentioned previously are doing just that and every rule I have is of the (unfortunate) form: foo returns [FooNode ret] : bar ';' { $ret = factory.makeFooNode($bar.ret); } ; Because I want node creation to be indirected through a factory (and because I want a heterogeneous AST), there doesn't seem to be any choice but to use this approach. The people who wrote the ANTLR3 Java 1.5 grammar I pulled from the ANTLR website seemed to agree; the OpenJDK project uses the same approach for their ANTLR parser. I've gotten exactly the tree I needed (built to a different API than the Java Compiler API for purposes of my project) and now I want to tag those nodes with their start and end tokens. I might actually have some luck with scopes; I should look into that. Thanks again for the help! Cheers, Zach > The upcoming token at any point is returned by input.LT(1), the previous token by input.LT(-1) > > So: > > foo > @init { > CommonToken sToken = input.LT(1); > } > : A bar* D { doStuff(sToken, input.LT(-1)); } > ; > > And so on. Also look at things like $start depending on what the output is etc. > > However, you will be much better off building an AST then walking the tree to do your actions. > > Jim > > >> -----Original Message----- >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- >> bounces at antlr.org] On Behalf Of Zachary Palmer >> Sent: Friday, January 15, 2010 2:03 PM >> To: antlr-interest at antlr.org >> Subject: [antlr-interest] First and Last Token of a Rule >> >> All, >> >> I think this is a pretty simple operation, but I have no idea how to >> execute it. Suppose I'm in some action code and have a reference to >> the >> parser. Is there a way for me to obtain the most recently used token? >> How about the token that started the most recent grammar rule? >> >> For instance, consider the following grammar (using a Java target >> language): >> >> foo: 'a' bar* 'd' { doStuff(); }; >> bar: ('b' | 'c') { doStuff(); }; >> >> Let's assume we are feeding this grammar the string "abcd". In that >> case, doStuff is called three times: once after the token 'b' is >> matched >> in the bar rule, once after the token 'c' is matched in the bar rule, >> and once after the tokens 'a' through 'd' are matched in the foo rule. >> I would like, from within the body of the doStuff method, to obtain the >> first and last token of each rule matched. So, for instance, if my >> doStuff method looked like this: >> >> void doStuff() { >> Token first = ...; // first token of the current rule >> Token last = ...; // token most recently used >> System.out.println(first.getText() + ", " + last.getText()); >> } >> >> then the output to the above grammar when provided the input "abcd" >> should be >> >> b,b >> c,c >> a,d >> >> This is, of course, a representative example; the real situation is a >> bit more complicated. The catch is that I don't want to add any >> arguments to the doStuff method or do anything else that would require >> me to change every rule in this 3,000 line grammar. Is there a way >> that >> I can get the first token of the current rule and the most recently >> used >> token without tweaking every single grammar rule? >> >> Many thanks for reading! >> >> Zachary Palmer >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- >> email-address >> > > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > From wclodius at los-alamos.net Fri Jan 15 19:04:02 2010 From: wclodius at los-alamos.net (William B. Clodius) Date: Fri, 15 Jan 2010 20:04:02 -0700 Subject: [antlr-interest] Fortran lexer problem In-Reply-To: References: Message-ID: <4C3D1B55-86D5-4873-B23D-88FCA1FE153A@los-alamos.net> As this is at least your second question on Fortran and ANTLR I suggest you check out the Open Fortran Project. http://fortran-parser.sourceforge.net/ As to the question regarding Fortran comments, Lexing Fortran, particularly the fixed source form, where spacing is not significant, is a pain not really suited to automated tools such as ANTLR. Check out Sale's Algorithm. On Jan 15, 2010, at 10:50 AM, ??????? ?????? wrote: > Good day, > > I want to add comments of Fortran 77: > > "c xxxxx"; > First symbol in column is 'c' - it means that the following line is a line > of comment. > > but I also have NAME token, that will conflict with such COMMENT rule. > ('c' can be a name). > > Is it possible to select rule by my own predicate? Are there any other > more clear solvings of > this problem? > > > -- > Best regards, > Michael > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > From christian.schladetsch at gmail.com Sat Jan 16 00:11:06 2010 From: christian.schladetsch at gmail.com (Christian Schladetsch) Date: Sat, 16 Jan 2010 19:11:06 +1100 Subject: [antlr-interest] Incremental Parsing, AST creation, and ST generation Message-ID: <6442c4ae1001160011k6bfc4e8erd4fedf13c3787e3b@mail.gmail.com> Hello, I am writing a network protocol using ANTLR. My idea is to use ANTLR to parse incoming packets formed as possibly nested edicts. Example full transcript of input is: foo(a=1,b="baz") { bar(); spam(c=10) { grok(@b); } } Now, the key issue I have is to avoid reparsing the entire file when new input arrives. For example: foo() { bar(); This is valid input for my grammar. When new input arrives, such as: spam() { I want to re-use the previously parsed items (and AST, and code generated by the AST via StringTemplate), while just adding the new tokens to the parser, and new nodes to the tree, and new code to my VM. Basically, I'd like to know if it is possible to generate an AST from a parser, then add more input to that parser (and have it possibly fail parsing), then add more to the AST. It is impractical to have to re-parse the entire input (and re-create the entire AST) when new input arrives. Full transcripts can be thousands of lines long. There is a way to do this, but I would like to see if I can first leverage ANTLR. I've used ANTLR with great success (and a lot of bleary eyes), but this is a new application of it for me and I am unsure of the feasibility. Thanks in advance, Christian. From christian.schladetsch at gmail.com Sat Jan 16 00:17:53 2010 From: christian.schladetsch at gmail.com (Christian Schladetsch) Date: Sat, 16 Jan 2010 19:17:53 +1100 Subject: [antlr-interest] Incremental Parsing, AST Generation Message-ID: <6442c4ae1001160017v13444961h75afb683b6470abc@mail.gmail.com> Hello, I am writing a network protocol using ANTLR. My idea is to use ANTLR to parse incoming packets formed as possibly nested edicts. Example full transcript of input is: foo(a=1,b="baz") { bar(); spam(c=10) { grok(@b); } } Now, the key issue I have is to avoid reparsing the entire file when new input arrives. For example: foo() { bar(); This is valid input for my grammar. When new input arrives, such as: spam() { I want to re-use the previously parsed items (and AST, and code generated by the AST via StringTemplate), while just adding the new tokens to the parser, and new nodes to the tree, and new code to my VM. Basically, I'd like to know if it is possible to generate an AST from a parser, then add more input to that parser (and have it possibly fail parsing), then add more to the AST. It is impractical to have to re-parse the entire input (and re-create the entire AST) when new input arrives. Full transcripts can be thousands of lines long. There is a way to do this, but I would like to see if I can first leverage ANTLR. I've used ANTLR with great success (and a lot of bleary eyes), but this is a new application of it for me and I am unsure of the feasibility. Thanks in advance, Christian. PS. Apologies if this is a dupe. From jimi at temporal-wave.com Sat Jan 16 19:14:21 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Sat, 16 Jan 2010 19:14:21 -0800 Subject: [antlr-interest] Fortran lexer problem In-Reply-To: References: <6482427cf4e64b4f8ad286ed88e1f2c4@temporal-wave.com> Message-ID: You need gated predicate. Read the getting stared articles in the wiki. Jim On Jan 15, 2010, at 10:56, ??????? ?????? wrote: > I have the following term > > LINE_COMMENT > : ({blabla}? ('c' | 'C' | '*') | '!' ) ~('\n')* > { > $channel = HIDDEN; > } > ; > > but it only pasts the following code at the end: > > > switch (alt31) > { > case 1: > { > if ( !((blabla)) ) > { > CONSTRUCTEX(); > EXCEPTION->type = > ANTLR3_FAILED_PREDICATE_EXCEPTION; > EXCEPTION->message = (void *)"blabla"; > EXCEPTION->ruleName = (void > *)"LINE_COMMENT"; > } > > > if "blabla" is false, an error is occured... but it's not right. > > ??????? ?????? ?????(?) ? > ????? ?????? Fri, 15 Jan 2010 21:27:22 +0300: > >> Excuse me, but how can I specify this condition (is it a first >> symbol and >> symbol='c')? >> Could you send me a piece of lexer grammar? >> >> >> Jim Idle ?????(?) ? ????? >> ?????? Fri, 15 Jan 2010 >> 21:20:46 +0300: >> >>> I think Fortran comments that start with C have to have the C in >>> character position 0 (or 1 in Fortran I guess ;-). So your comment >>> rule >>> can be predicated by checking for line position 0 in ANTLR terms. >>> >>> Jim >>> >>>> -----Original Message----- >>>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- >>>> bounces at antlr.org] On Behalf Of ??????? ?????? >>>> Sent: Friday, January 15, 2010 9:51 AM >>>> To: antlr-interest at antlr.org >>>> Subject: [antlr-interest] Fortran lexer problem >>>> >>>> Good day, >>>> >>>> I want to add comments of Fortran 77: >>>> >>>> "c xxxxx"; >>>> First symbol in column is 'c' - it means that the following line >>>> is a >>>> line >>>> of comment. >>>> >>>> but I also have NAME token, that will conflict with such COMMENT >>>> rule. >>>> ('c' can be a name). >>>> >>>> Is it possible to select rule by my own predicate? Are there any >>>> other >>>> more clear solvings of >>>> this problem? >>>> >>>> >>>> -- >>>> Best regards, >>>> Michael >>>> >>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest >>>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- >>>> email-address >>> >>> >>> >>> >>> List: http://www.antlr.org/mailman/listinfo/antlr-interest >>> Unsubscribe: >>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address >>> >>> __________ Information from ESET Smart Security, version of virus >>> signature database 4775 (20100115) __________ >>> >>> The message was checked by ESET Smart Security. >>> >>> http://www.esetnod32.ru >>> >>> >>> >> >> > > > -- > Best regards, > Michael From hikemike at gmail.com Sun Jan 17 13:04:35 2010 From: hikemike at gmail.com (Michael C. Starkie) Date: Sun, 17 Jan 2010 16:04:35 -0500 Subject: [antlr-interest] Syntactic Predicates for matching literals within char sequences? Message-ID: <5a5669421001171304va494785n1ee81b35bb153785@mail.gmail.com> Hi, I'm new to Antlr and I'm trying to match the string literal 'DATA_IN' which appears multiple times in a sequence of ASCII chars. However, the parser get's confused when it encounters strings like 'DAP' or 'DA': mismatched character 'P' expecting 'T' lexer: DATA_IN : 'DATA_IN'; ANY_CHAR : '\u0002'..'\u007F'; parser: rule: line+ ; line: data_in check; data_in_check options { backtrack=true; } : data_in | any_char; data_in : DATA_IN any_char : ANY_CHAR; Mike From kferrio at gmail.com Sun Jan 17 13:10:59 2010 From: kferrio at gmail.com (kferrio at gmail.com) Date: Sun, 17 Jan 2010 21:10:59 +0000 Subject: [antlr-interest] Fortran lexer problem In-Reply-To: References: <6482427cf4e64b4f8ad286ed88e1f2c4@temporal-wave.com> Message-ID: <1278375498-1263762662-cardhu_decombobulator_blackberry.rim.net-263095628-@bda428.bisx.prod.on.blackberry> Michael... I feel pity for you if have to parse F77. You're going to run into a few problems harder to solve/avoid than this. So if you're just going to discard comments anyway... I suggest you prefilter your input with a tool like 'sed' to strip fixed format comments. Then you can get on with quirky things like Fortran edit descriptors. :) Kyle Sent from my Verizon Wireless BlackBerry -----Original Message----- From: ??????? ?????? Date: Fri, 15 Jan 2010 21:56:21 To: ??????? ??????; Jim Idle; antlr-interest at antlr.org Subject: Re: [antlr-interest] Fortran lexer problem I have the following term LINE_COMMENT : ({blabla}? ('c' | 'C' | '*') | '!' ) ~('\n')* { $channel = HIDDEN; } ; but it only pasts the following code at the end: switch (alt31) { case 1: { if ( !((blabla)) ) { CONSTRUCTEX(); EXCEPTION->type = ANTLR3_FAILED_PREDICATE_EXCEPTION; EXCEPTION->message = (void *)"blabla"; EXCEPTION->ruleName = (void *)"LINE_COMMENT"; } if "blabla" is false, an error is occured... but it's not right. ??????? ?????? ?????(?) ? ????? ?????? Fri, 15 Jan 2010 21:27:22 +0300: > Excuse me, but how can I specify this condition (is it a first symbol and > symbol='c')? > Could you send me a piece of lexer grammar? > > > Jim Idle ?????(?) ? ????? ?????? Fri, 15 Jan > 2010 > 21:20:46 +0300: > >> I think Fortran comments that start with C have to have the C in >> character position 0 (or 1 in Fortran I guess ;-). So your comment rule >> can be predicated by checking for line position 0 in ANTLR terms. >> >> Jim >> >>> -----Original Message----- >>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- >>> bounces at antlr.org] On Behalf Of ??????? ?????? >>> Sent: Friday, January 15, 2010 9:51 AM >>> To: antlr-interest at antlr.org >>> Subject: [antlr-interest] Fortran lexer problem >>> >>> Good day, >>> >>> I want to add comments of Fortran 77: >>> >>> "c xxxxx"; >>> First symbol in column is 'c' - it means that the following line is a >>> line >>> of comment. >>> >>> but I also have NAME token, that will conflict with such COMMENT rule. >>> ('c' can be a name). >>> >>> Is it possible to select rule by my own predicate? Are there any other >>> more clear solvings of >>> this problem? >>> >>> >>> -- >>> Best regards, >>> Michael >>> >>> List: http://www.antlr.org/mailman/listinfo/antlr-interest >>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- >>> email-address >> >> >> >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> Unsubscribe: >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address >> >> __________ Information from ESET Smart Security, version of virus >> signature database 4775 (20100115) __________ >> >> The message was checked by ESET Smart Security. >> >> http://www.esetnod32.ru >> >> >> > > -- Best regards, Michael List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From yurushkin at rambler.ru Sun Jan 17 13:17:29 2010 From: yurushkin at rambler.ru (=?utf-8?B?0K7RgNGD0YjQutC40L0g0JzQuNGF0LDQuNC7?=) Date: Mon, 18 Jan 2010 00:17:29 +0300 Subject: [antlr-interest] Fortran lexer problem In-Reply-To: <1278375498-1263762662-cardhu_decombobulator_blackberry.rim.net-263095628-@bda428.bisx.prod.on.blackberry> References: <6482427cf4e64b4f8ad286ed88e1f2c4@temporal-wave.com> <1278375498-1263762662-cardhu_decombobulator_blackberry.rim.net-263095628-@bda428.bisx.prod.on.blackberry> Message-ID: Thank you. I have decided to follow your question :) It was interesting for me to find more 'clear' way. ?????(?) ? ????? ?????? Mon, 18 Jan 2010 00:10:59 +0300: > Michael... I feel pity for you if have to parse F77. You're going to > run into a few problems harder to solve/avoid than this. So if you're > just going to discard comments anyway... I suggest you prefilter your > input with a tool like 'sed' to strip fixed format comments. Then you > can get on with quirky things like Fortran edit descriptors. :) > > Kyle > > Sent from my Verizon Wireless BlackBerry > > -----Original Message----- > From: ??????? ?????? > Date: Fri, 15 Jan 2010 21:56:21 > To: ??????? ??????; Jim > Idle; > antlr-interest at antlr.org > Subject: Re: [antlr-interest] Fortran lexer problem > > I have the following term > > LINE_COMMENT > : ({blabla}? ('c' | 'C' | '*') | '!' ) ~('\n')* > { > $channel = HIDDEN; > } > ; > > but it only pasts the following code at the end: > > > switch (alt31) > { > case 1: > { > if ( !((blabla)) ) > { > CONSTRUCTEX(); > EXCEPTION->type = > ANTLR3_FAILED_PREDICATE_EXCEPTION; > EXCEPTION->message = (void *)"blabla"; > EXCEPTION->ruleName = (void *)"LINE_COMMENT"; > } > > > if "blabla" is false, an error is occured... but it's not right. > > ??????? ?????? ?????(?) ? ????? ?????? Fri, 15 Jan > 2010 21:27:22 +0300: > >> Excuse me, but how can I specify this condition (is it a first symbol >> and >> symbol='c')? >> Could you send me a piece of lexer grammar? >> >> >> Jim Idle ?????(?) ? ????? ?????? Fri, 15 Jan >> 2010 >> 21:20:46 +0300: >> >>> I think Fortran comments that start with C have to have the C in >>> character position 0 (or 1 in Fortran I guess ;-). So your comment rule >>> can be predicated by checking for line position 0 in ANTLR terms. >>> >>> Jim >>> >>>> -----Original Message----- >>>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- >>>> bounces at antlr.org] On Behalf Of ??????? ?????? >>>> Sent: Friday, January 15, 2010 9:51 AM >>>> To: antlr-interest at antlr.org >>>> Subject: [antlr-interest] Fortran lexer problem >>>> >>>> Good day, >>>> >>>> I want to add comments of Fortran 77: >>>> >>>> "c xxxxx"; >>>> First symbol in column is 'c' - it means that the following line is a >>>> line >>>> of comment. >>>> >>>> but I also have NAME token, that will conflict with such COMMENT rule. >>>> ('c' can be a name). >>>> >>>> Is it possible to select rule by my own predicate? Are there any other >>>> more clear solvings of >>>> this problem? >>>> >>>> >>>> -- >>>> Best regards, >>>> Michael >>>> >>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest >>>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- >>>> email-address >>> >>> >>> >>> >>> List: http://www.antlr.org/mailman/listinfo/antlr-interest >>> Unsubscribe: >>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address >>> >>> __________ Information from ESET Smart Security, version of virus >>> signature database 4775 (20100115) __________ >>> >>> The message was checked by ESET Smart Security. >>> >>> http://www.esetnod32.ru >>> >>> >>> >> >> > > -- Best regards, Michael From kferrio at gmail.com Sun Jan 17 13:20:14 2010 From: kferrio at gmail.com (kferrio at gmail.com) Date: Sun, 17 Jan 2010 21:20:14 +0000 Subject: [antlr-interest] Syntactic Predicates for matching literals withinchar sequences? Message-ID: <907896710-1263763215-cardhu_decombobulator_blackberry.rim.net-1973379806-@bda428.bisx.prod.on.blackberry> You probably want to make ANY_CHAR a lexer fragment so that it does not consume input except as part of a larger rule which calls it. Or maybe I missed your intent. Kyle ------Original Message------ From: Michael C. Starkie Sender: ANTLR To: antlr-interest at antlr.org Subject: [antlr-interest] Syntactic Predicates for matching literals withinchar sequences? Sent: Jan 17, 2010 2:04 PM Hi, I'm new to Antlr and I'm trying to match the string literal 'DATA_IN' which appears multiple times in a sequence of ASCII chars. However, the parser get's confused when it encounters strings like 'DAP' or 'DA': mismatched character 'P' expecting 'T' lexer: DATA_IN : 'DATA_IN'; ANY_CHAR : '\u0002'..'\u007F'; parser: rule: line+ ; line: data_in check; data_in_check options { backtrack=true; } : data_in | any_char; data_in : DATA_IN any_char : ANY_CHAR; Mike List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address Sent from my Verizon Wireless BlackBerry From frogery at voila.fr Mon Jan 18 00:02:45 2010 From: frogery at voila.fr (frogery at voila.fr) Date: Mon, 18 Jan 2010 09:02:45 +0100 (CET) Subject: [antlr-interest] Overriding the emit function to use custom tokens Message-ID: <19590761.1131651263801765450.JavaMail.www@wwinf4603> Jim, Indeed, I want to use the custom pointer defined in ANTLR3_COMMON_TOKEN_struct. I have done this: @init { double* pCustom = ANTLR3_MALLOC(sizeof(double)); *pCustom = 0; CUSTOM = (ANTLR3_UINT32)pCustom; } My problem is that I have not found any way to set the freeCustom pointer (that is the pointer to a function that knows how to free the custom structure when the token is destroyed). I have probably missed something but the only way I have found to set this freeCustom pointer was to override the emit function. Is there another way to do it? Thanks, Yann > Message du 15/01/10 ? 18h50 > De : "Jim Idle" > A : "antlr-interest at antlr.org" > Copie ? : > Objet : Re: [antlr-interest] Overriding the emit function to use custom tokens > > No, you have to override nextToken too it calls emit directly for performance reasons. > > However, no one really needs to do this. There is a user defined pointer built in to every token and a function pointer that is called when the token is released (if it is not NULL). So you can just add your custom token stuff there and rely on the default runtime. > > Jim > > > -----Original Message----- > > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > > bounces at antlr.org] On Behalf Of frogery at voila.fr > > Sent: Friday, January 15, 2010 5:53 AM > > To: antlr-interest at antlr.org > > Subject: [antlr-interest] Overriding the emit function to use custom > > tokens > > > > Hello, > > > > I wanted to create a custom token object, so I have seen in the FAQ > > that I had to "override" the lexer emit function. So I did that this > > way: > > > > ... > > pLexer = antlrLexerNew(pInput); > > pLexer->pLexer->emit = customEmit; > > ... > > > > but it was not working. > > > > The customEmit function was never called. So I have debugged and I > > think there is a bug in antlr3lexer.c. In the nextTokenStr function, > > shouldn't "emit(lexer)" be replaced by "lexer->emit(lexer);"? What do > > you think? > > > > Thanks, > > Yann > > > > ____________________________________________________ > > > > Vous n?avez pas encore adress? vos voeux ? Retrouvez nos cartes sur > > http://carte-de-voeux.voila.fr > > > > > > ____________________________________________________ > > > > Vous n?avez pas encore adress? vos voeux ? Retrouvez nos cartes sur > > http://carte-de-voeux.voila.fr > > > > > > > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > > email-address > > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > ____________________________________________________ Vous n?avez pas encore adress? vos voeux?? Retrouvez nos cartes sur http://carte-de-voeux.voila.fr From michnay at gmail.com Mon Jan 18 06:17:51 2010 From: michnay at gmail.com (=?ISO-8859-1?Q?Michnay_Bal=E1zs?=) Date: Mon, 18 Jan 2010 15:17:51 +0100 Subject: [antlr-interest] Rule ignored in CSS grammar Message-ID: Hi Guys, The attached grammar is supposed to parse CSS files. I used this as an initial version: http://www.antlr.org/grammar/1214945003224/csst3.g First I tried to add a functionality to prevent "_" chars for property names, so I created a new lexer rule "CSSPROPERTYNAME" to ensure this. The "declaration" rule has been updated accordingly. The funny thing is that now the "selector" rule fails to recognize tag selectors like: .class_selector img { property: value; ... } Since my update should only affect property names and not selectors, I really do not understand what the problem is. I tried to define lexer rules as both fragments and literal values, no luck. I used ANTLRWorks to debug this and have noticed that in the "selector" rule "selectorOperation" is ignored: selector : elem selectorOperation* attrib* pseudo? -> elem selectorOperation* attrib* pseudo* ; Any ideas? Thanks. -------------- next part -------------- A non-text attachment was scrubbed... Name: css.g Type: application/octet-stream Size: 2973 bytes Desc: not available Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20100118/51689444/attachment.obj From m.y.speyer at inter.nl.net Mon Jan 18 08:39:00 2010 From: m.y.speyer at inter.nl.net (Marc Speyer) Date: Mon, 18 Jan 2010 17:39:00 +0100 Subject: [antlr-interest] Tree pattern maching using the C# (was C) target In-Reply-To: <20100115125833.242280@gmx.net> References: <000901ca95d9$df6dd740$9e4985c0$@y.speyer@inter.nl.net> <20100115125833.242280@gmx.net> Message-ID: <002401ca985c$bd00f450$3702dcf0$@y.speyer@inter.nl.net> Hi Johannes, I tried the version that you mentioned by downloading it from antlr:/runtime/CSharp2 in the Fisheye code repository and then tried to compile it using VS2008. This didn't work because a file "TokenConstants.cs" was reported missing by VS2008 and gave me compilation errors. I managed to get a version from the CSharp3 repository and after making one change I could compile. I noticed that the Downup method is part of the Treefilter class which inherits from the TreeParser class. The grammar for the tree parser from the example has the following header: // START: header tree grammar DefRef; options { tokenVocab = Cymbol; ASTLabelType = CommonTree; filter = true; language=CSharp2; } @members { SymbolTable symtab; Scope currentScope; public DefRef(ITreeNodeStream input, SymbolTable symtab) : this(input) { this.symtab = symtab; currentScope = symtab.globals; } } // END: header Generating the tree parser gives DefRef.cs with the DefRef class declared as: public partial class DefRef : TreeParser Now I can cast this into the TreeFilter class but to test things quickly I changed the above line in the DefRef.cs into: public partial class DefRef : TreeFilter In the calling program I use: DefRef def = new DefRef(nodes, symtab); // use custom constructor def.Downup(t); // trigger symtab actions upon certain subtrees When I run this nothings happens whereas I have grammar rules and actions like: exitBlock : BLOCK { Console.WriteLine("locals: "+currentScope); currentScope = currentScope.getEnclosingScope(); // pop scope } ; I have not figured out yet why this doesn't work. The examples is a one-to-one port of the Java example of pattern 17 Symbol Table for Nested Scopes of the Language Implementation Patterns. Any idea? Thanks, Marc >-----Original Message----- >From: Johannes Luber [mailto:JALuber at gmx.de] >Sent: Friday, January 15, 2010 1:59 PM >To: Marc Speyer; antlr-interest at antlr.org >Subject: Re: [antlr-interest] Tree pattern maching using the C target > >> Hi all, >> >> I have a similar issue using the C# target. Using the Cymbol.g example of >> pattern 17 Symbol Table for Nested Scopes of the Language Implementation >> Patterns book I could not get it to work because there is now downup >> method. >> According to the documentation this method walks the AST code using >> ANTLR's >> built-in downup( ) strategy. >> >> Am I correct assuming that this has not been implemented yet for the C# >> target (as Jim implies in his response). Is it difficult to implement it >> myself? I guess it would involve implementing the tree pattern matching >> stuff. >> >> Marc > >You are correct - there is no official version yet, which implements tree >pattern matching. I haven't gotten around to the API changes yet (will work >on that next week), though I have checked in some untested changes. It >would be the easieast if you'd base your own code on that for now. > >Johannes > >> P.S. Hope this email files under the proper subject thread, and apologies >> in >> advance if it isn't (Just subscribed to the mailing list but I could not >> find out how to get previous posts from it) >> >> > Pattern matcher or normal tree walker? The pattern stuff is not >> implemented in the C target yet. >> > >> > Jim >> > >> >> -----Original Message----- >> >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- >> >> bounces at antlr.org] On Behalf Of Heiko Folkerts >> >> Sent: Thursday, January 14, 2010 5:01 AM >> >> To: antlr-interest at antlr.org >> >> Subject: [antlr-interest] Tree pattern maching using the C target >> >> >> >> Hi all, >> >> I wrote al litle tree pattern matcher for a specific validation we >need >> >> in our grammar. ANTLR and the C compiler compile it all well but there >> >> is now "downup" mehtod for running the matcher. Instead I only see our >> >> own rules in the generated parser. So, is the method to run when using >> >> a tree pattern macher in the C target different than ^"downup"? How to >> >> run the matcher? >> >> >> >> I tried to find an answer in the C examples but there was only a >> >> treeparser and no tree pattern matcher. >> >> >> >> Thx+ >> >> Heiko >> >> >> >> >> >> -- >> >> >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> Unsubscribe: >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address > >-- >GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! >Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 From jimi at temporal-wave.com Mon Jan 18 09:01:45 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Mon, 18 Jan 2010 09:01:45 -0800 Subject: [antlr-interest] Rule ignored in CSS grammar In-Reply-To: Message-ID: <3212d1f335ec784fa51b4f69cb159efc@temporal-wave.com> You might start with this one http://www.antlr.org/grammar/1240941192304/css21.g (CSS 2.1 grammar that I contributed) and just upgrade it to CSS 3, which is not much different to be honest. The main difficulties are properly lexing the input and the example you quote does not do the lexing correctly. Adding to the parsing constructs is trivial. I am not sure why you would try to prevent the use of '_' as that is part of the spec. However the way you are doing it will not work anyway because the lexer is context free and just produces the tokens it sees (it is not driven by the parser). Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Michnay Bal?zs > Sent: Monday, January 18, 2010 6:18 AM > To: antlr-interest at antlr.org > Subject: [antlr-interest] Rule ignored in CSS grammar > > Hi Guys, > > The attached grammar is supposed to parse CSS files. I used this as an > initial version: > > http://www.antlr.org/grammar/1214945003224/csst3.g > > First I tried to add a functionality to prevent "_" chars for property > names, so I created a new lexer rule "CSSPROPERTYNAME" to ensure this. > The "declaration" rule has been updated accordingly. The funny thing is > that now the "selector" rule fails to recognize tag selectors like: > > .class_selector img { > property: value; > ... > } > > Since my update should only affect property names and not selectors, I > really do not understand what the problem is. I tried to define lexer > rules as both fragments and literal values, no luck. > > I used ANTLRWorks to debug this and have noticed that in the "selector" > rule "selectorOperation" is ignored: > > selector > : elem selectorOperation* attrib* pseudo? -> elem selectorOperation* > attrib* pseudo* > ; > > Any ideas? > > Thanks. > No virus found in this incoming message. > Checked by AVG - www.avg.com > Version: 9.0.725 / Virus Database: 270.14.148/2629 - Release Date: > 01/17/10 11:35:00 > > From Jim.Mayer at xerox.com Mon Jan 18 10:38:00 2010 From: Jim.Mayer at xerox.com (Mayer, Jim) Date: Mon, 18 Jan 2010 10:38:00 -0800 Subject: [antlr-interest] ANTLRWorks user registration and firewalls Message-ID: <80EA5989D3149B42B9816C8BE2BADD230E709BA2@USA7061MS02.na.xerox.net> Hi, I'm having difficulty getting ANTLRWorks to start up at work. At home, the system works fine. A quick inspection of the code suggests that the problem is that ANTLRWorks tracks usage statistics and insists upon getting an "ID" from a site at antlr.org as part of its initial startup (this happens even if you ask it to not send information during the "Welcome to ANTLRWorks" dialog). Has anyone else run into this problem? I did some web searches and didn't see any. In addition, I am uncomfortable that the package collects usage statistics (even innocuous ones) without announcing the fact or requesting permission. I would prefer that ANTLRWorks use an "opt in" mechanism, and that if users decline to register that the package have no communication off box. Thanks. -- Jim Mayer From David.Grieve at Sun.COM Mon Jan 18 12:29:17 2010 From: David.Grieve at Sun.COM (David Grieve) Date: Mon, 18 Jan 2010 15:29:17 -0500 Subject: [antlr-interest] Detecting a space as a token Message-ID: <9C71029D-1ED3-4C32-9E08-7BA4C8C40B92@Sun.com> In CSS, a selector is (roughly) a sequence of simple selectors joined by a combinator. http://www.antlr.org/grammar/1240941192304/css21.g has the following rules which correspond to this. combinator : PLUS | GREATER | ; selector : simpleSelector (combinator simpleSelector)* ; The issue I'm having is how to handle the combinator which is a space in the selector rule. Specifically, I should be able to parse A .b as two simple selectors: A and .b. However, since whitespace is ignored, this is getting parsed as one selector. The following parses as desired: A *.b Using the universal selector as part of the second simple selector is a workaround that I shouldn't have to employ. How can I parse "A.b" such that the space is recognized as a combinator? Thanks in advance for any help! David Grieve Sun Microsystems, Inc. From jimi at temporal-wave.com Mon Jan 18 14:45:29 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Mon, 18 Jan 2010 14:45:29 -0800 Subject: [antlr-interest] Detecting a space as a token In-Reply-To: <9C71029D-1ED3-4C32-9E08-7BA4C8C40B92@Sun.com> Message-ID: <3cf75046cfa0ba49b72c4c110e903cd6@temporal-wave.com> All you need do is use a predicate at the DOT, which is where the esPred rule is. You can change the syntactic predicate to a semantic predicate and check for #, ., ( and : via input.LT() but can also look at the previous token even if off channel to make sure it is not a space: simpleSelector : elementName ({ mySemPred() }?=>elementSubsequent)* | ({ mySemPred() }?=>elementSubsequent)+ ; @parser:members { boolean mySemPred() { switch (input.LA(1)) { case DOT: // Only if no preceding spaces (but is that correct for CSS? // if ((TokenStream)input).get( input.index()-1 ).getType() != WS) { return true; } else {return false; } break; case HASH: case LBRACKET: case COLON: return true; default: return false; } } > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of David Grieve > Sent: Monday, January 18, 2010 12:29 PM > To: antlr-interest at antlr.org > Subject: [antlr-interest] Detecting a space as a token > > In CSS, a selector is (roughly) a sequence of simple selectors joined > by a combinator. http://www.antlr.org/grammar/1240941192304/css21.g has > the following rules which correspond to this. > > combinator > : PLUS > | GREATER > | > ; > > selector > : simpleSelector (combinator simpleSelector)* > ; > > The issue I'm having is how to handle the combinator which is a space > in the selector rule. Specifically, I should be able to parse > > A .b > > as two simple selectors: A and .b. However, since whitespace is > ignored, this is getting parsed as one selector. The following parses > as desired: > > A *.b > > Using the universal selector as part of the second simple selector is a > workaround that I shouldn't have to employ. > > How can I parse "A.b" such that the space is recognized as a > combinator? Thanks in advance for any help! > David Grieve > Sun Microsystems, Inc. > > > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From pcc482719 at gmail.com Tue Jan 19 08:47:52 2010 From: pcc482719 at gmail.com (Peter C. Chapin) Date: Tue, 19 Jan 2010 11:47:52 -0500 Subject: [antlr-interest] v3.2 C# runtime? Message-ID: <4B55E238.1080408@gmail.com> I'm looking for the ANTLR v3.2 C# runtime support assemblies. I must be missing something because I'm having no luck finding it. The page here http://www.antlr.org/download/CSharp does not include it. I thought, "Oh, okay... I'll download the source code and compile it myself." However the file antlr-3.2.tar.gz pointed at by the "ANTLR 3.2 source distribution" link on http://www.antlr.org/download.html seems to only contain the source for the Java runtime. I attempted a Google search but it only turned up the C# runtime for v3.1.3. Peter From parrt at antlr.org Tue Jan 19 12:30:20 2010 From: parrt at antlr.org (Terence Parr) Date: Tue, 19 Jan 2010 12:30:20 -0800 Subject: [antlr-interest] ANTLR v4 planning stages Message-ID: <6B4FBB03-3E1A-4DC9-9788-20787BF2A94F@antlr.org> hiya. I'm now ready to embark on ANTLR (and ANTLRWorks) code development after 2 years in book-writing mode. I've come to the conclusion that we need to completely rebuild ANTLR v3, yielding v4. After I finish, I'll update The definitive ANTLR reference book for v4. how this came about: I must reimplement ANTLR v3 in v3, just like I did recently for ST (yielding ST v4). Besides being untidy, important projects like eclipse cannot include ANTLR at the moment due to license restrictions on it's v2 dependency. After discussions with the other developers, I've come to the conclusion that it would be best to rewrite the tool itself from scratch. I'm talking about the tool itself here. The runtime should remain the same, although I hope to optimize the generated code quite a bit. here is the planning page: http://www.antlr.org/wiki/display/~admin/ANTLR+v4+plans no doubt there will be bug fix releases for v3 as we go along. While there was a huge discontinuity between v2 and v3, that was because of the completely new approach. v3 to v4 should be backward compatible or most grammars. The rest should only require a few tweaks. My goal is simply to reimplement existing functionality first and then consider a number of improvements (such as the cool new expression grammar notation). Consider v4.0 a giant re-factoring pass on the internals. Ter From parrt at antlr.org Tue Jan 19 12:30:50 2010 From: parrt at antlr.org (Terence Parr) Date: Tue, 19 Jan 2010 12:30:50 -0800 Subject: [antlr-interest] ANTLR v4 lexer thoughts Message-ID: <32CAC08E-830F-4738-8AEC-74CD5CA8C7C0@antlr.org> In the realm of future improvements, I'm thinking about changing the generate code for lexer grammars. My thoughts are here: http://www.antlr.org/wiki/display/~admin/2010/01/19/ANTLR+v4+lexers Ter From scott at javadude.com Tue Jan 19 12:49:59 2010 From: scott at javadude.com (Scott Stanchfield) Date: Tue, 19 Jan 2010 15:49:59 -0500 Subject: [antlr-interest] ANTLR v4 planning stages In-Reply-To: <6B4FBB03-3E1A-4DC9-9788-20787BF2A94F@antlr.org> References: <6B4FBB03-3E1A-4DC9-9788-20787BF2A94F@antlr.org> Message-ID: RE Language-agnostic actions - if you treat this as a strategy pattern (like I seem to recall you did in the antlr 2 code base) this could work really well. What would be really cool IMNSHO: grammar Foo; foo : xxxxxx {@doX(...); } ; fee : xxxxxx {@doY(...); } ; and the generators could generate a spec/interface/abstract class for the action methods, like in Java: public interface FooActionStrategy { void doX(...); void doY(...); } and generate setActionStrategy(FooActionStrategy x) {...} that would be used in the code. All that's needed is an implementation. If all of the action code were simple action-strategy calls, this should be generatable in pretty much any target language. (Of course I haven't given this much thought, but it feels pretty good OTTOMH) -- Scott ---------------------------------------- Scott Stanchfield http://javadude.com From scott at javadude.com Tue Jan 19 12:50:19 2010 From: scott at javadude.com (Scott Stanchfield) Date: Tue, 19 Jan 2010 15:50:19 -0500 Subject: [antlr-interest] ANTLR v4 planning stages In-Reply-To: <6B4FBB03-3E1A-4DC9-9788-20787BF2A94F@antlr.org> References: <6B4FBB03-3E1A-4DC9-9788-20787BF2A94F@antlr.org> Message-ID: A little thing to add to the todo list if possible: I've been looking into debugging support in eclipse. When generating code, can you add in source-grammar-line/col-matchup comments a bit more often? in particular, having them appear just before any action code that's dropped into the generated code would be cool. Even better: if the comments could also appear before/after attribute expansion that would help as well. My goal is to be able to use the target-language debugger and map the current code position back to the grammar. This allows walking the grammar while being able to use all of the features of the target-language debugger (like inspecting variables and such). I know how to set this up for Java (I did it in ANTLR 2 using Java SMAPs and it worked well), and I suspect other target languages could do something similar with a bit more information. BTW: +1 for $FIRST/$FOLLOW! -- Scott ---------------------------------------- Scott Stanchfield http://javadude.com From scott at javadude.com Tue Jan 19 12:52:34 2010 From: scott at javadude.com (Scott Stanchfield) Date: Tue, 19 Jan 2010 15:52:34 -0500 Subject: [antlr-interest] ANTLR v4 planning stages In-Reply-To: References: <6B4FBB03-3E1A-4DC9-9788-20787BF2A94F@antlr.org> Message-ID: oh - just noticed the language-agnostic symbol-table management - might be able to do something similar using a strategy pattern. Perhaps use something like IDL (semi-gack) to specify scopes? -- Scott ---------------------------------------- Scott Stanchfield http://javadude.com From parrt at cs.usfca.edu Tue Jan 19 12:55:08 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Tue, 19 Jan 2010 12:55:08 -0800 Subject: [antlr-interest] ANTLR v4 planning stages In-Reply-To: References: <6B4FBB03-3E1A-4DC9-9788-20787BF2A94F@antlr.org> Message-ID: <9FBCAA66-66F1-4BE0-A482-62FBA7268FC4@cs.usfca.edu> yeah, I was wondering how we would integrate a generic language (NIL? neutral imperative language? chuckle) with the surrounding code in whatever language. Named method calls like this could work well. Ter On Jan 19, 2010, at 12:49 PM, Scott Stanchfield wrote: > RE Language-agnostic actions - if you treat this as a strategy pattern > (like I seem to recall you did in the antlr 2 code base) this could > work really well. What would be really cool IMNSHO: > > grammar Foo; > foo : xxxxxx {@doX(...); } ; > fee : xxxxxx {@doY(...); } ; > > and the generators could generate a spec/interface/abstract class for > the action methods, like in Java: > > public interface FooActionStrategy { > void doX(...); > void doY(...); > } > > and generate > > setActionStrategy(FooActionStrategy x) {...} > > that would be used in the code. All that's needed is an implementation. > > If all of the action code were simple action-strategy calls, this > should be generatable in pretty much any target language. (Of course I > haven't given this much thought, but it feels pretty good OTTOMH) > > -- Scott > > ---------------------------------------- > Scott Stanchfield > http://javadude.com > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From parrt at antlr.org Tue Jan 19 12:56:31 2010 From: parrt at antlr.org (Terence Parr) Date: Tue, 19 Jan 2010 12:56:31 -0800 Subject: [antlr-interest] ANTLR v4 planning stages In-Reply-To: References: <6B4FBB03-3E1A-4DC9-9788-20787BF2A94F@antlr.org> Message-ID: On Jan 19, 2010, at 12:50 PM, Scott Stanchfield wrote: > A little thing to add to the todo list if possible: > > I've been looking into debugging support in eclipse. When generating > code, can you add in source-grammar-line/col-matchup comments a bit > more often? in particular, having them appear just before any action > code that's dropped into the generated code would be cool. yeah, easy to do > Even better: if the comments could also appear before/after attribute > expansion that would help as well. that could work although it might cloud the output a little bit. > My goal is to be able to use the target-language debugger and map the > current code position back to the grammar. This allows walking the > grammar while being able to use all of the features of the > target-language debugger (like inspecting variables and such). yeah, it's amazing how much those comments help even as they are. > I know how to set this up for Java (I did it in ANTLR 2 using Java > SMAPs and it worked well), and I suspect other target languages could > do something similar with a bit more information. > > BTW: +1 for $FIRST/$FOLLOW! yep, long overdue T From scott at javadude.com Tue Jan 19 13:03:05 2010 From: scott at javadude.com (Scott Stanchfield) Date: Tue, 19 Jan 2010 16:03:05 -0500 Subject: [antlr-interest] ANTLR v4 planning stages In-Reply-To: <9FBCAA66-66F1-4BE0-A482-62FBA7268FC4@cs.usfca.edu> References: <6B4FBB03-3E1A-4DC9-9788-20787BF2A94F@antlr.org> <9FBCAA66-66F1-4BE0-A482-62FBA7268FC4@cs.usfca.edu> Message-ID: NIL ;) Ahhh, gotta love the TLAs... If you keep the actions to strictly method calls passing attribute-expressions as values (and don't allow anything else) I'd think it would keep things simple for use and code generation. I assume you'd still allow target-language-specific actions, too, eh? Perhaps an option to specify NIL actions (wow - that might be a confusing name ;) or target-language actions - might be best not to mix 'em... -- Scott ---------------------------------------- Scott Stanchfield http://javadude.com From parrt at cs.usfca.edu Tue Jan 19 13:07:04 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Tue, 19 Jan 2010 13:07:04 -0800 Subject: [antlr-interest] ANTLR v4 planning stages In-Reply-To: References: <6B4FBB03-3E1A-4DC9-9788-20787BF2A94F@antlr.org> <9FBCAA66-66F1-4BE0-A482-62FBA7268FC4@cs.usfca.edu> Message-ID: <455CF418-6925-460F-A6AB-875F40DD6783@cs.usfca.edu> On Jan 19, 2010, at 1:03 PM, Scott Stanchfield wrote: > NIL ;) Ahhh, gotta love the TLAs... > > If you keep the actions to strictly method calls passing > attribute-expressions as values (and don't allow anything else) I'd > think it would keep things simple for use and code generation. i was thinking something like arbitrary code in some simple imperative language and then any call to @foo() or whatever would call foo in the target language. > I assume you'd still allow target-language-specific actions, too, eh? > Perhaps an option to specify NIL actions (wow - that might be a > confusing name ;) or target-language actions - might be best not to > mix 'em... we'd use language=Java for the target language as we do now and then add perhaps actions=NIL to specify what the actions look like. The default would be actions=language. Perhaps ALE or AIL=ANTLR imperative language? :) T From scott at javadude.com Tue Jan 19 13:19:22 2010 From: scott at javadude.com (Scott Stanchfield) Date: Tue, 19 Jan 2010 16:19:22 -0500 Subject: [antlr-interest] ANTLR v4 planning stages In-Reply-To: <455CF418-6925-460F-A6AB-875F40DD6783@cs.usfca.edu> References: <6B4FBB03-3E1A-4DC9-9788-20787BF2A94F@antlr.org> <9FBCAA66-66F1-4BE0-A482-62FBA7268FC4@cs.usfca.edu> <455CF418-6925-460F-A6AB-875F40DD6783@cs.usfca.edu> Message-ID: >> If you keep the actions to strictly method calls passing >> attribute-expressions as values (and don't allow anything else) I'd >> think it would keep things simple for use and code generation. > > i was thinking something like ?arbitrary code in some simple imperative language and then any call to @foo() or whatever would call foo in the target language. The thing I'd worry about would be feature creep in that language. Everyone would want "just one more feature" so it could better support their target language. You'd need to nail down that simple language so the generators for it could be written - if any new features were added all generators would be hit. If you kept it to simple method calls, they could do whatever logic they want inside the called method. This would force them to keep the grammar cleaner as well, actions just being calls to the strategy. Anyway, that's my 3c. I know you like writing languages ;) but my recommendation would be keep it simple and small, and anything more complex can be done inside the called methods. Chew on it a bit and see if anything interesting gets spit up or swallowed... >> I assume you'd still allow target-language-specific actions, too, eh? >> Perhaps an option to specify NIL actions (wow - that might be a >> confusing name ;) or target-language actions - might be best not to >> mix 'em... > > we'd use language=Java for the target language as we do now and then add perhaps actions=NIL to specify what the actions look like. The default would be actions=language. Cool - best of both worlds. > Perhaps ALE or AIL=ANTLR imperative language? :) Not bad... (though "AIL" conjures images of sickness) ANTLRScript? (shudder) -- Scott From parrt at cs.usfca.edu Tue Jan 19 16:33:20 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Tue, 19 Jan 2010 16:33:20 -0800 Subject: [antlr-interest] ANTLR v4 planning stages In-Reply-To: References: <6B4FBB03-3E1A-4DC9-9788-20787BF2A94F@antlr.org> <9FBCAA66-66F1-4BE0-A482-62FBA7268FC4@cs.usfca.edu> <455CF418-6925-460F-A6AB-875F40DD6783@cs.usfca.edu> Message-ID: <8928CE11-E3D6-4647-81F9-66F810759FE1@cs.usfca.edu> On Jan 19, 2010, at 1:19 PM, Scott Stanchfield wrote: >> i was thinking something like arbitrary code in some simple imperative language and then any call to @foo() or whatever would call foo in the target language. > > The thing I'd worry about would be feature creep in that language. > Everyone would want "just one more feature" so it could better support > their target language. You'd need to nail down that simple language so > the generators for it could be written - if any new features were > added all generators would be hit. yeah, a real danger. > ANTLRScript? (shudder) i like! good idea. Ter From JALuber at gmx.de Tue Jan 19 16:42:02 2010 From: JALuber at gmx.de (Johannes Luber) Date: Wed, 20 Jan 2010 01:42:02 +0100 Subject: [antlr-interest] Tree pattern maching using the C# (was C) target In-Reply-To: <002401ca985c$bd00f450$3702dcf0$@y.speyer@inter.nl.net> References: <000901ca95d9$df6dd740$9e4985c0$@y.speyer@inter.nl.net> <20100115125833.242280@gmx.net> <002401ca985c$bd00f450$3702dcf0$@y.speyer@inter.nl.net> Message-ID: <20100120004202.274260@gmx.net> > Hi Johannes, > > I tried the version that you mentioned by downloading it from > antlr:/runtime/CSharp2 in the Fisheye code repository and then tried to > compile it using VS2008. This didn't work because a file > "TokenConstants.cs" > was reported missing by VS2008 and gave me compilation errors. I managed > to > get a version from the CSharp3 repository and after making one change I > could compile. Oops - I thought that I had checked in that file already. Can you send both TokenConstants.cs (for comparing with my own version) and the modified grammar file to the list? I'm not sure where the error can be as I lifted more than a few file from the CSharp3 target. Sam, can you check if the grammar works with CSharp3 target? It would be helpful to narrow down the cause. Johannes > I noticed that the Downup method is part of the Treefilter > class which inherits from the TreeParser class. The grammar for the tree > parser from the example has the following header: > > // START: header > tree grammar DefRef; > options { > tokenVocab = Cymbol; > ASTLabelType = CommonTree; > filter = true; > language=CSharp2; > } > @members { > SymbolTable symtab; > Scope currentScope; > public DefRef(ITreeNodeStream input, SymbolTable symtab) > : this(input) > { > this.symtab = symtab; > currentScope = symtab.globals; > } > } > // END: header > > Generating the tree parser gives DefRef.cs with the DefRef class declared > as: > > public partial class DefRef : TreeParser > > > Now I can cast this into the TreeFilter class but to test things quickly I > changed the above line in the DefRef.cs into: > > public partial class DefRef : TreeFilter > > > In the calling program I use: > > DefRef def = new DefRef(nodes, symtab); // use custom constructor > def.Downup(t); // trigger symtab actions upon certain subtrees > > When I run this nothings happens whereas I have grammar rules and actions > like: > > exitBlock > : BLOCK > { > Console.WriteLine("locals: "+currentScope); > currentScope = currentScope.getEnclosingScope(); // pop scope > } > ; > > I have not figured out yet why this doesn't work. The examples is a > one-to-one port of the Java example of pattern 17 Symbol Table for Nested > Scopes of the Language Implementation Patterns. > > Any idea? > > Thanks, > > Marc > >-----Original Message----- > >From: Johannes Luber [mailto:JALuber at gmx.de] > >Sent: Friday, January 15, 2010 1:59 PM > >To: Marc Speyer; antlr-interest at antlr.org > >Subject: Re: [antlr-interest] Tree pattern maching using the C target > > > >> Hi all, > >> > >> I have a similar issue using the C# target. Using the Cymbol.g example > of > >> pattern 17 Symbol Table for Nested Scopes of the Language > Implementation > >> Patterns book I could not get it to work because there is now downup > >> method. > >> According to the documentation this method walks the AST code using > >> ANTLR's > >> built-in downup( ) strategy. > >> > >> Am I correct assuming that this has not been implemented yet for the C# > >> target (as Jim implies in his response). Is it difficult to implement > it > >> myself? I guess it would involve implementing the tree pattern matching > >> stuff. > >> > >> Marc > > > >You are correct - there is no official version yet, which implements tree > >pattern matching. I haven't gotten around to the API changes yet (will > work > >on that next week), though I have checked in some untested changes. It > >would be the easieast if you'd base your own code on that for now. > > > >Johannes > > > >> P.S. Hope this email files under the proper subject thread, and > apologies > >> in > >> advance if it isn't (Just subscribed to the mailing list but I could > not > >> find out how to get previous posts from it) > >> > >> > Pattern matcher or normal tree walker? The pattern stuff is not > >> implemented in the C target yet. > >> > > >> > Jim > >> > > >> >> -----Original Message----- > >> >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > >> >> bounces at antlr.org] On Behalf Of Heiko Folkerts > >> >> Sent: Thursday, January 14, 2010 5:01 AM > >> >> To: antlr-interest at antlr.org > >> >> Subject: [antlr-interest] Tree pattern maching using the C target > >> >> > >> >> Hi all, > >> >> I wrote al litle tree pattern matcher for a specific validation we > >need > >> >> in our grammar. ANTLR and the C compiler compile it all well but > there > >> >> is now "downup" mehtod for running the matcher. Instead I only see > our > >> >> own rules in the generated parser. So, is the method to run when > using > >> >> a tree pattern macher in the C target different than ^"downup"? How > to > >> >> run the matcher? > >> >> > >> >> I tried to find an answer in the C examples but there was only a > >> >> treeparser and no tree pattern matcher. > >> >> > >> >> Thx+ > >> >> Heiko > >> >> > >> >> > >> >> -- > >> > >> > >> > >> List: http://www.antlr.org/mailman/listinfo/antlr-interest > >> Unsubscribe: > >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > > >-- > >GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! > >Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: > http://www.antlr.org/mailman/options/antlr-interest/your-email-address -- Jetzt kostenlos herunterladen: Internet Explorer 8 und Mozilla Firefox 3.5 - sicherer, schneller und einfacher! http://portal.gmx.net/de/go/atbrowser From JALuber at gmx.de Tue Jan 19 16:46:34 2010 From: JALuber at gmx.de (Johannes Luber) Date: Wed, 20 Jan 2010 01:46:34 +0100 Subject: [antlr-interest] v3.2 C# runtime? In-Reply-To: <4B55E238.1080408@gmail.com> References: <4B55E238.1080408@gmail.com> Message-ID: <20100120004634.274280@gmx.net> > I'm looking for the ANTLR v3.2 C# runtime support assemblies. I must be > missing something because I'm having no luck finding it. The page here > > http://www.antlr.org/download/CSharp > > does not include it. I thought, "Oh, okay... I'll download the source > code and compile it myself." However the file antlr-3.2.tar.gz pointed > at by the "ANTLR 3.2 source distribution" link on > http://www.antlr.org/download.html seems to only contain the source for > the Java runtime. I attempted a Google search but it only turned up the > C# runtime for v3.1.3. > > Peter I'm still working on the 3.2 version. Unless you need the tree pattern matching feature you can stick to the latest version. Otherwise you'd need the repository version, where at least one missing file and a bug have been reported already. Johannes -- Jetzt kostenlos herunterladen: Internet Explorer 8 und Mozilla Firefox 3.5 - sicherer, schneller und einfacher! http://portal.gmx.net/de/go/atbrowser From sharwell at pixelminegames.com Tue Jan 19 17:30:43 2010 From: sharwell at pixelminegames.com (Sam Harwell) Date: Tue, 19 Jan 2010 19:30:43 -0600 Subject: [antlr-interest] ANTLR v4 planning stages References: <6B4FBB03-3E1A-4DC9-9788-20787BF2A94F@antlr.org> Message-ID: Would it work for you if this information was placed in an xml file next to the generated code? Sam -----Original Message----- From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Scott Stanchfield Sent: Tuesday, January 19, 2010 2:50 PM To: Terence Parr Cc: antlr-interest at antlr.org interest Subject: Re: [antlr-interest] ANTLR v4 planning stages A little thing to add to the todo list if possible: I've been looking into debugging support in eclipse. When generating code, can you add in source-grammar-line/col-matchup comments a bit more often? in particular, having them appear just before any action code that's dropped into the generated code would be cool. Even better: if the comments could also appear before/after attribute expansion that would help as well. My goal is to be able to use the target-language debugger and map the current code position back to the grammar. This allows walking the grammar while being able to use all of the features of the target-language debugger (like inspecting variables and such). I know how to set this up for Java (I did it in ANTLR 2 using Java SMAPs and it worked well), and I suspect other target languages could do something similar with a bit more information. BTW: +1 for $FIRST/$FOLLOW! -- Scott ---------------------------------------- Scott Stanchfield http://javadude.com List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From sharwell at pixelminegames.com Tue Jan 19 17:31:16 2010 From: sharwell at pixelminegames.com (Sam Harwell) Date: Tue, 19 Jan 2010 19:31:16 -0600 Subject: [antlr-interest] Expression parsing ideas for ANTLR v4 Message-ID: Several expression parsers are limited to handling the binary operator portion of the expression. In addition to the obvious limitations, it poses an additional problem for languages like C++ where the assignment operators are split (in precedence) from the rest of the binary operators by the ternary operator (?:). My most complicated production ANTLR grammar (parses the UnrealScript language) currently uses a completely new expression parser that offers a great deal more flexibility than the previous approaches I tried. I don't think it's the end-all solution for integrating expression parsing into ANTLR for v4, but I believe it's a worthwhile example to show what's possible. Here are some pros and cons of the implementation: Pros: * The source code declaring the operator precedence and associativity is very clean (see reference to UnrealScriptParserHelper.cs below) * Very fast execution * Supports a great deal more operations than simply binary operators * Supports operator precedence and associativity in groups * Directly supports changing the token type during AST generation - for example if the token '-' is named MINUS, you could produce an AST with AST_SUBTRACT when it appears as a binary operator and AST_NEGATE when it appears as a unary operator. Cons: * Not currently integrated into the ANTLR language (executes in code) * No compile-time detection of ambiguous operator rules * Not implemented as fully as is possible General idea: Parse every component of an expression into a list - this includes all operators and "atoms". The list is then passed to a "precedence processor" to produce a tree for that expression. Operator categories: This parser was built with the following categories in mind, but the grouping operators are not implemented at this point. With this as a starting place, it's clear how the list might be expanded in the future: * Unary operator: either prefix or postfix * Binary operator * Ternary operator * Grouping operator: for example, the ( and ) in (expression) * Postfix grouping operator: for example, the ( and ) in methodName(args) or the [ and ] in var[index]. * Prefix grouping operator: for example, the ( and ) in (TargetType)objectToCast. Attached is: * UnrealScriptParserHelper.cs: The complete code for declaring a working precedence parser for UnrealScript. * Antlr.Runtime.Expressions.zip: The current implementation of this feature. I'm very interested in any feedback y'all may have on this. Thank you, Sam Harwell -------------- next part -------------- A non-text attachment was scrubbed... Name: UnrealScriptParserHelper.cs Type: application/octet-stream Size: 5227 bytes Desc: UnrealScriptParserHelper.cs Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20100119/97cae7d9/attachment.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: Antlr.Runtime.Expressions.zip Type: application/x-zip-compressed Size: 6152 bytes Desc: Antlr.Runtime.Expressions.zip Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20100119/97cae7d9/attachment.bin From pcc482719 at gmail.com Tue Jan 19 18:48:07 2010 From: pcc482719 at gmail.com (Peter C. Chapin) Date: Tue, 19 Jan 2010 21:48:07 -0500 Subject: [antlr-interest] v3.2 C# runtime? In-Reply-To: <20100120004634.274280@gmx.net> References: <4B55E238.1080408@gmail.com> <20100120004634.274280@gmx.net> Message-ID: <4B566EE7.4040601@gmail.com> On 2010-01-19 19:46, Johannes Luber wrote: > I'm still working on the 3.2 version. Unless you need the tree pattern > matching feature you can stick to the latest version. Otherwise you'd > need the repository version, where at least one missing file and a bug > have been reported already. Okay, that's good to know! At least I'm not crazy for not finding it on the regular download page. Anyway thanks for the update. I'll continue with 3.1.3 for now. Peter From gustaf.j at gmail.com Wed Jan 20 01:49:14 2010 From: gustaf.j at gmail.com (Gustaf Johansson) Date: Wed, 20 Jan 2010 10:49:14 +0100 Subject: [antlr-interest] Implicit imports Message-ID: <5f59a7211001200149i7c7ad186k50a1589d4862906b@mail.gmail.com> Hi, I have a grammar in which there can be implicit imports of a few definitions example: module A { enum myEnumA { A1, A2, A3 } } module B { import module A; function myFuncB (int, myEnumA) { ... } } module Prog { import B; myFuncB (1, A2); * } *Here A2 is implicitly known to be of type myEnumA, since the definition of myFuncB is in B and B imports A. The problem i have is that my parser reports A2 as unknown. I have not come up with a good and simple solution to this. I have been thinking along the lines of: Check definition of myFuncB and if it takes a enum as argument, check the local module's imports for the definition of that enum. Any help is really appreciated. Best Regards Gustaf From linlin.xie at siemens.com Wed Jan 20 03:31:44 2010 From: linlin.xie at siemens.com (Xie, Linlin) Date: Wed, 20 Jan 2010 12:31:44 +0100 Subject: [antlr-interest] UTF-8 input? Message-ID: <79118B9FE8CE8E49B0D71964A79CB647033CA2D5@dekomplm002.net.plm.eds.com> Can anyone tell me if antlr3.1.3 generated parser works with UTF-8 input? If it does, how should I configure in the grammar? I noticed there are two macros ANTLR3_INLINE_INPUT_ASCII and ANTLR3_INLINE_INPUT_UTF16, but no UTF-8 one. Many thanks! Linlin From arne.schroeder at gmail.com Wed Jan 20 03:58:27 2010 From: arne.schroeder at gmail.com (=?ISO-8859-1?Q?Arne_Schr=F6der?=) Date: Wed, 20 Jan 2010 12:58:27 +0100 Subject: [antlr-interest] Missing error when tokens are left to parse In-Reply-To: <4b504d15.2508c00a.6ff6.4932SMTPIN_ADDED@mx.google.com> References: <1ec078df1001150127r753cb368p3e70c1039d59101d@mail.gmail.com> <4b504d15.2508c00a.6ff6.4932SMTPIN_ADDED@mx.google.com> Message-ID: Thank you for your help. It now works insofar as the parser now throws an error-message when not encountering EOF after all rules are finished. On Fri, Jan 15, 2010 at 12:10 PM, Gavin Lambert wrote: > At 22:43 15/01/2010, Arne Schr?der wrote: > >file : section1 section2? > > ; > [...] > > >If I now try to parse "Section1 bla()) Section2" something similar > >happens: > >It parses up to the second ")" and then decides to skip the rest. > >And I definitely do not want the second ")" to be there i.e. want > >it to throw a recognition-error and recover itself. > > Try adding EOF to the end of your top-level rule. Without that, ANTLR > assumes that it is not required to parse all the input, so if it > successfully parses a section1 it will just decide that the section2 has > been omitted (since it's optional). > > From JALuber at gmx.de Wed Jan 20 04:08:14 2010 From: JALuber at gmx.de (Johannes Luber) Date: Wed, 20 Jan 2010 13:08:14 +0100 Subject: [antlr-interest] Expression parsing ideas for ANTLR v4 In-Reply-To: References: Message-ID: <20100120120814.15250@gmx.net> > Several expression parsers are limited to handling the binary operator > portion of the expression. In addition to the obvious limitations, it > poses an additional problem for languages like C++ where the assignment > operators are split (in precedence) from the rest of the binary > operators by the ternary operator (?:). My most complicated production > ANTLR grammar (parses the UnrealScript language) currently uses a > completely new expression parser that offers a great deal more > flexibility than the previous approaches I tried. I don't think it's the > end-all solution for integrating expression parsing into ANTLR for v4, > but I believe it's a worthwhile example to show what's possible. Here > are some pros and cons of the implementation: > ... > > I'm very interested in any feedback y'all may have on this. > As a layman in expression parsing I don't feel qualified to comment on if your solution lacks certain features, but the way you define the operators looks clean to me. One knows immediately how operators work in a given language. The only not obvious thing is if the precedence is ascending or descending. I guess ascending from my knowledge of C#. BTW, which tokens are encoded as CATEQ and CAT2EQ? Johannes -- Preisknaller: GMX DSL Flatrate f?r nur 16,99 Euro/mtl.! http://portal.gmx.net/de/go/dsl02 From linlin.xie at siemens.com Wed Jan 20 04:23:33 2010 From: linlin.xie at siemens.com (Xie, Linlin) Date: Wed, 20 Jan 2010 13:23:33 +0100 Subject: [antlr-interest] FW: UTF-8 input? Message-ID: <79118B9FE8CE8E49B0D71964A79CB647033CA342@dekomplm002.net.plm.eds.com> Sorry, I mean the antlr generated C parser! Thanks! -----Original Message----- From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Xie, Linlin Sent: 20 January 2010 11:32 To: antlr-interest at antlr.org Subject: [antlr-interest] UTF-8 input? Can anyone tell me if antlr3.1.3 generated parser works with UTF-8 input? If it does, how should I configure in the grammar? I noticed there are two macros ANTLR3_INLINE_INPUT_ASCII and ANTLR3_INLINE_INPUT_UTF16, but no UTF-8 one. Many thanks! Linlin List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From sharwell at pixelminegames.com Wed Jan 20 06:09:08 2010 From: sharwell at pixelminegames.com (Sam Harwell) Date: Wed, 20 Jan 2010 08:09:08 -0600 Subject: [antlr-interest] Expression parsing ideas for ANTLR v4 References: <20100120120814.15250@gmx.net> Message-ID: UnrealScript uses $ and @ for two types of string concatenation. It also has $= and @= to match. -----Original Message----- From: Johannes Luber [mailto:JALuber at gmx.de] Sent: Wednesday, January 20, 2010 6:08 AM To: Sam Harwell; antlr-interest at antlr.org Subject: Re: [antlr-interest] Expression parsing ideas for ANTLR v4 > Several expression parsers are limited to handling the binary operator > portion of the expression. In addition to the obvious limitations, it > poses an additional problem for languages like C++ where the assignment > operators are split (in precedence) from the rest of the binary > operators by the ternary operator (?:). My most complicated production > ANTLR grammar (parses the UnrealScript language) currently uses a > completely new expression parser that offers a great deal more > flexibility than the previous approaches I tried. I don't think it's the > end-all solution for integrating expression parsing into ANTLR for v4, > but I believe it's a worthwhile example to show what's possible. Here > are some pros and cons of the implementation: > ... > > I'm very interested in any feedback y'all may have on this. > As a layman in expression parsing I don't feel qualified to comment on if your solution lacks certain features, but the way you define the operators looks clean to me. One knows immediately how operators work in a given language. The only not obvious thing is if the precedence is ascending or descending. I guess ascending from my knowledge of C#. BTW, which tokens are encoded as CATEQ and CAT2EQ? Johannes -- Preisknaller: GMX DSL Flatrate f?r nur 16,99 Euro/mtl.! http://portal.gmx.net/de/go/dsl02 From jimi at temporal-wave.com Wed Jan 20 08:30:47 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Wed, 20 Jan 2010 08:30:47 -0800 Subject: [antlr-interest] UTF-8 input? In-Reply-To: <79118B9FE8CE8E49B0D71964A79CB647033CA2D5@dekomplm002.net.plm.eds.com> Message-ID: You need to remember to state which target you are talking about. I have written a new universal input stream for the next version of the C runtime. It takes 8bit, 16 bit, UTF-8, UTF-16, UCS2, UTF32 and EBCDIC (code gen will change slightly to support this). It is not well tested right now but will be available as a snapshot 3.3 release shortly in the downloads page. In the meantime the easiest thing to do is to convert to UCS2 using the supplied converter in the current runtime. Though this will not work with surrogate pairs in UTF-16 though but most people do not need that. If you really need UTf-8 without conversion then it is easy enough to write, or you can just steal the code from my check in of the code in about 10 minutes. Note that while the streams work, I have not provided ANTLR3_STRING support for UTF-8 and so on yet and so getting $text from such a stream may or may not work, Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Xie, Linlin > Sent: Wednesday, January 20, 2010 3:32 AM > To: antlr-interest at antlr.org > Subject: [antlr-interest] UTF-8 input? > > Can anyone tell me if antlr3.1.3 generated parser works with UTF-8 > input? If it does, how should I configure in the grammar? I noticed > there are two macros ANTLR3_INLINE_INPUT_ASCII and > ANTLR3_INLINE_INPUT_UTF16, but no UTF-8 one. > > > > Many thanks! > > Linlin > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From aph at redhat.com Wed Jan 20 10:57:43 2010 From: aph at redhat.com (Andrew Haley) Date: Wed, 20 Jan 2010 18:57:43 +0000 Subject: [antlr-interest] java.g does not compile Message-ID: <4B575227.5050904@redhat.com> I just downloaded java.g from http://openjdk.java.net/projects/compiler-grammar/antlrworks/Java.g and ~ $ java -jar Downloads/antlr-3.2.jar java.g warning(209): java.g:1771:1: Multiple token rules can match input such as "'*'": STAR, STAREQ As a result, token(s) STAREQ were disabled for that input warning(209): java.g:1811:1: Multiple token rules can match input such as "'i'": IF, IMPLEMENTS, IMPORT, INSTANCEOF, INT, INTERFACE, IDENTIFIER ... error(208): java.g:1799:1: The following token definitions can never be matched because prior tokens match the same input: INTLITERAL,DOUBLELITERAL,LINE_COMMENT,ASSERT,BREAK,BYTE,CATCH,CHAR,CLASS,CONST,CONTINUE,DO,DOUBLE,ENUM,EXTENDS,FINALLY,FLOAT,FOR,IMPLEMENTS,IMPORT,INSTANCEOF,INT,INTERFACE,NEW,PRIVATE,PROTECTED,PUBLIC,STATIC,STRICTFP,SUPER,SWITCH,SYNCHRONIZED,THROW,THROWS,TRANSIENT,TRY,VOLATILE,TRUE,FALSE,NULL,DOT,ELLIPSIS,EQEQ,PLUS,SUB,SLASH,AMP,BAR,PLUSEQ,SUBEQ,STAREQ,SLASHEQ,AMPEQ,BAREQ,CARETEQ,PERCENTEQ,BANGEQ This seems very odd. Any ideas? It's claimed to be a grammar for ANTLR v3. Andrew. From jimi at temporal-wave.com Wed Jan 20 11:31:45 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Wed, 20 Jan 2010 11:31:45 -0800 Subject: [antlr-interest] java.g does not compile In-Reply-To: <4B575227.5050904@redhat.com> Message-ID: <3feca76b9c5b0547bf1582becc8ac1c3@temporal-wave.com> Souds like your machine is pretty slow and the conversion timeout default is therefore not engouh. Use the -Xconversiontimeout 30000 option to increase the elapsed time it will spend on it. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Andrew Haley > Sent: Wednesday, January 20, 2010 10:58 AM > To: antlr-interest at antlr.org > Subject: [antlr-interest] java.g does not compile > > I just downloaded java.g from > http://openjdk.java.net/projects/compiler-grammar/antlrworks/Java.g > and > > ~ $ java -jar Downloads/antlr-3.2.jar java.g > warning(209): java.g:1771:1: Multiple token rules can match input such > as "'*'": STAR, STAREQ > > As a result, token(s) STAREQ were disabled for that input > warning(209): java.g:1811:1: Multiple token rules can match input such > as "'i'": IF, IMPLEMENTS, IMPORT, INSTANCEOF, INT, INTERFACE, > IDENTIFIER > > ... > > error(208): java.g:1799:1: The following token definitions can never be > matched because prior tokens match the same input: > INTLITERAL,DOUBLELITERAL,LINE_COMMENT,ASSERT,BREAK,BYTE,CATCH,CHAR,CLAS > S,CONST,CONTINUE,DO,DOUBLE,ENUM,EXTENDS,FINALLY,FLOAT,FOR,IMPLEMENTS,IM > PORT,INSTANCEOF,INT,INTERFACE,NEW,PRIVATE,PROTECTED,PUBLIC,STATIC,STRIC > TFP,SUPER,SWITCH,SYNCHRONIZED,THROW,THROWS,TRANSIENT,TRY,VOLATILE,TRUE, > FALSE,NULL,DOT,ELLIPSIS,EQEQ,PLUS,SUB,SLASH,AMP,BAR,PLUSEQ,SUBEQ,STAREQ > ,SLASHEQ,AMPEQ,BAREQ,CARETEQ,PERCENTEQ,BANGEQ > > This seems very odd. Any ideas? It's claimed to be a grammar for > ANTLR v3. > > Andrew. > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From wclodius at los-alamos.net Wed Jan 20 19:21:30 2010 From: wclodius at los-alamos.net (William B. Clodius) Date: Wed, 20 Jan 2010 20:21:30 -0700 Subject: [antlr-interest] Implicit imports In-Reply-To: <5f59a7211001200149i7c7ad186k50a1589d4862906b@mail.gmail.com> References: <5f59a7211001200149i7c7ad186k50a1589d4862906b@mail.gmail.com> Message-ID: <214750A6-A234-4B88-BBBF-CA67F62C3B64@los-alamos.net> First terminology. This sort of analysis is not done as part of parsing, but as part of the semantic analysis. You need to develop a simplified representation of the important semantic information, i.e., the names of public entities and their types and store that for comparison. Typically the modules can be in separate files, and to minimize processing it is useful to create a separate file for each module containing the information. The file should be much smaller than a typical source code file and the contents should have a structure as close as possible to the internal representation used for the data. However it is also useful to have additional information such as a time stamp, and a version number for the code that generated the summary, so that you can identify whether the contents are out of date either compared with the contents of the module or with the code of your compiler/interpreter. Typically the "summary" file is a text file so that problems can be visually identified, but a binary form can be more compact and faster to process. On Jan 20, 2010, at 2:49 AM, Gustaf Johansson wrote: > Hi, > > I have a grammar in which there can be implicit imports of a few > definitions example: > > module A { > enum myEnumA { A1, A2, A3 } > } > > module B { > import module A; > function myFuncB (int, myEnumA) { > ... > } > } > > > module Prog { > import B; > myFuncB (1, A2); * > } > > *Here A2 is implicitly known to be of type myEnumA, since the > definition of myFuncB is in B and B imports A. > > The problem i have is that my parser reports A2 as unknown. > I have not come up with a good and simple solution to this. > I have been thinking along the lines of: > Check definition of myFuncB and if it takes a enum as argument, check > the local module's imports for the definition of that enum. > > Any help is really appreciated. > > Best Regards Gustaf > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > From aph at redhat.com Thu Jan 21 01:49:19 2010 From: aph at redhat.com (Andrew Haley) Date: Thu, 21 Jan 2010 09:49:19 +0000 Subject: [antlr-interest] java.g does not compile In-Reply-To: <3feca76b9c5b0547bf1582becc8ac1c3@temporal-wave.com> References: <3feca76b9c5b0547bf1582becc8ac1c3@temporal-wave.com> Message-ID: <4B58231F.3070701@redhat.com> On 01/20/2010 07:31 PM, Jim Idle wrote: > Souds like your machine is pretty slow and the conversion timeout default is therefore not engouh. > > Use the -Xconversiontimeout 30000 option to increase the elapsed time it will spend on it. Thank you, that worked. However, this is a fast machine: a four-core Nehalem-based Xeon system. There are faster machines available, but not many. :-) Andrew. From iwm at doc.ic.ac.uk Thu Jan 21 02:22:33 2010 From: iwm at doc.ic.ac.uk (Ian Moor) Date: Thu, 21 Jan 2010 10:22:33 +0000 Subject: [antlr-interest] gunit problem Message-ID: <4B582AE9.4030403@doc.ic.ac.uk> I am using the gunit which is provided with antlr 3.2 and I am trying to test parts of an tree, for example statement walks statements: "x=1" -> "ok" I expect an error message saying the code produced to System.out is not "ok", but gunit prints no output, ans stops with a non zero return value. I have a couple of simple program walks program (where program is the complete program),and when the example above is commented out gunit gives correct test results. If I use -o, gunit hangs, and when I stop it the junit file has code for all of the tests looking as if they can be run. Is there a way finding what is happening, or a later gunit ? Ian Moor From yurushkin at rambler.ru Thu Jan 21 03:21:43 2010 From: yurushkin at rambler.ru (=?koi8-r?B?4NLV28vJziDtycjBycw=?=) Date: Thu, 21 Jan 2010 14:21:43 +0300 Subject: [antlr-interest] [C target] Duplicating tree error Message-ID: Good day, I have the following rewrite rule: type_declaration_stmt : label? declaration_type_spec ( (T_COMMA attr_spec )* T_COLON_COLON )? entity_decl (T_COMMA entity_decl)* end_of_stmt -> ^(T_TYPE_DECLARATION_STMT declaration_type_spec attr_spec* entity_decl)+ ; and this is a piece of tree parser grammar: type_declaration_stmt : ^(T_TYPE_DECLARATION_STMT declaration_type_spec attr_spec* entity_decl) ; When I give "integer a, b, c" on the input, 3 sequential T_TYPE_DECLARATION-trees are generated. It's right. BUT declaration_type_spec subtree isn't dublicated (only the root of subtree). Where is mistake? thanks -- Best regards, Michael From antlr at mirality.co.nz Thu Jan 21 03:42:01 2010 From: antlr at mirality.co.nz (Gavin Lambert) Date: Fri, 22 Jan 2010 00:42:01 +1300 Subject: [antlr-interest] [C target] Duplicating tree error In-Reply-To: References: Message-ID: <20100121114217.994F13418424@www.antlr.org> At 00:21 22/01/2010, =?koi8-r?B?4NLV28vJziDtycjBycw=?= wrote: >type_declaration_stmt > : label? declaration_type_spec ( (T_COMMA attr_spec )* >T_COLON_COLON )? > entity_decl (T_COMMA entity_decl)* end_of_stmt > -> ^(T_TYPE_DECLARATION_STMT declaration_type_spec attr_spec* >entity_decl)+ > ; [...] >BUT declaration_type_spec subtree isn't dublicated (only the root >of subtree). > >Where is mistake? IIRC, when you use a rule name in a rewrite rule, it represents "the first unused instance of this rule in the input" (which is why entity_decl is doing what it is). So the second and subsequent times it appears (during the + loop) the value is empty since it didn't occur any more times in the input. To duplicate nodes you need to use a label. From yurushkin at rambler.ru Thu Jan 21 03:56:53 2010 From: yurushkin at rambler.ru (=?koi8-r?B?4NLV28vJziDtycjBycw=?=) Date: Thu, 21 Jan 2010 14:56:53 +0300 Subject: [antlr-interest] [C target] Duplicating tree error In-Reply-To: <20100121114219.75CF337588D@mx5.rambler.ru> References: <20100121114219.75CF337588D@mx5.rambler.ru> Message-ID: Excuse me, what you mean behind "you need to use a label"? Could you send me example? And, currently, I haven't seen problems with duplicating of "entity_decl" tree. I have a fault with a coping of "declaration_type_spec" tree. Gavin Lambert ?????(?) ? ????? ?????? Thu, 21 Jan 2010 14:42:01 +0300: > At 00:21 22/01/2010, =?koi8-r?B?4NLV28vJziDtycjBycw=?= wrote: > >type_declaration_stmt > > : label? declaration_type_spec ( (T_COMMA attr_spec )* > >T_COLON_COLON )? > > entity_decl (T_COMMA entity_decl)* end_of_stmt > > -> ^(T_TYPE_DECLARATION_STMT declaration_type_spec attr_spec* > >entity_decl)+ > > ; > [...] > >BUT declaration_type_spec subtree isn't dublicated (only the root > >of subtree). > > > >Where is mistake? > > IIRC, when you use a rule name in a rewrite rule, it represents "the > first unused instance of this rule in the input" (which is why > entity_decl is doing what it is). So the second and subsequent times it > appears (during the + loop) the value is empty since it didn't occur any > more times in the input. To duplicate nodes you need to use a label. > > > __________ Information from ESET Smart Security, version of virus > signature database 4792 (20100121) __________ > > The message was checked by ESET Smart Security. > > http://www.esetnod32.ru > > > -- Best regards, Michael From jimi at temporal-wave.com Thu Jan 21 06:40:58 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Thu, 21 Jan 2010 06:40:58 -0800 Subject: [antlr-interest] java.g does not compile In-Reply-To: <4B58231F.3070701@redhat.com> Message-ID: <47ff671de7f8524dbbca4695f1f41700@temporal-wave.com> You are probably right on the limit of the default 10000, or perhaps you are not compiling the exact original? Try the on in the examples zip and see if there are any differences. However, Xeon's are not as fast as you think on a single thread which is what the analysis phase runs on by default. Jim > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Thursday, January 21, 2010 1:49 AM > To: Jim Idle > Cc: antlr-interest at antlr.org > Subject: Re: [antlr-interest] java.g does not compile > > On 01/20/2010 07:31 PM, Jim Idle wrote: > > Souds like your machine is pretty slow and the conversion timeout > default is therefore not engouh. > > > > Use the -Xconversiontimeout 30000 option to increase the elapsed time > it will spend on it. > > Thank you, that worked. > > However, this is a fast machine: a four-core Nehalem-based Xeon system. > There are faster machines available, but not many. :-) > > Andrew. From jimi at temporal-wave.com Thu Jan 21 06:47:57 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Thu, 21 Jan 2010 06:47:57 -0800 Subject: [antlr-interest] [C target] Duplicating tree error In-Reply-To: Message-ID: <5cd0ce1546bc114c83d6a37e06a0f390@temporal-wave.com> Well, you are rewriting the tree with ^(....)+ but the tree grammar only walks one declaration ^(...). Unless you are using the + higher up the rule chain. Example is: ... e+=entity_decl (COMMA e+=entity_decl)* ... -> ^(X .... $e)+ Also, your rewrite rule loses the label. It is generally a good idea to break components up in to separate rules when rewriting as then the token boundaries of nodes are correctly set. So here, you would move label? In to a higher rule for instance. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of ??????? ?????? > Sent: Thursday, January 21, 2010 3:57 AM > To: Gavin Lambert; antlr-interest at antlr.org > Subject: Re: [antlr-interest] [C target] Duplicating tree error > > Excuse me, what you mean behind "you need to use a label"? Could you > send > me > example? > > And, currently, I haven't seen problems with duplicating of > "entity_decl" > tree. > I have a fault with a coping of "declaration_type_spec" tree. > > > Gavin Lambert ?????(?) ? ????? ?????? Thu, 21 > Jan > 2010 14:42:01 +0300: > > > At 00:21 22/01/2010, =?koi8-r?B?4NLV28vJziDtycjBycw=?= wrote: > > >type_declaration_stmt > > > : label? declaration_type_spec ( (T_COMMA attr_spec )* > > >T_COLON_COLON )? > > > entity_decl (T_COMMA entity_decl)* end_of_stmt > > > -> ^(T_TYPE_DECLARATION_STMT declaration_type_spec attr_spec* > > >entity_decl)+ > > > ; > > [...] > > >BUT declaration_type_spec subtree isn't dublicated (only the root > > >of subtree). > > > > > >Where is mistake? > > > > IIRC, when you use a rule name in a rewrite rule, it represents "the > > first unused instance of this rule in the input" (which is why > > entity_decl is doing what it is). So the second and subsequent times > it > > appears (during the + loop) the value is empty since it didn't occur > any > > more times in the input. To duplicate nodes you need to use a label. > > > > > > __________ Information from ESET Smart Security, version of virus > > signature database 4792 (20100121) __________ > > > > The message was checked by ESET Smart Security. > > > > http://www.esetnod32.ru > > > > > > > > > -- > Best regards, > Michael > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From aph at redhat.com Thu Jan 21 06:50:57 2010 From: aph at redhat.com (Andrew Haley) Date: Thu, 21 Jan 2010 14:50:57 +0000 Subject: [antlr-interest] java.g does not compile In-Reply-To: <47ff671de7f8524dbbca4695f1f41700@temporal-wave.com> References: <47ff671de7f8524dbbca4695f1f41700@temporal-wave.com> Message-ID: <4B5869D1.1090707@redhat.com> On 01/21/2010 02:40 PM, Jim Idle wrote: > You are probably right on the limit of the default 10000, or perhaps > you are not compiling the exact original? I haven't touched it. Honestly! Besides, the default seems to be 1000, not 10000. $ java -jar Downloads/antlr-3.2.jar -X -Xconversiontimeout t set NFA conversion timeout (ms) for each decision [1000] I changed it to 10000, and all is fine: --- antlr-3.2/tool/src/main/java/org/antlr/analysis/DFA.java~ 2009-09-23 19:36:06.000000000 +0100 +++ antlr-3.2/tool/src/main/java/org/antlr/analysis/DFA.java 2010-01-21 13:08:32.625782840 +0000 @@ -53,7 +53,7 @@ */ /** Set to 0 to not terminate early (time in ms) */ - public static int MAX_TIME_PER_DFA_CREATION = 1*1000; + public static int MAX_TIME_PER_DFA_CREATION = 10*1000; /** How many edges can each DFA state have before a "special" state * is created that uses IF expressions instead of a table? > Try the on in the examples zip and see if there are any > differences. However, Xeon's are not as fast as you think on a > single thread which is what the analysis phase runs on by default. Err, how on Earth do you know how fast I think Xeons are? :-) But anyway, most users aren't likely to have anything hugely faster. Andrew. From jimi at temporal-wave.com Thu Jan 21 07:00:30 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Thu, 21 Jan 2010 07:00:30 -0800 Subject: [antlr-interest] java.g does not compile In-Reply-To: <4B5869D1.1090707@redhat.com> Message-ID: <6653941088acac4fb5b7cb8c45a7a0ac@temporal-wave.com> I wouldn't change the default time out as then your project depends on a custom version of NATLR for no good reason. That was just my 6:40AM typo of course :-) I have a QX9450 and some i7s. I think that the Xeon server versions of 9450 etc might be slower on a single thread. I think a lot of the i7s are faster than Xeon? However I haven't bothered with Xeon myself. But, it depends what you are measuring. Most of the published benchmark programs are worthless. Jim > -----Original Message----- > From: Andrew Haley [mailto:aph at redhat.com] > Sent: Thursday, January 21, 2010 6:51 AM > To: Jim Idle > Cc: antlr-interest at antlr.org > Subject: Re: [antlr-interest] java.g does not compile > > On 01/21/2010 02:40 PM, Jim Idle wrote: > > > You are probably right on the limit of the default 10000, or perhaps > > you are not compiling the exact original? > > I haven't touched it. Honestly! > > Besides, the default seems to be 1000, not 10000. > > $ java -jar Downloads/antlr-3.2.jar -X > -Xconversiontimeout t set NFA conversion timeout (ms) for each > decision [1000] > > I changed it to 10000, and all is fine: > > --- antlr-3.2/tool/src/main/java/org/antlr/analysis/DFA.java~ 2009- > 09-23 19:36:06.000000000 +0100 > +++ antlr-3.2/tool/src/main/java/org/antlr/analysis/DFA.java 2010- > 01-21 13:08:32.625782840 +0000 > @@ -53,7 +53,7 @@ > */ > > /** Set to 0 to not terminate early (time in ms) */ > - public static int MAX_TIME_PER_DFA_CREATION = 1*1000; > + public static int MAX_TIME_PER_DFA_CREATION = 10*1000; > > /** How many edges can each DFA state have before a "special" > state > * is created that uses IF expressions instead of a table? > > > Try the on in the examples zip and see if there are any > > differences. However, Xeon's are not as fast as you think on a > > single thread which is what the analysis phase runs on by default. > > Err, how on Earth do you know how fast I think Xeons are? :-) > But anyway, most users aren't likely to have anything hugely faster. > > Andrew. From aph at redhat.com Thu Jan 21 07:22:39 2010 From: aph at redhat.com (Andrew Haley) Date: Thu, 21 Jan 2010 15:22:39 +0000 Subject: [antlr-interest] java.g does not compile In-Reply-To: <6653941088acac4fb5b7cb8c45a7a0ac@temporal-wave.com> References: <6653941088acac4fb5b7cb8c45a7a0ac@temporal-wave.com> Message-ID: <4B58713F.2070009@redhat.com> On 01/21/2010 03:00 PM, Jim Idle wrote: > I wouldn't change the default time out as then your project depends > on a custom version of NATLR for no good reason. That was just my > 6:40AM typo of course :-) I'm using antlrworks, and I can't find any other way to change the default. > I have a QX9450 and some i7s. I think that the Xeon server versions > of 9450 etc might be slower on a single thread. I'm sure they would be, but I'm talking about a Nehalem-based Xeon: it *is* an i7, not a Core 2 anything. The Xeon 35xx and Core i7-9xx are more or less the same thing. So, I think that almost everyone would have the same problem I'm having. Andrew. From jimi at temporal-wave.com Thu Jan 21 07:41:24 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Thu, 21 Jan 2010 07:41:24 -0800 Subject: [antlr-interest] java.g does not compile In-Reply-To: <4B58713F.2070009@redhat.com> Message-ID: <57cab6502a087249bff2330f62dab5f0@temporal-wave.com> They probably would, unless they read the comments at the start of the .g file, where it says: * NOTE: If you try to compile this file from command line and Antlr gives an exception * like error message while compiling, add option * -Xconversiontimeout 100000 * to the command line. Sorry - missed the Nehalem comment. However, my 3Ghz QX9650 (forgot what CPU I had in this thing) running Vista 64 and Sun's 64 bit JRE deals with it just fine. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Andrew Haley > Sent: Thursday, January 21, 2010 7:23 AM > To: antlr-interest at antlr.org > Subject: Re: [antlr-interest] java.g does not compile > > On 01/21/2010 03:00 PM, Jim Idle wrote: > > > I wouldn't change the default time out as then your project depends > > on a custom version of NATLR for no good reason. That was just my > > 6:40AM typo of course :-) > > I'm using antlrworks, and I can't find any other way to change the > default. > > > I have a QX9450 and some i7s. I think that the Xeon server versions > > of 9450 etc might be slower on a single thread. > > I'm sure they would be, but I'm talking about a Nehalem-based Xeon: it > *is* an i7, not a Core 2 anything. The Xeon 35xx and Core i7-9xx are > more or less the same thing. > > So, I think that almost everyone would have the same problem I'm > having. > > Andrew. > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From aph at redhat.com Thu Jan 21 07:51:50 2010 From: aph at redhat.com (Andrew Haley) Date: Thu, 21 Jan 2010 15:51:50 +0000 Subject: [antlr-interest] java.g does not compile In-Reply-To: <57cab6502a087249bff2330f62dab5f0@temporal-wave.com> References: <57cab6502a087249bff2330f62dab5f0@temporal-wave.com> Message-ID: <4B587816.7090103@redhat.com> On 01/21/2010 03:41 PM, Jim Idle wrote: > They probably would, unless they read the comments at the start of the .g file, where it says: > > * NOTE: If you try to compile this file from command line and Antlr gives an exception > * like error message while compiling, add option > * -Xconversiontimeout 100000 > * to the command line. > Hah! I even did a web search for other people having the same problem, and never saw that comment in the file. :-) > Sorry - missed the Nehalem comment. However, my 3Ghz QX9650 (forgot > what CPU I had in this thing) running Vista 64 and Sun's 64 bit JRE > deals with it just fine. Me too, kinda sorta (OpenJDK64 on Linux). Weird. Thanks again, Andrew. From jimi at temporal-wave.com Thu Jan 21 08:49:25 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Thu, 21 Jan 2010 08:49:25 -0800 Subject: [antlr-interest] java.g does not compile In-Reply-To: <4B587816.7090103@redhat.com> Message-ID: <5b21023602fded4c83de5774d2c76cee@temporal-wave.com> I have found OpenJDK to be less than reliable to be honest, though many say it is fine for them. It might be the 64 bit version that always seemed to let me down in Fedora. Once I changed to Sun JDK/JRE then all my issues went away. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Andrew Haley > Sent: Thursday, January 21, 2010 7:52 AM > To: antlr-interest at antlr.org > Subject: Re: [antlr-interest] java.g does not compile > > On 01/21/2010 03:41 PM, Jim Idle wrote: > > They probably would, unless they read the comments at the start of > the .g file, where it says: > > > > * NOTE: If you try to compile this file from command line and Antlr > gives an exception > > * like error message while compiling, add option > > * -Xconversiontimeout 100000 > > * to the command line. > > > > Hah! I even did a web search for other people having the same > problem, and never saw that comment in the file. :-) > > > Sorry - missed the Nehalem comment. However, my 3Ghz QX9650 (forgot > > what CPU I had in this thing) running Vista 64 and Sun's 64 bit JRE > > deals with it just fine. > > Me too, kinda sorta (OpenJDK64 on Linux). Weird. > > Thanks again, > Andrew. > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From aph at redhat.com Thu Jan 21 10:05:51 2010 From: aph at redhat.com (Andrew Haley) Date: Thu, 21 Jan 2010 18:05:51 +0000 Subject: [antlr-interest] java.g does not compile In-Reply-To: <5b21023602fded4c83de5774d2c76cee@temporal-wave.com> References: <5b21023602fded4c83de5774d2c76cee@temporal-wave.com> Message-ID: <4B58977F.80600@redhat.com> On 01/21/2010 04:49 PM, Jim Idle wrote: > I have found OpenJDK to be less than reliable to be honest, though > many say it is fine for them. It might be the 64 bit version that > always seemed to let me down in Fedora. Once I changed to Sun > JDK/JRE then all my issues went away. Hmm, that's not good. There shouldn't really be any difference, given that OpenJDK is built from a very similar codebase and runs all the same compatibility tests. We need all the feedback about failures we can get, with test cases if possible. Andrew. From antlr at mirality.co.nz Thu Jan 21 11:08:58 2010 From: antlr at mirality.co.nz (Gavin Lambert) Date: Fri, 22 Jan 2010 08:08:58 +1300 Subject: [antlr-interest] [C target] Duplicating tree error In-Reply-To: References: <20100121114219.75CF337588D@mx5.rambler.ru> Message-ID: <20100121190918.E09123418423@www.antlr.org> At 00:56 22/01/2010, =?koi8-r?B?4NLV28vJziDtycjBycw=?= wrote: >Excuse me, what you mean behind "you need to use a label"? Could >you send me example? [...] t=declaration_type_spec [...] -> ^(T_TYPE_DECLARATION_STMT $t [...] >And, currently, I haven't seen problems with duplicating of >"entity_decl" tree. >I have a fault with a coping of "declaration_type_spec" tree. That's because you're only using one entity_decl at a time. My point is that they're doing the same thing -- the first time around the loop it uses the first encountered entity_decl, the second time it uses the second, etc. It's behaving exactly the same with the declaration_type_spec; only there's just the one of those in the input. From greneche.hugo at gmail.com Thu Jan 21 11:20:46 2010 From: greneche.hugo at gmail.com (Hugo) Date: Thu, 21 Jan 2010 20:20:46 +0100 Subject: [antlr-interest] newbie needs help Message-ID: <4B58A90E.5020401@gmail.com> I started using antlr to parse a specific file format. The problem is that i don't know how to write correctly my grammar. The file have the following format. It contains multiple lines and each can have the following format: Only one or multilple hexadecimal caracter with space or not ex: A0 A4 B5 77 or: A0 Only variable identifier with the format VAR_XXX ex: VAR_MY_VARIABLE Or the combinaison of the two previous format ex: A0 A4B5 VAR_MY_VARIABLE 77 98 VAR_MY_VARIABLE2 or VAR_MY_VARIABLE AA BB or AA BB VAR_MY_VARIABLE what i want to do is to build a AST tree And the problem is that i don't know how to do this with antlr. the tool always tell me that multiple rule can be applies with my grammar. please help me to solve my problem. Here is my grammar: stmts : bytes+ ; bytes : multiple_byte bytes? -> ^(EXPR_DEF multiple_byte bytes? ) | define_expression bytes? -> ^(EXPR_DEF define_expression bytes? ) | NEWLINE ; define_expression : define_var -> ^(DEFINE_VAR_DEF define_var) ; define_var : DEFINE_VARIABLE ; multiple_byte : single_byte (single_byte)+ -> ^(MULTIPLE_BYTES_DEF single_byte single_byte+) ; single_byte : byte_digit -> ^(BYTES_DEF byte_digit) ; byte_digit : BYTE_DIGIT ; DEFINE_VARIABLE : 'VAR_'('a'..'z'|'A'..'Z'|'_')('a'..'z'|'A'..'Z'|'0'..'9'|'_')*; BYTE_DIGIT :('0'..'9'| 'A'..'F'|'a'..'f')('0'..'9'| 'A'..'F'|'a'..'f') ; // Ignore whitespace, tab and escape sequence WS : (' '|'\t'|'\\\r\n')+ {$channel = HIDDEN;} ; // a new line NEWLINE : '\r'? '\n' ; thanks a lot From jbb at acm.org Thu Jan 21 13:25:24 2010 From: jbb at acm.org (John B. Brodie) Date: Thu, 21 Jan 2010 16:25:24 -0500 Subject: [antlr-interest] newbie needs help In-Reply-To: <4B58A90E.5020401@gmail.com> References: <4B58A90E.5020401@gmail.com> Message-ID: <1264109124.9363.10.camel@gecko.home.org> Greetings! On Thu, 2010-01-21 at 20:20 +0100, Hugo wrote: > I started using antlr to parse a specific file format. > The problem is that i don't know how to write correctly my grammar. > > The file have the following format. > It contains multiple lines and each can have the following format: > > Only one or multilple hexadecimal caracter with space or not > ex: A0 A4 B5 77 > or: A0 > > Only variable identifier with the format VAR_XXX > ex: VAR_MY_VARIABLE > > Or the combinaison of the two previous format > ex: > A0 A4B5 VAR_MY_VARIABLE 77 98 VAR_MY_VARIABLE2 > or > VAR_MY_VARIABLE AA BB > or > AA BB VAR_MY_VARIABLE > > > what i want to do is to build a AST tree attached please find a grammar file that is *almost* what I think you are trying to do. It does not have a MULTIPLE_BYTES_DEF node because the grouping of a collection of single_byte instances into a multibyte is ambiguous. Consider 11 22 33 44 55 66 77 88 is this 8 single bytes? 1 single byte and 7-long multi? is it 4 multi pairs? a triple, a single and a quad? i kinda expect you want it to be a single 8-long multi, e.g. any run of single bytes becomes a multi. But that is a semantic of your language and getting a parser to do semantics isn't always possible.... if you really need the MULTIPLE_BYTE_DEF node, you might be best served by parsing using some like my code (e.g. the parser produces only BYTE_DEF nodes) and then write a tree-walker that transforms the AST resultant from the parse into a new AST that contains the requisite MULTIPLE_BYTE_DEF nodes. e.g. scan for and collapse sequences of consecutive EXPR_DEF nodes that have BYTE_DEF children into a single EXPR_DEF node containing a single MULTIPLE_BYTE_DEF child. > > And the problem is that i don't know how to do this with antlr. the tool > always tell me that multiple rule can be applies with my grammar. > > please help me to solve my problem. > > Here is my grammar: > > stmts : bytes+ ; > > > bytes : multiple_byte bytes? -> ^(EXPR_DEF multiple_byte bytes? ) > > | define_expression bytes? -> ^(EXPR_DEF define_expression bytes? ) > > | NEWLINE ; > > define_expression : define_var -> ^(DEFINE_VAR_DEF define_var) ; > > define_var : DEFINE_VARIABLE ; > multiple_byte : single_byte (single_byte)+ -> ^(MULTIPLE_BYTES_DEF > single_byte single_byte+) ; > > > single_byte : byte_digit -> ^(BYTES_DEF byte_digit) ; > > byte_digit : BYTE_DIGIT ; > > DEFINE_VARIABLE : > 'VAR_'('a'..'z'|'A'..'Z'|'_')('a'..'z'|'A'..'Z'|'0'..'9'|'_')*; > > BYTE_DIGIT :('0'..'9'| 'A'..'F'|'a'..'f')('0'..'9'| 'A'..'F'|'a'..'f') ; > > // Ignore whitespace, tab and escape sequence WS : (' '|'\t'|'\\\r\n')+ > {$channel = HIDDEN;} ; > > // a new line NEWLINE : '\r'? '\n' ; > > thanks a lot hope this helps... -jbb -------------- next part -------------- grammar Test; options { output = AST; ASTLabelType = CommonTree; } tokens { EXPR_DEF; DEFINE_VAR_DEF; BYTES_DEF; } @members { private static final String [] x = new String[]{ "A0\n", "A0 A4 B5 77\n", "VAR_MY_VARIABLE\n", "A0 A4B5 VAR_MY_VARIABLE 77 98 VAR_MY_VARIABLE2\n", "VAR_MY_VARIABLE AA BB\n", "AA BB VAR_MY_VARIABLE\n" }; public static void main(String [] args) { for( int i = 0; i < x.length; ++i ) { try { System.out.println("about to parse:`"+x[i]+"`"); TestLexer lexer = new TestLexer(new ANTLRStringStream(x[i])); CommonTokenStream tokens = new CommonTokenStream(lexer); TestParser parser = new TestParser(tokens); TestParser.stmts_return p_result = parser.stmts(); CommonTree ast = p_result.tree; if( ast == null ) { System.out.println("resultant tree: is NULL"); } else { System.out.println("resultant tree: " + ast.toStringTree()); } System.out.println(); } catch(Exception e) { e.printStackTrace(); } } } } stmts : bytes+ EOF!; bytes : ( b=BYTE_DIGIT t=bytes -> ^(EXPR_DEF ^(BYTES_DEF $b) $t) ) | ( d=DEFINE_VARIABLE t=bytes -> ^(EXPR_DEF ^(DEFINE_VAR_DEF $d) $t) ) | NEWLINE ; fragment LETTER : 'a' .. 'z' | 'A' .. 'Z' ; fragment DIGIT : '0'.. '9' ; DEFINE_VARIABLE : 'VAR_' (LETTER|'_') (LETTER | DIGIT | '_')*; fragment HEXIT : '0'..'9' | 'A'..'F' | 'a'..'f' ; BYTE_DIGIT : HEXIT HEXIT ; // Ignore whitespace, tab and escape sequence WS : (' '|'\t'|'\\\r\n')+ {$channel = HIDDEN;} ; // a new line NEWLINE : '\r'? '\n' ; From michael.scholz at gmail.com Thu Jan 21 14:20:55 2010 From: michael.scholz at gmail.com (Michael Scholz) Date: Thu, 21 Jan 2010 14:20:55 -0800 Subject: [antlr-interest] Problem with rewrite rule: DebugTokenStream cannot be cast to TokenRewriteStream Message-ID: <61e8cbbd1001211420u3829eb86jdd930c64c5f564d0@mail.gmail.com> I have hit the issue described here: http://www.antlr.org/pipermail/antlr-interest/2009-January/032284.html and here: http://www.antlr.org/jira/browse/AW-242 Since it's not a new problem, is there a known fix/workaround/patch? Thanks From m.y.speyer at inter.nl.net Thu Jan 21 16:55:05 2010 From: m.y.speyer at inter.nl.net (Marc Speyer) Date: Fri, 22 Jan 2010 01:55:05 +0100 Subject: [antlr-interest] Tree pattern maching using the C# (was C) target In-Reply-To: <20100120004202.274260@gmx.net> References: <000901ca95d9$df6dd740$9e4985c0$@y.speyer@inter.nl.net> <20100115125833.242280@gmx.net> <002401ca985c$bd00f450$3702dcf0$@y.speyer@inter.nl.net> <20100120004202.274260@gmx.net> Message-ID: <003501ca9afd$898245e0$9c86d1a0$@y.speyer@inter.nl.net> Hi Johannes, Please find the file attached. I can get it compiled with this file but When I then run the grammar nothings happens whereas I have grammar rules and actions (see my previous post). I have not tested the CSharp3 target myself yet because I could not compile the source for it either but did only spend a lot bit of time on it since I cannot find anything about the status of the CSharp3 target. Any help would be much appreciated. Thanks, Marc >-----Original Message----- >From: Johannes Luber [mailto:JALuber at gmx.de] >Sent: Wednesday, January 20, 2010 1:42 AM >To: Marc Speyer; antlr-interest at antlr.org >Subject: Re: [antlr-interest] Tree pattern maching using the C# (was C) >target > >> Hi Johannes, >> >> I tried the version that you mentioned by downloading it from >> antlr:/runtime/CSharp2 in the Fisheye code repository and then tried to >> compile it using VS2008. This didn't work because a file >> "TokenConstants.cs" >> was reported missing by VS2008 and gave me compilation errors. I managed >> to >> get a version from the CSharp3 repository and after making one change I >> could compile. > >Oops - I thought that I had checked in that file already. Can you send both >TokenConstants.cs (for comparing with my own version) and the modified >grammar file to the list? I'm not sure where the error can be as I lifted >more than a few file from the CSharp3 target. > >Sam, can you check if the grammar works with CSharp3 target? It would be >helpful to narrow down the cause. > >Johannes > >> I noticed that the Downup method is part of the Treefilter >> class which inherits from the TreeParser class. The grammar for the tree >> parser from the example has the following header: >> >> // START: header >> tree grammar DefRef; >> options { >> tokenVocab = Cymbol; >> ASTLabelType = CommonTree; >> filter = true; >> language=CSharp2; >> } >> @members { >> SymbolTable symtab; >> Scope currentScope; >> public DefRef(ITreeNodeStream input, SymbolTable symtab) >> : this(input) >> { >> this.symtab = symtab; >> currentScope = symtab.globals; >> } >> } >> // END: header >> >> Generating the tree parser gives DefRef.cs with the DefRef class declared >> as: >> >> public partial class DefRef : TreeParser >> >> >> Now I can cast this into the TreeFilter class but to test things quickly >I >> changed the above line in the DefRef.cs into: >> >> public partial class DefRef : TreeFilter >> >> >> In the calling program I use: >> >> DefRef def = new DefRef(nodes, symtab); // use custom constructor >> def.Downup(t); // trigger symtab actions upon certain subtrees >> >> When I run this nothings happens whereas I have grammar rules and actions >> like: >> >> exitBlock >> : BLOCK >> { >> Console.WriteLine("locals: "+currentScope); >> currentScope = currentScope.getEnclosingScope(); // pop scope >> } >> ; >> >> I have not figured out yet why this doesn't work. The examples is a >> one-to-one port of the Java example of pattern 17 Symbol Table for Nested >> Scopes of the Language Implementation Patterns. >> >> Any idea? >> >> Thanks, >> >> Marc >> >-----Original Message----- >> >From: Johannes Luber [mailto:JALuber at gmx.de] >> >Sent: Friday, January 15, 2010 1:59 PM >> >To: Marc Speyer; antlr-interest at antlr.org >> >Subject: Re: [antlr-interest] Tree pattern maching using the C target >> > >> >> Hi all, >> >> >> >> I have a similar issue using the C# target. Using the Cymbol.g example >> of >> >> pattern 17 Symbol Table for Nested Scopes of the Language >> Implementation >> >> Patterns book I could not get it to work because there is now downup >> >> method. >> >> According to the documentation this method walks the AST code using >> >> ANTLR's >> >> built-in downup( ) strategy. >> >> >> >> Am I correct assuming that this has not been implemented yet for the >C# >> >> target (as Jim implies in his response). Is it difficult to implement >> it >> >> myself? I guess it would involve implementing the tree pattern >matching >> >> stuff. >> >> >> >> Marc >> > >> >You are correct - there is no official version yet, which implements >tree >> >pattern matching. I haven't gotten around to the API changes yet (will >> work >> >on that next week), though I have checked in some untested changes. It >> >would be the easieast if you'd base your own code on that for now. >> > >> >Johannes >> > >> >> P.S. Hope this email files under the proper subject thread, and >> apologies >> >> in >> >> advance if it isn't (Just subscribed to the mailing list but I could >> not >> >> find out how to get previous posts from it) >> >> >> >> > Pattern matcher or normal tree walker? The pattern stuff is not >> >> implemented in the C target yet. >> >> > >> >> > Jim >> >> > >> >> >> -----Original Message----- >> >> >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- >> >> >> bounces at antlr.org] On Behalf Of Heiko Folkerts >> >> >> Sent: Thursday, January 14, 2010 5:01 AM >> >> >> To: antlr-interest at antlr.org >> >> >> Subject: [antlr-interest] Tree pattern maching using the C target >> >> >> >> >> >> Hi all, >> >> >> I wrote al litle tree pattern matcher for a specific validation we >> >need >> >> >> in our grammar. ANTLR and the C compiler compile it all well but >> there >> >> >> is now "downup" mehtod for running the matcher. Instead I only see >> our >> >> >> own rules in the generated parser. So, is the method to run when >> using >> >> >> a tree pattern macher in the C target different than ^"downup"? How >> to >> >> >> run the matcher? >> >> >> >> >> >> I tried to find an answer in the C examples but there was only a >> >> >> treeparser and no tree pattern matcher. >> >> >> >> >> >> Thx+ >> >> >> Heiko >> >> >> >> >> >> >> >> >> -- >> >> >> >> >> >> >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> >> Unsubscribe: >> >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address >> > >> >-- >> >GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! >> >Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 >> >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> Unsubscribe: >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address > >-- >Jetzt kostenlos herunterladen: Internet Explorer 8 und Mozilla Firefox 3.5 >- >sicherer, schneller und einfacher! http://portal.gmx.net/de/go/atbrowser -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: TokenConstants.cs Url: http://www.antlr.org/pipermail/antlr-interest/attachments/20100122/b71db7bf/attachment.pl From michael.scholz at gmail.com Thu Jan 21 18:23:44 2010 From: michael.scholz at gmail.com (Michael Scholz) Date: Thu, 21 Jan 2010 18:23:44 -0800 Subject: [antlr-interest] multiple command queues to get multiple rewrites from a single pass Message-ID: <61e8cbbd1001211823mf618eccs2275e0b1c2d0b7f9@mail.gmail.com> Referring to The Definitive ANTLR Reference, page 220 "You can also have multiple command queues to get multiple rewrites from a single pass over the input such as generating both a C file and its header file (see the TokenRewriteStream Javadoc for an example)" said example: /* You can also have multiple "instruction streams" and get multiple * rewrites from a single pass over the input. Just name the instruction * streams and use that name again when printing the buffer. This could be * useful for generating a C file and also its header file--all from the * same buffer: * * tokens.insertAfter("pass1", t, "text to put after t");} * tokens.insertAfter("pass2", u, "text after u");} * System.out.println(tokens.toString("pass1")); * System.out.println(tokens.toString("pass2")); * * If you don't use named rewrite streams, a "default" stream is used as * the first example shows. */ I don't see how to apply this in the context of the CMinus.g 1pass rewriter. This example uses inline template definitions to rewrite, and the syntax for doing that: ... -> template-name(<>) doesn't have any obvious way to specify the non-default instruction stream... to generate a replace(String,...) instead of replace(...) in the parser file. If this functionality is enabled, the documentation for getting to it is not obvious. Is this a V2/V3 issue? The tweak example seems like it might be closer than 1pass rewriter, but it doesn't look consistent with the book's techniques. From linlin.xie at siemens.com Fri Jan 22 04:57:40 2010 From: linlin.xie at siemens.com (Xie, Linlin) Date: Fri, 22 Jan 2010 13:57:40 +0100 Subject: [antlr-interest] UTF-8 input? In-Reply-To: References: <79118B9FE8CE8E49B0D71964A79CB647033CA2D5@dekomplm002.net.plm.eds.com> Message-ID: <79118B9FE8CE8E49B0D71964A79CB647033CABB3@dekomplm002.net.plm.eds.com> Hi jim, Thanks for the reply. You said I can convert my UTF8 input "to UCS2 using the supplied converter in the current runtime", but I can't find any such converter in antlr c runtime. Can you suggest me which API to use? Btw, I searched the archive, I can see the person who had similar problem as mine used iconv library on linux. Thanks in advance! Linlin -----Original Message----- From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle Sent: 20 January 2010 16:31 To: antlr-interest at antlr.org Subject: Re: [antlr-interest] UTF-8 input? You need to remember to state which target you are talking about. I have written a new universal input stream for the next version of the C runtime. It takes 8bit, 16 bit, UTF-8, UTF-16, UCS2, UTF32 and EBCDIC (code gen will change slightly to support this). It is not well tested right now but will be available as a snapshot 3.3 release shortly in the downloads page. In the meantime the easiest thing to do is to convert to UCS2 using the supplied converter in the current runtime. Though this will not work with surrogate pairs in UTF-16 though but most people do not need that. If you really need UTf-8 without conversion then it is easy enough to write, or you can just steal the code from my check in of the code in about 10 minutes. Note that while the streams work, I have not provided ANTLR3_STRING support for UTF-8 and so on yet and so getting $text from such a stream may or may not work, Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Xie, Linlin > Sent: Wednesday, January 20, 2010 3:32 AM > To: antlr-interest at antlr.org > Subject: [antlr-interest] UTF-8 input? > > Can anyone tell me if antlr3.1.3 generated parser works with UTF-8 > input? If it does, how should I configure in the grammar? I noticed > there are two macros ANTLR3_INLINE_INPUT_ASCII and > ANTLR3_INLINE_INPUT_UTF16, but no UTF-8 one. > > > > Many thanks! > > Linlin > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From jamesdcarrollml at verizon.net Fri Jan 22 07:20:48 2010 From: jamesdcarrollml at verizon.net (James Carroll) Date: Fri, 22 Jan 2010 10:20:48 -0500 Subject: [antlr-interest] Agglutinative language Message-ID: <1264173648.24829.53.camel@Cheyenne> I'm only starting with ANTLR and thinking about 'language' for its own sake and was wondering.... are there any agglutinative programming languages? The Decorator pattern would seem to lend itself to it. From wikipedia (javascript example): function Coffee() {this.cost = function(){return 1;}; } // Decorator A function Milk(coffee) {this.cost = function() {return coffee.cost() + 0.5;}; } // Decorator B function Whip(coffee) {this.cost = function() {return coffee.cost() + 0.7;}; } // Decorator C function Sprinkles(coffee) {this.cost = function() {return coffee.cost() + 0.2;}; } var coffee = new Coffee(); coffee = new Sprinkles(coffee); coffee = new Whip(coffee); coffee = new Milk(coffee); But what if I wanted to do this: entity Coffee() {this.cost = function(){return 1;}; } entity Espresso() is Coffee {this.cost = function(){return 1.5;} // Decorator A feature Milked(coffee) {this.cost = function() {return coffee.cost() + 0.5;}; } // Decorator B feature Whipped(coffee) {this.cost = function() {return coffee.cost() + 0.7;}; } // Decorator C feature Sprinkled(coffee) {this.cost = function() {return coffee.cost() + 0.2;}; } var coffee1 = new Coffee(); var coffee2 = new WhippedCoffee(); var coffee3 = new SprinkledMilkedCoffee(); var coffee4 = new SprinkledEspresso(); Just curious. From kaleb.pederson at gmail.com Fri Jan 22 07:46:43 2010 From: kaleb.pederson at gmail.com (Kaleb Pederson) Date: Fri, 22 Jan 2010 07:46:43 -0800 Subject: [antlr-interest] gunit problem In-Reply-To: <4B582AE9.4030403@doc.ic.ac.uk> References: <4B582AE9.4030403@doc.ic.ac.uk> Message-ID: On Thu, Jan 21, 2010 at 2:22 AM, Ian Moor wrote: > I am using the gunit which is provided with antlr 3.2 and > I am trying to test parts of an tree, for example > ? statement walks statements: > ? ?"x=1" -> "ok" > > I expect an error message saying the code produced to System.out is > not ?"ok", but gunit prints no output, ans stops with a non zero return > value. Although it takes some work, you can debug gunit. You'll need to grab the source and then set appropriate breakpoints as one normally would in working with a debugger, but it's possible. > Is there a way finding what is happening, or a later gunit ? A patched version of gunit is available that includes better support for custom ASTs: http://www.antlr.org/wiki/pages/viewpageattachments.action?pageId=3244061&metadataLink=true I know I found source for it somewhere and was able to debug another problem, so hopefully you can do the same. -- Kaleb Pederson Blog - http://kalebpederson.com Twitter - http://twitter.com/kalebpederson From parrt at cs.usfca.edu Fri Jan 22 09:32:53 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Fri, 22 Jan 2010 09:32:53 -0800 Subject: [antlr-interest] Call for Papers and Tool Demo Proposals - SCAM 2010 Message-ID: <6A6EDA85-EA68-44DA-B84B-1548B55FDAB4@cs.usfca.edu> hiya. Anybody got a cool product / tool to show off? Here's a conference. Ter --------------------- Tenth IEEE International Working Conference on Source Code Analysis and Manipulation 12th-13th September 2010, Timisoara, Romania, Co-located with ICSM 2010 http://www2010.ieee-scam.org/ Sponsored by IEEE CS (pending) In cooperation with: - Semantic Designs Inc., Austin, TX, USA - Univ. "Politehnica" Timisoara, Romania - Centre for Research in Evolution, Search and Testing (CREST), King's College London, UK ---------------- Conference aims: ---------------- The aim of this working conference is to bring together researchers and practitioners working on theory, techniques and applications which concern analysis and/or manipulation of the source code of computer systems. While much attention in the wider software engineering community is properly directed towards other aspects of systems development and evolution, such as specification, design and requirements engineering, it is the source code that contains the only precise description of the behaviour of the system. The analysis and manipulation of source code thus remains a pressing concern. --------- Keynotes: --------- This year SCAM will feature two outstanding keynotes: - Mark Harman, King's College London, UK - Andreas Zeller, Saarland University, Germany --------------------------------- Covered topics and paper formats: --------------------------------- We welcome submission of papers that describe original and significant work in the field of source code analysis and manipulation. Topics of interest include, but are not limited to: * program transformation * abstract interpretation * program slicing * source level software metrics * decompilation * source level testing and verification * source level optimization * program comprehension Note that SCAM explicitly solicits results from any theoretical or technological domain that can be applied to these and similar topics. Submitted papers should not be longer than 10 pages. We also welcome submission of 2 page proposals for tool demonstrations expected to be performed live at the conference. All papers submitted should follow IEEE Computer Society Press Proceedings Author Guidelines. The papers should be submitted electronically via the conference web site. Submitted papers should not have been previously published, and should not have been concurrently submitted elsewhere. ------------ Proceedings: ------------ All accepted papers will appear in the proceedings which will be published by the IEEE Computer Society Press. -------------- Special Issue: -------------- Best papers from SCAM 2010 will be considered for revision, extension, and publication in a special issue of the Science of Computer Programming journal edited by Elsevier. ---------------- Important Dates: ---------------- Deadline for submission: Abstract due: 23rd April, 2010 Full paper due: 30 April, 2010 Notification: 7th June, 2010 Working Conference: 12th-13th September 2010 ------------------------ Conference Organization: ------------------------ General Chair Massimiliano Di Penta, Research Centre on Software Technology, Universita degli Studi del Sannio, Italy Program Co-Chairs Jurgen Vinju, Centrum Wiskunde & Informatica, The Netherlands Cristina Marinescu, Politehnica University of Timisoara, Romania Publicity Chair Zheng Li, CREST Centre, Department of Computer Science, King?s College London, UK Finance Chair Dave Binkley, Computer Science Department, Loyola College in Maryland, USA Tool Demonstration Chair Pascal Cuoq, CEA-Recherche Technologique, France Local Arrangements Chair Marius Minea, Politehnica University of Timisoara, Romania ----------------------------------------- Steering Committee and Program Committee: ----------------------------------------- See the conference Website From jimi at temporal-wave.com Fri Jan 22 12:06:31 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Fri, 22 Jan 2010 12:06:31 -0800 Subject: [antlr-interest] UTF-8 input? In-Reply-To: <79118B9FE8CE8E49B0D71964A79CB647033CABB3@dekomplm002.net.plm.eds.com> Message-ID: <52625667efb426469f56bf603d379f7d@temporal-wave.com> Do you not see the function call: ConvertUTF8toUTF16() ? In the file called 'antlr3convertutf.c" ? Jim > -----Original Message----- > From: Xie, Linlin [mailto:linlin.xie at siemens.com] > Sent: Friday, January 22, 2010 4:58 AM > To: Jim Idle; antlr-interest at antlr.org > Subject: RE: [antlr-interest] UTF-8 input? > > Hi jim, > > Thanks for the reply. You said I can convert my UTF8 input "to UCS2 > using the supplied converter in the current runtime", but I can't find > any such converter in antlr c runtime. Can you suggest me which API to > use? Btw, I searched the archive, I can see the person who had similar > problem as mine used iconv library on linux. > > Thanks in advance! > Linlin > > > -----Original Message----- > From: antlr-interest-bounces at antlr.org > [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle > Sent: 20 January 2010 16:31 > To: antlr-interest at antlr.org > Subject: Re: [antlr-interest] UTF-8 input? > > You need to remember to state which target you are talking about. > > I have written a new universal input stream for the next version of the > C runtime. It takes 8bit, 16 bit, UTF-8, UTF-16, UCS2, UTF32 and EBCDIC > (code gen will change slightly to support this). It is not well tested > right now but will be available as a snapshot 3.3 release shortly in > the > downloads page. > > In the meantime the easiest thing to do is to convert to UCS2 using the > supplied converter in the current runtime. Though this will not work > with surrogate pairs in UTF-16 though but most people do not need that. > > If you really need UTf-8 without conversion then it is easy enough to > write, or you can just steal the code from my check in of the code in > about 10 minutes. Note that while the streams work, I have not provided > ANTLR3_STRING support for UTF-8 and so on yet and so getting $text from > such a stream may or may not work, > > Jim > > > -----Original Message----- > > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > > bounces at antlr.org] On Behalf Of Xie, Linlin > > Sent: Wednesday, January 20, 2010 3:32 AM > > To: antlr-interest at antlr.org > > Subject: [antlr-interest] UTF-8 input? > > > > Can anyone tell me if antlr3.1.3 generated parser works with UTF-8 > > input? If it does, how should I configure in the grammar? I noticed > > there are two macros ANTLR3_INLINE_INPUT_ASCII and > > ANTLR3_INLINE_INPUT_UTF16, but no UTF-8 one. > > > > > > > > Many thanks! > > > > Linlin > > > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > > Unsubscribe: http://www.antlr.org/mailman/options/antlr- > interest/your- > > email-address > > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: > http://www.antlr.org/mailman/options/antlr-interest/your-email-address From duygu_the_duygu at yahoo.com Fri Jan 22 17:01:21 2010 From: duygu_the_duygu at yahoo.com (Duygu Altinok) Date: Fri, 22 Jan 2010 17:01:21 -0800 (PST) Subject: [antlr-interest] newbie question- tree walker goes into infininte loop Message-ID: <678818.24633.qm@web46002.mail.sp1.yahoo.com> Hi, I'm writing a C-like language compiler . I both generate code and do usual stuff within my tree walker but , it goes into an infinite loop with the following , problem is with the function part. I really didin't get what I'm doing wrong , so please can anybody help?Thanx in advance. Parser : program: function_list { #program= #([PROGRAM,"program"],symbol_table, program); } ; function_list: { is_in_function_list = true; } (function)+ { #function_list= #([FUNCTION_LIST, "function_list"], function_list); } ; function : { String bt; } bt=basic_type! i:ID! { String identifier = i.getText(); if (identifier.length() > 32) { error(WARN00, i.getLine(), i.getColumn()); identifier = identifier.substring(0, 32); } which_function = new String(identifier); identifier=identifier + ":" + Integer.toString(i.getLine()) + ":" + Integer.toString(i.getColumn()); } LPAREN! parameter_list! RPAREN! LCURLY function_body RCURLY { symbol_table.addChild(#([SYMBOL_FUNCTION, identifier ], [SYMBOL_TYPE, bt] , symbol_parameters, symbol_locals )); #function=(#([ID,identifier],function)); } ; function_body: declaration_list! statement_list ; Tree walker: program : #(PROGRAM symbol_table { sTable.sort(); assemble.code+=new String("\n\t#Duygu\n\n\n"); assemble.code+=new String("\t.data\n"); } function_list { sTable.prettyPrint(); try{ FileWriter file=new FileWriter(new String("output.asm")); file.write(assemble.code.toCharArray()); file.close(); }catch (Exception e) { e.printStackTrace(); System.out.println(e); } } ) ; function_list : #(FUNCTION_LIST (function)+) ; function : #(i:ID { //Parse info String identifier; int line, column; String [] params = new String[3]; identifier = i.getText(); params = identifier.split(":"); identifier = params[0]; line = Integer.parseInt(params[1]); column = Integer.parseInt(params[2]); int index; int line2=0,column2=0; //line and column info from the symbol table index = sTable.getFunctionIndex(identifier); if(index != -1) { line2=((Function)sTable.functions.elementAt(index)).line; column2=((Function)sTable.functions.elementAt(index)).column; } if(index!=-1 && line==line2 && column==column2) { isFunctionLegal=true; currentFunction = (Function) sTable.functions.elementAt(index); Vector parameters=currentFunction.parameters; int offset=0; for(int k=0;k=0;ind--) assemble.code+=Pop(new String("$s"+Integer.toString(ind))); //restore ra assemble.code+=new String("#pop ra\n"); assemble.code+=Pop(new String("$ra")); //restore fp assemble.code+=new String("#pop fp\n"); assemble.code+=Pop(new String("$fp")); //back to caller fnc assemble.code+=new String("\tli $v0, 0\n"); assemble.code+=new String("\tjr $ra\n"); } } ) ; function_body: statement_list ; statement_list: (statement)+ ; statement: |assignment_statement |return_statement |if_statement |while_statement | print_statement | expression SEMI | read_statement ; From michael.scholz at gmail.com Fri Jan 22 19:24:31 2010 From: michael.scholz at gmail.com (Michael Scholz) Date: Fri, 22 Jan 2010 19:24:31 -0800 Subject: [antlr-interest] ANTLR In-Reply-To: <61e8cbbd1001211835g4a148662i116658ad16d4692a@mail.gmail.com> References: <61e8cbbd1001211835g4a148662i116658ad16d4692a@mail.gmail.com> Message-ID: <61e8cbbd1001221924h693195faq51a83d24fef119e5@mail.gmail.com> I'm trying to do something fairly simple. A variation of the tweak example, based on the 1pass template rewrite concept. I also would like to dup the whitespace as the attached code attempts to do, but it doesn't work. Help? MS -------------- next part -------------- A non-text attachment was scrubbed... Name: mytest.zip Type: application/zip Size: 2369 bytes Desc: not available Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20100122/0b14bce7/attachment.zip From michael.scholz at gmail.com Fri Jan 22 23:39:28 2010 From: michael.scholz at gmail.com (Michael Scholz) Date: Fri, 22 Jan 2010 23:39:28 -0800 Subject: [antlr-interest] ANTLR In-Reply-To: <61e8cbbd1001221924h693195faq51a83d24fef119e5@mail.gmail.com> References: <61e8cbbd1001211835g4a148662i116658ad16d4692a@mail.gmail.com> <61e8cbbd1001221924h693195faq51a83d24fef119e5@mail.gmail.com> Message-ID: <61e8cbbd1001222339i29be92e6pedf86f4da8e46d2c@mail.gmail.com> So I basically solved what I was after. Code is attached, for your comments. (and posterity) MS On Fri, Jan 22, 2010 at 7:24 PM, Michael Scholz wrote: > I'm trying to do something fairly simple. A variation of the tweak example, > based on the 1pass template rewrite concept. I also would like to dup the > whitespace as the attached code attempts to do, but it doesn't work. > > Help? > MS > > -------------- next part -------------- A non-text attachment was scrubbed... Name: mytest.zip Type: application/zip Size: 2374 bytes Desc: not available Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20100122/fb4756bb/attachment.zip From serega.sheypak at gmail.com Sat Jan 23 03:55:04 2010 From: serega.sheypak at gmail.com (Serega Sheypak) Date: Sat, 23 Jan 2010 14:55:04 +0300 Subject: [antlr-interest] Typographing text with antlr Message-ID: <197382531001230355p63c1c275k218afbec0825c4d0@mail.gmail.com> Hi guys, I've really impressed with the power of ANTLR. I need your advice. I would like to develop typograph application. I've tried to do it with the help of regular expressions, but it's extremely hard to maintain it. I write crazy rregex and in few months I'can get how does it work:) Typical situation when work with complicated regex'es. I've seen named regex in Ruby (I use Ruby, Rais, Groovy, Java, JavaFX) but they don't bring so clear evidence like ANTLR does. Nice declarative ANTLR style should help a lot. I am from Russia and I'm working with web texts in Russian. Please, see short description of the task. ANTLR based (Target lang -> Ruby) application accepts usual text typed in browser. Application should apply several characters transformation rules and emit well-typographed text. Example rules are: 1. "Something in quotes" -> «Something in quotes » 2. "Some text goes nere "Oh, my something in quotes again!" " -> «Some text goes here „Oh my something in quotes again!“ » 3. (r) -> *?*, 4. (c) -> *?*, 5. (tm) -> ? 6. someWord-someOtherWord -> someWord – someOtherWord Special rule for first quote 7. "some text goes here... -> «some text goes here... and many other rules. The first I would like to do it for Russian lang, nex will be English. What do you think, is ANTLR nice for such task, is it convenient to solve such task using ANTLR? Thank you for your attention, waiting for your considerations. From stevenraemaekers at gmail.com Sat Jan 23 08:36:59 2010 From: stevenraemaekers at gmail.com (Steven Raemaekers) Date: Sat, 23 Jan 2010 17:36:59 +0100 Subject: [antlr-interest] Making a distinction between float and int calculation Message-ID: <46450b021001230836h1966343fpd52991913f3a9913@mail.gmail.com> Hello, In my grammar there should be an evaluator for numeric expressions. These numeric expressions should return an integer, or a float, depending on the contents of the expression. For example: 3 + 2.0: should return float 3 + 2: should return integer 2.0 + 3.0: should return float 1 / 3: should return float 4 / 2: should return int In my grammar there is only one rule for a numeric expression. I do not know whether I should duplicate the entire operator precedence rules for the distinction between float and int. The following statements are part of my grammar: expression : list | quotedword | booleanexpression ; booleanexpression : numericexpression (BOOL^ numericexpression)* ; numericexpression : mult ((PLUS^ | MINUS^) mult)* ; mult : atom ((MULTIPLY^ | DIVIDE^) atom)* ; atom : INT | FLOAT | ID | LEFTPAREN expression RIGHTPAREN -> ^(EXPRESSION expression) ; Does anybody have a idea how I should take care of this distinction between float and int? Or is this distinction even necessary? -- Regards, Steven From endigitalmind at yahoo.co.uk Sat Jan 23 13:50:57 2010 From: endigitalmind at yahoo.co.uk (Phil Ritchie) Date: Sat, 23 Jan 2010 13:50:57 -0800 (PST) Subject: [antlr-interest] Quantifiers Message-ID: <929543.17622.qm@web23305.mail.ird.yahoo.com> I think ANTLR might be a quick way for me to build a validating lexer/parser. The file I want to validate is essentially a comma separated values file but the content of individual fields must adhere to content and length restrictions. One field specification I can't seem to find a way of declaring is (in regular expression form): [a-zA-Z]{1,128}. ? Is there a way I could approach this? ? From oliver.zeigermann at gmail.com Sun Jan 24 01:03:35 2010 From: oliver.zeigermann at gmail.com (Oliver Zeigermann) Date: Sun, 24 Jan 2010 10:03:35 +0100 Subject: [antlr-interest] Anyone in the whole world doing multi step tree transformation? Message-ID: <9da4f4521001240103r5505ee05oc3391065be6bdbee@mail.gmail.com> Folks! I was just wondering if anyone except me is actually doing tree transformations using ANTLR. I use the tree transformation feature introduced in 3.1. While this does work well, it is so very hard to refactor or extend my tree structures as I have to change all my transformer stages and have no tool support to find out what to change and where. I started using heterogenous tokens with normalized children to make use of compiler type checking which helps, but does not comletely solve my issues as I still have an unchecked children list - which I need to traverse the tree using tree walkes. I was considering skipping the whole grammar driving tree transformation step, but what should I replace it with? I know of the xtext approach that uses non normalized heterogenous tokens generated from a common model shared by all transformation parts. Which seems like a good idea, however, does not seem to have a means powerful enough to do serious tree transformation. Any experiences? Hints? Thanks in advance - Oliver From e0309169 at student.tuwien.ac.at Sun Jan 24 04:35:36 2010 From: e0309169 at student.tuwien.ac.at (Mikolaj Koziarkiewicz) Date: Sun, 24 Jan 2010 13:35:36 +0100 Subject: [antlr-interest] Quantifiers In-Reply-To: <929543.17622.qm@web23305.mail.ird.yahoo.com> References: <929543.17622.qm@web23305.mail.ird.yahoo.com> Message-ID: <4B5C3E98.2050605@student.tuwien.ac.at> Hi Phil, could you provide a textual definition of your grammar, and/or your ANTLR specification so far? Cheers, Nick > I think ANTLR might be a quick way for me to build a validating lexer/parser. The file I want to validate is essentially a comma separated values file but the content of individual fields must adhere to content and length restrictions. One field specification I can't seem to find a way of declaring is (in regular expression form): [a-zA-Z]{1,128}. > > Is there a way I could approach this? > > > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > From greneche.hugo at gmail.com Sun Jan 24 10:18:56 2010 From: greneche.hugo at gmail.com (Hugo) Date: Sun, 24 Jan 2010 19:18:56 +0100 Subject: [antlr-interest] newbie needs help In-Reply-To: <1264109124.9363.10.camel@gecko.home.org> References: <4B58A90E.5020401@gmail.com> <1264109124.9363.10.camel@gecko.home.org> Message-ID: <4B5C8F10.2080608@gmail.com> Thanks you for all... but i have another problem because, my file also contains some kind of function with the following format: FUNCTION_A //without parameters FUNCTION_B /opt1 /opt2 //2 parameters FUNCTION_C A0 B0 %VAR_MYVARIABLE // data with the bytes format the name of the function are name starting always with FUNCTION_ the problem is that where a NEWLINE is detected, it is considered like a "bytes" and it's a problem for this function and a function like FUNCTION_C is badly detected Could you give your precious help thanks in advance John B. Brodie a ?crit : > Greetings! > > On Thu, 2010-01-21 at 20:20 +0100, Hugo wrote: > >> I started using antlr to parse a specific file format. >> The problem is that i don't know how to write correctly my grammar. >> >> The file have the following format. >> It contains multiple lines and each can have the following format: >> >> Only one or multilple hexadecimal caracter with space or not >> ex: A0 A4 B5 77 >> or: A0 >> >> Only variable identifier with the format VAR_XXX >> ex: VAR_MY_VARIABLE >> >> Or the combinaison of the two previous format >> ex: >> A0 A4B5 VAR_MY_VARIABLE 77 98 VAR_MY_VARIABLE2 >> or >> VAR_MY_VARIABLE AA BB >> or >> AA BB VAR_MY_VARIABLE >> >> >> what i want to do is to build a AST tree >> > > attached please find a grammar file that is *almost* what I think you > are trying to do. > > It does not have a MULTIPLE_BYTES_DEF node because the grouping of a > collection of single_byte instances into a multibyte is ambiguous. > Consider > > 11 22 33 44 55 66 77 88 > > is this 8 single bytes? 1 single byte and 7-long multi? is it 4 multi > pairs? a triple, a single and a quad? > > i kinda expect you want it to be a single 8-long multi, e.g. any run of > single bytes becomes a multi. But that is a semantic of your language > and getting a parser to do semantics isn't always possible.... > > if you really need the MULTIPLE_BYTE_DEF node, you might be best served > by parsing using some like my code (e.g. the parser produces only > BYTE_DEF nodes) and then write a tree-walker that transforms the AST > resultant from the parse into a new AST that contains the requisite > MULTIPLE_BYTE_DEF nodes. e.g. scan for and collapse sequences of > consecutive EXPR_DEF nodes that have BYTE_DEF children into a single > EXPR_DEF node containing a single MULTIPLE_BYTE_DEF child. > > >> And the problem is that i don't know how to do this with antlr. the tool >> always tell me that multiple rule can be applies with my grammar. >> >> please help me to solve my problem. >> >> Here is my grammar: >> >> stmts : bytes+ ; >> >> >> bytes : multiple_byte bytes? -> ^(EXPR_DEF multiple_byte bytes? ) >> >> | define_expression bytes? -> ^(EXPR_DEF define_expression bytes? ) >> >> | NEWLINE ; >> >> define_expression : define_var -> ^(DEFINE_VAR_DEF define_var) ; >> >> define_var : DEFINE_VARIABLE ; >> multiple_byte : single_byte (single_byte)+ -> ^(MULTIPLE_BYTES_DEF >> single_byte single_byte+) ; >> >> >> single_byte : byte_digit -> ^(BYTES_DEF byte_digit) ; >> >> byte_digit : BYTE_DIGIT ; >> >> DEFINE_VARIABLE : >> 'VAR_'('a'..'z'|'A'..'Z'|'_')('a'..'z'|'A'..'Z'|'0'..'9'|'_')*; >> >> BYTE_DIGIT :('0'..'9'| 'A'..'F'|'a'..'f')('0'..'9'| 'A'..'F'|'a'..'f') ; >> >> // Ignore whitespace, tab and escape sequence WS : (' '|'\t'|'\\\r\n')+ >> {$channel = HIDDEN;} ; >> >> // a new line NEWLINE : '\r'? '\n' ; >> >> thanks a lot >> > > hope this helps... > -jbb > > From parrt at cs.usfca.edu Sun Jan 24 12:01:00 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Sun, 24 Jan 2010 12:01:00 -0800 Subject: [antlr-interest] org.antlr.v4.* ??? Message-ID: Hi. to avoid classes where antlr v3 and v4 have to coexist in a project, is it ok if i use org.antlr.v4 as the root package? ST v4 is cool since it's org.stringtemplate.* old was org.antlr.stringtemplate.* Ter From scott at javadude.com Sun Jan 24 12:11:37 2010 From: scott at javadude.com (Scott Stanchfield) Date: Sun, 24 Jan 2010 15:11:37 -0500 Subject: [antlr-interest] org.antlr.v4.* ??? In-Reply-To: References: Message-ID: Sounds cool, but I'd suggest using a similar convention for both antlr and stringtemplate so it'll be easier come v5 ;) -- Scott ---------------------------------------- Scott Stanchfield http://javadude.com On Sun, Jan 24, 2010 at 3:01 PM, Terence Parr wrote: > Hi. to avoid classes where antlr v3 and v4 have to coexist in a project, is it ok if i use org.antlr.v4 as the root package? > > ST v4 is cool since it's org.stringtemplate.* old was org.antlr.stringtemplate.* > > Ter > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > From parrt at cs.usfca.edu Sun Jan 24 12:14:06 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Sun, 24 Jan 2010 12:14:06 -0800 Subject: [antlr-interest] org.antlr.v4.* ??? In-Reply-To: References: Message-ID: <1801E8A3-536B-4337-B2A0-64F7DA61A13E@cs.usfca.edu> good point. so org.stringtemplate.v4.* and org.antlr.v4.* right? Ter On Jan 24, 2010, at 12:11 PM, Scott Stanchfield wrote: > Sounds cool, but I'd suggest using a similar convention for both antlr > and stringtemplate so it'll be easier come v5 ;) > > -- Scott From scott at javadude.com Sun Jan 24 12:15:54 2010 From: scott at javadude.com (Scott Stanchfield) Date: Sun, 24 Jan 2010 15:15:54 -0500 Subject: [antlr-interest] org.antlr.v4.* ??? In-Reply-To: <1801E8A3-536B-4337-B2A0-64F7DA61A13E@cs.usfca.edu> References: <1801E8A3-536B-4337-B2A0-64F7DA61A13E@cs.usfca.edu> Message-ID: Sounds good. Normally I'd say use the same package names as before, but because you're doing a rewrite having a different package name is a good idea imho. -- Scott ---------------------------------------- Scott Stanchfield http://javadude.com On Sun, Jan 24, 2010 at 3:14 PM, Terence Parr wrote: > good point. so org.stringtemplate.v4.* and org.antlr.v4.* right? > > Ter > On Jan 24, 2010, at 12:11 PM, Scott Stanchfield wrote: > >> Sounds cool, but I'd suggest using a similar convention for both antlr >> and stringtemplate so it'll be easier come v5 ;) >> >> -- Scott > > From endigitalmind at yahoo.co.uk Sun Jan 24 12:29:20 2010 From: endigitalmind at yahoo.co.uk (Phil Ritchie) Date: Sun, 24 Jan 2010 12:29:20 -0800 (PST) Subject: [antlr-interest] Quantifiers Message-ID: <698757.2976.qm@web23307.mail.ird.yahoo.com> Mikolaj ? I haven't attempted a grammar yet but below is a textual example: ? The file should contain three fields called "jobNo", "description" and "cost". ? The fields should adhere to the following specifications (regex in braces): jobNo:? must be digits only, maximum 5 - (\d{1,5}) description:? any lowercase characters or space upto a maximum of 128 - ([a-z ]{1,128}) cost:? positive or negative amount formatted as upto 5 digits before the decimal and 4 afterwards zero padded - (-?\d?\d?\d?\d?\d\.\d\d\d\d) ? E.g. ? "jobNo","description","cost" "12345","this record conforms","123.4321" "987","this RECORD does not conform because of uppercase usage","-22.44" ? Phil. --- On Sun, 24/1/10, Mikolaj Koziarkiewicz wrote: From: Mikolaj Koziarkiewicz Subject: Re: [antlr-interest] Quantifiers To: "Phil Ritchie" Cc: antlr-interest at antlr.org Date: Sunday, 24 January, 2010, 12:35 Hi Phil, could you provide a textual definition of your grammar, and/or your ANTLR specification so far? Cheers, Nick > I think ANTLR might be a quick way for me to build a validating lexer/parser. The file I want to validate is essentially a comma separated values file but the content of individual fields must adhere to content and length restrictions. One field specification I can't seem to find a way of declaring is (in regular expression form): [a-zA-Z]{1,128}. >? Is there a way I could approach this? >? > >? ? ??? > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > From sharwell at pixelminegames.com Sun Jan 24 12:47:22 2010 From: sharwell at pixelminegames.com (Sam Harwell) Date: Sun, 24 Jan 2010 14:47:22 -0600 Subject: [antlr-interest] org.antlr.v4.* ??? References: Message-ID: What about org.antlr.compiler.*? -----Original Message----- From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Terence Parr Sent: Sunday, January 24, 2010 2:01 PM To: antlr-interest at antlr.org interest Subject: [antlr-interest] org.antlr.v4.* ??? Hi. to avoid classes where antlr v3 and v4 have to coexist in a project, is it ok if i use org.antlr.v4 as the root package? ST v4 is cool since it's org.stringtemplate.* old was org.antlr.stringtemplate.* Ter List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From parrt at cs.usfca.edu Sun Jan 24 12:49:52 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Sun, 24 Jan 2010 12:49:52 -0800 Subject: [antlr-interest] org.antlr.v4.* ??? In-Reply-To: References: Message-ID: what does compiler mean here? Ter On Jan 24, 2010, at 12:47 PM, Sam Harwell wrote: > What about org.antlr.compiler.*? > > -----Original Message----- > From: antlr-interest-bounces at antlr.org > [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Terence Parr > Sent: Sunday, January 24, 2010 2:01 PM > To: antlr-interest at antlr.org interest > Subject: [antlr-interest] org.antlr.v4.* ??? > > Hi. to avoid classes where antlr v3 and v4 have to coexist in a project, > is it ok if i use org.antlr.v4 as the root package? > > ST v4 is cool since it's org.stringtemplate.* old was > org.antlr.stringtemplate.* > > Ter > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: > http://www.antlr.org/mailman/options/antlr-interest/your-email-address From sharwell at pixelminegames.com Sun Jan 24 12:54:47 2010 From: sharwell at pixelminegames.com (Sam Harwell) Date: Sun, 24 Jan 2010 14:54:47 -0600 Subject: [antlr-interest] org.antlr.v4.* ??? References: Message-ID: I guess you could use codegen.* for the tool instead, so you'd end up with org.antlr.runtime, org.antlr.codegen, etc. -----Original Message----- From: Terence Parr [mailto:parrt at cs.usfca.edu] Sent: Sunday, January 24, 2010 2:50 PM To: Sam Harwell Cc: antlr-interest at antlr.org Subject: Re: [antlr-interest] org.antlr.v4.* ??? what does compiler mean here? Ter On Jan 24, 2010, at 12:47 PM, Sam Harwell wrote: > What about org.antlr.compiler.*? > > -----Original Message----- > From: antlr-interest-bounces at antlr.org > [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Terence Parr > Sent: Sunday, January 24, 2010 2:01 PM > To: antlr-interest at antlr.org interest > Subject: [antlr-interest] org.antlr.v4.* ??? > > Hi. to avoid classes where antlr v3 and v4 have to coexist in a project, > is it ok if i use org.antlr.v4 as the root package? > > ST v4 is cool since it's org.stringtemplate.* old was > org.antlr.stringtemplate.* > > Ter > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: > http://www.antlr.org/mailman/options/antlr-interest/your-email-address From oliver.zeigermann at gmail.com Sun Jan 24 14:29:01 2010 From: oliver.zeigermann at gmail.com (Oliver Zeigermann) Date: Sun, 24 Jan 2010 23:29:01 +0100 Subject: [antlr-interest] org.antlr.v4.* ??? In-Reply-To: References: Message-ID: <9da4f4521001241429i22d3da55l5d3cb12532a6bbeb@mail.gmail.com> Very good idea. 2010/1/24 Terence Parr : > Hi. to avoid classes where antlr v3 and v4 have to coexist in a project, is it ok if i use org.antlr.v4 as the root package? > > ST v4 is cool since it's org.stringtemplate.* old was org.antlr.stringtemplate.* > > Ter > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > From parrt at cs.usfca.edu Sun Jan 24 16:40:43 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Sun, 24 Jan 2010 16:40:43 -0800 Subject: [antlr-interest] gunit use in ANTLR v4 Message-ID: Hiya, been using Leon's gunit to test ANTLR's AST builder...works great! I made a few improvements (for next v3 ANTLR release). Here's a few samples: grammarSpec: "parser grammar P; a : A;" -> (PARSER_GRAMMAR P (RULES (RULE a (BLOCK (ALT A))))) << parser grammar P; options {k=2; output=AST;} scope S {int x} tokens { A; B='33'; } @header {foo} a : A; >> -> (PARSER_GRAMMAR P (OPTIONS (= k 2) (= output AST)) (scope S {int x}) (tokens { A (= B '33')) (@ header {foo}) (RULES (RULE a (BLOCK (ALT A))))) block: "( ^(A B) | ^(b C) )" -> (BLOCK (ALT ("^(" A B)) (ALT ("^(" b C))) alternative: "x+=ID* -> $x*" -> (ALT_REWRITE (ALT (* (BLOCK (ALT (+= x ID))))) (-> (ALT (* (BLOCK (ALT x)))))) "A -> ..." -> (ALT_REWRITE (ALT A) (-> ...)) "A -> " -> (ALT_REWRITE (ALT A) (-> EPSILON)) element: "b+" -> (+ (BLOCK (ALT b))) "(b)+" -> (+ (BLOCK (ALT b))) "b?" -> (? (BLOCK (ALT b))) "(b)?" -> (? (BLOCK (ALT b))) "(b)*" -> (* (BLOCK (ALT b))) "b*" -> (* (BLOCK (ALT b))) "'while'*" -> (* (BLOCK (ALT 'while'))) "'a'+" -> (+ (BLOCK (ALT 'a'))) "a[3]" -> (a 3) "'a'..'z'+" -> (+ (BLOCK (ALT (.. 'a' 'z')))) Pretty cool, eh? Ter From wclodius at los-alamos.net Sun Jan 24 19:51:54 2010 From: wclodius at los-alamos.net (William B. Clodius) Date: Sun, 24 Jan 2010 20:51:54 -0700 Subject: [antlr-interest] Making a distinction between float and int calculation In-Reply-To: <46450b021001231403i3305f8cfsc032169f3dd91658@mail.gmail.com> References: <46450b021001230836h1966343fpd52991913f3a9913@mail.gmail.com> <28F5E254-3E2E-4FC7-A856-F12C7E6EFA76@los-alamos.net> <46450b021001231403i3305f8cfsc032169f3dd91658@mail.gmail.com> Message-ID: Steven: I should have originally posted my answer to antlr-interest not directly to you. So far I have only been using ANTLR to test the lexing and parsing of a language I am developing as a hobby. I have read the tree parsing material in the ANTLR reference but have not used it. Roughly what I would do is have each of INT and float have two attributes associated with it: a type (INT and FLOAT respectively) and a value, to be determined by the string that represents the value. Similarly an ID and an expression would also have a type and a value associated with them. Since you include boolean expressions you will have to decide if you want to explicitly have a boolean type or just have it be an integer type. For these other entities the types and values will have to be determined as you traverse the tree. For example for numericexpression you should examine the types of the two atoms, if both have type INT, then the type of multexpression is INT and the value is the sum of the two values, if both are FLOAT the corresponding logic is used, if one is INT and the other is FLOAT then you should include an intermediate step that converts the INT to a FLOAT, both in type and value, then perform the numeric operation. Read Chapter 6 of the reference if you have it. On Jan 23, 2010, at 3:03 PM, Steven Raemaekers wrote: > Hi William, > > How should i do this exactly in ANTLR? Should I test for this in my Tree walker? I do not have a clue where to start, when I make my numericexpression like this: > > numericexpression returns [int value] > : ^(PLUS mult1 = mult mult2 = mult) { $value = 20; } > | ^(MINUS mult1 = mult mult2 = mult) { $value = 20; } > | ^(MULTIPLY mult1 = mult mult2 = mult) { $value = 20; } > | ^(DIVIDE mult1 = mult mult2 = mult) { $value = 20; } > ; > > What value should it return, if all mults can be either floats or integers? > > Thanks, > > Steven > > On Sat, Jan 23, 2010 at 8:19 PM, William B. Clodius wrote: > THis is normally done as part of the semantic evaluation not as parsing. If and when you start including named entities you will normally be unable to make this distinction using syntax (unless you require integers and floats to have special categories of names). Putting it off until the semantics analysis also allows better error reporting, if you should say make assignment and comparison equalities both valid expressions. > > On Jan 23, 2010, at 9:36 AM, Steven Raemaekers wrote: > > > Hello, > > > > In my grammar there should be an evaluator for numeric expressions. These > > numeric expressions should return an integer, or a float, depending on the > > contents of the expression. > > For example: > > > > 3 + 2.0: should return float > > 3 + 2: should return integer > > 2.0 + 3.0: should return float > > 1 / 3: should return float > > 4 / 2: should return int > > > > In my grammar there is only one rule for a numeric expression. I do not know > > whether I should duplicate the entire operator precedence rules for the > > distinction between float and int. > > The following statements are part of my grammar: > > > > expression > > : list > > | quotedword > > | booleanexpression > > ; > > > > booleanexpression > > : numericexpression (BOOL^ numericexpression)* > > ; > > > > numericexpression > > : mult ((PLUS^ | MINUS^) mult)* > > ; > > > > mult > > : atom ((MULTIPLY^ | DIVIDE^) atom)* > > ; > > > > atom > > : INT > > | FLOAT > > | ID > > | LEFTPAREN expression RIGHTPAREN > > -> ^(EXPRESSION expression) > > ; > > > > Does anybody have a idea how I should take care of this distinction between > > float and int? Or is this distinction even necessary? > > > > -- > > Regards, > > > > Steven > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > > From lgcraymer at yahoo.com Sun Jan 24 22:25:58 2010 From: lgcraymer at yahoo.com (Loring Craymer) Date: Sun, 24 Jan 2010 22:25:58 -0800 (PST) Subject: [antlr-interest] Anyone in the whole world doing multi step tree transformation? In-Reply-To: <9da4f4521001240103r5505ee05oc3391065be6bdbee@mail.gmail.com> References: <9da4f4521001240103r5505ee05oc3391065be6bdbee@mail.gmail.com> Message-ID: <72961.38549.qm@web55905.mail.re3.yahoo.com> Oliver-- Some key points: 1.) Capture semantics rather than designing tree structures. 2.) Preserve grammar structure--that is, rule a in pass n becomes rule a in pass n+1 unless there is reason to do otherwise. 3.) Avoid cluttering your grammars with action code. 4.) Separate analysis passes from transformation passes. Follow those principles, and you'll find that rippling changes across grammars is tedious, but not a real problem. --Loring ----- Original Message ---- > From: Oliver Zeigermann > To: antlr-interest Interest > Sent: Sun, January 24, 2010 1:03:35 AM > Subject: [antlr-interest] Anyone in the whole world doing multi step tree transformation? > > Folks! > > I was just wondering if anyone except me is actually doing tree > transformations using ANTLR. I use the tree transformation feature > introduced in 3.1. While this does work well, it is so very hard to > refactor or extend my tree structures as I have to change all my > transformer stages and have no tool support to find out what to change > and where. > > I started using heterogenous tokens with normalized children to make > use of compiler type checking which helps, but does not comletely > solve my issues as I still have an unchecked children list - which I > need to traverse the tree using tree walkes. > > I was considering skipping the whole grammar driving tree > transformation step, but what should I replace it with? > > I know of the xtext approach that uses non normalized heterogenous > tokens generated from a common model shared by all transformation > parts. Which seems like a good idea, however, does not seem to have a > means powerful enough to do serious tree transformation. > > Any experiences? Hints? > > Thanks in advance > > - Oliver > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: > http://www.antlr.org/mailman/options/antlr-interest/your-email-address From oliver.zeigermann at gmail.com Mon Jan 25 01:00:27 2010 From: oliver.zeigermann at gmail.com (Oliver Zeigermann) Date: Mon, 25 Jan 2010 10:00:27 +0100 Subject: [antlr-interest] Anyone in the whole world doing multi step tree transformation? In-Reply-To: <72961.38549.qm@web55905.mail.re3.yahoo.com> References: <9da4f4521001240103r5505ee05oc3391065be6bdbee@mail.gmail.com> <72961.38549.qm@web55905.mail.re3.yahoo.com> Message-ID: <9da4f4521001250100w175aeb6bwfd9bf43d7b5551c4@mail.gmail.com> Hey, Loring! Thanks for your help. Two questions: 1.) This does not help when you do major refactorings, doest it ;) 2.) Where do you store the information you gather in analysis? I stick them back into the tree or put them into (symbol-)tables. If you do so as well: What do you do if the tree data in tables has to be processed in subsequent tree transformation steps? How do you pass in the data? Any thoughts? - Oliver 2010/1/25 Loring Craymer : > Oliver-- > > Some key points: > 1.) ?Capture semantics rather than designing tree structures. > 2.) ?Preserve grammar structure--that is, rule a in pass n becomes rule a in pass n+1 unless there is reason to do otherwise. > 3.) ?Avoid cluttering your grammars with action code. > 4.) ?Separate analysis passes from transformation passes. > > Follow those principles, and you'll find that rippling changes across grammars is tedious, but not a real problem. > > --Loring > > > > > ----- Original Message ---- >> From: Oliver Zeigermann >> To: antlr-interest Interest >> Sent: Sun, January 24, 2010 1:03:35 AM >> Subject: [antlr-interest] Anyone in the whole world doing multi step tree transformation? >> >> Folks! >> >> I was just wondering if anyone except me is actually doing tree >> transformations using ANTLR. I use the tree transformation feature >> introduced in 3.1. While this does work well, it is so very hard to >> refactor or extend my tree structures as I have to change all my >> transformer stages and have no tool support to find out what to change >> and where. >> >> I started using heterogenous tokens with normalized children to make >> use of compiler type checking which helps, but does not comletely >> solve my issues as I still have an unchecked children list - which I >> need to traverse the tree using tree walkes. >> >> I was considering skipping the whole grammar driving tree >> transformation step, but what should I replace it with? >> >> I know of the xtext approach that uses non normalized heterogenous >> tokens generated from a common model shared by all transformation >> parts. Which seems like a good idea, however, does not seem to have a >> means powerful enough to do serious tree transformation. >> >> Any experiences? Hints? >> >> Thanks in advance >> >> - Oliver >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> Unsubscribe: >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > > > > > From lgcraymer at yahoo.com Mon Jan 25 02:22:16 2010 From: lgcraymer at yahoo.com (Loring Craymer) Date: Mon, 25 Jan 2010 02:22:16 -0800 (PST) Subject: [antlr-interest] Anyone in the whole world doing multi step tree transformation? In-Reply-To: <9da4f4521001250100w175aeb6bwfd9bf43d7b5551c4@mail.gmail.com> References: <9da4f4521001240103r5505ee05oc3391065be6bdbee@mail.gmail.com> <72961.38549.qm@web55905.mail.re3.yahoo.com> <9da4f4521001250100w175aeb6bwfd9bf43d7b5551c4@mail.gmail.com> Message-ID: <745659.10881.qm@web55907.mail.re3.yahoo.com> Oliver-- 1.) There are languages where semantically equivalent data--argument lists and the like--appears in different places with different syntax. In those cases you can end up with tree grammars that radically differ from the parser grammar, but then the tree grammars can usually be kept reasonably consistent. I happen to believe that the refactoring should be done with tool assistance (a refactoring editor) so that it might be possible to reconstruct the refactorings. Without tool assistance, though, the consolation in these cases is that the tree grammars end up being simpler than the parser grammar. 2.) In Yggdrasil, data structures that need to be preserved across passes are declared as "public" attributes of the grammars and propagated in the target language wrapper code that invokes each pass. --Loring ----- Original Message ---- > From: Oliver Zeigermann > To: Loring Craymer > Cc: antlr-interest Interest > Sent: Mon, January 25, 2010 1:00:27 AM > Subject: Re: [antlr-interest] Anyone in the whole world doing multi step tree transformation? > > Hey, Loring! > > Thanks for your help. Two questions: > 1.) This does not help when you do major refactorings, doest it ;) > 2.) Where do you store the information you gather in analysis? I stick > them back into the tree or put them into (symbol-)tables. If you do so > as well: What do you do if the tree data in tables has to be processed > in subsequent tree transformation steps? How do you pass in the data? > > Any thoughts? > > - Oliver > > 2010/1/25 Loring Craymer : > > Oliver-- > > > > Some key points: > > 1.) Capture semantics rather than designing tree structures. > > 2.) Preserve grammar structure--that is, rule a in pass n becomes rule a in > pass n+1 unless there is reason to do otherwise. > > 3.) Avoid cluttering your grammars with action code. > > 4.) Separate analysis passes from transformation passes. > > > > Follow those principles, and you'll find that rippling changes across grammars > is tedious, but not a real problem. > > > > --Loring > > > > > > > > > > ----- Original Message ---- > >> From: Oliver Zeigermann > >> To: antlr-interest Interest > >> Sent: Sun, January 24, 2010 1:03:35 AM > >> Subject: [antlr-interest] Anyone in the whole world doing multi step tree > transformation? > >> > >> Folks! > >> > >> I was just wondering if anyone except me is actually doing tree > >> transformations using ANTLR. I use the tree transformation feature > >> introduced in 3.1. While this does work well, it is so very hard to > >> refactor or extend my tree structures as I have to change all my > >> transformer stages and have no tool support to find out what to change > >> and where. > >> > >> I started using heterogenous tokens with normalized children to make > >> use of compiler type checking which helps, but does not comletely > >> solve my issues as I still have an unchecked children list - which I > >> need to traverse the tree using tree walkes. > >> > >> I was considering skipping the whole grammar driving tree > >> transformation step, but what should I replace it with? > >> > >> I know of the xtext approach that uses non normalized heterogenous > >> tokens generated from a common model shared by all transformation > >> parts. Which seems like a good idea, however, does not seem to have a > >> means powerful enough to do serious tree transformation. > >> > >> Any experiences? Hints? > >> > >> Thanks in advance > >> > >> - Oliver > >> > >> List: http://www.antlr.org/mailman/listinfo/antlr-interest > >> Unsubscribe: > >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > > > > > > > > > > > From ranco.marcus at epirion.nl Mon Jan 25 02:35:56 2010 From: ranco.marcus at epirion.nl (Ranco Marcus) Date: Mon, 25 Jan 2010 10:35:56 +0000 Subject: [antlr-interest] Antlr does not generate Lexer from a composite grammar In-Reply-To: <93BD0000E4D72D458F0E8CDE6BA971A80EBFECBD@CINMLVEM11.e2k.ad.ge.com> References: <93BD0000E4D72D458F0E8CDE6BA971A80EBFECBD@CINMLVEM11.e2k.ad.ge.com> Message-ID: <2B65C901391C804DBB9CF9E6FE30C6F914976940@sun.epirion.local> I experienced the same problem and did not find a proper solution for it. As a work-around, I have found that adding a dummy lexer rule to the composite grammar causes the lexer to be generated. grammar C ; import L, P2 ; stuff : ( letters spaces )+ ; dummy : 'DUMMY'; In general, I would expect that no parser or lexer rule is required in the composite grammar. This way, we can use the composite grammer only as a way to glue things together and specify generation options for a particular use. I hope this is of any help to you. Best regards, Ranco Marcus From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Stevenson, Todd (GE Healthcare, consultant) Sent: dinsdag 1 december 2009 19:09 To: antlr-interest at antlr.org Subject: [antlr-interest] Antlr does not generate Lexer from a composite grammar I built the grammars described at the bottom of the Composite Grammars page of the Antlr documentation(i.e. L, P1, P2, and C). When I run Antlr with no command line arguments on the combined grammar 'C', it generates C_P1.java, C_P2_P1.java, and CParser.java, but does not generate CLexer.java. Is this correct behavior? If so, how to I call the lexer from my source java program? When I build the other grammars on that page (Root and Delegate), and run Antlr on 'Root', it generates RootParser.java Root_Delegate.java and RootLexer.java. thanks. I tried it with Antlr 3.2 and Antlr 3.1.3. From candide at palacehotel.org Mon Jan 25 04:17:30 2010 From: candide at palacehotel.org (Candide Kemmler) Date: Mon, 25 Jan 2010 13:17:30 +0100 Subject: [antlr-interest] antlrworks interpreter like serialized parse tree Message-ID: <74DD45A5-DD1E-41C3-819D-2032293EF2A9@palacehotel.org> Hi, I'm very happy with my antlr results so far, and next step is to use antlr's output to add a code-completion like feature to my application. I love the parse tree representation that antlrWorks presents and getting such a structure would be ideal for my use case. However I can't seem to find a way to create a similar representation of the parse tree using the API. Any ideas? Candide From candide at palacehotel.org Mon Jan 25 04:48:24 2010 From: candide at palacehotel.org (Candide Kemmler) Date: Mon, 25 Jan 2010 13:48:24 +0100 Subject: [antlr-interest] antlrworks interpreter like serialized parse tree In-Reply-To: References: <74DD45A5-DD1E-41C3-819D-2032293EF2A9@palacehotel.org> Message-ID: That's very interesting. I don't want to create an image, no: only a structured data representation (XML or JSON). Can you elaborate a little bit on how to enable the debug option ("debug = true" is not working for me) and then how to listen to the debugging events? Thanks a lot for your quick and enlightening answer :-) Candide On 25 Jan 2010, at 13:23, Scott Stanchfield wrote: > It's captured using the debugging API. ANTLRWorks listens to debugging > events from your parser (when it's generated with the debug option) > and hears when rules are entered and exited. > > You could use these events to build a tree (I'm working on an > AST-diagram generator for eclipse using the debug API, using Eclipse's > Zest framework for the diagram). > > If you just want images, I would recommend that you use the debugging > api to capture the enters/exits and then create a GraphViz dot file. > Check out http://www.graphviz.org. You can use it to generate many > graphics file formats. > -- Scott > > ---------------------------------------- > Scott Stanchfield > http://javadude.com > > > > On Mon, Jan 25, 2010 at 7:17 AM, Candide Kemmler > wrote: >> Hi, >> >> I'm very happy with my antlr results so far, and next step is to use antlr's output to add a code-completion like feature to my application. >> >> I love the parse tree representation that antlrWorks presents and getting such a structure would be ideal for my use case. However I can't seem to find a way to create a similar representation of the parse tree using the API. >> >> Any ideas? >> >> Candide >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address >> From antlr at mirality.co.nz Mon Jan 25 12:14:59 2010 From: antlr at mirality.co.nz (Gavin Lambert) Date: Tue, 26 Jan 2010 09:14:59 +1300 Subject: [antlr-interest] antlrworks interpreter like serialized parse tree In-Reply-To: <74DD45A5-DD1E-41C3-819D-2032293EF2A9@palacehotel.org> References: <74DD45A5-DD1E-41C3-819D-2032293EF2A9@palacehotel.org> Message-ID: <20100125201513.24B0A341840F@www.antlr.org> At 01:17 26/01/2010, Candide Kemmler wrote: >I love the parse tree representation that antlrWorks presents and >getting such a structure would be ideal for my use case. However I >can't seem to find a way to create a similar representation of the >parse tree using the API. Normally you don't really want to generate the parse tree as shown in ANTLRworks -- that's purely a debugging aid. For production use you're better off generating an AST instead; this way you have more control over the output and you can (among other things) refactor your parser without altering the output if you want to. See the output=AST option and the various example grammars, wiki pages, and book chapters about AST construction. From candide at palacehotel.org Mon Jan 25 12:42:39 2010 From: candide at palacehotel.org (Candide Kemmler) Date: Mon, 25 Jan 2010 21:42:39 +0100 Subject: [antlr-interest] antlrworks interpreter like serialized parse tree In-Reply-To: <20100125201514.DB2F5952049@ns1.jwhosting.eu> References: <74DD45A5-DD1E-41C3-819D-2032293EF2A9@palacehotel.org> <20100125201514.DB2F5952049@ns1.jwhosting.eu> Message-ID: <4DC81062-2BE2-4D24-95E5-86F21821FF2F@palacehotel.org> Yes that's already what I'm doing but the AST (in the form of a CommonTree) is only really giving me the leaf tokens without the intermediary branches corresponding to the rules that "recognized" my programs. I have attached an example of a test grammar to illustrate what I mean. The sample sentence where multiple rules are fired ("location", "when", "where",...) are shown in a nice hierarchy in AntlrWorks whereas in the debugger in Eclipse I can only see a flat structure where the root tree has a boring set of 6 children each corresponding to the final tokens of my sentence. -------------- next part -------------- A non-text attachment was scrubbed... Name: astantlrworks.png Type: image/png Size: 11893 bytes Desc: not available Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20100125/9a892bf1/attachment.png -------------- next part -------------- A non-text attachment was scrubbed... Name: astdebugger.png Type: image/png Size: 25082 bytes Desc: not available Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20100125/9a892bf1/attachment-0001.png -------------- next part -------------- Or maybe I am completely missing the point here... On 25 Jan 2010, at 21:14, Gavin Lambert wrote: > At 01:17 26/01/2010, Candide Kemmler wrote: > >I love the parse tree representation that antlrWorks presents and > >getting such a structure would be ideal for my use case. However I > >can't seem to find a way to create a similar representation of the > >parse tree using the API. > > Normally you don't really want to generate the parse tree as shown in ANTLRworks -- that's purely a debugging aid. For production use you're better off generating an AST instead; this way you have more control over the output and you can (among other things) refactor your parser without altering the output if you want to. > > See the output=AST option and the various example grammars, wiki pages, and book chapters about AST construction. > From killebrew.daniel at gmail.com Mon Jan 25 17:52:16 2010 From: killebrew.daniel at gmail.com (Daniel Killebrew) Date: Mon, 25 Jan 2010 17:52:16 -0800 Subject: [antlr-interest] two ways to match nothing In-Reply-To: <4B318151.20107@gmail.com> References: <4B318151.20107@gmail.com> Message-ID: <4B5E4AD0.5050000@gmail.com> Antlr doesn't like it when there are multiple ways to match nothing. It says there's an error in my grammar because the second "alternative" (which is another way to match nothing) will never match. Antlr can enter the optional (...)? element and match nothing, or skip the optional element, thus matching nothing. example: naughty_rule : Start (A? List*)? End ; Start : 'start'; A : 'aaa'; End : 'end'; List : 'list'; Rewritten so Antlr is happy good_rule : Start End | Start A List* End | Start List+ End ; While I can rewrite my grammar easily enough, it seems odd that Antlr doesn't recognize that it's trying to match nothing in two different ways, so who cares if it can't match the second alternative. That shouldn't be an error. If it's a warning, I could understand that. To make it the user rewrite their code into something less legible seems to be opposite of the usual 'Antlr way'. Although I guess this would require making the code a little more complicated to detect this special case, so perhaps this was already considered. Cheers Daniel From killebrew.daniel at gmail.com Mon Jan 25 18:10:43 2010 From: killebrew.daniel at gmail.com (Daniel Killebrew) Date: Mon, 25 Jan 2010 18:10:43 -0800 Subject: [antlr-interest] two ways to match nothing In-Reply-To: <4B5E4D75.7020305@kjchome.homeip.net> References: <4B318151.20107@gmail.com> <4B5E4AD0.5050000@gmail.com> <4B5E4D75.7020305@kjchome.homeip.net> Message-ID: <4B5E4F23.5080103@gmail.com> Doh, thanks for pointing that out Kevin. Ignore my silliness, everyone :) I got caught up transcribing a parser into Antlr and overlooked this simple, obvious transformation. Daniel On 1/25/2010 6:03 PM, Kevin J. Cummings wrote: > On 01/25/2010 08:52 PM, Daniel Killebrew wrote: > >> Antlr doesn't like it when there are multiple ways to match nothing. It >> says there's an error in my grammar because the second "alternative" >> (which is another way to match nothing) will never match. >> Antlr can enter the optional (...)? element and match nothing, or skip >> the optional element, thus matching nothing. >> >> example: >> >> naughty_rule >> : Start (A? List*)? End >> ; >> > Why can't you just rewrite naughty_rule as: > > good_rule > : Start A? List* End > ; > > I think the outer ()? is what was confusing antlr.... > > >> Start : 'start'; >> A : 'aaa'; >> End : 'end'; >> List : 'list'; >> >> Rewritten so Antlr is happy >> good_rule >> : Start End >> | Start A List* End >> | Start List+ End >> ; >> >> While I can rewrite my grammar easily enough, it seems odd that Antlr >> doesn't recognize that it's trying to match nothing in two different >> ways, so who cares if it can't match the second alternative. That >> shouldn't be an error. If it's a warning, I could understand that. To >> make it the user rewrite their code into something less legible seems to >> be opposite of the usual 'Antlr way'. Although I guess this would >> require making the code a little more complicated to detect this special >> case, so perhaps this was already considered. >> >> Cheers >> Daniel >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address >> > > From hvmedhaj at gmail.com Mon Jan 25 20:47:05 2010 From: hvmedhaj at gmail.com (venkat medhaj) Date: Mon, 25 Jan 2010 23:47:05 -0500 Subject: [antlr-interest] how to construct an AST ? Message-ID: Hi, I am a newbie to ANTLR and I am learning to use Antlr lately. I want to generate an AST for the .g file i.e the grammar file available, the target language being Java 1.6. Can anyone please tell me how to proceed ? I find it a bit confusing. Thnks, -V From jeff.wilcox at mac.com Tue Jan 26 06:52:18 2010 From: jeff.wilcox at mac.com (Jeff Wilcox) Date: Tue, 26 Jan 2010 06:52:18 -0800 Subject: [antlr-interest] Disabling rules in the lexer Message-ID: <33AFEA62-2C42-4174-B149-06D2025628F9@mac.com> Hi, I have a special area in this language that has symbols within a table structure that are normally used in other tokens in other areas of the language (like a couple digits, a couple letters and a couple symbols). So I am trying to setup the lexer to accept these table tokens only when in a table. Based on what I have been able to dig up, I believe gated semantic predicates are a valid way to disable rules in the lexer. However, I am seeing issues with this with ANTLR 3.2 and the java language target. So I expected a lexer rules like this to do the trick: Level0 : {inTable}?=> '0'; But that actually creates a very strange loop when inTable is false. I basically throws a FailedPredicateException (which I would not have expected for a gated predicate) and then retries the same token with the same rule, obviously resulting in an infinite loop. Can someone clarify whether this is allowed and if so whether there is some trick to using it? I am stumped. Thanks Jeff From csp7kk3 at cs.ucy.ac.cy Tue Jan 26 07:34:33 2010 From: csp7kk3 at cs.ucy.ac.cy (Konstantinos Kakousis) Date: Tue, 26 Jan 2010 17:34:33 +0200 Subject: [antlr-interest] Test class for tree grammars Message-ID: <4B5F0B89.8080206@cs.ucy.ac.cy> Hello, I have my grammar and tree grammars working exactly as expected at the AntlrWorks. Now I was trying to run from console or Eclipse the same grammar using the following Test.java class: import org.antlr.runtime.ANTLRStringStream; import org.antlr.runtime.CommonTokenStream; import org.antlr.runtime.RecognitionException; import org.antlr.runtime.tree.CommonTree; import org.antlr.runtime.tree.CommonTreeNodeStream; import org.antlr.runtime.tree.Tree; public class Test{ public static void main (String[] args){ try{ String in = "5+6*7"; ANTLRStringStream input = new ANTLRStringStream(in); UtilityLexer lexer = new UtilityLexer(input); CommonTokenStream tokens = new CommonTokenStream(lexer); UtilityParser parser = new UtilityParser(tokens); UtilityParser.prog_return r = null; r = parser.prog(); CommonTree t = (CommonTree)r.getTree(); // get tree from parser System.out.println("Parse Tree:"+t.toStringTree()); CommonTreeNodeStream nodes = new CommonTreeNodeStream(t); System.out.println ("Here1"); nodes.setTokenStream(tokens); System.out.println ("Here2"); UtilTree walker = new UtilTree(nodes); System.out.println ("Here3"); walker.prog(); System.out.println ("Here4"); } catch (RecognitionException e) { e.printStackTrace(); } } } From the output: Parse Tree:(+ 5 (* 6 7)) Here1 Here2 It seems that the programs hangs on the following command: UtilTree walker = new UtilTree(nodes); Is there somewhere a standard Test.java class for running the generated grammars? Is there something wrong with the above class? BR, -- Konstantinos Kakousis Research Associate Department of Computer Science University of Cyprus Address: P.O. Box 20537, CY-1678, Nicosia, Cyprus Tel: +357 22892684 Fax: +357 22892701 Webpage: http://www.cs.ucy.ac.cy/~csp7kk3 Email: mailto://kakousis at cs.ucy.ac.cy Skype: callto://costas.kakousis From kfeuerherm at wlu.ca Tue Jan 26 09:51:54 2010 From: kfeuerherm at wlu.ca (Karljurgen Feuerherm) Date: Tue, 26 Jan 2010 12:51:54 -0500 Subject: [antlr-interest] Running ANTLRWorks 1.3.1 -- javac error Message-ID: <4B5EE56A020000CC0001CF38@wlgw07.wlu.ca> Hello, I'm new to this product (and to modern products of this type generally... was a B programmer in the early 80s and trying to get updated!) I'm on Windows XP, and have run the JAR file to invoke ANTLRWorks. I'm trying out the Expression Evaluator Tutorial. Interpreter works fine, but invoking the debugger gets me "java.IO.IOException: Cannot run program "javac": CreateProcess error=2, the system cannot find the file specified" (Oddly, after a while, trying it again got me a different error about timeout, even though I'd changed nothing [Sure. Famous Last Words, eh?].) Not sure where to go from here... By all means be pedantic in a response :) Thanks! K Karlj?rgen G. Feuerherm, PhD Department of Archaeology and Classical Studies Wilfrid Laurier University 75 University Avenue West Waterloo, Ontario N2L 3C5 Tel. (519) 884-1970 x3193 Fax (519) 883-0991 (ATTN Arch. & Classics) From bkiers at gmail.com Tue Jan 26 10:19:59 2010 From: bkiers at gmail.com (Bart Kiers) Date: Tue, 26 Jan 2010 19:19:59 +0100 Subject: [antlr-interest] Running ANTLRWorks 1.3.1 -- javac error In-Reply-To: <4B5EE56A020000CC0001CF38@wlgw07.wlu.ca> References: <4B5EE56A020000CC0001CF38@wlgw07.wlu.ca> Message-ID: Karlj?rgen, In order to run ANTLRWorks, you do not need 'javac', but 'java'. 'javac' is the compiler that will compile java source files into byte codes that the JRE (Java Runtime Environment) interprets/executes. 'java' is the application that executes the byte codes produced by 'javac'. Since ANTLRWorks is already compiled, you only need 'java'. So, on the command line, give the following command: java -jar antlrworks-1.3.1.jar If the above does not work, please post the exact error message(s) on the list. Thanks. Bart. On Tue, Jan 26, 2010 at 6:51 PM, Karljurgen Feuerherm wrote: > Hello, > > I'm new to this product (and to modern products of this type > generally... was a B programmer in the early 80s and trying to get > updated!) > > I'm on Windows XP, and have run the JAR file to invoke ANTLRWorks. > > I'm trying out the Expression Evaluator Tutorial. Interpreter works > fine, but invoking the debugger gets me > > "java.IO.IOException: Cannot run program "javac": CreateProcess > error=2, the system cannot find the file specified" > > (Oddly, after a while, trying it again got me a different error about > timeout, even though I'd changed nothing [Sure. Famous Last Words, > eh?].) > > Not sure where to go from here... By all means be pedantic in a > response :) > > Thanks! > > K > > Karlj?rgen G. Feuerherm, PhD > Department of Archaeology and Classical Studies > Wilfrid Laurier University > 75 University Avenue West > Waterloo, Ontario N2L 3C5 > Tel. (519) 884-1970 x3193 > Fax (519) 883-0991 (ATTN Arch. & Classics) > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: > http://www.antlr.org/mailman/options/antlr-interest/your-email-address > From bkiers at gmail.com Tue Jan 26 10:40:52 2010 From: bkiers at gmail.com (Bart Kiers) Date: Tue, 26 Jan 2010 19:40:52 +0100 Subject: [antlr-interest] Running ANTLRWorks 1.3.1 -- javac error In-Reply-To: <1332b72e1001261037w341273e4kdedd7de7ccc1317@mail.gmail.com> References: <4B5EE56A020000CC0001CF38@wlgw07.wlu.ca> <1332b72e1001261037w341273e4kdedd7de7ccc1317@mail.gmail.com> Message-ID: On Tue, Jan 26, 2010 at 7:37 PM, Andreas Stefik wrote: > I think he's asking why the debugger throws errors with javac, not how to > start antlrworks. > > The error you are seeing is because the antlrworks debugger, far as I > understand it, needs a java compiler to actually debug a grammar. As such, > you need to put the path to your javac compiler in the path field in > antlrworks. This is straightforward to do: > > 1. Open up the options window. I'm on mac at the moment, which is in > preferences, but on windows it is similar. > 2. Go to the tab labeled compiler and look for where it says javac. > 3. Check path under javac, then click browse and a window should appear. > 4. Browse to where javac is located. > > As I'm on mac, the paths are different, but if I recall correctly, on > windows javac is in program files, so it would be something "like" > > c:\program files\Java\bin\javac.exe > > That path might not be correct, but I don't have a windows box on me to > give it to you exactly. Should be close though and if you browse around you > should find it. > > The last detail is that, if you can't find javac, you may not have the JDK > installed (java.sun.com), so you'll need to do that. It's just a little > installer, so there's nothing fancy to do. You can know for sure whether you > have it by going to the command line and typing: > > javac > > if it throws an error, you need the JDK. If it's there, you will see a > bunch of information put out to the terminal. > > Hope that helps, > > Andreas Stefik, Ph.D. > Assistant Professor > Department of Computer Science > Southern Illinois University Edwardsville > > > > On Tue, Jan 26, 2010 at 12:19 PM, Bart Kiers wrote: > >> Karlj?rgen, >> >> In order to run ANTLRWorks, you do not need 'javac', but 'java'. >> >> 'javac' is the compiler that will compile java source files into byte >> codes >> that the JRE (Java Runtime Environment) interprets/executes. >> >> 'java' is the application that executes the byte codes produced by >> 'javac'. >> Since ANTLRWorks is already compiled, you only need 'java'. >> >> So, on the command line, give the following command: >> >> java -jar antlrworks-1.3.1.jar >> >> If the above does not work, please post the exact error message(s) on the >> list. >> >> Thanks. >> >> Bart. >> >> >> On Tue, Jan 26, 2010 at 6:51 PM, Karljurgen Feuerherm > >wrote: >> >> > Hello, >> > >> > I'm new to this product (and to modern products of this type >> > generally... was a B programmer in the early 80s and trying to get >> > updated!) >> > >> > I'm on Windows XP, and have run the JAR file to invoke ANTLRWorks. >> > >> > I'm trying out the Expression Evaluator Tutorial. Interpreter works >> > fine, but invoking the debugger gets me >> > >> > "java.IO.IOException: Cannot run program "javac": CreateProcess >> > error=2, the system cannot find the file specified" >> > >> > (Oddly, after a while, trying it again got me a different error about >> > timeout, even though I'd changed nothing [Sure. Famous Last Words, >> > eh?].) >> > >> > Not sure where to go from here... By all means be pedantic in a >> > response :) >> > >> > Thanks! >> > >> > K >> > >> > Karlj?rgen G. Feuerherm, PhD >> > Department of Archaeology and Classical Studies >> > Wilfrid Laurier University >> > 75 University Avenue West >> > Waterloo, Ontario N2L 3C5 >> > Tel. (519) 884-1970 x3193 >> > Fax (519) 883-0991 (ATTN Arch. & Classics) >> > >> > List: http://www.antlr.org/mailman/listinfo/antlr-interest >> > Unsubscribe: >> > http://www.antlr.org/mailman/options/antlr-interest/your-email-address >> > >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> Unsubscribe: >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address >> > > From stefika at gmail.com Tue Jan 26 13:46:46 2010 From: stefika at gmail.com (Andreas Stefik) Date: Tue, 26 Jan 2010 15:46:46 -0600 Subject: [antlr-interest] Running ANTLRWorks 1.3.1 -- javac error In-Reply-To: References: <4B5EE56A020000CC0001CF38@wlgw07.wlu.ca> <1332b72e1001261037w341273e4kdedd7de7ccc1317@mail.gmail.com> Message-ID: <1332b72e1001261346l46901804u60c4d70933cd7995@mail.gmail.com> JDK means the same thing as SDK does for Java (Java Development Kit). JDK 6 U18 is the correct thing to have downloaded, so you are on the right track. After installing it, you definitely should have javac and it should be in your environment variables. Might be obvious, but have you tried restarting? Andreas Stefik, Ph.D. Assistant Professor Department of Computer Science Southern Illinois University Edwardsville On Tue, Jan 26, 2010 at 3:41 PM, Karljurgen Feuerherm wrote: > Hi > > Thanks, that makes a lot more sense. > > Now, oddly, I found and downloaded JDK 6 U18, and installed it... and javac > is still not found. I'm hunting around for documentation on the site, but > there are so many different options... > > Maybe I need an SDK? > > K > > Karlj?rgen G. Feuerherm, PhD > Department of Archaeology and Classical Studies > Wilfrid Laurier University > 75 University Avenue West > Waterloo, Ontario N2L 3C5 > Tel. (519) 884-1970 x3193 > Fax (519) 883-0991 (ATTN Arch. & Classics) > > >>> Bart Kiers 26/01/2010 1:40 pm >>> > > On Tue, Jan 26, 2010 at 7:37 PM, Andreas Stefik <* stefika at gmail.com* > > wrote: > > > I think he's asking why the debugger throws errors with javac, not how to > > start antlrworks. > > > > The error you are seeing is because the antlrworks debugger, far as I > > understand it, needs a java compiler to actually debug a grammar. As > such, > > you need to put the path to your javac compiler in the path field in > > antlrworks. This is straightforward to do: > > > > 1. Open up the options window. I'm on mac at the moment, which is in > > preferences, but on windows it is similar. > > 2. Go to the tab labeled compiler and look for where it says javac. > > 3. Check path under javac, then click browse and a window should appear. > > 4. Browse to where javac is located. > > > > As I'm on mac, the paths are different, but if I recall correctly, on > > windows javac is in program files, so it would be something "like" > > > > c:\program files\Java\bin\javac.exe > > > > That path might not be correct, but I don't have a windows box on me to > > give it to you exactly. Should be close though and if you browse around > you > > should find it. > > > > The last detail is that, if you can't find javac, you may not have the > JDK > > installed (java.sun.com), so you'll need to do that. It's just a little > > installer, so there's nothing fancy to do. You can know for sure whether > you > > have it by going to the command line and typing: > > > > javac > > > > if it throws an error, you need the JDK. If it's there, you will see a > > bunch of information put out to the terminal. > > > > Hope that helps, > > > > Andreas Stefik, Ph.D. > > Assistant Professor > > Department of Computer Science > > Southern Illinois University Edwardsville > > > > > > > > On Tue, Jan 26, 2010 at 12:19 PM, Bart Kiers <* bkiers at gmail.com* > > wrote: > > > >> Karlj?rgen, > >> > >> In order to run ANTLRWorks, you do not need 'javac', but 'java'. > >> > >> 'javac' is the compiler that will compile java source files into byte > >> codes > >> that the JRE (Java Runtime Environment) interprets/executes. > >> > >> 'java' is the application that executes the byte codes produced by > >> 'javac'. > >> Since ANTLRWorks is already compiled, you only need 'java'. > >> > >> So, on the command line, give the following command: > >> > >> java -jar antlrworks-1.3.1.jar > >> > >> If the above does not work, please post the exact error message(s) on > the > >> list. > >> > >> Thanks. > >> > >> Bart. > >> > >> > >> On Tue, Jan 26, 2010 at 6:51 PM, Karljurgen Feuerherm <* > kfeuerherm at wlu.ca* > >> >wrote: > >> > >> > Hello, > >> > > >> > I'm new to this product (and to modern products of this type > >> > generally... was a B programmer in the early 80s and trying to get > >> > updated!) > >> > > >> > I'm on Windows XP, and have run the JAR file to invoke ANTLRWorks. > >> > > >> > I'm trying out the Expression Evaluator Tutorial. Interpreter works > >> > fine, but invoking the debugger gets me > >> > > >> > "java.IO.IOException: Cannot run program "javac": CreateProcess > >> > error=2, the system cannot find the file specified" > >> > > >> > (Oddly, after a while, trying it again got me a different error about > >> > timeout, even though I'd changed nothing [Sure. Famous Last Words, > >> > eh?].) > >> > > >> > Not sure where to go from here... By all means be pedantic in a > >> > response :) > >> > > >> > Thanks! > >> > > >> > K > >> > > >> > Karlj?rgen G. Feuerherm, PhD > >> > Department of Archaeology and Classical Studies > >> > Wilfrid Laurier University > >> > 75 University Avenue West > >> > Waterloo, Ontario N2L 3C5 > >> > Tel. (519) 884-1970 x3193 > >> > Fax (519) 883-0991 (ATTN Arch. & Classics) > >> > > >> > List: *http://www.antlr.org/mailman/listinfo/antlr-interest* > >> > Unsubscribe: > >> > * > http://www.antlr.org/mailman/options/antlr-interest/your-email-address* > >> > > >> > >> List: *http://www.antlr.org/mailman/listinfo/antlr-interest* > >> Unsubscribe: > >> *http://www.antlr.org/mailman/options/antlr-interest/your-email-address > * > >> > > > > > > List: *http://www.antlr.org/mailman/listinfo/antlr-interest* > Unsubscribe: * > http://www.antlr.org/mailman/options/antlr-interest/your-email-address* > From stefika at gmail.com Tue Jan 26 14:35:56 2010 From: stefika at gmail.com (Andreas Stefik) Date: Tue, 26 Jan 2010 16:35:56 -0600 Subject: [antlr-interest] Got it! In-Reply-To: <4B5F26C1020000CC0001D02D@wlgw07.wlu.ca> References: <4B5EE56A020000CC0001CF38@wlgw07.wlu.ca> <1332b72e1001261037w341273e4kdedd7de7ccc1317@mail.gmail.com> <1332b72e1001261346l46901804u60c4d70933cd7995@mail.gmail.com> <4B5F26C1020000CC0001D02D@wlgw07.wlu.ca> Message-ID: <1332b72e1001261435x69aa6b26qa8ee8dbdbeada49e@mail.gmail.com> No problem and best of luck. ANTLR is a great parsing tool. In my view, much easier to use than many of the alternatives, so hopefully you have a good time hacking away. Andreas Stefik, Ph.D. Assistant Professor Department of Computer Science Southern Illinois University Edwardsville On Tue, Jan 26, 2010 at 4:30 PM, Karljurgen Feuerherm wrote: > hi > > thanks for your patience :) > > the second instal did create the directory. a reboot didn't change the fact > that the environment variable is not set globally... however, following your > instructions I was able to set the patch to C:\Program > Files\Java\jdk1.6.0_18\bin and now it seems to work. > > i have no idea why the installation didn't work properly the first time... > maybe all that fiddling trying to change the options fixed it, who knows. in > any case, one problem down. > > i appreciate your help! now let's see whether i can come up with some REAL > problems...! > > Best > > K > > Karlj?rgen G. Feuerherm, PhD > Department of Archaeology and Classical Studies > Wilfrid Laurier University > 75 University Avenue West > Waterloo, Ontario N2L 3C5 > Tel. (519) 884-1970 x3193 > Fax (519) 883-0991 (ATTN Arch. & Classics) > > >>> Andreas Stefik 26/01/2010 4:46 pm >>> > JDK means the same thing as SDK does for Java (Java Development Kit). JDK 6 > U18 is the correct thing to have downloaded, so you are on the right track. > > After installing it, you definitely should have javac and it should be in > your environment variables. Might be obvious, but have you tried restarting? > > Andreas Stefik, Ph.D. > Assistant Professor > Department of Computer Science > Southern Illinois University Edwardsville > > > On Tue, Jan 26, 2010 at 3:41 PM, Karljurgen Feuerherm wrote: > >> Hi >> >> Thanks, that makes a lot more sense. >> >> Now, oddly, I found and downloaded JDK 6 U18, and installed it... and >> javac is still not found. I'm hunting around for documentation on the site, >> but there are so many different options... >> >> Maybe I need an SDK? >> >> K >> >> Karlj?rgen G. Feuerherm, PhD >> Department of Archaeology and Classical Studies >> Wilfrid Laurier University >> 75 University Avenue West >> Waterloo, Ontario N2L 3C5 >> Tel. (519) 884-1970 x3193 >> Fax (519) 883-0991 (ATTN Arch. & Classics) >> >> >>> Bart Kiers 26/01/2010 1:40 pm >>> >> >> On Tue, Jan 26, 2010 at 7:37 PM, Andreas Stefik <* stefika at gmail.com* > >> wrote: >> >> > I think he's asking why the debugger throws errors with javac, not how >> to >> > start antlrworks. >> > >> > The error you are seeing is because the antlrworks debugger, far as I >> > understand it, needs a java compiler to actually debug a grammar. As >> such, >> > you need to put the path to your javac compiler in the path field in >> > antlrworks. This is straightforward to do: >> > >> > 1. Open up the options window. I'm on mac at the moment, which is in >> > preferences, but on windows it is similar. >> > 2. Go to the tab labeled compiler and look for where it says javac. >> > 3. Check path under javac, then click browse and a window should appear. >> > 4. Browse to where javac is located. >> > >> > As I'm on mac, the paths are different, but if I recall correctly, on >> > windows javac is in program files, so it would be something "like" >> > >> > c:\program files\Java\bin\javac.exe >> > >> > That path might not be correct, but I don't have a windows box on me to >> > give it to you exactly. Should be close though and if you browse around >> you >> > should find it. >> > >> > The last detail is that, if you can't find javac, you may not have the >> JDK >> > installed (java.sun.com), so you'll need to do that. It's just a little >> > installer, so there's nothing fancy to do. You can know for sure whether >> you >> > have it by going to the command line and typing: >> > >> > javac >> > >> > if it throws an error, you need the JDK. If it's there, you will see a >> > bunch of information put out to the terminal. >> > >> > Hope that helps, >> > >> > Andreas Stefik, Ph.D. >> > Assistant Professor >> > Department of Computer Science >> > Southern Illinois University Edwardsville >> > >> > >> > >> > On Tue, Jan 26, 2010 at 12:19 PM, Bart Kiers <* bkiers at gmail.com* > >> wrote: >> > >> >> Karlj?rgen, >> >> >> >> In order to run ANTLRWorks, you do not need 'javac', but 'java'. >> >> >> >> 'javac' is the compiler that will compile java source files into byte >> >> codes >> >> that the JRE (Java Runtime Environment) interprets/executes. >> >> >> >> 'java' is the application that executes the byte codes produced by >> >> 'javac'. >> >> Since ANTLRWorks is already compiled, you only need 'java'. >> >> >> >> So, on the command line, give the following command: >> >> >> >> java -jar antlrworks-1.3.1.jar >> >> >> >> If the above does not work, please post the exact error message(s) on >> the >> >> list. >> >> >> >> Thanks. >> >> >> >> Bart. >> >> >> >> >> >> On Tue, Jan 26, 2010 at 6:51 PM, Karljurgen Feuerherm <* >> kfeuerherm at wlu.ca* >> >> >wrote: >> >> >> >> > Hello, >> >> > >> >> > I'm new to this product (and to modern products of this type >> >> > generally... was a B programmer in the early 80s and trying to get >> >> > updated!) >> >> > >> >> > I'm on Windows XP, and have run the JAR file to invoke ANTLRWorks. >> >> > >> >> > I'm trying out the Expression Evaluator Tutorial. Interpreter works >> >> > fine, but invoking the debugger gets me >> >> > >> >> > "java.IO.IOException: Cannot run program "javac": CreateProcess >> >> > error=2, the system cannot find the file specified" >> >> > >> >> > (Oddly, after a while, trying it again got me a different error about >> >> > timeout, even though I'd changed nothing [Sure. Famous Last Words, >> >> > eh?].) >> >> > >> >> > Not sure where to go from here... By all means be pedantic in a >> >> > response :) >> >> > >> >> > Thanks! >> >> > >> >> > K >> >> > >> >> > Karlj?rgen G. Feuerherm, PhD >> >> > Department of Archaeology and Classical Studies >> >> > Wilfrid Laurier University >> >> > 75 University Avenue West >> >> > Waterloo, Ontario N2L 3C5 >> >> > Tel. (519) 884-1970 x3193 >> >> > Fax (519) 883-0991 (ATTN Arch. & Classics) >> >> > >> >> > List: *http://www.antlr.org/mailman/listinfo/antlr-interest* >> >> > Unsubscribe: >> >> > * >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address* >> >> > >> >> >> >> List: *http://www.antlr.org/mailman/listinfo/antlr-interest* >> >> Unsubscribe: >> >> * >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address* >> >> >> > >> > >> >> List: *http://www.antlr.org/mailman/listinfo/antlr-interest* >> Unsubscribe: * >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address* >> > > From parrt at cs.usfca.edu Tue Jan 26 14:57:40 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Tue, 26 Jan 2010 14:57:40 -0800 Subject: [antlr-interest] better error messages in tree parsers Message-ID: <604A33CC-D57B-4C48-ADDE-75331132609D@cs.usfca.edu> Hi, a reminder that debugging tree grammars can be a bitch. I like to override standard messaging to spew lots of stuff. E.g., i like this kind of thing: ASTVerifier.g: node from after line 150:17 [grammarSpec, rules, rule, altListAsBlock, altList, alternative, elements, element, ebnf, block, altList, alternative] no viable alt; token=[@-1,0:0='ALT',<84>,0:-1] (decision=24 state 3) decision=<<>> context=...DOWN BLOCK DOWN >>>ALT<<< DOWN DOC_COMMENT... Here's my code: public String getErrorMessage(RecognitionException e, String[] tokenNames) { List stack = getRuleInvocationStack(e, this.getClass().getName()); String msg = null; String inputContext = ((Tree)input.LT(-3)).getText()+" "+ ((Tree)input.LT(-2)).getText()+" "+ ((Tree)input.LT(-1)).getText()+" >>>"+ ((Tree)input.LT(1)).getText()+"<<< "+ ((Tree)input.LT(2)).getText()+" "+ ((Tree)input.LT(3)).getText(); if ( e instanceof NoViableAltException ) { NoViableAltException nvae = (NoViableAltException)e; msg = " no viable alt; token="+e.token+ " (decision="+nvae.decisionNumber+ " state "+nvae.stateNumber+")"+ " decision=<<"+nvae.grammarDecisionDescription+">>"; } else { msg = super.getErrorMessage(e, tokenNames); } return stack+" "+msg+" context=..."+inputContext+"..."; } public String getTokenErrorDisplay(Token t) { return t.toString(); } Ter From kferrio at gmail.com Tue Jan 26 18:00:51 2010 From: kferrio at gmail.com (kferrio at gmail.com) Date: Wed, 27 Jan 2010 02:00:51 +0000 Subject: [antlr-interest] better error messages in tree parsers In-Reply-To: <604A33CC-D57B-4C48-ADDE-75331132609D@cs.usfca.edu> References: <604A33CC-D57B-4C48-ADDE-75331132609D@cs.usfca.edu> Message-ID: <177143290-1264557652-cardhu_decombobulator_blackberry.rim.net-951491048-@bda428.bisx.prod.on.blackberry> ROTFL! Thanks for calling it as you see it. I feel a little less na?ve now, knowing that you have "issues" with debugging. Thanks for the nice example too! Kyle Sent from my Verizon Wireless BlackBerry -----Original Message----- From: Terence Parr Date: Tue, 26 Jan 2010 14:57:40 To: antlr-interest at antlr.org interest Subject: [antlr-interest] better error messages in tree parsers Hi, a reminder that debugging tree grammars can be a bitch. I like to override standard messaging to spew lots of stuff. E.g., i like this kind of thing: ASTVerifier.g: node from after line 150:17 [grammarSpec, rules, rule, altListAsBlock, altList, alternative, elements, element, ebnf, block, altList, alternative] no viable alt; token=[@-1,0:0='ALT',<84>,0:-1] (decision=24 state 3) decision=<<>> context=...DOWN BLOCK DOWN >>>ALT<<< DOWN DOC_COMMENT... Here's my code: public String getErrorMessage(RecognitionException e, String[] tokenNames) { List stack = getRuleInvocationStack(e, this.getClass().getName()); String msg = null; String inputContext = ((Tree)input.LT(-3)).getText()+" "+ ((Tree)input.LT(-2)).getText()+" "+ ((Tree)input.LT(-1)).getText()+" >>>"+ ((Tree)input.LT(1)).getText()+"<<< "+ ((Tree)input.LT(2)).getText()+" "+ ((Tree)input.LT(3)).getText(); if ( e instanceof NoViableAltException ) { NoViableAltException nvae = (NoViableAltException)e; msg = " no viable alt; token="+e.token+ " (decision="+nvae.decisionNumber+ " state "+nvae.stateNumber+")"+ " decision=<<"+nvae.grammarDecisionDescription+">>"; } else { msg = super.getErrorMessage(e, tokenNames); } return stack+" "+msg+" context=..."+inputContext+"..."; } public String getTokenErrorDisplay(Token t) { return t.toString(); } Ter List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From wclodius at los-alamos.net Tue Jan 26 19:58:17 2010 From: wclodius at los-alamos.net (William B. Clodius) Date: Tue, 26 Jan 2010 20:58:17 -0700 Subject: [antlr-interest] Disabling rules in the lexer In-Reply-To: <33AFEA62-2C42-4174-B149-06D2025628F9@mac.com> References: <33AFEA62-2C42-4174-B149-06D2025628F9@mac.com> Message-ID: <0CB6E93D-F815-47C4-A870-8A2C9C0E83A7@los-alamos.net> Generally don't try to be too restrictive with your lexer and parser. This sort of context dependence is more naturally handled in the semantic analysis. In particular error reporting is much better if you accept things that are ultimately illegal in the lexer and parser and determine whether they are they are illegal in the semantic analysis. Instead of a minimal message such as "Illegal token" you can report "Illegal token for the table structure see constraint # in the language definition", or "Token is not one of the set of ..." On Jan 26, 2010, at 7:52 AM, Jeff Wilcox wrote: > Hi, > > I have a special area in this language that has symbols within a table structure that are normally used in other tokens in other areas of the language (like a couple digits, a couple letters and a couple symbols). So I am trying to setup the lexer to accept these table tokens only when in a table. Based on what I have been able to dig up, I believe gated semantic predicates are a valid way to disable rules in the lexer. However, I am seeing issues with this with ANTLR 3.2 and the java language target. > > So I expected a lexer rules like this to do the trick: > > Level0 : {inTable}?=> '0'; > > But that actually creates a very strange loop when inTable is false. I basically throws a FailedPredicateException (which I would not have expected for a gated predicate) and then retries the same token with the same rule, obviously resulting in an infinite loop. > > Can someone clarify whether this is allowed and if so whether there is some trick to using it? I am stumped. > > Thanks > Jeff From C.P.T.de.Gouw at cwi.nl Wed Jan 27 01:19:19 2010 From: C.P.T.de.Gouw at cwi.nl (Stijn de Gouw) Date: Wed, 27 Jan 2010 10:19:19 +0100 Subject: [antlr-interest] Parsing a sequence of objects Message-ID: <4B600517.3040102@cwi.nl> Given an attribute grammar (with probably only synthesized attributes), instead of parsing a sequence of terminal strings, I want to parse a sequence (array) of (Java) Objects. Each object o has 3 fields: (1) String name (2) Object[] p (3) String c The terminals in the grammar correspond exactly to the name field of an object (each o.name is a terminal), so parsing decisions should be done based on this field (perhaps no lexer is needed?). In the attribute grammar the other two fields of the object must be used as attributes of the terminal (note that the values of these attributes are NOT given by a production in the grammar!! but instead are given (before parsing) in each object), and it must be possible to define the (synthesized) attributes of non-terminals in terms of the attributes of the terminals (namely, the o.p and o.c fields). To make it more clear, consider the following example (I will denote each object as a triple (name, p, c)): Given a sequence of objects ("first", p1, "z"), ("first", p2, "y"), ("last", p3, "z"), ("last", p4, "x") and the attribute grammar S ::= FIRST LAST { $cSet = createset($FIRST.c, $LAST.c); } | FIRST S1=S LAST { $cSet = union($S1.cSet, createset($FIRST.c, $LAST.c)); } where cSet an attribute of type 'set of Strings', createset creates a new set containing its parameters as elements of the set, and union(a,b) returns the union of the sets a and b the parsing of the sequence of objects produces: S.cSet = {"x","y","z"} / | \ / | \ / | \ / | \ / | \ FIRST = ("first",p1,"z") S.cSet = {"y", "z"} LAST = ("last", p4, "x") / \ / \ / \ / \ FIRST = ("first", p2, "y") LAST = ("last", p3, "z") What would be the best way to implement this? Perhaps subclass the antlr.Token class to add the Object[] p and String c fields (if so, what would the best way to create a token stream from the given sequence of objects)? My current approach, which works but is not very elegant, is to 1) Concatenate all name attributes from the objects in the sequence to create a single string S 2) Add the array storing the sequence of objects as a @members variable to the grammar (let's call this array a). 3) In the attribute grammar, one can refer to the terminal attribute Object[] p of "first" by writing 'a[$FIRST.getTokenIndex()].p' where FIRST is a terminal defined in the lexer as FIRST: 'first';. 4) Call the parser with as input the string S formed in step 1 From andre.rutti at gmail.com Wed Jan 27 07:36:02 2010 From: andre.rutti at gmail.com (andre rutti) Date: Wed, 27 Jan 2010 16:36:02 +0100 Subject: [antlr-interest] Python RuntimeError Message-ID: <2132cf931001270736p28eb80bfx432bc5ac38f479dd@mail.gmail.com> Hi, I'm using antlr\antlrworks-1.3 to generate lexer and parser for Python. Using the examples from http://www.antlr.org/wiki/display/ANTLR3/Antlr3PythonTarget When I run Test.py, I get RuntimeError: ANTLR version mismatch: The recognizer has been generated by V3.2 Sep 23, 2009 12:02:23, but this runtime is V3.1.2. Please use the V3.2 Sep 23, 2 009 12:02:23 runtime or higher. Is the Python runtime for V3.2 available ? I tried with antlrworks-1.2.2, but then, I got errors for Eval.g [15:44:31] error(10): internal error: eval tree parse error : :0:0: unexpected AST node: org.antlr.stringtemplate.language.ActionEvaluator.expr(Unknown Source) org.antlr.stringtemplate.language.ActionEvaluator.action(Unknown Source) org.antlr.stringtemplate.language.ASTExpr.evaluateExpression(Unknown Source) org.antlr.stringtemplate.language.ASTExpr.handleExprOptions(Unknown Source) Thanks and regards, Andre From alexander.herz at mytum.de Wed Jan 27 08:23:55 2010 From: alexander.herz at mytum.de (Alexander Herz) Date: Wed, 27 Jan 2010 17:23:55 +0100 Subject: [antlr-interest] antlr grammar+missing symbol Message-ID: <4B60689B.6030805@mytum.de> Hi, I'm trying to debug the python2.5 grammer from the antlr homepage. Compiling it gives an error that "token" is not recognized as a symbol. Where/how should it be defined? Generally, is there a docu or something where I can look up which symbols are provided for the generated classes (so that I can rever to them from inside the grammar)? Thx, Alex -- ------------------------------------------------------- Lehrstuhl I2 Seidl Sprachen und Beschreibungsstrukturen der Informatik Institut fuer Informatik Technische Universitaet Muenchen Boltzmannstrasse 3 85748 Garching http://www2.in.tum.de Telefon: +89 289 181806 Fax: +89 289 18161 ------------------------------------------------------- From parrt at cs.usfca.edu Wed Jan 27 11:03:18 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Wed, 27 Jan 2010 11:03:18 -0800 Subject: [antlr-interest] better error messages in tree parsers In-Reply-To: <177143290-1264557652-cardhu_decombobulator_blackberry.rim.net-951491048-@bda428.bisx.prod.on.blackberry> References: <604A33CC-D57B-4C48-ADDE-75331132609D@cs.usfca.edu> <177143290-1264557652-cardhu_decombobulator_blackberry.rim.net-951491048-@bda428.bisx.prod.on.blackberry> Message-ID: <4ECD2BE4-A90F-42FC-A348-5F38904338CD@cs.usfca.edu> On Jan 26, 2010, at 6:00 PM, kferrio at gmail.com wrote: > ROTFL! Thanks for calling it as you see it. I feel a little less na?ve now, knowing that you have "issues" with debugging. Thanks for the nice example too! :) Added a faq entry. yeah, it's tough for me right now because I'm debugging a tree grammar parsing an AST representing an ANTLR tree grammar. my brain hurts. Ter > > Kyle > > Sent from my Verizon Wireless BlackBerry > > -----Original Message----- > From: Terence Parr > Date: Tue, 26 Jan 2010 14:57:40 > To: antlr-interest at antlr.org interest > Subject: [antlr-interest] better error messages in tree parsers > > Hi, a reminder that debugging tree grammars can be a bitch. I like to override standard messaging to spew lots of stuff. E.g., i like this kind of thing: > > ASTVerifier.g: node from after line 150:17 [grammarSpec, rules, rule, altListAsBlock, altList, alternative, elements, element, ebnf, block, altList, alternative] no viable alt; token=[@-1,0:0='ALT',<84>,0:-1] (decision=24 state 3) decision=<<>> > context=...DOWN BLOCK DOWN >>>ALT<<< DOWN DOC_COMMENT... > > Here's my code: > > public String getErrorMessage(RecognitionException e, > String[] tokenNames) > { > List stack = getRuleInvocationStack(e, this.getClass().getName()); > String msg = null; > String inputContext = > ((Tree)input.LT(-3)).getText()+" "+ > ((Tree)input.LT(-2)).getText()+" "+ > ((Tree)input.LT(-1)).getText()+" >>>"+ > ((Tree)input.LT(1)).getText()+"<<< "+ > ((Tree)input.LT(2)).getText()+" "+ > ((Tree)input.LT(3)).getText(); > if ( e instanceof NoViableAltException ) { > NoViableAltException nvae = (NoViableAltException)e; > msg = " no viable alt; token="+e.token+ > " (decision="+nvae.decisionNumber+ > " state "+nvae.stateNumber+")"+ > " decision=<<"+nvae.grammarDecisionDescription+">>"; > } > else { > msg = super.getErrorMessage(e, tokenNames); > } > return stack+" "+msg+" context=..."+inputContext+"..."; > } > public String getTokenErrorDisplay(Token t) { > return t.toString(); > } > > Ter > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From parrt at cs.usfca.edu Wed Jan 27 11:21:46 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Wed, 27 Jan 2010 11:21:46 -0800 Subject: [antlr-interest] better error messages in tree parsers In-Reply-To: <177143290-1264557652-cardhu_decombobulator_blackberry.rim.net-951491048-@bda428.bisx.prod.on.blackberry> References: <604A33CC-D57B-4C48-ADDE-75331132609D@cs.usfca.edu> <177143290-1264557652-cardhu_decombobulator_blackberry.rim.net-951491048-@bda428.bisx.prod.on.blackberry> Message-ID: <2DAFE674-88D1-4A7F-BAC4-846AA42597E9@cs.usfca.edu> Also note that use use the decision number (24 here) by using -dfa option on antlr and then loading your grammar-dec-24.dot into Graphviz. look at state 3 and you'll see that the token ALT (in this case) has no path to take. Ter > ASTVerifier.g: node from after line 150:17 [grammarSpec, rules, rule, altListAsBlock, altList, alternative, elements, element, ebnf, block, altList, alternative] no viable alt; token=[@-1,0:0='ALT',<84>,0:-1] (decision=24 state 3) decision=<<>> > context=...DOWN BLOCK DOWN >>>ALT<<< DOWN DOC_COMMENT... > > Here's my code: > > public String getErrorMessage(RecognitionException e, > String[] tokenNames) > { > List stack = getRuleInvocationStack(e, this.getClass().getName()); > String msg = null; > String inputContext = > ((Tree)input.LT(-3)).getText()+" "+ > ((Tree)input.LT(-2)).getText()+" "+ > ((Tree)input.LT(-1)).getText()+" >>>"+ > ((Tree)input.LT(1)).getText()+"<<< "+ > ((Tree)input.LT(2)).getText()+" "+ > ((Tree)input.LT(3)).getText(); > if ( e instanceof NoViableAltException ) { > NoViableAltException nvae = (NoViableAltException)e; > msg = " no viable alt; token="+e.token+ > " (decision="+nvae.decisionNumber+ > " state "+nvae.stateNumber+")"+ > " decision=<<"+nvae.grammarDecisionDescription+">>"; > } > else { > msg = super.getErrorMessage(e, tokenNames); > } > return stack+" "+msg+" context=..."+inputContext+"..."; > } > public String getTokenErrorDisplay(Token t) { > return t.toString(); > } > > Ter > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address From gabriel_erzse at yahoo.com Wed Jan 27 12:36:34 2010 From: gabriel_erzse at yahoo.com (Gabriel Erzse) Date: Wed, 27 Jan 2010 12:36:34 -0800 (PST) Subject: [antlr-interest] detecting rubbish data at end of input Message-ID: <596411.1651.qm@web51302.mail.re2.yahoo.com> Hello, When using grammars written in ANTLR, the parser correctly recognizes data from an input stream, but if I have some rubbish text at the end of the input (which rubbish text is not supposed to be parsed by the grammar) the parser does not complain. I guess this behavior is all right (I mean the parser did its job and parsed whatever I said it should parse), but is there any trick to detect when there is any data left in the input after the parser has done its job? Thanks, Gabi. From scott at javadude.com Wed Jan 27 12:38:15 2010 From: scott at javadude.com (Scott Stanchfield) Date: Wed, 27 Jan 2010 15:38:15 -0500 Subject: [antlr-interest] detecting rubbish data at end of input In-Reply-To: <596411.1651.qm@web51302.mail.re2.yahoo.com> References: <596411.1651.qm@web51302.mail.re2.yahoo.com> Message-ID: Add an EOF token to the end of your start rule -- Scott ---------------------------------------- Scott Stanchfield http://javadude.com On Wed, Jan 27, 2010 at 3:36 PM, Gabriel Erzse wrote: > Hello, > > When using grammars written in ANTLR, the parser correctly recognizes > data from an input stream, but if I have some rubbish text at the end of > the input (which rubbish text is not supposed to be parsed by the grammar) > the parser does not complain. > > I guess this behavior is all right (I mean the parser did its job and parsed > whatever I said it should parse), but is there any trick to detect when there > is any data left in the input after the parser has done its job? > > Thanks, > Gabi. > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > From jeff.wilcox at mac.com Wed Jan 27 13:22:16 2010 From: jeff.wilcox at mac.com (Jeff Wilcox) Date: Wed, 27 Jan 2010 13:22:16 -0800 Subject: [antlr-interest] Disabling rules in the lexer Message-ID: <29B7ADE0-F06D-4A60-822A-A16E906610E5@mac.com> Yes, I agree with you, and in general this is how my parsers have worked. But there are a couple cases where disabling lexer rules is useful and/or necessary. Disable keywords that exist only in newer versions of the language which could be identifiers in older versions for example; there are other semi tedious ways around that with predicates but it should not be necessary. This case though involves a table section of characters, symbols and numbers. So a N column row of N discrete symbols could otherwise be a single number, a single identifier, a number plus an identifier, etc. So without special casing the lexer, the easiest thing was to accept possible candidates, suck it all into a string a re-parse in the semantic analyzer. But that feels like the wrong solution. In general though, it seems like there is a bug in ANLTR's treatment of gated semantic predicates in the lexer. It does not work unless there are other alternatives in the rule. Is there any other way to completely turn off a rule in the lexer (without throwing a FPE)? Thanks, Jeff On Jan 26, 2010, at 8:58 PM, William B. Clodius wrote: > Generally don't try to be too restrictive with your lexer and parser. This sort of context dependence is more naturally handled in the semantic analysis. In particular error reporting is much better if you accept things that are ultimately illegal in the lexer and parser and determine whether they are they are illegal in the semantic analysis. Instead of a minimal message such as "Illegal token" you can report "Illegal token for the table structure see constraint # in the language definition", or "Token is not one of the set of ..." > > On Jan 26, 2010, at 7:52 AM, Jeff Wilcox wrote: > >> Hi, >> >> I have a special area in this language that has symbols within a table structure that are normally used in other tokens in other areas of the language (like a couple digits, a couple letters and a couple symbols). So I am trying to setup the lexer to accept these table tokens only when in a table. Based on what I have been able to dig up, I believe gated semantic predicates are a valid way to disable rules in the lexer. However, I am seeing issues with this with ANTLR 3.2 and the java language target. >> >> So I expected a lexer rules like this to do the trick: >> >> Level0 : {inTable}?=> '0'; >> >> But that actually creates a very strange loop when inTable is false. I basically throws a FailedPredicateException (which I would not have expected for a gated predicate) and then retries the same token with the same rule, obviously resulting in an infinite loop. >> >> Can someone clarify whether this is allowed and if so whether there is some trick to using it? I am stumped. >> >> Thanks >> Jeff From parrt at cs.usfca.edu Wed Jan 27 21:21:48 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Wed, 27 Jan 2010 21:21:48 -0800 Subject: [antlr-interest] DSL discussion at javaranch.com Message-ID: <93526F1F-A27D-4315-BA68-DD0FE1B3A430@cs.usfca.edu> Hiya. In case you're interested, we're have some interesting discussions about DSL terminology, design, and implementation: http://www.coderanch.com/forums/f-12/IDEs-Version-Control-other-tools Ter From C.P.T.de.Gouw at cwi.nl Fri Jan 29 00:19:18 2010 From: C.P.T.de.Gouw at cwi.nl (C.P.T.de.Gouw at cwi.nl) Date: Fri, 29 Jan 2010 09:19:18 +0100 (CET) Subject: [antlr-interest] Parsing a sequence of objects In-Reply-To: <4B600517.3040102@cwi.nl> References: <4B600517.3040102@cwi.nl> Message-ID: <38399.132.229.128.127.1264753158.squirrel@webmail.cwi.nl> > Given an attribute grammar (with probably only synthesized attributes), > instead of parsing a sequence of terminal strings, I want to parse a > sequence (array) of (Java) Objects. I just noted an old antlr2 blog post, that I think describes exactly what I want: http://www.antlr2.org/blog/antlr3/lexical.tml. The feature I'm interested in is "the parser grammar (or combined grammar) can specify the extra fields for a token, which results in a grammar specific token. Tokens may also have a generate attributes table for dynamically setting attributes, thus, avoiding creation of a million token subclasses." Has this been added to antlr v3? From scott.oakes63 at googlemail.com Fri Jan 29 09:42:53 2010 From: scott.oakes63 at googlemail.com (Scott Oakes) Date: Fri, 29 Jan 2010 17:42:53 +0000 Subject: [antlr-interest] Lexer for floating point numbers + field access syntax with '.' Message-ID: <6e75196e1001290942t546f22b6lafdb030ca239c76@mail.gmail.com> Hi, hoping for some help trying to write a lexer that allows you to recognise floating point literals (2.3) as well as field accesses of the form x.y; see grammar below. The trouble is that an input like 3.fieldAccess Produces two tokens, FLOAT and ID, rather than the desired three, INT, DOT and ID. Pointers would be much appreciated! ------------------- grammar test; top: expr EOF; expr: (INT | FLOAT | ID | '(' expr ')') (DOT ID)*; ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ; INT : '0'..'9'+ ; DOT: '.'; FLOAT : ('0'..'9')+ '.' ('0'..'9')* EXPONENT? | '.' ('0'..'9')+ EXPONENT? | ('0'..'9')+ EXPONENT ; WS : ( ' ' | '\t' | '\r' | '\n' ) {$channel=HIDDEN;} ; fragment EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ; From jimi at temporal-wave.com Fri Jan 29 10:02:17 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Fri, 29 Jan 2010 10:02:17 -0800 Subject: [antlr-interest] Lexer for floating point numbers + field access syntax with '.' In-Reply-To: <6e75196e1001290942t546f22b6lafdb030ca239c76@mail.gmail.com> Message-ID: Please see the FAQ and complete grammar at: http://antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point%2C+dot%2C+range%2C+time+specs All you need do is add to the predicate here: | // We can of course have 0.nnnnn // { input.LA(2) != '.'}?=> '.' To check : { input.LA(2) != '.' && input.LA(2) >= '0' && input.LA(2) <= '0' }?=> '.' Then remove the empty alt there that allows number forms like 8. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Scott Oakes > Sent: Friday, January 29, 2010 9:43 AM > To: antlr-interest at antlr.org > Subject: [antlr-interest] Lexer for floating point numbers + field > access syntax with '.' > > Hi, hoping for some help trying to write a lexer that allows you to > recognise floating point literals (2.3) as well as field accesses of > the > form x.y; see grammar below. The trouble is that an input like > > 3.fieldAccess > > Produces two tokens, FLOAT and ID, rather than the desired three, INT, > DOT > and ID. > > Pointers would be much appreciated! > > ------------------- > > grammar test; > > top: expr EOF; > > expr: (INT | FLOAT | ID | '(' expr ')') (DOT ID)*; > > ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* > ; > > INT : '0'..'9'+ > ; > > DOT: '.'; > > FLOAT > : ('0'..'9')+ '.' ('0'..'9')* EXPONENT? > | '.' ('0'..'9')+ EXPONENT? > | ('0'..'9')+ EXPONENT > ; > > WS : ( ' ' > | '\t' > | '\r' > | '\n' > ) {$channel=HIDDEN;} > ; > > fragment > EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ; > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From scott.oakes63 at googlemail.com Fri Jan 29 10:30:09 2010 From: scott.oakes63 at googlemail.com (Scott Oakes) Date: Fri, 29 Jan 2010 18:30:09 +0000 Subject: [antlr-interest] Lexer for floating point numbers + field access syntax with '.' In-Reply-To: References: <6e75196e1001290942t546f22b6lafdb030ca239c76@mail.gmail.com> Message-ID: <6e75196e1001291030w43480359xc73f9e04d3c5225c@mail.gmail.com> Thanks Jim, the link looks very useful, albeit a bit daunting. I tried amending my FLOAT to: FLOAT : ('0'..'9')+ ({input.LA(2) >= '0' && input.LA(2) <= '9'}?=>'.') ('0'..'9')+ EXPONENT? | '.' ('0'..'9')+ EXPONENT? | ('0'..'9')+ EXPONENT ; Unfortunately I get a "rule FLOAT failed predicate" error. On Fri, Jan 29, 2010 at 6:02 PM, Jim Idle wrote: > Please see the FAQ and complete grammar at: > > > http://antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point%2C+dot%2C+range%2C+time+specs > > > All you need do is add to the predicate here: > > | // We can of course have 0.nnnnn > // > { input.LA(2) != '.'}?=> '.' > > To check : > > { input.LA(2) != '.' && input.LA(2) >= '0' && input.LA(2) <= '0' }?=> '.' > > Then remove the empty alt there that allows number forms like 8. > > Jim > > > -----Original Message----- > > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > > bounces at antlr.org] On Behalf Of Scott Oakes > > Sent: Friday, January 29, 2010 9:43 AM > > To: antlr-interest at antlr.org > > Subject: [antlr-interest] Lexer for floating point numbers + field > > access syntax with '.' > > > > Hi, hoping for some help trying to write a lexer that allows you to > > recognise floating point literals (2.3) as well as field accesses of > > the > > form x.y; see grammar below. The trouble is that an input like > > > > 3.fieldAccess > > > > Produces two tokens, FLOAT and ID, rather than the desired three, INT, > > DOT > > and ID. > > > > Pointers would be much appreciated! > > > > ------------------- > > > > grammar test; > > > > top: expr EOF; > > > > expr: (INT | FLOAT | ID | '(' expr ')') (DOT ID)*; > > > > ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* > > ; > > > > INT : '0'..'9'+ > > ; > > > > DOT: '.'; > > > > FLOAT > > : ('0'..'9')+ '.' ('0'..'9')* EXPONENT? > > | '.' ('0'..'9')+ EXPONENT? > > | ('0'..'9')+ EXPONENT > > ; > > > > WS : ( ' ' > > | '\t' > > | '\r' > > | '\n' > > ) {$channel=HIDDEN;} > > ; > > > > fragment > > EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ; > From jimi at temporal-wave.com Fri Jan 29 10:37:44 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Fri, 29 Jan 2010 10:37:44 -0800 Subject: [antlr-interest] Lexer for floating point numbers + field access syntax with '.' In-Reply-To: <6e75196e1001291030w43480359xc73f9e04d3c5225c@mail.gmail.com> Message-ID: Yes, you need to follow the method in the example - what you are trying to do will not work until you left factor it. Jim From: Scott Oakes [mailto:scott.oakes63 at googlemail.com] Sent: Friday, January 29, 2010 10:30 AM To: Jim Idle Cc: antlr-interest at antlr.org Subject: Re: [antlr-interest] Lexer for floating point numbers + field access syntax with '.' Thanks Jim, the link looks very useful, albeit a bit daunting. I tried amending my FLOAT to: FLOAT : ('0'..'9')+ ({input.LA(2) >= '0' && input.LA(2) <= '9'}?=>'.') ('0'..'9')+ EXPONENT? | '.' ('0'..'9')+ EXPONENT? | ('0'..'9')+ EXPONENT ; Unfortunately I get a "rule FLOAT failed predicate" error. On Fri, Jan 29, 2010 at 6:02 PM, Jim Idle wrote: Please see the FAQ and complete grammar at: http://antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point%2C+dot%2C+range%2C+time+specs All you need do is add to the predicate here: | // We can of course have 0.nnnnn // { input.LA(2) != '.'}?=> '.' To check : { input.LA(2) != '.' && input.LA(2) >= '0' && input.LA(2) <= '0' }?=> '.' Then remove the empty alt there that allows number forms like 8. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Scott Oakes > Sent: Friday, January 29, 2010 9:43 AM > To: antlr-interest at antlr.org > Subject: [antlr-interest] Lexer for floating point numbers + field > access syntax with '.' > > Hi, hoping for some help trying to write a lexer that allows you to > recognise floating point literals (2.3) as well as field accesses of > the > form x.y; see grammar below. The trouble is that an input like > > 3.fieldAccess > > Produces two tokens, FLOAT and ID, rather than the desired three, INT, > DOT > and ID. > > Pointers would be much appreciated! > > ------------------- > > grammar test; > > top: expr EOF; > > expr: (INT | FLOAT | ID | '(' expr ')') (DOT ID)*; > > ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* > ; > > INT : '0'..'9'+ > ; > > DOT: '.'; > > FLOAT > : ('0'..'9')+ '.' ('0'..'9')* EXPONENT? > | '.' ('0'..'9')+ EXPONENT? > | ('0'..'9')+ EXPONENT > ; > > WS : ( ' ' > | '\t' > | '\r' > | '\n' > ) {$channel=HIDDEN;} > ; > > fragment > EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ; From ron.hunter-duvar at oracle.com Fri Jan 29 20:51:40 2010 From: ron.hunter-duvar at oracle.com (Ron Hunter-Duvar) Date: Fri, 29 Jan 2010 21:51:40 -0700 Subject: [antlr-interest] ANTLR running out of memory during generation Message-ID: <4B63BADC.7000005@oracle.com> I'm having a strange problem with ANTLR. I'm building a grammar for a language with a huge number (hundreds) of non-reserved keywords. I'm using the approach of having the lexer return a different token type for each keyword, and then having a parser rule of the form: id : ( ID | QUOTED_ID | KW_A | KW_B | ... | KW_ZZZ ); This was working great until today. In fact, ANTLR 3.2 generates surprisingly clever code for this - all the keywords are assigned consecutive token numbers, and generated code just says: if ( (input.LA(1)>=KW_A && input.LA(1)<=KW_ZZZ)||(input.LA(1)>=ID && input.LA(1)<=QUOTED_ID) ) { input.consume(); ... This works all the way up to 631 keywords. ANTLR runs in about 20 seconds, and never uses more than 269MB of memory. When I add a 632nd keyword (doesn't matter what the keyword is), and change nothing else, ANTLR runs for 2 minutes and runs out of heap space. I kept bumping the max space up, but even going to 2GB doesn't make any difference. What's really interesting is that I was using ANTLR 3.1 until now. When I ran into this I upgraded to 3.2, but both of them fail at exactly the same spot, 632 keywords. Not surprisingly, the stack trace varies from one run to the next, depending on the exact point it runs out of memory, but it always has deeply nested calls to these and other methods: org.antlr.stringtemplate.language.ASTExpr.writeTemplate(ASTExpr.java:750) org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:680) org.antlr.stringtemplate.language.ASTExpr.writeAttribute(ASTExpr.java:660) org.antlr.stringtemplate.language.ActionEvaluator.action(ActionEvaluator.java:86) org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:149) org.antlr.stringtemplate.StringTemplate.write(StringTemplate.java:705) I don't know if it makes a difference, but I'm using backtracking (otherwise, this approach to non-reserved keywords doesn't work without a lot of synpreds), and outputting ASTs. Since this is size related, it's hard to narrow it down to a simple example. I could try to duplicate it with just the id rule and nothing else. Any ideas what might be happening here, and whether a fix might be possible? Thanks, Ron -- Ron Hunter-Duvar | Software Developer V | 403-272-6580 Oracle Service Engineering Gulf Canada Square 401 - 9th Avenue S.W., Calgary, AB, Canada T2P 3C5 All opinions expressed here are mine, and do not necessarily represent those of my employer. From oliver.zeigermann at gmail.com Sat Jan 30 03:48:21 2010 From: oliver.zeigermann at gmail.com (Oliver Zeigermann) Date: Sat, 30 Jan 2010 12:48:21 +0100 Subject: [antlr-interest] better error messages in tree parsers In-Reply-To: <604A33CC-D57B-4C48-ADDE-75331132609D@cs.usfca.edu> References: <604A33CC-D57B-4C48-ADDE-75331132609D@cs.usfca.edu> Message-ID: <9da4f4521001300348t3be6ac97t44963f4b351d423e@mail.gmail.com> As input.LT seems to return null values in case we are at the very start/end of the node stream, I added this check which does the job for me input.LT(-3) == null ? "" : ((Tree)input.LT(-3)).getText()+" "+ input.LT(-2) == null ? "" : ((Tree)input.LT(-2)).getText()+" "+ input.LT(-1) == null ? "" : ((Tree)input.LT(-1)).getText()+" >>>"+ input.LT(1) == null ? "" : ((Tree)input.LT(1)).getText()+"<<< "+ input.LT(2) == null ? "" : ((Tree)input.LT(2)).getText()+" "+ input.LT(3) == null ? "" : ((Tree)input.LT(3)).getText(); 2010/1/26 Terence Parr : > Hi, a reminder that debugging tree grammars can be a bitch. ?I like to override standard messaging to spew lots of stuff. ?E.g., i like this kind of thing: > > ASTVerifier.g: node from after line 150:17 [grammarSpec, rules, rule, altListAsBlock, altList, alternative, elements, element, ebnf, block, altList, alternative] ?no viable alt; token=[@-1,0:0='ALT',<84>,0:-1] (decision=24 state 3) decision=<<>> > context=...DOWN BLOCK DOWN >>>ALT<<< DOWN DOC_COMMENT... > > Here's my code: > > ? ?public String getErrorMessage(RecognitionException e, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?String[] tokenNames) > ? ?{ > ? ? ? ?List stack = getRuleInvocationStack(e, this.getClass().getName()); > ? ? ? ?String msg = null; > ? ? ? ?String inputContext = > ? ? ? ? ? ?((Tree)input.LT(-3)).getText()+" "+ > ? ? ? ? ? ?((Tree)input.LT(-2)).getText()+" "+ > ? ? ? ? ? ?((Tree)input.LT(-1)).getText()+" >>>"+ > ? ? ? ? ? ?((Tree)input.LT(1)).getText()+"<<< "+ > ? ? ? ? ? ?((Tree)input.LT(2)).getText()+" "+ > ? ? ? ? ? ?((Tree)input.LT(3)).getText(); > ? ? ? ?if ( e instanceof NoViableAltException ) { > ? ? ? ? ? NoViableAltException nvae = (NoViableAltException)e; > ? ? ? ? ? msg = " no viable alt; token="+e.token+ > ? ? ? ? ? ? ?" (decision="+nvae.decisionNumber+ > ? ? ? ? ? ? ?" state "+nvae.stateNumber+")"+ > ? ? ? ? ? ? ?" decision=<<"+nvae.grammarDecisionDescription+">>"; > ? ? ? ?} > ? ? ? ?else { > ? ? ? ? ? msg = super.getErrorMessage(e, tokenNames); > ? ? ? ?} > ? ? ? ?return stack+" "+msg+" context=..."+inputContext+"..."; > ? ?} > ? ?public String getTokenErrorDisplay(Token t) { > ? ? ? ?return t.toString(); > ? ?} > > Ter > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > From scott.oakes63 at googlemail.com Sat Jan 30 05:30:50 2010 From: scott.oakes63 at googlemail.com (Scott Oakes) Date: Sat, 30 Jan 2010 13:30:50 +0000 Subject: [antlr-interest] Lexer for floating point numbers + field access syntax with '.' In-Reply-To: References: <6e75196e1001291030w43480359xc73f9e04d3c5225c@mail.gmail.com> Message-ID: <6e75196e1001300530o40b7a224l2b01c6a4eaeedb39@mail.gmail.com> > On Fri, Jan 29, 2010 at 6:37 PM, Jim Idle wrote: > Yes, you need to follow the method in the example - what you are trying to do will not work until you left factor it. OK, I've attempted to merge the INT, DOT and FLOAT rules together and manually set the token types at various branch points in the rules. I'm still not having much luck with it, I'm afraid, but here's my grammar to date: grammar test; fragment INT:; fragment DOT:; top: expr EOF; expr: (INT | FLOAT | ID | '(' expr ')') (DOT ID)*; ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ; FLOAT : ('0'..'9')+ ( {input.LA(2) >= '0' && input.LA(2) <= '9'}?=> '.' ('0'..'9')+ EXPONENT? {$type = FLOAT;} | {$type = INT;} ( '.' {$type = DOT;} ) ) | '.' {$type = DOT;} ; WS : ( ' ' | '\t' | '\r' | '\n' ) {$channel=HIDDEN;} ; fragment EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ; From jimi at temporal-wave.com Sat Jan 30 11:42:12 2010 From: jimi at temporal-wave.com (Jim Idle) Date: Sat, 30 Jan 2010 11:42:12 -0800 Subject: [antlr-interest] ANTLR running out of memory during generation In-Reply-To: <4B63BADC.7000005@oracle.com> Message-ID: <7277098525d5fb4685c662b1fba4f4e2@temporal-wave.com> Ron, First you really need to switch off backtracking unless the objective of your parser is to analyze SQL (you gave it away when you mentioned 632 keywords that can be identifiers). There are not as many predicates required as you think so long as you left factor everything. Your tokens should be consecutive so long as you list them that way in the lexer. The problem might well be that although SQL sort of allows all keywords to be identifiers, it does not allow all because some of them would be to ambiguous even for a syntax directed hand crafted parser. If you turn on backtracking then try to allow one of these reserved words to be an identifier, then you will probably mask the issue because all warnings and errors are turned off. It is entirely feasible to create a full SQL parser without backtracking, very little look ahead and few predicates (all of the one or two token lookahead type). I have an online demo of T-SQL for instance on my web site at www.temporal-wave.com (select 'online demos' link), and Oracle SQL/PLSQL will be up there before long too. So, I think you will need to do the following to have a chance of generating the code: 1) Use -Xconversiontimeout 10000 2) Cause switches to be generated rather than ifs: -Xmaxswitchcaselabels 32000 -Xminswitchalts 1-xmaxinlineddfastates 65534 3) Use -Xmx2G when invoking the java command (assuming your jvm allows that) But if you cannot get it going that way, then basically you are masking a bigger problem in your grammar that you are not seeing because of global backtracking. Jim > -----Original Message----- > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- > bounces at antlr.org] On Behalf Of Ron Hunter-Duvar > Sent: Friday, January 29, 2010 8:52 PM > To: antlr-interest at antlr.org > Subject: [antlr-interest] ANTLR running out of memory during generation > > I'm having a strange problem with ANTLR. I'm building a grammar for a > language with a huge number (hundreds) of non-reserved keywords. I'm > using the approach of having the lexer return a different token type > for > each keyword, and then having a parser rule of the form: > > id : ( ID | QUOTED_ID | KW_A | KW_B | ... | KW_ZZZ ); > > This was working great until today. In fact, ANTLR 3.2 generates > surprisingly clever code for this - all the keywords are assigned > consecutive token numbers, and generated code just says: > > if ( (input.LA(1)>=KW_A && input.LA(1)<=KW_ZZZ)||(input.LA(1)>=ID > && > input.LA(1)<=QUOTED_ID) ) { > input.consume(); > ... > > This works all the way up to 631 keywords. ANTLR runs in about 20 > seconds, and never uses more than 269MB of memory. When I add a 632nd > keyword (doesn't matter what the keyword is), and change nothing else, > ANTLR runs for 2 minutes and runs out of heap space. I kept bumping the > max space up, but even going to 2GB doesn't make any difference. > > What's really interesting is that I was using ANTLR 3.1 until now. When > I ran into this I upgraded to 3.2, but both of them fail at exactly the > same spot, 632 keywords. Not surprisingly, the stack trace varies from > one run to the next, depending on the exact point it runs out of > memory, > but it always has deeply nested calls to these and other methods: > > > org.antlr.stringtemplate.language.ASTExpr.writeTemplate(ASTExpr.java:75 > 0) > org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:680) > > org.antlr.stringtemplate.language.ASTExpr.writeAttribute(ASTExpr.java:6 > 60) > > org.antlr.stringtemplate.language.ActionEvaluator.action(ActionEvaluato > r.java:86) > org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:149) > > org.antlr.stringtemplate.StringTemplate.write(StringTemplate.java:705) > > I don't know if it makes a difference, but I'm using backtracking > (otherwise, this approach to non-reserved keywords doesn't work without > a lot of synpreds), and outputting ASTs. > > Since this is size related, it's hard to narrow it down to a simple > example. I could try to duplicate it with just the id rule and nothing > else. > > Any ideas what might be happening here, and whether a fix might be > possible? > > Thanks, > Ron > > -- > Ron Hunter-Duvar | Software Developer V | 403-272-6580 > Oracle Service Engineering > Gulf Canada Square 401 - 9th Avenue S.W., Calgary, AB, Canada T2P 3C5 > > All opinions expressed here are mine, and do not necessarily represent > those of my employer. > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- > email-address From parrt at cs.usfca.edu Sat Jan 30 11:48:18 2010 From: parrt at cs.usfca.edu (Terence Parr) Date: Sat, 30 Jan 2010 11:48:18 -0800 Subject: [antlr-interest] better error messages in tree parsers In-Reply-To: <9da4f4521001300348t3be6ac97t44963f4b351d423e@mail.gmail.com> References: <604A33CC-D57B-4C48-ADDE-75331132609D@cs.usfca.edu> <9da4f4521001300348t3be6ac97t44963f4b351d423e@mail.gmail.com> Message-ID: On Jan 30, 2010, at 3:48 AM, Oliver Zeigermann wrote: > As input.LT seems to return null values in case we are at the very > start/end of the node stream, I added this check which does the job > for me > > input.LT(-3) == null ? "" : ((Tree)input.LT(-3)).getText()+" "+ > input.LT(-2) == null ? "" : ((Tree)input.LT(-2)).getText()+" "+ > input.LT(-1) == null ? "" : ((Tree)input.LT(-1)).getText()+" >>>"+ > input.LT(1) == null ? "" : ((Tree)input.LT(1)).getText()+"<<< "+ > input.LT(2) == null ? "" : ((Tree)input.LT(2)).getText()+" "+ > input.LT(3) == null ? "" : ((Tree)input.LT(3)).getText(); oh. right. start is a problem. end is EOF so no problem. can u update the faq too? ;) Ter From duygu_the_duygu at yahoo.com Tue Jan 19 13:20:57 2010 From: duygu_the_duygu at yahoo.com (Duygu Altinok) Date: Tue, 19 Jan 2010 13:20:57 -0800 (PST) Subject: [antlr-interest] infinite recursion on tree parser Message-ID: <611145.64998.qm@web46001.mail.sp1.yahoo.com> I'm writing a C -like language compiler for my language processors course . I defined a rule compound_expr which represents nested blocks closed within curly braces. Compilation of the parser is fine but tree parser gives an error , can anybody please help?? f3.g:1014:10: infinite recursion to rule statement from rule statement_list f3.g:1030:20: infinite recursion to rule statement from rule ca f3.g:1029:26: infinite recursion to rule statement from rule compound_expr f3.g:1023:13: infinite recursion to rule statement from rule statement f3.g:1014:10: infinite recursion to rule statement from rule statement_list f3.g:1014:10: infinite recursion to rule statement from rule statement_list f3.g:1014:10: infinite recursion to rule statement from rule statement_list f3.g:1014: warning:nondeterminism upon f3.g:1014: k==1:NULL_TREE_LOOKAHEAD,NUM,ID,PLUS,MINUS,MULT,DIV,LT,LEQ,EQ,NEQ,ISEQ,OUTPUTT,INPUTT,PARANTEZLISIN,IF,"return","while" f3.g:1014: between alt 1 and exit branch of block Here's my code for compound_expr in the parser and the tree parser : parser : program: function_list { #program= #([PROGRAM,"program"],symbol_table, program); } ; function_list: { is_in_function_list = true; } (function)+ { #function_list= #([FUNCTION_LIST, "function_list"], function_list); } ; function: { String bt; } bt=basic_type! "func"! i:ID! { String identifier = i.getText(); if (identifier.length() > 32) { error(WARN00, i.getLine(), i.getColumn()); identifier = identifier.substring(0, 32); } which_function = new String(identifier); identifier=identifier + ":" + Integer.toString(i.getLine()) + ":" + Integer.toString(i.getColumn()); } parameter_list! function_body { symbol_table.addChild(#([SYMBOL_FUNCTION, identifier ], [SYMBOL_TYPE, bt] , symbol_parameters, symbol_locals )); #function=(#([ID,identifier],function)); } ; function_body: LCURLY declaration_list! statement_list RCURLY ; declaration_list: { if (is_in_function_list) symbol_locals = (CommonAST) astFactory.create(SYMBOL_LOCALS, "symbol_locals"); } (declaration! SEMI!)* ; declaration: { String t = new String(""); String t2 = new String(""); } t=basic_type i:ID t2=array_extension { String identifier = i.getText(); if (identifier.length() > 32) { error(WARN00, i.getLine(), i.getColumn()); identifier = identifier.substring(0, 32); } t += t2; identifier=identifier + ":" + Integer.toString(i.getLine()) + ":" + Integer.toString(i.getColumn()); if (is_in_function_list && is_in_parameter) symbol_parameters.addChild(#([SYMBOL_PARAMETER, identifier], [SYMBOL_TYPE, t], [SYMBOL_FUNCTION_SCOPE, which_function])); else if (is_in_function_list && ! is_in_parameter) symbol_locals.addChild(#([SYMBOL_LOCAL, identifier], [SYMBOL_TYPE, t], [SYMBOL_FUNCTION_SCOPE, which_function])); } ; paramdecl : { String t = new String(""); String t2 = new String(""); } t=basic_type i:ID t2=array_extension { String identifier = i.getText(); if (identifier.length() > 32) { error(WARN00, i.getLine(), i.getColumn()); identifier = identifier.substring(0, 32); } t += t2; identifier=identifier + ":" + Integer.toString(i.getLine()) + ":" + Integer.toString(i.getColumn()); if (is_in_function_list && is_in_parameter) symbol_parameters.addChild(#([SYMBOL_PARAMETER, identifier], [SYMBOL_TYPE, t], [SYMBOL_FUNCTION_SCOPE, which_function])); else if (is_in_function_list && ! is_in_parameter) symbol_locals.addChild(#([SYMBOL_LOCAL, identifier], [SYMBOL_TYPE, t], [SYMBOL_FUNCTION_SCOPE, which_function])); } ; parameter_list: { symbol_parameters = (CommonAST) astFactory.create(SYMBOL_PARAMETERS, "symbol_parameters"); is_in_parameter = true; } LPAREN ( variable_list {is_in_parameter = false;} | {is_in_parameter = false;} ) RPAREN ; variable_list : LPAREN (paramdecl (COMMA paramdecl)*)? RPAREN ; type returns [String t] { String bt=new String(); String ar=new String(); t = new String(); } : bt=basic_type ar=array_extension { t = bt + ar; } ; basic_type returns [String bt] { bt = new String(); } : "int" { bt = new String("int"); } |"float" { bt = new String("float"); } ; array_extension returns [String ar] { ar = new String(); } : lbracket:LBRAC{ar = new String("[");} (n:NUM {ar += n.getText();}) RBRAC{ar += "]" ;} | { ar = new String(""); } ; array_extension2 returns [String ar] { ar = new String(); } : lbracket:LBRAC{ar = new String("[");} RBRAC{ar += "]" ;} | { ar = new String(""); } ; statement_list : (statement)+; statement: (assignment_statement)=>assignment_statement | read_statement |return_statement |if_statement |while_statement | compound_expr | print_statement | expression SEMI ; return_statement : "return"^ expression SEMI; assignment_statement: variable EQ^ expression SEMI! ; compound_expr : ca; ca : LCURLY! (statement_list)? RCURLY!; variable: i:ID! (LBRAC expression RBRAC)? { String identifier=i.getText()+":"+i.getLine()+":"+i.getColumn(); #variable = #([ID,identifier], variable); } ; expression: simple_expression ( (LEQ^ |NEQ^|LT^ |ISEQ^) simple_expression)* ; simple_expression: term ((PLUS^|MINUS^) term)* ; term : factor ( (MULT | DIV ) factor)* ; factor : i:ID! (LBRAC expression RBRAC | LPAREN argument_list RPAREN)? { String identifier=i.getText()+":"+i.getLine()+":"+i.getColumn(); #factor = #([ID,identifier], factor); } | j: NUM! { String identifier=j.getText()+":"+j.getLine()+":"+j.getColumn(); #factor = #([NUM,identifier], factor); } | LPAREN! expression RPAREN! { #factor = #([PARANTEZLISIN, "parantezli"], factor);} ; read_statement : read_item EQ! INPUTT^ LPAREN! (STRING)? RPAREN! ; read_item : variable; print_statement : OUTPUTT^ LPAREN! (STRING COMMA)? print_item RPAREN! SEMI!; print_item: variable; function_call : ID LPAREN argument_list RPAREN SEMI; expression_list : expression (COMMA! expression)* ; argument_list : expression_list | ; text_character : CHARLIT | special_character ; special_character : "\n" ; if_statement : if_part then_part else_part { #if_statement = #([IF , "if"], #if_statement); }; if_part : IFF! LPAREN! expression RPAREN! ; then_part : statement; else_part : (ELSE^ statement)? ENDIF! SEMI! ; while_statement : WHILE^ LPAREN! expression RPAREN! statement; This part is OK. It gives no compilation errors. Here's the erronous part: program : #(PROGRAM symbol_table { sTable.sort(); } function_list { sTable.prettyPrint(); } ) ; symbol_table : #(SYMBOL_TABLE ( #(i:SYMBOL_FUNCTION j:SYMBOL_TYPE { //Parse info String identifier; int line, column; String [] params = new String[3]; identifier = i.getText(); params = identifier.split(":"); identifier = params[0]; line = Integer.parseInt(params[1]); column = Integer.parseInt(params[2]); Function newFunction=new Function(identifier,j.getText(), line, column); //add Function to Symbol Table int index; if ((index = sTable.searchFunction(newFunction.name)) != -1) { error(ERR01, line, column, ((Function)sTable.functions.elementAt(index)).line, ((Function)sTable.functions.elementAt(index)).column); isFunctionLegal = false; } else { sTable.addFunction(newFunction); int last=sTable.functions.size()-1; currentFunction =(Function) sTable.functions.elementAt(last); isFunctionLegal = true; } } symbol_parameters symbol_locals) )* ) ; symbol_parameters : #(SYMBOL_PARAMETERS ( #(i:SYMBOL_PARAMETER j:SYMBOL_TYPE { //Parse info String identifier; int line, column; String [] params = new String[3]; identifier = i.getText(); params = identifier.split(":"); identifier = params[0]; line = Integer.parseInt(params[1]); column = Integer.parseInt(params[2]); if (isFunctionLegal) { int last=sTable.functions.size()-1; Symbol newSymbol = new Symbol(identifier,j.getText(),line,column); if (currentFunction.searchParameter(newSymbol.name) != -1){ error(FuncErr+currentFunction.name,line,column); error(ERR03, line, column); } else currentFunction.addParameter(newSymbol); } } SYMBOL_FUNCTION_SCOPE))* ) ; symbol_locals : #(SYMBOL_LOCALS (#(i:SYMBOL_LOCAL j:SYMBOL_TYPE { //Parse info String identifier; int line, column; String [] params = new String[3]; identifier = i.getText(); params = identifier.split(":"); identifier = params[0]; line = Integer.parseInt(params[1]); column = Integer.parseInt(params[2]); if (isFunctionLegal) { int index; int last=sTable.functions.size()-1; Symbol newSymbol = new Symbol(identifier,j.getText(),line, column); if ((index = currentFunction.searchParameter(newSymbol.name)) != -1){ error(FuncErr+currentFunction.name,line,column); error(ERR04, line, column, ((Symbol)currentFunction.parameters.elementAt(index)).line, ((Symbol)currentFunction.parameters.elementAt(index)).column); } else if ((index = currentFunction.searchLocal(newSymbol.name)) != -1){ error(FuncErr+currentFunction.name,line,column); error(ERR05, line, column, ((Symbol)currentFunction.locals.elementAt(index)).line, ((Symbol)currentFunction.locals.elementAt(index)).column); } else if(j.getText().indexOf("[]")==-1) { currentFunction.addLocal(newSymbol); } else error(ERR06, line, column); } } SYMBOL_FUNCTION_SCOPE))*) ; function_list : #(FUNCTION_LIST (function)+) ; function : #(i:ID { //Parse info String identifier; int line, column; String [] params = new String[3]; identifier = i.getText(); params = identifier.split(":"); identifier = params[0]; line = Integer.parseInt(params[1]); column = Integer.parseInt(params[2]); int index; int line2=0,column2=0; //line and column info from the symbol table index = sTable.getFunctionIndex(identifier); if(index != -1) { line2=((Function)sTable.functions.elementAt(index)).line; column2=((Function)sTable.functions.elementAt(index)).column; } if(index!=-1 && line==line2 && column==column2) { isFunctionLegal=true; currentFunction = (Function) sTable.functions.elementAt(index); } else isFunctionLegal=false; } function_body) ; function_body: statement_list ; statement_list: (statement)+ ; statement: (assignment_statement)=>assignment_statement | read_statement |return_statement |if_statement |while_statement | compound_expr | print_statement | expression SEMI ; compound_expr : ca; ca : (statement_list)? ; assignment_statement { Symbol retType ; String type = new String(""); int index; String exType = new String(""); } : #(EQ retType=variable exType=expression) { if(isFunctionLegal) { if (exType.startsWith("float")) { index=currentFunction.getParameterIndex(retType.name); if(index!=-1) { type=((Symbol)currentFunction.parameters.elementAt(index)).type; } else{ index = currentFunction.getLocalIndex(retType.name); if(index!=-1){ type=((Symbol)currentFunction.locals.elementAt(index)).type; } } if (type.startsWith(new String("int"))) { error(FuncErr+currentFunction.name,retType.line,retType.column); error(ERR12,retType.line,retType.column); } } } } ; return_statement { String exType; String type; } : #("return" exType=expression { if(isFunctionLegal) { type=currentFunction.returntype; if(exType.startsWith("float") && type.startsWith("int")) { String identifier; int line, column; String [] params = new String[3]; identifier = exType; params = identifier.split(":"); identifier = params[0]; line = Integer.parseInt(params[1]); column = Integer.parseInt(params[2]); error(FuncErr+currentFunction.name,line,column); error(ERR13,line,column); } } } ) ; print_statement : #(OUTPUTT print_item) ; print_item : variable ; read_statement : #(INPUTT read_item) ; read_item { Symbol retType; }: retType=variable { if(isFunctionLegal) { int index; String type=new String(""); int line,column; index=currentFunction.getParameterIndex(retType.name); if(index!=-1) { type=((Symbol)currentFunction.parameters.elementAt(index)).type; } else{ index = currentFunction.getLocalIndex(retType.name); if(index!=-1){ type=((Symbol)currentFunction.locals.elementAt(index)).type; } else { index=sTable.getFunctionIndex(retType.name); if(index!=-1) { line=((Function)sTable.functions.elementAt(index)).line; column=((Function)sTable.functions.elementAt(index)).column; error(FuncErr+currentFunction.name,retType.line,retType.column); error(ERR07,retType.line,retType.column,line,column); } } } if ( !(type.startsWith(new String("int"))) && !(type.equals(""))) { error(FuncErr+currentFunction.name,retType.line,retType.column); error(ERR09,retType.line,retType.column); } } } ; if_statement: #(IF if_part then_part else_part) ; if_part: expression ; then_part : #("then" statement) ; else_part : #("else" statement) | ; while_statement : #("while" expression statement) ; variable returns [Symbol v]{ v =new Symbol(new String(""),new String(""),0,0); String exType; boolean isArray=false; } : #(i:ID (LBRAC exType=expression RBRAC { if(isFunctionLegal) { if(exType.startsWith(new String("float"))) { //Parse info String identifier; int line, column; String [] params = new String[3]; identifier = exType; params = identifier.split(":"); identifier = params[0]; line= Integer.parseInt(params[1]); column= Integer.parseInt(params[2]); error(FuncErr+currentFunction.name,line,column); error(ERR11,line,column); } isArray=true; } } )? ) { if(isFunctionLegal) { //Parse info String identifier; int line, column; String [] params = new String[3]; identifier = i.getText(); params = identifier.split(":"); identifier = params[0]; line = Integer.parseInt(params[1]); column = Integer.parseInt(params[2]); v = new Symbol(identifier,"", line, column); int index; String type=new String(""); index=currentFunction.getParameterIndex(identifier); if(index!=-1) { type=((Symbol)currentFunction.parameters.elementAt(index)).type; } else{ index = currentFunction.getLocalIndex(identifier); if(index!=-1){ type=((Symbol)currentFunction.locals.elementAt(index)).type; } } if(!type.equals("") && type.indexOf("[")!=-1 && !isArray) { error(FuncErr+currentFunction.name,line,column); error(ERR15,line,column); } if(!type.equals("") && type.indexOf("[")==-1 && isArray) { error(FuncErr+currentFunction.name,line,column); error(ERR16,line,column); } if(isArray){ isArray=false; } } } ; expression returns [String exType] { String sType; exType=new String(""); } : (#(ISEQ expression simple_expression)) => #(ISEQ exType=expression sType=simple_expression) { if(isFunctionLegal) { if (sType.startsWith(new String("float")) && !exType.startsWith(new String("float"))) { exType=sType; } } } | (#(NEQ expression simple_expression)) => #(NEQ exType=expression sType=simple_expression) { if(isFunctionLegal) { if (sType.startsWith(new String("float")) && !exType.startsWith(new String("float"))) { exType=sType; } } } |( #(LT expression simple_expression) ) => #(LT exType=expression sType=simple_expression) { if(isFunctionLegal) { if (sType.startsWith(new String("float")) && !exType.startsWith(new String("float"))) { exType=sType; } } } |( #(LEQ expression simple_expression) ) => #(LEQ exType=expression sType=simple_expression) { if(isFunctionLegal) { if (sType.startsWith(new String("float")) && !exType.startsWith(new String("float"))) { exType=sType; } } } |exType=simple_expression ; simple_expression returns [String sType] { String tType; sType=new String(); } : (#(PLUS simple_expression term))=>#(PLUS sType=simple_expression tType=term) { if(isFunctionLegal) { if (tType.startsWith(new String("float")) && !sType.startsWith(new String("float"))) { sType=tType; } } } | (#(MINUS simple_expression term))=>#(MINUS sType=simple_expression tType=term) { if(isFunctionLegal) { if (tType.startsWith(new String("float")) && !sType.startsWith(new String("float"))) { sType=tType; } } } | sType=term ; term returns [String tType] { Symbol retType=new Symbol(new String(""),new String(""),0,0); tType=new String(""); } : (#( MULT term factor))=>#( MULT tType=term retType=factor { if(isFunctionLegal) { int index; String type=new String(""); int line,column; //Control whether it is a number or an identifier if(!retType.name.equals("") && new Character(retType.name.charAt(0))<=new Character('9') && new Character(retType.name.charAt(0))>=new Character('0')) { if(retType.name.indexOf('.')!=-1 || retType.name.indexOf('E')!=-1 || retType.name.indexOf('e')!=-1) type="float"; else type="int"; } else{ index=currentFunction.getParameterIndex(retType.name); if(index!=-1) { type=((Symbol)currentFunction.parameters.elementAt(index)).type; } else{ index = currentFunction.getLocalIndex(retType.name); if(index!=-1){ type=((Symbol)currentFunction.locals.elementAt(index)).type; } else { error(FuncErr+currentFunction.name,retType.line,retType.column); error(ERR08,retType.line,retType.column); } } } if(type.startsWith(new String("float")) && !tType.startsWith(new String("float"))) tType=type+new String(":")+new Integer(retType.line)+new String(":")+new Integer(retType.column); } } ) |(#( DIV term factor))=>#( DIV tType=term retType=factor { if(isFunctionLegal) { int index; String type=new String(""); int line,column; //Control whether it is a number or an identifier if(!retType.name.equals("") && new Character(retType.name.charAt(0))<=new Character('9') && new Character(retType.name.charAt(0))>=new Character('0')) { if(retType.name.indexOf('.')!=-1 || retType.name.indexOf('E')!=-1 || retType.name.indexOf('e')!=-1) type="float"; else type="int"; } else{ index=currentFunction.getParameterIndex(retType.name); if(index!=-1) { type=((Symbol)currentFunction.parameters.elementAt(index)).type; } else{ index = currentFunction.getLocalIndex(retType.name); if(index!=-1){ type=((Symbol)currentFunction.locals.elementAt(index)).type; } else { index=sTable.getFunctionIndex(retType.name); if(index!=-1) { type=((Function)sTable.functions.elementAt(index)).returntype; } } } } if(type.startsWith(new String("float")) && !tType.startsWith(new String("float"))) tType=type+new String(":")+new Integer(retType.line)+new String(":")+new Integer(retType.column); } } ) | retType=factor { if(isFunctionLegal) { int index; String type=new String(""); int line,column; //Control whether it is a number or an identifier if(!retType.name.equals("") && new Character(retType.name.charAt(0))<=new Character('9') && new Character(retType.name.charAt(0))>=new Charac ter('0')) { if(retType.name.indexOf('.')!=-1 || retType.name.indexOf('E')!=-1 || retType.name.indexOf('e')!=-1) type="float"; else type="int"; } else{ index=currentFunction.getParameterIndex(retType.name); if(index!=-1) { type=((Symbol)currentFunction.parameters.elementAt(index)).type; } else{ index = currentFunction.getLocalIndex(retType.name); if(index!=-1){ type=((Symbol)currentFunction.locals.elementAt(index)).type; } else { index=sTable.getFunctionIndex(retType.name); if(index!=-1) { type=((Function)sTable.functions.elementAt(index)).returntype; } } } } tType=type+new String(":")+new Integer(retType.line)+new String(":")+new Integer(retType.column); } } ; factor returns [Symbol v] { v=new Symbol(new String(""),new String(""),0,0); String exType; String errorStr; Vector argsVec; boolean isFunction=false; boolean isArray=false; } : (#(ID (LPAREN argument_list RPAREN))) => #(i:ID { if(isFunctionLegal) { //Parse info String identifier; int line, column; String [] params = new String[3]; identifier = i.getText(); params = identifier.split(":"); identifier = params[0]; line = Integer.parseInt(params[1]); column = Integer.parseInt(params[2]); v = new Symbol(identifier, "", line, column); } } (LPAREN argsVec=argument_list RPAREN { if(isFunctionLegal) { isFunction=true; String identifier; int line, column; String [] params = new String[3]; identifier = i.getText(); params = identifier.split(":"); identifier = params[0]; line = Integer.parseInt(params[1]); column = Integer.parseInt(params[2]); boolean errorVr=false; int index; index=sTable.getFunctionIndex(identifier); if(index!=-1) { currentCalledFunc=(Function)sTable.functions.elementAt(index); isCalledFunctionLegal=true; } else { isCalledFunctionLegal=false; error(FuncErr+currentFunction.name,line,column); error(ERR14,line,column); } if(isCalledFunctionLegal) { if(argsVec.size()!= currentCalledFunc.parameters.size()) errorVr=true; else for(int a=0;a#(j:ID { if(isFunctionLegal) { //Parse info String identifier; int line, column; String [] params = new String[3]; identifier = j.getText(); params = identifier.split(":"); identifier = params[0]; line = Integer.parseInt(params[1]); column = Integer.parseInt(params[2]); v = new Symbol(identifier, "", line, column); } }(LBRAC exType=expression RBRAC { if(isFunctionLegal) { isArray=true; if(exType.startsWith(new String("float"))) { //Parse info String identifier; int line, column; String [] params = new String[3]; identifier = exType; params = identifier.split(":"); identifier = params[0]; line = Integer.parseInt(params[1]); column = Integer.parseInt(params[2]); error(FuncErr+currentFunction.name,line,column); error(ERR11,line,column); } } } )? { if(isFunctionLegal) { String identifier; int line, column; String [] params = new String[3]; identifier = j.getText(); params = identifier.split(":"); identifier = params[0]; line = Integer.parseInt(params[1]); column = Integer.parseInt(params[2]); int index; String type=new String(""); index=currentFunction.getParameterIndex(identifier); if(index!=-1) { type=((Symbol)currentFunction.parameters.elementAt(index)).type; } else{ index = currentFunction.getLocalIndex(identifier); if(index!=-1){ type=((Symbol)currentFunction.locals.elementAt(index)).type; } else { error(FuncErr+currentFunction.name,line,column); error(ERR08,line,column); } } if(!type.equals("") && type.indexOf("[")!=-1 && !isArray) { error(FuncErr+currentFunction.name,line,column); error(ERR15,line,column); } if(!type.equals("") && type.indexOf("[")==-1 && isArray) { error(FuncErr+currentFunction.name,line,column); error(ERR16,line,column); } if(isArray) isArray=false; } } ) | #(PARANTEZLISIN exType=expression { if(isFunctionLegal) { String identifier; int line, column; String [] params = new String[3]; identifier = exType; params = identifier.split(":"); identifier = params[0]; line = Integer.parseInt(params[1]); column = Integer.parseInt(params[2]); v = new Symbol(identifier, "", line, column); } } ) ; expression_list returns [Vector args] { args=new Vector(); String argType; } : (argType=expression { if(isFunctionLegal) { String identifier; int line, column; String [] params = new String[3]; identifier = argType; params = identifier.split(":"); identifier = params[0]; line = Integer.parseInt(params[1]); column = Integer.parseInt(params[2]); args.add(identifier); } } )+ ; argument_list returns [Vector args] { args=new Vector(); } : args=expression_list | ; From ron.hunter-duvar at oracle.com Sat Jan 30 21:18:27 2010 From: ron.hunter-duvar at oracle.com (Ron Hunter-Duvar) Date: Sat, 30 Jan 2010 22:18:27 -0700 Subject: [antlr-interest] ANTLR running out of memory during generation In-Reply-To: <7277098525d5fb4685c662b1fba4f4e2@temporal-wave.com> References: <7277098525d5fb4685c662b1fba4f4e2@temporal-wave.com> Message-ID: <4B6512A3.9020304@oracle.com> Jim, Thanks for the response. Yeah, the target language is kind of obvious isn't it? What else could have that many keywords? I might try turning off backtracking later on and see what all I have to fix. Right now it's turning out to be a lot easier, and hasn't created any performance problems. Also, I'm not concerned with rejecting invalid code, only with successfully parsing all valid code, which simplifies things. But the problem I'm having doesn't relate to any specific keyword. I even try inserting garbage keywords, with the same result. To me, the fact that it runs perfectly fine (and fast) with 631, and apparently hits some endless loop/recursion at 632 that makes it run 10x longer and run out of memory indicates a bug or implementation limitation. The fact that 3.1 and 3.2 behave exactly the same way indicates it's code that hasn't changed in the latest release. Unfortunately, I don't know enough of ANTLR's internals to be able to track it down, and don't have the time now to learn what I need to. I have run it with 2G heap space. I bumped it up from 512M to 1G then 2G, and all it accomplished was to make it run a few seconds longer before running out of memory. A clear symptom of endless loop/recursion. There shouldn't be anything I can do in my grammar that would cause ANTLR to act this way. I'll try those switches and see if they help. For the moment I've been able to side step the problem by cutting it down to the set of keywords for currently implemented parts of the language, bringing it down to about 150 (I had started with the full keyword list that's available, and then kept adding all the omissions from that list, of which there are many). But ultimately I'll have to find a way to deal with it. I'm hoping maybe Terry will have a bug fix for me before that 8^). Ron Jim Idle wrote: > Ron, > > First you really need to switch off backtracking unless the objective of your parser is to analyze SQL (you gave it away when you mentioned 632 keywords that can be identifiers). There are not as many predicates required as you think so long as you left factor everything. > > Your tokens should be consecutive so long as you list them that way in the lexer. > > The problem might well be that although SQL sort of allows all keywords to be identifiers, it does not allow all because some of them would be to ambiguous even for a syntax directed hand crafted parser. If you turn on backtracking then try to allow one of these reserved words to be an identifier, then you will probably mask the issue because all warnings and errors are turned off. > > It is entirely feasible to create a full SQL parser without backtracking, very little look ahead and few predicates (all of the one or two token lookahead type). I have an online demo of T-SQL for instance on my web site at www.temporal-wave.com (select 'online demos' link), and Oracle SQL/PLSQL will be up there before long too. > > So, I think you will need to do the following to have a chance of generating the code: > > 1) Use -Xconversiontimeout 10000 > 2) Cause switches to be generated rather than ifs: -Xmaxswitchcaselabels 32000 -Xminswitchalts 1-xmaxinlineddfastates 65534 > 3) Use -Xmx2G when invoking the java command (assuming your jvm allows that) > > But if you cannot get it going that way, then basically you are masking a bigger problem in your grammar that you are not seeing because of global backtracking. > > Jim > > >> -----Original Message----- >> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest- >> bounces at antlr.org] On Behalf Of Ron Hunter-Duvar >> Sent: Friday, January 29, 2010 8:52 PM >> To: antlr-interest at antlr.org >> Subject: [antlr-interest] ANTLR running out of memory during generation >> >> I'm having a strange problem with ANTLR. I'm building a grammar for a >> language with a huge number (hundreds) of non-reserved keywords. I'm >> using the approach of having the lexer return a different token type >> for >> each keyword, and then having a parser rule of the form: >> >> id : ( ID | QUOTED_ID | KW_A | KW_B | ... | KW_ZZZ ); >> >> This was working great until today. In fact, ANTLR 3.2 generates >> surprisingly clever code for this - all the keywords are assigned >> consecutive token numbers, and generated code just says: >> >> if ( (input.LA(1)>=KW_A && input.LA(1)<=KW_ZZZ)||(input.LA(1)>=ID >> && >> input.LA(1)<=QUOTED_ID) ) { >> input.consume(); >> ... >> >> This works all the way up to 631 keywords. ANTLR runs in about 20 >> seconds, and never uses more than 269MB of memory. When I add a 632nd >> keyword (doesn't matter what the keyword is), and change nothing else, >> ANTLR runs for 2 minutes and runs out of heap space. I kept bumping the >> max space up, but even going to 2GB doesn't make any difference. >> >> What's really interesting is that I was using ANTLR 3.1 until now. When >> I ran into this I upgraded to 3.2, but both of them fail at exactly the >> same spot, 632 keywords. Not surprisingly, the stack trace varies from >> one run to the next, depending on the exact point it runs out of >> memory, >> but it always has deeply nested calls to these and other methods: >> >> >> org.antlr.stringtemplate.language.ASTExpr.writeTemplate(ASTExpr.java:75 >> 0) >> org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:680) >> >> org.antlr.stringtemplate.language.ASTExpr.writeAttribute(ASTExpr.java:6 >> 60) >> >> org.antlr.stringtemplate.language.ActionEvaluator.action(ActionEvaluato >> r.java:86) >> org.antlr.stringtemplate.language.ASTExpr.write(ASTExpr.java:149) >> >> org.antlr.stringtemplate.StringTemplate.write(StringTemplate.java:705) >> >> I don't know if it makes a difference, but I'm using backtracking >> (otherwise, this approach to non-reserved keywords doesn't work without >> a lot of synpreds), and outputting ASTs. >> >> Since this is size related, it's hard to narrow it down to a simple >> example. I could try to duplicate it with just the id rule and nothing >> else. >> >> Any ideas what might be happening here, and whether a fix might be >> possible? >> >> Thanks, >> Ron >> >> -- >> Ron Hunter-Duvar | Software Developer V | 403-272-6580 >> Oracle Service Engineering >> Gulf Canada Square 401 - 9th Avenue S.W., Calgary, AB, Canada T2P 3C5 >> >> All opinions expressed here are mine, and do not necessarily represent >> those of my employer. >> >> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest >> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your- >> email-address >> > > > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address > > -- Ron Hunter-Duvar | Software Developer V | 403-272-6580 Oracle Service Engineering Gulf Canada Square 401 - 9th Avenue S.W., Calgary, AB, Canada T2P 3C5 All opinions expressed here are mine, and do not necessarily represent those of my employer. From khamenya at gmail.com Sun Jan 31 14:46:15 2010 From: khamenya at gmail.com (Valery Khamenya) Date: Sun, 31 Jan 2010 23:46:15 +0100 Subject: [antlr-interest] "prog : .+ ; " ==> "no viable alternative at character" (antlr-3.1.2) In-Reply-To: <84fecab1001311440r7aa627e2t9318591653225c42@mail.gmail.com> References: <84fecab1001311440r7aa627e2t9318591653225c42@mail.gmail.com> Message-ID: <84fecab1001311446l74172614v2c51a31b3c2284bb@mail.gmail.com> Hi, what's wrong with the following trivial lexer grammar? grammar Grammar; options { language=Python; output=AST; ASTLabelType=CommonTree; } prog : .+ ; I am getting "no viable alternative at character ..." at every character of input stream. antlr-3.1.2 Of course I don't really need a 1-char chopping lexer. It is just a relevant extraction from a real case grammar. Comments and hints are welcome! Best regards -- Valery From kirby.bohling at gmail.com Sun Jan 31 15:44:38 2010 From: kirby.bohling at gmail.com (Kirby Bohling) Date: Sun, 31 Jan 2010 17:44:38 -0600 Subject: [antlr-interest] "prog : .+ ; " ==> "no viable alternative at character" (antlr-3.1.2) In-Reply-To: <84fecab1001311446l74172614v2c51a31b3c2284bb@mail.gmail.com> References: <84fecab1001311440r7aa627e2t9318591653225c42@mail.gmail.com> <84fecab1001311446l74172614v2c51a31b3c2284bb@mail.gmail.com> Message-ID: <3cac8fdf1001311544x1a3f9bceyb6891c7e30c66b16@mail.gmail.com> On Sun, Jan 31, 2010 at 4:46 PM, Valery Khamenya wrote: > Hi, > > what's wrong with the following trivial lexer grammar? > > ?grammar Grammar; > ?options { > language=Python; > ?output=AST; > ASTLabelType=CommonTree; > ?} > ?prog : .+ ?; > > I am getting "no viable alternative at character ..." at every character of > input stream. In this case, I'm pretty sure it's because you don't have a lexer rule... Just as an aside, I'm pretty sure this is a combined grammar, as you didn't spec it to be a lexer only. Uppercase prog to PROG, and it should generate exactly one token. You'll probably want to add a parser rule if you make that change otherwise it will lex, but not parse. Kirby > > antlr-3.1.2 > > Of course I don't really need a 1-char chopping lexer. It is just a relevant > extraction from a real case grammar. > > Comments and hints are welcome! > > Best regards > -- > Valery > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address >