[antlr-interest] Bounding the token stream in the C backend

kferrio at gmail.com
Thu Feb 25 20:38:23 PST 2010


Even though empty alts are not the OP's problem, this seems like something ANTLR could preempt with a stern warning at grammar generation time. Or is there ever a legitimate reason to let it pass?
Sent from my Verizon Wireless BlackBerry

-----Original Message-----
From: "Jim Idle" <jimi at temporal-wave.com>
Date: Thu, 25 Feb 2010 07:40:30 
Cc: antlr-interest at antlr.org<antlr-interest at antlr.org>
Subject: Re: [antlr-interest] Bounding the token stream in the C backend

The problem is almost certainly your lexer. Look for a rule that has an empty alt; such a rule will match forever and consume no input:

FRED : ;

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Nick Vlassopoulos
> Sent: Thursday, February 25, 2010 7:31 AM
> To: Christopher L Conway
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Bounding the token stream in the C
> backend
> 
> Hi Christopher,
> 
> I am not entirely sure, but you may have run into the same problem I
> did a while ago. You may want to have a look at the discussion thread
> from back then for some advice:
> http://www.antlr.org/pipermail/antlr-interest/2009-April/034125.html
> Although I used the simple solution Jim suggested, i.e. parsing the
> headers with ANTLR and using custom code to parse the rest of the
> file, some of the advice in that thread might be helpful.
> 
> Hope this helps,
> 
> Nikos
> 
> 
> On Thu, Feb 25, 2010 at 6:09 AM, Christopher L Conway
> <cconway at cs.nyu.edu>wrote:
> 
> > I've got a large input file (~39MB) that I'm attempting to parse with
> > an ANTLR3-generated C parser. The parser is using a huge amount of
> > memory (~3.7GB) and seems to start thrashing without making much
> > progress towards termination. I found a thread from earlier this
> > month (http://markmail.org/message/jfngdd2ci6h7qrbo) suggesting the
> > most likely cause of such behavior is a parser bug, but I've stepped
> > through the code and it seems to be lexing just fine. Rather, it
> > seems the problem is that fillBuffer() is tokenizing the whole file
> > in one go; then, the parsing rules slow to a crawl because the token
> > buffer is sitting on all the memory.
> >
> > I wonder if there is a way to change fillBuffer()'s behavior, so that
> > it will only lex some bounded number of tokens before allowing
> > parsing to proceed?
> >
> > Thanks,
> > Chris
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe:
> > http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >
> 


More information about the antlr-interest mailing list