[antlr-interest] Parsing Large Files

Jim Idle jimi at temporal-wave.com
Thu Apr 1 08:26:30 PDT 2010


Actually, I think that if you use UnbufferedTokenStream(), this will pretty much do what you want already; but it is also easy to derive from one of the token streams and add methods that can discard buffered tokens once you know you have dealt with them.
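
To illustrate the idea (this is a self-contained sketch, not the ANTLR API - the class and method names below are made up for illustration): keep only a small window of tokens, and let the caller drop everything already consumed once a record has been handled, so memory is bounded by the lookahead window rather than the file size.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of a token buffer that can discard consumed
// tokens. A real implementation would hold ANTLR Token objects fed
// by a Lexer; Strings stand in for tokens here to keep it runnable.
class SlidingTokenBuffer {
    private final Iterator<String> source;          // stand-in for the lexer
    private final List<String> window = new ArrayList<>();
    private int p = 0;  // index of the next token to consume, within window

    SlidingTokenBuffer(Iterable<String> tokens) {
        this.source = tokens.iterator();
    }

    // 1-based lookahead, filling the window lazily from the source.
    String LT(int k) {
        while (window.size() < p + k && source.hasNext()) {
            window.add(source.next());
        }
        int i = p + k - 1;
        return i < window.size() ? window.get(i) : null;  // null == EOF
    }

    void consume() {
        if (LT(1) != null) p++;
    }

    // Call once a record is fully handled: forget everything before p,
    // so the buffer never grows beyond one record plus the lookahead.
    void discardConsumed() {
        window.subList(0, p).clear();
        p = 0;
    }

    int buffered() {
        return window.size();
    }
}
```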

Also, if you have comma-separated files, then it is usually easier to use awk. Finally, your grammar has myriad lexical ambiguities, and I am afraid it is not going to work as you have written it. You cannot have more than one lexer rule that matches the same text: the lexer is not syntax directed, it just tokenizes what it sees.
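
For instance (a hypothetical fragment, not taken from the original grammar): when two ANTLR 3 lexer rules can match the same text, the rule defined first wins, and the later rule can simply never be matched, no matter what the parser is expecting at that point.

```antlr
// Hypothetical fragment: both rules match the digit string "123".
// The lexer always applies the rule defined first, so RECORD_ID
// below is unreachable and the parser will never see that token.
FIELD     : ('0'..'9')+ ;
RECORD_ID : ('0'..'9')+ ;   // unreachable - same text as FIELD
```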

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Marcin Rzeznicki
> Sent: Thursday, April 01, 2010 8:02 AM
> To: Kumar, Amitesh
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Parsing Large Files
> 
> On Thu, Apr 1, 2010 at 4:26 PM, Kumar, Amitesh
> <Amitesh.Kumar at standardbank.com> wrote:
> >
> 
> >
> > But my general issue is that not all my data is a simple CSV file;
> > some will be multi-line records. Hence I didn't want to keep a
> > record of the tokens.
> > Any ideas? By the way, thanks for your reply.
> >
> 
> Hi
> You can easily implement your own TokenStream that is optimized for
> your use case, e.g. one that does not try to keep everything in one
> big array. If you explore this possibility, you will quickly discover
> that it is a very easy thing to do and test. Hope it helps.
> 
> 
> > Cheers
> > Kumar
> 
> 
> --
> Greetings
> Marcin Rzeźnicki
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address





More information about the antlr-interest mailing list