[antlr-interest] ANTLR running out of memory while parsing huge files
Jim Idle
jimi at temporal-wave.com
Tue Apr 21 08:11:08 PDT 2009
Nick Vlassopoulos wrote:
> Hi Jim!
>
> Thanks for your replies!!
>
> The input lines are of the form
> "var = data"
> so they are pretty simple!
> If I got this right, you suggest using something like a
> body_set :
> body_start (probably a "greedy" option here?) body_end
> rule and then just add code to parse the intermediate lines (which are
> pretty simple) manually??
Actually, do you need a parser? Perhaps you can do this all in the lexer:
don't create tokens for the data, but just use the input stream directly
in your own lexer action code.
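For the lexer-only idea, a rough sketch in ANTLR 3 with the C target might look like the following (the grammar name, rule names, and the 'BODY' marker are all my own illustration, not anything from your grammar):

```antlr
// Hedged sketch only: 'HugeFile', HEADER, BODY and the marker text
// are invented for illustration.
lexer grammar HugeFile;

options { language = C; }

// One "var = data" header line becomes a single token.
HEADER : ID ' '* '=' ' '* ~('\n')* '\n' ;

// When the body marker is seen, consume the raw data inside the
// action, reading characters straight from the input stream so no
// tokens are ever allocated for it.
BODY   : 'BODY' '\n'
         { /* loop over the input stream here, LA()/consume style,
              until the data terminator is seen */ }
       ;

fragment ID : ('a'..'z' | 'A'..'Z' | '_')+ ;
```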
But I was thinking this:
1) Copy my input stream code and rename it for yourself;
2) Have it respond to LA() using buffered reads until it finds the token
that starts the body (say it is 'BODY'), then have it return EOF;
3) Invoke the parser/lexer/input-stream stack; it will set up the
information you need for the incoming data and then stop, and the input
stream remembers where it was;
4) Process the data with a little custom C code that works directly with
the input stream; when you see the data has ended, tell the input stream
where to restart;
5) Tell the input stream to set up for the next header, starting at the
data end location. If it wasn't at real EOF, go to 3);
6) End.
It sounds more complicated written in an email than it will be in the C
code ;-) You can also do the same thing without a custom input stream,
but then you would be reading the entire file and pre-scanning and so on.
If your headers are pretty simple, you might also find that an awk
script or just plain C code is a better method ;-)
Jim