[antlr-interest] Trying to convert a file-oriented lexer to a line-oriented one
siva.kumar at loglogic.com
Thu Jun 26 08:19:57 PDT 2008
Hi Gavin,
Thank you for taking the time to answer this question.
I did try to read in the entire file originally. With really large files, I got this problem:
% java testParser <MY-HUGE-FILE> > /dev/null
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
This was with the following Java code:
    public static void main(String[] args) throws Exception {
        testLexer lex = new testLexer(new ANTLRFileStream(args[0]));
        CommonTokenStream tokens = new CommonTokenStream(lex);
        testParser parser = new testParser(tokens);
        try {
            parser.line();
        } catch (RecognitionException e) {
            e.printStackTrace();
        }
    }
Interestingly enough, not even a single line got parsed: the error occurred while the entire file was still being read in, before any of it was fed to the lexer.
The "equivalent" C code was:
    int main(int argc, char * argv[])
    {
        pANTLR3_INPUT_STREAM        input;
        ptestLexer                  lex;
        pANTLR3_COMMON_TOKEN_STREAM tokens;
        ptestParser                 parser;

        input  = antlr3AsciiFileStreamNew((pANTLR3_UINT8)argv[1]);
        lex    = testLexerNew(input);
        tokens = antlr3CommonTokenStreamSourceNew(ANTLR3_SIZE_HINT, TOKENSOURCE(lex));
        parser = testParserNew(tokens);

        parser->logfile(parser);

        // Must manually clean up
        //
        parser->free(parser);
        tokens->free(tokens);
        lex->free(lex);
        input->close(input);

        return(0);
    }
This does read many lines before core-dumping. It appears that the Java and C implementations of the FileStream are somewhat different?
So I was trying to read just one line at a time.
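A minimal sketch of that line-at-a-time loop, using only the standard Java library. The names LineByLine, processLines and lineHandler are made up for illustration and are not part of ANTLR. The handler is where a per-line lexer and parser would be created; with the ANTLR 3 runtime that could presumably be done by wrapping each line in an ANTLRStringStream, but that wiring is an assumption and is not shown or tested here.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

public class LineByLine {

    // Reads the source one line at a time and passes each line to lineHandler.
    // Only the current line is held in memory, never the whole file.
    // Returns the number of lines handled.
    static int processLines(Reader source, Consumer<String> lineHandler) throws IOException {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Hypothetical hook: build a lexer/parser for this single line here,
                // e.g. from new ANTLRStringStream(line) when using the ANTLR 3 runtime.
                lineHandler.accept(line);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Use the file named on the command line, or a small in-memory sample.
        Reader source = args.length > 0
                ? new FileReader(args[0])
                : new StringReader("first line\nsecond line\n");
        int handled = processLines(source, line -> System.out.println("line: " + line));
        System.out.println(handled + " lines handled");
    }
}
```

The trade-off is that every line gets a fresh lexer and parser, so this only works if no token or rule ever spans a line break.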
Thanks,
-Siva
________________________________________
From: Gavin Lambert [mailto:antlr at mirality.co.nz]
Sent: Thursday, June 26, 2008 1:03 AM
To: Siva Kumar (siva.kumar at loglogic.com); antlr-interest at antlr.org
Subject: Re: [antlr-interest] Trying to convert a file-oriented lexer to a line-oriented one
At 09:49 26/06/2008, siva.kumar at loglogic.com wrote:
I've looked for examples of using a lexer in a line-oriented fashion but can't seem to find one.
[...]
But I'm having some trouble figuring out how to call the lexer on each line that I've read from the file.
ANTLR isn't really designed to operate in that manner; it wants to suck up all the input and tokenise it before parsing begins.
There was some discussion on the list a while back (and this wiki page: http://www.antlr.org/wiki/pages/viewpage.action?pageId=7929859) about making it lazy-load input for cases where the input stream is long-running (e.g. interactive input), but I'm not sure how far it got.
What I am essentially trying to do is:
    Open a file
    While lines exist
        Get a line
        Call the Parser
Why do you need to do this (especially if it's already in a file, and not using a slow/incomplete connection)? Why can't you parse the entire file in one go? Even if each line represents a separate expression or command or something, you can just make the top-level rule accept an arbitrary number of these separated by newlines.