[antlr-interest] Trying to convert a file-oriented lexer to a line-oriented one

siva.kumar at loglogic.com siva.kumar at loglogic.com
Thu Jun 26 08:19:57 PDT 2008


Hi Gavin,
         Thank you for taking the time to answer this question.

I did try to read in the entire file originally. With really large files, I got this problem:
     % java testParser <MY-HUGE-FILE> > /dev/null
     Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

This was with the following Java code:
  public static void main(String[] args) throws Exception {
        testLexer lex = new testLexer(new ANTLRFileStream(args[0]));
        CommonTokenStream tokens = new CommonTokenStream(lex);

        testParser parser = new testParser(tokens);

        try {
            parser.line();
        } catch (RecognitionException e)  {
            e.printStackTrace();
        }
  }


Interestingly enough, not even a single line got parsed: the entire file was being read into memory before any of it was fed to the lexer.

The "equivalent" C code was:

  int main(int argc, char * argv[])
  {

    pANTLR3_INPUT_STREAM           input;
    ptestLexer               lex;
    pANTLR3_COMMON_TOKEN_STREAM    tokens;
    ptestParser              parser;

    input  = antlr3AsciiFileStreamNew          ((pANTLR3_UINT8)argv[1]);
    lex    = testLexerNew                (input);
    tokens = antlr3CommonTokenStreamSourceNew  (ANTLR3_SIZE_HINT, TOKENSOURCE(lex));
    parser = testParserNew               (tokens);

    parser  ->logfile(parser);

    // Must manually clean up
    //
    parser ->free(parser);
    tokens ->free(tokens);
    lex    ->free(lex);
    input  ->close(input);

    return(0);
  }


This does read many lines before dumping core. It appears that the Java and C implementations of the file stream are somewhat different?


So I was trying to read just one line at a time.

Thanks,

-Siva

________________________________________
From: Gavin Lambert [mailto:antlr at mirality.co.nz] 
Sent: Thursday, June 26, 2008 1:03 AM
To: Siva Kumar (siva.kumar at loglogic.com); antlr-interest at antlr.org
Subject: Re: [antlr-interest] Trying to convert a file-oriented lexer to a line-oriented one

At 09:49 26/06/2008, siva.kumar at loglogic.com wrote:

I've looked for examples of using a lexer in a line-oriented fashion but can't seem to find one.
[...]

But I'm having some trouble figuring out how to call the lexer on each line that I've read from the file.

ANTLR isn't really designed to operate in that manner; it wants to suck up all the input and tokenise it before parsing begins.

There was some discussion on the list a while back (and this wiki page: http://www.antlr.org/wiki/pages/viewpage.action?pageId=7929859) about making it lazy-load input for cases where the input stream is long-running (e.g. interactive input), but I'm not sure how far it got.


What I am essentially trying to do is:
 
    Open a file
    While lines exist
        Get a line
        Call the Parser

Why do you need to do this (especially if it's already in a file, and not using a slow/incomplete connection)?  Why can't you parse the entire file in one go?  Even if each line represents a separate expression or command or something, you can just make the top-level rule accept an arbitrary number of these separated by newlines.
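
For reference, the loop described above (open a file, get a line, call the parser) could be sketched in Java roughly as follows. This is only a sketch under assumptions: testLexer, testParser and the line rule are the generated names from the post, while LineDriver, LineHandler and forEachLine are hypothetical names made up for illustration. The ANTLR calls are left as comments so the sketch compiles without the generated classes; they assume the ANTLR 3 runtime, where a fresh ANTLRStringStream, lexer, token stream and parser would be built for each line.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.Reader;

public class LineDriver {

    /** Hypothetical callback type; unlike Consumer, it permits checked
     *  exceptions such as ANTLR's RecognitionException. */
    public interface LineHandler {
        void handle(String line) throws Exception;
    }

    /** Feeds each line of the source to the handler, one at a time,
     *  and returns the number of lines read. Only the current line is
     *  held in memory, not the whole file. */
    public static int forEachLine(Reader source, LineHandler handler) throws Exception {
        int count = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                handler.handle(line);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        int lines = forEachLine(new FileReader(args[0]), line -> {
            // Assumed per-line work, with the ANTLR 3 runtime on the classpath:
            //   testLexer lex = new testLexer(new ANTLRStringStream(line));
            //   testParser parser = new testParser(new CommonTokenStream(lex));
            //   parser.line();
            // readLine() strips the line terminator; if the grammar expects a
            // newline token, pass line + "\n" instead.
            // Placeholder so the sketch runs without the generated classes:
            System.out.println(line.length());
        });
        System.err.println(lines + " lines processed");
    }
}
```

Building a new lexer and parser per line costs some allocation, but it keeps memory use bounded by the longest line rather than by the file size. It only works if no construct in the grammar spans more than one line.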

