[antlr-interest] parser/lexer invocation: performance/optimization question

Ric Klaren klaren at cs.utwente.nl
Wed Jun 9 02:51:15 PDT 2004


On Tue, Jun 08, 2004 at 10:09:33PM -0000, Margaret Fieland wrote:
>  I have a parser/lexer that is repeatedly invoked to parse a
> succession of strings.
>
> The current implementation is that the application invokes a routine
> that does something like:
>
> string source("Your input text");
> istringstream str(source);
>
> MyLexer lexer(str);
> MyParser parser(lexer);
>  ... initializeASTFactory
>  ... setASTFactory
>
> This routine is invoked literally thousands of times.
>
> I'd like to be able to do the setup (above) once in a constructor and
> just invoke the parser multiple times on the strings as I would like
> to avoid the overhead in the initialization.
>
> Is there any way to do this?  Nothing I've tried so far has worked.

First time setup:

// if we don't have a file we use stdin (cin) (or use a dummy stream)
lexer = new Lexer(std::cin);

nodes_factory = new antlr::ASTFactory();
// Create a parser that reads from the scanner
parser = new Parser( *lexer, symbol_information );
// Initialize it with factory and setup the factory and other trivia
parser->setASTNodeFactory( nodes_factory );
// let the parser register the used ast types in the factory
parser->initializeASTFactory( *nodes_factory );
parser->setFilename("stdin");

Then per file/string:

---snip---
ifstream file;

file.open( filename );
if( ! file.is_open() )
{
	cerr << "Error opening file: '" << filename << "'" << endl;
	exit(1);
}
antlr::LexerSharedInputState lex_input = lexer->getInputState();
lex_input->initialize(file, filename);
// and reset parser (antlr) internal state
parser->getInputState()->reset();
parser->setFilename(filename);
// start parsing at the 'start' rule
parser->start();
---snip---

The above voodoo resets all internal state of the lexer and parser to
defaults for a new file, e.g. column/line info, guessing mode info, etc.

The lex_input->initialize call only reallocates a CharBuffer, which isn't
too expensive. You can use stringstreams in place of the ifstream above.
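
For the string case from the original question, the stream itself can be
reused as well. Below is a minimal sketch using only the standard library:
a single std::istringstream is pointed at each new string with str() and
clear(). The names rebind, countWords and parseAll are made up for this
illustration, and countWords merely stands in for the parse step; the
commented lines mark where the per-string ANTLR calls from the snippet
above would presumably go.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: point an existing istringstream at new text.
// str() replaces the buffer contents and rewinds the read position;
// clear() drops the eof/fail bits left over from the previous string.
inline void rebind(std::istringstream& in, const std::string& text)
{
	in.str(text);
	in.clear();
}

// Stand-in for the parse step: counts whitespace-separated words.
inline std::size_t countWords(std::istream& in)
{
	std::size_t n = 0;
	std::string word;
	while (in >> word)
		++n;
	return n;
}

// One stream for all inputs: created once, rebound per string.
inline std::vector<std::size_t> parseAll(const std::vector<std::string>& sources)
{
	std::istringstream in;
	std::vector<std::size_t> counts;
	for (std::size_t i = 0; i < sources.size(); ++i) {
		rebind(in, sources[i]);
		// With ANTLR, the per-string steps would go here, e.g.:
		//   lex_input->initialize(in, "string");
		//   parser->getInputState()->reset();
		//   parser->start();
		counts.push_back(countWords(in));
	}
	return counts;
}
```

Forgetting the clear() call is the usual pitfall: once a stream has hit
end-of-input, every later read fails until the state bits are reset.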

You could also use a CharInputBuffer if you want to read from plain old
char arrays. If you subclass it, you can make a trivial addition to replace
the buffer it is reading from. Say you name it MyCharInputBuffer and add
this method:

void setBuffer( unsigned char* buf, size_t size, bool owner = false )
{
	// delete current buf if needed
	if( delete_buffer && buffer )
		delete [] buffer;
	buffer = buf;
	// also rewind the read position (the ptr member of CharInputBuffer)
	ptr = buf;
	end = buf+size;
	delete_buffer = owner;
}
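
To try the idea without the ANTLR headers, here is a self-contained sketch
of such a subclass. CharInputBufferStandIn is a made-up stand-in for
antlr::CharInputBuffer: its member names buffer, end and delete_buffer
follow the method above, while the read pointer ptr and the exact getChar()
behaviour are assumptions about the real class. The readAll helper is also
hypothetical and just drains the buffer the way a lexer would.

```cpp
#include <cstddef>
#include <cstdio>   // EOF
#include <string>

// Stand-in for antlr::CharInputBuffer so the sketch compiles on its own.
// The ptr member (current read position) is an assumption.
class CharInputBufferStandIn
{
public:
	CharInputBufferStandIn(unsigned char* buf, std::size_t size, bool owner = false)
		: buffer(buf), ptr(buf), end(buf + size), delete_buffer(owner) {}
	virtual ~CharInputBufferStandIn()
	{
		if (delete_buffer && buffer)
			delete [] buffer;
	}
	// Next character, or EOF once the buffer is exhausted.
	virtual int getChar() { return (ptr < end) ? *ptr++ : EOF; }
protected:
	unsigned char* buffer;
	unsigned char* ptr;
	unsigned char* end;
	bool delete_buffer;
private:
	// Not copyable: a copy would delete an owned buffer twice.
	CharInputBufferStandIn(const CharInputBufferStandIn&);
	CharInputBufferStandIn& operator=(const CharInputBufferStandIn&);
};

class MyCharInputBuffer : public CharInputBufferStandIn
{
public:
	MyCharInputBuffer(unsigned char* buf, std::size_t size, bool owner = false)
		: CharInputBufferStandIn(buf, size, owner) {}

	// Swap in a new buffer; frees the old one only if we owned it.
	void setBuffer(unsigned char* buf, std::size_t size, bool owner = false)
	{
		if (delete_buffer && buffer)
			delete [] buffer;
		buffer = buf;
		ptr = buf;            // restart reading at the new buffer
		end = buf + size;
		delete_buffer = owner;
	}
};

// Hypothetical helper: drain the buffer into a string via getChar().
inline std::string readAll(CharInputBufferStandIn& in)
{
	std::string out;
	for (int c = in.getChar(); c != EOF; c = in.getChar())
		out += static_cast<char>(c);
	return out;
}
```

The owner flag decides who frees the array: pass true only for buffers
allocated with new[], and leave it false for stack or static arrays.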

Then the first time setup becomes:

unsigned char buf[BUF_SIZE];
MyCharInputBuffer input( buf, sizeof(buf) );
lexer = new Lexer(input);
... rest the same ...

The per file bit:

antlr::LexerSharedInputState lex_input = lexer->getInputState();
lex_input->reset();
MyCharInputBuffer* input = dynamic_cast<MyCharInputBuffer*>(&lex_input->getInput());
input->setBuffer( new_buf, size );
// and reset parser (antlr) internal state
parser->getInputState()->reset();
parser->setFilename(filename);
// start parsing at the 'start' rule
parser->start();

Actually, the setBuffer method would be a good candidate for inclusion in
CharInputBuffer by default. Loring's drop-in replacement might be nicer; I
haven't looked at it yet.

Cheers,

Ric
--
-----+++++*****************************************************+++++++++-------
    ---- Ric Klaren ----- j.klaren at utwente.nl ----- +31 53 4893755  ----
-----+++++*****************************************************+++++++++-------
 'And this 'rebooting' business? Give it a good kicking, do you?' 'Oh, no,
  of course, we ... that is ... well, yes, in fact,' said Ponder. 'Adrian
    goes round the back and ... er ... prods it with his foot. But in a
     technical way,' he added. --- From: Hogfather by Terry Pratchett.
