[antlr-interest] Re: Short circuit of the lexer

Tue Jan 21 09:32:34 PST 2003

Yes, you should follow the LA() discipline.  You probably need to switch
into your custom array lexer, gobble the array, then switch back to the
other lexer and reset the lookahead buffer to clear out any cached values.
Inspect the code and ask specific questions; I don't have the time to go
spelunking right now.
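
To make the caching issue concrete, here is a rough sketch in plain C++. It is a toy model only: LookaheadBuffer, reset() and gobbleArray() are hypothetical names made up for illustration, not ANTLR's actual InputBuffer or CharScanner API. It assumes the array looks like "[1 2 3]". The point is that whatever LA() has already pulled from the stream sits in a cache, so any code that reads the stream directly will miss those characters and leave stale ones behind.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <deque>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for a lexer's lookahead buffer (hypothetical, not ANTLR's class).
class LookaheadBuffer {
public:
    explicit LookaheadBuffer(std::istream& in) : in_(in) {}

    // Return the i-th lookahead character (1-based), or EOF at end of input.
    // Characters are pulled from the stream on demand and cached.
    int LA(std::size_t i) {
        fill(i);
        return i <= cache_.size() ? cache_[i - 1] : EOF;
    }

    // Discard the current character.
    void consume() {
        fill(1);
        if (!cache_.empty()) cache_.pop_front();
    }

    // Throw away cached lookahead. Needed after some other code has read
    // from the underlying stream behind the buffer's back.
    void reset() { cache_.clear(); }

private:
    void fill(std::size_t n) {
        while (cache_.size() < n) {
            int c = in_.get();
            if (c == EOF) break;
            cache_.push_back(static_cast<unsigned char>(c));
        }
    }

    std::istream& in_;
    std::deque<unsigned char> cache_;
};

// Gobble "[1 2 3]" strictly through LA()/consume(), so the buffer and the
// stream never disagree. Returns an empty vector if no '[' is at the front.
std::vector<int> gobbleArray(LookaheadBuffer& buf) {
    std::vector<int> values;
    if (buf.LA(1) != '[') return values;
    buf.consume();  // skip '['
    while (buf.LA(1) != ']' && buf.LA(1) != EOF) {
        if (std::isdigit(buf.LA(1))) {
            int v = 0;
            while (std::isdigit(buf.LA(1))) {
                v = v * 10 + (buf.LA(1) - '0');
                buf.consume();
            }
            values.push_back(v);
        } else {
            buf.consume();  // skip separators
        }
    }
    buf.consume();  // skip ']' (no-op at end of input)
    return values;
}
```

If gobbleArray() is the only thing touching the input, nothing needs resetting afterwards. If instead the custom code reads the std::istream directly after the buffer has already peeked at "[1", it only sees " 2 3]", and the buffer keeps returning '[' from LA(1) until reset() is called. That is the stale state the reset is meant to clear.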

Monty

-----Original Message-----
From: xadeck <decoret at graphics.lcs.mit.edu>
[mailto:decoret at graphics.lcs.mit.edu]
Sent: Saturday, January 18, 2003 2:40 PM
To: antlr-interest at yahoogroups.com
Subject: [antlr-interest] Re: Short circuit of the lexer

--- In antlr-interest at yahoogroups.com, Terence Parr <parrt at j...> 
> 
> Are you using the latest 2.7.2 stuff or 2.7.1?  I think 2.7.2 is faster 
> :)

Of course I am ;-)

> 
> Also, (INT)* is definitely more efficient than the tail recursion you 
> are using.  just add the action within the loop:
> 
> ( i:INT {result.push_back(atoi(i->getText().c_str()));} )*
> 
> Put that in rule decl instead of referring to values and you should be 
> good to go.  Let me know if this works.  The tail recursion will build 
> a HUGE stack of method invocation records if you have 180k lines...very 
> very inefficient.  Try the loop :)
> 

Well, I switched from tail recursion to the loop and it is still slow. I
wrote a dummy version of my grammar keeping only the array stuff and it
is pretty fast (10s for a test file), but when I add more
tokens to the lexer it gets slower (30s), and when I use my
full grammar without any actions (so extra C++ code cannot be
involved), it gets quite slow (1m30 to 2min), and it even seems the tail
recursion is faster?!

I am trying to figure out what is going on (I will investigate), because
I know such files can be parsed pretty fast with antlr (I have seen
examples, but I cannot use their grammars). I can send the full grammar,
but it is pretty long (a VRML2 grammar) and you would need the associated
library to compile it. I guess you have better things to do than
debug other people's grammars.

Anyway, the original question still holds, out of curiosity: will
Lexer::LA() be messed up if I screw up the input stream within
Lexer::nextToken()?

And by the way, thanks for the help.
