[antlr-interest] Recovering white space in V3.0

Terence Parr parrt at cs.usfca.edu
Sat Jun 4 17:07:01 PDT 2005


On Jun 4, 2005, at 4:12 PM, Bryan Ewbank wrote:

> Ter,
>
> Can you define "common" and "extreme" in this context?

Sure.  Common: buffer up all tokens (note that in the early '90s
PCCTS did this for syntactic predicates) and make it easy to tweak
the input stream and spit it back out mostly verbatim.  Extreme:
parsing something bigger than the 2 GB of RAM I have in my box ;)

Some of the stuff is more heavyweight than you'd want in a really  
speed-critical app.  For example, my common tokens store the token  
index because it's damn useful.  They also track indexes into the
char buffer (start/stop of the token string) rather than building
strings, which requires that the chars be buffered too.  The tokens store the
char position in the line (column) as well as the line.  All this  
takes memory to store and time in the lexer to set.
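
To make that token layout concrete, here is a minimal sketch.  The
class and field names are made up for illustration and are not ANTLR's
actual API; the only point is that the token holds positions and the
text is rebuilt from the buffered chars on demand.

```java
// Hypothetical sketch: names are illustrative, not ANTLR's API.
public class BufferedTokenSketch {
    /** A token that records positions instead of holding its own String. */
    static final class Token {
        final int type;
        final int index;   // position of this token in the token buffer
        final int start;   // index of the first char in the char buffer
        final int stop;    // index of the last char in the char buffer (inclusive)
        final int line;    // 1-based line number
        final int column;  // 0-based char position within the line

        Token(int type, int index, int start, int stop, int line, int column) {
            this.type = type;
            this.index = index;
            this.start = start;
            this.stop = stop;
            this.line = line;
            this.column = column;
        }

        /** Builds the text only when asked, from the shared char buffer. */
        String text(char[] buffer) {
            return new String(buffer, start, stop - start + 1);
        }
    }

    public static void main(String[] args) {
        char[] input = "int x;\n".toCharArray();
        // The identifier "x": third token (index 2), chars 4..4, line 1, column 4.
        Token id = new Token(1, 2, 4, 4, 1, 4);
        System.out.println(id.text(input)); // prints "x"
    }
}
```

Each token here costs six ints and no String, but the char buffer has
to stay alive for as long as anyone might ask for the text.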

I experimented with returning the same exact token object for all
whitespace and comments just to see if it saved much in speed.
I didn't notice much difference, but it's hard to measure, as you know.  Point is,
you can do anything you want.  I'm just making it really easy to whip  
together some cool translators.  If you need to handle extremely  
large files or need extreme speed, you can do it--you just have to do  
a wee bit of work for it.
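
The shared-token experiment can be sketched roughly as follows.  This
is a toy hand-written lexer, not ANTLR-generated code, and every name
in it is an assumption made for the example.  It hands back one
preallocated object for each run of whitespace instead of allocating
a new token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy lexer: illustrates token sharing, not ANTLR's implementation.
public class SharedWhitespaceSketch {
    static final int WS = 0;
    static final int WORD = 1;

    static final class Token {
        final int type;
        final int start; // index of first char, or -1 for the shared token
        final int stop;  // index of last char (inclusive), or -1 for the shared token

        Token(int type, int start, int stop) {
            this.type = type;
            this.start = start;
            this.stop = stop;
        }
    }

    // One instance returned for every run of whitespace. In this sketch it
    // carries no position, so the whitespace text cannot be recovered from it.
    static final Token SHARED_WS = new Token(WS, -1, -1);

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            boolean ws = Character.isWhitespace(input.charAt(i));
            // Consume a maximal run of chars of the same kind.
            while (i < input.length() && Character.isWhitespace(input.charAt(i)) == ws) {
                i++;
            }
            tokens.add(ws ? SHARED_WS : new Token(WORD, start, i - 1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<Token> tokens = tokenize("a  b c");
        System.out.println(tokens.size());                  // prints 5
        System.out.println(tokens.get(1) == tokens.get(3)); // prints true
    }
}
```

The saving is one allocation per whitespace run; the cost in this
sketch is that a shared token cannot carry its own start/stop, line,
or column.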

For example, you can copy the Java.stg template file and tweak it for  
speed (very easily done) and then just keep that around forever so  
you can use it.  Say language=MyJava in the grammar options and boom-- 
it uses your faster code generator :)

Does that help?  More details?

Ter

>
> On 6/4/05, Terence Parr <parrt at cs.usfca.edu> wrote:
>
>> I am building stuff in general to work for the common
>> case not the extremes, leaving the ability to handle
>> the extremes.
>


