[antlr-interest] Bounding the token stream in the C backend

Wed Mar 3 15:58:38 PST 2010

> -----Original Message-----
> From: Christopher L Conway [mailto:cconway at cs.nyu.edu]
> Sent: Wednesday, March 03, 2010 3:37 PM
> To: Jim Idle
>
> > It should be much better than that, so it makes me think that the
> > overhead is in the other code surrounding the parser. You should try
> > doing a comparison with no actions in either. Then again, perhaps you
> > do not need to: once parsing time is no longer a significant part of
> > the total time, you will of course get more performance by improving
> > the action code.
> 
> I'm giving the running time for the whole parsing process, including
> semantic actions. We've previously measured that about 50% of the time
> was spent in ANTLR code, so this probably represents an 80-90% speedup
> on pure parsing.

To be honest, that still doesn't seem quite right; you should be seeing it run much faster than that. Or do you mean that parsing now takes only 10 to 20% of the time it used to?

> 
> >> This is intriguing. Could you point to a few of the important settings
> >> I should be looking at?
> >
> > Things such as not using method calls for LA() when you know you have
> > 8-bit or 16-bit input (you can do this now; check your generated code
> > or the C examples).
> 
> I'm having trouble figuring out how to do this. If I try to re-#define
> LA in the @postinclude section, my definition is placed before the
> default generated #define, so the default one wins.

Rather than redefining LA yourself, define the macro that tells the generated code to access the input directly, as shown in the downloadable examples tarball:

// While you can implement your own character streams and so on, they
// normally call things like LA() via function pointers. In general you will
// be using one of the pre-supplied input streams and you can instruct the
// generated code to access the input pointers directly.
//
// For  8 bit inputs            : #define ANTLR3_INLINE_INPUT_ASCII
// For 16 bit UTF16/UCS2 inputs : #define ANTLR3_INLINE_INPUT_UTF16
//
// If your compiled recognizer might be given inputs from either of the sources
// or you have written your own character input stream, then do not define
// either of these.
//
@lexer::header
{
#define	ANTLR3_INLINE_INPUT_ASCII
}

> 
> > turning off follow-set stacking if you do not need fancy error
> > messages but just wish to fail out or say "Syntax error at line 4".
> 
> I also can't figure out how to do this and I'm not sure where to start.

You cannot do this until the next release.

Jim