[antlr-interest] Bounding the token stream in the C backend

Christopher L Conway cconway at cs.nyu.edu
Wed Mar 3 15:37:22 PST 2010


Jim,

On Wed, Mar 3, 2010 at 2:38 PM, Jim Idle <jimi at temporal-wave.com> wrote:
>>     pANTLR3_COMMON_TOKEN token = $IDENTIFIER;
>>     ANTLR3_MARKER start = token->getStartIndex(token);
>>     ANTLR3_MARKER end = token->getStopIndex(token);
>>     std::string id( (const char *)start, end-start+1 );
>>
>
> But, do you really even need to create the string? Can you not just use the token and then if you ever actualize the text for something only copy it at that point?

In general, this is a good suggestion. In this case, the identifier is
going into a symbol table so, yes, I do need the copy.
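
For concreteness, the copy amounts to something like the sketch below.
It assumes, as the snippet quoted above does, that the start and stop
markers are addresses into an 8-bit input buffer and that the stop
marker is inclusive. Marker and copy_token_text are stand-in names for
illustration, not part of the ANTLR runtime.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-in for ANTLR3_MARKER (assumption: an integer wide enough to hold a pointer).
using Marker = std::intptr_t;

// Hypothetical helper: copy the bytes in [start, stop] (stop inclusive)
// into a string that owns its storage, so it can outlive the input buffer.
std::string copy_token_text(Marker start, Marker stop) {
    if (stop < start) {
        return std::string();  // empty or malformed range
    }
    const char* first = reinterpret_cast<const char*>(start);
    std::size_t length = static_cast<std::size_t>(stop - start) + 1;
    return std::string(first, length);
}
```

The symbol table then holds its own copy of the identifier, which is
the point: the entry stays valid after the token and input buffer are
released.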

>> I see another 3-fold decrease in memory usage. In combination with the
>> bounded lookahead stream and token factory, this brings the memory
>> usage of my ANTLR 3 C parser roughly in line with the ANTLR 2.7 C++ version
>> (it's still ~40% faster).
>
> It should be much better than that, so it tends to make me think that the overhead is in the other code you have surrounding the parser. You should try and do a comparison with no actions in either. However, perhaps you do not need to because once the parsing time is not really any part of the total time, you will get more performance by improving the action code of course.

I'm giving the running time for the whole parsing process, including
semantic actions. We've previously measured that about 50% of the time
was spent in ANTLR code, so this probably represents an 80-90% speedup
in pure parsing.
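
To spell out the arithmetic: if "40% faster" is read as a 40% drop in
total running time, and the semantic actions take as long as they did
before, then the whole saving comes out of the parser's share. A rough
sketch under those two assumptions (the function name is made up):

```cpp
// Fraction by which the parser's own time shrinks, given the fraction by
// which total time shrank and the parser's original share of total time.
// Assumes the time spent outside the parser (semantic actions) is unchanged.
double parser_time_reduction(double total_reduction, double parser_share) {
    return total_reduction / parser_share;
}
```

With total_reduction = 0.40 and parser_share = 0.50 this gives 0.80,
the low end of the range above.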

>> This is intriguing. Could you point to a few of the important settings
>> I should be looking at?
>
> Things such as not using method calls for LA() when you know you have 8 bit or 16 bit input (you can do this now, check your generated code or the C examples)

I'm having trouble figuring out how to do this. If I try to re-#define
LA in the @postinclude section, it gets placed before the default
generated #definition, so the default #definition wins.
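
The ordering issue is ordinary preprocessor behaviour. In the sketch
below, FakeStream, FAKE_LA and peek are made-up stand-ins, not the
generated code: an override only sticks if the #undef/#define pair
comes after the default definition. I have not determined which
grammar section, if any, is emitted that late.

```cpp
// Hypothetical stand-in for an input stream whose lookahead is normally
// reached through a function pointer.
struct FakeStream {
    const unsigned char* next;    // next unread character
    int (*la)(FakeStream*, int);  // method-call style lookahead
};

// Stand-in for the default, generated definition: goes through the method call.
#define FAKE_LA(s, n) ((s)->la((s), (n)))

// Override placed AFTER the default: drop it, then read the buffer directly.
// No bounds or end-of-input check here; this only illustrates the ordering.
#undef FAKE_LA
#define FAKE_LA(s, n) (static_cast<int>((s)->next[(n) - 1]))

// Code below the override sees the direct-read version.
int peek(FakeStream* s, int n) { return FAKE_LA(s, n); }
```

Had the override been placed before the default, the later default
would have been the one in effect for peek, which matches what I am
seeing.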

> turning off follow set stacking if you do not need fancy error messages but just wish to fail out or say "Syntax error at line 4".

I also can't figure out how to do this and I'm not sure where to start.

Thanks,
Chris

