[antlr-interest] open-ended speed question: order of magnitude comparison?

Tue Nov 18 15:24:38 PST 2003

In another thread, John "Lingua Idiota" Mitchell wrote:

> For example, if very high speed is so important then what the hell are
> you doing using any "language" that needs such complexity to lex, parse,
> understand, and act upon to solve the problem? I.e., why aren't you
> using a purpose specific, fixed, highly normalized language that's 
> extremely easy to robustly deal with rapidly?

I was searching for some really open-ended information on ANTLR
performance (and, generally, performance of automatically-generated
recognizers/parsers) when I stumbled on this. I was planning on posting
this question anyway, but John's comment is of particular relevance to it.

The question, and then some qualifiers:

	Does anyone have a rough, order of magnitude, back of the 
	envelope, rule of thumb, etc. idea about the (runtime, lexing and 
	parsing and what-not) speed difference between a parser 
	generated with ANTLR, with one of the LALR(1) tools, or by hand
	(by someone competent but not necessarily wizardly)?

Now, to qualify that question:

	- I realize it's entirely open-ended and very much dependant on 
	  unique characteristics of the grammar being recognized that 
	  would either make it a "good" fit for automated recognition or a 
	  "good" fit for hand coding.

	- As John's post suggests, some languages optimized for easy 
	  parsing would be... parsed easily :-) which also implies that 
	  writing a "more" efficient parser would be eas{y,ier}.

However, in my particular application (or, at least, the application I'm 
considering), I don't really have control over the "languages" that would 
be recognized, and therefore their "parsability" characteristics.

Furthermore, the entire point of my wanting to use generators is to not 
have to hand-code the parsers for each new language I encounter in the 
application (and there would be plenty). Moreover, I want the people 
extending my application to not have to hand-code new parsers as they 
expand the system (and writing new parsers is definitely an integral part 
of expanding the system).

Finally, I'm hoping to be able to use the generated parsers to do 
'realtime' (for some values of "realtime") recognition of instances of the 
languages.

So, I realize that's entirely broad and open-ended. I'm not looking for 
any definitive statements, just general wisdom along the lines of "well, 
no, you're screwed" or "you could probably do it" or "yeah, no problem, 
performance is comparable for all but the tightest languages hand-coded by 
the most wizardly of parser mages".

Thanks!

- t.

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/