[antlr-interest] alternative prediction mechanism for 3.0

Mon Jun 14 04:01:03 PDT 2004

Hi,

On Sun, Jun 13, 2004 at 02:32:58PM -0700, Terence Parr wrote:
> Concerning the new parsers and lexers generated by ANTLR 3.0, I'm
> trying to decide how to encode the prediction DFAs.  If you have an
> opinion on what the generated parsers/lexers should look like, please
> take a quick look at the latest code gen blog entry:
>
> http://www.antlr.org/blog/antlr3/codegen.tml

> DFA decision = dfaFactory.get("...DFA description...");

You mean to have a 'high' level ASCII DFA specification of
states/transitions etc. between the ".." or just an id for that specific
DFA?

Why not encode the DFA in static tables (that can be 'run' directly without
first processing them)? There's lot's of ways to do that if I recall right.
It also offers the possibility to dump the tables in separate class files
(if size becomes an issue). (I'm always a bit itchy when people suggest
building stuff on the run, when at first glance it looks like something
you'd put in static tables)

a) it's faster

b) it gives more consistent runtime behavior (some input will parse fast,
	some input will parse slow depending on if it takes a worst case path
	through the generated parser).

c) The runtime memory usage pattern will be more predictable (this can be
	very important for people wanting to use antlr in embedded
	environments). Ok you could generate code to calculate the DFA's at
	parser instantiation.

The DFA will do a 'highlevel' choice anyway (as I understood it). Much like
a bitset.isMember(LA(x)) in the current codegen. As long as there's a way
to see what the DFA does I'm happy, being it a comment or some form of
tracing. (heck put a piece of .dot input (AT&T graphviz) in a comment for
the DFA) Or put an extra textfield in the dfa table that gets printed in
trace mode. Or combine the two: put .dot input near the table and put in
the table the necessary curses to make dot recolor the DFA previously
generated. So you can even see what the DFA does at runtime, this would be
a near trivial bit of code in Tcl/Tk.

Another point is that these DFA's get generated by ANTLR. The average ANTLR
user will probably not want to look inside these DFA's anyway. Looking into
a bitset is different from looking into a DFA. Also putting a clear
description of the automaton in a comment is probably a lot more helpfull
then a somewhat obscure though human 'readable' format in a text string. A
plain table with transitions is a lot harder to grok than something that
dot spits out for one (even if you have to run an external program to get a
nice picture).

> ..
> but the more complicated DFAs should be handled not with
> hard-coded DFAs but with DFAs dynamically created from simple
> Human-readable descriptions.
> ..

For some reason this statement really itches me. Ok I got it out of context
but even in context. I recall statements about making antlr 3 a speed
freak. This is kindoff orthogonal to that. Ok it will probably be not that
bad with caching of the generated automata but anyway ;)

To sum up let's keep the generated code readable but within reason. At
least that would be what I'd do personally.

Cheers,

Ric
--
-----+++++*****************************************************+++++++++-------
    ---- Ric Klaren ----- j.klaren at utwente.nl ----- +31 53 4893755  ----
-----+++++*****************************************************+++++++++-------
  Xander: "Buffy, we need to do something *now*"
  Angel: "We need a distraction."
  Buffy: "Right."
  Angel: "What are you going to do?"
  Buffy: "I'm going to kill them all. That oughtta distract 'em..."

Yahoo! Groups Links

<*> To visit your group on the web, go to:
     http://groups.yahoo.com/group/antlr-interest/

<*> To unsubscribe from this group, send an email to:
     antlr-interest-unsubscribe at yahoogroups.com

<*> Your use of Yahoo! Groups is subject to:
     http://docs.yahoo.com/info/terms/