[antlr-interest] Re: a new paper on ANTLR style grammars
Thomas Brandon
tom at psy.unsw.edu.au
Fri Nov 21 04:19:21 PST 2003
One way you might do it is with a versioned data structure. The
Harmonia project developed a versioned tree for incremental language
analysis. A versioned tree with properties is obviously a powerful
data structure (that's basically XML). So, if you had actions as
transformations on a versioned tree then you could be doing updates
to multiple versions, so you could branch your parser where needed
simply by carrying out the actions against different versions. All
your parse data like your symbol table would just be nodes attached
to your tree or on a seperate branch for things like your symbol
table. So, a lexer would take a character stream a produce a tree
with a branch conatining data, e.g. a symbol table, and a branch
containing the flat sequence of tokens output, so AST creation would
take place in the lexer rather than in the parser, though the
language would stay the same, it would just spit out a stream of AST
nodes rather than tokens. For parsers and tree-parsers the branch
containing data would be copied while the other data would be
transformed as usual.
The main problem would be making it easy to work with a tree,
obviously you *can* store a symbol table in a tree but you need to
make accessing it easy. You would need some way to do XPath like
expressions to address nodes and properties. Perhaps that could just
be tacked onto the action-transformation stuff with expressions done
in target-language. So you'd have some construct within actions to
execute an XPath expression against the data portion of the output
tree or the properties of the parse portion of the input, returning
an AST node\s. So nodes at arbitrary places can be modified and new
nodes can be added. With that you should be able to track a fair
amount of dependency information for optimisation too.
Tom.
--- In antlr-interest at yahoogroups.com, "Oliver Zeigermann"
<oliver at z...> wrote:
> Loring, thanks for the substantial input and taking this
seriously :)
>
> --- In antlr-interest at yahoogroups.com, "lgcraymer" <lgc at m...>
wrote:
> > You can do better by deferring the actions--basically, build a
> > monster case statement including all of the possible actions in
a
> > grammar--and execute them after matching a rule.
>
> *After* exactly is the problem. It is pretty easy to execute
actions
> after a derivation / reduce, but not while shifting. This may be
> desirable though...
>
> > Then you can
> > trigger a set of actions at "commit" points. Functional
languages
> > make this sort of lazy evaluation easier.
>
> Where should those commit points be?
>
> > Rolling back actions is
> > trickier--you have to have some sort of mechanism to record
state,
> > or the cost for checkpoint/rollback is very high.
>
> Rolling back / forward does not come for free, agreed! But, if
> implemented reasonably expenses may at most be doubled. Compared to
> worst case exponential costs of backtracking this is not so bad.
>
> > For that matter, you can defer all actions until the entire
grammar
> > is recognized as long as the actions do not affect the parse.
> > Editing of a generated tree might be trickier, but not that
> > difficult--you just insert code for that in the list of deferred
> > actions.
>
> We discussed this before - but still - writing to and afterwards
> reading from a symbol table is quite a usual thing. This can not be
> expressed merely by means of CFGs (or less), you need actions and
> semantic checks (i.e. predicates) here.
>
> Other than that, to show my colors: I am a big fan of ASTs in
genereal
> and ANTLR tree transformation in particular :)
>
> Oliver
>
> > --Loring
> >
> >
> > --- In antlr-interest at yahoogroups.com, "Oliver Zeigermann"
> > <oliver at z...> wrote:
> > > What I wanted to say was: If you have sematic actions
associated
> to
> > > your grammar that can be inserted (and of course executed) at
any
> > > point and you have a table driven approach you are in trouble.
> > This is
> > > because what I understand as the precomputation of a search
tree
> > into
> > > a table combining certain *search* states. Extrapolating from
> what
> > I
> > > know about LR you have a problem when youe have a grammar like
> > this:
> > >
> > > a : A A { do something here } A A B ;
> > > b : A A { do something different here } A A C ;
> > >
> > > upon input
> > >
> > > AAAAC
> > >
> > > This is because the parser has no idea which action to execute
> > here.
> > > Now, my idea was to execute both and roll back the action of
rule
> > a as
> > > soon as it is clear that rule b actually matches.
> > >
> > > A bit clearer now?
> > >
> > > Oliver
> > >
> > > --- In antlr-interest at yahoogroups.com, Terence Parr
<parrt at c...>
> > wrote:
> > > >
> > > > On Wednesday, November 19, 2003, at 03:12 PM, Oliver
Zeigermann
> > wrote:
> > > >
> > > > > Actually made it through the paper while getting nervous
with
> > the
> > > > > proofs ;)
> > > > >
> > > > > While he has linear time "backtracking" performance, ANTLR
is
> > worst
> > > > > case exponential. I was wondering why: ANTLR does not
combine
> > its
> > > > > depth first search (aka backtracking in this context) into
a
> > table
> > > > > while Bryan's approach does (at least I understand it this
> > way). The
> > > > > problem Bryan will come across (given my understanding is
> > halfway
> > > > > correct) is ACTIONS. As with LR and combined states, the
> > problem is
> > > > > when to execute associated semantic actions. The drawback
is
> > well
> > > > > known and and leads to reduction in parsing power.
> > > > >
> > > > > Might sound weird, but I thought if we still combined
states
> > even
> > > > > though they are associated with different actions and
simple
> > execute
> > > > > all actions, there would be no loss of power :) Silly? Not
if
> > you have
> > > > > a transactional language that allows you to roll back
actions
> > that
> > > > > turn out to be invalid later and roll forward the valid
ones.
> > > > >
> > > > > Technically this is possible. Does it make sense as well?
Am
> I
> > slowly
> > > > > going crazy? ;)
> > > >
> > > > You are already crazy like me ;) <snicker, snort>. Just got
> > mail from
> > > > him. :) Hope it's ok to repeat part here:
> > > >
> > > > > - Packrat parsing guarantees linear-time parsing on all
the
> > types of
> > > > > grammars
> > > > > it supports, which amounts to everything that fits the
> > formalism or
> > > > > "conceptual model" of parsing expression grammars. But
> > the "pure"
> > > PEG
> > > > > model
> > > > > doesn't directly support "stateful" grammars like those of
C
> > and C++,
> > > > > in
> > > > > which you have to build up symbol tables and such that
> > effectively
> > > > > modify the
> > > > > grammar mid-stream as the parser scans the input from left
to
> > right.
> > > > > From
> > > > > what I've seen so far, it appears fundamentally difficult
or
> > > > > impossible to
> > > > > make a packrat parser support stateful grammars
efficiently
> > without
> > > > > effectively turning it into a deterministic (e.g., LR)
parser.
> > > >
> > > > So, the actions are the problem for everyone :)
> > > >
> > > > Ter
> > > > --
> > > > Professor Comp. Sci., University of San Francisco
> > > > Creator, ANTLR Parser Generator, http://www.antlr.org
> > > > Co-founder, http://www.jguru.com
> > > > Co-founder, http://www.knowspam.net enjoy email again!
> > > > Co-founder, http://www.peerscope.com pure link sharing
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
More information about the antlr-interest
mailing list