[antlr-interest] Re: a new paper on ANTLR style grammars

Fri Nov 21 04:19:21 PST 2003

One way you might do it is with a versioned data structure. The 
Harmonia project developed a versioned tree for incremental language 
analysis. A versioned tree with properties is obviously a powerful 
data structure (that's basically XML). So, if you had actions as 
transformations on a versioned tree then you could be doing updates 
to multiple versions, so you could branch your parser where needed 
simply by carrying out the actions against different versions. All 
your parse data like your symbol table would just be nodes attached 
to your tree or on a seperate branch for things like your symbol 
table. So, a lexer would take a character stream a produce a tree 
with a branch conatining data, e.g. a symbol table, and a branch 
containing the flat sequence of tokens output, so AST creation would 
take place in the lexer rather than in the parser, though the 
language would stay the same, it would just spit out a stream of AST 
nodes rather than tokens. For parsers and tree-parsers the branch 
containing data would be copied while the other data would be 
transformed as usual.
The main problem would be making it easy to work with a tree, 
obviously you *can* store a symbol table in a tree but you need to 
make accessing it easy. You would need some way to do XPath like 
expressions to address nodes and properties. Perhaps that could just 
be tacked onto the action-transformation stuff with expressions done 
in target-language. So you'd have some construct within actions to 
execute an XPath expression against the data portion of the output 
tree or the properties of the parse portion of the input, returning 
an AST node\s. So nodes at arbitrary places can be modified and new 
nodes can be added. With that you should be able to track a fair 
amount of dependency information for optimisation too.

Tom.
--- In antlr-interest at yahoogroups.com, "Oliver Zeigermann" 
<oliver at z...> wrote:
> Loring, thanks for the substantial input and taking this 
seriously :)
> 
> --- In antlr-interest at yahoogroups.com, "lgcraymer" <lgc at m...> 
wrote:
> > You can do better by deferring the actions--basically, build a 
> > monster case statement including all of the possible actions in 
a 
> > grammar--and execute them after matching a rule. 
> 
> *After* exactly is the problem. It is pretty easy to execute 
actions
> after a derivation / reduce, but not while shifting. This may be
> desirable though...
> 
> > Then you can 
> > trigger a set of actions at "commit" points.  Functional 
languages 
> > make this sort of lazy evaluation easier.  
> 
> Where should those commit points be?
> 
> > Rolling back actions is 
> > trickier--you have to have some sort of mechanism to record 
state, 
> > or the cost for checkpoint/rollback is very high.
> 
> Rolling back / forward does not come for free, agreed! But, if
> implemented reasonably expenses may at most be doubled. Compared to
> worst case exponential costs of backtracking this is not so bad.
> 
> > For that matter, you can defer all actions until the entire 
grammar 
> > is recognized as long as the actions do not affect the parse.  
> > Editing of a generated tree might be trickier, but not that 
> > difficult--you just insert code for that in the list of deferred 
> > actions.
> 
> We discussed this before - but still - writing to and afterwards
> reading from a symbol table is quite a usual thing. This can not be
> expressed merely by means of CFGs (or less), you need actions and
> semantic checks (i.e. predicates) here.
> 
> Other than that, to show my colors: I am a big fan of ASTs in 
genereal
> and ANTLR tree transformation in particular :)
> 
> Oliver
> 
> > --Loring
> > 
> > 
> > --- In antlr-interest at yahoogroups.com, "Oliver Zeigermann" 
> > <oliver at z...> wrote:
> > > What I wanted to say was: If you have sematic actions 
associated
> to
> > > your grammar that can be inserted (and of course executed) at 
any
> > > point and you have a table driven approach you are in trouble. 
> > This is
> > > because what I understand as the precomputation of a search 
tree 
> > into
> > > a table combining certain *search* states. Extrapolating from
> what 
> > I
> > > know about LR you have a problem when youe have a grammar like 
> > this:
> > > 
> > > a : A A { do something here } A A B ;
> > > b : A A { do something different here } A A C ;
> > > 
> > > upon input
> > > 
> > > AAAAC
> > > 
> > > This is because the parser has no idea which action to execute 
> > here.
> > > Now, my idea was to execute both and roll back the action of 
rule 
> > a as
> > > soon as it is clear that rule b actually matches.
> > > 
> > > A bit clearer now?
> > > 
> > > Oliver
> > > 
> > > --- In antlr-interest at yahoogroups.com, Terence Parr 
<parrt at c...> 
> > wrote:
> > > > 
> > > > On Wednesday, November 19, 2003, at 03:12 PM, Oliver 
Zeigermann 
> > wrote:
> > > > 
> > > > > Actually made it through the paper while getting nervous 
with 
> > the
> > > > > proofs ;)
> > > > >
> > > > > While he has linear time "backtracking" performance, ANTLR 
is 
> > worst
> > > > > case exponential. I was wondering why: ANTLR does not 
combine 
> > its
> > > > > depth first search (aka backtracking in this context) into 
a 
> > table
> > > > > while Bryan's approach does (at least I understand it this 
> > way). The
> > > > > problem Bryan will come across (given my understanding is 
> > halfway
> > > > > correct) is ACTIONS. As with LR and combined states, the 
> > problem is
> > > > > when to execute associated semantic actions. The drawback 
is 
> > well
> > > > > known and and leads to reduction in parsing power.
> > > > >
> > > > > Might sound weird, but I thought if we still combined 
states 
> > even
> > > > > though they are associated with different actions and 
simple 
> > execute
> > > > > all actions, there would be no loss of power :) Silly? Not 
if 
> > you have
> > > > > a transactional language that allows you to roll back 
actions 
> > that
> > > > > turn out to be invalid later and roll forward the valid 
ones.
> > > > >
> > > > > Technically this is possible. Does it make sense as well? 
Am
> I 
> > slowly
> > > > > going crazy? ;)
> > > > 
> > > > You are already crazy like me ;) <snicker, snort>.  Just got 
> > mail from 
> > > > him. :)  Hope it's ok to repeat part here:
> > > > 
> > > > > - Packrat parsing guarantees linear-time parsing on all 
the 
> > types of 
> > > > > grammars
> > > > > it supports, which amounts to everything that fits the 
> > formalism or
> > > > > "conceptual model" of parsing expression grammars.  But 
> > the "pure"
> > > PEG 
> > > > > model
> > > > > doesn't directly support "stateful" grammars like those of 
C 
> > and C++, 
> > > > > in
> > > > > which you have to build up symbol tables and such that 
> > effectively 
> > > > > modify the
> > > > > grammar mid-stream as the parser scans the input from left 
to 
> > right.  
> > > > > From
> > > > > what I've seen so far, it appears fundamentally difficult 
or 
> > > > > impossible to
> > > > > make a packrat parser support stateful grammars 
efficiently 
> > without
> > > > > effectively turning it into a deterministic (e.g., LR) 
parser.
> > > > 
> > > > So, the actions are the problem for everyone :)
> > > > 
> > > > Ter
> > > > --
> > > > Professor Comp. Sci., University of San Francisco
> > > > Creator, ANTLR Parser Generator, http://www.antlr.org
> > > > Co-founder, http://www.jguru.com
> > > > Co-founder, http://www.knowspam.net enjoy email again!
> > > > Co-founder, http://www.peerscope.com pure link sharing

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/