[antlr-interest] antlr v4 wish list

Thu Mar 24 11:32:11 PDT 2011

On Thu, Mar 24, 2011 at 1:23 PM, Terence Parr <parrt at cs.usfca.edu> wrote:

> added
>
> * Tree parser error handling should skip subtrees not nodes; these are
> programming errors not input errors.  The flat stream makes it hard to
> resync.
>
> Ter
>  On Mar 24, 2011, at 2:07 AM, Iztok Kavkler wrote:
>
> >> Howdy, I'm going to start augmenting ANTLR v3 significantly to create
> >> v4. The goal is backward compatibility; any new functionality, of course,
> >> will require altering or augmenting your grammars to take advantage of it.
> >> Here is my potential list of updates:
> >>
> >> http://www.antlr.org/wiki/display/ANTLR4/ANTLR+v4+Wish+list
> >>
> >> Anything to add or comment on?
> >>
> >> Ter
> >>
> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >
> > A new error recovery mode for tree parsing:
> > When parsing ASTs, the ordinary error recovery strategies based on token
> > deletion/insertion are completely useless, because there are no man-made
> > syntax errors. In my experience, what you really want to do is the
> > following: assume that you have an error handler attached to some rule
> > and an error happens somewhere in the subtree of the node parsed by that
> > rule. When the handler catches an error, the parser must skip the
> > remainder of that subtree, otherwise the parser position is not
> > consistent with the grammar position anymore. In AST implementations
> > that are based on pointers between nodes this happens automatically, but
> > Antlr's representation as a flat list of nodes with UP and DOWN tokens
> > makes it require some work - the parser has to keep track of the
> > current node's depth and skip the appropriate number of UP nodes
> > whenever an error is caught.
> >
> > Iztok
>
>
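
For reference, the depth-tracking recovery Iztok describes above could be
sketched roughly as follows. This is only an illustration under my own
assumptions: the flattened tree is modelled as a list of strings with
"DOWN" and "UP" as navigation nodes, and SubtreeSkipper and
skipToEndOfSubtree are hypothetical names, not part of the ANTLR runtime.

```java
import java.util.List;

/** Hypothetical helper for illustration; not an ANTLR runtime class. */
final class SubtreeSkipper {
    // Stand-ins for the imaginary navigation nodes of a flattened tree.
    static final String DOWN = "DOWN";
    static final String UP = "UP";

    /**
     * Returns the index just past the subtree that the error-handling rule
     * was parsing.
     *
     * @param nodes flattened tree, e.g. A DOWN B DOWN C UP D UP E
     * @param pos   index of the node where the error was detected
     * @param depth number of DOWN nodes consumed since entering the rule's
     *              root that have not yet been closed by an UP
     */
    static int skipToEndOfSubtree(List<String> nodes, int pos, int depth) {
        if (depth <= 0) {
            return pos; // nothing is open, so there is nothing to skip
        }
        for (int i = pos; i < nodes.size(); i++) {
            String node = nodes.get(i);
            if (node.equals(DOWN)) {
                depth++;            // entering a nested child list
            } else if (node.equals(UP)) {
                depth--;            // leaving a child list
                if (depth == 0) {
                    return i + 1;   // the rule's own subtree is now closed
                }
            }
        }
        return nodes.size();        // malformed stream: ran off the end
    }
}
```

With the stream A DOWN B DOWN C UP D UP E, an error detected at C (index 4,
depth 2) resumes at E, which is the first node after the subtree rooted at A.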

1. If my concept of scannerless parsing is the same as yours, then in the
generated code for a rule, allow the "true" in "do { <rule code> }
while(true)" to be an attribute of the rule, i.e. exit. The value would
remain true unless changed by the user. This would let the user control
when to exit the rule. Turning "true" into an attribute of the rule
allows for more control than gated semantic predicates.
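
To make the idea concrete, here is a rough hand-written sketch of what I
mean. It is not actual ANTLR output; ItemListRule, keepLooping and the ";"
sentinel are names I made up for the example. The field keepLooping plays
the role of the attribute I called exit above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written stand-in for a generated rule method; not ANTLR output. */
final class ItemListRule {
    // Replaces the literal `true` in `do { ... } while (true)`.
    // Defaults to true; a user action may clear it to leave the rule.
    boolean keepLooping = true;

    // Items matched so far, kept only so the effect is observable.
    final List<String> matched = new ArrayList<>();

    /** Matches items starting at pos and returns the position after the rule. */
    int parse(List<String> input, int pos) {
        do {
            if (pos >= input.size()) {
                break;               // input exhausted: ordinary loop exit
            }
            String item = input.get(pos++);
            matched.add(item);
            // Hypothetical user action: stop the rule after a sentinel item.
            if (item.equals(";")) {
                keepLooping = false;
            }
        } while (keepLooping);
        return pos;
    }
}
```

Parsing the items a, b, ;, c stops right after the ";" and leaves c for
whatever rule comes next, without needing a predicate on the loop.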

Based on my concept of scannerless parsing, there is no lexer and the parser
drives the reading of the tokens from the input stream. The input stream
does not generate the tokens ahead of time but only when needed. In a quick
proof of concept I had the token type passed from the parser as a generic
parameter, allowing the redefinition of the token returned by the token
stream. There were no pre-defined token values; they were dynamically
generated. Getting the proof of concept to work required a
cross-reference table between token types and token values.
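
A much reduced sketch of that arrangement, written fresh for this message
rather than taken from the proof of concept. All names (OnDemandTokens,
next, valueOf) are hypothetical, and using regular expressions to describe
the requested token is my own simplification.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Token source driven by the parser; T is the parser's own token class. */
final class OnDemandTokens<T> {
    private final String input;
    // Builds a token of the caller's type from (token value, matched text).
    private final BiFunction<Integer, String, T> factory;
    // Cross-reference table: token type name -> dynamically assigned value.
    private final Map<String, Integer> typeToValue = new HashMap<>();
    private int pos = 0;

    OnDemandTokens(String input, BiFunction<Integer, String, T> factory) {
        this.input = input;
        this.factory = factory;
    }

    /** Returns the value for a type, assigning the next free one if needed. */
    int valueOf(String typeName) {
        Integer value = typeToValue.get(typeName);
        if (value == null) {
            value = typeToValue.size() + 1;
            typeToValue.put(typeName, value);
        }
        return value;
    }

    /**
     * Tries to read the token the parser asks for at the current position.
     * Nothing is tokenized ahead of time; returns null if it does not match.
     */
    T next(String typeName, Pattern pattern) {
        Matcher m = pattern.matcher(input);
        m.region(pos, input.length());
        if (!m.lookingAt()) {
            return null;             // requested token is not at this position
        }
        pos = m.end();
        return factory.apply(valueOf(typeName), m.group());
    }
}
```

The parser decides which token to ask for, and a token value only comes
into existence the first time a token of that type is actually matched.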

2. If ANTLR 4 will allow the reading of binary data streams, then please
don't put char and line pos in a base class. There could be one inherited
class that defines line and char pos, and another inherited class that
defines offset.
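
Something along these lines, as a sketch only. BaseToken, TextToken,
BinaryToken and describePosition are made-up names for illustration, not
existing ANTLR classes.

```java
/** Holds only what every token has, whatever the kind of input. */
abstract class BaseToken {
    final int type;

    BaseToken(int type) {
        this.type = type;
    }

    /** Each subclass reports its position in its own terms. */
    abstract String describePosition();
}

/** Token from character input: position is line and char pos. */
final class TextToken extends BaseToken {
    final int line;
    final int charPositionInLine;

    TextToken(int type, int line, int charPositionInLine) {
        super(type);
        this.line = line;
        this.charPositionInLine = charPositionInLine;
    }

    @Override
    String describePosition() {
        return "line " + line + ":" + charPositionInLine;
    }
}

/** Token from a binary stream: position is a byte offset. */
final class BinaryToken extends BaseToken {
    final long offset;

    BinaryToken(int type, long offset) {
        super(type);
        this.offset = offset;
    }

    @Override
    String describePosition() {
        return "offset " + offset;
    }
}
```

Error reporting can then go through describePosition() without the base
class assuming that lines and columns exist.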

Thanks

Eric