[antlr-interest] antlr v4 wish list

Fri Mar 25 07:19:35 PDT 2011

On Thu, Mar 24, 2011 at 2:32 PM, The Researcher <researcher0x00 at gmail.com> wrote:

>
>
> On Thu, Mar 24, 2011 at 1:23 PM, Terence Parr <parrt at cs.usfca.edu> wrote:
>
>> added
>>
>> * Tree parser error handling should skip subtrees not nodes; these are
>> programming errors not input errors.  The flat stream makes it hard to
>> resync.
>>
>> Ter
>>  On Mar 24, 2011, at 2:07 AM, Iztok Kavkler wrote:
>>
>> >> Howdy, I'm going to start augmenting ANTLR v3 significantly to create
>> >> v4. The goal is backward compatibility; any new functionality, of course,
>> >> will require altering or augmenting your grammars to take advantage of it.
>> >> Here is my potential list of updates:
>> >>
>> >> http://www.antlr.org/wiki/display/ANTLR4/ANTLR+v4+Wish+list
>> >>
>> >> Anything to add or comment on?
>> >>
>> >> Ter
>> >>
>> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> >> Unsubscribe:
>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>> >
>> > A new error recovery mode for tree parsing:
>> > When parsing ASTs, the ordinary error recovery strategies based on token
>> > deletion/insertion are completely useless, because there are no man-made
>> > syntax errors. In my experience, what you really want to do is the
>> > following: assume that you have an error handler attached to some rule
>> > and an error happens somewhere in the subtree of the node parsed by that
>> > rule. When the handler catches an error, the parser must skip the
>> > remainder of that subtree, otherwise the parser position is not
>> > consistent with the grammar position anymore. In AST implementations
>> > that are based on pointers between nodes this happens automatically, but
>> > ANTLR's representation as a flat list of nodes with UP and DOWN tokens
>> > means it requires some work - the parser has to keep track of the
>> > current node's depth and skip the appropriate number of UP nodes
>> > whenever an error is caught.
>> >
>> > Iztok
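
A minimal sketch of the depth-tracking recovery described above, in Java.
FlatNodeStream, skipRestOfSubtree and the string node labels are hypothetical
names invented for illustration; this is not ANTLR's runtime API. It assumes
the tree (A B (C D)) is serialized as A DOWN B C DOWN D UP UP, and that the
rule owning the error handler recorded the stream depth just before matching
its root node.

```java
import java.util.List;

/** Hypothetical flat node stream; (A B (C D)) becomes A DOWN B C DOWN D UP UP. */
final class FlatNodeStream {
    static final String DOWN = "DOWN";
    static final String UP = "UP";

    private final List<String> nodes;
    private int index = 0; // position of the next node to consume
    private int depth = 0; // DOWN tokens consumed and not yet closed by an UP

    FlatNodeStream(List<String> nodes) {
        this.nodes = nodes;
    }

    int depth() {
        return depth;
    }

    boolean atEnd() {
        return index >= nodes.size();
    }

    /** Next node without consuming it, or null at end of stream. */
    String peek() {
        return atEnd() ? null : nodes.get(index);
    }

    /** Consume one node, adjusting the depth on DOWN and UP. */
    String consume() {
        String node = nodes.get(index++);
        if (node.equals(DOWN)) {
            depth++;
        } else if (node.equals(UP)) {
            depth--;
        }
        return node;
    }

    /**
     * Error recovery: skip the remainder of the subtree whose root was matched
     * when the stream depth was ruleDepth. Afterwards the stream is positioned
     * at the first node following that subtree.
     */
    void skipRestOfSubtree(int ruleDepth) {
        // The root was consumed but its child list was not entered yet:
        // step into it so the loop below skips the children as well.
        if (depth == ruleDepth && DOWN.equals(peek())) {
            consume();
        }
        // Consume nodes until the UP that closes the root's child list is eaten.
        while (depth > ruleDepth && !atEnd()) {
            consume();
        }
    }
}
```

A rule's handler would call skipRestOfSubtree with the depth it saved on entry,
so the parser resumes at the next sibling instead of somewhere inside the
broken subtree.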
>> >
>>
>>
>>
>
>
> 1. If my concept of scannerless parsing is the same as yours, then in the
> generated code for a rule, allow the "true" in "do { <rule code> }
> while(true)" to be an attribute of the rule, i.e. exit. Obviously the value
> would be true unless changed by a user. This would allow the user to
> control when to exit the rule. Turning "true" into an attribute of the
> rule allows for more control than gated semantic predicates.
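
To make the request concrete, here is a hand-written Java stand-in for the
loop in a generated rule method. It is only a sketch: keepLooping is a
hypothetical name for the proposed attribute, items() is an invented rule, and
the check for "END" stands in for whatever user action would change the value.
None of this is actual ANTLR output.

```java
import java.util.ArrayList;
import java.util.List;

final class ExitAttributeSketch {
    /** Hypothetical rule attribute that replaces the constant in "while (true)". */
    private boolean keepLooping = true;

    /** Stand-in for a generated rule method for something like: items : ITEM* ; */
    List<String> items(List<String> input) {
        List<String> matched = new ArrayList<>();
        int i = 0;
        keepLooping = true; // default value, restored on every invocation
        do {
            if (i >= input.size()) {
                break; // ordinary loop exit: nothing left to match
            }
            String tok = input.get(i++);
            matched.add(tok);
            // Stand-in for a user action inside the rule changing the attribute.
            if (tok.equals("END")) {
                keepLooping = false;
            }
        } while (keepLooping); // was: while (true)
        return matched;
    }
}
```

With the input a, b, END, c the loop stops after END even though more input
is available, which is the kind of user-controlled exit meant above.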
>
> Based on my concept of scannerless parsing, there is no lexer and the
> parser drives the reading of the tokens from the input stream. The input
> stream does not generate the tokens ahead of time but only when needed. In a
> quick proof of concept I had the token type passed from the parser as a
> generic parameter, allowing the redefinition of the token returned by the
> token stream. There were no predefined token values; they were dynamically
> generated. Getting the proof of concept to work required a
> cross-reference table between token types and token values.
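
A rough Java sketch of that arrangement, not the original proof of concept.
OnDemandTokenSource, its factory parameter and the whitespace-separated
tokenization are all assumptions made for illustration. The token class is a
generic parameter supplied by the caller, tokens are produced only when
next() is called, and type numbers are assigned dynamically through a
cross-reference table.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical token source that builds tokens of caller-chosen type T on demand. */
final class OnDemandTokenSource<T> {
    private final String input;
    private final BiFunction<Integer, String, T> factory; // (type, text) -> token
    // Cross-reference table: token text -> dynamically assigned token type.
    private final Map<String, Integer> typeByText = new LinkedHashMap<>();
    private int pos = 0;
    private int produced = 0;

    OnDemandTokenSource(String input, BiFunction<Integer, String, T> factory) {
        this.input = input;
        this.factory = factory;
    }

    /** Number of tokens created so far; nothing is created ahead of time. */
    int produced() {
        return produced;
    }

    /** Produce the next token, or null at end of input. */
    T next() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++; // skip separators
        }
        if (pos >= input.length()) {
            return null;
        }
        int start = pos;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        String text = input.substring(start, pos);
        Integer type = typeByText.get(text);
        if (type == null) {
            type = typeByText.size() + 1; // first sighting: allocate a new type
            typeByText.put(text, type);
        }
        produced++;
        return factory.apply(type, text);
    }
}
```

The parser side would simply call next() whenever it needs another token, so
the stream never runs ahead of the parser.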
>
> 2. If ANTLR 4 will allow the reading of binary data streams, then please
> don't put char and line pos in a base class. There could be one derived
> class that defines line and char pos, and another derived class that
> defines offset.
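
The split could look roughly like this in Java. BaseToken, TextToken,
BinaryToken and location() are hypothetical names, not ANTLR classes; the
point is only that the base class carries no text-specific position fields.

```java
/** Hypothetical base class: no line or char position here. */
abstract class BaseToken {
    final int type;

    BaseToken(int type) {
        this.type = type;
    }

    /** Human-readable position, defined by each kind of token. */
    abstract String location();
}

/** Token from a character stream: positioned by line and char pos. */
final class TextToken extends BaseToken {
    final int line;
    final int charPositionInLine;

    TextToken(int type, int line, int charPositionInLine) {
        super(type);
        this.line = line;
        this.charPositionInLine = charPositionInLine;
    }

    @Override
    String location() {
        return "line " + line + ":" + charPositionInLine;
    }
}

/** Token from a binary stream: positioned by byte offset only. */
final class BinaryToken extends BaseToken {
    final long offset;

    BinaryToken(int type, long offset) {
        super(type);
        this.offset = offset;
    }

    @Override
    String location() {
        return "offset " + offset;
    }
}
```

Error reporting can then call location() on any BaseToken without knowing
whether the input was text or binary.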
>
> Thanks
>
> Eric
>
>
After finding Scannerless Generalized LR (SGLR), which I believe is closer
to your meaning, my concept of scannerless parsing is different enough that
the reference should be disregarded. I still submit the request for a
rule to have an exit attribute.

Thanks, Eric