[antlr-interest] antlr v4 wish list

The Researcher researcher0x00 at gmail.com
Sat Mar 26 11:36:44 PDT 2011


On Sat, Mar 26, 2011 at 1:21 PM, Jason Doege <jdoege at gmail.com> wrote:

> Re: Scanner-less parsing
>
> The Parse::RecDescent module for Perl5 implements parsers without a
> separate scanner and is what comes to mind when I hear the phrase
> scanner-less. If you were to retain a scanner, I think the
> characteristic that could provide the same function is to provide
> context to the scanner so that when you go to get the next token, the
> scanner only considers the type of token next expected in the current
> alternative in the production. This way one could have multiple tokens
> that might all match some text (but not others) and have the context of
> the production resolve which one it was, (so long as it matched, of
> course.)
>
> For instance, I might want to have separate token types for binary, hex
> and decimal digits, but a scanner can not tell which of the three it is
> if the input is '0' or '1'. Hex overlaps with decimal for 0-9 and
> overlaps with binary for 0-1 and potentially 'x', 'X', 'z' and 'Z' for
> some implementations. There absolutely are other ways to handle this,
> but there is a great deal of flexibility that comes from permitting
> context to guide the scanner.
>
> Having to work through the unambiguity of lexer patterns was something
> that was unexpected when I recently began working with ANTLR. I suspect
> that this would not be the case for someone who is more accustomed to
> using Lex/Yacc or comes from a more traditional or academic
> parser-building background.
>
> Best regards,
> Jason Doege
>
> On 3/25/2011 9:19 AM, The Researcher wrote:
> >
> >
> > On Thu, Mar 24, 2011 at 2:32 PM, The Researcher<researcher0x00 at gmail.com
> >wrote:
> >
> >>
> >> On Thu, Mar 24, 2011 at 1:23 PM, Terence Parr<parrt at cs.usfca.edu>
>  wrote:
> >>
> >>> added
> >>>
> >>> * Tree parser error handling should skip subtrees not nodes; these are
> >>> programming errors not input errors.  The flat stream makes it hard to
> >>> resync.
> >>>
> >>> Ter
> >>>   On Mar 24, 2011, at 2:07 AM, Iztok Kavkler wrote:
> >>>
> >>>>> Howdy, I'm going to start augmenting ANTLR v3 significantly to create
> >>> v4. The goal is backward compatibility; any new functionality, of
> course,
> >>> will require altering or augmenting your grammars to take advantage of
> it.
> >>> Here is my potential list of updates:
> >>>>> http://www.antlr.org/wiki/display/ANTLR4/ANTLR+v4+Wish+list
> >>>>>
> >>>>> Anything to add or comment on?
> >>>>>
> >>>>> Ter
> >>>>>
> >>>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >>>>> Unsubscribe:
> >>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >>>> A new error recovery mode for tree parsing:
> >>>> When parsing ASTs, the ordinary error recovery strategies based on
> token
> >>>> deletion/insertion are completely useless, because there are no
> man-made
> >>>> syntax errors. In my experience, what you really want to do is the
> >>>> following: assume that you have an error handler attached to some rule
> >>>> and an error happens somewhere in the subtree of the node parsed by
> that
> >>>> rule. When the handler catches an error, the parser must skip the
> >>>> remainder of that subtree, otherwise the parser position is not
> >>>> consistent with the grammar position anymore. In AST implementations
> >>>> that are based on pointers between nodes this happens automatically,
> but
> >>>> Antlr's representation as a flat list of nodes with UP and DOWN tokens
> >>>> makes it requires some work - the parser has to keep track of the
> >>>> current node's depth and skip the appropriate number of UP nodes
> >>>> whenever an error is caught.
> >>>>
> >>>> Iztok
> >>>>
> >>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >>>> Unsubscribe:
> >>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >>>
> >>>
> >>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >>> Unsubscribe:
> >>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >>>
> >>
> >> 1. If my concept of scannerless parsing is the same as yours, then in
> the
> >> generated code for a rule allow the true for "do {<rule code>  }
> >> while(true)" to be an attribute of the rule, i.e exit. Obviously the
> value
> >> would be true unless changed by a user.This would allow the user to have
> >> control of when to exit the rule. By turning true into a attribute of
> the
> >> rule, this allows for more control than gated semantic predicates.
> >>
> >> Based on by concept of scannerless parsing, there is no lexer and the
> >> parser drives the reading of the tokens from the intput stream. The
> input
> >> stream does not generate the tokens ahead of time but only when needed.
> In a
> >> quick proof of concept I had the token type passed from the parser as a
> >> generic parameter, allowing the redefinition of the token returned by
> the
> >> token stream. There were no pre-defined tokens values; they were
> dynamically
> >> generated.To get the proof of concept to work required having a
> >> cross-reference table between token types and token values.
> >>
> >> 2. If ANTLR 4 will allow the reading of binary data streams, then please
> >> don't put char and line pos in a base class. There could be one
> inherited
> >> classes that defines line and char pos, and another inherited class that
> >> defines offset.
> >>
> >> Thanks
> >>
> >> Eric
> >>
> >>
> > After finding Scannerless Generalized LR (SGLR),  which I believe is
> closer
> > to your meaning, my concept of scannerless parsing is different enough
> that
> > the reference should should be disregarded. I still submit the request
> for a
> > rule to have an exit attribute.
> >
> > Thanks, Eric
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>


Jason  Thanks for the info.

Ter, with regards to wishes for ANTLR 4, I don't know how long you will be
leaving the pipeline open, so I am sending in my wishes before the pipeline
closes and the wishes aren't as polished as they should be.

More in line with what I am wishing is that ANTLR 4 have more features
accessible from the grammar for doing research; possibly via a research
mode. The ablility to manipulate the state machine for each rule is desired,
along with the ability to manipulte the trees in a fashion simular to
PROLOG.

I know you have been considering LLVM, for which I would truly like to see
in ANTLR 4, and that may be the ingress I seek.

I wish I could flesh out the details more, but maybe others can hop onto
this suggestion.

Thanks, Eric


More information about the antlr-interest mailing list