[antlr-interest] antlr v4 wish list

Terence Parr parrt at cs.usfca.edu
Sat Mar 26 11:59:28 PDT 2011


thanks for the ideas gang!  I'll see how things progress soon.

first task: rebuild v3 antlr parser in v3 and jam into v3 code base; it's in v2 now.

then might attack lexer stuff.

Ter
On Mar 26, 2011, at 11:36 AM, The Researcher wrote:

> On Sat, Mar 26, 2011 at 1:21 PM, Jason Doege <jdoege at gmail.com> wrote:
> 
>> Re: Scanner-less parsing
>> 
>> The Parse::RecDescent module for Perl5 implements parsers without a
>> separate scanner and is what comes to mind when I hear the phrase
>> scanner-less. If you were to retain a scanner, I think the
>> characteristic that could provide the same function is to provide
>> context to the scanner so that when you go to get the next token, the
>> scanner only considers the type of token next expected in the current
>> alternative in the production. This way one could have multiple tokens
>> that might all match some text (but not others) and have the context of
>> the production resolve which one it was, (so long as it matched, of
>> course.)
>> 
>> For instance, I might want to have separate token types for binary, hex
>> and decimal digits, but a scanner can not tell which of the three it is
>> if the input is '0' or '1'. Hex overlaps with decimal for 0-9 and
>> overlaps with binary for 0-1 and potentially 'x', 'X', 'z' and 'Z' for
>> some implementations. There absolutely are other ways to handle this,
>> but there is a great deal of flexibility that comes from permitting
>> context to guide the scanner.
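
A minimal Python sketch of the context-guided scanning Jason describes above, under stated assumptions: the token names BIN, DEC and HEX, the ContextScanner class and the scan_all helper are invented for illustration and are not part of ANTLR or Parse::RecDescent. The parser passes in the token types it can accept at the current point, and the scanner tries only those patterns.

```python
import re

# Hypothetical token patterns; the names BIN, DEC and HEX are illustrative only.
PATTERNS = {
    "BIN": re.compile(r"[01]+"),
    "DEC": re.compile(r"[0-9]+"),
    "HEX": re.compile(r"[0-9a-fA-F]+"),
}


class ContextScanner:
    """Scanner that only tries the token types the parser says it expects."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def next_token(self, expected):
        # Skip spaces between tokens.
        while self.pos < len(self.text) and self.text[self.pos] == " ":
            self.pos += 1
        # Try only the patterns the current production allows, in order.
        for token_type in expected:
            match = PATTERNS[token_type].match(self.text, self.pos)
            if match:
                self.pos = match.end()
                return token_type, match.group()
        raise SyntaxError(f"expected one of {expected} at offset {self.pos}")


def scan_all(text, expected_sequence):
    """Scan text, asking for one token per entry of expected_sequence."""
    scanner = ContextScanner(text)
    return [scanner.next_token(expected) for expected in expected_sequence]
```

With this sketch, the same text "1" comes back as BIN, DEC or HEX depending only on what the caller says it expects, and asking for DEC when the input is "ff" raises SyntaxError.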
>> 
>> Having to work through the ambiguity of lexer patterns was something
>> that was unexpected when I recently began working with ANTLR. I suspect
>> that this would not be the case for someone who is more accustomed to
>> using Lex/Yacc or comes from a more traditional or academic
>> parser-building background.
>> 
>> Best regards,
>> Jason Doege
>> 
>> On 3/25/2011 9:19 AM, The Researcher wrote:
>>> 
>>> 
>>> On Thu, Mar 24, 2011 at 2:32 PM, The Researcher<researcher0x00 at gmail.com>
>>> wrote:
>>> 
>>>> 
>>>> On Thu, Mar 24, 2011 at 1:23 PM, Terence Parr<parrt at cs.usfca.edu>
>> wrote:
>>>> 
>>>>> added
>>>>> 
>>>>> * Tree parser error handling should skip subtrees not nodes; these are
>>>>> programming errors not input errors.  The flat stream makes it hard to
>>>>> resync.
>>>>> 
>>>>> Ter
>>>>>  On Mar 24, 2011, at 2:07 AM, Iztok Kavkler wrote:
>>>>> 
>>>>>>> Howdy, I'm going to start augmenting ANTLR v3 significantly to create
>>>>> v4. The goal is backward compatibility; any new functionality, of
>> course,
>>>>> will require altering or augmenting your grammars to take advantage of
>> it.
>>>>> Here is my potential list of updates:
>>>>>>> http://www.antlr.org/wiki/display/ANTLR4/ANTLR+v4+Wish+list
>>>>>>> 
>>>>>>> Anything to add or comment on?
>>>>>>> 
>>>>>>> Ter
>>>>>>> 
>>>>>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>>>>>> Unsubscribe:
>>>>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>>>>> A new error recovery mode for tree parsing:
>>>>>> When parsing ASTs, the ordinary error recovery strategies based on
>> token
>>>>>> deletion/insertion are completely useless, because there are no
>> man-made
>>>>>> syntax errors. In my experience, what you really want to do is the
>>>>>> following: assume that you have an error handler attached to some rule
>>>>>> and an error happens somewhere in the subtree of the node parsed by
>> that
>>>>>> rule. When the handler catches an error, the parser must skip the
>>>>>> remainder of that subtree, otherwise the parser position is not
>>>>>> consistent with the grammar position anymore. In AST implementations
>>>>>> that are based on pointers between nodes this happens automatically,
>> but
>>>>>> Antlr's representation as a flat list of nodes with UP and DOWN tokens
>>>>>> means this requires some work - the parser has to keep track of the
>>>>>> current node's depth and skip the appropriate number of UP nodes
>>>>>> whenever an error is caught.
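
A rough Python sketch of the depth-tracking recovery Iztok describes above, assuming the flat stream is a plain list in which the strings "DOWN" and "UP" stand in for the navigation tokens. The function name skip_rest_of_subtree and its parameters are hypothetical; this is not ANTLR's tree parser API.

```python
DOWN, UP = "DOWN", "UP"


def skip_rest_of_subtree(nodes, pos, depth):
    """Return the index just past the UP that closes the handling rule's subtree.

    nodes: flat node list with DOWN/UP markers.
    pos:   index at which the error was detected.
    depth: number of DOWN markers opened since the handling rule's root node
           and not yet closed when the error occurred.
    """
    while depth > 0:
        if pos >= len(nodes):
            raise ValueError("unbalanced node stream")
        token = nodes[pos]
        if token == DOWN:
            depth += 1  # entering a nested subtree that must be skipped whole
        elif token == UP:
            depth -= 1  # leaving one level; 0 means the rule's subtree is done
        pos += 1
    return pos


# The tree (+ (* a b) c) followed by a sibling node, flattened.
nodes = ["+", DOWN, "*", DOWN, "a", "b", UP, "c", UP, "next"]
```

Here an error at "b" (index 5) caught by the rule for "*" has depth 1 and resumes at "c", while the same error caught by the rule for "+" has depth 2 and resumes at "next".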
>>>>>> 
>>>>>> Iztok
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 1. If my concept of scannerless parsing is the same as yours, then in
>> the
>>>> generated code for a rule, allow the "true" in "do {<rule code> }
>>>> while(true)" to be an attribute of the rule, i.e. exit. Obviously the
>> value
>>>> would be true unless changed by a user. This would allow the user to
>>>> control when to exit the rule. By turning true into an attribute of
>> the
>>>> rule, this allows for more control than gated semantic predicates.
>>>> 
>>>> Based on my concept of scannerless parsing, there is no lexer and the
>>>> parser drives the reading of the tokens from the input stream. The
>> input
>>>> stream does not generate the tokens ahead of time but only when needed.
>> In a
>>>> quick proof of concept I had the token type passed from the parser as a
>>>> generic parameter, allowing the redefinition of the token returned by
>> the
>>>> token stream. There were no pre-defined tokens values; they were
>> dynamically
>>>> generated. To get the proof of concept to work required having a
>>>> cross-reference table between token types and token values.
>>>> 
>>>> 2. If ANTLR 4 will allow the reading of binary data streams, then please
>>>> don't put char and line pos in a base class. There could be one
>> inherited
>>>> class that defines line and char pos, and another inherited class that
>>>> defines offset.
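
One way to read Eric's item 2, sketched in Python under stated assumptions: the class names BaseToken, TextToken and BinaryToken and their fields are hypothetical and do not come from ANTLR. The base class carries no position at all, and each subclass adds the kind of position that suits its input.

```python
from dataclasses import dataclass


@dataclass
class BaseToken:
    """Position-free base: only what every token has."""
    token_type: str
    payload: bytes


@dataclass
class TextToken(BaseToken):
    """Token from character input: located by line and char position."""
    line: int
    char_pos: int


@dataclass
class BinaryToken(BaseToken):
    """Token from a binary stream: located by byte offset only."""
    offset: int


text_tok = TextToken("ID", b"x", line=1, char_pos=4)
bin_tok = BinaryToken("HEADER", b"\x89PNG", offset=0)
```

Code that only needs the token type and payload can accept any BaseToken, while a binary token never has to carry meaningless line and char position fields.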
>>>> 
>>>> Thanks
>>>> 
>>>> Eric
>>>> 
>>>> 
>>> After finding Scannerless Generalized LR (SGLR),  which I believe is
>> closer
>>> to your meaning, my concept of scannerless parsing is different enough
>> that
>>> the reference should be disregarded. I still submit the request
>> for a
>>> rule to have an exit attribute.
>>> 
>>> Thanks, Eric
>>> 
>>> 
>> 
>> 
>> 
> 
> 
> Jason, thanks for the info.
> 
> Ter, with regards to wishes for ANTLR 4, I don't know how long you will be
> leaving the pipeline open, so I am sending in my wishes before the pipeline
> closes and the wishes aren't as polished as they should be.
> 
> More in line with what I am wishing is that ANTLR 4 have more features
> accessible from the grammar for doing research; possibly via a research
> mode. The ability to manipulate the state machine for each rule is desired,
> along with the ability to manipulate the trees in a fashion similar to
> PROLOG.
> 
> I know you have been considering LLVM, which I would truly like to see
> in ANTLR 4, and that may be the ingress I seek.
> 
> I wish I could flesh out the details more, but maybe others can hop onto
> this suggestion.
> 
> Thanks, Eric
> 


