[antlr-interest] Ambiguous parse tree generated
sam at tunnelvisionlabs.com
Tue Oct 30 19:31:59 PDT 2012
I regularly use TerminalNode instead of Token when working in the trees. This gives me access to the parent node of a token in the parse tree. A special binary search can be used to locate the TerminalNode within the children of its parent.
I strongly disagree with your statement about decomposition complexity. I use an annotation processor to statically validate code (classes, fields, methods) that depends on the shape of my parse tree (or any subset of it). The grammar used by GoWorks has 126 context classes, with 670 dependencies throughout the code base. If 2 smaller rules get combined, then instead of having separate dependencies for the two rules the affected code will all share a dependency, which doubles my work if I need to make a future change to just 1 of the two smaller rules.
From: Gerald Rosenberg [mailto:gerald at certiv.net]
Sent: Tuesday, October 30, 2012 8:36 PM
To: Sam Harwell
Cc: ANTLR-Interest Interest
Subject: Re: [antlr-interest] Ambiguous parse tree generated
Yes, its that "in the order they were parsed" that is the issue. That would imply a list of lists per context to preserve ordering. Kinda what I was hoping for.
The problem with rule decomposition is that it non-trivially increases the complexity of processing the parse tree. For example, the Java grammar is less than about 30 rules. Yet the number of distinct contexts, factoring in decomposition and use of the # alt rule markers (including refactoring to ensure that the alts marked are all outer), is very nearly 200. Not a desirable direction.
In the rules that I presented, the elements are all terminals, so examining the token stream is just a single look back. Changing the lexer to produce Identifier : DotIdentifier | HashIdentifier ; tokens would also work.
A more general solution would require a token to keep a reference to its enclosing scope, including its prior peer element and its closure.
Should be able to do that using custom tokens and actions to explicitly capture scope state in the initial parse. Any reason this approach would not work/would not be desirable?
On 10/30/2012 4:07 PM, Sam Harwell wrote:
> In V3, a rewrite with multiple root operators behaves like individual rules in v4's parse trees. While the nodes in an AST produced by a V3 rule can have a complicated shape, the node created for a V4 rule is always a single node for the rule with a flat list of children (rule contexts and terminals) in the order they were parsed. Rule decomposition is the way to give your trees the shape you want, and really shouldn't cause a performance problem in v4.
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Gerald
> Sent: Tuesday, October 30, 2012 3:04 PM
> To: jimi at temporal-wave.com
> Cc: ANTLR-Interest Interest
> Subject: Re: [antlr-interest] Ambiguous parse tree generated
> Yes, and that is the work around I am using now. Sorry if I was not clear.
> Resorting to rule decomposition unfortunately greatly increases the number of enter/exits and the depth of what was, in v3, AST rewrites.
> Was hoping I was missing some way to mark or label the rule elements to remove the ambiguity.
More information about the antlr-interest