[antlr-interest] bad generated code?
Christian Bird
cabird at gmail.com
Fri Sep 30 20:04:53 PDT 2005
I get why they have to be lookahead 1. In my case, I know that all
the places that need lookahead 2 only look at the "same level" of the
tree. That is, there's not subtrees at those locations. In general
though, I see why the problem is a difficult one. There should at
least be a check so that if the code generator is asked to do
lookahead > 1 for a treeparser it generates an exception or error with
the rule that caused the issue. In my case I was looking all over my
grammar and eventually looked through the source.
On a tangent: I'm not so convinced that treeparsers are actually
working in two dimensions. In reality, a treeparser simply does an
pre-order traversal of the AST trying to match tokens, right? There
are many applications in computer science where a "2-dimensional" tree
is turned into linear sequence of "things" (in this case tokens, both
real and imaginary). XML, for instance is a tree turned into a linear
sequence of tokens with special tokens to indicate where subtrees
start and end. If I add two tokens, one designating that a new
subtree is beginning (BEGIN) and one designating that the current
subtree is ending (END) then the following rule:
rule1 : #(A #(B C D) E F);
can be translated into
rule2: A BEGIN B BEGIN C D END E F END;
I could create a token generator that does a pre-order traversal of
the tree spitting back tokens to a standard parser that knows about
BEGIN and AND and it's essentially the same thing as a tree parser.
In this case it shouldn't be tough to do arbitrary lookahead, right?
Sorry for the over-analysis. A fellow grad-student here at Davis is a
tree-nut (HAHA) and I must have been around him a little too much
lately.
-- Chris
On 9/30/05, Monty Zukowski <monty at codetransform.com> wrote:
> Tree parsers have to be k=1 because they are actually two dimensional
> and it gets too weird to allow k>1. Syntactic predicates should
> work, though.
>
> You might even try this with 2.7.4. This seems pretty basic to get
> wrong. I wonder if it is a newly introduced bug.
>
> Monty
>
>
> On Sep 30, 2005, at 6:59 PM, Christian Bird wrote:
>
> > In case anyone cares, I tracked down the problem in the source. Here
> > is the lookahead code generator from JavaCodeGenerator.java:
> >
> > private String lookaheadString(int k) {
> > if (grammar instanceof TreeWalkerGrammar) {
> > return "_t.getType()";
> > }
> > return "LA(" + k + ")";
> > }
> >
> > Note that if it's a tree walker, k isn't examined at all. My solution
> > (though I'm not sure that it's always NullPointerException safe) is
> > this:
> >
> > private String lookaheadString(int k) {
> > if (grammar instanceof TreeWalkerGrammar) {
> > System.out.println("k is " + Integer.toString(k) );
> > String retStr = "_t";
> > while (k > 1) {
> > retStr += ".getNextSibling()";
> > k--;
> > }
> > retStr += ".getType()";
> > return retStr;
> > }
> > return "LA(" + k + ")";
> > }
> >
> > A quick recompile and that seems to do the trick. It's possible that
> > the check for getNextSibling could run off the end of the list and
> > generate a NullPointerException, but in my grammar I know that the
> > places where lookahead two is required, there's gauranteed to be a
> > next sibling.
> >
> > Terrence,
> > Any chance that this or something similar and safer (I'm not too
> > familiar with the codebase) could make it into 2.7.6? I know that
> > it's probably not often that a tree parser needs k > 1, but (at least
> > in my case) it can occur. Thanks.
> >
> > -- Chris
> >
> > On 9/30/05, Christian Bird <cabird at gmail.com> wrote:
> >
> >> That didn't seem to work either. I tried using a syntactic
> >> predicate:
> >>
> >> name :
> >> (ID DOT) => complexName
> >> | (ID ~DOT) => identifier
> >> ;
> >>
> >> and adding a rule that changes the followset of name:
> >>
> >> aname :
> >> name SEMI;
> >>
> >> But the code still has issues:
> >>
> >> boolean synPredMatched98 = false;
> >> if (((_t.getType()==ID) && (_t.getType()==SEMI||_t.getType()
> >> ==ARROW))) {
> >> AST __t98 = _t;
> >> synPredMatched98 = true;
> >> inputState.guessing++;
> >> try {
> >> {
> >> AST tmp63_AST_in = (AST)_t;
> >> match(_t,ID);
> >> _t = _t.getNextSibling();
> >> AST tmp64_AST_in = (AST)_t;
> >> matchNot(_t,DOT);
> >> _t = _t.getNextSibling();
> >> }
> >> }
> >> catch (RecognitionException pe) {
> >> synPredMatched98 = false;
> >> }
> >> _t = __t98;
> >> inputState.guessing--;
> >> }
> >> if ( synPredMatched98 ) {
> >> identifier(_t);
> >> _t = _retTree;
> >> }
> >> else {
> >> throw new NoViableAltException(_t);
> >> }
> >>
> >>
> >> Oh well...
> >>
> >> -- Chris
> >>
> >> On 9/30/05, Monty Zukowski <monty at codetransform.com> wrote:
> >>
> >>> I dunno. Try putting parenthesis around the two alternatives?
> >>>
> >>> Monty
> >>>
> >>> On Sep 30, 2005, at 5:48 PM, Christian Bird wrote:
> >>>
> >>>
> >>>> Good suggestion, but unfortunately the code generated for name is
> >>>> still the same. I don't understand antlr could ever generate code
> >>>> that looks like:
> >>>>
> >>>> if ((_t.getType()==A) && (_t.getType()==B)) {}
> >>>>
> >>>> When A is not the same as B. I'm guessing, however, that a
> >>>> treeparser
> >>>> generator is more complicated to write and probably not as often
> >>>> used/tested by antlr users as a normal parser generator (most
> >>>> people
> >>>> I've talked to here at UC Davis only use it for parsers and lexers,
> >>>> not AST traversals).
> >>>>
> >>>> Any other ideas? I appreciate your taking a look at it.
> >>>>
> >>>> -- Chris
> >>>>
> >>>> On 9/30/05, Monty Zukowski <monty at codetransform.com> wrote:
> >>>>
> >>>>
> >>>>>
> >>>>>
> >>>>> On Sep 30, 2005, at 4:57 PM, Christian Bird wrote:
> >>>>>
> >>>>> zimport :
> >>>>> #("import"
> >>>>> (name ARROW complexNameList SEMI |
> >>>>> "all" identifier SEMI) )
> >>>>> ;
> >>>>> It does seem like a code gen bug. I would recommend breaking this
> >>>>> up into
> >>>>> another rule if you can:
> >>>>>
> >>>>> zimport: #("import" importSuffix)
> >>>>> importSuffix: name ARROW complexNameList SEMI
> >>>>> | "all" identifier SEMI
> >>>>> ;
> >>>>>
> >>>>> See if that still triggers the problem.
> >>>>>
> >>>>> Monty
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Christian Bird
> >>>> cabird at gmail.com
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>
> >>
> >> --
> >> Christian Bird
> >> cabird at gmail.com
> >>
> >>
> >
> >
> > --
> > Christian Bird
> > cabird at gmail.com
> >
> >
> >
>
>
--
Christian Bird
cabird at gmail.com
More information about the antlr-interest
mailing list