[antlr-interest] Re: How to make antlr parse this
Mark
markl at glyphic.com
Sat Apr 17 13:12:56 PDT 2004
> Ah yes, now I see the error of my ways. In the case you show I want
> just the last c to match typeDefinition. So my grammar is more like
> this:
> compilationUnit
> :
> ( (javadocComment)* packageDefinition ) => (javadocComment)*
> packageDefinition // Thanks Loring that took care of the ones above
> packages
> | /* nothing */
> )
> (javadocComment | import)*
> ( (invariantCondition)? typeDefinition)*
> EOF
>
> Of course antlr still informs me that this is non-deterministic, likely
> due to k being 2 and javadocComment being longer than 2 tokens. Am I
> right on that one?
Nope. It is still inherently non-deterministic:
u : (((c)* p) => c* p | ) (c|i)* ((c)? t)*
again, c c c c t
does (c|i)* match three or four of the comments, leaving one or zero for (c)?
Also, the syntatic predicate does nothing: Let's see why:
I'm going to rewrite your grammar somewhat for illustration:
unit : start middle end ;
start : (comment)* package | empty ;
middle : (comment | import)* ;
end : ((invariant)? typedef)* ;
invariant : comment { some code here } ;
empty : ;
Notice that invariant is just comment, which to the parser it is, except that it does some
semantic processing.
This grammar doesn't treat comments matched in start or comments matched in middle
any differently. In other words, if you see three comments at the start of an input, you're
not going to care if they eventually match the comment reference in start or the reference
in middle. Because of the way you written the grammar, it is the presence of a package,
**after a potentially huge number of comments** that determines if those comments
should match in start or middle. The syntactic predicate helps by effectively doing a run-
time large look ahead. But, in the end you don't care where they get matched.
So, the first re-write of the grammar is:
unit: pre start middle end ;
pre: (comment)* ;
start : (package (comment)*)? ;
middle : (import (comment)*)* ;
end : ((invariant)? typedef)* ;
invariant : comment { some code here } ;
empty : ;
This parses the same grammar, but it pushes as many comments into pre as it possible
can, then as many into start as it can, etc... Since you don't need to semantically treat
these comments differently, it doesn't matter where they get matched. Notice that each
rule that is optional ( (...)? or (...)* rules) starts with a fixed length unique sequence so the
parser can tell if that rule is to be used.
Now the problem with this grammar is that sequence that matches invariant before the
typedef will also match the comment in any of pre, start, or middle. And since we've made
the grammar be hungry and match as much as possible, this grammar will never see the
optional invariant in the end rule - it will have been eaten by some other (comment)*
sequence. This is legal based on the grammar, since it is non-deterministic, but it is not
how you want to resolve the non-determinism.
HERE is the case to use a syntactic predicate: You are trying to make a parse decision (is it
an invariant or a comment) based on syntax that **follows** the sequence in question.
The solution is that we don't want any of the (comment)* sequences to match the last
comment if it followed by a typedef, because we want to resolve the non-determinism in
favor of matching the invariant. Which leads us to:
unit: pre start middle end ;
pre: fluff ;
start : (package fluff)? ;
middle : (import fluff)* ;
end : ((invariant)? typedef)* ;
fluff :
(comment)=> ( (invariant typedef)=> empty | comment fluff )
| empty ;
invariant : comment { some code here } ;
empty : ;
So, fluff is a sequence of zero or more comments, but doesn't include the last comment
that could be an invariant if if is followed by a typedef. Alas, this is tail recursive, but that
may or may not be a concern for you. (Do you expect to parse things with thousands of
consecutive comment blocks?)
To get rid of tail recursion, we want something like this:
fluff : ((comment comment)=> comment)* flufftail
flufftail : (comment)=> ( (invariant typedef)=> empty | comment ) | empty ;
But, alas, Antlr doesn't support syntactic predicates as predictors of (...)* loops.
(Terence, are you listening?)
- Mark
Yahoo! Groups Links
<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/antlr-interest/
<*> To unsubscribe from this group, send an email to:
antlr-interest-unsubscribe at yahoogroups.com
<*> Your use of Yahoo! Groups is subject to:
http://docs.yahoo.com/info/terms/
More information about the antlr-interest
mailing list