[antlr-interest] internal error
Gavin Lambert
antlr at mirality.co.nz
Fri Feb 1 13:22:50 PST 2008
At 09:54 2/02/2008, Olivier Lefevre wrote:
>
>Thanks for the advice. I was pilfering the JSON grammar but
>my mental map of ANTLR is still very imperfect. Doesn't
>ANTLR still have a bug, though? It should *never* blow up.
True, but ANTLR's own error handling is a little
flaky at present since it's still using ANTLR 2.7
internally. This should improve in time.
>> I think you want this:
>> list: '[' elements? ']' -> ^(ARRAY elements?);
>
>I am unclear as to what the '?' will do in a tree rewrite rule.
The cardinality of a rewrite rule element must
match the cardinality of the actual element. In
the case above, you're telling the rewrite engine
that "elements" might not have any value, so it
needs to insert a test to check for that before
inserting it into the tree. If you leave the ?
out, it treats it as an assertion that "elements"
always has exactly one value and so won't
include the test (for performance reasons), which
will lead to a runtime exception if it turns out
that no value was actually supplied.
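To make the difference concrete, here is a toy Java simulation of the two
kinds of generated code. The ElementStream class, its methods and the string
"trees" are hypothetical stand-ins invented for this sketch, not ANTLR's real
RewriteRuleElementStream API. The only point is that the ? version checks
hasNext() before pulling a value, while the unsuffixed version calls next()
unconditionally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the stream a rewrite rule reads "elements" from.
// It only mimics the behaviour described above; it is NOT ANTLR's class.
public class RewriteGuardDemo {
    static final class ElementStream {
        private final List<String> elements = new ArrayList<>();
        private int cursor = 0;

        void add(String e) { elements.add(e); }

        boolean hasNext() { return cursor < elements.size(); }

        String next() {
            if (!hasNext()) {
                // Stand-in for ANTLR's RewriteEmptyStreamException.
                throw new IllegalStateException("rule elements");
            }
            return elements.get(cursor++);
        }
    }

    // Models: list: '[' elements? ']' -> ^(ARRAY elements?);
    // The '?' makes the code test the stream before using it.
    static String rewriteOptional(ElementStream s) {
        StringBuilder tree = new StringBuilder("^(ARRAY");
        if (s.hasNext()) {
            tree.append(' ').append(s.next());
        }
        return tree.append(')').toString();
    }

    // Models: list: '[' elements? ']' -> ^(ARRAY elements);
    // No '?': the code assumes exactly one value and skips the test.
    static String rewriteExactlyOne(ElementStream s) {
        return "^(ARRAY " + s.next() + ")";
    }

    public static void main(String[] args) {
        // Input "[]": the elements stream is empty.
        System.out.println(rewriteOptional(new ElementStream()));  // ^(ARRAY)
        try {
            rewriteExactlyOne(new ElementStream());
        } catch (IllegalStateException e) {
            System.out.println("exception: " + e.getMessage());    // exception: rule elements
        }
    }
}
```

With "[]" the guarded version just builds an empty ARRAY node, while the
unguarded one throws, which is the same failure mode as the
RewriteEmptyStreamException in your trace.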
Similarly, when matching something with + or * on
the left, you should use the same in the rewrite
rule. It's the cardinality that matters, though,
not an exact match of the operators; for
example, this rule is correct:
foo: a+=bar SEP a+=bar SEP a+=bar -> ^(BARLIST $a+);
Even though there are no + loop operators on the
left (the += is just list-label assignment), you're
still collecting 1..n values in "a".
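The rule of thumb can be written down as a small check. This is a
hypothetical helper, not something ANTLR provides: minCount and maxCount are
how many values the left-hand side can produce for an element, and suffix is
what you put on that element in the rewrite.

```java
// Hypothetical helper (not part of ANTLR) encoding the rule of thumb:
// the rewrite suffix must cover every count the left-hand side can produce.
// maxCount = Integer.MAX_VALUE stands for "unbounded".
public class RewriteCardinality {
    static boolean rewriteIsSafe(int minCount, int maxCount, String suffix) {
        switch (suffix) {
            case "":  return minCount == 1 && maxCount == 1; // exactly one
            case "?": return maxCount <= 1;                  // zero or one
            case "+": return minCount >= 1;                  // one or more
            case "*": return true;                           // zero or more
            default:  throw new IllegalArgumentException("unknown suffix: " + suffix);
        }
    }

    public static void main(String[] args) {
        // foo: a+=bar SEP a+=bar SEP a+=bar -> ^(BARLIST $a+);
        System.out.println(rewriteIsSafe(3, 3, "+"));  // true
        // list: '[' elements? ']' -> ^(ARRAY elements);
        System.out.println(rewriteIsSafe(0, 1, ""));   // false
        // list: '[' elements? ']' -> ^(ARRAY elements?);
        System.out.println(rewriteIsSafe(0, 1, "?"));  // true
    }
}
```

A * would also be safe for the BARLIST example, since 0..n covers exactly
three values; + just states the intent more precisely.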
>Expr.g in § 3.3 of the book does just that, though. Isn't that
>what the NEWLINE as a stat option is for?
I haven't seen that example, but it's generally a
bad idea to match the same set of characters from
multiple lexer rules, since it causes
ambiguity. In particular, if you have both a
NEWLINE and a WS rule (and assuming the NEWLINE
rule is listed first and only matches one newline
sequence) then NEWLINE will be matched whenever
there is a single newline in the input, but WS
will be matched instead whenever there are
multiple newlines, or newlines followed by other
whitespace (since that's the longest match
possible). In addition, you can't refer to any
hidden (or off-channel) token from the parser
directly. So what Mark was saying is true.
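Here is a toy Java lexer showing the effect described above. It is a
simplified model, not how ANTLR's generated lexers actually work: it tries
both rules at each position, takes the longer match, and breaks ties in
favour of NEWLINE because it is listed first. The CHAR token for everything
else is made up for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy longest-match lexer with the two overlapping rules:
//   NEWLINE : '\r'? '\n' ;
//   WS      : (' '|'\t'|'\n'|'\r')+ ;
public class NewlineAmbiguity {
    // Length of a NEWLINE match at pos, or 0 if none.
    static int matchNewline(String s, int pos) {
        int i = pos;
        if (i < s.length() && s.charAt(i) == '\r') i++;
        if (i < s.length() && s.charAt(i) == '\n') return i + 1 - pos;
        return 0;
    }

    // Length of a WS match at pos, or 0 if none.
    static int matchWs(String s, int pos) {
        int i = pos;
        while (i < s.length() && " \t\n\r".indexOf(s.charAt(i)) >= 0) i++;
        return i - pos;
    }

    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            int nl = matchNewline(s, pos);
            int ws = matchWs(s, pos);
            if (nl == 0 && ws == 0) {
                tokens.add("CHAR");      // anything else: one-character token
                pos++;
            } else if (nl >= ws) {       // tie goes to NEWLINE, listed first
                tokens.add("NEWLINE");
                pos += nl;
            } else {                     // WS matched more characters
                tokens.add("WS");
                pos += ws;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x\ny"));    // [CHAR, NEWLINE, CHAR]
        System.out.println(tokenize("x\n\ny"));  // [CHAR, WS, CHAR]
        System.out.println(tokenize("x\n  y"));  // [CHAR, WS, CHAR]
    }
}
```

A single newline becomes NEWLINE, but a blank line or a newline followed by
indentation silently turns into WS, so the parser never sees the NEWLINE it
was expecting.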
>Exception in thread "main" org.antlr.runtime.tree.RewriteEmptyStreamException: rule elements
> at org.antlr.runtime.tree.RewriteRuleElementStream._next(RewriteRuleElementStream.java:158)
> at org.antlr.runtime.tree.RewriteRuleElementStream.next(RewriteRuleElementStream.java:145)
> at ListExprParser.list(ListExprParser.java:307)
> at ListExprParser.stat(ListExprParser.java:158)
> at ListExprParser.prog(ListExprParser.java:78)
> at Test_ListExpr.main(Test_ListExpr.java:12)
It's not that unhelpful -- it's telling you that
there's a problem with the rewrite rule in the
"list" rule. Sure, it's not a pretty message,
but it gives you all the info you need.
>list : '[' (elements)? ']'
> -> ^(ARRAY elements)
> ;
There's that cardinality mismatch again. This
will throw the exception above given an input of
"[]", since, as I explained above, "elements" may
not have a value, but the rewrite rule asserts that it does.
>NEWLINE:'\r'?'\n' ;
>WS: (' '|'\t'|'\n'|'\r')+ {skip();} ;
And you've still got the newline ambiguity: both NEWLINE and WS can match
'\r' and '\n'. Don't do that.
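One way to remove the overlap, assuming you want newlines as real tokens and
other whitespace skipped, is to keep '\r' and '\n' out of WS entirely. The
rules in the comment are a suggested rewrite, not taken from the book, and
the Java is only a hypothetical toy model of them, not ANTLR output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy lexer modelling two disjoint rules (a suggested fix):
//   NEWLINE : ('\r'? '\n')+ ;
//   WS      : (' '|'\t')+ {skip();} ;
public class DisjointWhitespace {
    // Length of a single '\r'? '\n' sequence at pos, or 0 if none.
    static int newlineAt(String s, int pos) {
        if (pos < s.length() && s.charAt(pos) == '\n') return 1;
        if (pos + 1 < s.length() && s.charAt(pos) == '\r' && s.charAt(pos + 1) == '\n') return 2;
        return 0;
    }

    static boolean isBlank(char c) { return c == ' ' || c == '\t'; }

    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            if (isBlank(s.charAt(pos))) {
                // WS: consume the whole run of blanks and emit nothing (skip()).
                while (pos < s.length() && isBlank(s.charAt(pos))) pos++;
            } else if (newlineAt(s, pos) > 0) {
                // NEWLINE: a run of consecutive newline sequences becomes one token.
                int n;
                while ((n = newlineAt(s, pos)) > 0) pos += n;
                tokens.add("NEWLINE");
            } else {
                // Anything else (including a lone '\r') is a one-character token.
                tokens.add("CHAR");
                pos++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x\ny"));        // [CHAR, NEWLINE, CHAR]
        System.out.println(tokenize("x\n\ny"));      // [CHAR, NEWLINE, CHAR]
        System.out.println(tokenize("x \t\r\n y"));  // [CHAR, NEWLINE, CHAR]
    }
}
```

Since no character can start both rules, every newline reaches the parser as
NEWLINE regardless of what whitespace surrounds it.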