[antlr-interest] internal error

Gavin Lambert antlr at mirality.co.nz
Fri Feb 1 13:22:50 PST 2008


At 09:54 2/02/2008, Olivier Lefevre wrote:
 >
 >Thanks for the advice. I was pilfering the JSON grammar but
 >my mental map of ANTLR is still very imperfect. Doesn't
 >ANTLR still have a bug, though? It should *never* blow up.

True, but the ANTLR tool's own error handling is a little
flaky at present, since the tool itself still uses ANTLR 2.7
internally.  This should improve in time.

 >> I think you want this:
 >> list: '[' elements? ']' -> ^(ARRAY elements?);
 >
 >I am unclear as to what the '?' will do in a tree rewrite rule.

The cardinality of a rewrite rule element must 
match the cardinality of the actual element.  In 
the case above, you're telling the rewrite engine 
that "elements" might not have any value, so it 
needs to insert a test to check for that before 
inserting it into the tree.  Had you left the ?
out, it would treat that as an assertion that "elements"
always has exactly one value and so would omit
the test (for performance reasons), which
leads to a runtime exception if it turns out
that no value was actually supplied.
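
As a rough illustration of the paragraph above, here is a toy Python model of the guard. It is only a sketch: RewriteStream, RewriteEmptyStream and rewrite_list are hypothetical stand-ins, not ANTLR's generated code, and the tree is just a tuple.

```python
class RewriteEmptyStream(Exception):
    """Hypothetical stand-in for ANTLR's RewriteEmptyStreamException."""


class RewriteStream:
    """Holds whatever the parser collected for one rule reference."""

    def __init__(self, name, items):
        self.name = name
        self.items = list(items)

    def has_next(self):
        return bool(self.items)

    def next(self):
        if not self.items:
            raise RewriteEmptyStream("rule " + self.name)
        return self.items.pop(0)


def rewrite_list(matched, optional):
    """optional=True models  -> ^(ARRAY elements?)
    optional=False models -> ^(ARRAY elements)"""
    stream = RewriteStream("elements", matched)
    children = []
    if optional:
        if stream.has_next():            # guard emitted because of the '?'
            children.append(stream.next())
    else:
        children.append(stream.next())   # unguarded: raises when nothing matched
    return ("ARRAY",) + tuple(children)
```

With an empty match (the "[]" input), rewrite_list([], optional=True) yields a bare ("ARRAY",) node, while rewrite_list([], optional=False) raises.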

Similarly, when matching something with + or * on
the left you should use the same in the rewrite
rule.  What matters is the cardinality, though,
not an exact textual match.  For example, this
rule is correct:

foo: a+=bar SEP a+=bar SEP a+=bar -> ^(BARLIST $a+);

Even though there are no + operators on the left,
the three a+=bar references still collect 1..n
values in "a" (exactly three here).
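
The same idea in a toy Python model, assuming tokens arrive as (kind, text) pairs. parse_foo is a hypothetical name used only for this sketch and is not anything ANTLR generates.

```python
def parse_foo(tokens):
    """Model of: foo: a+=bar SEP a+=bar SEP a+=bar -> ^(BARLIST $a+);"""
    expected = ["bar", "SEP", "bar", "SEP", "bar"]
    if len(tokens) != len(expected):
        raise ValueError("expected %d tokens, got %d" % (len(expected), len(tokens)))

    a = []                       # the list label "a"
    for (kind, text), want in zip(tokens, expected):
        if kind != want:
            raise ValueError("expected %s, got %s" % (want, kind))
        if kind == "bar":
            a.append(text)       # each a+=bar appends one value

    # "$a+" needs at least one value; three successful matches guarantee it.
    if not a:
        raise ValueError("$a+ used with an empty list")
    return ("BARLIST",) + tuple(a)
```

The left-hand side has no + anywhere, yet the label always holds one or more values by the time the rewrite runs, so $a+ is the matching cardinality.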

 >Expr.g in § 3.3 of the book does just that, though. Isn't that
 >what the NEWLINE as a stat option is for?

I haven't seen that example, but it's generally a 
bad idea to match the same set of characters from 
multiple lexer rules, since it causes 
ambiguity.  In particular, if you have both a 
NEWLINE and a WS rule (and assuming the NEWLINE 
rule is listed first and only matches one newline 
sequence) then NEWLINE will be matched whenever 
there is a single newline in the input, but WS 
will be matched instead whenever there are 
multiple newlines, or newlines followed by other 
whitespace (since that's the longest match 
possible).  In addition, you can't refer to any 
hidden (or off-channel) token from the parser 
directly.  So what Mark was saying is true.
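
The behaviour described above can be sketched with a toy tokenizer in Python. It models only the "longest match wins, first rule wins ties" description given here and is not how ANTLR's lexer is actually implemented. The names AMBIGUOUS, DISJOINT and tokenize are made up for the sketch, and the DISJOINT rule set is just one possible way to remove the overlap.

```python
import re

# NEWLINE and WS can both match newline characters.
AMBIGUOUS = [
    ("NEWLINE", re.compile(r"\r?\n")),
    ("WS", re.compile(r"[ \t\n\r]+")),
]

# Assumed fix: WS no longer matches newline characters at all.
DISJOINT = [
    ("NEWLINE", re.compile(r"\r?\n")),
    ("WS", re.compile(r"[ \t]+")),
]


def tokenize(text, rules):
    """Return the name of the rule that fired for each token."""
    pos, out = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when lengths tie.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            best_name, best_len = "OTHER", 1   # any other character
        out.append(best_name)
        pos += best_len
    return out
```

With AMBIGUOUS, "a\nb" produces a NEWLINE token but "a\n\nb" and "a\n  b" produce WS instead, so a parser waiting for NEWLINE never sees one.  With DISJOINT, every newline becomes a NEWLINE token regardless of what surrounds it.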

 >Exception in thread "main" org.antlr.runtime.tree.RewriteEmptyStreamException: rule elements
 >         at org.antlr.runtime.tree.RewriteRuleElementStream._next(RewriteRuleElementStream.java:158)
 >         at org.antlr.runtime.tree.RewriteRuleElementStream.next(RewriteRuleElementStream.java:145)
 >         at ListExprParser.list(ListExprParser.java:307)
 >         at ListExprParser.stat(ListExprParser.java:158)
 >         at ListExprParser.prog(ListExprParser.java:78)
 >         at Test_ListExpr.main(Test_ListExpr.java:12)

It's not that unhelpful -- it's telling you that 
there's a problem with the rewrite rule in the 
"list" rule.  Sure, it's not a pretty message, 
but it gives you all the info you need.

 >list : '[' (elements)? ']'
 >        -> ^(ARRAY elements)
 >     ;

There's that cardinality mismatch again.  This 
will throw the exception above given an input of 
"[]", since as I explained above "elements" may 
not have a value but the rewrite rule is asserting that it does.

 >NEWLINE:'\r'?'\n' ;
 >WS: (' '|'\t'|'\n'|'\r')+ {skip();} ;

And again you've still got the newline ambiguity.  Don't do that.


