[antlr-interest] merry xmas and more
Terence Parr
parrt at cs.usfca.edu
Sat Dec 25 13:43:47 PST 2004
Howdy and happy holidays to y'all. :)
Just a note to say I have finished the basic automatic error recovery
stuff for v3.0. Tosses in stuff like I'm doing now for 2.7.5:
catch (RecognitionException re) {
reportError(re);
recover(FOLLOW_atom);
}
and then with support sets (with better names, as you can see) at the
bottom of the file:
public static final BitSet FOLLOW_stat = new BitSet(new
long[]{41346L});
Note that the rule invocation stack is available to you if you want it
for error messages or context-sensitive error recovery (thanks to Java
1.4.2's stack trace access). For example, here is the crappy but
info-laden error reporting:
[stat, stat, expr, mexpr, atom]: line 5 atom : ( INT | ID | '(' expr
')' | "maxint" ); state 0 (decision=5) no viable alt; token=[;/<5>,4:8]
It prints stack trace, the grammar location, state/decision information
and current token. I have isolated the error reporting mechanism from
the parser so that strings are not computed and passed around. Now,
the exception objects just track info and reportError can compute error
messages in whatever language. [ANTLR itself will automatically be
generating localized error messages if an error template file exists
for your language].
Next step is to make the tool a little more robust. Then DFA
optimization. Then tree construction. Then tree parsers. :)
Hooray! I'll be damned if this suckers isn't starting to look like a
parser generator. :)
Ter
PS runtime package has only about 1000 lines of java code :)
PPS for those interested, i include the comment / class def for
RecognitionException here:
/** The root of the ANTLR exception hierarchy.
*
* To avoid English-only error messages and to generally make things
* as flexible as possible, these exceptions are not created with
strings,
* but rather the information necessary to generate an error. Then
* the various reporting methods in Parser and Lexer can be overridden
* to generate a localized error message. For example, MismatchedToken
* exceptions are built with the expected token types.
* So, don't expect getMessage() to return anything.
*
* Note that as of Java 1.4, you can access the stack trace, which
means
* that you can compute the complete trace of rules from the start
symbol.
* This gives you considerable context information with which to
generate
* useful error messages.
*
* ANTLR generates code that throws exceptions upon recognition error
and
* also generates code to catch these exceptions in each rule. If you
* want to quit upon first error, you can turn off the automatic error
* handling mechanism. If you want an ANTLR-generated recognizer to
bail
* out after another kind of error, then just throw an exception not
* under this hierarchy.
*
* In general, the recognition exceptions track where in a grammar an
* problem occurred and/or what was the expected input. The parser
* knows its state (such as current input symbol and line info) so
* this information is left out of the exception and left to the
reporting
* method(s) to fill in. You might want to have an error report an
* entire line of input not just a single token, for example. Better
to
* just say the recognizer had a problem and then let the parser report
* where the heck it was.
*/
public class RecognitionException extends Exception {
}
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
Cofounder, http://www.knowspam.net enjoy email again!
More information about the antlr-interest
mailing list