[antlr-interest] Error reporting with ANTLR tree grammar
Arthur Goldberg
goldberg at cbio.mskcc.org
Wed Nov 24 15:26:41 PST 2010
Hello All
I'm writing a parser for a fairly simple language (14 rules & 10 tokens)
that reads a description of a graph -- like this OncoPrint
<http://cbio.mskcc.org/cancergenomics-dataportal/index.do?case_set_id=gbm_3way_complete&tab_index=tab_visualize&action=Submit&genetic_profile_ids=gbm_mutations&genetic_profile_ids=gbm_cna_rae&genetic_profile_ids=gbm_mrna_zscores&case_ids=&Z_SCORE_THRESHOLD=1.0&cancer_type_id=gbm&gene_list=EGFR+ERBB2+PDGFRA+MET+KRAS+NRAS+HRAS+NF1+SPRY2+FOXO1+FOXO3+AKT1+AKT2+AKT3+PIK3R1+PIK3CA+PTEN&gene_set_choice=glioblastoma:_rtk/ras/pi3k/akt_signaling_%2817_genes%29&>
-- of cancer data and produces a data structure that will be used to
select, organize and filter the data to be shown in the graph. Users
will enter the language on our web site.
I have a working one-pass grammar, but after building it I found that
it is very difficult to produce good error messages in one pass. For
example, one might think that a failed semantic predicate would be a
good place to report an error, but that doesn't work: no exception is
thrown when a predicate is hoisted, and predicates are evaluated
multiple times as the parser backtracks to find a parse. (See my
previous message on use
of semantic predicates and hoisting
<http://www.antlr.org/pipermail/antlr-interest/2010-November/040091.html>.)
I simply want to say things like
"Syntax error at 'xyz' at char <c> on line <l>" // when the input
syntax is wrong (showing users "line 1:0 no viable alternative at
input 'xyz'" is not acceptable), and
"<input> is not a valid <type> at char <c> on line <l>" // when the
input semantics are wrong, for example when <input> should be a word
that fits a pattern that describes a genetic data type.
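To make the two target formats concrete, here is a minimal Java sketch of
a message formatter. The class and method names (ErrorMessages,
syntaxError, semanticError) are hypothetical and not part of ANTLR. I am
assuming the offending text, line and character position would be taken
from the RecognitionException in the first case and from an AST node in
the second.

```java
/** Hypothetical helper for building user-facing messages; not part of ANTLR. */
public final class ErrorMessages {
    private ErrorMessages() {}

    /** Message for input that cannot be parsed at all. */
    public static String syntaxError(String text, int line, int charPos) {
        return String.format("Syntax error at '%s' at char %d on line %d",
                             text, charPos, line);
    }

    /** Message for input that parses but is not a valid value of the expected kind. */
    public static String semanticError(String text, String type, int line, int charPos) {
        return String.format("%s is not a valid %s at char %d on line %d",
                             text, type, charPos, line);
    }
}
```

Keeping the formatting in one place means both passes would produce
messages in the same style.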
I'm told that one should therefore postpone error reporting until
later, and that I need a two-pass grammar -- 1) build an AST, 2) walk
the tree -- to report errors easily and accurately. I've started down
that path, and have a few productions in each grammar and a driver
program that connects them and handles bits of input.
I think that I can report the syntax errors in Phase 1 by overriding
    public void displayRecognitionError(String[] tokenNames,
                                        RecognitionException e)
and
    public String getErrorMessage(RecognitionException e,
                                  String[] tokenNames)
But it isn't clear how one accesses data in the AST from the tree
grammar. That is, inside the tree grammar, how do I get the data I need
to produce the semantic error message above?
Is that documented? I don't see it in The Definitive ANTLR Reference,
Chapter 8 or 10.
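To make the question concrete, here is a plain-Java sketch of the kind of
check I want the second pass to perform. Everything in it is a stand-in:
Node is a hypothetical class, not ANTLR's tree type, and the "DATA_TYPE"
node name and the [A-Z]+ pattern are placeholders for the real grammar
and the real genetic data type pattern. As far as I can tell, ANTLR 3
tree nodes (CommonTree) expose getText(), getLine() and
getCharPositionInLine(); what I can't find is how to reach that data from
actions inside the tree grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public final class SemanticCheckSketch {
    /** Hypothetical stand-in for an AST node carrying its token's text and position. */
    public static final class Node {
        final String type;
        final String text;
        final int line;
        final int charPos;
        final List<Node> children = new ArrayList<>();

        public Node(String type, String text, int line, int charPos) {
            this.type = type;
            this.text = text;
            this.line = line;
            this.charPos = charPos;
        }

        /** Appends a child and returns this node so calls can be chained. */
        public Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Placeholder pattern; the real pattern for a genetic data type is an assumption.
    private static final Pattern DATA_TYPE = Pattern.compile("[A-Z]+");

    /** Walks the whole tree and returns one message per invalid DATA_TYPE node. */
    public static List<String> check(Node root) {
        List<String> errors = new ArrayList<>();
        walk(root, errors);
        return errors;
    }

    // Depth-first walk: check the current node, then recurse into its children.
    private static void walk(Node n, List<String> errors) {
        if (n.type.equals("DATA_TYPE") && !DATA_TYPE.matcher(n.text).matches()) {
            errors.add(String.format("%s is not a valid %s at char %d on line %d",
                                     n.text, "data type", n.charPos, n.line));
        }
        for (Node child : n.children) {
            walk(child, errors);
        }
    }
}
```

Errors are collected in a list rather than thrown, so that one walk of
the tree can report every problem in the user's input.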
Thanks & Thanksgiving
Arthur
--
Senior Research Scientist
Computational Biology
Memorial Sloan-Kettering Cancer Center