[antlr-interest] Error reporting with ANTLR tree grammar

Wed Nov 24 15:26:41 PST 2010

Hello All

I'm writing a parser for a fairly simple language (14 rules & 10 tokens) 
that reads a description of a graph -- like this OncoPrint 
<http://cbio.mskcc.org/cancergenomics-dataportal/index.do?case_set_id=gbm_3way_complete&tab_index=tab_visualize&action=Submit&genetic_profile_ids=gbm_mutations&genetic_profile_ids=gbm_cna_rae&genetic_profile_ids=gbm_mrna_zscores&case_ids=&Z_SCORE_THRESHOLD=1.0&cancer_type_id=gbm&gene_list=EGFR+ERBB2+PDGFRA+MET+KRAS+NRAS+HRAS+NF1+SPRY2+FOXO1+FOXO3+AKT1+AKT2+AKT3+PIK3R1+PIK3CA+PTEN&gene_set_choice=glioblastoma:_rtk/ras/pi3k/akt_signaling_%2817_genes%29&> 
-- of cancer data and produces a data structure that will be used to 
select, organize and filter the data to be shown in the graph. Users 
will enter the language on our web site.

I have a working one-pass grammar, but after building it I found that 
it's very difficult to produce good error messages in one pass. For 
example, one might think that a failed semantic predicate would be a 
good place to report an error, but that doesn't work: exceptions are not 
thrown when predicates are hoisted, and predicates are called multiple 
times as the parser backtracks to find a parse. (See my previous message 
on use of semantic predicates and hoisting 
<http://www.antlr.org/pipermail/antlr-interest/2010-November/040091.html>.)
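
The backtracking problem can be illustrated with a toy model. The sketch 
below is not ANTLR code: BacktrackingDemo, reporting() and parse() are 
hypothetical names, and the "parser" simply tries guarded alternatives in 
order. It only shows why a report issued from inside a predicate can fire 
even though the input is later accepted by another alternative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy stand-in for a backtracking parser; none of this is ANTLR API.
final class BacktrackingDemo {
    /** Errors "reported" as a side effect of failed predicates. */
    final List<String> reported = new ArrayList<>();

    /** Wraps a check so that a failure records an error message. */
    Predicate<String> reporting(String type, Predicate<String> check) {
        return input -> {
            boolean ok = check.test(input);
            if (!ok) {
                reported.add(input + " is not a valid " + type);
            }
            return ok;
        };
    }

    /** Tries each guarded alternative in order; returns the index of the first match, or -1. */
    int parse(String input, List<Predicate<String>> alternatives) {
        for (int i = 0; i < alternatives.size(); i++) {
            if (alternatives.get(i).test(input)) {
                return i;
            }
        }
        return -1;
    }
}
```

With one alternative guarded by a "number" check and a second guarded by 
a "gene name" check, parsing EGFR succeeds on the second alternative, yet 
the first predicate has already recorded "EGFR is not a valid number".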

I simply want to say things like
    "Syntax error at 'xyz' at char <c> on line <l>"
when the input syntax is wrong (ANTLR's default "line 1:0 no viable 
alternative at input 'xyz'" won't do for our users), and
    "<input> is not a valid <type> at char <c> on line <l>"
when the input semantics are wrong, for example when <input> should be a 
word that fits a pattern that describes a genetic data type.
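
To make the two target formats concrete, here is a minimal sketch of a 
formatting helper. The class ErrorMessages and its method names are 
hypothetical (they are not part of ANTLR); the helper only assumes that 
the offending text, character position and line number are already known.

```java
// Hypothetical helper; class and method names are illustrative, not ANTLR API.
final class ErrorMessages {
    private ErrorMessages() {}

    /** Message for input that does not parse. */
    static String syntaxError(String text, int charPos, int line) {
        return String.format("Syntax error at '%s' at char %d on line %d",
                text, charPos, line);
    }

    /** Message for input that parses but is not a valid value of the expected type. */
    static String semanticError(String input, String type, int charPos, int line) {
        return String.format("%s is not a valid %s at char %d on line %d",
                input, type, charPos, line);
    }
}
```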

I'm told that one should therefore postpone error reporting until 
later, and that I need a two-pass grammar -- 1) build the AST, 2) walk 
the tree -- to report errors easily and accurately. I've started down 
that path, and have a few productions in each grammar and a driver 
program that connects them and handles bits of input.
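
The general shape of such a second pass can be sketched without ANTLR. 
In the sketch below, Node and SemanticChecker are hypothetical stand-ins, 
and the pattern for a valid data type is an assumption made up for 
illustration. The point is only that each tree node keeps the text, line 
and character position of the token it came from, so the walk can report 
errors with positions after parsing is finished. (In ANTLR 3, CommonTree 
exposes the analogous getText(), getLine() and getCharPositionInLine().)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for an AST node that remembers its source position.
final class Node {
    final String text;
    final int line;
    final int charPos;
    final List<Node> children = new ArrayList<>();

    Node(String text, int line, int charPos) {
        this.text = text;
        this.line = line;
        this.charPos = charPos;
    }

    /** Appends a child and returns this node so calls can be chained. */
    Node add(Node child) {
        children.add(child);
        return this;
    }
}

// Pass 2: walk the finished tree once and collect semantic errors.
final class SemanticChecker {
    // Assumed pattern for a data type word; the real rule is not given here.
    private static final Pattern DATA_TYPE = Pattern.compile("(MUT|CNA|EXP)");

    final List<String> errors = new ArrayList<>();

    /** Checks every leaf under root against the assumed pattern. */
    void check(Node root) {
        if (root.children.isEmpty()) {
            if (!DATA_TYPE.matcher(root.text).matches()) {
                errors.add(String.format("%s is not a valid %s at char %d on line %d",
                        root.text, "data type", root.charPos, root.line));
            }
            return;
        }
        for (Node child : root.children) {
            check(child);
        }
    }
}
```

Because the walk visits each node exactly once, each problem is reported 
once, which is what the one-pass predicate approach could not guarantee.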

I think that I can report the syntax errors by overriding
    public void displayRecognitionError(String[] tokenNames, RecognitionException e)
and
    public String getErrorMessage(RecognitionException e, String[] tokenNames)
in Phase 1.

But it isn't clear how one accesses data in the AST with the tree 
grammar. That is, inside the tree grammar how do I get the data I need 
to produce the semantic error message above?

Is that documented? I don't see it in The Definitive ANTLR Reference, 
Chapter 8 or 10.

Thanks & Thanksgiving
Arthur

-- 
Senior Research Scientist
Computational Biology
Memorial Sloan-Kettering Cancer Center