[antlr-interest] Composite Grammars

Benjamin S Wolf jokeserver at gmail.com
Fri Dec 23 21:49:13 PST 2011

I've run into some very strange errors while trying to build a
composite grammar, and I think I've figured out the cause, or at least
a way around it. I'm posting this because the error messages were not
very helpful on their own, and I had to experiment with a minimal test
case for a while before I found a way past them.

I have a composite grammar G, which imports two disjoint lexer
grammars A and B, and a parser grammar C (which only requires the
tokens from A). Using antlr3.4 on G with varying subsequent changes
gives one of the following sets of errors, regardless of output option
or language.

1. G has no rules.

2. parser rule ... not allowed in lexer, lexer rule ... not allowed in
parser, etc.

3. java.lang.ClassCastException: org.antlr.runtime.tree.CommonTree
cannot be cast to org.antlr.tool.GrammarAST.
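
For concreteness, the starting layout described above might look like
the following sketch. Each grammar lives in its own file. The rule
names and bodies are hypothetical placeholders, since the actual
grammars are not reproduced in this message; only the grammar names
and the import structure come from the description.

```antlr
// A.g: lexer grammar (the rule is a hypothetical placeholder)
lexer grammar A;
ID : ('a'..'z')+ ;

// B.g: second lexer grammar, disjoint from A (hypothetical rule)
lexer grammar B;
INT : ('0'..'9')+ ;

// C.g: parser grammar that uses only A's tokens (hypothetical rule)
parser grammar C;
name : ID ;

// G.g: composite grammar importing both lexer grammars directly
grammar G;
import A, B, C;
// no rules of its own at this point
```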

The short answer (before I go into details below) is that a) G needs a
parser rule, not just lexer rules, and b) G should only import one
lexer grammar, and the others should be imported by that one.
Strangely, b) does not apply to parser grammars, as I added a second
parser grammar D (dependent on both A and B) to test, and G is fine*
either way.

The long story: When I encountered (1), I added a dummy lexer rule
"COMMA : ',' ;". This cured G's lack of rules but now antlr3.4 was
giving me (2), where it seemed that antlr3 thought I was putting all
of A's lexer rules in C and all of C's parser rules in A (and B,
etc.). Since I had no rules dependent on B, I removed it from being
imported. With G importing only A and C, I was now getting (3). I
added the rule "comma : COMMA ;" to G and now antlr3 completed
successfully (and still did when I folded these two rules together
into "comma : ',' ;"). So I added B back to the import list from G,
and it gave me (2) again. But removing B from G's import list and
making A import it made it work fine.

So antlr3 successfully produces a recognizer for G when G imports A,
C, and D, where A imports B, or when G imports B, C, and D, and B
imports A**.
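
A minimal sketch of the first working arrangement, assuming B, C, and
D are left unchanged in their own files. The ID rule is a hypothetical
placeholder; the import lines and the "comma" rule follow the
description above.

```antlr
// A.g: the only lexer grammar G imports directly; A pulls in B itself
lexer grammar A;
import B;
ID : ('a'..'z')+ ;   // hypothetical placeholder rule

// G.g: imports one lexer grammar plus the parser grammars
grammar G;
import A, C, D;
comma : ',' ;        // G has at least one parser rule of its own
```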

I am not sure why the top level of a composite grammar cannot import
two lexer grammars (whether it is a design decision or a bug, e.g.),
as none of the documentation I could find on composite grammars
indicates either that this is the case or that it should be otherwise.
I would have liked a better error message in place of (2), at least
for the case where G had a lexer rule but not a parser rule, because
it would have saved a little bit of stumbling around.

*By "fine" I mean antlr3 finishes successfully. But if G doesn't
import B, then the generated lexer can't produce tokens defined in B
and so the rules in D can't be reached.

**Unless you're like me, and have an unfortunately large lexer grammar
B, which causes antlr3 to run out of stack space if G imports A and A
imports B, but not if G imports B and B imports A.
