[antlr-interest] Multiple-target parsers, and extending without overriding

Tue Jan 4 12:33:28 PST 2011

Hi-

I'm new to ANTLR, and I'm trying to evaluate its suitability for a project
I'm working on. I'd appreciate help with a few questions:

- What is the status of C++ support? The wiki indicates that C++ support is
coming "later in 2008", but this is obviously out of date.

- One goal of the project is to provide cross-platform parsing and unparsing
support, i.e. to generate parsers and unparsers in multiple target languages
(primarily C++ and Java) from a single representation of the grammar. As far
as I can see, the only way to accomplish this in ANTLR is to provide a
grammar with AST output type which uses only rewrite rules and AST operators
(and, for unparsing, a tree grammar with template output type), with no
target-language code at all. However, I'm not sure this is feasible; many
ANTLR features (e.g. attributes and predicates, custom error handling) and
techniques (e.g. implementing case insensitivity or keywords-as-identifiers)
require use of a specific target language. Is this approach workable? Are
there better options I'm overlooking?

- Another goal of the project is to provide a unified parsing framework for
a family of closely related but distinct languages (specifically, SQL
dialects). We want to be able to express the language grammars in terms of
an inheritance hierarchy, where each language (other than the base) is
specified in terms of its differences from the parent language. This seems
like a natural fit for ANTLR's support for composite grammars, but I see two
drawbacks with that approach.

First, the languages may differ in both lexical structure and syntax. Since
combined grammars cannot inherit from other combined grammars, this seems to
imply that we'd need to maintain separate, parallel hierarchies of lexer and
parser grammars, which are combined only in the leaves. Is there a cleaner
solution?

Second, the composition mechanism doesn't seem to support extending a
grammar with new productions, only overriding existing productions. For
example, if I have:

----
parser grammar Base;

a: b;
----
parser grammar Derived;
import Base;

a: c;
----

then, as I understand it, a -> b is not a production of the derived
language. I can of course explicitly re-add it:

----
parser grammar Derived;
import Base;

a: b | c;
----

but this defeats much of the purpose of the inheritance, by duplicating code
between the base and derived grammars. Is there any way for a derived
grammar to introduce a new production without overriding any existing
productions in the base grammar? Is there a better way to solve this kind of
problem?
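
For reference, the parallel lexer/parser hierarchy I describe under the
first drawback would look roughly like the following. This is an untested
sketch: all grammar and rule names are made up for illustration, and I'm
assuming that a combined grammar may import both a lexer grammar and a
parser grammar.

----
lexer grammar BaseLexer;

// lexical rules shared by every dialect
ID: ('a'..'z' | 'A'..'Z')+;
----
parser grammar BaseParser;

// syntax shared by every dialect
stmt: ID;
----
lexer grammar DialectLexer;
import BaseLexer;

// a token that exists only in this dialect
QUOTED_ID: '"' (~'"')* '"';
----
parser grammar DialectParser;
import BaseParser;

// overrides stmt, so the base alternative has to be restated
stmt: ID | QUOTED_ID;
----
grammar Dialect;
import DialectLexer, DialectParser;

// the two hierarchies meet only here, in the leaf
start: stmt EOF;
----

Every dialect would need its own copy of the last three grammars, which is
the duplication I'd like to avoid.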