[antlr-interest] Multiple-target parsers, and extending without overriding

Jim Idle jimi at temporal-wave.com
Tue Jan 4 12:59:06 PST 2011


> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Geoffrey Romer
> Sent: Tuesday, January 04, 2011 12:33 PM
> To: antlr-interest
> Cc: theov at google.com
> Subject: [antlr-interest] Multiple-target parsers, and extending
> without overriding
>
> Hi-
>
> I'm new to ANTLR, and I'm trying to evaluate its suitability for a
> project I'm working on. I'd appreciate help with a few questions:
>
> - What is the status of C++ support? The wiki indicates that C++
> support is coming "later in 2008", but this is obviously out of date.


Compile the C target's output as C++, keep custom code and actions
entirely out of the parser, and produce AST output.

>
> - One goal of the project is to provide cross-platform parsing and
> unparsing support, i.e. to generate parsers and unparsers in multiple
> target languages (primarily C++ and Java) from a single representation
> of the grammar. As far as I can see, the only way to accomplish this in
> ANTLR is to provide a grammar with AST output type which uses only
> rewrite rules and AST operators (and, for unparsing, a tree grammar
> with template output type), with no target-language code at all.

Actually, I use source code control for this. Start with a base definition
of all your grammars without any actions, then branch to specific targets
and add any target-specific code.
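
As a rough sketch of what such an action-free base grammar might look like
in ANTLR 3 syntax (the grammar name, the rule names, and the ASSIGN node are
made up for illustration and are not taken from any real project):

```antlr
grammar Base;            // hypothetical name

options {
    output = AST;        // build a tree; the application walks it later
    // A target branch would add its own setting here, e.g. language = C;
}

tokens { ASSIGN; }       // imaginary node type used by the rewrite below

// Only rewrite rules and AST operators, no embedded target code.
stat : ID '=' expr ';' -> ^(ASSIGN ID expr) ;
expr : term ('+'^ term)* ;
term : ID | INT ;

ID   : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')* ;
INT  : ('0'..'9')+ ;
```

Whitespace handling is left out on purpose: in ANTLR 3 it normally needs a
small lexer action, which may or may not be portable across targets, so it
is the kind of thing that would be added on a target branch.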

> However, I'm not sure this is feasible; many ANTLR features (e.g.
> attributes and predicates, custom error handling) and techniques (e.g.
> implementing case insensitivity or keywords-as-identifiers) require use
> of a specific target language. Is this approach workable? Are there
> better options I'm overlooking?

Keep all such code outside the grammar and within your application code.
If you do that, there are few differences between targets. Also use:

id: ID | KEYWORD1 | KEYWORD2 ... etc;

Do not use comparison code on the text of ID to work out keywords.
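
A sketch of that rule together with the lexer rules it relies on, in ANTLR 3
syntax. SELECT, FROM and WHERE are placeholder keywords standing in for
KEYWORD1, KEYWORD2 and so on; they are assumptions for illustration only:

```antlr
// Parser rule: anywhere an identifier is allowed, accept a plain ID
// or one of the keywords that may double as an identifier.
id  : ID
    | SELECT
    | FROM
    | WHERE
    ;

// Lexer rules: keywords are listed before ID so that an exact match
// is tokenized as the keyword rather than as ID.
SELECT : 'select' ;
FROM   : 'from' ;
WHERE  : 'where' ;
ID     : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')* ;
```

Other parser rules then refer to id rather than ID wherever a name is
expected, so the decision is made by the parser from context and no
target-language code is involved. Which keywords can safely appear in id
depends on the rest of the grammar.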

>
> - Another goal of the project is to provide a unified parsing framework
> for a family of closely related but distinct languages (specifically,
> SQL dialects).

You and everyone remotely interested in SQL ;)

> We want to be able to express the language grammars in
> terms of an inheritance hierarchy, where each language (other than the
> base) is specified in terms of its differences from the parent
> language. This seems like a natural fit for ANTLR's support for
> composite grammars, but I see two drawbacks with that approach: first,

SQL dialects are not really compatible enough to do that.

> the languages may differ in both lexical structure and syntax; since
> combined grammars cannot inherit from other combined grammars, this
> seems to imply that we'd need to maintain separate, parallel
> hierarchies of lexer and parser grammars, which are combined only in
> the leaves. Is there a cleaner solution? Second, the composition
> mechanism doesn't seem to support extending a grammar with new
> productions; only overriding existing productions.


Yes. Source code control is a better option here, especially if you use
one that is good at branching, such as Perforce.

Jim

