[antlr-interest] Multiple-target parsers, and extending without overriding

Tue Jan 4 13:58:45 PST 2011

On Tue, Jan 4, 2011 at 12:59 PM, Jim Idle <jimi at temporal-wave.com> wrote:

> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> > bounces at antlr.org] On Behalf Of Geoffrey Romer
> > Sent: Tuesday, January 04, 2011 12:33 PM
> > To: antlr-interest
> > Cc: theov at google.com
> > Subject: [antlr-interest] Multiple-target parsers, and extending
> > without overriding
> >
> > Hi-
> >
> > I'm new to ANTLR, and I'm trying to evaluate its suitability for a
> > project I'm working on. I'd appreciate help with a few questions:
> >
> > - What is the status of C++ support? The wiki indicates that C++
> > support is coming "later in 2008", but this is obviously out of date.
>
>
> Compile the C output as C++, keep custom code and actions entirely out of
> the parser and produce AST outputs.
>

Are there any plans for native C++ target support (classes, RAII, all that
good stuff)?

>
> >
> > - One goal of the project is to provide cross-platform parsing and
> > unparsing support, i.e. to generate parsers and unparsers in multiple
> > target languages (primarily C++ and Java) from a single representation
> > of the grammar. As far as I can see, the only way to accomplish this in
> > ANTLR is to provide a grammar with AST output type which uses only
> > rewrite rules and AST operators (and, for unparsing, a tree grammar
> > with template output type), with no target-language code at all.
>
> Actually I use source code control for this. Start with a base definition
> of all your grammars without any actions, then branch to specific targets
> and add any target specific code.
>

But once you branch it, you no longer have a single representation of the
grammar, you have two representations which happen to have a common ancestor
somewhere in their revision history.

>
> > However, I'm not sure this is feasible; many ANTLR features (e.g.
> > attributes and predicates, custom error handling) and techniques (e.g.
> > implementing case insensitivity or keywords-as-identifiers) require use
> > of a specific target language. Is this approach workable? Are there
> > better options I'm overlooking?
>
> You keep all such code outside the grammar and within your application
> code. There are few differences if you do that. Also use:
>
> id: ID | KEYWORD1 | KEYWORD2 ... etc;
>
> And not comparison code on ID to work out keywords.
>
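
A minimal sketch of what keeping case insensitivity out of the grammar
might look like on the Java side. With the ANTLR 3 Java runtime this is
usually done by subclassing ANTLRStringStream and overriding LA(int);
SimpleCharStream and NoCaseCharStream below are hypothetical stand-ins,
not ANTLR classes, so the example runs without the ANTLR jar. It also
assumes the lexer rules are written in upper case.

```java
// Hypothetical stand-in for a lexer input stream with lookahead.
class SimpleCharStream {
    static final int EOF = -1;
    protected final char[] data;
    protected int p = 0; // index of the next character to consume

    SimpleCharStream(String input) {
        data = input.toCharArray();
    }

    // Look ahead i characters (1 = next character) without consuming.
    int LA(int i) {
        int idx = p + i - 1;
        if (i <= 0 || idx >= data.length) {
            return EOF;
        }
        return data[idx];
    }

    void consume() {
        if (p < data.length) {
            p++;
        }
    }

    // Original text between two offsets (start inclusive, stop exclusive).
    String substring(int start, int stop) {
        return new String(data, start, stop - start);
    }
}

// Lookahead is upper-cased so lexer rules written in upper case match any
// casing; the stored text is untouched, so token text keeps the user's case.
class NoCaseCharStream extends SimpleCharStream {
    NoCaseCharStream(String input) {
        super(input);
    }

    @Override
    int LA(int i) {
        int c = super.LA(i);
        return c == EOF ? EOF : Character.toUpperCase(c);
    }
}
```

The grammar itself stays free of target code under this approach; each
target would need its own equivalent of the stream wrapper.
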
> >
> > - Another goal of the project is to provide a unified parsing framework
> > for a family of closely related but distinct languages (specifically,
> > SQL dialects).
>
> You and everyone remotely interested in SQL ;)
>
> > We want to be able to express the language grammars in
> > terms of an inheritance hierarchy, where each language (other than the
> > base) is specified in terms of its differences from the parent
> > language. This seems like a natural fit for ANTLR's support for
> > composite grammars, but I see two drawbacks with that approach: first,
>
> SQL dialects are not really compatible enough to do that.
>

I haven't told you what dialects I'm interested in ;-)

>
> > the languages may differ in both lexical structure and syntax; since
> > combined grammars cannot inherit from other combined grammars, this
> > seems to imply that we'd need to maintain separate, parallel
> > hierarchies of lexer and parser grammars, which are combined only in
> > the leaves. Is there a cleaner solution? Second, the composition
> > mechanism doesn't seem to support extending a grammar with new
> > productions; only overriding existing productions.
>
>
> Yes. Source code control is a better option here. Especially if you use
> one that is good at branches, such as Perforce.
>
> Jim
>