[antlr-interest] v3: How could I construct a parser for an "active" language such as ASP.Net, PHP or (in my case) Active RTF?

Thu Apr 5 00:57:47 PDT 2012

I've taken over a product that produces reports using RTF (Rich Text
Format) with embedded codes where processing is required.  We need to make
some enhancements to it, and I'd like to use Antlr to parse these files -
the enhancements will be a lot easier with a sane parse tree.

My key problem is that parts of the codes might appear in the body of the
RTF text.  To take a terse example that demonstrates the problem:

/bs row/For the row /*rowid:int/: /*num:roundu/ / /*denom:roundu/ =
/*result:roundu//bf/

The "active" parts are /*var:format/, /bs name/ and /bf/, where var, format
and name are placeholders.  Everything else can be arbitrary text or RTF
directives.  However, that "everything else" can include content like /, :,
and text that matches the rules for identifiers.

My ideal would be a lexer rule that produces an "EVERYTHINGELSE" token,
which may contain slashes, colons, identifiers and the like, but that stops
just before it sees /*, /bs, or /bf.  The lexer would then switch to the
more complex tokenising of identifiers and the like inside the active part
of the content.  However, my Google-fu is failing me and, though I'm sure
it's possible, I've not yet managed to find a way of solving this problem
using Antlr.
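
To make the intended behaviour concrete, here is a rough sketch in plain
Python rather than Antlr.  It is only an illustration of the split I am
after, not a proposed solution: the names tokenize, ACTIVE, TEXT, VAR, BS
and BF are made up for this example, and the identifier pattern is an
assumption about what the real codes allow.

```python
import re

# Assumed identifier shape; the real product's rules may differ.
IDENT = r"[A-Za-z_]\w*"

# One alternative per "active" form: /*var:format/, /bs name/ and /bf/.
ACTIVE = re.compile(
    rf"/\*(?P<var>{IDENT}):(?P<format>{IDENT})/"
    rf"|/bs\s+(?P<name>{IDENT})/"
    rf"|(?P<bf>/bf/)"
)


def tokenize(text):
    """Split text into passive TEXT runs and active VAR/BS/BF tokens."""
    tokens = []
    pos = 0
    for m in ACTIVE.finditer(text):
        # Anything between two active parts is passed through untouched,
        # even if it contains slashes, colons or identifier-like words.
        if m.start() > pos:
            tokens.append(("TEXT", text[pos:m.start()]))
        if m.group("var") is not None:
            tokens.append(("VAR", m.group("var"), m.group("format")))
        elif m.group("name") is not None:
            tokens.append(("BS", m.group("name")))
        else:
            tokens.append(("BF",))
        pos = m.end()
    # Trailing passive text after the last active part, if any.
    if pos < len(text):
        tokens.append(("TEXT", text[pos:]))
    return tokens


if __name__ == "__main__":
    example = (
        "/bs row/For the row /*rowid:int/: /*num:roundu/ / "
        "/*denom:roundu/ = /*result:roundu//bf/"
    )
    for token in tokenize(example):
        print(token)
```

Note that the lone " / " between the num and denom fields comes out as
ordinary TEXT, which is exactly the behaviour I would like from an Antlr
lexer.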

Can any of the list members point me towards a possible solution, or an
approach within Antlr that could help?

Thanks in advance,

- Peter
--
Peter Crowther, Director, Melandra Limited