[antlr-interest] Embedding one language within another
Wincent Colaiuta
win at wincent.com
Mon Apr 16 11:20:50 PDT 2007
Hi all,
I'm trying to write a recognizer for a Cheetah-like[1] templating
language which effectively allows one language to be "embedded"
within another. Templates are mostly plain text and only a few tokens
have special meaning (directives starting with "#", placeholders
starting with "$", and escape sequences starting with "\"). So that
much is easy to lex/parse. The tricky bit is that many directives
take Ruby expressions as parameters, and that means I have to parse
at least a subset of Ruby as well.
I have a working prototype which is itself written in Ruby[2] but it
is both slow and memory hungry (due to memoization) so I am now
looking to re-implement the parser in a compiled language, specifically
using ANTLR targeting C so that I can incorporate the generated
parser into a Ruby extension.
I'm new to ANTLR and have only been working on this for the last 24
hours; I've read as much of the new ANTLR book as I can but I'm not
really sure what the best approach is... My original pre-ANTLR
implementation uses an integrated lexer/parser (not separate phases)
and so can easily switch between Ruby and not-Ruby modes. But given
that ANTLR uses two separate phases I am not sure how to proceed:
what constitutes a token depends on the context set by the
preceding tokens; for example in the main body of the template
"foo.bar" has no special meaning at all, but inside a Ruby section it
is a message send (message "bar" sent to object "foo").
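
To make the context dependence concrete, here is a minimal sketch in
Python (not ANTLR, and not the Walrus implementation) of a scanner that
switches modes. Everything in it is an assumption for illustration: the
token names, the tokenize() helper, the tiny "Ruby" subset (identifiers
and dots only), and the idea that a directive's expression runs to the
end of the line.

```python
import re

# Hypothetical patterns for illustration only; these names do not come
# from ANTLR or from Walrus.
DIRECTIVE = re.compile(r"#(?P<name>[a-z]+)[ \t]+")
RUBY_TOKEN = re.compile(
    r"(?P<WS>[ \t]+)|(?P<IDENT>[A-Za-z_]\w*)|(?P<DOT>\.)"
)


def tokenize(template):
    """Return (type, text) pairs, switching modes when a directive starts."""
    tokens = []
    pos = 0
    while pos < len(template):
        m = DIRECTIVE.match(template, pos)
        if m is None:
            # Text mode: everything up to the next '#' is plain text.
            end = template.find("#", pos + 1)
            if end == -1:
                end = len(template)
            tokens.append(("TEXT", template[pos:end]))
            pos = end
            continue

        tokens.append(("DIRECTIVE", m.group("name")))
        pos = m.end()

        # "Ruby" mode: assumed to last until the end of the line.
        eol = template.find("\n", pos)
        if eol == -1:
            eol = len(template)
        while pos < eol:
            r = RUBY_TOKEN.match(template, pos, eol)
            if r is None:
                raise ValueError(f"unexpected character at offset {pos}")
            if r.lastgroup != "WS":  # whitespace is skipped, not emitted
                tokens.append((r.lastgroup, r.group()))
            pos = r.end()
    return tokens


if __name__ == "__main__":
    for token in tokenize("foo.bar\n#echo foo.bar\n"):
        print(token)
```

In this sketch the first "foo.bar" comes out as a single TEXT token,
while the second, following the directive, is split into IDENT, DOT,
IDENT. The scanner itself has to track which mode it is in, which is
the part that does not map obviously onto a separate lexer phase.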
My lexer rules are starting to look nastily complicated and parser-
like; in the end there'll be nothing left in the parser! Can I write
two lexers and switch to the right one depending on what tokens
arrive on the input stream? Or could I do this with a single lexer
if I very carefully prioritize my rules? (Rule precedence is
determined by order of appearance in the grammar file, right?)
Is there some other way around this issue that I haven't
thought of yet? I've seen some posts in the archives about parsing
"here documents", which is a similar issue, but the posts in the
archives are very old and I'm not sure how things stand in ANTLR v3.
Cheers,
Wincent
[1] http://cheetahtemplate.org/
[2] http://walrus.wincent.com/