[antlr-interest] language independant grammars

Kurt Harriger kurtharriger at comcast.net
Thu Jul 26 15:55:23 PDT 2007


I've always disliked embedding code into non-code files, in this case
grammars.   IDE's don't usually know how to interpret these source files so
you lose out on all the productivity features such as syntax validation,
intellisense, refactoring and  testing tools, and it complicates the build
process.  For starters, I have the actions delegate as much as they can to a
helper class that I added as a property on the parser to  minimize the
amount of code stored in the grammars and make testing easier.

 

One of the other things I found a bit difficult to deal with is testing
grammars for C# using AntlrWorks.  Before adding actions to the grammar file
I just needed to change the language to Java for testing and then back to
CSharp before building.  This bit me a few times when I forgot to change it
back to CSharp after testing and would take awhile before I realized that
the resulting binary was using a old version of the grammar.  I updated my
build scripts to delete the old files before running antlr to ensure build
would fail if I forgot to change the language back, but once I started
adding code to the grammar I could no longer use AntlrWorks directly.  So I
kept a version without the action code and then began copying and pasting
changes, but this wouldn't really work if you needed to implement a
predicate in code.  I saw something about using the remote debugger but
haven't really tried it yet.  

 

One thought that occurred to me was using StringTemplate to generate
different grammars for each language, which I suppose would work but ideally
I think it would be great if there was a language independent way of
invoking actions and predicates.  Rather than actually implementing the
action in the grammar you would implement a class it in your target language
and simply tell antlr the method name to invoke, stuffing the local vars
into a map/hashtable or something, additionally the parser wouldn't need to
worry about the action changing internal variables, just don't pass them
along to be changed.  By delegating to an object provided during creation
one could perhaps even use the same grammar to perform completely different
tasks using the same grammar.

 

I was also wondering how difficult it might be to collapse the switch and
if/else logic into an object and the rule would just ask it to predict the
alternative and then invoke the action for that alternative if any.  I'm
guessing some object representation probably must already exist in the tool
to generate the code in the first place.  If all the code generation could
be reduced to simply creating object instances rather than generating
language specific code it may become possible to actually create a
lexer/parser at runtime rather than compile time.  Regular expressions, for
example, are widely used for simple lexers and input validations and I think
a big part of the reason is that there isn't any need to complicate the
build process with a regular expression parser generator and you can use
them in sandboxed environments such as an internet browser.  What might one
use a runtime parser for anyway?  AntlrWorks perhaps for starters!  But also
for batch file processing  and documentation generators; there are so many
different file formats for the same basic bits of data that businesses often
write numerous programs that simply translate one file format into another,
and documentation generators should be more concerned with generating good
quality output rather than trying to create parsers for every potential
language they want to support.  These tools might allow the users to
redefine the definition of certain grammar elements (since token names would
obviously be important to the consumer it might not allow you to edit the
entire grammar, but they  could perhaps allow changes to the rule
definitions and perhaps the addition of new tokens and/or fragments) .  For
more complex customization where predicates and actions might need to be
implemented the user might create a jar or assembly file the program can
load at runtime.

 

Thoughts?

 

- Kurt

 

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20070726/3612826d/attachment.html 


More information about the antlr-interest mailing list