[antlr-interest] how do you make Antlr work with a recursively changing input stream

Mark Wright markwright at internode.on.net
Thu Jul 1 05:14:45 PDT 2010


On Wed, Jun 30, 2010 at 12:40:44PM -0700, Alex McMains wrote:
> Thanks Mark.  I appreciate the response.  However, I'm not sure if I've made
> myself quite clear in what I'm trying to do.  I read the links you gave me,
> and I think I understand at least the basics of using a symbol table.  I
> implemented a rudimentary C compiler in college using flex/bison, but that was
> 12 years ago.  Plenty long enough for me to forget the details.
>  
> However, unless I can either call Antlr recursively (is this possible?), or
> change the stream that Antlr is parsing as I find substitutions, then I don't
> see how a symbol table will help me.  I get the feeling that I'm missing
> something completely obvious.  I'm sure I'm not the only person who has ever
> tried to do what I'm trying to do.
>  
> My problem isn't forward references in this case.  My problem is that the data
> stream is discovered dynamically by following a table of substitutions.  The
> tables are similar to a relational database.
>  
> I have a hand-coded parser using regular expressions that I've written that
> handles all of this, but it's somewhat hard to maintain (even for me the
> person who wrote it).
>  
> I would like Antlr to both tell me whether or not I need to substitute at any
> given node based on a grammar and then recurse over the substituted data and
> finally parse the leaf nodes using the grammar.
>   
> Here's a slightly more complex example than I gave before:
>  
> xml = Parse(Person); /* I pass in the table to start with - that's it.  The
>                         original stream only contains the Person table representation.  The rest is
>                         determined dynamically. */
> 
> /* Here's how the tables might be layed out. */
> 
> Person (table name)
> _Identifier_       {Name}        {Address}
> Person1          15555            15555       
> Person2          14444            14444
>  
> Name (table name)
> _Identifier_      Salutation    FirstName    LastName
> 15555               Mr             Joe         Blow
> 14444               Mrs            Jane        Doe
>  
> Address (table name)
> _Identifier_     Street      City        State      {Neighborhood}
> 15555            Main      Bozeman        MT           Stirling
> 14444           Second    Los Angeles     CA           Brentwood
>  
> Neighborhood (table name)
> _Identifier_           Name                 YearlyDues
> Stirling            Stirling Meadows          400
> Brentwood           Brentwood Estates         4000
>  
> Here's what the XML would look like - remember I only passed in the Person
> table representation as the original stream, the rest was determined
> dynamically as I look at each node and decide whether or not to perform a
> substitution.  Currently these substitutions are designated with the {} around
> the table's column header.
>  
> <Person1>
>   <Name>
>     <Salutation>Mr</Salutation>
>     <FirstName>Joe</FirstName>
>     <LastName>Blow</LastName>
>   </Name>
>   <Address>
>     <Street>Main</Street>
>     <City>Bozeman</City>
>     <State>MT</State>
>     <Neighborhood>
>       <Name>Stirling Meadows</Name>
>       <YearlyDues>400</YearlyDues>
>     </Neighborhood>
>   </Address>
> </Person1>
> <Person2>
>   <Name>
>     <Salutation>Mrs</Salutation>
>     <FirstName>Jane</FirstName>
>     <LastName>Doe</LastName>
>   </Name>
>   <Address>
>     <Street>Second</Street>
>     <City>Los Angeles</City>
>     <State>CA</State>
>     <Neighborhood>
>       <Name>Brentwood Estates</Name>
>       <YearlyDues>4000</YearlyDues>
>     </Neighborhood>
>   </Address>
> </Person2>
>  
> I'm sorry if this is too long.  Maybe there is something obvious I
> am missing here.  If you still think a symbol table is warranted,
> could you please write me some psuedocode that shows how I can get
> Antlr to parse the dynamic data recursively.
>  
> Thanks!
>  
>          -- alex

Hi Alex,

Some approaches to writing compilers/DSLs:

Syntax Directed Translation
===========================

On the cover of old dragon books, Principles of Compiler Design, and
Compilers Princples, Techniques and Tools, by Aho et al, it has
a picture of a knight with a shield, encsribed on the shield it says
"Syntax Directed Translation".  This simple idea is implemented in
ANTLR by writing a compiler or DSL (Domain Specific Language) in
two passes:

pass 1
------

Parses the grammar, outputing an abstract syntax tree.

pass 2
------

It just walks the abstract sytax tree, which is the syntax directed translation.
The abstract syntax tree could be walked with:

- a tree grammar
- visitor pattern

Then presumably it outputs something.  A neat option that ANTLR supports is
to output some string templates.

Which all seems fine if: 

(i)  if it is actually convenient to generate the output by walking over
     the abstract syntax tree that is based on the syntax of the input language.

(ii) you happen to have all the information at hand to generate the output
     when visiting the tree nodes in the tree grammar.

This is a very common approach, so of course there are ways to deal with gathering
missing information:

- add another pass.  An earlier pass could add information to a symbol table.
  Then a later pass can look up information from the symbol table and add this
  this information to the abstract syntax tree.  Or it could write a new
  abstract syntax tree based on the earlier abstract syntax tree and the new
  information.

- using attribute grammars.

- with the ANTLR java client, there is a tree wizard which can do pattern matching
  like in functional programming languages.

- using pattern matching with functional programming languages.

Walking the Symbol Table
========================

If the initial abstract syntax tree from the syntax directed translation approach
does not seem convenient for performing the compiler/DSL output, and/or it
seems like too much work to obtain an annotated abstract syntax tree in the
correct format, then an alternative idea is to generate a symbol table, then
walk the symbol table.

This seems to work OK when the symbol table is not too complex.

That is the problem with it though, when the symbol table is complex,
it seems too difficult to either:

- apply the visitor pattern.  There is a hierarchical visitor pattern which
  may help:

http://c2.com/cgi/wiki?HierarchicalVisitorPattern

- or to write nested loops doing external iteration over the symbol table.

Now I only know a little about your application, you could be
simplying it in order to describe it for all I know.  So it
seems to me that this approach might work for your application if its
not too complex.

I used this approach, and made it work.  I walked the symbol table
and output a new tree grammar, which was nothing like the input
language.  I walked it in the correct order for the code generation,
and output string templates.

However the resulting symbol table walk code and tree grammar generationg
looked too complex.  So I'm working on rewriting it using type theory and
functional programming languages.

Regards, Mark
   
> --- On Tue, 6/29/10, Mark Wright <markwright at internode.on.net> wrote:
> 
> 
> From: Mark Wright <markwright at internode.on.net>
> Subject: Re: [antlr-interest] how do you make Antlr work with a recursively changing input stream
> To: antlr-interest at antlr.org
> Date: Tuesday, June 29, 2010, 5:29 PM
> 
> 
> On Tue, Jun 29, 2010 at 03:18:55PM -0700, Alex McMains wrote:
> > Hi all,
> > 
> > I'm new to Antlr.  I've bought and read portions of the Antlr book,
> > and I've read dozens of postings and tutorials, but I still can't
> > see how to do deal with a recursively changing input stream in
> > Antlr.  Do I somehow use TokenRewriteStream, or am I missing
> > something?  
> 
> Hi Alex,
> 
> Yes, you are missing something, the Antlr book does not say much about
> symbol tables or type systems.
> 
> For simple type systems, it is probably easier to just use the
> approach which is called a "symbol table" in old school books
> on compiler construction.  A really good description of this
> approach is in chapter 7 of the book Language Implementation Patterns
> by Terence Parr.  An introduction to this approach is at:
> 
> http://www.antlr.org/wiki/display/CS652/Symbol+tables
> 
> You obviously need to implement forward references scopes.
> 
> Mainly for the mailing list archive, I would just like to say that I
> find this symbol table approach seems difficult to use for
> complex type systems.  For complex type systems the approach I
> recommend is described in the book "Types and Programming Languages"
> by Benjamin Pierce, the online book "Software Foundations", see the
> course:
> 
> http://www.seas.upenn.edu/~cis500/current/index.html
> 
> and the book "Certified Programming with Dependent Types" by
> Adam Chlipala:
> 
> http://adam.chlipala.net/cpdt/
> 
> Regards, Mark
> 
> > Here's the situation:
> > 
> > I start with an input stream.  As I move through the input I will
> > either encounter something that can be parsed directly, or I will
> > encounter something that tells me to go to a table and substitute an
> > entire row of the table at the node where I currently am.  This can
> > continue to happen recursively since each field from the substituted
> > row can again call for a substitution. 
> >  
> > Here's an example:
> >  
> > Person table:
> >  Identifier     Name   {Address}
> > Person1        Jon       Jon's Address
> >  
> > Address table:
> >  Identifier      HouseNumber  StreetName
> > Jon's Address   3477               Blue Lane
> >  
> > The {} around Address says to name the current node "Address" and
> > substitute the current value for whatever is at "Jon's Address" in
> > the Address table.
> >  
> > Eventually this will become XML that would look like:
> >  
> > <Person>
> >     <Name>Jon</Name>
> >     <Address>
> >          <HouseNumber>3477</HouseNumber>
> >          <StreetName>Blue Lane</StreetName>
> >      </Address>
> > </Person>
> >  
> > Any ideas would be appreciated.
> >  
> > Thanks.
> >  
> >        -- alex
> > 
> > 
> >       
> > 
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >   


More information about the antlr-interest mailing list