XML parsing (was RE: [antlr-interest] Places where Antlr can be used ....)

Fri Jun 24 04:12:16 PDT 2005

I like this and think it is superior to the parsing part of XPA. But
where is the tree transformation part?

Oliver

On 6/24/05, Scott Stanchfield <scott at javadude.com> wrote:
> > For example - I already have a completely mind-boggling
> > feature planned to support XML parsing through ANTLR!!
> >
> > PRASHANT
> 
> FYI - I'll be releasing a beta of my XML parsing this weekend (if all goes
> as planned). It's an offshoot of ANTLR called ANTXR (ANother Tool for Xml
> Recognition), pronounced "Ant-zer". (I've copied & modified the antlreclipse
> plugin to support this as well.)
> 
> (Perhaps we should chat about what you plan and see if it makes more sense
> to integrate with ANTXR or pursue what you're planning.)
> 
> 
> Basically I've modified the ANTLR syntax slightly so you can parse
> 
> <?xml version="1.0"?>
> <people>
>         <person ssn="111-11-1111">
>                 <first-name>Terence</first-name>
>                 <last-name>Parr</last-name>
>         </person>
>         <person ssn="222-22-2222">
>                 <first-name>Scott</first-name>
>                 <last-name>Stanchfield</last-name>
>         </person>
>         <person ssn="333-33-3333">
>                 <first-name>James</first-name>
>                 <last-name>Stewart</last-name>
>                 <sponge>Haha</sponge>
>                 <p>This is a <i>nested</i> other tag data</p>
>         </person>
> </people>
> 
> using the following grammar. (Note: I'm still working on the "any" tag --
> I'm trying to come up with a nice shortcut syntax, but the listed syntax is
> the verbose way of doing it.)
> 
> The rules with <ruleName> automatically match the begin and end tag with
> their name. I'm still working on getting tags with dots in their names to
> work this way.
> 
> Attributes are referenced using "@attributeName" in an action.
> 
> ----------
> header {
> package com.javadude.antlr.xml.sample;
> 
> import java.util.List;
> import java.util.ArrayList;
> }
> 
> class PeopleParser extends Parser;
> 
> document returns [List results = null]
>         : results=<people> EOF
>         ;
> 
> <people> returns [List results = new ArrayList() ]
>         { Person p; }
>         :       (p=<person>  {results.add(p);} )*
>         ;
> 
> <person> returns [Person p = new Person() ]
>         {
>                 String first, last;
>                 p.setSsn(@ssn);
>         }
>         :       (
>                         first=<first-name>
>                         { p.setFirstName(first); }
>                 |
>                         last=<last-name>
>                         { p.setLastName(last);   }
>                 |
>                         otherTag
>                 )*
>         ;
> 
> <first-name> returns [String value=null]
>         :       pcdata:PCDATA { value = pcdata.getText(); }
>         ;
> 
> <last-name> returns [String value=null]
>         :       pcdata:PCDATA { value = pcdata.getText(); }
>         ;
> 
> otherTag
>         :       other:OTHER_TAG
>                 (       otherTag
>                 |       pcData:PCDATA
>                 )*
>                 XML_END_TAG
>         ;
> ----------
> 
> This example didn't use namespaces, but you can add something like
> 
> options {
>         xmlns="http://www.somedomain.com";
>         xmlns:stuff="http://www.crunchyfrog.com/plah/foo";
> }
> 
> and then use
> 
>   <someTag>       ("somedomain" namespace)
>   <stuff:someTag> ("crunchyfrog" namespace)
> 
> in the grammar rules.
> 
> I've been using an earlier version of this for several months with huge
> success. I plan to convert my work code to use this new grammar syntax soon
> (it uses the same constructs under the covers).
> 
> I used to have the rules look like
> 
> person options {xmlTag="person";}
>   : ...
>   ;
> 
> but I thought that was redundant.
> 
> Anyway, more when I release it.
> 
> Later,
> -- Scott
> 
> 
>
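
A note for anyone trying the quoted grammar: its actions call setSsn(),
setFirstName() and setLastName() on a Person class that the post does not
show. Below is a minimal sketch of what such a class might look like. The
field names, the getters and toString() are assumptions for illustration
only; the setter names are the only parts taken from the grammar. In a
real project it would presumably live in the package named in the
grammar's header section.

```java
// Hypothetical bean assumed by the grammar actions in the quoted post.
// Only the three setter names come from the grammar; everything else
// (fields, getters, toString) is an illustrative assumption.
class Person {
    private String ssn;
    private String firstName;
    private String lastName;

    // Called from the <person> rule's init action with the value of @ssn.
    public void setSsn(String ssn) { this.ssn = ssn; }

    // Called after a <first-name> element has been matched.
    public void setFirstName(String firstName) { this.firstName = firstName; }

    // Called after a <last-name> element has been matched.
    public void setLastName(String lastName) { this.lastName = lastName; }

    public String getSsn() { return ssn; }
    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }

    @Override
    public String toString() {
        return firstName + " " + lastName + " (" + ssn + ")";
    }
}
```

With a class along these lines, the List returned by the document rule
would hold one Person per <person> element in the sample input.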