XML parsing (was RE: [antlr-interest] Places where Antlr can be used ....)

Scott Stanchfield scott at javadude.com
Thu Jun 23 18:38:49 PDT 2005


> For example - I already have a completely mind-boggling
> feature planned to support XML parsing through ANTLR!!
> 
> PRASHANT

FYI - I'll be releasing a beta of my XML parsing this weekend (if all goes
as planned). It's an offshoot of ANTLR called ANTXR (ANother Tool for Xml
Recognition), pronounced "Ant-zer". (I've copied & modified the antlreclipse
plugin to support this as well.)

(Perhaps we should chat about your plans and see whether it makes more
sense to integrate them with ANTXR or to pursue them separately.)


Basically, I've modified the ANTLR syntax slightly so that you can parse

<?xml version="1.0"?>
<people>
	<person ssn="111-11-1111">
		<first-name>Terence</first-name>
		<last-name>Parr</last-name>
	</person>
	<person ssn="222-22-2222">
		<first-name>Scott</first-name>
		<last-name>Stanchfield</last-name>
	</person>
	<person ssn="333-33-3333">
		<first-name>James</first-name>
		<last-name>Stewart</last-name>
		<sponge>Haha</sponge>
		<p>This is a <i>nested</i> other tag data</p>
	</person>
</people>

using the following grammar. (Note: I'm still working on the "any" tag. I'm
trying to come up with a nice shortcut syntax; the otherTag rule is the
verbose way of doing it.)

Rules whose names are written as <ruleName> automatically match the begin
and end tags with that name. I'm still working on making this work for tags
with dots in their names.

Attributes are referenced in an action using "@attributeName" (for example,
@ssn in the <person> rule).
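To make the begin/end-tag matching and "@attributeName" concrete, here is a
minimal hand-written sketch using the JDK's StAX API (javax.xml.stream). It
is not ANTXR's generated code: the class and method names (TagMatchSketch,
matchPerson, ssnOf) are hypothetical, and the sketch assumes a <person> rule
behaves like "require the begin tag, run the body, require the matching end
tag".

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class TagMatchSketch {

    // Roughly what a <person> rule does around its body: require the begin tag,
    // read attributes from it (the grammar's @ssn), then require </person>.
    static String matchPerson(XMLStreamReader r) throws XMLStreamException {
        // Throws XMLStreamException if the current event is not <person>.
        r.require(XMLStreamConstants.START_ELEMENT, null, "person");
        String ssn = r.getAttributeValue(null, "ssn"); // analogous to @ssn

        // A real rule body would match the children here; this sketch just
        // skips them, tracking depth so nested tags cannot end the rule early.
        int depth = 1;
        while (depth > 0) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
        // Mirrors the rule's end-tag match (always true for well-formed XML).
        r.require(XMLStreamConstants.END_ELEMENT, null, "person");
        return ssn;
    }

    // Convenience wrapper: parse a string whose root element should be <person>.
    static String ssnOf(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        r.nextTag(); // move from START_DOCUMENT to the root element
        return matchPerson(r);
    }

    public static void main(String[] args) throws XMLStreamException {
        // prints 111-11-1111
        System.out.println(ssnOf(
                "<person ssn=\"111-11-1111\"><first-name>Terence</first-name></person>"));
    }
}
```

Passing a document whose root is, say, <people> makes require() throw an
XMLStreamException, which is the analogue of a parse error when the expected
tag is missing.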

----------
header {
package com.javadude.antlr.xml.sample;

import java.util.List;
import java.util.ArrayList;
}

class PeopleParser extends Parser;

document returns [List results = null]
	:	results=<people> EOF
	;

<people> returns [List results = new ArrayList() ]
	{ Person p; }
	:	(p=<person>  {results.add(p);} )*
	;

<person> returns [Person p = new Person() ]
	{
		String first, last;
		p.setSsn(@ssn);
	}
	:	(	
			first=<first-name>
			{ p.setFirstName(first); }
		|	
			last=<last-name>
			{ p.setLastName(last);   }
		|	
			otherTag
		)*
	;
	
<first-name> returns [String value=null]
	:	pcdata:PCDATA { value = pcdata.getText(); }
	;
	
<last-name> returns [String value=null]
	:	pcdata:PCDATA { value = pcdata.getText(); }
	;
	
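// Verbose form of the "any" tag: matches an unrecognized element together
// with any nested elements and text it contains.
otherTag
	:	other:OTHER_TAG
		(	otherTag
		|	pcData:PCDATA
		)*
		XML_END_TAG
	;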
----------
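For comparison, here is a minimal sketch of roughly what PeopleParser does,
written by hand against StAX rather than generated by ANTXR. The Person class
is a hypothetical stand-in (the grammar assumes one exists in the sample
package), and the mapping of rules to methods (people, person, otherTag) is an
assumption made for illustration, not ANTXR's actual output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PeopleStaxParser {

    // Hypothetical stand-in for the Person class the grammar assumes.
    public static class Person {
        private String ssn, firstName, lastName;
        public void setSsn(String s) { ssn = s; }
        public void setFirstName(String s) { firstName = s; }
        public void setLastName(String s) { lastName = s; }
        public String getSsn() { return ssn; }
        public String getFirstName() { return firstName; }
        public String getLastName() { return lastName; }
    }

    // document : <people> EOF
    public static List<Person> parse(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        r.nextTag();                 // skip the prolog, land on the root element
        List<Person> results = people(r);
        while (r.hasNext()) {        // EOF: drain whatever follows </people>
            r.next();
        }
        return results;
    }

    // <people> : ( <person> )*
    static List<Person> people(XMLStreamReader r) throws XMLStreamException {
        r.require(XMLStreamConstants.START_ELEMENT, null, "people");
        List<Person> results = new ArrayList<>();
        while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
            results.add(person(r));
        }
        r.require(XMLStreamConstants.END_ELEMENT, null, "people");
        return results;
    }

    // <person> : ( <first-name> | <last-name> | otherTag )*
    static Person person(XMLStreamReader r) throws XMLStreamException {
        r.require(XMLStreamConstants.START_ELEMENT, null, "person");
        Person p = new Person();
        p.setSsn(r.getAttributeValue(null, "ssn"));       // @ssn
        while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
            switch (r.getLocalName()) {
                case "first-name": p.setFirstName(r.getElementText()); break;
                case "last-name":  p.setLastName(r.getElementText());  break;
                default:           otherTag(r);                        break;
            }
        }
        r.require(XMLStreamConstants.END_ELEMENT, null, "person");
        return p;
    }

    // otherTag : OTHER_TAG ( otherTag | PCDATA )* XML_END_TAG
    // Consumes one element of any name, with nested elements and text.
    static void otherTag(XMLStreamReader r) throws XMLStreamException {
        r.require(XMLStreamConstants.START_ELEMENT, null, null);
        while (true) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                otherTag(r);                              // nested tag
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                return;                                   // this element's end tag
            }
            // character data, comments, etc. are skipped (the PCDATA branch)
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<?xml version=\"1.0\"?>\n<people>"
                + "<person ssn=\"333-33-3333\"><first-name>James</first-name>"
                + "<last-name>Stewart</last-name><sponge>Haha</sponge>"
                + "<p>This is a <i>nested</i> other tag data</p></person></people>";
        for (Person p : parse(xml)) {
            // prints 333-33-3333 James Stewart
            System.out.println(p.getSsn() + " " + p.getFirstName() + " " + p.getLastName());
        }
    }
}
```

The recursion in otherTag plays the same role as the recursive otherTag rule:
unknown elements such as <sponge> and <p>...<i>...</i>...</p> are consumed,
text and all, without affecting the Person being built.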

This example doesn't use namespaces, but you can add something like

options {
	xmlns="http://www.somedomain.com";
	xmlns:stuff="http://www.crunchyfrog.com/plah/foo";
}

and then use

  <someTag>       ("somedomain" namespace)
  <stuff:someTag> ("crunchyfrog" namespace)

in the grammar rules.
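Assuming ANTXR follows standard XML namespace semantics (an element is
identified by namespace URI plus local name, and a prefix is only a local
alias), the prefixes in the grammar only have to agree with the document on
URIs, not on spelling. The sketch below shows that matching rule with StAX
and javax.xml.namespace.QName; NamespaceMatchSketch, grammarTag and
rootMatches are hypothetical names, and the prefix table simply copies the
options block.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class NamespaceMatchSketch {

    // Hypothetical copy of the grammar's options block:
    // "" is the default xmlns, "stuff" is the declared prefix.
    static final Map<String, String> GRAMMAR_NS = new HashMap<>();
    static {
        GRAMMAR_NS.put("", "http://www.somedomain.com");
        GRAMMAR_NS.put("stuff", "http://www.crunchyfrog.com/plah/foo");
    }

    // Resolve a grammar tag such as "someTag" or "stuff:someTag" to a QName
    // (namespace URI + local name) using the grammar's own prefix table.
    static QName grammarTag(String tag) {
        int colon = tag.indexOf(':');
        String prefix = colon < 0 ? "" : tag.substring(0, colon);
        String local = colon < 0 ? tag : tag.substring(colon + 1);
        String uri = GRAMMAR_NS.get(prefix);
        if (uri == null) {
            throw new IllegalArgumentException("undeclared prefix: " + prefix);
        }
        return new QName(uri, local);
    }

    // True if the document's root element is the given grammar tag.
    // QName.equals compares namespace URI and local part, ignoring the prefix.
    static boolean rootMatches(String xml, String tag) throws XMLStreamException {
        XMLInputFactory f = XMLInputFactory.newInstance();
        f.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
        XMLStreamReader r = f.createXMLStreamReader(new StringReader(xml));
        r.nextTag(); // move to the root element
        return r.getName().equals(grammarTag(tag));
    }

    public static void main(String[] args) throws XMLStreamException {
        // The document chooses its own prefix "x"; only the URI has to agree.
        String doc = "<x:someTag xmlns:x=\"http://www.crunchyfrog.com/plah/foo\"/>";
        System.out.println(rootMatches(doc, "stuff:someTag")); // true
        System.out.println(rootMatches(doc, "someTag"));       // false
    }
}
```

Matching on the URI rather than the prefix means a grammar written with
"stuff:" still recognizes documents that bind the same namespace to a
different prefix, or make it the default namespace.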

I've been using an earlier version of this for several months with huge
success. I plan to convert my work code to use this new grammar syntax soon
(it uses the same constructs under the covers).

I used to have the rules look like

person options {xmlTag="person";}
  : ...
  ;

but I thought that was redundant.

Anyway, more when I release it.

Later,
-- Scott



