[antlr-interest] Re: XML/XSD parser generators and processing
iank at bearcave.com
Mon Feb 24 09:47:59 PST 2003
> I don't agree that YACC has advantages over ANTLR in this respect.
> You could build solutions with either approach. State tables or
> parsers could be built beforehand for known XSDs. Then the
> application would read the doctype and choose the table/parser for the
> remainder of the document.
My explanation was rather terse, at best, so the point I was trying
to make about a state-table-driven parser vs. an ANTLR-style parser
generated as source code was probably unclear.
I'm not a big fan of YACC. Like any YACC user, I've spent hours
poring over obscure dumps of the YACC trace to find errors in my
grammar. The actual YACC parser is also far less sophisticated than
ANTLR. So I'm not making an argument for YACC here. I hope to
never use YACC again.
What I am suggesting is that a validating XML processor may be best
constructed using YACC-like technology: that is, state tables
generated from the XML schema (XSD).
I am assuming that schemas may change, and I want to retain the
flexibility that validating XML parsers currently have to add new
schemas. With a tool that generates a recursive descent parser that
is not state-table based (e.g., ANTLR), you would have to generate
the source code (say, Java) and then compile it. I know that the
compiler can be invoked from Java, but this is slow and seems rather
awkward.
If a state table is generated from the XML schema it can be
immediately cached. No intermediate step is necessary (e.g.,
compiling). When an XML document is received that references the
schema, a finite state machine implementation processes this state
table to parse the XML.
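To make the idea concrete, here is a minimal sketch in Python (chosen only for brevity; the discussion above is about Java). It assumes a heavily simplified schema: one element's content model is a single xs:sequence of particles, each written as a hypothetical (name, minOccurs, maxOccurs) tuple with None standing for "unbounded", and the element names in the sequence are assumed to be distinct so the table stays deterministic. The names compile_sequence, validate and ORDER are illustrative, not part of any existing tool. The state table is built at run time from plain data, cached with functools.lru_cache, and then walked by a generic driver loop, so no source code is generated or compiled.

```python
from collections import deque
from functools import lru_cache
from typing import Optional

# Hypothetical particle of a simplified <xs:sequence>:
# (element name, minOccurs, maxOccurs), with maxOccurs=None meaning "unbounded".
Particle = tuple[str, int, Optional[int]]


def _bump(count: int, lo: int, hi: Optional[int]) -> int:
    # For unbounded particles only "has minOccurs been reached" matters,
    # so the count is capped at lo to keep the number of states finite.
    return count + 1 if hi is not None else min(count + 1, lo)


@lru_cache(maxsize=None)  # compile each content model once, then reuse the table
def compile_sequence(particles: tuple[Particle, ...]):
    """Build a DFA state table for a sequence content model.

    A state is (i, c): positioned at particle i after c occurrences of it;
    (-1, 0) is the start state, before any child element has been seen.
    Assumes distinct element names, so every transition is deterministic.
    """
    def step(state, name):
        i, c = state
        if i >= 0:
            pname, lo, hi = particles[i]
            if name == pname and (hi is None or c < hi):
                return (i, _bump(c, lo, hi))  # another occurrence of particle i
            if c < lo:
                return None  # particle i has not reached minOccurs yet
        for j in range(i + 1, len(particles)):
            pname, lo, hi = particles[j]
            if name == pname and (hi is None or hi > 0):
                return (j, _bump(0, lo, hi))  # move on to particle j
            if lo > 0:
                return None  # cannot skip a required particle
        return None

    def accepts(state):
        i, c = state
        if i >= 0 and c < particles[i][1]:
            return False
        return all(lo == 0 for _, lo, _ in particles[i + 1:])

    names = list(dict.fromkeys(p[0] for p in particles))
    start = (-1, 0)
    ids = {start: 0}
    table = []  # table[state_id] maps child element name -> next state_id
    queue = deque([start])
    while queue:  # breadth-first, so states are popped in id order
        state = queue.popleft()
        row = {}
        for name in names:
            nxt = step(state, name)
            if nxt is None:
                continue
            if nxt not in ids:
                ids[nxt] = len(ids)
                queue.append(nxt)
            row[name] = ids[nxt]
        table.append(row)
    accepting = frozenset(sid for s, sid in ids.items() if accepts(s))
    return tuple(table), accepting


def validate(particles, children) -> bool:
    """Generic FSM driver: walk the cached table over child element names."""
    table, accepting = compile_sequence(tuple(particles))
    state = 0
    for name in children:
        state = table[state].get(name)
        if state is None:
            return False
    return state in accepting


# Hypothetical content model for <order>: customer, item+, note?
ORDER = (("customer", 1, 1), ("item", 1, None), ("note", 0, 1))
print(validate(ORDER, ["customer", "item", "item", "note"]))  # True
print(validate(ORDER, ["customer", "note"]))  # False: <item> is required
```

Adding a new schema in this scheme means building new tables, not running a compiler. A fuller version would also need xs:choice, nested groups, and XSD's Unique Particle Attribution rule, all of which this sketch ignores.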
Note that I am not proposing YACC here. A new "parser generator"
would have to be built that reads in schemas as the grammar definition.
This brings up another issue that makes ANTLR difficult to use: the
grammar is not expressed in ANTLR grammar form (or YACC form, for
that matter), but rather as an XML schema. Also, the target is
restricted. The input language that will be parsed is an XML
document, so the parser generator would be rather specialized. The
result of an XML parse is also known in advance: SAX-style events
or a DOM XML tree. This is a simpler structure than exists for most
parser generators.
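Because the output of the parse is fixed (SAX-style events), a state table of this kind can be driven directly by a SAX parser. The sketch below uses Python's standard xml.sax module and a hand-written table for a hypothetical <order> model (customer, item+, note?), standing in for what a schema-driven generator might emit; TABLES, ValidatingHandler and validate_xml are illustrative names. Each open element that has a table gets its own FSM state on a stack, so validation costs one table lookup per start-element event. Elements without a table are simply not checked, which a real validator would not allow.

```python
import xml.sax

# Hypothetical state table for <order>: customer, item+, note?
# rows[state] maps a child element name to the next state; a missing name is an error.
TABLES = {
    "order": (
        (
            {"customer": 1},
            {"item": 2},
            {"item": 2, "note": 3},
            {},
        ),
        frozenset({2, 3}),  # accepting states
    ),
}


class ValidatingHandler(xml.sax.ContentHandler):
    """Runs one table-driven FSM per open element that has a content model."""

    def __init__(self, tables):
        super().__init__()
        self.tables = tables
        self.stack = []  # (element name, FSM state, or None once an error is reported)
        self.errors = []

    def startElement(self, name, attrs):
        if self.stack:
            parent, state = self.stack[-1]
            if parent in self.tables and state is not None:
                rows, _ = self.tables[parent]
                nxt = rows[state].get(name)  # advance the parent's FSM
                if nxt is None:
                    self.errors.append(f"<{name}> not allowed here in <{parent}>")
                self.stack[-1] = (parent, nxt)
        self.stack.append((name, 0))

    def endElement(self, name):
        _, state = self.stack.pop()
        if name in self.tables and state is not None:
            _, accepting = self.tables[name]
            if state not in accepting:
                self.errors.append(f"<{name}> ended before its content was complete")


def validate_xml(data: bytes, tables=TABLES):
    handler = ValidatingHandler(tables)
    xml.sax.parseString(data, handler)
    return handler.errors


print(validate_xml(b"<order><customer/><item/><item/><note/></order>"))  # []
print(validate_xml(b"<order><customer/><note/></order>"))  # ['<note> not allowed here in <order>']
```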
As someone pointed out, similar tools seem to have been built for
XML Document Type Definitions (DTDs). So far, no such tool seems to
exist for XML schemas, which represent a more powerful grammar than
DTDs.
Again, I apologize for a somewhat off topic post to this group. I
appreciate your forbearance. This group is somewhat unique in its
experience with parsers, compilers and Java related technology
(into which I throw XML, although XML can also be processed from
C++).
Ian Kaplan
iank at bearcave.com
www.bearcave.com