[antlr-interest] Re: XML/XSD parser generators and processing

Mon Feb 24 09:47:59 PST 2003

> I don't agree that YACC has advantages over ANTLR in this respect.
> You could build solutions with either approach.  State tables or
> parsers could be built beforehand for known XSDs.  Then the
> application would read the doctype and choose the table/parser for the
> remainder of the document.

  My explanation was rather terse, at best, so the point I was trying
  to make about a state-table-driven parser vs. an ANTLR-style
  generated source code parser was probably unclear.

  I'm not a big fan of YACC.  Like any YACC user, I've spent hours
  pouring over obscure dumps of the YACC trace to find errors in my
  grammar.  The actual YACC parser is also far less sophisticated than
  ANTLR.  So I'm not making an argument for YACC here.  I hope to
  never use YACC again.

  What I am suggesting is that a validating XML processor may be best
  constructed using YACC-like technology: that is, state tables
  generated from the XML schema (XSD).

  I am assuming that schemas may change, and I want to retain the
  flexibility that currently exists in validating XML parsers to add
  new schemas.  With a tool that generates a recursive descent parser
  that is not state table based (e.g., ANTLR), you would have to
  generate the source code (say Java) and then compile it.  I know
  that the compiler can be invoked from Java, but this is slow and
  seems rather awkward.

  If a state table is generated from the XML schema it can be
  immediately cached.  No intermediate step is necessary (e.g.,
  compiling).  When an XML document is received that references the
  schema, a finite state machine implementation processes this state
  table to parse the XML.
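
  To make the idea concrete, here is a minimal sketch in Java of a
  table-driven check for one element's content model.  It is only an
  illustration under my own assumptions, not a real XSD processor: the
  class name StateTableValidator, the hand-built table and the example
  content model (title, author+, year?) are all hypothetical.  In a
  real tool the table would be generated from the schema rather than
  filled in by hand.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a finite state machine driven by a table that
// would, in a real tool, be generated from an XML schema.
public class StateTableValidator {
    // transitions.get(state).get(elementName) gives the next state.
    private final Map<Integer, Map<String, Integer>> transitions = new HashMap<>();
    private final Set<Integer> accepting;
    private final int start;

    public StateTableValidator(int start, Set<Integer> accepting) {
        this.start = start;
        this.accepting = accepting;
    }

    public void addTransition(int from, String element, int to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>()).put(element, to);
    }

    // Walks the table over the child element names, in document order.
    public boolean accepts(List<String> childElements) {
        int state = start;
        for (String name : childElements) {
            Map<String, Integer> row = transitions.get(state);
            Integer next = (row == null) ? null : row.get(name);
            if (next == null) {
                return false; // no transition: the content is invalid
            }
            state = next;
        }
        return accepting.contains(state);
    }

    public static void main(String[] args) {
        // Assumed content model: title, author+, year?
        StateTableValidator v = new StateTableValidator(0, Set.of(2, 3));
        v.addTransition(0, "title", 1);
        v.addTransition(1, "author", 2);
        v.addTransition(2, "author", 2);
        v.addTransition(2, "year", 3);

        System.out.println(v.accepts(List.of("title", "author", "author", "year"))); // true
        System.out.println(v.accepts(List.of("author", "title")));                   // false
    }
}
```

  Since the table is just data, adding a new schema means building
  another table and storing it (for instance in a map keyed by the
  schema's URI).  No source code has to be generated or compiled.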

  Note that I am not proposing YACC here.  A new "parser generator"
  would have to be built that reads in schemas as the grammar
  definition.

  This brings up another issue that makes ANTLR difficult to use.  The
  grammar is not expressed in ANTLR grammar form (or YACC form for
  that matter), but rather as an XML schema.  Also, the target is
  restricted.  The input language that will be parsed is an XML
  document.  So the parser generator would be rather specialized.  The
  result of an XML parse is known in advance: SAX-style events or a
  DOM XML tree.  This provides a simpler structure than exists for
  most parser generators.

  As someone pointed out, similar tools seem to have been built for
  XML Document Type Definitions (DTDs).  So far, such a tool does not
  seem to exist for XML schemas, which represent a more powerful
  grammar than DTDs.

  Again, I apologize for a somewhat off-topic post to this group.  I
  appreciate your forbearance.  This group is somewhat unique in its
  experience with parsers, compilers and Java related technology
  (into which I throw XML, although XML can also be processed from
  C++).

  Ian Kaplan
  iank at bearcave.com
  www.bearcave.com
