[antlr-interest] Re: XML/XSD parser generators and processing
iank at bearcave.com
Wed Mar 5 09:36:44 PST 2003
>> As someone pointed out, similar tools seem to have been built for
>> XML Document Type Definitions (DTDs). So far, such a tool does not
>> seem to exist for XML schemas, which represent a more powerful
>> grammar than DTDs.
>
> I just stumbled over this. I haven't downloaded an eval,
> but it sounds very much like what you're suggesting.
>
> http://www.roguewave.com/developer/evaluations/xol/

Thanks for posting this. RogueWave's XML Object Link looks
interesting. "Object Link" is sort of an XML ANTLR: it reads an
XML Schema (rather than an ANTLR grammar) and generates a C++
parser. I'm thinking of something closer to an XML YACC: an XML
schema is read and a state table is created, which is then used by
a DFA to parse the XML described by the schema. This allows dynamic
loading of schemas without a compilation step, but still supports
very fast parsing. This whole idea is kind of "high concept", that
is, concept without details, and as we all know, the Devil is in
the details.
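
As a rough illustration of the table-driven idea, here is a minimal
Python sketch. Everything in it is an assumption made for the example:
the element names (order, id, item, note), the content model
"id, item+, note?", and the hand-written table. In the scheme described
above the table would be generated from a schema at load time, and that
generation step is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical state table for the content model: id, item+, note?
# state -> {child element name -> next state}
TABLE = {
    0: {"id": 1},
    1: {"item": 2},
    2: {"item": 2, "note": 3},
    3: {},
}
ACCEPTING = {2, 3}


def accepts(children, table=TABLE, accepting=ACCEPTING, start=0):
    """Run the DFA over a sequence of child element names."""
    state = start
    for name in children:
        next_state = table[state].get(name)
        if next_state is None:
            return False  # no transition: this child is not allowed here
        state = next_state
    return state in accepting


def validate_children(xml_text):
    """Check the direct children of the root element against the table."""
    root = ET.fromstring(xml_text)
    return accepts([child.tag for child in root])
```

Because the table is plain data, swapping in a different schema only
means loading a different table; nothing has to be compiled.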

I wrote my original note before I had dived deeply into XML
schemas. I bought the book "The XML Schema Complete Reference" by
Binstock et al., which is over 900 pages of documentation on schemas.
Schemas are far more complicated than parser generator grammars, so
creating a parser generator for XML schemas is a major task.

Sometimes I wonder if parser generator issues ever occurred to the
people who designed the schemas. For example, just as it is
possible to create an ambiguous grammar with ANTLR, an ambiguous
schema can be created for an XML document. I have not seen any
discussion of this issue (that is not to say that it does not exist,
just that I have not seen it). Given that creating an error-free
grammar for a language like Java is a lot of work, it seems that
schemas might be even worse, since the "grammar" is so complicated.
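
To make the concern concrete, here is a small Python sketch of the
kind of conflict a parser generator would report. The content models
and element names (a, b, c) are hypothetical, and the check is a
simplification: it only flags a state with two different transitions on
the same element name. Strictly speaking that is nondeterminism under
one-element lookahead rather than full grammatical ambiguity.

```python
from collections import defaultdict


def find_conflicts(transitions):
    """Return {(state, name): [next states]} wherever one element name
    leads from the same state to more than one next state.

    transitions: iterable of (state, element_name, next_state) tuples.
    """
    targets = defaultdict(set)
    for state, name, next_state in transitions:
        targets[(state, name)].add(next_state)
    return {key: sorted(nxt) for key, nxt in targets.items() if len(nxt) > 1}


# Hypothetical choice (a, b) | (a, c): both branches start with "a".
CONFLICTING = [(0, "a", 1), (1, "b", 3), (0, "a", 2), (2, "c", 3)]

# The same language left-factored as a, (b | c).
FACTORED = [(0, "a", 1), (1, "b", 2), (1, "c", 2)]
```

Here find_conflicts(CONFLICTING) reports the pair (0, "a") with next
states 1 and 2, while find_conflicts(FACTORED) returns an empty dict.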

I have not used validation in Xerces (or any other XML processor),
so I don't know how they handle errors for ambiguous "grammars".

Ian