Perhaps I'm over-simplifying, but in XML isn't it easier to reverse your thinking, so that everything from ">" to "<" is slurped as data, very much like a classic text string? It would seem that the content (as opposed to the XML-isms) is simply raw goo that you don't want to process in any case, so treat it as opaque.
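
To make that concrete, here is a rough sketch in Python of what I mean. The names (TOKEN, split_markup) are made up for illustration, and it assumes the simplest possible case: it knows nothing about comments, CDATA sections, or a ">" inside an attribute value, and it does not decode entities, since the whole point is to leave the data alone.

```python
import re

# A token is either a tag ("<" up to the next ">") or a run of
# characters containing no "<", which is treated as opaque data.
TOKEN = re.compile(r"<[^>]*>|[^<]+")

def split_markup(text):
    """Yield ("tag", s) or ("data", s) pairs; data is passed through untouched."""
    for match in TOKEN.finditer(text):
        chunk = match.group(0)
        kind = "tag" if chunk.startswith("<") else "data"
        yield kind, chunk

tokens = list(split_markup("<a><b>hello, world</b> tail</a>"))
for kind, chunk in tokens:
    print(kind, repr(chunk))
```

The scanner never looks inside the data chunks; it only needs to find the next "<". A real XML parser has to handle a good deal more than this, so take it as an illustration of the idea rather than something to parse actual documents with.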