[antlr-interest] Recognising XML in a grammar

Ric Klaren ric.klaren at gmail.com
Mon Sep 11 13:37:34 PDT 2006


Hi,

I messed around a bit with the XML lexer, wondering whether it would be
possible to keep it all inside a single lexer rule.

I attached what I came up with. The XMLCHUNK rule seems to grab a lot,
but it's not perfect. Trying to do complete XML matching inside one
lexer rule is probably not something you'd want to do; it's pretty
tricky. I left out the PI (processing instruction) stuff to keep things
more readable.

I used a trick with a predicate to match start and end tags together.
This keeps at least the start and end tags 'well-formed'. Using the
official rule for chardata breaks things, though. I didn't want to
spend much more time on it, so I didn't figure out what goes wrong
there. ANTLR also reports warnings for the grammar; I'm not sure
whether any of them is serious.
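
For anyone reading this without the attachment: the start/end tag trick
can be sketched outside ANTLR roughly as below. This is not the XMLCHUNK
rule from xml.g, just a small Python illustration of the idea. The names
match_element, Mismatch and NAME are made up for the sketch, and the name
pattern, attribute handling and character data are heavily simplified
(no comments, CDATA, PIs or entities).

```python
import re

# Simplified element-name pattern (assumption: real XML names allow more).
NAME = re.compile(r"[A-Za-z_][\w.-]*")


class Mismatch(Exception):
    """Raised when the input is not a well-formed element chunk."""


def match_element(text, pos=0):
    """Return the index just past one element that starts at pos."""
    if not text.startswith("<", pos):
        raise Mismatch("expected '<' at %d" % pos)
    start = NAME.match(text, pos + 1)
    if not start:
        raise Mismatch("expected a tag name at %d" % (pos + 1))
    # Skip attributes without parsing them (breaks on '>' inside values).
    pos = text.find(">", start.end())
    if pos < 0:
        raise Mismatch("unterminated start tag")
    if text[pos - 1] == "/":
        return pos + 1  # empty-element tag such as <a/>
    pos += 1
    while True:
        if text.startswith("</", pos):
            end = NAME.match(text, pos + 2)
            # This comparison plays the role of the predicate: the end
            # tag must carry the same name as the start tag.
            if not end or end.group() != start.group():
                raise Mismatch("end tag does not match <%s>" % start.group())
            close = end.end()
            while close < len(text) and text[close].isspace():
                close += 1
            if not text.startswith(">", close):
                raise Mismatch("unterminated end tag")
            return close + 1
        elif text.startswith("<", pos):
            pos = match_element(text, pos)  # nested element
        elif pos < len(text):
            pos += 1  # character data, taken one character at a time
        else:
            raise Mismatch("unexpected end of input")
```

Calling match_element("<a><b>hi</b></a> tail") returns the index just
past the closing </a>, while crossed tags such as "<a><b>hi</a></b>"
raise Mismatch. In a lexer rule the same comparison would sit in a
predicate between the two tag names.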

When you combine the rule with another lexer you will probably have to
resolve ambiguities. Alternatively, use a multiLexer approach (check
out the example in the distribution).

Anyway hope it's of some use ;)

Cheers,

Ric

PS if you take this as a starting point, edit and test it in small
incremental steps and save copies so you can go back to previous versions.
PPS you'll probably get better results if you use a real XML parser
and a multiLexer combination, but writing the glue between the ANTLR
input buffering and the XML parser is probably tricky.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xml.g
Type: application/octet-stream
Size: 2910 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20060911/a3fbe56c/attachment.obj 