[antlr-interest] XML island grammar

Susan Jolly easjolly at ix.netcom.com
Mon Oct 8 08:59:43 PDT 2007


If you have only a few distinct XML elements, you could let the lexer
look for "<xyz" (the "<" plus the element name) rather than a bare "<".
Alternatively, could you require that any "<" character that isn't part
of an XML tag be escaped as &lt;?
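
To make the first suggestion concrete, here is a minimal plain-Java sketch of the check the lexer would effectively perform. It is not ANTLR code; the class name, the method name, and the element names "xyz" and "note" are all made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class KnownTagCheck {
    // Hypothetical element names; substitute the ones your input really uses.
    static final Set<String> KNOWN = new HashSet<>(Arrays.asList("xyz", "note"));

    /** Returns true if the '<' at index i begins a tag with a known element name. */
    static boolean startsKnownTag(String s, int i) {
        if (i < 0 || i >= s.length() || s.charAt(i) != '<') return false;
        int j = i + 1;
        if (j < s.length() && s.charAt(j) == '/') j++; // also accept closing tags
        int start = j;
        while (j < s.length() && Character.isLetterOrDigit(s.charAt(j))) j++;
        // A bare '<' followed by a space or operator yields an empty name.
        return j > start && KNOWN.contains(s.substring(start, j));
    }

    public static void main(String[] args) {
        String text = "if a < b then <xyz id=\"1\"/>";
        System.out.println(startsKnownTag(text, text.indexOf('<')));    // false: comparison
        System.out.println(startsKnownTag(text, text.indexOf("<xyz"))); // true: known tag
    }
}
```

In a grammar, the same effect comes from writing one lexer rule per element name that starts with the literal "<xyz", so a lone "<" never begins an XML token.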

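Another possibility is to have your main lexer grab an entire XML section,
tags included, and then lex that section with a second lexer. You'd use
something like the following to grab the section (as written, the
non-greedy loop stops at the first "/>", so change the terminator if your
sections end differently):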
XML: '<' ( options {greedy=false;} : . )* '/>';

The key here is that with ANTLR v3 you can override the emit method in the
lexer.  See "Emitting More Than One Token per Lexer Rule" on p. 94 of
Section 4.3 in the ANTLR book. In other words, you don't have to let the
first lexer emit the whole enchilada as a single token.  

The emit method can do anything it wants, including invoking another lexer
to "re-tokenize".  This is actually simpler than the way v2 handled multiple
lexers, which used what it called a "shared input stream" and required that
the main lexer be able to detect just the start of the island as a token.
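
The overall shape of that approach can be sketched in plain Java. This is only an illustration of the idea, not the ANTLR runtime API: the class IslandLexerSketch, its emit and nextToken methods, and the token names are hypothetical stand-ins. In a real v3 lexer you would put the queue in the lexer's members section and override the actual emit and nextToken methods as the book describes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class IslandLexerSketch {
    private final String input;
    private int pos = 0;
    // Tokens waiting to be handed out; one island can enqueue several.
    private final Deque<String> pending = new ArrayDeque<>();

    IslandLexerSketch(String input) { this.input = input; }

    /** Stand-in for an overridden emit: queue the token rather than keep only one. */
    private void emit(String token) { pending.add(token); }

    /** Returns the next token, or null at end of input. */
    String nextToken() {
        while (pending.isEmpty() && pos < input.length()) scanOne();
        return pending.poll();
    }

    private void scanOne() {
        char c = input.charAt(pos);
        if (Character.isWhitespace(c)) { pos++; return; }
        if (c == '<') {
            int end = input.indexOf("/>", pos); // non-greedy: first "/>" wins
            if (end >= 0) {
                String island = input.substring(pos, end + 2);
                pos = end + 2;
                for (String t : lexIsland(island)) emit(t); // re-tokenize the island
                return;
            }
            // No closing "/>": fall through and treat '<' as ordinary text.
        }
        int start = pos++;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))
                && input.charAt(pos) != '<') pos++;
        emit("TEXT(" + input.substring(start, pos) + ")");
    }

    /** Toy inner lexer: splits the island body on whitespace. */
    static List<String> lexIsland(String island) {
        List<String> out = new ArrayList<>();
        out.add("TAG_OPEN");
        String body = island.substring(1, island.length() - 2).trim();
        for (String part : body.split("\\s+")) {
            if (!part.isEmpty()) out.add("XML(" + part + ")");
        }
        out.add("TAG_CLOSE");
        return out;
    }

    static List<String> tokenize(String s) {
        IslandLexerSketch lx = new IslandLexerSketch(s);
        List<String> out = new ArrayList<>();
        for (String t = lx.nextToken(); t != null; t = lx.nextToken()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("before <item id=\"1\"/> after"));
    }
}
```

Running main prints [TEXT(before), TAG_OPEN, XML(item), XML(id="1"), TAG_CLOSE, TEXT(after)]. Note that a stray "<" earlier in the input would be swallowed into the island up to the first "/>", which is why restricting what counts as a tag start still matters.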

HTH 