[antlr-interest] XML island grammar

Mon Oct 8 14:25:24 PDT 2007

On Mon, Oct 08, 2007 at 11:28:16AM -0700, Matthieu Riou wrote:
> Thanks a lot, that's really helpful! I roughly see how this can be pieced
> together to get something working although I don't fully understand how the
> lexer can recognize a bad match.
> 
> Say that you have something that looks like a regular expression but isn't
> really one, the island grammar parser won't be able to match it, so you have
> to "refuse" the match so that another rule in the main grammar can be
> checked, right? How does that work? Does an exception thrown in
> parseRegexpLiteral or parseXMLLiteral force the main grammar parser to go
> look for another match?

In both examples, the 'outer' parser recognises a boundary token known
to the outer lexer (i.e. '/' for a regexp or '<' for E4X) and, at that
point, has to commit to interpreting the subsequent input with the
island lexer/parser (at least until it recognises the next boundary
token, which signals that it is time to leave the island). When the
parser sees this token in other contexts, it gives it the normal
interpretation (i.e. division or less-than), but when it recognises the
token in certain special contexts, it invokes the lexer-switcheroo so
that the island grammar can take over.

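In other words, the match is never "refused": once the parser has
committed, a failure inside the island is just a syntax error. So this
approach is probably incompatible with backtracking (which, I think,
works by throwing exceptions as you describe).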
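
To make that concrete, here is a minimal sketch in plain Python (not
ANTLR), assuming a toy language of "operand (operator operand)*" where
'/' is division after an operand and the start of a regexp literal
where an operand is expected. All names here (Scanner, outer_token,
island_regex, parse) are made up for illustration; they are not ANTLR
APIs, nor the parseRegexpLiteral/parseXMLLiteral methods from the
earlier examples.

```python
import re

_WS = re.compile(r"\s*")
_ATOM = re.compile(r"[A-Za-z_]\w*|\d+")
# Body of a regexp literal up to the closing '/': escaped chars or
# anything that is not '/', '\' or a newline.
_REGEX_BODY = re.compile(r"((?:\\.|[^/\\\n])+)/")


class Scanner:
    """Input cursor shared by the outer lexer and the island lexer."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def outer_token(self):
        """Outer lexer: names, numbers and single-character operators."""
        self.pos = _WS.match(self.text, self.pos).end()
        if self.pos >= len(self.text):
            return ("EOF", "")
        m = _ATOM.match(self.text, self.pos)
        if m:
            self.pos = m.end()
            return ("ATOM", m.group())
        ch = self.text[self.pos]
        self.pos += 1
        return ("OP", ch)

    def island_regex(self):
        """Island lexer: runs just after the opening '/' was consumed."""
        m = _REGEX_BODY.match(self.text, self.pos)
        if not m:
            # Committed already, so there is no fallback to division.
            raise SyntaxError("unterminated regexp literal at %d" % self.pos)
        self.pos = m.end()
        return ("REGEX", m.group(1))


def parse(text):
    """Return the token list; the parser's context decides what '/' means."""
    s = Scanner(text)
    out = []
    expect_operand = True
    while True:
        tok = s.outer_token()
        if tok[0] == "EOF":
            break
        if expect_operand:
            if tok == ("OP", "/"):
                # Boundary token in operand position: switch to the island.
                tok = s.island_regex()
            elif tok[0] != "ATOM":
                raise SyntaxError("operand expected, got %r" % (tok[1],))
        elif tok[0] != "OP":
            raise SyntaxError("operator expected, got %r" % (tok[1],))
        out.append(tok)
        expect_operand = tok[0] == "OP"
    return out
```

With this sketch, parse("a / b") sees '/' after an operand and keeps it
as a plain operator, while parse("x = /a+b/") sees '/' where an operand
is expected and hands the input to island_regex, which returns a single
REGEX token. An input such as "x = /ab" raises SyntaxError instead of
being retried as division, which is the "commit" behaviour described
above.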

ta,
dave

-- 
http://david.holroyd.me.uk/