[antlr-interest] ANTLR3 Nested parser

David Holroyd dave at badgers-in-foil.co.uk
Wed Jan 23 06:00:45 PST 2008


On Wed, Jan 23, 2008 at 08:49:40AM +0100, Bertalan Fodor (LilyPondTool) wrote:
> 
> >The trouble with using the nested parser to detect the extents is that,
> >by the time it runs, the nested lexer will have already processed the
> >entire input stream, producing errors you would have to ignore.
> Yes, now I can see that can be an issue. Originally I was thinking about 
> subclassing the character and/or token stream to provide rewinding that 
> is better suited for me, but I think I will go with that fragment lexer 
> solution. In the end that amounts to Harald's original suggestion,
> i.e. to handle that fragment as a single String.

Does the following method apply?

  http://www.antlr.org/wiki/display/ANTLR3/Island+Grammars+Under+Parser+Control

I had a similar situation with embedded 'E4X' XML-literal expressions.
These can recursively embed expressions from the outer language too, but
I haven't implemented that yet.

I initially attempted to make this work by defining complicated
fragment lexer rules in the 'outer' language in order to snarf the
entire embedded language sequence.  Unfortunately that didn't work for
this grammar due to the 'start marker' being ambiguous with other
tokens.  In this language, it's difficult for the lexer to know whether
input that starts with,

   <foo

is part of a comparison, i.e.,

   bar<foo

or part of an XML literal, i.e.,

   <foo attr="33"/>

OTOH, it's trivial for the parser to know what's ahead depending on the
context it sees '<' in.
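
To make that concrete, here is a rough Java sketch of the idea. It is not
ANTLR code and not what my grammar actually does; the class name, the
'expectOperand' flag and the tiny word/punctuation scanner are all made
up for illustration. It assumes an XML literal is a single tag that ends
at the first '>', which is far simpler than real E4X.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: decides what each '<' means by tracking
// whether the surrounding expression currently expects an operand.
public class AngleBracketContext {
    enum Kind { LESS_THAN, XML_START }

    static List<Kind> classify(String src) {
        List<Kind> kinds = new ArrayList<>();
        boolean expectOperand = true; // an expression starts with an operand
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetterOrDigit(c) || c == '_') {
                // Consume an identifier, number or keyword.
                int start = i;
                while (i < src.length()
                        && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) {
                    i++;
                }
                // After an ordinary word an operator is expected; after the
                // keyword 'return' (assumed here) an operand still is.
                expectOperand = src.substring(start, i).equals("return");
            } else if (c == '<') {
                if (expectOperand) {
                    kinds.add(Kind.XML_START);
                    // Assumption: the literal ends at the first '>'. A real
                    // implementation would hand off to a nested parser here.
                    int close = src.indexOf('>', i);
                    i = (close < 0) ? src.length() : close + 1;
                    expectOperand = false;
                } else {
                    kinds.add(Kind.LESS_THAN);
                    i++;
                    expectOperand = true; // a binary operator needs a right operand
                }
            } else {
                // Any other punctuation: a closing bracket ends an operand,
                // everything else is treated as expecting one.
                expectOperand = (c != ')' && c != ']');
                i++;
            }
        }
        return kinds;
    }
}
```

With this sketch, "bar<foo" gives one LESS_THAN, while the same '<foo'
prefix at the start of an expression, or after '=', gives XML_START. The
flag plays the part of the parser's context, which a context-free lexer
rule does not have.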

If the '#(' is lexically unambiguous in your target syntax, your life
will be easier than mine of course!  :)


ta,
dave

-- 
http://david.holroyd.me.uk/

