[antlr-interest] Island grammar in AntlrV3

Emond Papegaaij e.papegaaij at student.utwente.nl
Tue Dec 5 04:31:33 PST 2006


On Tuesday 05 December 2006 13:06, David Holroyd wrote:
> On Mon, Dec 04, 2006 at 11:20:05PM +0000, David Holroyd wrote:
> > On Sat, Sep 02, 2006 at 11:01:43PM +0000, David Holroyd wrote:
> > > My specific use case is regular expression literals, e.g. I'd like to
> > > be able to handle,
> > >
> > >   r =   / b; f = r/m;  // regexp literal with 'm' flag
> > >   r = a / b; f = r/m;  // two expr-statements involving division
> > >
> > > It seems that the lexer needs context from the grammar in order to tell
> > > what to do on seeing '/'.
> >
> > I've been avoiding working on this bit of my grammar, but I'm starting
> > to need it now.
> >
> > At what level should I attack the problem?
> >
> > My first idea is to have an action at the point in the outer grammar
> > where the island grammar's start-marker is recognised, which will...
> >
> >  1) take the unprocessed tail of the CommonTokenStream that the
> >     outer parser currently has as input, and turn it back into a string
> >  2) create a new island lexer/TokenStream that reprocesses the tail
> >     from 1)
> >  3) create a parser for the island grammar, and parse the new token
> >     stream from 2)
> >  4) get the tail of the island grammar's token stream once the
> >     end-marker was found, and convert back to the lexer for 'this'
> >     grammar again
> >  5) replace the original 'input' reference the parser was using, and get
> >     going with the outer grammar again
> >
> > If all that works, I can hook the AST built by the island grammar into
> > the AST that the outer grammar is creating.
> >
> >
> > How does that compare with the approach that others are taking?  Does it
> > sound like it might work, or is it wrong-headed and silly?
>
> So I've just spotted that point 1) above is flawed because it suggests
> using CommonTokenStream, which immediately tokenizes the entire input.
> Of course, tokens of the island grammar aren't 'compatible' with tokens
> of the outer grammar (e.g. the regexp literal /"/ will look like an
> unterminated string to the outer grammar's lexer), so I would get lexer
> errors trying to use that implementation.
>
> I'm going to need a TokenStream which lazy-loads tokens from the
> TokenSource.
>
> Does my plan sound realistic otherwise?

I've tried something similar, and it does work, but be prepared for some
difficulties. For example, the outer grammar MUST NOT look ahead past (or
into) the island grammar, because the outer lexer would then try to tokenize
island text (the problem you just mentioned). Interaction between the parser
and the lexer is difficult to get right because of the parser's lookahead
(LA). Writing a lazy token stream is not very difficult with ANTLR v3: just
implement the TokenStream interface.
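
To make the idea concrete, here is a rough sketch of a lazy token stream
with an island hand-off. It is plain Python for brevity and does not use the
ANTLR runtime: CharInput, outer_lexer, LazyTokenStream, read_island and parse
are all hypothetical names, and only LT/consume are named after their ANTLR
counterparts. It also assumes a toy outer grammar (ID '=' operand ('/'
operand)* ';') in which a regexp literal may appear wherever an operand is
expected.

```python
import re

class CharInput:
    """Character input shared by the outer lexer and the island lexer."""
    def __init__(self, text):
        self.text = text
        self.pos = 0

# Toy outer lexer rules (assumption): identifiers, "strings", and = ; /
OUTER = re.compile(r'\s*(?:(?P<ID>[A-Za-z_]\w*)|(?P<STR>"[^"\n]*")|(?P<OP>[=;/]))')
# Toy island rule (assumption): /body/flags, body may contain escaped chars
REGEX = re.compile(r'\s*/((?:\\.|[^/\\\n])+)/([a-z]*)')

def outer_lexer(chars):
    """Generator: lexes one token per request, starting at chars.pos."""
    while True:
        if not chars.text[chars.pos:].strip():
            return  # only whitespace left
        m = OUTER.match(chars.text, chars.pos)
        if m is None:
            raise SyntaxError("outer lexer stuck at offset %d" % chars.pos)
        chars.pos = m.end()
        yield (m.lastgroup, m.group(m.lastgroup))

class LazyTokenStream:
    """Fetches tokens from the lexer only as far as the parser looks ahead."""
    def __init__(self, chars, lexer):
        self.chars = chars
        self._tokens = lexer(chars)
        self._lookahead = []  # tokens fetched but not yet consumed

    def LT(self, k):
        while len(self._lookahead) < k:
            self._lookahead.append(next(self._tokens, ("EOF", "")))
        return self._lookahead[k - 1]

    def consume(self):
        tok = self.LT(1)
        self._lookahead.pop(0)
        return tok

    def read_island(self, pattern):
        """Let the island lexer read directly from the shared characters."""
        if self._lookahead:
            # The outer lexer has already tokenized (part of) the island.
            raise RuntimeError("outer parser looked ahead into the island")
        m = pattern.match(self.chars.text, self.chars.pos)
        if m is None:
            return None
        self.chars.pos = m.end()  # the outer lexer resumes after the island
        return ("REGEX", m.group(0).strip())

def expect(ts, kind, text=None):
    tok = ts.consume()
    if tok[0] != kind or (text is not None and tok[1] != text):
        raise SyntaxError("expected %s %r, got %r" % (kind, text, tok))
    return tok[1]

def operand(ts):
    # Try the island first; the lookahead buffer is empty at this point.
    island = ts.read_island(REGEX)
    if island is not None:
        return island
    return ("ID", expect(ts, "ID"))

def parse(text):
    ts = LazyTokenStream(CharInput(text), outer_lexer)
    statements = []
    while ts.LT(1)[0] != "EOF":
        target = expect(ts, "ID")
        expect(ts, "OP", "=")
        operands = [operand(ts)]
        while ts.LT(1) == ("OP", "/"):
            ts.consume()
            operands.append(operand(ts))
        expect(ts, "OP", ";")
        statements.append((target, operands))
    return statements
```

With this sketch, parse('r = a / b; f = r/m;') yields two division
statements, while parse('r =   / b; f = r/m;') yields a single statement
whose operand is the regexp token. The check in read_island is the important
part: if the parser has peeked even one token past the '=', the outer lexer
has already chewed on island text and the hand-off is refused.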

Best regards,
Emond

