[antlr-interest] Island grammar in AntlrV3

David Holroyd dave at badgers-in-foil.co.uk
Tue Dec 5 04:06:44 PST 2006


On Mon, Dec 04, 2006 at 11:20:05PM +0000, David Holroyd wrote:
> On Sat, Sep 02, 2006 at 11:01:43PM +0000, David Holroyd wrote:
> > My specific use case is regular expression literals, e.g. I'd like to be
> > able to handle,
> > 
> >   r =   / b; f = r/m;  // regexp literal with 'm' flag
> >   r = a / b; f = r/m;  // two expr-statements involving division
> > 
> > It seems that the lexer needs context from the grammar in order to tell
> > what to do on seeing '/'.
> 
> I've been avoiding working on this bit of my grammar, but I'm starting
> to need it now.
> 
> At what level should I attack the problem?
> 
> My first idea is to have an action at the point in the outer grammar
> where the island grammar's start-marker is recognised, which will...
> 
>  1) take the unprocessed tail of the CommonTokenStream that the
> >     outer parser currently has as input, and turn it back into a string
>  2) create a new island lexer/TokenStream that reprocesses the tail
>     from 1)
>  3) create a parser for the island grammar, and parse the new token
>     stream from 2)
> >  4) once the end-marker is found, take the unprocessed tail of the
> >     island grammar's token stream and switch back to the lexer for
> >     'this' grammar again
>  5) replace the original 'input' reference the parser was using, and get
>     going with the outer grammar again
> 
> If all that works, I can hook the AST built by the island grammar into
> the AST that the outer grammar is creating.
> 
> 
> How does that compare with the approach that others are taking?  Does it
> sound like it might work, or is it wrong-headed and silly?
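
To make that concrete, the action I have in mind at the island's
start-marker would look roughly like this (IslandLexer, IslandParser and
regexpBody are placeholder names for pieces I haven't written yet, and
'input' is the outer parser's CommonTokenStream):

  // 1) turn the unprocessed tail of the outer token stream back into text
  StringBuffer tail = new StringBuffer();
  for (int i = input.index(); i < input.size(); i++) {
      tail.append(input.get(i).getText());
  }

  // 2) re-lex that text with the island lexer
  IslandLexer islandLexer =
      new IslandLexer(new ANTLRStringStream(tail.toString()));
  CommonTokenStream islandTokens = new CommonTokenStream(islandLexer);

  // 3) parse the new token stream with the island parser
  IslandParser islandParser = new IslandParser(islandTokens);
  IslandParser.regexpBody_return body = islandParser.regexpBody();

  // 4) + 5) once the end-marker is matched, hand whatever follows it back
  //         to the outer lexer, replace the outer parser's 'input', and
  //         splice body.getTree() into the outer AST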

So I've just spotted that point 1) above is flawed because it suggests
using CommonTokenStream, which immediately tokenizes the entire input.
Of course, tokens of the island grammar aren't 'compatible' with tokens
of the outer grammar (e.g. the regexp literal /"/ will look like an
unterminated string to the outer grammar's lexer), so I would get lexer
errors trying to use that implementation.

I'm going to need a TokenStream which lazy-loads tokens from the
TokenSource.
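
Something like this minimal sketch is what I mean. Only the core
lazy-fill logic is shown; the rest of the TokenStream/IntStream methods
(mark/rewind/seek/toString/...) would work off the same buffer, and I've
ignored off-channel (hidden) tokens and negative LT() for now:

  import java.util.ArrayList;
  import java.util.List;

  import org.antlr.runtime.Token;
  import org.antlr.runtime.TokenSource;

  /** Pulls tokens from the TokenSource only as the parser asks for them. */
  class LazyTokenStream /* implements org.antlr.runtime.TokenStream */ {
      private final TokenSource source;
      private final List<Token> tokens = new ArrayList<Token>();
      private int p = 0;              // index of the next token to consume
      private boolean hitEOF = false;

      LazyTokenStream(TokenSource source) {
          this.source = source;
      }

      /** Make sure tokens[0..i] have been fetched (or EOF was seen). */
      private void sync(int i) {
          while (!hitEOF && tokens.size() <= i) {
              Token t = source.nextToken();
              if (t.getType() == Token.EOF) {
                  hitEOF = true;
              }
              tokens.add(t);
          }
      }

      public Token LT(int k) {         // k >= 1 only in this sketch
          sync(p + k - 1);
          return tokens.get(Math.min(p + k - 1, tokens.size() - 1));
      }

      public int LA(int k) {
          return LT(k).getType();
      }

      public void consume() {
          sync(p + 1);                 // pull the next token on demand
          p++;
      }

      public int index() {
          return p;
      }
  }

That only defers the lexing, of course, but it means the outer lexer
should have looked at most a token or two of lookahead beyond wherever
the parser has got to, instead of having already chewed through the
island text.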

Does my plan sound realistic otherwise?


ta,
dave

-- 
http://david.holroyd.me.uk/

