[antlr-interest] Context-sensitive lexing

Mon Nov 19 20:00:01 PST 2007

On 11/19/07, Gavin Lambert <antlr at mirality.co.nz> wrote:
> Actually you can do that even for constructs that can only be
> recognised by the parser too.  Each token carries with it a
> location in the input stream, so if you can find two boundary
> tokens then you can extract the substream between them and pass it
> to another lexer/parser combo, if you want.

I guess this is the solution you're referring to:

http://www.antlr.org/wiki/display/ANTLR3/Island+Grammars+Under+Parser+Control

Looks pretty complicated. I wonder if there's a simpler way of doing
it: in a first pass, the parser/lexer simply locates the block
(possibly composed of multiple tokens) and stores it. Then a second
pass finds those blocks in the tree and re-lexes/reparses them, using
just their .text properties.
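
A minimal sketch of that two-pass idea, in plain Python rather than ANTLR, just to make the shape concrete. Everything in it is an assumption for illustration: the `<? ... ?>` delimiters, the toy inner token rules, and the names `Block`, `first_pass` and `relex` are all hypothetical and not part of any ANTLR API.

```python
import re
from dataclasses import dataclass

# Hypothetical delimiters for the embedded block (assumption, not a real grammar).
BLOCK = re.compile(r"<\?(.*?)\?>", re.DOTALL)

# Toy inner-language tokens: numbers, identifiers, any other single character.
INNER_TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")
KINDS = {1: "NUM", 2: "ID", 3: "OP"}


@dataclass
class Block:
    start: int  # offset of the block's text within the outer input
    text: str   # raw text, not yet lexed by the inner rules


def first_pass(source):
    """Outer pass: only locate the blocks and store their raw text."""
    return [Block(m.start(1), m.group(1)) for m in BLOCK.finditer(source)]


def relex(block):
    """Second pass: lex one stored block using the inner token rules."""
    tokens = []
    pos = 0
    while pos < len(block.text):
        m = INNER_TOKEN.match(block.text, pos)
        if m is None:  # only trailing whitespace is left
            break
        group = m.lastindex
        # Report offsets relative to the outer input, so that errors in
        # the inner text can still be mapped back to the original file.
        tokens.append((KINDS[group], m.group(group), block.start + m.start(group)))
        pos = m.end()
    return tokens
```

Calling `first_pass(source)` and then `relex(block)` on each result gives the inner tokens with positions in the original input. Keeping the block's start offset alongside its text is the one piece that a bare `.text` property would lose.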

This is pretty naive, ignorant thinking though :)

Theoretically, there could also exist grammars where the outer
parser/lexer couldn't identify the end of the inner block, but that
would be pretty evil.
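
A small illustration of a milder version of that problem, again in plain Python with made-up syntax. It assumes hypothetical `<? ... ?>` delimiters and an inner language with double-quoted strings; the function names are invented for the example. A scan that knows nothing about the inner language stops at the first `?>`, even when it sits inside a string.

```python
import re

# Naive outer scan: the block ends at the first "?>" (hypothetical delimiters).
NAIVE = re.compile(r"<\?(.*?)\?>", re.DOTALL)

# Outer scan that knows one inner rule: double-quoted strings are skipped
# as a unit, so a "?>" inside a string does not end the block.
STRING_AWARE = re.compile(r'<\?((?:"[^"]*"|[^"])*?)\?>', re.DOTALL)


def naive_block(source):
    """Return the text of the first block found by the naive scan."""
    return NAIVE.search(source).group(1)


def string_aware_block(source):
    """Return the text of the first block, skipping over string literals."""
    return STRING_AWARE.search(source).group(1)
```

For the input `<? print("?>") ?>`, the naive scan cuts the block off in the middle of the string, while the string-aware one returns the whole statement. The more inner rules the outer pass has to know to find the end, the less the two passes are really separate.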

In my particular case, I think I'll have to handle the different types
of blocks differently. In some, it's easy to lexically determine the
start and end of the block. In other cases, it really needs to be
parsed, so I can't delimit those lexically, and I honestly think the
"island grammar" solution is going to be more trouble than it's worth
for me.
Maybe one day ANTLR will magically handle island grammars with some
beautiful new keyword ;)

Steve