[antlr-interest] Context-sensitive lexing

Mon Nov 19 01:07:47 PST 2007

Gavin Lambert wrote:
> At 21:48 19/11/2007, shmuel siegel wrote:
> >Since the lexer is capable of recognizing  the boundaries of
> >the comment, you can have it return a comment to the parser.
> >The parser calls another lexer/parser passing them the
> >content of the comment.
> >This involves double lexing but should be fast enough.
>
> Actually you can do that even for constructs that can only be 
> recognised by the parser too.  Each token carries with it a location 
> in the input stream, so if you can find two boundary tokens then you 
> can extract the substream between them and pass it to another 
> lexer/parser combo, if you want.
>
But it can be much more difficult when the inner language understands 
constructs that the outer language doesn't. Consider regular expressions 
in JavaScript. If the outer lexer has already dealt with the input stream, 
a regex literal may well contain white space that the outer lexer treats 
as insignificant. It can also contain sequences that the outer lexer will 
reject, like +-+/*.
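
To make the difficulty concrete, here is a minimal sketch in Python rather
than ANTLR. Everything in it is hypothetical: the token rules, outer_lex and
substream are made up for illustration and are not ANTLR API. It assumes an
outer lexer that skips white space and records a start/end offset on each
token, which is roughly the setup described in the quoted text.

```python
import re

# Hypothetical outer-language token rules (an assumption for illustration).
# White space is matched but never emitted, as in most lexers.
TOKEN_RE = re.compile(r"""
    (?P<WS>\s+)
  | (?P<ID>[A-Za-z_]\w*)
  | (?P<NUM>\d+)
  | (?P<STR>"[^"\n]*")
  | (?P<OP>[=+\-*/();.,])
""", re.VERBOSE)


def outer_lex(text):
    """Return (kind, text, start, end) tuples; raise ValueError on unknown input."""
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"outer lexer rejects {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group(), m.start(), m.end()))
        pos = m.end()
    return tokens


def substream(text, open_tok, close_tok):
    """Raw source text strictly between two boundary tokens, white space intact."""
    return text[open_tok[3]:close_tok[2]]


# Case 1: the inner text is also valid in the outer language, so the
# boundary tokens exist and the offsets recover the original characters.
source = "x = f( a  +  b );"
tokens = outer_lex(source)
lparen = next(t for t in tokens if t[1] == "(")
rparen = next(t for t in tokens if t[1] == ")")
inner_text = substream(source, lparen, rparen)  # ready for an inner lexer

# Case 2: a JavaScript-style regex literal. The outer lexer has no rule
# for '[', so it fails before any closing boundary token is produced.
try:
    outer_lex("var r = /[a-z] +/;")
    regex_rejected = False
except ValueError:
    regex_rejected = True
```

In the first case the offsets give back the raw text, white space included,
so it can be handed to a second lexer. In the second case the outer lexer
raises before the parser ever sees a closing token, so there are no two
boundary tokens to extract between. The outer lexer would need some
context-sensitive rule for where a regex literal can start.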