[antlr-interest] Context-sensitive lexing

Mon Nov 19 01:41:04 PST 2007

At 22:07 19/11/2007, shmuel siegel wrote:
 >But it can be much more difficult when the inner language
 >understands constructs that the outer language doesn't.
 >Consider regular expressions in javascript. If the first
 >lexer dealt with the input stream, there can very well be
 >white space. Also there can be sequences that the outer
 >lexer will reject, like +-+/*.

Well, the whitespace doesn't matter, since the second lexer reads 
the input character stream directly (so any whitespace will be 
preserved, not skipped).
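A minimal sketch of why this holds, assuming a hypothetical outer 
lexer that skips whitespace but records each token's start/stop 
character offsets (not ANTLR's API, just plain Python). Rebuilding 
the region from the outer tokens loses the original spacing; slicing 
the input between the offsets keeps it.

```python
import re

def outer_tokens(src):
    """Hypothetical outer lexer: one token per non-blank run.
    Whitespace is skipped, but each token keeps its (start, stop) offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", src)]

def rejoin(tokens):
    # What an inner lexer would see if it were fed the outer tokens.
    return " ".join(text for text, _, _ in tokens)

def raw_slice(src, first, last):
    # What an inner lexer sees if it reads the input directly,
    # from the start of the first token to the end of the last one.
    return src[first[1]:last[2]]

src = "begin a  +\t b end"
toks = outer_tokens(src)
inner = toks[1:-1]  # tokens between the 'begin' and 'end' markers
print(repr(rejoin(inner)))                        # 'a + b'
print(repr(raw_slice(src, inner[0], inner[-1])))  # 'a  +\t b'
```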

But you're right, your first lexer will still have to be able to 
produce some kind of token sequence, and your first parser will 
have to recognise the appropriate boundary markers so it knows 
which region to pass along to the second lexer/parser.  Depending 
on the structure of your input language, this might be simple or 
it might be complicated.
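As a rough sketch of that two-stage approach, assume a hypothetical 
outer syntax `embed { ... }` (invented for illustration). The outer 
lexer gets a catch-all OTHER rule so sequences it doesn't understand, 
like +-+/*, still come out as tokens rather than errors. The outer 
"parser" spots the `embed` and `{` markers, finds the closing `}`, 
and hands the raw characters in between to a hypothetical inner 
lexer.

```python
import re

# Hypothetical outer token rules; OTHER catches any character the
# outer language doesn't understand, so the outer lexer never rejects it.
OUTER = re.compile(
    r"(?P<WS>\s+)|(?P<ID>[A-Za-z_]\w*)|(?P<LBRACE>\{)|(?P<RBRACE>\})|(?P<OTHER>.)",
    re.S,
)

def outer_lex(src):
    """Yield (kind, start, stop) for each token; whitespace is skipped."""
    for m in OUTER.finditer(src):
        if m.lastgroup != "WS":
            yield m.lastgroup, m.start(), m.end()

def find_embedded(src):
    """Hypothetical outer parser rule: ID('embed') LBRACE ... RBRACE.
    Returns the raw input text between the braces for each block."""
    toks = list(outer_lex(src))
    regions = []
    i = 0
    while i < len(toks):
        kind, start, stop = toks[i]
        if (kind == "ID" and src[start:stop] == "embed"
                and i + 1 < len(toks) and toks[i + 1][0] == "LBRACE"):
            body_start = toks[i + 1][2]
            j = i + 2
            while j < len(toks) and toks[j][0] != "RBRACE":
                j += 1
            if j == len(toks):
                raise SyntaxError("unterminated embed block")
            regions.append(src[body_start:toks[j][1]])
            i = j + 1
        else:
            i += 1
    return regions

def inner_lex(text):
    # Hypothetical inner lexer: one token per non-blank character.
    return [c for c in text if not c.isspace()]

for body in find_embedded("x embed { +-+/* } y"):
    print(repr(body), inner_lex(body))
# ' +-+/* ' ['+', '-', '+', '/', '*']
```

This is the "simple" case: the boundary is just the first `}`. If 
the inner language could itself contain `}`, the outer parser would 
need a smarter rule to find the real end.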

Certainly it's easier if you can capture the whole thing in a 
single lexer token and process it that way, but that's not always 
possible.  I was just trying to point out that it's not impossible 
to do it the other way :)
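For comparison, a sketch of the single-token approach, again using 
the hypothetical `embed { ... }` syntax: the outer lexer has an EMBED 
rule that swallows the whole block as one token, and its text (with 
whitespace intact) is then given to a separate inner lexer. This only 
works because the outer lexer can find the end (`}`) without 
understanding the inner language, which is exactly the condition that 
doesn't always hold.

```python
import re

# Hypothetical outer lexer: EMBED is tried first, so "embed { ... }"
# becomes a single token instead of ID + braces + body tokens.
OUTER = re.compile(
    r"(?P<EMBED>embed\s*\{(?P<body>[^}]*)\})"
    r"|(?P<WS>\s+)|(?P<ID>[A-Za-z_]\w*)|(?P<OTHER>.)",
    re.S,
)

def outer_lex(src):
    """Yield (kind, text); an embedded block arrives as one EMBED token
    whose text is the raw body between the braces."""
    for m in OUTER.finditer(src):
        if m.group("EMBED") is not None:
            yield "EMBED", m.group("body")
        elif m.group("WS") is None:
            kind = "ID" if m.group("ID") is not None else "OTHER"
            yield kind, m.group()

def inner_lex(text):
    # Hypothetical inner lexer: one token per non-blank character.
    return [c for c in text if not c.isspace()]

tokens = list(outer_lex("x embed { +-+/* } y"))
print(tokens)  # [('ID', 'x'), ('EMBED', ' +-+/* '), ('ID', 'y')]
print(inner_lex(tokens[1][1]))  # ['+', '-', '+', '/', '*']
```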