[antlr-interest] Context-sensitive lexing
Gavin Lambert
antlr at mirality.co.nz
Mon Nov 19 01:41:04 PST 2007
At 22:07 19/11/2007, shmuel siegel wrote:
>But it can be much more difficult when the inner language
>understands constructs that the outer language doesn't.
>Consider regular expressions in javascript. If the first
>lexer dealt with the input stream, there can very well be
>white space. Also there can be sequences that the outer
>lexer will reject, like +-+/*.
Well, the whitespace doesn't matter, since the second lexer reads
the input stream directly (so any whitespace will be preserved,
not skipped).
But you're right, your first lexer will still have to be able to
produce some kind of token sequence and your first parser will
have to recognise the appropriate boundary markers to pass along
to the second lexer/parser. Depending on your input language
structure, this might be simple or it might be complicated.
Certainly it's easier if you can capture the whole thing in a
single lexer token and process it that way, but that's not always
possible. I was just trying to point out that it's not impossible
to do it the other way :)
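
As a rough illustration of the two-lexer arrangement described
above, here is a minimal sketch in plain Python rather than ANTLR.
Everything in it is an assumption made up for the example: the
`{%` and `%}` boundary markers, the token names, and the function
names are all hypothetical. The outer lexer skips whitespace and
has a catch-all rule so that it never rejects a sequence like
+-+/*; the outer "parser" only looks for the boundary markers and
then slices the original input between them for the inner lexer.

```python
import re

# Outer lexer (hypothetical token set). Whitespace is skipped, and the
# catch-all OTHER rule stops the lexer from rejecting characters that
# only the inner language understands.
OUTER = re.compile(r"""
    (?P<WS>\s+)
  | (?P<OPEN>\{%)
  | (?P<CLOSE>%\})
  | (?P<ID>[A-Za-z_]\w*)
  | (?P<OP>[=;])
  | (?P<OTHER>.)
""", re.VERBOSE | re.DOTALL)

def outer_tokens(text):
    """Return (kind, start, end) triples, dropping whitespace tokens."""
    tokens = []
    for m in OUTER.finditer(text):
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.start(), m.end()))
    return tokens

def embedded_regions(text, tokens):
    """Stand-in for the outer parser: find OPEN ... CLOSE pairs and
    slice the raw input between them, so skipped whitespace survives."""
    regions = []
    open_end = None
    for kind, start, end in tokens:
        if kind == "OPEN" and open_end is None:
            open_end = end
        elif kind == "CLOSE" and open_end is not None:
            regions.append(text[open_end:start])
            open_end = None
    return regions

# Inner lexer (also hypothetical): whitespace is significant here, so it
# is kept as a token instead of being skipped.
INNER = re.compile(r"(?P<SPACE>\s+)|(?P<SYM>[^\s\w])|(?P<WORD>\w+)")

def inner_tokens(fragment):
    """Return (kind, text) pairs for one embedded fragment."""
    return [(m.lastgroup, m.group()) for m in INNER.finditer(fragment)]

source = "x = {% a +-+/* b %};"
regions = embedded_regions(source, outer_tokens(source))
inner = [inner_tokens(r) for r in regions]
```

The point of the design is that the outer token stream is only
used to locate the boundaries. The inner lexer never sees the
outer tokens, only the character range from the original input,
which is why the whitespace the outer lexer skipped is still
there for it.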