[antlr-interest] Context-sensitive lexing

Sun Nov 18 23:35:58 PST 2007

At 15:55 19/11/2007, Steve Bennett wrote:
 >What's the general solution when you need to switch lexers
 >midstream? In the classic C case, for example, an asm {...} block
 >lexes and parses differently from normal code. A "mov" would be a
 >special token inside the asm block, but would be nothing in
 >particular outside it.

So, have a look at the "island grammars" example.  And remember
that since lexing occurs before parsing, you can't use any parser
context to influence this changeover.
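This is not the ANTLR island-grammar example itself. It is a minimal hand-written Java sketch of the same idea. The lexer keeps its own mode flag and decides when it has entered an "asm { ... }" island, without any help from the parser. The names IslandLexer, Mode, lex and the token types ASM, MOV, ID, LBRACE, RBRACE and OTHER are hypothetical. The sketch assumes the asm block contains no nested braces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mode-switching lexer: the lexer alone tracks whether it is
// inside an "asm { ... }" island, so "mov" is a keyword only there.
// Assumes no nested braces inside the asm block.
public class IslandLexer {
    enum Mode { NORMAL, ASM }

    record Token(String type, String text) {
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    static List<Token> lex(String input) {
        List<Token> out = new ArrayList<>();
        Mode mode = Mode.NORMAL;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; } // whitespace is skipped
            if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                String word = input.substring(start, i);
                if (mode == Mode.NORMAL && word.equals("asm")) {
                    out.add(new Token("ASM", word));
                } else if (mode == Mode.ASM && word.equals("mov")) {
                    out.add(new Token("MOV", word)); // special only inside the island
                } else {
                    out.add(new Token("ID", word));
                }
                continue;
            }
            if (c == '{') {
                // Enter the island only when "{" directly follows the asm keyword.
                boolean afterAsm = !out.isEmpty() && out.get(out.size() - 1).type().equals("ASM");
                if (mode == Mode.NORMAL && afterAsm) mode = Mode.ASM;
                out.add(new Token("LBRACE", "{"));
            } else if (c == '}') {
                if (mode == Mode.ASM) mode = Mode.NORMAL; // leave the island
                out.add(new Token("RBRACE", "}"));
            } else {
                out.add(new Token("OTHER", String.valueOf(c)));
            }
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("mov = 1; asm { mov eax, 1 } mov"));
    }
}
```

Only the middle "mov" comes out as MOV. The other two are plain IDs, because the decision is made from characters the lexer has already consumed, not from anything the parser knows.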

 >In normal text, almost anything goes. In an image tag,
 >lots of words have special meanings. In a table, suddenly
 >|- is a special token. In a template call, | is special. If I
 >can't actually tokenise any of these things (because they
 >don't have meaning everywhere), I seem to be back to testing
 >regular expressions on input.LT(1).getText() ?

Not necessarily.  You can tokenise them as barebones (eg. PIPE and 
HYPHEN) and then figure out whether it means something special in 
the parser.  You'll need to be careful, though, if you're creating
any hidden or off-channel tokens (eg. comments or whitespace),
since the parser will ignore them and happily treat "| -" exactly 
the same as "|-" (if you're hiding whitespace).  So you'll either 
need to avoid hiding things or create separate tokens for your 
composites (eg. PIPEHYPHEN), which will look a bit messier.
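The pitfall described above can be illustrated with a small hand-written Java sketch. It is not ANTLR's runtime API. It only mimics the idea of a hidden channel by tagging whitespace tokens and filtering them out of what the "parser" sees. PIPE, HYPHEN and PIPEHYPHEN come from the text above. The names WS, WORD, OTHER, lex and parserView are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of barebones tokens plus a hidden whitespace channel.
public class PipeHyphenDemo {
    static final int DEFAULT = 0, HIDDEN = 1;

    record Token(String type, String text, int channel) {}

    // When composite is true, "|-" is matched as one PIPEHYPHEN token
    // (longest match); otherwise it lexes as PIPE followed by HYPHEN.
    static List<Token> lex(String input, boolean composite) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (composite && input.startsWith("|-", i)) {
                out.add(new Token("PIPEHYPHEN", "|-", DEFAULT));
                i += 2;
            } else if (c == '|') {
                out.add(new Token("PIPE", "|", DEFAULT));
                i++;
            } else if (c == '-') {
                out.add(new Token("HYPHEN", "-", DEFAULT));
                i++;
            } else if (Character.isWhitespace(c)) {
                int start = i;
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) i++;
                out.add(new Token("WS", input.substring(start, i), HIDDEN)); // hidden
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length() && Character.isLetter(input.charAt(i))) i++;
                out.add(new Token("WORD", input.substring(start, i), DEFAULT));
            } else {
                out.add(new Token("OTHER", String.valueOf(c), DEFAULT));
                i++;
            }
        }
        return out;
    }

    // What the parser sees: the types of default-channel tokens only.
    static List<String> parserView(List<Token> tokens) {
        List<String> types = new ArrayList<>();
        for (Token t : tokens) {
            if (t.channel() == DEFAULT) types.add(t.type());
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(parserView(lex("|-", false)));  // [PIPE, HYPHEN]
        System.out.println(parserView(lex("| -", false))); // [PIPE, HYPHEN]
        System.out.println(parserView(lex("|-", true)));   // [PIPEHYPHEN]
        System.out.println(parserView(lex("| -", true)));  // [PIPE, HYPHEN]
    }
}
```

Without the composite token, "|-" and "| -" reach the parser as the same PIPE HYPHEN sequence. With PIPEHYPHEN, the two inputs stay distinct, at the cost of an extra token type. The other option mentioned above, not hiding whitespace, would keep WS on the default channel and leave the adjacency check to the parser.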