[antlr-interest] How do you structure a two-part lexer?

Fri May 29 13:18:12 PDT 2009

Can anyone offer advice about how to recognise a language with two or
more very distinct lexing modes? The sort of thing you see in a PHP
script, where you alternate between HTML and PHP code. Something like:

    script: html ('<?' php '?>' html)*;

The problem is that each language has a very different token set:
while HTML might have tokens like LT, GT, and TAGNAME, PHP will have
ID, SEMICOLON, etc.
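
To make the two modes concrete, here is a minimal hand-written sketch in
Python rather than ANTLR. It is a single scanner that swaps its token set
when it sees '<?' or '?>'. The token names PHP_OPEN, PHP_CLOSE, TEXT and
WS, the regular expressions, and the function name tokenize are all
assumptions made up for this illustration; they are not taken from any
real HTML or PHP grammar.

```python
import re

# Hypothetical token sets, one per mode. Only LT, GT, TAGNAME, ID and
# SEMICOLON come from the post; everything else is assumed for the sketch.
MODES = {
    "html": [
        ("PHP_OPEN", r"<\?"),                  # switches to PHP mode
        ("LT", r"<"),
        ("GT", r">"),
        ("TAGNAME", r"[A-Za-z][A-Za-z0-9]*"),
        ("TEXT", r"[^<>A-Za-z]+"),
    ],
    "php": [
        ("PHP_CLOSE", r"\?>"),                 # switches back to HTML mode
        ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
        ("SEMICOLON", r";"),
        ("WS", r"\s+"),                        # skipped, never emitted
    ],
}


def tokenize(text):
    """Return (mode, token_name, lexeme) triples, starting in HTML mode."""
    mode, pos, out = "html", 0, []
    while pos < len(text):
        # Try the current mode's rules in order; first match wins.
        for name, pattern in MODES[mode]:
            m = re.compile(pattern).match(text, pos)
            if m:
                break
        else:
            raise ValueError(
                f"unexpected {text[pos]!r} in {mode} mode at offset {pos}"
            )
        if name != "WS":
            out.append((mode, name, m.group()))
        # The delimiters are the only tokens that change the active mode.
        if name == "PHP_OPEN":
            mode = "php"
        elif name == "PHP_CLOSE":
            mode = "html"
        pos = m.end()
    return out


if __name__ == "__main__":
    for token in tokenize("<p><? x; ?><p>"):
        print(token)
```

Note that the same characters lex differently depending on the mode: a
bare word is a TAGNAME in HTML mode but an ID in PHP mode, which is why a
single flat token set does not work.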

So should I go for a single lexer? Two lexers feeding into a single
parser? Two parsers? I have no idea how to go about interlacing
languages like this. Any advice would be greatly appreciated.

Steve Cooper.