[antlr-interest] Changing lexers mid-stream

Dennis Sosnoski dms at sosnoski.com
Wed Jan 12 01:05:08 PST 2011


On 01/12/2011 08:54 PM, Bart Kiers wrote:
> On Wed, Jan 12, 2011 at 8:44 AM, Dennis Sosnoski <dms at sosnoski.com> wrote:
>
>     I'm wondering if there's an easy way to switch back and forth between
>     different lexers while processing a stream. I'm working with Java
>     language documents, and I'd like to be able to break down /**...*/
>     comments into components - but to do this, I'd need to switch modes
>     when I enter the start of a comment, and switch back to normal Java
>     lexer rules at the end of the comment.
>
>     ...
>
>
> Have you seen this:
> http://www.antlr.org/wiki/display/ANTLR3/Island+Grammars+Under+Parser+Control
> ?
>

No, I hadn't seen that. It does sound very similar to what I want,
though I'm not completely sure how to apply it.

In my case I'd like to have the comment token types extend those used in
the base grammar (so just adding a couple more token types). I'm
guessing that if I create a pair of lexers (one for the language grammar
with a start-of-comment token, the other for comments with an
end-of-comment token) I could switch between lexers on-the-fly by using
a filter that implements TokenSource:

             /---Language lexer--\
---Filter---<                     >--CharStream
             \---Comment lexer---/

Does that sound like the best approach?
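
To make the idea concrete, here is a minimal sketch of the filter in plain
Java. Everything in it is a stand-in I made up for illustration: Tok, Source,
Chars and SimpleLexer are not the ANTLR runtime classes or generated lexers,
and the token type numbers are arbitrary. It assumes both lexers read from the
same character stream and that neither reads past the token it returns.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: every type here is a hypothetical stand-in, not an ANTLR class. */
final class LexerSwitchSketch {
    // Hypothetical token types; the comment types extend the language types.
    static final int EOF = -1, WORD = 4, COMMENT_START = 5, COMMENT_TEXT = 6, COMMENT_END = 7;

    static final class Tok {
        final int type;
        final String text;
        Tok(int type, String text) { this.type = type; this.text = text; }
    }

    /** Stand-in for a token source: hands out one token per call. */
    interface Source { Tok nextToken(); }

    /** Shared character stream; both lexers advance the same position. */
    static final class Chars {
        final String s;
        int pos = 0;
        Chars(String s) { this.s = s; }
        boolean atEnd() { return pos >= s.length(); }
        boolean at(String lit) { return s.startsWith(lit, pos); }
        void skipSpaces() { while (!atEnd() && Character.isWhitespace(s.charAt(pos))) pos++; }
        /** Consumes characters up to whitespace, end of input, or the stop literal. */
        String takeUntil(String stop) {
            int start = pos;
            while (!atEnd() && !Character.isWhitespace(s.charAt(pos)) && !at(stop)) pos++;
            return s.substring(start, pos);
        }
    }

    /** Toy lexer: recognizes one marker literal, everything else is "text". */
    static final class SimpleLexer implements Source {
        private final Chars in;
        private final String marker;
        private final int markerType, textType;
        SimpleLexer(Chars in, String marker, int markerType, int textType) {
            this.in = in; this.marker = marker;
            this.markerType = markerType; this.textType = textType;
        }
        public Tok nextToken() {
            in.skipSpaces();
            if (in.atEnd()) return new Tok(EOF, "<EOF>");
            if (in.at(marker)) { in.pos += marker.length(); return new Tok(markerType, marker); }
            return new Tok(textType, in.takeUntil(marker));
        }
    }

    /** The filter: delegates to the current lexer and switches on the marker tokens. */
    static final class SwitchingSource implements Source {
        private final Source language, comment;
        private Source current;
        SwitchingSource(Source language, Source comment) {
            this.language = language; this.comment = comment; this.current = language;
        }
        public Tok nextToken() {
            Tok t = current.nextToken();
            if (t.type == COMMENT_START) current = comment;       // entering /** ... */
            else if (t.type == COMMENT_END) current = language;   // back to language rules
            return t;
        }
    }

    /** Runs the filter over the input and returns "type:text" for each token. */
    static List<String> tokenize(String input) {
        Chars in = new Chars(input);
        Source src = new SwitchingSource(
                new SimpleLexer(in, "/**", COMMENT_START, WORD),
                new SimpleLexer(in, "*/", COMMENT_END, COMMENT_TEXT));
        List<String> out = new ArrayList<>();
        for (Tok t = src.nextToken(); t.type != EOF; t = src.nextToken()) {
            out.add(t.type + ":" + t.text);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("int x /** @param y */ z"));
    }
}
```

The switch happens inside the filter, keyed only on the token type just
returned, so whatever consumes the tokens never needs to know that two lexers
are involved. This only works if the type codes of the two lexers don't
collide, which is what my second question below is about.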

To do this, I'd want to add several token types to those defined in the
language grammar, which I know I can do by adding them to the tokens { }
list. Then I'd want to use the actual token type values assigned in the
generated language lexer in my comment lexer (so that the token type
codes are all unique in the generated token stream seen from the
filter). How can I do that part?

Thanks,

  - Dennis

