[antlr-interest] Tokens that span across char streams

Stanislav Sokorac sokorac at gmail.com
Wed Aug 26 12:57:36 PDT 2009


I have a language that allows macros to be used just about anywhere, which
makes things a bit difficult. For example, a macro could define half a
string, and something like this is legal:

#define FOO "start of a string
String a = FOO end of a string";

If I do on-the-fly substitution of macros by switching char streams (using
the include file technique from the FAQ), the lexer cannot recognize the
string in the second line: it lexes the macro text, hits the EOF of that
stream, throws an exception ("couldn't match anything"), and then starts over
at the second half of the string, again not matching anything.

What's a good way to "smooth over" the EOF bump, and merge the streams into
one from the lexer's point of view? Do I need to implement a custom
CharStream to do something like this?
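
To make the idea concrete, here is a minimal sketch of the "merged stream"
behaviour I have in mind. It is plain Java and does not implement ANTLR's
CharStream interface; the class name StackedCharSource and the methods
pushMacroBody, la and consume are hypothetical names chosen for illustration.
A real implementation would also need whatever else the lexer expects from
its input (marking and rewinding, position tracking, and so on).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: a character source that hides the end of a nested macro buffer. */
final class StackedCharSource {
    static final int EOF = -1;

    /** One text buffer plus the read position inside it. */
    private static final class Frame {
        final String text;
        int pos = 0;
        Frame(String text) { this.text = text; }
    }

    // Top of the stack is the buffer currently being read.
    private final Deque<Frame> stack = new ArrayDeque<>();

    StackedCharSource(String mainText) {
        stack.push(new Frame(mainText));
    }

    /** Assumed to be called once a macro name has been recognized and consumed. */
    void pushMacroBody(String body) {
        stack.push(new Frame(body));
    }

    /** Look ahead k characters (k >= 1), continuing into outer buffers when needed. */
    int la(int k) {
        int remaining = k;
        for (Frame f : stack) { // iterates from the innermost buffer outwards
            int available = f.text.length() - f.pos;
            if (remaining <= available) {
                return f.text.charAt(f.pos + remaining - 1);
            }
            remaining -= available;
        }
        return EOF; // only reported when every buffer is exhausted
    }

    /** Consume one character, silently dropping nested buffers that have run out. */
    void consume() {
        while (stack.size() > 1 && stack.peek().pos >= stack.peek().text.length()) {
            stack.pop();
        }
        Frame f = stack.peek();
        if (f.pos < f.text.length()) {
            f.pos++;
        }
    }
}
```

With the macro body pushed on top of the remainder of the second line, a
caller that loops on la(1) and consume() sees the two halves of the string as
one uninterrupted character sequence, with EOF reported only at the end of
the outermost text.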

Of course, I could have a pre-process run that replaces all the macros, and
then run through the resulting code, but I'd like to avoid that because (1)
it's slow to go through the file twice, and (2) the character/line numbers
in tokens will be messed up in the second run and it'll take a bit of work
to bring them back to the original locations.
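
For reference, the "bit of work" in point (2) could look roughly like the
sketch below: while expanding, record which range of the expanded text came
from which offset in the original, then translate token offsets back
afterwards. The class ExpansionMap and its methods are hypothetical, and the
sketch assumes that every character of a macro expansion should map back to
the offset where the macro name was used.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: maps offsets in macro-expanded text back to the original text. */
final class ExpansionMap {
    private static final class Segment {
        final int outStart;     // start offset in the expanded text
        final int length;       // number of expanded characters covered
        final int srcStart;     // corresponding offset in the original text
        final boolean verbatim; // true if copied unchanged, false if from a macro body

        Segment(int outStart, int length, int srcStart, boolean verbatim) {
            this.outStart = outStart;
            this.length = length;
            this.srcStart = srcStart;
            this.verbatim = verbatim;
        }
    }

    private final List<Segment> segments = new ArrayList<>();

    /** Record a run of expanded text and where it came from in the original. */
    void record(int outStart, int length, int srcStart, boolean verbatim) {
        segments.add(new Segment(outStart, length, srcStart, verbatim));
    }

    /** Translate an expanded-text offset to an original offset, or -1 if unmapped. */
    int toOriginal(int outOffset) {
        for (Segment s : segments) {
            if (outOffset >= s.outStart && outOffset < s.outStart + s.length) {
                // Verbatim text keeps its relative position; expanded text
                // collapses onto the location of the macro use.
                return s.verbatim ? s.srcStart + (outOffset - s.outStart) : s.srcStart;
            }
        }
        return -1;
    }
}
```

Line and column numbers would then be recomputed from the original offsets,
which is exactly the extra bookkeeping I would prefer to avoid.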

Thanks,
Stan