[antlr-interest] Triggering a lexical "include" from the parser

Jesse McGrew jmcgrew at hansprestige.com
Fri Aug 24 12:53:39 PDT 2012

Hi all, I posted this question to StackOverflow
but I'd appreciate your input on it too, either here or over there...

The ANTLR website describes two approaches
(http://www.antlr.org/wiki/pages/viewpage.action?pageId=557057) to
implementing "include" directives. The first approach is to recognize
the directive in the lexer and include the file lexically (by pushing
the CharStream onto a stack and replacing it with one that reads the
new file); the second is to recognize the directive in the parser,
launch a sub-parser to parse the new file, and splice in the AST
generated by the sub-parser. Neither of these is quite what I need.
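For context, the first (lexer-level) approach boils down to a stack of character sources: push the current source when an include is seen, pop back when the included file runs out. Here is a plain-Java sketch of just that mechanism, with invented names (CharSourceStack, pushSource) rather than ANTLR's actual CharStream API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hedged sketch of the wiki's first approach, reduced to plain Java.
// The lexer would own one of these; names are illustrative, not ANTLR's.
class CharSourceStack {
    private static class Source {
        final String text;
        int pos;
        Source(String text) { this.text = text; }
    }

    private final Deque<Source> stack = new ArrayDeque<>();

    CharSourceStack(String mainText) {
        stack.push(new Source(mainText));
    }

    // Called when the lexer recognizes an include directive.
    void pushSource(String includedText) {
        stack.push(new Source(includedText));
    }

    // Next character, or -1 only at the true end of the main file.
    int read() {
        while (!stack.isEmpty()) {
            Source top = stack.peek();
            if (top.pos < top.text.length()) {
                return top.text.charAt(top.pos++);
            }
            stack.pop(); // included file done: resume the including file
        }
        return -1;
    }
}
```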

In the language I'm parsing, recognizing the directive in the lexer is
impractical for a few reasons:

* There is no self-contained character pattern that always means "this
is an include directive". For example, 'Include "foo";' at top level
is an include directive, but in 'Array bar --> Include "foo";' or
'Constant Include "foo";' the word 'Include' is an identifier.
* The name of the file to include may be given as a string or as a
constant identifier, and such constants can be defined with
arbitrarily complex expressions. (The Constant directive supports
arbitrary expressions in general, as long as they can be evaluated at
compile time; there are no compile-time string operators, though, so in
this specific case the expression is either a quoted string or another
constant identifier.)
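To make the ambiguity concrete, here is an illustrative ANTLR-style grammar fragment (all rule names invented, not from my actual grammar): only the surrounding parser context tells the directive apart from the identifier, so the lexer cannot make 'Include' a dedicated token.

```antlr
// Illustrative only; rule names are invented.
include_directive
    : '#'? 'Include' (STRING | name) ';'   // Include "foo";  Include FOO;
    ;

constant_directive
    : 'Constant' name expr? ';'            // Constant Include "foo";
    ;

name
    : ID
    | 'Include'                            // 'Include' is a legal identifier
    ;
```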

So I want to trigger the inclusion from the parser. But to perform the
inclusion, I can't launch a sub-parser and splice the AST together; I
have to splice the tokens. It's legal for a block to begin with '{' in
the main file and be terminated by '}' in the included file. A file
included inside a function can even close the function definition and
start a new one.

It seems like I'll need something like the first approach but at the
level of TokenStreams instead of CharStreams. Is that a viable
approach? How much state would I need to keep on the stack, and how
would I make the parser switch back to the original token stream
instead of terminating when it hits EOF? Or is there a better way to
handle this?
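The kind of thing I have in mind can be sketched in plain Java (again with invented names; StackedTokenSource is not an ANTLR class): a single token source the parser reads from, which pushes the included file's tokens when a parser action recognizes an Include directive, and pops back to the including file instead of emitting EOF:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hedged sketch, no real ANTLR types: tokens are just strings here.
// The parser pulls from nextToken(); only the outermost EOF is reported.
class StackedTokenSource {
    static final String EOF = "<EOF>";

    private final Deque<Iterator<String>> stack = new ArrayDeque<>();

    StackedTokenSource(List<String> mainTokens) {
        stack.push(mainTokens.iterator());
    }

    // Called from a parser action once it has recognized an include
    // directive and resolved the file name to a token list.
    void push(List<String> includedTokens) {
        stack.push(includedTokens.iterator());
    }

    // Next token, crossing file boundaries transparently.
    String nextToken() {
        while (!stack.isEmpty()) {
            Iterator<String> top = stack.peek();
            if (top.hasNext()) {
                return top.next();
            }
            stack.pop(); // included file exhausted: resume the caller
        }
        return EOF;      // real end of the main file
    }
}
```

One complication I can already see: ANTLR 3's CommonTokenStream buffers the whole token stream eagerly, so wiring something like this in would presumably require an unbuffered token stream, or inserting the included tokens into the buffer directly.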

Here's an example of the language, demonstrating that blocks opened in
the main file can be closed in the included file (and vice versa).
Note that the '#' before 'Include' is required when the directive is
inside a function, but optional outside.


    ! main file:
    [ Main;
      print "This is Main!";
      if (0) {
      #include "other.h";
      print "This is OtherFunction!";
    ];  ! end OtherFunction

    ! other.h:
      } ! end if
    ];  ! end Main

    [ OtherFunction;
