[antlr-interest] Multiplexing questions

xadeck decoret at graphics.lcs.mit.edu
Tue Aug 31 08:02:17 PDT 2004


I am stuck with multiplexing, and I also have a few miscellaneous small
questions for which I would appreciate your help.

Let's consider the following text file that I want to parse:

java
{
 // some java code
}
c++
{
 // some c++ code
}

The structure is pretty obvious. Assume I already have a java lexer/parser
and a cxx lexer/parser. To reuse them, I wrote the following
grammar:
options {
   language="Cpp";
}
//++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
class multiparser extends Parser;
options {}

file
   : (javablock | cxxblock)* EOF
   ;
javablock
   : JAVA LBRACE
       {
           javaparser p(getInputState());
           p.parse();
       }
       RBRACE
   ;
cxxblock
   : CXX LBRACE
       {
           cppparser p(getInputState());
           p.parse();
       }
       RBRACE
   ;
//++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
class multilexer extends Lexer;
options
{
}

LBRACE: '{';
RBRACE: '}';
JAVA:   "java";
CXX:    "c++";

Now for the first problem. From what I understand of the example given
in the manual, the lexer above should switch lexers through a selector,
pushing the java lexer when a javablock starts and pushing the cxx lexer
when a cxxblock starts. I also understand that switching lexers should
*not* be done in the parser (as was pointed out in a recent thread about
"rewinding tokens"). So in my example, I should switch the selector
somewhere in the lexer, probably in the LBRACE rule:
LBRACE: '{'
                {
                     selector.push("???");
                }
        ;
The problem is that, inside that rule, I cannot know which lexer to select!
That is semantic, parser-level information. And I do not want the lexer to
keep a pointer to the parser, do I?
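One way to keep that choice inside the lexer, sketched here as plain C++ rather than as ANTLR actions: the multilexer remembers which keyword (JAVA or CXX) it matched last, and the LBRACE action pushes the corresponding lexer. `Selector`, `MultiLexer`, `onKeyword`, `onLBrace` and `lexersPushedFor` are hypothetical names; the selector is a toy stack of lexer names, not ANTLR's own selector class, and the immediate pop in the driver only stands in for the sub-lexer finishing its block.

```cpp
#include <cassert>
#include <stack>
#include <string>
#include <vector>

// Toy stand-in for a token stream selector: a stack of lexer names.
// Hypothetical; NOT the ANTLR runtime class, it only models push/pop.
struct Selector {
    std::stack<std::string> active;
    void push(const std::string& name) { active.push(name); }
    void pop() {
        assert(!active.empty() && "pop() on an empty selector");
        active.pop();
    }
    const std::string& current() const {
        assert(!active.empty() && "no active lexer");
        return active.top();
    }
};

// Hypothetical multilexer state: remembers the last block keyword it matched,
// so the LBRACE action can pick a sub-lexer without asking the parser.
struct MultiLexer {
    Selector& selector;
    std::string pendingLexer;  // "java", "cxx", or empty

    explicit MultiLexer(Selector& s) : selector(s) {}

    // Action for the JAVA and CXX rules (only ever called with "java" or "c++").
    void onKeyword(const std::string& kw) {
        pendingLexer = (kw == "java") ? "java" : "cxx";
    }

    // Action for the LBRACE rule: push the lexer announced by the keyword.
    // Returns false for a '{' that no keyword announced (a syntax error here).
    bool onLBrace() {
        if (pendingLexer.empty()) return false;
        selector.push(pendingLexer);
        pendingLexer.clear();
        return true;
    }
};

// Feeds top-level lexemes through the hooks and records which lexer each
// '{' pushed. The immediate pop stands in for the sub-lexer closing its block.
std::vector<std::string> lexersPushedFor(const std::vector<std::string>& lexemes) {
    Selector sel;
    sel.push("multi");
    MultiLexer lex(sel);
    std::vector<std::string> pushed;
    for (const std::string& lx : lexemes) {
        if (lx == "java" || lx == "c++") {
            lex.onKeyword(lx);
        } else if (lx == "{" && lex.onLBrace()) {
            pushed.push_back(sel.current());
            sel.pop();
        }
    }
    return pushed;
}
```

In a real grammar, `pendingLexer` would presumably be a member of the multilexer class, set from actions in the JAVA and CXX rules. This assumes each keyword and its `{` are always lexed by the multilexer in that order, which holds for the input format above.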

The second problem is similar. Again from the example, I understand
that popping the selector should be done in the java/cxx lexers. The
problem, again, is that those lexers do not know when the java/cxx part
is finished. In the manual example, the end of a "lexer part" is marked
by a very specific token (such as //@} ), so the lexers can catch it and
pop the selector. But in my example, the end of the block is marked by },
which is also a regular (and widely used!) token of the java/cxx
grammars. So I *have to* pop the selector in the parser!?!? Am I right?
But in that case it does not work (I have tried without success) because
of token lookahead: as I understand it, the "closing }" is consumed by
the java/cxx lexer and is therefore never seen by the multilexer when the
selector switches back to it.
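For the closing brace, one common approach (sketched here under stated assumptions, not taken from the manual) is to let the sub-lexer count brace nesting: the `}` met at depth zero is the one closing the enclosing block, so the sub-lexer emits it itself and then signals that the selector should pop. Because that `}` is returned by the sub-lexer, the multilexer never has to re-read it. `Token`, `BlockLexer` and `lexBlockBody` are hypothetical; the toy tokenizer splits only on whitespace and braces and does not skip braces inside strings or comments, which a real java/c++ lexer would have to do.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token: type name and matched text only.
struct Token {
    std::string type;
    std::string text;
};

// Toy sub-lexer (standing in for the java or c++ lexer) that tracks brace depth.
class BlockLexer {
public:
    explicit BlockLexer(const std::string& input) : in_(input) {}

    // Returns the next token; once the block's own closing '}' has been
    // returned, done() becomes true and the caller should pop the selector.
    Token next() {
        assert(!done_ && "sub-lexer used after its block was closed");
        while (pos_ < in_.size() && std::isspace(static_cast<unsigned char>(in_[pos_]))) ++pos_;
        if (pos_ >= in_.size()) return {"EOF", ""};
        char c = in_[pos_++];
        if (c == '{') { ++depth_; return {"LBRACE", "{"}; }
        if (c == '}') {
            if (depth_ == 0) { done_ = true; }  // closes the enclosing javablock/cxxblock
            else { --depth_; }                  // closes a nested java/c++ brace
            return {"RBRACE", "}"};
        }
        // Anything else: one token per run of non-brace, non-space characters.
        std::size_t start = pos_ - 1;
        while (pos_ < in_.size() && in_[pos_] != '{' && in_[pos_] != '}' &&
               !std::isspace(static_cast<unsigned char>(in_[pos_]))) ++pos_;
        return {"WORD", in_.substr(start, pos_ - start)};
    }

    bool done() const { return done_; }
    std::size_t position() const { return pos_; }  // where the multilexer resumes

private:
    std::string in_;
    std::size_t pos_ = 0;
    int depth_ = 0;
    bool done_ = false;
};

// Lexes a block body (text after the opening '{') up to and including the
// block's closing '}', and reports where the multilexer should resume.
std::vector<std::string> lexBlockBody(const std::string& body, std::size_t* resumeAt) {
    BlockLexer lex(body);
    std::vector<std::string> texts;
    while (!lex.done()) {
        Token t = lex.next();
        if (t.type == "EOF") break;  // unterminated block
        texts.push_back(t.text);
    }
    if (resumeAt) *resumeAt = lex.position();
    return texts;
}
```

With ANTLR, the `done()` check would become a selector pop in the sub-lexer's right-brace rule. This assumes the java/cxx lexers use the same token type for `}` as the multilexer's RBRACE, so that the multiparser's `RBRACE` match after `p.parse()` still succeeds.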

Any help on how to solve this (I believe rather simple) example? 



 