[antlr-interest] Returning from a sub-parser with no end token.

Thu Dec 27 00:41:01 PST 2001

I use sub-lexers and sub-parsers to decode multiple formats in a single file.
This is a great ANTLR feature, as I can parse radically different portions of the input
without writing a complicated multi-format parser.

But I see some strange behavior
 when I want the sub-parser to match an optional end token:

parse: (B)+ (BBB)?

The rule should return when the token following B is not BBB, right?
So why does it try ALL the tokens until it matches a BBB?

Rémi

P.S. Here is a complete working example (two files Main_Parser.g and B_Parser.g):
-----
Main parser and lexer (file Main_Parser.g):
-----
// Main parser ------
class Main_Parser extends Parser;
{
  static antlr.TokenStreamSelector selector = new antlr.TokenStreamSelector();

  public static void main(String[] args) {
    try {
      // This string simulates the real file
      java.io.StringReader input = new java.io.StringReader(
        "AAA aa BBB b bb BBB AAA a a BBB b b BBB\n" // with end BBB
      + "AAA aa BBB b bb AAA a BBB b b");           // without end BBB

      Main_Lexer main = new Main_Lexer(input);
      B_Lexer b_lexer = new B_Lexer(main.getInputState());
      selector.addInputStream(main, "main");
      selector.addInputStream(b_lexer, "BBB");
      selector.select("main");

      Main_Parser parser = new Main_Parser(selector);
      parser.parse();
    }
    catch(Exception e) {
      System.err.println("exception: "+e);
      e.printStackTrace(System.err);
    }
    try {System.in.read();} catch (Exception ex) {}
  }
}

// Only one rule in the Parser for simplicity ------
parse:
 ( BBB // Matching BBB switches to B_Parser with B_Lexer
  {
   selector.push("BBB");
   B_Parser b_parser = new B_Parser(getInputState());
   b_parser.parse();
   selector.pop();
   setInputState(b_parser.getInputState());
   System.out.println("BBB parsing complete");
  }
 | ( AAA (A)+ ) // Matching AAA is done within this parser for simplicity
   {System.out.println("AAA parsing complete");}
 )+ ;

// Main lexer ------
class Main_Lexer extends Lexer;
options {  filter = WS;  k =2; }
AAA: "AAA" ;
A: 'a' ;
BBB: "BBB" ;
protected WS: (' ' | '\t' | ('\n' | '\r' |"\r\n") {newline();} ) { _ttype = Token.SKIP; } ;

-----
Sub-parser and sub-lexer (file B_Parser.g):
-----
// B parser ------
class B_Parser extends Parser;

parse: (B)+ (BBB)? // This is OK if there is a terminating BBB token.
   // If there is no token to signal the end of the sub-parser's input,
   // this fails because the lexer consumes all invalid chars.
 ;

// B lexer ------
// How can I tell this lexer to send EOF if the char is not in its vocabulary?
// I wish I could push back the char which raises a NoViableAltForCharException and return
// Token.EOF_TYPE.
class B_Lexer extends Lexer;
options {
 filter = WS;
 k =2;
}
BBB: "BBB" ;
B: 'b'  ;
protected WS: (' ' | '\t' | ('\n' | '\r' |"\r\n") {newline();} ) { _ttype = Token.SKIP; } ;
-----
Compiling and running:
-----
// Compile:
java -classpath E:\antlr-2.7.1 antlr.Tool Main_Parser.g
java -classpath E:\antlr-2.7.1 antlr.Tool B_Parser.g
javac -classpath E:\antlr-2.7.1\antlr.jar *.java
// Run:
java -classpath E:\antlr-2.7.1\antlr.jar;. Main_Parser
AAA parsing complete // These ones are OK
BBB parsing complete
AAA parsing complete
BBB parsing complete
AAA parsing complete
line 2: unexpected char: A // Here, in the sub-parser, (BBB)? tries and consumes unexpected chars
line 2: unexpected char: A
line 2: unexpected char: A
line 2: unexpected char: a
BBB parsing complete  // Until it finds BBB and returns.
line 2: unexpected char: b // Of course, the Main_Parser doesn't recognize 'b'
line 2: unexpected char: b
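
To make the trace above easier to reason about, here is a small plain-Java toy model of the two lexer policies being contrasted. It is only a sketch and does not use ANTLR at all: ToyBLexer, parseB and describe are hypothetical names invented for this illustration, and the "stop at unknown char" policy is the behavior I am wishing for, not something I know ANTLR 2.7.1 to provide. The input is the part of line 2 that follows the first BBB.

```java
public class SubLexerDemo {

    /** Toy stand-in for B_Lexer: it knows only "BBB", 'b' and whitespace. */
    static final class ToyBLexer {
        final String in;
        final boolean stopAtUnknown;
        final StringBuilder skipped = new StringBuilder(); // chars reported as unexpected
        int pos = 0;

        ToyBLexer(String in, boolean stopAtUnknown) {
            this.in = in;
            this.stopAtUnknown = stopAtUnknown;
        }

        String nextToken() {
            while (pos < in.length()) {
                char c = in.charAt(pos);
                if (Character.isWhitespace(c)) { pos++; continue; }   // filtered
                if (in.startsWith("BBB", pos)) { pos += 3; return "BBB"; }
                if (c == 'b') { pos++; return "B"; }
                if (stopAtUnknown) return "EOF"; // wished-for policy: leave c unread
                skipped.append(c);               // observed policy: report, consume, retry
                pos++;
            }
            return "EOF";
        }
    }

    /** Toy stand-in for parse: (B)+ (BBB)? ; returns the unread rest of the input. */
    static String parseB(ToyBLexer lexer) {
        String la = lexer.nextToken();
        if (!la.equals("B")) {
            throw new IllegalStateException("expected B, found " + la);
        }
        while (la.equals("B")) {
            la = lexer.nextToken(); // deciding on (BBB)? needs one more token
        }
        // If la is "BBB" the optional end token was consumed; otherwise la is "EOF".
        return lexer.in.substring(lexer.pos);
    }

    /** Runs the toy sub-parser and summarizes what was skipped and what is left. */
    static String describe(String input, boolean stopAtUnknown) {
        ToyBLexer lexer = new ToyBLexer(input, stopAtUnknown);
        String rest = parseB(lexer);
        return "skipped=[" + lexer.skipped + "] rest=[" + rest + "]";
    }

    public static void main(String[] args) {
        String afterFirstBBB = " b bb AAA a BBB b b";
        System.out.println(describe(afterFirstBBB, false)); // skipped=[AAAa] rest=[ b b]
        System.out.println(describe(afterFirstBBB, true));  // skipped=[] rest=[AAA a BBB b b]
    }
}
```

With the first policy the toy skips A, A, A and a, exactly the four "unexpected char" lines above, and hands back only " b b". With the second it stops in front of "AAA", which is where the main lexer would need to resume.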
