[antlr-interest] How to measure character distance from the file start?

Georgios Petasis petasis at iit.demokritos.gr
Wed Apr 6 11:04:16 PDT 2005


Dear list members,

I have written a grammar that "filters" free text to locate
text fragments that obey the grammar. This works like
the "filter" example in the ANTLR distribution.
However, I don't know how to get the distance of an
identified text fragment from the start of the text.
Because my lexer has a rule that skips anything not matched
by another lexer rule, I cannot tell the character offset
of the first token that finally matched my grammar.
I have attached a tiny fraction of my grammar
and the code that uses it. The real grammar is quite complex
(in fact I have to turn AST generation off to get the generated
files to compile), so please take this into consideration in
any suggestions. In general, what is the
"official" way of getting the character offsets of both
the start and the end of the matching text fragment?
I have looked into the sources, and AST elements do not
seem to hold this information. Do I have to provide
a modified AST implementation? And if I turn off AST
generation, where can I store this information?
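
A minimal sketch of one possible approach, assuming the offsets are kept on the tokens rather than on AST nodes: the scanner counts every character it consumes, including those thrown away by the skip rule, and stamps each token with the count at its first character and one past its last. It is plain C++ outside ANTLR; `OffsetToken`, `FilterScanner` and `scanAll` are hypothetical names, and hooking the same counting into a generated ANTLR lexer is an assumption not shown here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token type (not ANTLR's own token class): besides its text it
// records the 0-based offset of its first character and the offset one past
// its last character.
struct OffsetToken {
    std::string text;
    std::size_t startOffset;
    std::size_t endOffset;
};

// Hypothetical filter-style scanner: runs of letters become tokens and every
// other character is skipped. Skipped characters still advance offset_, so
// the recorded offsets are always relative to the start of the input.
class FilterScanner {
public:
    explicit FilterScanner(std::istream& in) : in_(in), offset_(0) {}

    // Fills `tok` and returns true, or returns false at end of input.
    bool next(OffsetToken& tok) {
        // "Filter" behaviour: silently consume characters that start no token.
        while (in_.peek() != EOF && !std::isalpha(in_.peek())) {
            in_.get();
            ++offset_;
        }
        if (in_.peek() == EOF) return false;

        tok.text.clear();
        tok.startOffset = offset_;  // first character of the token
        while (in_.peek() != EOF && std::isalpha(in_.peek())) {
            tok.text += static_cast<char>(in_.get());
            ++offset_;
        }
        tok.endOffset = offset_;    // one past the last character
        return true;
    }

private:
    std::istream& in_;
    std::size_t offset_;  // characters consumed so far, skipped ones included
};

// Convenience helper: scans a whole string and returns every token.
std::vector<OffsetToken> scanAll(const std::string& text) {
    std::istringstream in(text);
    FilterScanner scanner(in);
    std::vector<OffsetToken> tokens;
    OffsetToken tok;
    while (scanner.next(tok)) {
        assert(tok.startOffset < tok.endOffset);  // tokens are never empty
        tokens.push_back(tok);
    }
    return tokens;
}
```

For the input `12 apples, 7 pears!`, `scanAll` returns `apples` with offsets [3, 9) and `pears` with [13, 18). The span of a matched fragment is then the first token's `startOffset` up to the last token's `endOffset`, with no AST involved.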

I think the only thing I actually have to compute is
the offset where the parse starts, since the end is easy to get (I can just ask
the lexer for its current line/column).
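
If only the lexer's line/column is at hand, it can be turned back into an absolute offset once the input text is kept in memory. This sketch assumes 1-based lines and columns, `'\n'` line endings and exactly one column per character; `LineIndex` is a hypothetical helper. If the lexer advances the column by more than one for a tab, the mapping would need adjusting.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: precomputes the offset at which every line starts so a
// (line, column) pair can be turned into an absolute character offset.
// Assumes 1-based lines and columns, '\n' line endings, and one column per
// character (no tab expansion).
class LineIndex {
public:
    explicit LineIndex(const std::string& text) {
        lineStarts_.push_back(0);  // line 1 starts at offset 0
        for (std::size_t i = 0; i < text.size(); ++i)
            if (text[i] == '\n') lineStarts_.push_back(i + 1);
    }

    // Offset of the character at (line, column).
    std::size_t offsetOf(std::size_t line, std::size_t column) const {
        assert(line >= 1 && line <= lineStarts_.size());
        assert(column >= 1);
        return lineStarts_[line - 1] + (column - 1);
    }

private:
    std::vector<std::size_t> lineStarts_;  // offset of the first char of each line
};
```

For the text `"ab\ncd\n\nxyz"`, `offsetOf(2, 2)` is 4 (the `d`) and `offsetOf(4, 3)` is 9 (the `z`).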

Any ideas?
Kind regards,

George Petasis
-------------- next part --------------
A non-text attachment was scrubbed...
Name: out.g
Type: application/octet-stream
Size: 1381 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20050406/aab21fe5/out.obj
-------------- next part --------------
/*
 * This file was automatically generated from grammar out.grm
 */
#include <iostream>
#include <sstream>
#include "antlr/AST.hpp"
#include "outLexer.hpp"
#include "outParser.hpp"

int main( int argc, char* argv[] ) {
  ANTLR_USING_NAMESPACE(std)
  ANTLR_USING_NAMESPACE(antlr)
  try {
    istream *input = &cin;
    const char *filename = "<cin>";

    outLexer lexer(*input);
    lexer.setFilename(filename);
    lexer.resetHasMoreTokens();
    outParser parser(lexer);
    parser.setFilename(filename);
    ASTFactory ast_factory;
    parser.initializeASTFactory(ast_factory);
    parser.setASTFactory(&ast_factory);
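    // Parse the input expression
    while (lexer.hasMoreTokens()) {
      // The lexer may already have read ahead on the parser's behalf
      // (lookahead tokens), so this is not necessarily where the next
      // matching fragment starts.
      cout << "Lexer Start State: line=" << lexer.getLine()
           << ", col=" << lexer.getColumn() << endl;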
      try {
        parser.p_S();
      } catch (ANTLR_USE_NAMESPACE(antlr)RecognitionException& ex) {
        // Parsing has failed.
        // parser.reportError(ex);
        parser.consume();
        continue;
      }
      cout << "Lexer State: line=" << lexer.getLine()
           << ", col=" << lexer.getColumn() << endl;
      RefAST t = parser.getAST();
      if (t) {
        // Print the resulting tree out in LISP notation
        cout << t->toStringTree() << endl;
      } else {
        cout << "No tree produced" << endl;
      }
    } // while (lexer.hasMoreTokens())
  }
  catch(ANTLRException& e) {
    cerr << "Parse exception: " << e.toString() << endl;
    return -1;
  }
  catch(exception& e) {
    cerr << "exception: " << e.what() << endl;
    return -1;
  }
  return 0;
} /* main */
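
In the driver above, the line/column printed before `parser.p_S()` comes from the lexer, which may already have read ahead on the parser's behalf, so it need not be where the next fragment starts. A safer place to read the start is the first token the rule sees (LT(1)). The following plain C++ sketch simulates that situation; `Tok`, `TokenSource`, `Lookahead`, `Span` and `matchPair` are hypothetical names and do not use the ANTLR API.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical token with absolute offsets.
struct Tok {
    std::string text;
    std::size_t startOffset;  // first character
    std::size_t endOffset;    // one past the last character
};

// Hypothetical stand-in for a lexer: position() plays the role of asking
// the lexer for its current line/column.
class TokenSource {
public:
    explicit TokenSource(std::vector<Tok> toks) : toks_(toks), next_(0), pos_(0) {}
    bool next(Tok& t) {
        if (next_ == toks_.size()) return false;
        t = toks_[next_++];
        pos_ = t.endOffset;  // the "lexer" has now read past this token
        return true;
    }
    std::size_t position() const { return pos_; }

private:
    std::vector<Tok> toks_;
    std::size_t next_;
    std::size_t pos_;
};

// Hypothetical parser-side lookahead buffer: LT1() pulls a token from the
// source on demand, as a parser's token buffer would.
class Lookahead {
public:
    explicit Lookahead(TokenSource& src) : src_(src) {}
    const Tok* LT1() {
        if (buf_.empty()) {
            Tok t;
            if (!src_.next(t)) return nullptr;
            buf_.push_back(t);
        }
        return &buf_.front();
    }
    void consume() { if (!buf_.empty()) buf_.pop_front(); }

private:
    TokenSource& src_;
    std::deque<Tok> buf_;
};

struct Span { std::size_t start, end; };

// Matches a hypothetical two-token fragment and reports its span from the
// tokens themselves, not from the source's current position.
bool matchPair(Lookahead& la, Span& span) {
    const Tok* first = la.LT1();
    if (!first) return false;
    span.start = first->startOffset;  // read before consume() drops the token
    la.consume();
    const Tok* second = la.LT1();
    if (!second) return false;        // fragment incomplete
    span.end = second->endOffset;
    la.consume();
    assert(span.start < span.end);
    return true;
}
```

With tokens `apples` [3, 9) and `pears` [13, 18), a single `LT1()` call already moves `position()` to 9, while the fragment reported by `matchPair` spans [3, 18).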

