[antlr-interest] [C target] ANTLR 3.1 issues with token offsets and generated AST return types

Sven Van Echelpoel sven.van.echelpoel at empolis.com
Wed Aug 20 02:13:00 PDT 2008


Hi,

I started out last week with ANTLR 3.1b2 to generate a parser with the C
target. All went very well and I must say that I was very impressed with
it. But then I wanted to get a hold of the token offsets (start and
stop). For that I need the functions getStartIndex() and getStopIndex()
of ANTLR3_COMMON_TOKEN_struct, right?

When I parse my input, the token offsets are all rubbish. Although in my
grammar I'm using rewrite rules to generate the AST, I can reproduce it
with a small grammar as well. Here's what I tried:

grammar MyGrammar;

options {
  /* Generate C code */
  language = C ;
  /* Build an AST */
  output=AST ;
}

translation_unit
  : NUMBER+
  ;

fragment
DIGIT_CHAR
  : '0'..'9'
  ;
  
fragment
DIGIT_CHAR_WITHOUT_ZERO
  : '1'..'9'
  ;

fragment
WHITESPACE_CHAR
  : ' ' |'\n' |'\r' | '\t'
  ;
  
NUMBER
  : ( '0' | DIGIT_CHAR_WITHOUT_ZERO ) DIGIT_CHAR*
  ;


WHITESPACE
  : WHITESPACE_CHAR {$channel = HIDDEN;} 
  ;

For an input "12376 87562356" (UTF-16), the parse succeeds, but the
start and stop index of the tokens associated with the AST nodes are way
off the mark. Here's what I print out for each tree node (ts is token
start, te is token end):

ts: 6649008 te: 6649017
ts: 6649020 te: 6649035

Slightly bigger than the string I sent in. :-)

Admittedly, I was working with 3.1b2 and not the official release, so when
I saw that 3.1 was released I went ahead and tried that one. This was
even worse! 3.1 with the C target does not even generate the type of the
AST in the return structs of the rules. This is what comes out:

typedef struct WarpParser_translation_unit_return_struct
{
    /** Generic return elements for ANTLR3 rules that are not in tree parsers or returning trees
     */
    pANTLR3_COMMON_TOKEN    start;
    pANTLR3_COMMON_TOKEN    stop;
    	tree;   <----------------  No type here!
   
}
    WarpParser_translation_unit_return;

Clearly we are missing something important here. Or maybe I am missing
something obvious. I used the C-runtime from the ANTLR source
distribution and tried it also with the stand-alone C lib distro. I'm
building on Ubuntu 7.1 with gcc 3.4 (64-bit).

Any help will be much appreciated,

Thanks in advance,

Sven



