[antlr-interest] [C target] ANTLR 3.1 issues with token offsets and generated AST return types
Sven Van Echelpoel
sven.van.echelpoel at empolis.com
Wed Aug 20 02:13:00 PDT 2008
Hi,
Last week I started using ANTLR 3.1b2 to generate a parser with the C
target. Everything went very well, and I must say I was very impressed
with it. But then I wanted to get hold of the token offsets (start and
stop). For that I need the functions getStartIndex() and getStopIndex()
of ANTLR3_COMMON_TOKEN_struct, right?
When I parse my input, the token offsets are all rubbish. Although my
real grammar uses rewrite rules to build the AST, I can reproduce the
problem with a small grammar as well. Here's what I tried:
grammar MyGrammar;
options {
/* Generate C code */
language = C ;
/* Build an AST */
output=AST ;
}
translation_unit
: NUMBER+
;
fragment
DIGIT_CHAR
: '0'..'9'
;
fragment
DIGIT_CHAR_WITHOUT_ZERO
: '1'..'9'
;
fragment
WHITESPACE_CHAR
: ' ' |'\n' |'\r' | '\t'
;
NUMBER
: ( '0' | DIGIT_CHAR_WITHOUT_ZERO ) DIGIT_CHAR*
;
WHITESPACE
: WHITESPACE_CHAR {$channel = HIDDEN;}
;
For the input "12376 87562356" (UTF-16), the parse succeeds, but the
start and stop indexes of the tokens associated with the AST nodes are
way off the mark. Here's what I print for each tree node (ts is token
start, te is token end):
ts: 6649008 te: 6649017
ts: 6649020 te: 6649035
Slightly bigger than the string I sent in. :-)
Of course I was working with 3.1b2 rather than the official release, so
when I saw that 3.1 had been released I went ahead and tried that one.
This was even worse! With 3.1 the C target does not even generate the
type of the AST in the rules' return structs. This is what comes out:
typedef struct WarpParser_translation_unit_return_struct
{
/** Generic return elements for ANTLR3 rules that are not in tree
parsers or returning trees
*/
pANTLR3_COMMON_TOKEN start;
pANTLR3_COMMON_TOKEN stop;
tree; /* <---------------- No type here! */
}
WarpParser_translation_unit_return;
Clearly something important is missing here. Or maybe I am missing
something obvious. I used the C runtime from the ANTLR source
distribution and also tried the stand-alone C library distribution. I'm
building on Ubuntu 7.1 with gcc 3.4 (64-bit).
Any help will be much appreciated,
Thanks in advance,
Sven