[antlr-interest] [ANTLR C 3.1.3] UCS2 input stream attempting to read beyond end of input

Mon Jun 29 14:02:27 PDT 2009

Hi Jim,

I am hitting some errors which seem to be related to the UCS2 input  
stream attempting to read past the end of my UTF-16 input data. This  
is the error I am seeing in one of my tests:

-memory-(1)  : error 9 : Extraneous token, at offset 58
     near [
      : Extraneous input - expected <EOF>

Here are the input bytes:

SQLParser::parse() converted UTF-16 string (num bytes = 118) :
49 00 4e 00 53 00 45 00 52 00 54 00 20 00 49 00 4e 00 54 00 4f 00 20 00 70 00 65 00 70 00 70 00 [I.N.S.E.R.T. .I.N.T.O. .p.e.p.p.]
65 00 72 00 20 00 28 00 6e 00 61 00 6d 00 65 00 2c 00 20 00 74 00 61 00 73 00 74 00 65 00 29 00 [e.r. .(.n.a.m.e.,. .t.a.s.t.e.).]
20 00 56 00 41 00 4c 00 55 00 45 00 53 00 20 00 28 00 27 00 4a 00 61 00 6c 00 61 00 70 00 65 00 [ .V.A.L.U.E.S. .(.'.J.a.l.a.p.e.]
f1 00 6f 00 27 00 2c 00 20 00 27 00 68 00 6f 00 74 00 27 00 29 00                               [..o.'.,. .'.h.o.t.'.).]

The strange thing is that, functionally, the test still works as expected.

Another of my tests is actually failing with a different problem - the
generated parser is not calling the final action I have specified in
my grammar (maybe because it is not hitting the EOF character?). I am
wondering if these two issues are somehow related. I have run the code
through valgrind and there are no errors detected in terms of memory
access.

Here is my code for doing the conversion from UTF-8 to UTF-16 and
creating the input stream.

         // input buffer
         const UTF8* source = (const UTF8*) sql;
         const UTF8* sourcestart = source;
         const UTF8* sourceend = sourcestart + length;

         // output buffer
         UTF16* target = new UTF16[length + 16]; // extra chars for safety
         UTF16* targetstart = target;
         UTF16* targetend = target + length;

         memset(target, 0, 2*(length+16)); // fill with \0 to ensure string is null-terminated with at least 16 nulls

         // conversion
         ConversionResult res = ConvertUTF8toUTF16(&sourcestart, sourceend, &targetstart, targetend, strictConversion);
         if (res != conversionOK) {
             // cleanup ..
             throw "failed";
         }

         input = antlr3NewUCS2StringInPlaceStream((pANTLR3_UINT16)target, length, NULL);
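
One possible factor, offered as an assumption rather than a confirmed diagnosis: `length` in the code above is the UTF-8 byte count, while the dump shows 118 bytes of UTF-16, i.e. 59 16-bit units. The character 'ñ' occupies two bytes in UTF-8 (c3 b1) but a single unit in UTF-16 (f1 00), so the two counts differ by one for this input. If antlr3NewUCS2StringInPlaceStream interprets its size argument as a count of 16-bit units (not verified here), passing `length` would expose one of the zero-filled padding units to the lexer. The helper below, `utf16_units`, is hypothetical and only illustrates the count mismatch; it assumes well-formed UTF-8 input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of UTF-16 code units needed to encode a
// well-formed UTF-8 string. No validation is performed.
std::size_t utf16_units(const std::string& utf8) {
    std::size_t units = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) == 0x80) {
            continue;                  // continuation byte: belongs to a previous lead byte
        }
        units += (c >= 0xF0) ? 2 : 1;  // 4-byte sequence needs a surrogate pair
    }
    return units;
}
```

For the SQL string in the dump, `utf16_units` gives 59 while the UTF-8 byte length is 60. In the conversion code itself, the same count should be available as `targetstart - target` after a successful call, assuming ConvertUTF8toUTF16 advances `targetstart` past the last unit written.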

Any suggestions would be appreciated.

Thanks,

Andy.