[antlr-interest] [ANTLR C 3.1.3] Error when parsing international characters
Andy Grove
andy.grove at codefutures.com
Tue Jun 16 09:18:13 PDT 2009
Jim,
Thanks. I've attempted to use the UCS input stream with this code:
SymbolTable* SQLParser::parse(std::string sql) {
    ....
    std::wstring wsql(sql.begin(), sql.end());
    const wchar_t *wsqlchars = wsql.c_str();
    input = antlr3NewUCS2StringInPlaceStream((pANTLR3_UINT16)wsqlchars,
                                             wsql.length(), NULL);
    ...
}
Am I even close with this? It compiles OK but now when I run my test
the app becomes unresponsive and consumes all the available RAM.
Thanks,
Andy.
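
[Editor's sketch, not part of the original message.] Two things in the snippet above may be worth checking. Constructing a std::wstring from sql.begin()/sql.end() widens each byte individually and does not decode UTF-8, so 0xC3 0xA5 stays two separate characters rather than becoming U+00E5. Also, wchar_t is 32 bits on many platforms, so casting its buffer to pANTLR3_UINT16 may not give the stream the 16-bit units it expects. Whether either of these explains the hang is not established in this thread. Below is a minimal sketch of decoding UTF-8 into 16-bit code units by hand. The helper name utf8ToUtf16 is hypothetical and is not part of the ANTLR runtime, and the decoder is deliberately simple: it does not reject overlong encodings or encoded surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not an ANTLR API): decode a UTF-8 byte string
// into UTF-16 code units. Throws std::runtime_error on malformed input.
std::vector<std::uint16_t> utf8ToUtf16(const std::string& utf8) {
    std::vector<std::uint16_t> out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + extra >= utf8.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        // Fold in the low 6 bits of each continuation byte (10xxxxxx).
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::runtime_error("code point out of range");

        if (cp >= 0x10000) {
            // Outside the BMP: emit a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<std::uint16_t>(0xD800 | (cp >> 10)));
            out.push_back(static_cast<std::uint16_t>(0xDC00 | (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<std::uint16_t>(cp));
        }
    }
    return out;
}
```

The resulting buffer could then be handed to antlr3NewUCS2StringInPlaceStream in place of the wchar_t pointer. Because the stream is "in place", the vector would presumably need to outlive the stream, and it is worth checking in the runtime source whether the size argument is counted in bytes or in 16-bit units.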
On Jun 16, 2009, at 9:21 AM, Jim Idle wrote:
> You need the UCS version of the input stream, or write a UTF-32 input
> stream and use the pre-supplied UTF-8 to UTF-32 conversion routine.
>
> If you can wait until the next release I will be supplying these ready
> made, but they are not difficult to produce; just copy the others.
> Internally the runtime uses 32-bit Unicode and does not care how you
> provide these.
>
> Jim
>
> On Jun 16, 2009, at 9:20 AM, Andy Grove <andy.grove at codefutures.com>
> wrote:
>
>> I have a SQL parser that is working fine with standard ASCII
>> characters but if I try and insert data containing international
>> characters such as:
>>
>> "INSERT INTO customer (username, password, title, first_name,
>> last_name, addr_line1, addr_line2, addr_city, addr_state,
>> country_id) VALUES (''username123', 'password', 'Mr', 'Tåst',
>> 'Test', 'Test', 'Test', 'Test', 'TE', 1)"
>>
>> I get this error:
>>
>> -memory-(1) : lexer error 1 :
>> Unexpected character at offset 179, near char(0XC3) :
>> åst', 'Test', 'Test
>>
>> Here is my setup code:
>>
>> input = antlr3NewAsciiStringInPlaceStream((pANTLR3_UINT8)stringCopy, l, NULL);
>> lexer = DbsMySQL_CPPLexerNew(input);
>> tstream = antlr3CommonTokenStreamSourceNew(ANTLR3_SIZE_HINT,
>>     lexer->pLexer->rec->state->tokSource);
>> parser = DbsMySQL_CPPParserNew(tstream);
>>
>> Do I need to specify the character set somewhere?
>>
>> Thanks,
>>
>> Andy.
>>
>> ---
>> Andy Grove
>> Chief Architect
>> CodeFutures Corporation
>> "Share Nothing. Shard Everything."
>>
>> Cell: (303) 720-1285
>> E-Fax: (303) 395-0426
>> Web: http://www.codefutures.com/
>> Twitter: http://twitter.com/andygrove73
>>