[antlr-interest] [ANTLR C 3.1.3] Error when parsing international characters

Andy Grove andy.grove at codefutures.com
Tue Jun 16 09:18:13 PDT 2009


Jim,

Thanks. I've attempted to use the UCS input stream with this code:

SymbolTable* SQLParser::parse(std::string sql) {

	....

	std::wstring wsql(sql.begin(), sql.end());
	const wchar_t *wsqlchars = wsql.c_str();
	input = antlr3NewUCS2StringInPlaceStream((pANTLR3_UINT16)wsqlchars, wsql.length(), NULL);

	...

}

Am I even close with this? It compiles OK but now when I run my test  
the app becomes unresponsive and consumes all the available RAM.

Thanks,

Andy.
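
A note on the snippet above, offered as a likely explanation rather than a confirmed diagnosis: std::wstring(sql.begin(), sql.end()) widens each byte of the std::string on its own, so the UTF-8 sequence 0xC3 0xA5 for 'å' is never decoded into a single character. In addition, on platforms where wchar_t is 32 bits wide (typical on Linux and Mac OS X), casting the buffer to pANTLR3_UINT16 makes the runtime read it as 16-bit units, which does not match the data actually stored. A minimal sketch of decoding UTF-8 into real 16-bit code units first is shown here. The helper name utf8ToUtf16 is hypothetical and is not part of the ANTLR runtime, and std::u16string assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the ANTLR C runtime): decode UTF-8 bytes
// into UTF-16 code units. Throws std::runtime_error on malformed input.
// Simplification: overlong encodings are not rejected.
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (b0 < 0x80)                { cp = b0;        extra = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");

        // Fold in the low six bits of each continuation byte.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::runtime_error("code point out of range");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With this, "T\xC3\xA5st" decodes to four code units, with U+00E5 in second position. The resulting buffer could then be handed to antlr3NewUCS2StringInPlaceStream in place of wsqlchars, provided the std::u16string outlives the input stream (the stream is "in place" and does not copy). Whether the length argument is expected in bytes or in 16-bit units is worth checking against the runtime source for the version in use; that is not something this sketch settles.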


On Jun 16, 2009, at 9:21 AM, Jim Idle wrote:

> You need the UCS version of the input stream, or write a UTF-32 input  
> stream and use the pre-supplied UTF-8 to UTF-32 conversion routine.
>
> If you can wait until the next release I will be supplying these ready  
> made, but they are not difficult to produce; just copy the others.  
> Internally the runtime uses 32-bit Unicode and does not care how you  
> provide these.
>
> Jim
>
> On Jun 16, 2009, at 9:20 AM, Andy Grove <andy.grove at codefutures.com>  
> wrote:
>
>> I have a SQL parser that is working fine with standard ASCII  
>> characters but if I try and insert data containing international  
>> characters such as:
>>
>> "INSERT INTO customer (username, password, title, first_name,  
>> last_name, addr_line1, addr_line2, addr_city, addr_state,  
>> country_id) VALUES (''username123', 'password', 'Mr', 'Tåst',  
>> 'Test', 'Test', 'Test', 'Test', 'TE', 1)"
>>
>> I get this error:
>>
>> -memory-(1) : lexer error 1 :
>> 	Unexpected character at offset 179, near char(0XC3) :
>> 	åst', 'Test', 'Test
>>
>> Here is my setup code:
>>
>> 	input = antlr3NewAsciiStringInPlaceStream((pANTLR3_UINT8)stringCopy, l, NULL);
>> 	lexer = DbsMySQL_CPPLexerNew(input);
>> 	tstream = antlr3CommonTokenStreamSourceNew(ANTLR3_SIZE_HINT, lexer->pLexer->rec->state->tokSource);
>> 	parser = DbsMySQL_CPPParserNew(tstream);
>>
>> Do I need to specify the character set somewhere?
>>
>> Thanks,
>>
>> Andy.
>>
>> ---
>> Andy Grove
>> Chief Architect
>> CodeFutures Corporation
>> "Share Nothing. Shard Everything."
>>
>> Cell:    (303) 720-1285
>> E-Fax:   (303) 395-0426
>> Web:     http://www.codefutures.com/
>> Twitter: http://twitter.com/andygrove73
>>
>>
>>
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
