[antlr-interest] [ANTLR C 3.1.3] Error when parsing international characters

Andy Grove andy.grove at codefutures.com
Tue Jun 16 07:20:16 PDT 2009


I have a SQL parser that works fine with standard ASCII characters,
but it fails if I try to insert data containing international
characters, such as:

"INSERT INTO customer (username, password, title, first_name,  
last_name, addr_line1, addr_line2, addr_city, addr_state, country_id)  
VALUES ('username123', 'password', 'Mr', 'Tåst', 'Test', 'Test',  
'Test', 'Test', 'TE', 1)"

I get this error:

-memory-(1) : lexer error 1 :
	Unexpected character at offset 179, near char(0XC3) :
	åst', 'Test', 'Test
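
One observation about the 0XC3 in that message, offered as a hedged sketch rather than a diagnosis: 0xC3 is the first byte of the UTF-8 encoding of å (U+00E5, encoded as the two bytes 0xC3 0xA5). That would be consistent with the statement reaching the lexer as UTF-8 while the input stream treats every byte as one character. The helper below is plain C and independent of ANTLR; the name first_non_ascii is hypothetical and not part of the ANTLR C runtime. It only reports where the first byte outside 7-bit ASCII sits in a buffer, which can be compared against the offset in the lexer error.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an ANTLR API): return the offset of the first
 * byte with the high bit set (>= 0x80), or -1 if every byte in the buffer
 * is 7-bit ASCII.
 */
static long first_non_ascii(const unsigned char *buf, size_t len)
{
    size_t i;

    /* A NULL buffer is only acceptable when there is nothing to scan. */
    assert(buf != NULL || len == 0);

    for (i = 0; i < len; i++) {
        if (buf[i] >= 0x80) {
            return (long)i;   /* e.g. 0xC3, the lead byte of UTF-8 'å' */
        }
    }
    return -1;
}
```

Called on the bytes of 'Tåst' encoded as UTF-8 (0x27 0x54 0xC3 0xA5 0x73 0x74 0x27), it returns 2, the position of the 0xC3 byte.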

Here is my setup code:

	input = antlr3NewAsciiStringInPlaceStream((pANTLR3_UINT8)stringCopy, l, NULL);
	lexer = DbsMySQL_CPPLexerNew(input);
	tstream = antlr3CommonTokenStreamSourceNew(ANTLR3_SIZE_HINT, lexer->pLexer->rec->state->tokSource);
	parser = DbsMySQL_CPPParserNew(tstream);

Do I need to specify the character set somewhere?

Thanks,

Andy.

---
Andy Grove
Chief Architect
CodeFutures Corporation
"Share Nothing. Shard Everything."

Cell:    (303) 720-1285
E-Fax:   (303) 395-0426
Web:     http://www.codefutures.com/
Twitter: http://twitter.com/andygrove73


