[antlr-interest] Additional char from LEXER->getText

Mike Lischke mike at lischke-online.de
Fri Aug 31 00:26:35 PDT 2012


Hi Jim,

> Actually, those routines are really only there for convenience. You will
> find them too slow and cumbersome for any complicated tasks. It is
> better to use the pointer to the input stream directly and avoid any
> copying and malloc() calls.

Well, this is what the target uses to implement $text in the grammar. If that code is not good enough, shouldn't the code generator use something better? I would like to avoid language-specific code in my grammar where I can.
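
For reference, the direct-pointer approach described above could look roughly like the sketch below. TokenSpan and tokenText are made-up names for illustration, and the sketch assumes a token exposes inclusive start/stop pointers into the input buffer; it is not the runtime's own type or API.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for a token: inclusive start/stop pointers into
// the original input buffer (an assumption, not the runtime's own type).
struct TokenSpan {
  const char *start; // first byte of the token
  const char *stop;  // last byte of the token (inclusive)
};

// Return a non-owning view of the token text: no malloc(), no copy.
// The view is only valid while the input buffer is alive.
inline std::string_view tokenText(const TokenSpan &token) {
  assert(token.start != nullptr && token.stop != nullptr);
  assert(token.start <= token.stop);
  return std::string_view(
      token.start, static_cast<std::size_t>(token.stop - token.start) + 1);
}
```

The catch is that such a view has to be created in a target-specific action, which is exactly the language-specific code I would like to keep out of the grammar.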

> However is this because you have a UTF8 input but are using the 8 bit
> input stream?



My setup goes like this:

  input = antlr3StringStreamNew((pANTLR3_UINT8)utf8.c_str(), ANTLR3_ENC_UTF8, utf8.size(), (pANTLR3_UINT8)"sql-script");
  input->setUcaseLA(input, ANTLR3_TRUE); // Make input case-insensitive. String literals must all be upper case in the grammar!
  
  lexer = MySQL56LexerNew(input);
  tokens = antlr3CommonTokenStreamSourceNew(ANTLR3_SIZE_HINT, TOKENSOURCE(lexer));
  parser = MySQL56ParserNew(tokens);

  MySQL56Parser_query_return ast = parser->query(parser);

Isn't that how it is supposed to work?
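
To illustrate why the encoding question matters: in UTF-8 the byte count and the character count differ for non-ASCII text, so an offset computed in one unit and applied in the other selects too much or too little. A small sketch, assuming well-formed UTF-8 (utf8CodePoints is a made-up helper name, not part of the runtime):

```cpp
#include <cstddef>
#include <string>

// Count code points in a well-formed UTF-8 string by skipping
// continuation bytes (those of the form 10xxxxxx).
inline std::size_t utf8CodePoints(const std::string &utf8) {
  std::size_t count = 0;
  for (unsigned char byte : utf8) {
    if ((byte & 0xC0) != 0x80)
      ++count;
  }
  return count;
}
```

For "caf\xC3\xA9" (the word "café"), size() reports 5 bytes, while utf8CodePoints() returns 4.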

Mike
-- 
www.soft-gems.net



