[antlr-interest] unexpected char error

Kevin J. Cummings cummings at kjchome.homeip.net
Sat Mar 17 22:37:25 PDT 2007


Gavin Lambert wrote:
> At 15:26 18/03/2007, Kevin J. Cummings wrote:
>>1) Why does this work (code snippets only):
>>
>>EQ : '=' ;
>>UINT : ( '0'..'9' )+ ;
>>
>>stmt : "a" EQ UINT ;
>>
>>while this gives me errors when I run it:
>>
>>UINT : ( '0'..'9' )+ ;
>>
>>stmt : "a" "=" UINT ;
> [...]
>>> Parse exception: <arguments>:1:8: unexpected char: '='
> 
> I believe it's because you haven't defined '=' anywhere in your lexer
> any more.  The lexer will normally only accept characters that it knows
> about, and since you haven't mentioned it anywhere it doesn't know what
> token to generate for it.

But it's supposed to implicitly define the token as a literal when I use
it.  That's the whole point of allowing me to use the string in the
parser.  It should then appear in the literal table.  In the second case
above, I see:

"="=11 in my myLexerTokenTypes.txt file.

Furthermore, my myParser.cpp file contains the following code snippet:

>                 match(LITERAL_a);
>                 ANTLR_USE_NAMESPACE(antlr)RefAST tmp2_AST = ANTLR_USE_NAMESPACE(antlr)nullAST;
>                 tmp2_AST = astFactory->create(LT(1));
>                 astFactory->addASTChild(currentAST, tmp2_AST);
>                 match(11);

and, AFAICT, it's the match(11) that fails.  Is this because it's
commented out in the myLexerTokenTypes.hpp file?

> struct CUSTOM_API myLexerTokenTypes {
>         enum {
>                 EOF_ = 1,
>                 NL = 4,
>                 WHITESPACE = 5,
>                 SLCOMMENT = 6,
>                 UINT = 7,
>                 STRING = 8,
>                 IDENT = 9,
>                 LITERAL_a = 10,
>                 // "=" = 11
>                 NULL_TREE_LOOKAHEAD = 3
>         };

> To solve this you can either define a token for it, as you did in your
> first example, or use a catchall token in combination with the
> charVocabulary lexer option.

I believe you here, but my larger grammar has a myriad of problems with
some of the keyword-defined token types and IDENTs.
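
For what it's worth, here is a toy C++ model of how I now understand the
behaviour.  It is only a sketch, not ANTLR's generated code.  It assumes
the literals table is consulted only after some lexer rule has already
matched the text, which would explain why "a" gets through (presumably
via IDENT) while '=' is rejected before the table is ever looked at.
toyLex and LITERAL_EQ are names I made up for illustration; the real
header has no name for token 11.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model only. Token numbers mirror the myLexerTokenTypes enum;
// LITERAL_EQ is a made-up name for the unnamed literal "=" (11).
enum { UINT = 7, IDENT = 9, LITERAL_a = 10, LITERAL_EQ = 11 };

// Hypothetical helper: tokenizes `in` with two lexer rules (UINT, IDENT).
// Assumption: the literals table is checked only after a rule has matched.
std::vector<int> toyLex(const std::string& in) {
    const std::map<std::string, int> literals = {
        {"a", LITERAL_a}, {"=", LITERAL_EQ}};
    std::vector<int> types;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (std::isspace(c)) { ++i; continue; }   // skip whitespace
        std::size_t start = i;
        int type;
        if (std::isdigit(c)) {                    // UINT : ('0'..'9')+ ;
            while (i < in.size() &&
                   std::isdigit(static_cast<unsigned char>(in[i]))) ++i;
            type = UINT;
        } else if (std::isalpha(c)) {             // IDENT, then literal test
            while (i < in.size() &&
                   std::isalpha(static_cast<unsigned char>(in[i]))) ++i;
            type = IDENT;
            auto it = literals.find(in.substr(start, i - start));
            if (it != literals.end()) type = it->second;
        } else {
            // No rule can start with this char, so the literals table is
            // never reached, even though "=" is listed in it.
            throw std::runtime_error(
                std::string("unexpected char: '") + in[i] + "'");
        }
        types.push_back(type);
    }
    return types;
}
```

In this model toyLex("a 12") yields LITERAL_a followed by UINT, while
toyLex("a = 12") throws on the '=' even though "=" has an entry in the
table.  Adding a rule that matches '=' (like the EQ rule in my first
example) is what would let the lexer get past that character.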

I guess I still have some work cut out for me.   B^(

-- 
Kevin J. Cummings
kjchome at rcn.com
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)

