[antlr-interest] Re: check tokens for whitespace?

mazypath eitan at cs.ucla.edu
Wed Sep 29 17:05:06 PDT 2004


Thanks, I knew there had to be an easy way. Unfortunately, I think
your answer uncovered a bug in ANTLR (or perhaps it is just uncovering
my ignorance).

I want function names to be mapped to tokens because I am defining my
own AST classes.  So my token definition looked like:

  tokens {
    FUNC<AST=my.ast.here>;
  }

So according to my reading of the ANTLR docs I should be able to do this:

  tokens {
    FUNC="func"<AST=my.ast.Class>;
  }

The problem is that it generates parser code that looks like this:

    protected void buildTokenTypeASTClassMap(){
        tokenTypeToASTClassMap.put(new Integer("func"), my.ast.Class);
    }

Which of course causes an exception when Java tries to make an integer
out of the string "func".
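
For reference, a minimal Java sketch of why that generated line fails at run time. The class and method names here are made up for illustration and are not part of ANTLR. It uses Integer.parseInt, which converts text in the same way the Integer(String) constructor does, so a keyword such as "func" is rejected with a NumberFormatException while a numeric token type is accepted.

```java
public class IntegerKeyDemo {
    // Returns true when the text can be converted to an int, false when
    // the conversion throws NumberFormatException (as it does for "func").
    public static boolean parsesAsInteger(String text) {
        try {
            Integer.parseInt(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The literal text taken from the tokens section.
        System.out.println("func parses: " + parsesAsInteger("func"));
        // A numeric token type (the value 4 is only an assumption here).
        System.out.println("4 parses: " + parsesAsInteger("4"));
    }
}
```

This suggests the map key was meant to be the numeric token type of FUNC, not its literal text.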

So, before I report this as a bug:
  Am I doing something wrong, or is this an ANTLR bug?
  If this is a bug, to whom do I report it?




BTW, right now the only working solution I've found is a very tedious
semantic predicate:

  VAR : {FunctionTests.isFunc(LA(1), LA(2), LA(3), LA(4))}?
        ('a'..'z') ('a'..'z'|'0'..'9')* { $setType(FUNC); }
      | ('a'..'z') ('a'..'z'|'0'..'9')* ;

Which isn't a very good solution at all.
			  
			






--- In antlr-interest at yahoogroups.com, Joan Pujol <joanpujol at g...> wrote:
> Hi,
> 
> I think what you have to do is use the tokens section of the lexer for
> your reserved keywords
> (in your case func)
> 
> tokens {
>     FUNC="func";
> 
> }
> VAR: ('a'..'z') ('a'..'z'|'0'..'9')*;
> 
> 
> Make sure that the testLiterals option is set to true for VAR. This is
> the default, but be sure that you haven't set it to false in the global
> options.
> 
> Cheers,
> 
> On Fri, 24 Sep 2004 01:29:13 -0000, mazypath <eitan at c...> wrote:
> > 
> > Thanks for your quick answer.  My question may not have been clear.
> > 
> > I would like VAR to be any string, including those that start with
> > "plus" (or another keyword/token) followed by letters or integers.
> > So:
> >   helloWorld ---> VAR
> >   plus ---> FUNC
> >   plus1 ---> VAR
> > 
> > In your reply VAR must start with "plus".  Add the original VAR
> > definition ('a'..'z') ('a'..'z'|'0'..'9' | '.')* to the rules below and
> > you get nondeterminism.
> > 
> >   VAR :
> >     ("plus " ( 'a'..'z'|'0'..'9')) => ('a'..'z') ('a'..'z'|'0'..'9' | '.')* |
> >     (('a'..'z') ('a'..'z'|'0'..'9' | '.')*) |
> >     ("plus ") => "plus " {$setType(FUNC); } ;
> > 
> > There is now nondeterminism between blocks 2 and 3.  Move the last
> > block up and "plus1" is labeled FUNC again.  Even if this were to work,
> > I have a lot of keywords, and defining them WITHIN another token
> > definition seems bad.
> > 
> > What would be ideal (in my mind) is if I could leave VAR as is and
> > change FUNC to be something like
> >   FUNC: "plus" ~( 'a'..'z'|'0'..'9')
> > And then have that last character not be consumed (or re-inject it into
> > the stream).
> > 
> > Thank you again!
> > 
> 
> -- 
> Joan Jesús Pujol Espinar
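
The literal-testing idea from the quoted reply can be sketched in plain Java, outside ANTLR. This is only an illustration of the technique, not ANTLR's actual implementation: the class name, the method names, and the token type numbers are all assumptions. The lexer first matches the longest identifier and only then looks the complete text up in a table of literals, so "plus" becomes FUNC while "plus1" stays VAR.

```java
import java.util.HashMap;
import java.util.Map;

public class LiteralTableSketch {
    // Hypothetical token type numbers; ANTLR assigns its own values.
    public static final int VAR = 4;
    public static final int FUNC = 5;

    // Table of reserved words, playing the role of the tokens section.
    private static final Map<String, Integer> LITERALS = new HashMap<>();
    static {
        LITERALS.put("plus", FUNC);
        LITERALS.put("func", FUNC);
    }

    private static boolean isLower(char c) { return c >= 'a' && c <= 'z'; }
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // Returns the index just past the longest match of
    // ('a'..'z') ('a'..'z'|'0'..'9')* beginning at start,
    // or start itself when nothing matches.
    public static int identifierEnd(String input, int start) {
        int i = start;
        if (i >= input.length() || !isLower(input.charAt(i))) {
            return start;
        }
        i++;
        while (i < input.length()
                && (isLower(input.charAt(i)) || isDigit(input.charAt(i)))) {
            i++;
        }
        return i;
    }

    // Matches an identifier at start, then tests the whole matched text
    // against the literals table; anything not in the table is a VAR.
    public static int tokenTypeAt(String input, int start) {
        int end = identifierEnd(input, start);
        if (end == start) {
            throw new IllegalArgumentException("no identifier at index " + start);
        }
        Integer literalType = LITERALS.get(input.substring(start, end));
        return literalType != null ? literalType : VAR;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"hello", "plus", "plus1"}) {
            System.out.println(s + " -> " + (tokenTypeAt(s, 0) == FUNC ? "FUNC" : "VAR"));
        }
    }
}
```

Because the table is consulted only after the whole identifier has been consumed, no lookahead predicate is needed to tell a keyword from a longer name that merely starts with one.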



 