[antlr-interest] Re: check tokens for whitespace?

mazypath eitan at cs.ucla.edu
Thu Sep 23 18:29:13 PDT 2004


Thanks for your quick answer.  My question may not have been clear.

I would like VAR to match any identifier, including those that start
with "plus" (or another keyword/token) followed by letters or digits.
So:
  helloWorld ---> VAR
  plus ---> FUNC
  plus1 ---> VAR

In your reply VAR must start with "plus".  Add the original VAR
definition ('a'..'z') ('a'..'z'|'0'..'9' | '.')* to the rules below and
you get nondeterminism.

  VAR :
    ("plus " ( 'a'..'z'|'0'..'9')) => ('a'..'z') ('a'..'z'|'0'..'9' | '.')* |
    (('a'..'z') ('a'..'z'|'0'..'9' | '.')*) |
    ("plus ") => "plus " {$setType(FUNC); } ;

There is now nondeterminism between blocks 2 and 3.  Move the last
block up and "plus1" is labeled FUNC again.  Even if this were to work,
I have a lot of keywords, and defining them WITHIN another token
definition seems bad.

What would be ideal (in my mind) is if I could leave VAR as is and
change FUNC to be something like
  FUNC: "plus" ~( 'a'..'z'|'0'..'9')
and then have that last character not be consumed (or have it
re-injected into the stream).
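
The "do not consume the trailing character" behaviour can also be had
without any lookahead: let the identifier rule match the longest run of
characters, then look the matched text up in a keyword table.  ANTLR 2's
testLiterals option is built around the same idea.  The sketch below is
plain Python rather than ANTLR, purely to illustrate the logic; the names
KEYWORDS and tokenize are made up for this example and are not part of
any ANTLR API.

```python
import re

# Hypothetical keyword table: exact text -> token type.
KEYWORDS = {"plus": "FUNC"}

# Same shape as the VAR rule above: a letter, then letters, digits or '.'.
IDENT = re.compile(r"[a-z][a-z0-9.]*")
# Same characters as the WS rule; these are skipped.
WS = re.compile(r"[ \t\f]+")


def tokenize(text):
    """Return a list of (token_type, text) pairs using longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        ws = WS.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        m = IDENT.match(text, pos)
        if m is None:
            raise ValueError("unexpected character %r at %d" % (text[pos], pos))
        word = m.group()
        # The whole identifier is matched first; only an exact keyword
        # match changes the type, so "plus1" stays a VAR.
        tokens.append((KEYWORDS.get(word, "VAR"), word))
        pos = m.end()
    return tokens
```

With this, tokenize("hello plus plus1") gives VAR, FUNC, VAR, and adding
another keyword is a one-line change to the table instead of a new
alternative inside VAR.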

Thank you again!



--- In antlr-interest at yahoogroups.com, "kozchris" <csnyder at a...> wrote:
> Something like this is one way.
> 
> class LTest extends Lexer;
> 
> tokens {
>   FUNC;
> }
> 
> VAR : ("plus" ('a'..'z'|'0'..'9')) => ('a'..'z') ('a'..'z'|'0'..'9')*
>   | ("plus")=> "plus" {$setType(FUNC);};
> 
> WS : ( ' '| '\t' | '\f') { $setType(Token.SKIP); } ;
> 
> Chris
> 
> --- In antlr-interest at yahoogroups.com, "mazypath" <eitan at c...> wrote:
> > Sorry if this is a newbie question but I can't seem to find an
> > answer in the docs or online.
> > 
> > Is there any way to define a token as a string and to only have
> > that string recognized as a token if it is not followed by
> > whitespace?
> > 
> > For example if I define the Lexer as follows:
> >   class L extends Lexer;
> > 
> >   FUNC : "plus";
> >   WS : ( ' '| '\t' | '\f') { $setType(Token.SKIP); } ;
> >   VAR : ('a'..'z') ('a'..'z'|'0'..'9')*;
> > 
> > Can I get the Lexer to parse the string "plus1" as a VAR token
> > and not a FUNC token followed by "1"?
> > 
> > Thanks in advance!




 