[antlr-interest] Re: more lexical determinism

tbrandonau tom at psy.unsw.edu.au
Thu Dec 6 15:24:35 PST 2001

You want anything made up entirely of letters to be a Word, and anything 
containing a '_' or a digit to be an Identifier, right? So can't you just have:
Word: ( Letter
  | '_' {$setType(Identifier);} 
  | Digit {$setType(Identifier);}
  )+;
i.e. if it's got a '_' or a digit it's an Identifier, otherwise it's a Word.
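
To make the decision concrete, here is a rough Python sketch of what that 
lexer rule is meant to do. It is not ANTLR output, just a model of the logic, 
and it assumes the alternatives above sit inside one repeated Word rule whose 
default token type is Word. The name classify is made up for illustration.

```python
import string

WORD = "Word"
IDENTIFIER = "Identifier"


def classify(text):
    """Return the token type the rule is intended to assign to `text`.

    Hypothetical helper: the type starts out as Word and is switched to
    Identifier as soon as a '_' or a digit is seen, which mirrors the
    {$setType(Identifier);} actions in the grammar fragment.
    """
    if not text:
        raise ValueError("empty token")
    token_type = WORD  # default type of the rule
    for ch in text:
        if ch == "_" or ch in string.digits:
            token_type = IDENTIFIER  # same effect as $setType(Identifier)
        elif ch not in string.ascii_letters:
            raise ValueError("unexpected character %r" % ch)
    return token_type
```

So classify("Hello") gives Word, while classify("my_id") and classify("x1") 
both give Identifier.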

But you still have an ambiguity, in that "Hello" is both a valid Word and a 
valid Identifier, and the lexer will always recognize it as a Word. So in 
the parser you'd need:
pair: (Identifier|Word) COLON Word;
Then you could create an Identifier Token\AST for the LHS Word in the 
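
As a hedged illustration of the pair rule, the Python sketch below lexes 
"id : word" into Word COLON Word and then lets the parser accept either type 
on the left-hand side, re-labelling it as an Identifier. The functions 
tokenize and parse_pair are hypothetical stand-ins for the generated lexer and 
parser, not anything ANTLR produces.

```python
import re

# One name (letters, digits, '_') or a colon, with optional leading whitespace.
TOKEN_RE = re.compile(r"\s*(?:(?P<name>[A-Za-z_0-9]+)|(?P<colon>:))")


def tokenize(text):
    """Split `text` into (type, text) pairs using the Word/Identifier rule."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at offset %d" % pos)
        if m.group("colon"):
            tokens.append(("COLON", ":"))
        else:
            name = m.group("name")
            # All letters -> Word; any '_' or digit -> Identifier.
            kind = "Identifier" if re.search(r"[_0-9]", name) else "Word"
            tokens.append((kind, name))
        pos = m.end()
    return tokens


def parse_pair(tokens):
    """Model of: pair: (Identifier|Word) COLON Word;"""
    kinds = [kind for kind, _ in tokens]
    if (len(tokens) != 3
            or kinds[0] not in ("Identifier", "Word")
            or kinds[1] != "COLON"
            or kinds[2] != "Word"):
        raise ValueError("unexpected tokens: %r" % kinds)
    # The LHS is an Identifier by position, whatever the lexer called it.
    return (("Identifier", tokens[0][1]), tokens[2])
```

With this, "id : word" parses even though the lexer reports its first token 
as a Word.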

--- In antlr-interest at y..., "howardckatz" <howardk at f...> wrote:
> --- In antlr-interest at y..., Terence Parr <parrt at j...> wrote:
>  ...
> > As for distinguishing between the two kinds of words/ids, you 
> > do the following in one rule (assume Word unless you see _ or 
> > digit):
> > 
> > Word:	( Letter | '_'  {$setType(Identifier);}) (Letter | 
> > Digit{$setType(Identifier);})*;
> That didn't quite do it, I think. Doesn't the above say that 
> starting with a Letter is a Word? But that's not what I want, since 
> valid Identifiers can start with Letters too. The following should be 
> legal input,
>      id : word
> but throws an "Unexpected token: id" error. I would guess the parser 
> sees this as "Word : Word" and accordingly chokes. Or am I 
> misunderstanding something?
> Howard

