[antlr-interest] Re: more lexical determinism

tbrandonau tom at psy.unsw.edu.au
Thu Dec 6 15:24:35 PST 2001


You want anything made up entirely of letters to be a Word and anything 
containing a '_' or a digit to be an Identifier, right? So can't you just have:
Word:
  (
    Letter 
  | '_' {$setType(Identifier);} 
  | Digit {$setType(Identifier);}
  )+
;
i.e. if it's got a '_' or a digit it's an Identifier; otherwise it's a 
Word.
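
To make the intent of that rule concrete, here is a rough plain-Java sketch of 
the same decision. It is not ANTLR-generated code: the class and method names 
are made up for illustration, and it assumes Letter means a-z/A-Z and Digit 
means 0-9, which the thread never actually spells out.

```java
// Hypothetical stand-in for the Word lexer rule; not what ANTLR generates.
final class WordOrIdentifier {
    enum Kind { WORD, IDENTIFIER }

    // Mirrors ( Letter | '_' | Digit )+ : start out as WORD and switch to
    // IDENTIFIER as soon as a '_' or a digit is seen.
    static Kind classify(String text) {
        if (text.isEmpty()) {
            throw new IllegalArgumentException("empty token");
        }
        Kind kind = Kind.WORD;
        for (char c : text.toCharArray()) {
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); // assumed Letter
            boolean digit = c >= '0' && c <= '9';                              // assumed Digit
            if (c == '_' || digit) {
                kind = Kind.IDENTIFIER;   // same effect as $setType(Identifier)
            } else if (!letter) {
                throw new IllegalArgumentException("not a Letter, '_' or Digit: " + c);
            }
        }
        return kind;
    }

    public static void main(String[] args) {
        System.out.println(classify("Hello")); // WORD
        System.out.println(classify("my_id")); // IDENTIFIER
        System.out.println(classify("x1"));    // IDENTIFIER
    }
}
```

Note that an all-letter token such as "id" still comes out as a Word here, 
which is exactly the case the parser has to deal with.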

But you still have an ambiguity: "Hello" is both a valid Word and a 
valid Identifier, and the lexer will always recognize it as a Word. So in 
the parser you'd need:
pair: (Identifier|Word) COLON Word;
Then you could create an Identifier Token/AST for the LHS Word in the 
parser.
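
As a rough illustration of that parser-side fix, the sketch below accepts 
either token type on the left of the colon and then retypes the left-hand 
token as an Identifier. It is plain Java rather than an ANTLR action; the 
Token class, the Type enum and parsePair are hypothetical names invented for 
this example, not part of the ANTLR runtime.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical hand-written version of: pair: (Identifier|Word) COLON Word;
final class PairSketch {
    enum Type { WORD, IDENTIFIER, COLON }

    static final class Token {
        Type type;          // mutable so the parser can retype the LHS
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    // Checks the three-token shape and returns the LHS retyped as IDENTIFIER.
    static Token parsePair(List<Token> tokens) {
        if (tokens.size() != 3) {
            throw new IllegalArgumentException("expected exactly three tokens");
        }
        Token lhs = tokens.get(0);
        Token colon = tokens.get(1);
        Token rhs = tokens.get(2);
        if (lhs.type != Type.WORD && lhs.type != Type.IDENTIFIER) {
            throw new IllegalArgumentException("Unexpected token: " + lhs.text);
        }
        if (colon.type != Type.COLON) {
            throw new IllegalArgumentException("Unexpected token: " + colon.text);
        }
        if (rhs.type != Type.WORD) {
            throw new IllegalArgumentException("Unexpected token: " + rhs.text);
        }
        lhs.type = Type.IDENTIFIER; // an all-letter LHS becomes an Identifier here
        return lhs;
    }

    public static void main(String[] args) {
        // "id : word" as the lexer would deliver it: Word COLON Word
        Token lhs = parsePair(Arrays.asList(
                new Token(Type.WORD, "id"),
                new Token(Type.COLON, ":"),
                new Token(Type.WORD, "word")));
        System.out.println(lhs.text + " -> " + lhs.type);
    }
}
```

With this shape, "id : word" no longer fails even though the lexer hands the 
parser Word COLON Word.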

Tom.
--- In antlr-interest at y..., "howardckatz" <howardk at f...> wrote:
> --- In antlr-interest at y..., Terence Parr <parrt at j...> wrote:
> 
>  ...
>  
> > As for distinguishing between the two kinds of words/ids, you could 
> > do the following in one rule (assume Word unless you see _ or 
> > digit):
> > 
> > Word:	( Letter | '_'  {$setType(Identifier);})
> >      	(Letter | Digit{$setType(Identifier);})*;
> 
> That didn't quite do it, I think. Doesn't the above say that anything 
> starting with a Letter is a Word? But that's not what I want, since 
> valid Identifiers can start with Letters too. The following should be 
> legal input,
> 
>      id : word
> 
> but throws an "Unexpected token: id" error. I would guess the parser 
> sees this as "Word : Word" and accordingly chokes. Or am I 
> misunderstanding something?
> 
> Howard


 
