[antlr-interest] Re: more lexical determinism
tbrandonau
tom at psy.unsw.edu.au
Thu Dec 6 15:24:35 PST 2001
You want anything made only of letters to be a Word, and anything with
a '_' or a digit in it to be an Identifier, right? So can't you just have:
Word
    :   (   Letter
        |   '_'   {$setType(Identifier);}  // a '_' makes it an Identifier
        |   Digit {$setType(Identifier);}  // so does any digit
        )+
    ;
i.e. if it's got a '_' or a digit it's an Identifier; otherwise it's a
Word.
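To make the type-switching idea concrete, here is a minimal Python sketch (hand-written, not ANTLR-generated code) of what that Word rule does: it consumes a run of letters, '_' and digits, starts out typed as Word, and switches to Identifier on the first '_' or digit. The function name `lex`, the (type, text) tuples, and the whitespace/COLON handling are assumptions added so the sketch can tokenize a whole "id : word" pair; Letter is assumed to mean ASCII letters.

```python
import string

LETTERS = set(string.ascii_letters)  # assumption: Letter = ASCII letters
DIGITS = set(string.digits)


def lex(text):
    """Hypothetical sketch of the Word rule plus whitespace and ':' tokens."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():                      # skip whitespace between tokens
            i += 1
        elif ch == ':':
            tokens.append(("COLON", ch))
            i += 1
        elif ch in LETTERS or ch in DIGITS or ch == '_':
            start = i
            ttype = "Word"                    # assume Word...
            while i < len(text) and (text[i] in LETTERS
                                     or text[i] in DIGITS
                                     or text[i] == '_'):
                if text[i] == '_' or text[i] in DIGITS:
                    ttype = "Identifier"      # ...unless we see '_' or a digit
                i += 1
            tokens.append((ttype, text[start:i]))
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens


print(lex("id : word"))
print(lex("my_id2 : word"))
```

Note that "id" still comes out as a Word here, which is exactly the ambiguity the parser has to absorb.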
But you still have an ambiguity: "Hello" is both a valid Word and a
valid Identifier, and the lexer will always return it as a Word. So in
the parser you'd need:
pair: (Identifier|Word) COLON Word;
Then, in the parser, you could create an Identifier token/AST for a
Word that appears on the left-hand side.
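A minimal Python sketch of that parser-side fix, assuming the lexer hands over (type, text) tuples. The function name `parse_pair` and the tuple layout are hypothetical; re-typing the left-hand Word stands in for building an Identifier token/AST in an ANTLR parser action.

```python
def parse_pair(tokens):
    """Hypothetical sketch of: pair : (Identifier|Word) COLON Word ;"""
    if len(tokens) != 3:
        raise SyntaxError("expected: (Identifier|Word) COLON Word")
    (lhs_type, lhs_text), (colon_type, _), (rhs_type, rhs_text) = tokens
    if lhs_type not in ("Identifier", "Word"):
        raise SyntaxError(f"unexpected token: {lhs_text}")
    if colon_type != "COLON":
        raise SyntaxError("expected ':'")
    if rhs_type != "Word":
        raise SyntaxError(f"unexpected token: {rhs_text}")
    # Whatever type the lexer assigned, the LHS is treated as an Identifier.
    return (("Identifier", lhs_text), ("Word", rhs_text))


print(parse_pair([("Word", "id"), ("COLON", ":"), ("Word", "word")]))
```

Accepting both token types on the left keeps the lexer simple; the cost is that the Word/Identifier distinction for the LHS is only settled in the parser.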
Tom.
--- In antlr-interest at y..., "howardckatz" <howardk at f...> wrote:
> --- In antlr-interest at y..., Terence Parr <parrt at j...> wrote:
>
> ...
>
> > As for distinguishing between the two kinds of words/ids, you could
> > do the following in one rule (assume Word unless you see _ or
> > digit):
> >
> > Word: ( Letter | '_' {$setType(Identifier);} )
> >       ( Letter | Digit {$setType(Identifier);} )* ;
>
> That didn't quite do it, I think. Doesn't the above say that anything
> starting with a Letter is a Word? But that's not what I want, since
> valid Identifiers can start with Letters too. The following should be
> legal input,
>
> id : word
>
> but throws an "Unexpected token: id" error. I would guess the parser
> sees this as "Word : Word" and accordingly chokes. Or am I
> misunderstanding something?
>
> Howard