[antlr-interest] NON-reserved Words

Mon Apr 28 16:53:31 PDT 2003

I'm trying to parse a a dialect of SQL in which there are both
reserved and "non-reserved" words.  A non-reserved word is one
that can be an identifier or a syntactic marker depending on the
context.  For example CASE can be a variable, as in

  ... WHERE CASE=12 AND...

or it can begin an expression, as in

  SELECT CASE WHEN AGE<20 THEN 'kid' ELSE 'geezer' END, ...

The language has about 100 such non-reserved words and about 100
reserved words.

The approach suggested by the FAQ and the Yahoo Group seems to
something like

- Have the lexer treat non-reserved words (like CASE) as ordinary
  identifiers, i.e., don't represent them in the literals table.

- Use semantic predicates in the parser to distinguish the cases.

I started down this road, but found it to be complicated to
implement and hard/impossible to remove non-determinism.  So I'm 
considering a different approach:

- Have the lexer treat non-reserved words as keywords and collect
  them under a parser rule such as maybeIdentifier.

- In the parser, include maybeIdentifer in the rule for variable.

- Use syntactic predicates to distinguish the cases.

So far this seems to be easier, but I'm inexperienced with Antlr
and parsing in general and I'm concerned that there's some GOTCHA
lurking in there somewhere.

Ideas?  Experiences?  Thank you.

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/