[antlr-interest] NON-reserved Words

Tue Apr 29 01:37:44 PDT 2003

If you want a (complex) example of a grammar that has to handle this
problem, then go to www.maverick-dbms.org and download MaVerick. In
BASIC.g it has to cope with the REM identifier.

Given that there are multiple variants of databasic (which this is an
implementation of) where REM is variously the REMark keyword, not a
keyword at all, or can vary depending on context, I gather Rob had a lot
of fun getting it to work!

And even the commercial compilers don't get it right - when we ported
our code from PI to UV (PI didn't have REMark), the UV compiler
sometimes treated REM correctly (as a variable) and sometimes wrongly
(as a keyword), with the result that the compile screwed up pretty
spectacularly :-)

Cheers,
Wol

-----Original Message-----
From: Brian Hagenbuch [mailto:bhagenbuch at didera.com] 
Sent: 29 April 2003 00:54
To: antlr-interest at yahoogroups.com
Subject: [antlr-interest] NON-reserved Words

I'm trying to parse a dialect of SQL in which there are both
reserved and "non-reserved" words.  A non-reserved word is one
that can be an identifier or a syntactic marker depending on the
context.  For example CASE can be a variable, as in

  ... WHERE CASE=12 AND...

or it can begin an expression, as in

  SELECT CASE WHEN AGE<20 THEN 'kid' ELSE 'geezer' END, ...

The language has about 100 such non-reserved words and about 100
reserved words.

The approach suggested by the FAQ and the Yahoo Group seems to
be something like

- Have the lexer treat non-reserved words (like CASE) as ordinary
  identifiers, i.e., don't represent them in the literals table.

- Use semantic predicates in the parser to distinguish the cases.
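
To make the two steps above concrete, here is a minimal sketch written in
Python rather than ANTLR. The names tokenize, is_word and classify_primary
are hypothetical, and the one-token lookahead check is only a stand-in for
a real semantic predicate; it assumes WHEN never appears as a variable
right after CASE.

```python
import re

# Hypothetical token pattern: numbers, words, quoted strings, comparison operators.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|'([^']*)'|([=<>]))")


def tokenize(text):
    """Lexer that knows nothing about CASE: every word is a plain IDENT."""
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        num, word, string, op = m.groups()
        if num is not None:
            tokens.append(("NUM", num))
        elif word is not None:
            tokens.append(("IDENT", word))  # CASE is not in any literals table
        elif string is not None:
            tokens.append(("STR", string))
        else:
            tokens.append(("OP", op))
        pos = m.end()
    return tokens


def is_word(tokens, i, word):
    """True if token i is an IDENT whose text equals `word`, ignoring case."""
    return i < len(tokens) and tokens[i][0] == "IDENT" and tokens[i][1].upper() == word


def classify_primary(tokens, i):
    """Stand-in for a semantic predicate: inspect token text to pick an alternative."""
    if is_word(tokens, i, "CASE") and is_word(tokens, i + 1, "WHEN"):
        return "case_expression"
    if i < len(tokens) and tokens[i][0] == "IDENT":
        return "variable"
    return "literal"


print(classify_primary(tokenize("CASE=12"), 0))
print(classify_primary(tokenize("CASE WHEN AGE<20 THEN 'kid' ELSE 'geezer' END"), 0))
```

Every place in the grammar where a non-reserved word can act as a marker
needs a check of this kind, which is where the bookkeeping starts to add up.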

I started down this road, but found it complicated to implement,
and the non-determinism hard or impossible to remove.  So I'm
considering a different approach:

- Have the lexer treat non-reserved words as keywords and collect
  them under a parser rule such as maybeIdentifier.

- In the parser, include maybeIdentifier in the rule for variable.

- Use syntactic predicates to distinguish the cases.
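
The three steps above can be sketched as a small hand-written parser in
Python (not ANTLR). Everything here is illustrative: the Parser class, the
token kinds, and the keyword sets are assumptions, and in particular
treating WHEN, THEN, ELSE and END as reserved is a guess, since only CASE
is named as non-reserved. The try/backtrack in primary plays the role of a
syntactic predicate.

```python
import re

NON_RESERVED = {"CASE"}                     # assumed: only CASE is known to be non-reserved
RESERVED = {"WHEN", "THEN", "ELSE", "END"}  # assumed for this sketch
KEYWORDS = NON_RESERVED | RESERVED

TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|'([^']*)'|([=<>]))")


def tokenize(text):
    """Lexer: reserved AND non-reserved words both become keyword tokens."""
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        num, word, string, op = m.groups()
        if num is not None:
            tokens.append(("NUM", num))
        elif word is not None:
            upper = word.upper()
            tokens.append((upper if upper in KEYWORDS else "IDENT", word))
        elif string is not None:
            tokens.append(("STR", string))
        else:
            tokens.append(("OP", op))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def expect(self, kind):
        tok = self.peek()
        if tok[0] != kind:
            raise SyntaxError(f"expected {kind}, got {tok}")
        self.pos += 1
        return tok[1]

    def maybe_identifier(self):
        """maybeIdentifier: any non-reserved keyword token."""
        tok = self.peek()
        if tok[0] not in NON_RESERVED:
            raise SyntaxError(f"expected identifier, got {tok}")
        self.pos += 1
        return tok[1]

    def variable(self):
        """variable: IDENT | maybeIdentifier"""
        if self.peek()[0] == "IDENT":
            return ("var", self.expect("IDENT"))
        return ("var", self.maybe_identifier())

    def case_expression(self):
        self.expect("CASE")
        self.expect("WHEN")
        cond = self.comparison()
        self.expect("THEN")
        then = self.primary()
        self.expect("ELSE")
        other = self.primary()
        self.expect("END")
        return ("case", cond, then, other)

    def primary(self):
        tok = self.peek()
        if tok[0] in ("NUM", "STR"):
            self.pos += 1
            return (tok[0].lower(), tok[1])
        if tok[0] == "CASE":
            start = self.pos  # syntactic-predicate stand-in: try, then rewind
            try:
                return self.case_expression()
            except SyntaxError:
                self.pos = start
        return self.variable()

    def comparison(self):
        left = self.primary()
        op = self.expect("OP")
        return ("cmp", op, left, self.primary())


print(Parser(tokenize("CASE=12")).comparison())
print(Parser(tokenize("CASE WHEN AGE<20 THEN 'kid' ELSE 'geezer' END")).primary())
```

One thing this makes visible: every rule that accepts an identifier has to
go through variable (and so maybeIdentifier), and each new non-reserved
word has to be added to that rule by hand.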

So far this seems to be easier, but I'm inexperienced with Antlr
and parsing in general and I'm concerned that there's some GOTCHA
lurking in there somewhere.

Ideas?  Experiences?  Thank you.

Your use of Yahoo! Groups is subject to
http://docs.yahoo.com/info/terms/ 

