[antlr-interest] Languages where keywords can be used as identifiers

Tue Feb 7 15:37:08 PST 2006

Thanks.
The problem with this is that the list of (unreserved) keywords is
expanding, so I would need to maintain the unreservedKeyword rule.  I
need some way of guaranteeing that all of the keywords are in the rule,
so I could use the literals txt file generated to generate the
unreservedKeyword rule and import that rule into the grammar...
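
A rough sketch of that generation step, in Java. It assumes the generated token file lists literals one per line in the form `LITERAL_loop="loop"=4`; check this against the real file before relying on it. The class name, method name and the set of reserved words are hypothetical, made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnreservedKeywordRuleGenerator {
    // Assumed line format for a literal entry: NAME="text"=number
    private static final Pattern LITERAL_ENTRY =
            Pattern.compile("^\\w+=\"([^\"]+)\"=\\d+$");

    /** Builds the text of the unreservedKeyword rule from token-file lines. */
    public static String buildRule(List<String> lines, Set<String> reserved) {
        List<String> alternatives = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LITERAL_ENTRY.matcher(line.trim());
            // Skip comments, the vocabulary name, and non-literal tokens such as ID=5.
            if (m.matches() && !reserved.contains(m.group(1))) {
                alternatives.add("\"" + m.group(1) + "\"");
            }
        }
        return "unreservedKeyword\n    : "
                + String.join("\n    | ", alternatives) + "\n    ;\n";
    }

    public static void main(String[] args) {
        // Sample lines in the assumed format; "goto" is treated as truly reserved.
        List<String> sample = List.of(
                "LITERAL_goto=\"goto\"=4",
                "ID=5",
                "LITERAL_loop=\"loop\"=6",
                "LITERAL_length=\"length\"=7");
        System.out.print(buildRule(sample, Set.of("goto")));
    }
}
```

The output could be written to a file and pulled into the grammar as part of the build, so the rule cannot drift out of date when a keyword is added.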

I have been trying a different approach, and have written a method that
(greedily) fetches and matches the next token:

	/**
	  * Returns the next token, treating it as the identifier.
	  * <p>Should be used instead of the ID token, since the ID token will
	  * only be returned by the lexer if the identifier is not a keyword.
	  * <p>Show caution in the use of this method, particularly if ID is
	  * only one of many options.  If it is, then getID should be the last
	  * option, as it forces the parser to consume the next token
	  * regardless (i.e. it always matches).
	  * @throws TokenStreamException
	  * @throws MismatchedTokenException
	  **/
	private Token getID() throws MismatchedTokenException, TokenStreamException {
		Token result = LT(1);
		match(result.getType());
		return result;
	}

It works when the rule really can be greedy, but the obvious downsides
are that getID needs to be the last option within any selection, and
that it will fail if it is part of an optional clause.  I could modify
it so that it does not call match 100% of the time (possibly by passing
in an exception set of tokens).
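
A minimal sketch of that modification, written as a standalone Java model rather than against the real generated parser. The token type constants, the class, and the names LA1 and tryGetID are all hypothetical; in the real parser the lookahead and match calls would be used instead.

```java
import java.util.List;
import java.util.Set;

public class GuardedIdDemo {
    // Hypothetical token types, standing in for the generated constants.
    static final int EOF = 1, ID = 4, LOOP = 5, SEMI = 6;

    private final List<Integer> tokens;
    private int pos = 0;

    GuardedIdDemo(List<Integer> tokens) { this.tokens = tokens; }

    /** Type of the next unconsumed token, or EOF when input is exhausted. */
    int LA1() { return pos < tokens.size() ? tokens.get(pos) : EOF; }

    /**
     * Consumes the next token as an identifier unless its type is in
     * notAnId; returns the consumed type, or -1 without consuming.
     */
    int tryGetID(Set<Integer> notAnId) {
        int type = LA1();
        if (type == EOF || notAnId.contains(type)) {
            return -1; // leave the token for the enclosing rule
        }
        pos++;
        return type;
    }

    public static void main(String[] args) {
        GuardedIdDemo p = new GuardedIdDemo(List.of(LOOP, SEMI));
        Set<Integer> stop = Set.of(SEMI);
        System.out.println(p.tryGetID(stop)); // LOOP is accepted as an identifier
        System.out.println(p.tryGetID(stop)); // SEMI is refused and not consumed
    }
}
```

The caller still has to know which tokens may legally follow at each call site, which is part of why this feels less than elegant.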

But both of these approaches seem to be... less than elegant.

Are there any hooks (from the parser) into the lexer, to tell it to
switch off testLiterals, or (due to lookahead) is it already too late
once the parser is parsing a rule?

P.S. I'm leaning towards your solution.

-----Original Message-----
From: John Green [mailto:greenj at ix.netcom.com] 
Sent: Wednesday, 8 February 2006 12:13 p.m.
To: Adam Bishop (DSLWN)
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Languages where keywords can be used as
identifiers

I went through the same thing a long time ago. Here is roughly what I
did:

The lexer would always recognize "loop" as a keyword token LOOP.

The grammar would have a rule like:
  unreservedkeyword: LOOP | etc | etc ;

The grammar would use a rule named "id":
  id: ID | unreservedkeyword ;

But enhance that last rule a bit, so that when you add it to the tree,
you change the type from LOOP (or whatever keyword) to ID:
  id: ID | urk:unreservedkeyword { #urk.setType(ID); } ;
I probably have the syntax wrong for setType, sorry, this is off the top
of my head.

Now your grammar can use:
  "goto" id
and
  datatype id

HTH,
John
john at joanju dot com

Adam Bishop (DSLWN) wrote:
> I am parsing a language where "loop" is a keyword, however a label can
> be named loop.  The rule for label expects an identifier token, but the
> lexer will return a loop token.  Is there any way to switch testLiterals
> for a particular rule?
>
> Ideally the Lexer shouldn't be doing testLiterals for any usage of the
> token ID in the parser.
>
> NOTE:  To make things worse, I am having this problem wherever I have a
> rule in the parser that expects an identifier
>
> e.g.
>
> "goto" ID
>
> Will fail for input "goto loop"
>
> And
>
> datatype ID
>
> will fail for "Number length" (since length is a keyword in another rule)