[antlr-interest] nondeterminism in lexer rule

Fri Mar 2 02:49:14 PST 2007

Hi people,
I have now created a simple lexer file, shown below:

Simple.g:

class SimpleLexer extends Lexer;
options {
  charVocabulary = '\0'..'\377';
  exportVocab = SimpleScr;   // call the vocabulary "SimpleScr"
  testLiterals = true;       // automatically test for literals
  k = 7;                     // seven characters of lookahead to distinguish 'end' from 'end-->'
  caseSensitive = false;
  caseSensitiveLiterals = false;
//  filter = true;

}

IF	:	"if"	;

THEN	:	"then"	;

ELSE	:	"else"	;

GOTO	:	"goto"	;

DOT	:	'.'	;

protected
LITERAL
	: (('a'..'z') ('a'..'z'|'0'..'9'| '_' | '@' | '$')*)
	;

VAR_RCF
	: (LITERAL DOT )=> LITERAL DOT LITERAL (DOT LITERAL)? 
	| LITERAL {  $setType(NAME); }
	;

But alas, I'm still getting these nondeterminism warnings:
simple.g: warning:lexical nondeterminism between rules IF and VAR_RCF upon
simple.g:     k==1:'i'
simple.g:     k==2:'f'
simple.g:     k==3:<end-of-token>
simple.g:     k==4:<end-of-token>
simple.g:     k==5:<end-of-token>
simple.g:     k==6:<end-of-token>
simple.g:     k==7:<end-of-token>

Basically, the nextToken() method generates the following code:
else if ((LA(1)=='i') && (LA(2)=='f') && (true) && (true) && (true) && (true) && (true)) {
	mIF(true);
	theRetToken=_returnToken;
}else if (((LA(1) >= 'a' && LA(1) <= 'z')) && (true) && (true) && (true) && (true) && (true) && (true)) {
	mVAR_RCF(true);
	theRetToken=_returnToken;
}

So suppose you have a statement like goto IFINIT, where IFINIT is a
label. The nextToken() code above matches IF rather than the complete
word IFINIT, and the parser then reports an error [line 13:8: expecting
NAME, found 'IF']. How can I write the grammar so that the lexer takes
the complete word first and only then looks it up in the literals table?
Please help.
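
To make the difference concrete, here is a minimal stand-alone Java sketch
of the two behaviours. It is not ANTLR-generated code: the class
LexerDispatchSketch, the methods predictByPrefix and classifyWholeWord, and
the LITERALS map are all hypothetical names invented for illustration, and
only the IF and VAR_RCF branches are modelled. The first method picks a rule
from a fixed prefix, as in the generated code above; the second consumes the
whole word and only then consults a keyword table, which is the behaviour I
am after.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-alone sketch; none of these names come from ANTLR.
class LexerDispatchSketch {

    // Keyword table keyed by lower-cased text (assumed to mirror
    // caseSensitiveLiterals = false).
    private static final Map<String, String> LITERALS = new HashMap<>();
    static {
        LITERALS.put("if", "IF");
        LITERALS.put("then", "THEN");
        LITERALS.put("else", "ELSE");
        LITERALS.put("goto", "GOTO");
    }

    // Lower-cased character at index i, or '\0' past the end of the input.
    private static char la(String input, int i) {
        return i < input.length() ? Character.toLowerCase(input.charAt(i)) : '\0';
    }

    // Same character classes as the LITERAL rule in the grammar.
    private static boolean isStart(char c) {
        return c >= 'a' && c <= 'z';
    }

    private static boolean isPart(char c) {
        return isStart(c) || (c >= '0' && c <= '9') || c == '_' || c == '@' || c == '$';
    }

    // Prefix-based choice: IF wins as soon as the first two characters
    // are 'i' and 'f', regardless of what follows them.
    static String predictByPrefix(String input, int pos) {
        if (la(input, pos) == 'i' && la(input, pos + 1) == 'f') {
            return "IF";
        }
        return isStart(la(input, pos)) ? "VAR_RCF" : "NONE";
    }

    // Whole-word choice: consume the longest word first, then look it up.
    static String classifyWholeWord(String input, int pos) {
        if (!isStart(la(input, pos))) {
            return "NONE";
        }
        int end = pos + 1;
        while (isPart(la(input, end))) {
            end++;
        }
        String text = input.substring(pos, end).toLowerCase(Locale.ROOT);
        return LITERALS.getOrDefault(text, "NAME");
    }

    public static void main(String[] args) {
        String line = "goto IFINIT";
        System.out.println(predictByPrefix(line, 5));   // prints "IF"
        System.out.println(classifyWholeWord(line, 5)); // prints "NAME"
        System.out.println(classifyWholeWord(line, 0)); // prints "GOTO"
        System.out.println(classifyWholeWord("if x", 0)); // prints "IF"
    }
}
```

With the whole-word approach, IFINIT is classified as NAME because the
lookup happens only after the full word has been consumed, while a bare
"if" still maps to IF.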

Thanks & regards,
Abhijit.

-----Original Message-----
>If 'if' is a keyword in your language, why not use ANTLR's built-in
>support for keywords by setting "testLiterals = true"? You can use java.g
>as an example.

>  Can I put in some logic so that it compares the whole token IFINIT
> with IF, rather than character by character?
>
> I have declared IF as a literal with the following option for the grammar,
> and my label IFINIT/ENDSCRIPT forms part of the token LITERAL, which is
> defined as below:
>
--
Xue Yong Zhi
XRuby (Ruby to Java bytecode compiler):
http://xruby.blogspot.com
