[antlr-interest] nondeterminism in lexer rule
Abhijit Nandkumar Ghonge
Abhijit_Ghonge at infosys.com
Fri Mar 2 02:49:14 PST 2007
Hi people,
I tried to create now a simple lexer file as below:
Simple.g:
class SimpleLexer extends Lexer;
options {
charVocabulary = '\0'..'\377';
exportVocab = SimpleScr; // call the vocabulary "SimpleScr"
testLiterals = true; // automatically test for literals
k = 7; // seven characters of lookahead, to distinguish 'end' from 'end-->'
caseSensitive = false;
caseSensitiveLiterals = false;
// filter = true;
}
IF : "if" ;
THEN : "then" ;
ELSE : "else" ;
GOTO : "goto" ;
DOT : '.' ;
protected
LITERAL
: (('a'..'z') ('a'..'z'|'0'..'9'| '_' | '@' | '$')*)
;
VAR_RCF
: (LITERAL DOT )=> LITERAL DOT LITERAL (DOT LITERAL)?
| LITERAL { $setType(NAME); }
;
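For readers less familiar with ANTLR 2 syntactic predicates, the decision VAR_RCF is meant to make can be sketched by hand. This is a hypothetical Java analogue, not ANTLR's generated code. The names `VarRcfSketch`, `Match`, `literal` and `match` are made up for illustration, and lowercasing the (assumed ASCII) input stands in for `caseSensitive = false`.

```java
import java.util.Locale;

/** Hypothetical hand-written analogue of the VAR_RCF rule (not ANTLR output). */
final class VarRcfSketch {
    /** Result of one match: token type and number of characters consumed. */
    static final class Match {
        final String type;  // "VAR_RCF" or "NAME"
        final int length;   // characters consumed from the start position
        Match(String type, int length) { this.type = type; this.length = length; }
    }

    // LITERAL: 'a'..'z' followed by letters, digits, '_', '@' or '$'.
    // Returns the index just past the literal, or -1 if none starts at i.
    static int literal(String s, int i) {
        if (i >= s.length() || s.charAt(i) < 'a' || s.charAt(i) > 'z') return -1;
        i++;
        while (i < s.length()) {
            char c = s.charAt(i);
            boolean ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                    || c == '_' || c == '@' || c == '$';
            if (!ok) break;
            i++;
        }
        return i;
    }

    static Match match(String input, int start) {
        String s = input.toLowerCase(Locale.ROOT); // stands in for caseSensitive = false
        int end = literal(s, start);
        if (end < 0) throw new IllegalArgumentException("no LITERAL at " + start);
        // Predicate (LITERAL DOT)=>: is the first LITERAL followed by a dot?
        if (end < s.length() && s.charAt(end) == '.') {
            int second = literal(s, end + 1);
            if (second < 0) throw new IllegalArgumentException("expecting LITERAL after '.'");
            // Optional (DOT LITERAL)?, entered only if a LITERAL follows the dot.
            if (second < s.length() && s.charAt(second) == '.') {
                int third = literal(s, second + 1);
                if (third >= 0) return new Match("VAR_RCF", third - start);
            }
            return new Match("VAR_RCF", second - start);
        }
        return new Match("NAME", end - start); // second alternative: $setType(NAME)
    }
}
```

For example, `match("acct.bal", 0)` yields VAR_RCF over 8 characters, while `match("IFINIT", 0)` falls through to NAME over 6 characters.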
But alas, I'm still getting these nondeterminism warnings:
simple.g: warning:lexical nondeterminism between rules IF and VAR_RCF upon
simple.g: k==1:'i'
simple.g: k==2:'f'
simple.g: k==3:<end-of-token>
simple.g: k==4:<end-of-token>
simple.g: k==5:<end-of-token>
simple.g: k==6:<end-of-token>
simple.g: k==7:<end-of-token>
Basically, the generated nextToken() method contains the following code:
else if ((LA(1)=='i') && (LA(2)=='f') && (true) && (true) && (true) &&
        (true) && (true)) {
    // LA(3)..LA(7) were <end-of-token> in the analysis, so no test is emitted
    mIF(true);
    theRetToken=_returnToken;
}
else if (((LA(1) >= 'a' && LA(1) <= 'z')) && (true) && (true) && (true) &&
        (true) && (true) && (true)) {
    mVAR_RCF(true);
    theRetToken=_returnToken;
}
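To see why the IF branch always wins for words starting with "if", the prediction above can be simulated in a few lines. This is a hypothetical Java sketch (`PredictionSketch`, `la` and `predict` are made-up names); it models only the two alternatives quoted, omits the earlier branches of the real chain, and lowercases lookahead characters to mimic `caseSensitive = false`.

```java
/** Hypothetical simulation of the two nextToken() alternatives quoted above. */
final class PredictionSketch {
    // Sentinel returned when lookahead runs past the end of the input.
    static final char EOF_CHAR = '\uFFFF';

    // 1-based lookahead LA(i) relative to position pos, lowercased.
    static char la(String s, int pos, int i) {
        int idx = pos + i - 1;
        return idx < s.length() ? Character.toLowerCase(s.charAt(idx)) : EOF_CHAR;
    }

    /** Returns the rule the if/else chain would call for input at pos. */
    static String predict(String s, int pos) {
        // IF branch: only LA(1) and LA(2) are tested; LA(3)..LA(7) are (true),
        // so "ifinit" is sent to mIF() exactly like "if".
        if (la(s, pos, 1) == 'i' && la(s, pos, 2) == 'f') {
            return "IF";
        }
        // VAR_RCF branch: any letter 'a'..'z' at LA(1).
        char c = la(s, pos, 1);
        if (c >= 'a' && c <= 'z') {
            return "VAR_RCF";
        }
        return "ERROR";
    }
}
```

Because the branches are tried in order and the IF test never looks past the second character, `predict("IFINIT", 0)` returns IF even though VAR_RCF could match the whole word.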
So suppose you have a statement like `goto IFINIT`, where IFINIT is a
label. The nextToken() code above matches IF rather than the complete
word IFINIT, and the parser throws the error [line 13:8: expecting NAME,
found 'IF']. How can I write the grammar so that the lexer takes the
complete word first and then looks it up in the literals table? Please help.
Thanks & regards,
Abhijit.
-----Original Message-----
>If 'if' is a keyword in your language, why not use ANTLR's builtin
>support for keywords by setting "testLiterals = true"? You can use java.g
>as an example.
> Can I put some logic in place so that it compares the whole token IFINIT
> with IF, rather than comparing character by character?
>
> I have declared IF as a literal, with the following options for the grammar,
> and my label IFINIT/ENDSCRIPT forms part of the token LITERAL, which is defined
> as below:
>
--
Xue Yong Zhi
XRuby (Ruby to Java bytecode compiler):
http://xruby.blogspot.com