[antlr-interest] Re: Representing state in lexer

Wed May 14 07:28:09 PDT 2003

Thank you for the suggestion.
The only problem I have with this is that I need a pattern to be 
defined as a PATTERN token with or without quotes. If I try and 
specify without quotes, I am getting nondeterminism errors.

--- In antlr-interest at yahoogroups.com, "Brett Crawley" <brett at c...> 
wrote:
> I would define quote without the action, remove " from the allowed 
> characters and define pattern to be:
> 
> PATTERN : QUOTE (ALLOWCHARS)+ QUOTE ;
> 
> then define TERM to be:
> 
> TERM : PATTERN | 
>         ("gt")=> "gt"
>  	{$setType(GT_OP);}
>   	| (">")=> ">"
>  	{$setType(GT_OP);}
>  	|("ge")=> "ge" 
>  	{$setType(GE_OP);}
>  	|(">=")=> ">="
>  	{$setType(GE_OP);}
>  	|("lt")=>"lt" 
>  	{$setType(LT_OP);}
>  	|("<")=>"<"
>  	{$setType(LT_OP);}
>  	|("le")=>"le"
>  	{$setType(LE_OP);}
>  	|("<=")=>"<="
>  	{$setType(LE_OP);}
>  	|("eq")=>"eq" 
>  	{$setType(EQ_OP);}
>  	|("=")=>"="
>  	{$setType(EQ_OP);}
>  	|("-")=>"-"
>  	{$setType(DASH);}
>  	| ("or") => "or"
>  	{$setType(OR_OP);}
>  	| ("and") => "and"
>  	{$setType(AND_OP);}
>  	| ("not") => "not"
>  	{$setType(NOT_OP);}
>  	|(('a'..'z')('a'..'z') WS) => ('a'..'z')('a'..'z')
>        	{
>  		$setType(S_TAG);
>          }
>          | ('w'INT)=>'w'INT
>  	{$setType(W_OP);}
>          | ('n'INT)=>'n'INT
>  	{$setType(N_OP);} ;
> 
> Hope this is of help.
> 
> Regards
> 
> ---- In antlr-interest at yahoogroups.com, "cgodfrey86" 
<cgodfrey at e...> 
> wrote:
> > Hello,
> > 
> > I am trying to write a grammar file which recognizes a subset of 
> > tokens only if in a specific state.
> > 
> > For example AND is recognized as token AND_OP if NOT appearing 
> within 
> > quotes. If appearing within quotes, AND is recognized as a 
PATTERN 
> > token. I've included the grammar file which I have defined. Any 
> > suggestions as to what I am doing wrong would be appreciated.
> > 
> > When I run a test program using the generated lexer, tokens are 
> > recognized properly when appearing in quotes:
> > 
> > "WAR AND PEACE";
> > *************************************************
> >  > lexer mQUOTE; c=="
> >  < lexer mQUOTE; c==w
> > Token: [""",<17>,line=1,col=1]
> > Token Type: 17
> > Token Text: "
> >  > lexer mTERM; c==w
> >   > lexer mALLOWCHARS; c==w
> >   < lexer mALLOWCHARS; c==a
> >   > lexer mALLOWCHARS; c==a
> >   < lexer mALLOWCHARS; c==r
> >   > lexer mALLOWCHARS; c==r
> >   < lexer mALLOWCHARS; c==
> >  < lexer mTERM; c==
> > Token: ["WAR",<16>,line=1,col=2]
> > Token Type: 16
> > Token Text: WAR
> >  > lexer mWS; c==
> >  < lexer mWS; c==a
> >  > lexer mTERM; c==a
> >   > lexer mALLOWCHARS; c==a
> >   < lexer mALLOWCHARS; c==n
> >   > lexer mALLOWCHARS; c==n
> >   < lexer mALLOWCHARS; c==d
> >   > lexer mALLOWCHARS; c==d
> >   < lexer mALLOWCHARS; c==
> >  < lexer mTERM; c==
> > Token: ["AND",<16>,line=1,col=6]
> > Token Type: 16
> > Token Text: AND
> >  > lexer mWS; c==
> >  < lexer mWS; c==p
> >  > lexer mTERM; c==p
> >   > lexer mALLOWCHARS; c==p
> >   < lexer mALLOWCHARS; c==e
> >   > lexer mALLOWCHARS; c==e
> >   < lexer mALLOWCHARS; c==a
> >   > lexer mALLOWCHARS; c==a
> >   < lexer mALLOWCHARS; c==c
> >   > lexer mALLOWCHARS; c==c
> >   < lexer mALLOWCHARS; c==e
> >   > lexer mALLOWCHARS; c==e
> >   < lexer mALLOWCHARS; c=="
> >  < lexer mTERM; c=="
> > Token: ["PEACE",<16>,line=1,col=10]
> > Token Type: 16
> > Token Text: PEACE
> >  > lexer mQUOTE; c=="
> >  < lexer mQUOTE; c==;
> > Token: [""",<17>,line=1,col=15]
> > Token Type: 17
> > Token Text: "
> >  > lexer mSEMI; c==;
> >  < lexer mSEMI; c==
> > Token: [";",<26>,line=1,col=16]
> > Token Type: 26
> > Token Text: ;
> > done lexing...
> > *************************************************
> > 
> > When appearing without quotes, tokens are not recognized as 
> expected:
> > WAR AND PEACE;
> > *************************************************
> >  > lexer mTERM; c==w
> >   > lexer mWS; c==r
> >   < lexer mWS; c==r
> >  < lexer mTERM; c==w
> > exception: line 1:1: unexpected char: 'w'
> > *************************************************
> > AND PEACE;
> > *************************************************
> >  > lexer mTERM; c==a
> >  < lexer mTERM; c==
> > Token: ["AND",<6>,line=1,col=1]
> > Token Type: 6
> > Token Text: AND
> >  > lexer mWS; c==
> >  < lexer mWS; c==p
> >  > lexer mTERM; c==p
> >   > lexer mWS; c==a
> >   < lexer mWS; c==a
> >  < lexer mTERM; c==p
> > exception: line 1:5: unexpected char: 'p'
> > *************************************************
> > 
> > options
> > {
> > 	language = "CSharp";
> > }
> > 
> > class UserLexer extends Lexer;
> > options {
> >   k=3;
> >   caseSensitive=false;
> >   caseSensitiveLiterals=false;
> > }
> > 
> > tokens {
> > S_TAG;
> > OR_OP;
> > AND_OP;
> > NOT_OP;
> > GT_OP;
> > GE_OP;
> > LT_OP;
> > LE_OP;
> > EQ_OP;
> > DASH;
> > W_OP;
> > N_OP;
> > PATTERN;
> > }
> > 
> > 
> > {
> > 	
> > 
> > 	public bool isQuoted = false;
> > 
> > }
> > 
> > 
> > QUOTE : '"' {if (this.isQuoted) {this.isQuoted = false;} else 
> > {this.isQuoted = true;} };
> > 
> > OPEN_PAREN : '(';
> > 
> > CLOSE_PAREN : ')';
> > 
> > 
> > TERM 	:
> > 	{!this.isQuoted}?
> > 	(
> > 	("gt")=> "gt"
> > 	{$setType(GT_OP);}
> >  	| (">")=> ">"
> > 	{$setType(GT_OP);}
> > 	|("ge")=> "ge" 
> > 	{$setType(GE_OP);}
> > 	|(">=")=> ">="
> > 	{$setType(GE_OP);}
> > 	|("lt")=>"lt" 
> > 	{$setType(LT_OP);}
> > 	|("<")=>"<"
> > 	{$setType(LT_OP);}
> > 	|("le")=>"le"
> > 	{$setType(LE_OP);}
> > 	|("<=")=>"<="
> > 	{$setType(LE_OP);}
> > 	|("eq")=>"eq" 
> > 	{$setType(EQ_OP);}
> > 	|("=")=>"="
> > 	{$setType(EQ_OP);}
> > 	|("-")=>"-"
> > 	{$setType(DASH);}
> > 	| ("or") => "or"
> > 	{$setType(OR_OP);}
> > 	| ("and") => "and"
> > 	{$setType(AND_OP);}
> > 	| ("not") => "not"
> > 	{$setType(NOT_OP);}
> > 	|(('a'..'z')('a'..'z') WS) => ('a'..'z')('a'..'z')
> >       	{
> > 		$setType(S_TAG);
> >         }
> >         | ('w'INT)=>'w'INT
> > 	{$setType(W_OP);}
> >         | ('n'INT)=>'n'INT
> > 	{$setType(N_OP);}
> > 	)
> > 	|
> > 	(ALLOWCHARS)+
> > 	{$setType(PATTERN);}
> >      	;
> > 
> > 
> > protected
> > REAL   : INT'.'INT;
> > 
> > protected 
> > DIGIT	: ('0'..'9');
> > 
> > protected
> > INT     : (DIGIT)+;
> >  
> > 
> > protected
> > ALLOWCHARS : ~('"'|'('|')'|'\n'|' '|'\r'|'\t'|';');
> > 
> > WS	:	(
> > 			options {
> > 			generateAmbigWarnings=false;
> > 			}
> > 		:	' '
> > 		|	'\t'
> > 		|	'\n'	{ newline(); }
> > 		|	"\r\n"	{ newline(); }
> > 		|	'\r'	{ newline(); }
> > 		)+
> > 	{ $setType(Token.SKIP); }
> > 	;
> > 
> > // semi is made special for test here only
> > SEMI   : ';';

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/