[antlr-interest] Re: Representing state in lexer
Brett Crawley
brett at crawley.uk.com
Wed May 14 07:02:22 PDT 2003
I would define QUOTE without the action, exclude '"' from the allowed
characters (ALLOWCHARS), and define PATTERN as:
PATTERN : QUOTE (ALLOWCHARS)+ QUOTE ;
Then define TERM as:
TERM : PATTERN
    | ("gt")=> "gt"   {$setType(GT_OP);}
    | (">")=> ">"     {$setType(GT_OP);}
    | ("ge")=> "ge"   {$setType(GE_OP);}
    | (">=")=> ">="   {$setType(GE_OP);}
    | ("lt")=> "lt"   {$setType(LT_OP);}
    | ("<")=> "<"     {$setType(LT_OP);}
    | ("le")=> "le"   {$setType(LE_OP);}
    | ("<=")=> "<="   {$setType(LE_OP);}
    | ("eq")=> "eq"   {$setType(EQ_OP);}
    | ("=")=> "="     {$setType(EQ_OP);}
    | ("-")=> "-"     {$setType(DASH);}
    | ("or")=> "or"   {$setType(OR_OP);}
    | ("and")=> "and" {$setType(AND_OP);}
    | ("not")=> "not" {$setType(NOT_OP);}
    | (('a'..'z')('a'..'z') WS)=> ('a'..'z')('a'..'z') {$setType(S_TAG);}
    | ('w'INT)=> 'w'INT {$setType(W_OP);}
    | ('n'INT)=> 'n'INT {$setType(N_OP);}
    ;
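To show the intended behaviour outside ANTLR, here is a rough Python sketch of the same idea: the whole quoted string is consumed as a single PATTERN token, so no isQuoted flag is needed, and operator words are only recognized outside quotes. It is an illustration only, not generated lexer code. The names tokenize, KEYWORDS, TOKEN_RE and the WORD token type are made up for this example. It also assumes spaces are allowed between the quotes (ALLOWCHARS as posted excludes ' '), and it leaves out S_TAG, W_OP and N_OP.

```python
import re

# Hypothetical stand-in for the TERM alternatives; maps lexemes to token types.
KEYWORDS = {
    "gt": "GT_OP", ">": "GT_OP",
    "ge": "GE_OP", ">=": "GE_OP",
    "lt": "LT_OP", "<": "LT_OP",
    "le": "LE_OP", "<=": "LE_OP",
    "eq": "EQ_OP", "=": "EQ_OP",
    "-": "DASH",
    "or": "OR_OP", "and": "AND_OP", "not": "NOT_OP",
}

# One alternative per token kind; the quoted string is tried first, so
# anything between quotes never reaches the keyword lookup.
TOKEN_RE = re.compile(r'''
    (?P<PATTERN>"[^"\n]+")     # whole quoted string, quotes included
  | (?P<WS>\s+)                # skipped, like Token.SKIP
  | (?P<SEMI>;)
  | (?P<OPEN_PAREN>\()
  | (?P<CLOSE_PAREN>\))
  | (?P<WORD>[^\s";()]+)       # bare word or operator (assumed fallback)
''', re.VERBOSE)


def tokenize(text):
    """Return a list of (token_type, lexeme) pairs for text."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected char {text[pos]!r} at offset {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WS":
            continue
        if kind == "WORD":
            # Case-insensitive lookup, mirroring caseSensitive=false.
            kind = KEYWORDS.get(lexeme.lower(), "WORD")
        tokens.append((kind, lexeme))
    return tokens


print(tokenize('"WAR AND PEACE";'))
print(tokenize('WAR AND PEACE;'))
```

With these assumptions the first call yields one PATTERN token followed by SEMI, while the second yields WORD, AND_OP, WORD, SEMI.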
Hope this is of help.
Regards
---- In antlr-interest at yahoogroups.com, "cgodfrey86" <cgodfrey at e...>
wrote:
> Hello,
>
> I am trying to write a grammar file which recognizes a subset of
> tokens only if in a specific state.
>
> For example AND is recognized as token AND_OP if NOT appearing within
> quotes. If appearing within quotes, AND is recognized as a PATTERN
> token. I've included the grammar file which I have defined. Any
> suggestions as to what I am doing wrong would be appreciated.
>
> When I run a test program using the generated lexer, tokens are
> recognized properly when appearing in quotes:
>
> "WAR AND PEACE";
> *************************************************
> > lexer mQUOTE; c=="
> < lexer mQUOTE; c==w
> Token: [""",<17>,line=1,col=1]
> Token Type: 17
> Token Text: "
> > lexer mTERM; c==w
> > lexer mALLOWCHARS; c==w
> < lexer mALLOWCHARS; c==a
> > lexer mALLOWCHARS; c==a
> < lexer mALLOWCHARS; c==r
> > lexer mALLOWCHARS; c==r
> < lexer mALLOWCHARS; c==
> < lexer mTERM; c==
> Token: ["WAR",<16>,line=1,col=2]
> Token Type: 16
> Token Text: WAR
> > lexer mWS; c==
> < lexer mWS; c==a
> > lexer mTERM; c==a
> > lexer mALLOWCHARS; c==a
> < lexer mALLOWCHARS; c==n
> > lexer mALLOWCHARS; c==n
> < lexer mALLOWCHARS; c==d
> > lexer mALLOWCHARS; c==d
> < lexer mALLOWCHARS; c==
> < lexer mTERM; c==
> Token: ["AND",<16>,line=1,col=6]
> Token Type: 16
> Token Text: AND
> > lexer mWS; c==
> < lexer mWS; c==p
> > lexer mTERM; c==p
> > lexer mALLOWCHARS; c==p
> < lexer mALLOWCHARS; c==e
> > lexer mALLOWCHARS; c==e
> < lexer mALLOWCHARS; c==a
> > lexer mALLOWCHARS; c==a
> < lexer mALLOWCHARS; c==c
> > lexer mALLOWCHARS; c==c
> < lexer mALLOWCHARS; c==e
> > lexer mALLOWCHARS; c==e
> < lexer mALLOWCHARS; c=="
> < lexer mTERM; c=="
> Token: ["PEACE",<16>,line=1,col=10]
> Token Type: 16
> Token Text: PEACE
> > lexer mQUOTE; c=="
> < lexer mQUOTE; c==;
> Token: [""",<17>,line=1,col=15]
> Token Type: 17
> Token Text: "
> > lexer mSEMI; c==;
> < lexer mSEMI; c==
> Token: [";",<26>,line=1,col=16]
> Token Type: 26
> Token Text: ;
> done lexing...
> *************************************************
>
> When appearing without quotes, tokens are not recognized as expected:
> WAR AND PEACE;
> *************************************************
> > lexer mTERM; c==w
> > lexer mWS; c==r
> < lexer mWS; c==r
> < lexer mTERM; c==w
> exception: line 1:1: unexpected char: 'w'
> *************************************************
> AND PEACE;
> *************************************************
> > lexer mTERM; c==a
> < lexer mTERM; c==
> Token: ["AND",<6>,line=1,col=1]
> Token Type: 6
> Token Text: AND
> > lexer mWS; c==
> < lexer mWS; c==p
> > lexer mTERM; c==p
> > lexer mWS; c==a
> < lexer mWS; c==a
> < lexer mTERM; c==p
> exception: line 1:5: unexpected char: 'p'
> *************************************************
>
> options
> {
> language = "CSharp";
> }
>
> class UserLexer extends Lexer;
> options {
> k=3;
> caseSensitive=false;
> caseSensitiveLiterals=false;
> }
>
> tokens {
> S_TAG;
> OR_OP;
> AND_OP;
> NOT_OP;
> GT_OP;
> GE_OP;
> LT_OP;
> LE_OP;
> EQ_OP;
> DASH;
> W_OP;
> N_OP;
> PATTERN;
> }
>
>
> {
>
>
> public bool isQuoted = false;
>
> }
>
>
> QUOTE : '"' {if (this.isQuoted) {this.isQuoted = false;} else
> {this.isQuoted = true;} };
>
> OPEN_PAREN : '(';
>
> CLOSE_PAREN : ')';
>
>
> TERM :
> {!this.isQuoted}?
> (
> ("gt")=> "gt"
> {$setType(GT_OP);}
> | (">")=> ">"
> {$setType(GT_OP);}
> |("ge")=> "ge"
> {$setType(GE_OP);}
> |(">=")=> ">="
> {$setType(GE_OP);}
> |("lt")=>"lt"
> {$setType(LT_OP);}
> |("<")=>"<"
> {$setType(LT_OP);}
> |("le")=>"le"
> {$setType(LE_OP);}
> |("<=")=>"<="
> {$setType(LE_OP);}
> |("eq")=>"eq"
> {$setType(EQ_OP);}
> |("=")=>"="
> {$setType(EQ_OP);}
> |("-")=>"-"
> {$setType(DASH);}
> | ("or") => "or"
> {$setType(OR_OP);}
> | ("and") => "and"
> {$setType(AND_OP);}
> | ("not") => "not"
> {$setType(NOT_OP);}
> |(('a'..'z')('a'..'z') WS) => ('a'..'z')('a'..'z')
> {
> $setType(S_TAG);
> }
> | ('w'INT)=>'w'INT
> {$setType(W_OP);}
> | ('n'INT)=>'n'INT
> {$setType(N_OP);}
> )
> |
> (ALLOWCHARS)+
> {$setType(PATTERN);}
> ;
>
>
> protected
> REAL : INT'.'INT;
>
> protected
> DIGIT : ('0'..'9');
>
> protected
> INT : (DIGIT)+;
>
>
> protected
> ALLOWCHARS : ~('"'|'('|')'|'\n'|' '|'\r'|'\t'|';');
>
> WS : (
> options {
> generateAmbigWarnings=false;
> }
> : ' '
> | '\t'
> | '\n' { newline(); }
> | "\r\n" { newline(); }
> | '\r' { newline(); }
> )+
> { $setType(Token.SKIP); }
> ;
>
> // semi is made special for test here only
> SEMI : ';';