[antlr-interest] C-Target $label.text / toString malfunction

Tobias Pape Das.Linux at gmx.de
Sun Apr 1 20:33:18 PDT 2007


Hi again,


On 2007-03-31 at 18:25, Tobias Pape wrote:

> Hi guys, hi Jim in particular.
>
>
>
[..]
> into the @init block of my first parser rule,
> but later patched the C.stg to say
> ==============================================
> <else>
> strStream->toStringSS(strStream,
>     (ANTLR3_INT32)(<scope>.start->getTokenIndex(<scope>.start)),
>     (ANTLR3_INT32)(<scope>.stop->getTokenIndex(<scope>.stop)))
> <endif>
> ==============================================
> instead of
> ==============================================
> <else>
> strStream->toString(<scope>.start,<scope>.stop)
> <endif>
> ==============================================
>
[..]

I tweaked it again, to really get the token:
<else>
<scope>.start->getText(<scope>.start)
<endif>

This looks even more like a hack ;)
but, finally, it does what I expect it to do.
It seems that "multi-token tokens" will now not return what is expected
(hence <scope>.start and <scope>.stop in every _return struct),
but so far I haven't figured out what that should be anyway.
(At this very moment I am thinking of a problem I'm having that is kind of
multi-token, but, heck, it's in the lexer.

Here I go again :)
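
To make the caveat above concrete, here is a minimal sketch in Python. It is a toy model, not the ANTLR3 C runtime: Token, span_text and start_text are names made up for illustration, standing in for the toString-over-a-span variant and the getText-on-the-start-token variant of the template.

```python
from dataclasses import dataclass


@dataclass
class Token:
    index: int  # position in the (hypothetical) token stream
    text: str   # characters matched by the token


def span_text(tokens, start, stop):
    """Text of every token from start to stop, inclusive."""
    return "".join(t.text for t in tokens[start.index:stop.index + 1])


def start_text(start):
    """Text of the first token only, ignoring where the rule ended."""
    return start.text


# Hypothetical token stream for the input "foo := 42"
tokens = [Token(i, s) for i, s in enumerate(["foo", " ", ":=", " ", "42"])]

single = (tokens[0], tokens[0])  # a rule that matched just "foo"
multi = (tokens[0], tokens[4])   # a rule that matched the whole assignment
```

For a rule that matched a single token both functions agree; for a rule that spans several tokens, start_text only yields the first one.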

My problem today is my lexer generating "wrong" tokens.
Lexer:
==============================================

// Grammar productions
WhiteSpace
   : ( Space | Tab | Form )     { $channel=99; }
   | ( '\r' | '\n' | '\r''\n' ) { /*newline(); */ $channel=99; }
   ;

Comment
   : '\"'
     /* ( : ~('\"'|'\n'|'\r')
       | '\n'     { newline(); }
       | '\r'     { newline(); }
       //| '\r''\n' { newline(); }
     )*
     */
     ( options {greedy=false;} : . )*
     '\"' { $channel=99; }
   ;


KeywordOrIdentifier
	:	( Primitive ) => Primitive
	|	( KeywordSelector ) => KeywordSelector	
	|	( Keyword ) => Keyword
	|	Identifier
	;


Primitive
   : 'primitive' { $type=Primitive;  }
   ;



KeywordSelector
   : // should be used twice, or it's a keyword: ( Keyword )+ { $type=KeywordSelector; }
   Keyword ( Keyword )+ { $type=KeywordSelector; }
   ;

Keyword
   : Identifier Colon { $type=Keyword;    }
   ;

Identifier
   : Letter ( Letter | Digit | '_' )* { $type=Identifier; }
   ;


fragment Letter
   : Upper | Lower
   ;

fragment Upper
   : 'A' .. 'Z'
   ;

fragment Lower
   : 'a' .. 'z'
   ;

fragment Digit
   : '0' .. '9'
   ;

Integer
   : ( Digit )+
   ;

String
   : ( '\'' ( ~'\'' )* '\'' )
   ;

NewBlock : '[' ;
EndBlock : ']' ;
Colon    : ':' ;
Period   : '.' ;
Exit     : '^' ;
Assign   : ':=' ;
NewTerm  : '(' ;
EndTerm  : ')' ;
Pound    : '#' ;

fragment Space : ' '  ;
fragment Tab   : '\t' ;
fragment Form   : '\f';


Not   : '~' {$type=SingleOperator;};
And   : '&' {$type=SingleOperator;};
Or    : '|' {$type=Or;};
Star  : '*' {$type=SingleOperator;};
Div   : '/' {$type=SingleOperator;};
Mod   : '\\' {$type=SingleOperator;};
Plus  : '+' {$type=SingleOperator;};
Minus : '-' {$type=Minus;};
Equal : '='  {$type=Equal;};
// We obviously have to match the counterpart of =, i.e. <>, separately,
// because otherwise < is treated as Op and > as Arg:
// Differ	:	'<>' {$type=SingleOperator;};
Differ	:	'<>' {$type=OperatorSequence;};
More  : '>' {$type=SingleOperator;};
Less  : '<' {$type=SingleOperator;};
Comma : ',' {$type=Comma;};
At    : '@' {$type=SingleOperator;};
Per   : '%' {$type=SingleOperator;};

Separator :
   Minus Minus Minus Minus ( Minus )* {$type=Separator;}
;


fragment  SingleOperator :
   Not | And | Or | Star | Div | Mod | Plus |
   Equal | More | Less | Comma | At | Per
   | Differ
;

OperatorSequence :
// ( SingleOperator )+ {$type=OperatorSequence;}
// shouldn't this be the following? i.e. one op is _no_ op-sequence ;)
   SingleOperator ( SingleOperator )+ {$type=OperatorSequence;}
//  | SingleOperator* Differ SingleOperator* { $type=OperatorSequence; }
;
==============================================

As you can see, I have already inserted a "Differ" rule, because my <> wasn't
matched as OperatorSequence.
Another example: "error:", which should be a "Keyword", was
matched as "Identifier"
(the parser then discarded the : and generated errors xD).
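
To spell out what I expect for words, here is a small Python sketch of the intended classification. It only models the intent of the Identifier / Keyword / KeywordSelector / Primitive rules with a regular expression; it says nothing about how ANTLR actually makes its decision, and classify is a made-up helper name.

```python
import re

# Identifier: a letter followed by letters, digits, or underscores.
IDENT = r"[A-Za-z][A-Za-z0-9_]*"
# Either a run of "identifier:" parts (matched greedily, so "at:put:" is
# seen as a whole before it is classified) or a plain identifier.
WORD = re.compile(rf"((?:{IDENT}:)+)|({IDENT})")


def classify(text):
    """Return (token_type, matched_text) for the word at the start of text."""
    m = WORD.match(text)
    if m is None:
        return None
    keywords, ident = m.groups()
    if keywords is not None:
        # Two or more colon-terminated parts form a KeywordSelector.
        kind = "KeywordSelector" if keywords.count(":") > 1 else "Keyword"
        return kind, keywords
    if ident == "primitive":
        return "Primitive", ident
    return "Identifier", ident
```

With this model, classify("error: 'oops'") gives a Keyword with the text "error:", which is what the lexer should produce as well.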

In the "original" v2 grammar there were a lot of ( .. )=> predicates, which I
removed following
Gavin Lambert's suggestion: <D1C0EC74-2D50-4E28-B9EE-1A00916D9404 at gmx.de>

For your information, the "old" lexer:
==============================================
// Grammar productions
WhiteSpace
   : ( Space | Tab | Form )     { $setType(Token.SKIP); }
   | ( '\r' | '\n' | '\r''\n' ) { newline(); $setType(Token.SKIP); }
   ;

Comment
   : '\"'
     (
       options
       {
         generateAmbigWarnings=false;
       }

       : ~('\"'|'\n'|'\r')
       | '\n'     { newline(); }
       | '\r'     { newline(); }
       | '\r''\n' { newline(); }
     )*
     '\"' { $setType(Token.SKIP); }
   ;

KeywordOrIdentifier
   : ( Keyword Keyword) => KeywordSelector { $setType(KeywordSelector); }
   | ( Keyword )        => Keyword         { $setType(Keyword);    }
   | ( Primitive )      => Primitive       { $setType(Primitive);  }
   | Identifier                            { $setType(Identifier); }
   ;

protected Identifier
   : Letter ( Letter | Digit | '_' )*
   ;

protected Primitive
   : "primitive"
   ;

protected Keyword
   : Identifier Colon
   ;

protected KeywordSelector
   : ( Keyword )+
   ;

protected Letter
   : Upper | Lower
   ;

protected Upper
   : 'A' .. 'Z'
   ;

protected Lower
   : 'a' .. 'z'
   ;

protected Digit
   : '0' .. '9'
   ;

Integer
   : ( Digit )+
   ;

String
   : ( '\'' ( ~'\'' )* '\'' )
   ;

NewBlock : '[' ;
EndBlock : ']' ;
Colon    : ':' ;
Period   : '.' ;
Exit     : '^' ;
Assign   : ":=" ;
NewTerm  : '(' ;
EndTerm  : ')' ;
Pound    : '#' ;

protected Space : ' '  ;
protected Tab   : '\t' ;
protected Form   : '\f';

BinarySelector :
( Minus Minus Minus Minus ) =>
   Separator {$setType(Separator);}
|
( SingleOperator SingleOperator ) =>
   OperatorSequence {$setType(OperatorSequence);}
|
( Minus SingleOperator ) =>
   Minus OperatorSequence {$setType(OperatorSequence);}
|
( Minus ) =>
   Minus {$setType(Minus);}
|
( Comma ) =>
   Comma {$setType(Comma);}
|
( Or ) =>
   Or {$setType(Or);}
|
( Equal ) =>
   Equal {$setType(Equal);}
|
   SingleOperator {$setType(SingleOperator);}
;

protected Separator :
   Minus Minus Minus Minus ( Minus )*
;

protected OperatorSequence :
   ( SingleOperator )+
;

protected SingleOperator :
   Not | And | Or | Star | Div | Mod | Plus |
   Equal | More | Less | Comma | At | Per
;

protected Not   : '~' ;
protected And   : '&' ;
protected Or    : '|' ;
protected Star  : '*' ;
protected Div   : '/' ;
protected Mod   : '\\' ;
protected Plus  : '+' ;
protected Minus : '-' ;
protected Equal : '=' ;
protected More  : '>' ;
protected Less  : '<' ;
protected Comma : ',' ;
protected At    : '@' ;
protected Per   : '%' ;
==============================================
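
For comparison, this is how I read the decision order of the old BinarySelector rule, written as a Python sketch. It is my own model of the predicates being tried top to bottom, under the assumption that the first predicate that succeeds wins; binary_selector and SINGLE are names invented for this sketch and it does not reproduce ANTLR's lookahead machinery.

```python
# SingleOperator characters of the old lexer; '-' is not among them.
SINGLE = set("~&|*/\\+=><,@%")


def binary_selector(text):
    """Return (token_type, matched_text) for the operator at the start of text."""

    def run(chars, start):
        # Index just past the run of characters from `chars` beginning at `start`.
        end = start
        while end < len(text) and text[end] in chars:
            end += 1
        return end

    if text.startswith("----"):
        return "Separator", text[:run("-", 0)]
    if len(text) >= 2 and text[0] in SINGLE and text[1] in SINGLE:
        return "OperatorSequence", text[:run(SINGLE, 0)]
    if len(text) >= 2 and text[0] == "-" and text[1] in SINGLE:
        return "OperatorSequence", text[:run(SINGLE, 1)]
    if text[:1] == "-":
        return "Minus", "-"
    if text[:1] == ",":
        return "Comma", ","
    if text[:1] == "|":
        return "Or", "|"
    if text[:1] == "=":
        return "Equal", "="
    if text[:1] in SINGLE:
        return "SingleOperator", text[0]
    return None
```

Under this reading "<>" comes out as one OperatorSequence, which is the behaviour I am trying to get back in the v3 lexer.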


Whatever I'm doing wrong, the conversion I did has managed to obscure the
origin of the errors :(.

Any help appreciated,
thank you
	-Tobias

