[antlr-interest] C-Target $label.text / toString malfunction
Tobias Pape
Das.Linux at gmx.de
Sun Apr 1 20:33:18 PDT 2007
Hi again,
On 2007-03-31 at 18:25, Tobias Pape wrote:
> Hi guys, hi Jim in particular.
>
>
>
[..]
> into the @init block of my first parser rule,
> but later patched the C.stg to say
> ==============================================
> <else>
> strStream->toStringSS(strStream,(ANTLR3_INT32)(<scope>.start->getTokenIndex(<scope>.start)),(ANTLR3_INT32)(<scope>.stop->getTokenIndex(<scope>.stop)))
> <endif>
> ==============================================
> instead of
> ==============================================
> <else>
> strStream->toString(<scope>.start,<scope>.stop)
> <endif>
> ==============================================
>
[..]
I tweaked it again, to really get the token's text:
<else>
<scope>.start->getText(<scope>.start)
<endif>
This looks even more like a hack ;)
but it finally does what I expect it to do.
It seems that now "multi-token tokens" will not return what is expected
(which is why every _return struct carries both <scope>.start and <scope>.stop),
but so far I haven't figured out what those would be anyway.
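To make the "multi-token" concern concrete: the text of a parser rule is meant to cover every token from the rule's start token through its stop token, which is what toString(<scope>.start,<scope>.stop) asks for, whereas <scope>.start->getText(<scope>.start) yields the first token only. Below is a minimal, self-contained C sketch of that difference over a plain array of token texts. SketchToken and rule_text are hypothetical names chosen for illustration; they are not part of the ANTLR3 C runtime.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a token: only its text matters here. */
typedef struct {
    const char *text;
} SketchToken;

/* Write the concatenated text of tokens[start..stop] (inclusive) into buf,
   i.e. the whole span a rule matched. Returns the number of characters
   written, or -1 if buf (of size bytes) is too small. */
static int rule_text(const SketchToken *tokens, int start, int stop,
                     char *buf, size_t size) {
    size_t used = 0;
    int i;
    assert(tokens != NULL && buf != NULL && size > 0 && start <= stop);

    buf[0] = '\0';
    for (i = start; i <= stop; i++) {
        size_t len = strlen(tokens[i].text);
        if (used + len + 1 > size) return -1;   /* keep room for '\0' */
        memcpy(buf + used, tokens[i].text, len);
        used += len;
        buf[used] = '\0';
    }
    return (int)used;
}
```

With the tokens "foo:", " " and "bar", rule_text(toks, 0, 2, ...) yields "foo: bar", while rule_text(toks, 0, 0, ...) yields just "foo:", which is all a getText call on the start token can give.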
(At this very moment I'm thinking of a problem I'm having that is kind of multi-token,
but, heck, it's in the lexer.
Here I go again:)
My problem today is my lexer generating "wrong" tokens.
Lexer:
==============================================
// Grammar productions
WhiteSpace
: ( Space | Tab | Form ) { $channel=99; }
| ( '\r' | '\n' | '\r''\n' ) { /*newline(); */ $channel=99; }
;
Comment
: '\"'
/* ( : ~('\"'|'\n'|'\r')
| '\n' { newline(); }
| '\r' { newline(); }
//| '\r''\n' { newline(); }
)*
*/
( options {greedy=false;} : . )*
'\"' { $channel=99; }
;
KeywordOrIdentifier
: ( Primitive ) => Primitive
| ( KeywordSelector ) => KeywordSelector
| ( Keyword ) => Keyword
| Identifier
;
Primitive
: 'primitive' { $type=Primitive; }
;
KeywordSelector
:// should be used at least twice, or it's a Keyword ( Keyword )+ { $type=KeywordSelector; }
Keyword ( Keyword )+ { $type=KeywordSelector; }
;
Keyword
: Identifier Colon { $type=Keyword; }
;
Identifier
: Letter ( Letter | Digit | '_' )* { $type=Identifier; }
;
fragment Letter
: Upper | Lower
;
fragment Upper
: 'A' .. 'Z'
;
fragment Lower
: 'a' .. 'z'
;
fragment Digit
: '0' .. '9'
;
Integer
: ( Digit )+
;
String
: ( '\'' ( ~'\'' )* '\'' )
;
NewBlock : '[' ;
EndBlock : ']' ;
Colon : ':' ;
Period : '.' ;
Exit : '^' ;
Assign : ':=' ;
NewTerm : '(' ;
EndTerm : ')' ;
Pound : '#' ;
fragment Space : ' ' ;
fragment Tab : '\t' ;
fragment Form : '\f';
Not : '~' {$type=SingleOperator;};
And : '&' {$type=SingleOperator;};
Or : '|' {$type=Or;};
Star : '*' {$type=SingleOperator;};
Div : '/' {$type=SingleOperator;};
Mod : '\\' {$type=SingleOperator;};
Plus : '+' {$type=SingleOperator;};
Minus : '-' {$type=Minus;};
Equal : '=' {$type=Equal;};
// We obviously must match the = counterpart, i.e. <>, separately,
// because otherwise < is treated as Op and > as Arg:
// Differ : '<>' {$type=SingleOperator;};
Differ : '<>' {$type=OperatorSequence;};
More : '>' {$type=SingleOperator;};
Less : '<' {$type=SingleOperator;};
Comma : ',' {$type=Comma;};
At : '@' {$type=SingleOperator;};
Per : '%' {$type=SingleOperator;};
Separator :
Minus Minus Minus Minus ( Minus )* {$type=Separator;}
;
fragment SingleOperator :
Not | And | Or | Star | Div | Mod | Plus |
Equal | More | Less | Comma | At | Per
| Differ
;
OperatorSequence :
// ( SingleOperator )+ {$type=OperatorSequence;} // shouldn't this be the following? i.e. one op is _no_ op-sequence ;)
SingleOperator ( SingleOperator )+ {$type=OperatorSequence;}
// | SingleOperator* Differ SingleOperator* { $type=OperatorSequence; }
;
==============================================
As you can see, I already inserted a "Differ" rule, because my <> wasn't
matched as an OperatorSequence.
Another example: "error:", which should be a Keyword, was matched as an Identifier
(the parser then discarded the ':' and generated errors xD).
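For comparison, the decision the v2 KeywordOrIdentifier rule (quoted below) makes with its syntactic predicates can be sketched by hand: try Keyword Keyword first, then Keyword, then Primitive, then Identifier, so an identifier immediately followed by ':' is always taken as a Keyword, as "error:" should be. The C sketch below mirrors that order, assuming Identifier, Keyword and KeywordSelector are exactly as defined in the v2 grammar. WordKind, classify_word and the helper functions are hypothetical names, not generated ANTLR code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token kinds, named after the v2 lexer rules. */
typedef enum {
    WORD_NONE, WORD_KEYWORD_SELECTOR, WORD_KEYWORD, WORD_PRIMITIVE, WORD_IDENTIFIER
} WordKind;

static int is_letter(char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }
static int is_digit(char c)  { return c >= '0' && c <= '9'; }

/* Identifier : Letter ( Letter | Digit | '_' )*  -> length, or 0 if no match */
static size_t identifier_len(const char *s) {
    size_t n;
    if (!is_letter(s[0])) return 0;
    for (n = 1; is_letter(s[n]) || is_digit(s[n]) || s[n] == '_'; n++)
        ;
    return n;
}

/* Keyword : Identifier Colon  -> length including the ':', or 0 */
static size_t keyword_len(const char *s) {
    size_t n = identifier_len(s);
    return (n > 0 && s[n] == ':') ? n + 1 : 0;
}

/* Try the alternatives in the order of the v2 KeywordOrIdentifier rule;
   *len receives the number of characters consumed. */
static WordKind classify_word(const char *s, size_t *len) {
    size_t first, n;
    assert(s != NULL && len != NULL);

    first = keyword_len(s);
    /* ( Keyword Keyword ) => KeywordSelector : ( Keyword )+ */
    if (first > 0 && keyword_len(s + first) > 0) {
        size_t k;
        n = first;
        while ((k = keyword_len(s + n)) > 0) n += k;
        *len = n;
        return WORD_KEYWORD_SELECTOR;
    }
    /* ( Keyword ) => Keyword */
    if (first > 0) {
        *len = first;
        return WORD_KEYWORD;
    }
    /* ( Primitive ) => Primitive : a plain prefix test on the literal */
    if (strncmp(s, "primitive", 9) == 0) {
        *len = 9;
        return WORD_PRIMITIVE;
    }
    /* Identifier */
    n = identifier_len(s);
    *len = n;
    return n > 0 ? WORD_IDENTIFIER : WORD_NONE;
}
```

For example, "error:" classifies as a Keyword of length 6, "at:put:" as a KeywordSelector of length 7, and a bare "error" as an Identifier of length 5.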
The "original" v2 grammar had a lot of ( .. )=> syntactic predicates, which I removed
after Gavin Lambert's suggestion: <D1C0EC74-2D50-4E28-B9EE-1A00916D9404 at gmx.de>
For your information, here is the "old" lexer:
==============================================
// Grammar productions
WhiteSpace
: ( Space | Tab | Form ) { $setType(Token.SKIP); }
| ( '\r' | '\n' | '\r''\n' ) { newline(); $setType(Token.SKIP); }
;
Comment
: '\"'
(
options
{
generateAmbigWarnings=false;
}
: ~('\"'|'\n'|'\r')
| '\n' { newline(); }
| '\r' { newline(); }
| '\r''\n' { newline(); }
)*
'\"' { $setType(Token.SKIP); }
;
KeywordOrIdentifier
: ( Keyword Keyword) => KeywordSelector { $setType(KeywordSelector); }
| ( Keyword ) => Keyword { $setType(Keyword); }
| ( Primitive ) => Primitive { $setType(Primitive); }
| Identifier { $setType(Identifier); }
;
protected Identifier
: Letter ( Letter | Digit | '_' )*
;
protected Primitive
: "primitive"
;
protected Keyword
: Identifier Colon
;
protected KeywordSelector
: ( Keyword )+
;
protected Letter
: Upper | Lower
;
protected Upper
: 'A' .. 'Z'
;
protected Lower
: 'a' .. 'z'
;
protected Digit
: '0' .. '9'
;
Integer
: ( Digit )+
;
String
: ( '\'' ( ~'\'' )* '\'' )
;
NewBlock : '[' ;
EndBlock : ']' ;
Colon : ':' ;
Period : '.' ;
Exit : '^' ;
Assign : ":=" ;
NewTerm : '(' ;
EndTerm : ')' ;
Pound : '#' ;
protected Space : ' ' ;
protected Tab : '\t' ;
protected Form : '\f';
BinarySelector :
( Minus Minus Minus Minus ) =>
Separator {$setType(Separator);}
|
( SingleOperator SingleOperator ) =>
OperatorSequence {$setType(OperatorSequence);}
|
( Minus SingleOperator ) =>
Minus OperatorSequence {$setType(OperatorSequence);}
|
( Minus ) =>
Minus {$setType(Minus);}
|
( Comma ) =>
Comma {$setType(Comma);}
|
( Or ) =>
Or {$setType(Or);}
|
( Equal ) =>
Equal {$setType(Equal);}
|
SingleOperator {$setType(SingleOperator);}
;
protected Separator :
Minus Minus Minus Minus ( Minus )*
;
protected OperatorSequence :
( SingleOperator )+
;
protected SingleOperator :
Not | And | Or | Star | Div | Mod | Plus |
Equal | More | Less | Comma | At | Per
;
protected Not : '~' ;
protected And : '&' ;
protected Or : '|' ;
protected Star : '*' ;
protected Div : '/' ;
protected Mod : '\\' ;
protected Plus : '+' ;
protected Minus : '-' ;
protected Equal : '=' ;
protected More : '>' ;
protected Less : '<' ;
protected Comma : ',' ;
protected At : '@' ;
protected Per : '%' ;
==============================================
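The predicate chain of the old BinarySelector rule can likewise be read as an ordered decision. The following C sketch mirrors it alternative by alternative, assuming the v2 definitions above (in particular, Minus is not one of the SingleOperator characters). OpKind and classify_binary are hypothetical names. Under these rules "<>" is two SingleOperator characters and therefore an OperatorSequence, which is the behaviour the v3 Differ rule was added to restore.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token kinds, named after the v2 lexer rules. */
typedef enum {
    OP_NONE, OP_SEPARATOR, OP_SEQUENCE, OP_MINUS,
    OP_COMMA, OP_OR, OP_EQUAL, OP_SINGLE
} OpKind;

/* The characters of the v2 SingleOperator rule; note that '-' is not one. */
static int is_single_op(char c) {
    return c != '\0' && strchr("~&|*/\\+=><,@%", c) != NULL;
}

/* Classify the operator at the start of s, trying the alternatives in the
   same order as the v2 BinarySelector predicates. *len receives the number
   of characters consumed (0 if s does not start with an operator). */
static OpKind classify_binary(const char *s, size_t *len) {
    size_t n = 0;
    assert(s != NULL && len != NULL);

    /* ( Minus Minus Minus Minus ) => Separator : four or more '-' */
    if (strncmp(s, "----", 4) == 0) {
        while (s[n] == '-') n++;
        *len = n;
        return OP_SEPARATOR;
    }
    /* ( SingleOperator SingleOperator ) => OperatorSequence */
    if (is_single_op(s[0]) && is_single_op(s[1])) {
        while (is_single_op(s[n])) n++;
        *len = n;
        return OP_SEQUENCE;
    }
    /* ( Minus SingleOperator ) => Minus OperatorSequence */
    if (s[0] == '-' && is_single_op(s[1])) {
        n = 1;
        while (is_single_op(s[n])) n++;
        *len = n;
        return OP_SEQUENCE;
    }
    /* Remaining alternatives all consume exactly one character. */
    *len = 1;
    switch (s[0]) {
    case '-': return OP_MINUS;
    case ',': return OP_COMMA;
    case '|': return OP_OR;
    case '=': return OP_EQUAL;
    default:  break;
    }
    if (is_single_op(s[0])) return OP_SINGLE;
    *len = 0;
    return OP_NONE;
}
```

For example, "-<=" yields an OperatorSequence of length 3, "---" yields a single Minus (the Separator alternative needs at least four), and "|" yields Or.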
Whatever I'm doing wrong, I've managed to hide the origin of the
errors behind the conversion :(.
Any help appreciated,
thank you
-Tobias