[antlr-interest] first steps with a lexer/parser
Harald Mueller
harald_m_mueller at gmx.de
Thu Jan 3 06:59:11 PST 2008
Hi -
a) A quoted string should be a token, IMO, not a rule (except ... see the thread on parsing BSDL where we quarrel about "structured string parsing" ... but this would not be "first steps").
I am never sure whether ! works in lexer rules. So if you want to strip the " characters and it does NOT work, first complain to Terence, and then do something like
$text = $text.Trim('\"'); // in C#
or
$text = $text.substring(1, $text.length()-1); // in Java
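As a rough sketch of point a), assuming ANTLR 3 with the Java target: the quoted string becomes a lexer rule, and an action drops the surrounding quotes. The rule name STRING is made up for illustration, and the sketch calls the Java runtime's setText/getText rather than assigning to $text.

```antlr
// Hypothetical token replacing the quotedString parser rule.
// Matches an opening quote, any run of non-quote characters, and a closing quote.
STRING
    : '"' ( ~'"' )* '"'
      // Drop the first and last character (the quotes) from the token text.
      { setText(getText().substring(1, getText().length() - 1)); }
    ;
```

Note that this sketch has no escape mechanism, so a string cannot itself contain a " character.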
b) Are you really sure that whitespace is that significant? According to your grammar,
{a=1}
is not allowed: You require a WS after { and before } - and WS is at least one blank. Likewise, { a = 1 } would be rejected, because the grammar allows no WS around the = ...
Almost all languages I know *ignore* whitespace. In ANTLR, you do this by sending the WS tokens to the HIDDEN channel via { $channel = HIDDEN; }.
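A minimal sketch of what that could look like for the grammar quoted below, assuming ANTLR 3 with the Java target. Folding tabs and newlines into WS is my assumption, not something the original grammar does; it would also make the separate NL token, and the NL? in start, unnecessary.

```antlr
// With WS hidden, the parser rule no longer mentions whitespace at all.
msg : '{' nameValuePairExpr* '}' -> ^(MSG nameValuePairExpr*) ;

// Whitespace is still tokenized, but on the hidden channel the parser never sees it.
WS  : ( ' ' | '\t' | '\r' | '\n' )+ { $channel = HIDDEN; } ;
```

With this change, {a=1}, { a=1 } and { a = 1 } should all be accepted.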
c) There is no good reason to have artificial roots for single tokens - instead of ^(INT_VAL INT), just use the INT; same for STR_VAL.
d) Also for the '=', I would not add an artificial symbol, but simply use the '=' as root:
...: NAME '='^ valueExpr;
- but this is a matter of taste, I'd say.
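Points c) and d) together might look like the sketch below. It assumes that WS is already on the hidden channel as in b), and that quoted strings arrive as a single token as in a); the token name STRING is hypothetical.

```antlr
nameValuePairExpr
    : NAME '='^ valueExpr   // '=' becomes the subtree root: ^('=' NAME valueExpr)
    ;

valueExpr
    : STRING                // single tokens become plain leaf nodes,
    | INT                   // with no STR_VAL or INT_VAL wrapper
    | msg
    ;
```

If you go this way, the imaginary tokens PAIR, STR_VAL and INT_VAL are no longer used and can be dropped from the tokens { } section.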
Regards
Harald
-------- Original Message --------
> Date: Thu, 3 Jan 2008 08:40:38 -0500
> From: body <antlr-list at splitbody.com>
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] first steps with a lexer/parser
> hello,
>
> i am trying to deal with the messages that look like this:
>
> { a=1 b="2" c="t" d="stuff" e="one two" f={ g="three four" h={ i=5
> j="a ha" } } }
>
> below is my lexer/parser. it seems to work and emit proper-looking
> tree, but i want to run it by you, because it does not feel right.
>
> it seems like i should be using fragments somewhere, also i cannot
> figure out how to build a proper tree grammar out of it.
>
> any suggestions appreciated.
>
> thank you.
>
> -----------------
> grammar MsgString;
>
> options { output = AST; }
>
> tokens {
> PAIR;
> MSG;
> STR_VAL;
> INT_VAL;
> }
>
> start : msg NL? EOF -> ^(MSG msg) ;
>
> msg : '{' WS nameValuePairExpr* WS '}' -> ^(MSG nameValuePairExpr*)
> ;
>
> nameValuePairExpr
> : NAME '=' valueExpr WS? -> ^(PAIR NAME valueExpr) ;
>
> valueExpr
> : quotedString -> ^(STR_VAL quotedString)
> | INT -> ^(INT_VAL INT)
> | msg
> ;
>
> quotedString
> : '"'! .* '"'!
> ;
>
> INT : '0'..'9'+ ;
>
> NAME : ('a'..'z'|'A'..'Z'|'0'..'9')+ ;
>
> WS : ' '+ ;
>
> NL : ('\n'|'\r')+ ;
> -----------------