[antlr-interest] first steps with a lexer/parser
body
antlr-list at splitbody.com
Fri Jan 4 07:03:12 PST 2008
thanks again for the quick response, it really, really helps.
> a) WS and NL should get a marker
> { $channel = HIDDEN; }
> so that the parser does not even see them - because I'm quite sure that also
> {a=1}
> etc. (see first mail) should be allowed.
> And then you can remove all references to WS and NL from the parser! - the language should then look much more like your language definition ---skipped ---
all great points! luckily it is an incoming data file, so i just
replicated its format. but you are right, so i used the hidden channel,
and it made the grammar much simpler without losing the spaces inside
my strings.
here's a question - what would i have to change if i had escaped
quotes inside of the string (\")? then surely i would have to use .*
to match the string, and then do something different inside of it.
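
for what it's worth, one common way to handle escaped quotes, as an
untested sketch against this grammar: keep the negated set, but exclude
the backslash from it and add an alternative that consumes a backslash
together with the character after it. a non-greedy .* would stop at the
first '"' it sees, including an escaped one, so it would probably not
help here. the fragment name ESC and the set of escapes it accepts are
assumptions made for illustration only.

```
STR
@after{
  // same as before: strip the surrounding quotes only.
  // the backslashes are still in the text at this point; turning \" into "
  // would be a separate step (not shown here).
  setText(getText().substring(1, getText().length()-1));
}
    : '"' ( ESC | ~('\\'|'"') )* '"'
    ;

// hypothetical helper rule: a backslash followed by a quote or a backslash
fragment ESC
    : '\\' ('"'|'\\')
    ;
```

with a rule like this, an input such as "say \"hi\"" should lex as a
single STR token instead of ending at the first escaped quote.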
>
> b) In the rule
> start : msg NL? EOF ;
> put an ! after EOF: you don't want it in the AST (unfortunately, it becomes a null Token - see the end of your output - which causes trouble now and then; you also get an artificial null root - both are ugly).
> (and remove the NL? - see a)).
ah! i was wondering about that null! and i forgot about hidden channel
for NL - good point.
>
> c) You do a "double job" in the STR rules:
>
> > STR
> [...]
> > : '"' ANYCHAR* '"'
> > ;
> >
> > fragment ANYCHAR
> > : (~'"')+
> > ;
>
> There is a + in ANYCHAR, and a * in STR. What you want is simply either
>
> STR
> [...]
> : '"' (~'"')* '"'
> ;
>
> or, if you want to keep this ANYCHAR rule,
>
> STR
> [...]
> : '"' ANYCHAR* '"'
> ;
>
> fragment ANYCHAR
> : ~'"' // without +
> ;
yes, you are right, both the former and the latter seem to work.
>
> d) You might also want to capture tabs ('\t') in your WS rule.
done, thank you.
------------------------
grammar MsgString;
options { output = AST; }
tokens {
PAIR;
MSG;
}
start : msg EOF! ;
msg : '{' nameValuePairExpr* '}' -> ^(MSG nameValuePairExpr*) ;
nameValuePairExpr
: NAME '=' valueExpr -> ^(PAIR NAME valueExpr) ;
valueExpr
: STR
| INT
| msg
;
STR
@after{
setText(getText().substring(1, getText().length()-1));
}
: '"' ~'"'* '"'
;
INT : '0'..'9'+ ;
NAME : ('a'..'z'|'A'..'Z'|'0'..'9')+ ;
WS : (' '|'\t')+ { $channel = HIDDEN; } ;
NL : ('\n'|'\r')+ { $channel = HIDDEN; } ;
------------------------
and input/output:
{ a=1 b="2" c="t" d="text" e="one two" f={ g="three four" h={ i=5 j="a ha" } } }
(MSG (PAIR a 1) (PAIR b 2) (PAIR c t) (PAIR d text) (PAIR e one two) (PAIR f (MSG (PAIR g three four) (PAIR h (MSG (PAIR i 5) (PAIR j a ha))))))