[antlr-interest] Re: Regular expression "repetition"
Mark Lentczner
markl at glyphic.com
Mon May 17 14:05:23 PDT 2004
> This will probably do when the number of repetitions is low - but I
> am facing a problem with r{0,63} and I hope there is another way :-)
Well, just to be geeky, there is this approach:
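// r1: zero or one r; greedy, so an available r is always taken
r1: (options{greedy=true;}: r)? ;
// each rN below matches between 0 and N occurrences of r
r2: r1 r1 ;
r4: r2 r2 ;
r8: r4 r4 ;
r16: r8 r8 ;
r32: r16 r16 ;
// 32 + 16 + 8 + 4 + 2 + 1 = 63
r63: r32 r16 r8 r4 r2 r1 ;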
Yes - this runs through Antlr w/o warning! And it generalizes to any
range of numbers for the repeat: for r{x,y}, write x copies of r
followed by the same doubling construction for r{0,y-x}.
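To make the generalization concrete, here is a small Python sketch that writes out the ANTLR 2 rule text for "r between 0 and n times". `repeat_rules` is a hypothetical helper, not part of ANTLR; it only produces text. It builds the power-of-two rules by doubling, then concatenates the ones picked out by the binary digits of n, so the maximum counts add up to exactly n.

```python
def repeat_rules(name: str, n: int) -> list[str]:
    """Return ANTLR 2 rule text matching `name` between 0 and n times.

    Hypothetical helper: it generates grammar text only and never calls ANTLR.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # name1 matches zero or one occurrence, greedily.
    rules = [f"{name}1: (options{{greedy=true;}}: {name})? ;"]
    k = 1
    # Each doubling rule name{2k} matches 0..2k occurrences.
    while k * 2 <= n:
        rules.append(f"{name}{k * 2}: {name}{k} {name}{k} ;")
        k *= 2
    if n != k:
        # n is not a power of two: combine the rules for its set bits,
        # largest first, so the maximum repeat count is exactly n.
        bits = [1 << i for i in range(k.bit_length() - 1, -1, -1)]
        parts = [f"{name}{b}" for b in bits if n & b]
        rules.append(f"{name}{n}: {' '.join(parts)} ;")
    return rules


for rule in repeat_rules("r", 63):
    print(rule)
```

For n = 63 this prints rules equivalent to the r1..r63 grammar above. For n = 10 the last rule is `r10: r8 r2 ;`.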
BUT - When people ask for r{x,y}, I always wonder if that is really
what their grammar wants. Consider this fragment of a grammar for
reading byte values, assuming we had the r{x,y} syntax:
bytes: (BYTE)+ ;
BYTE: DIGIT{1,3} ;
protected DIGIT: '0'..'9' ;
WS: (' ' | '\t')+ { $setType(Token.SKIP); } ;
NL: '\n' { newline(); $setType(Token.SKIP); } ;
Someone too cleverly spec'd the values to be between one and three
decimal digits, because no byte value (0 to 255) needs more than three.
This doesn't work well in practice:
"1" --> [ 1 ] parses as one byte
"1 2" --> [ 1, 2 ] parses as two
"12" --> [ 12 ] of course this parses as one
"123" --> [ 123 ] ditto
"1234" --> [ 123, 4 ] is this what any user would expect?
Really, any user would expect a parse error: "1234, value too big for
a byte".
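The parses in the table can be reproduced with a short Python model of a longest-match lexer that is limited to three digits per token and skips whitespace. This is only a simulation of what the hypothetical DIGIT{1,3} rule would do, not ANTLR-generated code. `lex_bounded` is an illustrative name.

```python
DIGITS = "0123456789"


def lex_bounded(text: str, max_digits: int = 3) -> list[int]:
    """Model BYTE: DIGIT{1,3} with longest-match lexing and skipped whitespace."""
    values = []
    i = 0
    while i < len(text):
        if text[i] in " \t\n":  # WS and NL tokens are skipped
            i += 1
            continue
        if text[i] not in DIGITS:
            raise ValueError(f"unexpected character {text[i]!r} at offset {i}")
        j = i
        # Take as many digits as the bound allows, then stop even if more follow.
        while j < len(text) and j - i < max_digits and text[j] in DIGITS:
            j += 1
        values.append(int(text[i:j]))
        i = j
    return values


for sample in ["1", "1 2", "12", "123", "1234", "456"]:
    print(repr(sample), "->", lex_bounded(sample))
```

"1234" comes out as [123, 4]. "456" is accepted as a single value even though it does not fit in a byte.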
In this case, the {1,3} is really expressing a semantic constraint
(values must fit in bytes), not a syntactic one. Trying to write
semantic constraints as syntactic ones rarely works. In the case of
the byte example you can see easily how it fails: "456" parses, but
doesn't fit in a byte, and changing the grammar so it parses as [45, 6]
is just plain perverse and sure to vex your users.
I have found that it is often much more useful, both for the grammar
and for the user, to express size limits (on the number of characters
in identifiers, the number of digits in numbers, or the repeats of some
rule) as semantic constraints: write the grammar to accept any number
at all, and then generate an error for the user if the limits are
exceeded or not met.
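Here is a minimal Python sketch of that approach, applied to the byte example. The token pattern accepts a digit run of any length, and the range check runs afterwards, so the user gets the expected error instead of a silent split. `parse_bytes` and its `limit` parameter are illustrative names. In an ANTLR grammar the same check would typically live in an action or a later pass.

```python
import re

# One match per token: a run of digits, a run of whitespace, or any other character.
TOKEN_RE = re.compile(r"(?P<INT>[0-9]+)|(?P<WS>[ \t\n]+)|(?P<BAD>.)", re.DOTALL)


def parse_bytes(text: str, limit: int = 255) -> list[int]:
    """Accept digit runs of any length, then enforce the byte range semantically."""
    values = []
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup == "WS":
            continue  # whitespace is skipped, as in the WS/NL rules
        if m.lastgroup == "BAD":
            raise ValueError(f"unexpected character {m.group()!r} at offset {m.start()}")
        value = int(m.group())
        if value > limit:
            raise ValueError(f"{m.group()}, value too big for a byte")
        values.append(value)
    return values


print(parse_bytes("1 12 255"))  # [1, 12, 255]
try:
    parse_bytes("1234")
except ValueError as err:
    print(err)  # 1234, value too big for a byte
```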
Consider:
ID: LETTER{1,8} ;
protected LETTER: 'a'..'z' ;
Does anyone expect "subtotals" (nine letters) to parse as two IDs,
"subtotal" and "s"?
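The identifier case can be modeled the same way in Python. The first function mimics a longest-match lexer for a hypothetical LETTER{1,8} rule. The second accepts any run of letters and checks the length afterwards. Both function names are illustrative. For simplicity, any non-letter character is treated as a separator.

```python
import re

LETTERS = re.compile(r"[a-z]+")


def lex_ids_bounded(text: str, max_len: int = 8) -> list[str]:
    """Model ID: LETTER{1,8}: longest match, but never more than max_len letters."""
    ids = []
    for run in LETTERS.findall(text):
        # A longer run of letters is silently chopped into several IDs.
        ids.extend(run[i:i + max_len] for i in range(0, len(run), max_len))
    return ids


def lex_ids_checked(text: str, max_len: int = 8) -> list[str]:
    """Accept any run of letters as one ID, then report an over-long one."""
    ids = LETTERS.findall(text)
    for ident in ids:
        if len(ident) > max_len:
            raise ValueError(f"identifier {ident!r} is longer than {max_len} letters")
    return ids


print(lex_ids_bounded("subtotals"))  # ['subtotal', 's']
```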
In your case, have you considered what a run of 64 r structures should
be? Is it just an error, or is it really a structure of 63 r's followed
by 1 r?
- Mark
Mark Lentczner
markl at wheatfarm.org
http://www.wheatfarm.org/