[antlr-interest] Problem and commentary on semantic predicates and prediction

Sam Harwell sharwell at pixelminegames.com
Thu Mar 31 08:49:00 PDT 2011


By reordering on binary search basis, the number of comparisons becomes 3
for '\n', 3 for '\r', and 3 for '<'. In other words, the way comparisons are
expressed even for a set with only 4 sub-ranges results in 7 comparisons for
characters where only 3 are actually needed. This is O(log n) instead of
O(n), so as the interval set gets more ranges the discrepancy quickly grows
- for 8 sub-ranges used in testing SetA from before, 14 comparisons are used
where only 4 are needed.

(LA5_0 <= '\f' && (LA5_0 >= '\u000B' || (LA5_0 <= '\t' && LA5_0 >=
'\u0000'))) || ((LA5_0 <= ';' && LA5_0 >= '\u000E') || (LA5_0 >= '=' &&
LA5_0 <= '\uFFFF'))

Sam

-----Original Message-----
From: Sam Harwell [mailto:sharwell at pixelminegames.com] 
Sent: Thursday, March 31, 2011 10:15 AM
To: antlr-interest at antlr.org
Subject: Problem and commentary on semantic predicates and prediction


4. The test (LA5_0 in SetB) expands to the following. Edit here: I meant to
show this for SetA, but since I'll accidentally used SetB, we'll go with
that.

((LA5_5>='\u0000' && LA5_5<='\t')||(LA5_5>='\u000B' &&
LA5_5<='\f')||(LA5_5>='\u000E' && LA5_5<=';')||(LA5_5>='=' &&
LA5_5<='\uFFFF'))

A simple transform (moving a few parenthesis) of this already ordered
expression gives:

((LA5_5 >= '\u0000' && (LA5_5 <= '\t' || (LA5_5 >= '\u000B' && (LA5_5 <=
'\f' || (LA5_5 >= '\u000E' && (LA5_5 <= ';' || (LA5_5 >= '=' && LA5_5 <=
'\uFFFF'))))))))

Which reduces the number of comparisons from 5 to 3 for eliminating '\n', 6
to 5 for eliminating '\r', and remains at 7 for eliminating '<'. For sets
with a very large number of ranges such as the ID start character in the
Java grammar, the time savings for this change could be tremendous
considering nearly all identifiers start with a character in the range
0..127 which is tested in the early portion of the expression.

Thanks,
Sam



More information about the antlr-interest mailing list