[antlr-interest] Another parsing question
Gavin Lambert
antlr at mirality.co.nz
Thu Aug 7 05:34:48 PDT 2008
At 09:32 7/08/2008, Randall R Schulz wrote:
> On Wednesday 06 August 2008 14:00, Loring Craymer wrote:
>> and that can get very slow when you have to do all of
>> the tests to find out that a character is not in the set.
I did say it'd likely be slower, but I doubt it'd be significantly
so. (Looking for presence/absence in a BitSet is an O(1)
operation, but so is this kind of conditional tree -- and at worst
case it's only O(n).)
Besides which, small sets could be rendered as switch blocks or a
simple logical-or expression, which are probably even faster than
using a BitSet.
>Prove it.
Indeed.
>Anyway, why not just use a regular hashed set? The Trove4J
library
>has native collections, too.
Some kind of sparse boolean collection would work well (or even
something more fancy, if you want to indicate the possible
following rules at end-of-token as well).
But you'll also need some way of specifying whether it's an
"include" collection or an "exclude" one, so that you only need to
populate the smaller of the two. In other words, ~('a' | 'b')
should be recorded as an "exclude" set of 'a' and 'b', not as an
"include" set of every character except for 'a' and 'b' (as ANTLR
v2 did) :)
Although I still think just generating the matching code will end
up simpler than futzing around with collections. (The compiler
itself will need a collection, of course. But I don't see why the
runtime lexer would.)
More information about the antlr-interest
mailing list