[antlr-interest] Java, apps
parrt at jguru.com
Sun Oct 7 10:59:10 PDT 2001
On Saturday, October 6, 2001, at 09:56 PM, Robert Colquhoun wrote:
> At 12:00 PM 10/4/01 +0200, Ric Klaren wrote:
>> I didn't say thanks yet did I? Shame on me =)
> I was looking at the current classes used internally for bitsets in
> antlr -
> the java BitSet class i wrote is going to need some additional methods
> replace the current implementation.
> I am not sure what to do here - from the eratosthenes sieve benchmark
> included shows only a marginal difference in performance(less than 10%)
> between antlr's current BitSet class, the BitSet i wrote and the
> java.util.BitSet. It might be better long term for maintainability to
> use the standard java.util.BitSet plus some utility routines to feed in
> raw data(the long int arrays). For C++ just write something(perhaps
> adapting my current code) to look like the equivalent java class.
Don't forget that ANTLR uses my bitset very heavily during grammar
analysis. I recall that the java.util.BitSet didn't do inline stuff or
something which concerned/irritated me (memory/GC considerations).
> On the other hand maybe i need a better test. The way bitsets are used
> the lexer you are usually only looking for a handful of characters in
> whole unicode range. The idea behind the bitset i did was to use an
> offset so that largely empty int arrays covering the whole unicode range
> were not constructed.
ANTLR can do much more to remove bitsets than it does. It needs
subrange -> simple comparison convertion etc... Now, it only works if
the whole bitset is a range. After that, a bitset tuned more for
unicode would be useful, but we'd need to be careful that its degenerate
case for ascii and my grammar analysis still move along fast enough.
Simple hints like the offset you mention would be easy and work well.
Fortunately, most of the time people reference ranges like "valid
identifier character" that can be formally described a priori. This
prevents the lexer code generator from having to figure out an efficient
bitset. One might even have BitSet subclasses that efficiently
implement a particular character set.
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
More information about the antlr-interest