[antlr-interest] Java, apps

Sun Oct 7 10:59:10 PDT 2001

On Saturday, October 6, 2001, at 09:56  PM, Robert Colquhoun wrote:

> At 12:00 PM 10/4/01 +0200, Ric Klaren wrote:
>> Hi,
>>
>> I didn't say thanks yet did I? Shame on me =)
>>
>> Thanks!!
>
> I was looking at the current classes used internally for bitsets in 
> antlr -
> the java BitSet class i wrote is going to need some additional methods 
> to
> replace the current implementation.
>
> I am not sure what to do here - from the eratosthenes sieve benchmark
> included shows only a marginal difference in performance(less than 10%)
> between antlr's current BitSet class,  the BitSet i wrote and the 
> standard
> java.util.BitSet.  It might be better long term for maintainability to 
> just
> use the standard java.util.BitSet plus some utility routines to feed in 
> the
> raw data(the long int arrays).  For C++ just write something(perhaps
> adapting my current code) to look like the equivalent java class.

Don't forget that ANTLR uses my bitset very heavily during grammar 
analysis.  I recall that the java.util.BitSet didn't do inline stuff or 
something which concerned/irritated me (memory/GC considerations).

> On the other hand maybe i need a better test.  The way bitsets are used 
> in
> the lexer you are usually only looking for a handful of characters in 
> the
> whole unicode range.   The idea behind the bitset i did was to use an
> offset so that largely empty int arrays covering the whole unicode range
> were not constructed.

ANTLR can do much more to remove bitsets than it does.  It needs 
subrange -> simple comparison convertion etc...  Now, it only works if 
the whole bitset is a range.  After that, a bitset tuned more for 
unicode would be useful, but we'd need to be careful that its degenerate 
case for ascii and my grammar analysis still move along fast enough.  
Simple hints like the offset you mention would be easy and work well.  
Fortunately, most of the time people reference ranges like "valid 
identifier character" that can be formally described a priori.  This 
prevents the lexer code generator from having to figure out an efficient 
bitset.  One might even have BitSet subclasses that efficiently 
implement a particular character set.

Ter

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/