[antlr-interest] Another parsing question

Wed Aug 6 12:58:42 PDT 2008

At 02:10 7/08/2008, Loring Craymer wrote:
>I think that much of this discussion would be moot if ANTLR 3 
>lexers had the capabilities of ANTLR 2 lexers; unfortunately, 
>that requires an efficient way of doing FOLLOW sets for unicode 
>ranges--and no solution has yet presented itself for that.

Can't you just use an algorithmic test (similar to how sempreds 
work)?  Obviously the standard table/bitset-based solution won't 
work for Unicode, at least not without generating very large 
bitsets: at one bit per value, representing the full 32-bit 
UTF-32 range would take a 512MB bitset, and that's just a single 
follow set.  Expressing the same thing in code should be much 
more compact, since there are likely to be large contiguous 
ranges.  And it'd have the added bonus of being more readable, 
too.
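
To make the comparison concrete, here is a rough Java sketch of what 
such a range-based test might look like.  This is not what ANTLR 
generates; the class name, the method names and the ranges are all 
made up for illustration.  It also shows where the 512MB figure 
comes from (2^32 bits at 8 bits per byte).

```java
// Hypothetical sketch only: not ANTLR output, and the ranges are invented.
final class FollowSetSketch {
    // Sorted, non-overlapping, inclusive code point ranges.
    private static final int[][] RANGES = {
        {'0', '9'},
        {'A', 'Z'},
        {'_', '_'},
        {'a', 'z'},
        {0x00C0, 0x024F},   // some Latin letters
        {0x4E00, 0x9FFF},   // a large contiguous CJK block
    };

    // Membership test by binary search over the ranges.
    // Cost grows with the number of ranges, not the size of the code space.
    static boolean contains(int codePoint) {
        int lo = 0;
        int hi = RANGES.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (codePoint < RANGES[mid][0]) {
                hi = mid - 1;
            } else if (codePoint > RANGES[mid][1]) {
                lo = mid + 1;
            } else {
                return true;
            }
        }
        return false;
    }

    // Bytes needed for a plain bitset with one bit per possible value.
    static long bitsetBytes(long valueCount) {
        return (valueCount + 7) / 8;
    }

    public static void main(String[] args) {
        System.out.println("contains('x'):    " + contains('x'));
        System.out.println("contains(0x4E2D): " + contains(0x4E2D));
        System.out.println("contains(' '):    " + contains(' '));
        System.out.println("bitset bytes for 2^32 values: " + bitsetBytes(1L << 32));
    }
}
```

Six ranges take 48 bytes of table here, against 536870912 bytes for 
the full-range bitset.  A generator could just as well emit the 
ranges as a chain of inline comparisons instead of a table; the 
point is only that the size follows the number of ranges.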

(Of course, ANTLR might still need to hold that 512MB bitset in 
memory while compiling the grammar, depending on how it works out 
the set -- and possibly more than one.  But that should be less of 
a burden than holding such bitsets at runtime for every rule.)