[antlr-interest] Another parsing question

Wed Aug 6 14:17:36 PDT 2008

Could you implement FOLLOW sets using a hybrid approach: BitSets for
the non-Unicode ranges, plus algorithmic tests for characters outside
those ranges? Many DSLs are still non-Unicode apart from string
contents, and strings use ASCII delimiters.

B

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Gavin Lambert
> Sent: Wednesday, August 06, 2008 12:59 PM
> To: Loring Craymer; Terence Parr; antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Another parsing question
> 
> At 02:10 7/08/2008, Loring Craymer wrote:
> >I think that much of this discussion would be moot if ANTLR 3
> >lexers had the capabilities of ANTLR  2 lexers; unfortunately,
> >that requires an efficient way of doing FOLLOW sets for unicode
> >ranges--and no solution has yet presented itself for that.
> 
> Can't you just use an algorithmic test (similar to how sempreds
> work)?  Obviously the standard table/bitset-based solution won't
> work for Unicode, at least not without generating very large
> bitsets: representing the full UTF-32 range would require a
> 512MB bitset, and that's just a single follow set.  Expressing
> the same thing in code should be much more compact, since there
> are likely to be large contiguous ranges.  And it'd have the added
> bonus of being more readable, too.
> 
> (Of course, ANTLR might still need to hold that 512MB bitset in
> memory while compiling the grammar, depending on how it works the
> set out -- and possibly more than one.  But this should be less of
> a burden than trying to do it at runtime for every rule.)
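
For illustration only, here is a minimal sketch of the hybrid idea
discussed above: a bitmask for the low range plus a sorted list of
contiguous ranges for everything else. It is not ANTLR code; the class
name HybridFollowSet, the ASCII_LIMIT cut-off of 128, and the sample
ranges are all assumptions made up for this example.

```python
from bisect import bisect_right

ASCII_LIMIT = 128  # assumption: code points below this go in the bitmask

# Size of one flat bitset covering every 32-bit value, in bytes (512MB).
FLAT_UTF32_BITSET_BYTES = 2**32 // 8


class HybridFollowSet:
    """Hypothetical follow set: bitmask for ASCII, ranges beyond it."""

    def __init__(self, ranges):
        # ranges: iterable of inclusive (lo, hi) code point pairs
        self.low_bits = 0
        high = []
        for lo, hi in ranges:
            # Part of the range below ASCII_LIMIT: one bit per code point.
            for cp in range(lo, min(hi, ASCII_LIMIT - 1) + 1):
                self.low_bits |= 1 << cp
            # Part at or above ASCII_LIMIT: kept as a range.
            if hi >= ASCII_LIMIT:
                high.append((max(lo, ASCII_LIMIT), hi))
        # Merge overlapping or adjacent high ranges so lookup is a
        # single binary search over the range starts.
        high.sort()
        merged = []
        for lo, hi in high:
            if merged and lo <= merged[-1][1] + 1:
                merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
            else:
                merged.append((lo, hi))
        self.high_starts = [lo for lo, _ in merged]
        self.high_ends = [hi for _, hi in merged]

    def __contains__(self, cp):
        if cp < 0:
            return False
        if cp < ASCII_LIMIT:
            # Table-style test for the low range.
            return bool((self.low_bits >> cp) & 1)
        # Algorithmic test: find the last range starting at or before cp.
        i = bisect_right(self.high_starts, cp) - 1
        return i >= 0 and cp <= self.high_ends[i]


# Example: identifier-like characters plus everything from U+00C0 upward.
follow = HybridFollowSet([
    (ord("a"), ord("z")),
    (ord("_"), ord("_")),
    (0x00C0, 0x10FFFF),
])
print(ord("q") in follow, 0x4E2D in follow, ord("1") in follow)
```

With these sample ranges the set stores one small integer and a single
(start, end) pair, which is the compactness argument made in the quoted
message: large contiguous ranges cost almost nothing when expressed as
comparisons instead of bits.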