[antlr-interest] Another parsing question
lgcraymer at yahoo.com
Wed Aug 6 14:00:37 PDT 2008
Yes, but that has performance problems since a collection range tests look like
if (foo > 1 && foo < 10 )
else if (foo == 13)
<do something else>
and that can get very slow when you have to do all of the tests to find out that a character is not in the set. It is possible to use a 64KB array to map characters into bytes (there is text where there are more than 256 useful characters, but that requires a mix of languages), but that layer of indirection is also undesirable.
I suspect that there is a clever solution, but I haven't thought of one.
----- Original Message ----
> From: Gavin Lambert <antlr at mirality.co.nz>
> To: Loring Craymer <lgcraymer at yahoo.com>; Terence Parr <parrt at cs.usfca.edu>; antlr-interest at antlr.org
> Sent: Wednesday, August 6, 2008 12:58:42 PM
> Subject: Re: [antlr-interest] Another parsing question
> At 02:10 7/08/2008, Loring Craymer wrote:
> >I think that much of this discussion would be moot if ANTLR 3
> >lexers had the capabilities of ANTLR 2 lexers; unfortunately,
> >that requires an efficient way of doing FOLLOW sets for unicode
> >ranges--and no solution has yet presented itself for that.
> Can't you just use an algorithmic test (similar to how sempreds
> work)? Obviously the standard table/bitset-based solution won't
> work for Unicode (at least not without generating very large
> bitsets [and by "very large" I mean that to represent the full
> UTF-32 range would require a bitset taking 512MB... and that's
> just a single follow set]) Whereas expressing the same thing in
> code should be much more compact, since there are likely to be
> large contiguous ranges. And it'd have the added bonus of being
> more readable, too.
> (Of course, ANTLR might still need to hold that 512MB bitset in
> memory while compiling the grammar, depending on how it works the
> set out -- and possibly more than one. But this should be less of
> a burden than trying to do it at runtime for every rule.)
More information about the antlr-interest