[antlr-interest] Re: Unicode & C++ & ANTLR2 (and a bit 3)

Terence Parr parrt at cs.usfca.edu
Tue Jul 6 11:25:25 PDT 2004


On Jul 6, 2004, at 6:56 AM, Ric Klaren wrote:
> Some observations/conclusions:
>
> - I'd definitely like to get int arrays or int vectors for the strings in
>   codegen (with pure unicode codepoints).

Yep, I'm leaning that way myself.
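
As a rough illustration of what "int arrays with pure code points" could look like in generated C++, here is a minimal sketch. The names kLiteralFor, kLiteralClef and matchesAt are made up for this example and are not ANTLR output; the assumption is that the input has already been decoded to one 32-bit value per code point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical generated literals: one array element per code point,
// so nothing depends on the width of char or wchar_t.
static const std::uint32_t kLiteralFor[] = { 0x66, 0x6F, 0x72 };  // 'f' 'o' 'r'
static const std::size_t kLiteralForLength =
    sizeof(kLiteralFor) / sizeof(kLiteralFor[0]);

// A code point above U+FFFF still occupies exactly one element.
static const std::uint32_t kLiteralClef[] = { 0x1D11E };

// Returns true if `literal` (of `length` code points) occurs in `input`
// starting at index `pos`.
inline bool matchesAt(const std::vector<std::uint32_t>& input, std::size_t pos,
                      const std::uint32_t* literal, std::size_t length) {
    assert(literal != nullptr || length == 0);
    if (pos > input.size() || input.size() - pos < length) return false;
    for (std::size_t i = 0; i < length; ++i) {
        if (input[pos + i] != literal[i]) return false;
    }
    return true;
}
```

With an input of { 0x20, 0x66, 0x6F, 0x72, 0x1D11E }, matchesAt(input, 1, kLiteralFor, kLiteralForLength) succeeds, and the clef matches at index 4 as a single element.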

> ANTLR2 needs to support 32-bit escapes in the lexer to support full
> unicode (currently one can't specify values above \uFFFF). We could opt
> to introduce a new escape syntax that supports variable-length hex
> values: \u{(HEXDIGIT)+} or something.

I'm pretty sure we'll support 32-bit \U+10FFFF notation for ANTLR 3.
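
For the variable-length form Ric suggests, \u{(HEXDIGIT)+}, decoding could look roughly like the sketch below. This is only an illustration of that proposed syntax, not anything ANTLR implements; hexValue and parseBracedEscape are hypothetical names, and the sketch only checks the upper bound U+10FFFF (it does not reject surrogate values).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Value of one hex digit, or -1 if `c` is not a hex digit.
inline int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decodes one escape of the form \u{H...H} that starts
// at text[pos]. On success, stores the code point in `out`, moves `pos`
// past the closing brace and returns true. On failure, returns false and
// leaves `pos` and `out` unchanged.
inline bool parseBracedEscape(const std::string& text, std::size_t& pos,
                              std::uint32_t& out) {
    assert(pos <= text.size());
    std::size_t i = pos;
    if (text.size() - i < 3 || text[i] != '\\' || text[i + 1] != 'u' ||
        text[i + 2] != '{') {
        return false;
    }
    i += 3;
    std::uint32_t value = 0;
    std::size_t digits = 0;
    while (i < text.size() && text[i] != '}') {
        int d = hexValue(text[i]);
        if (d < 0) return false;
        value = value * 16 + static_cast<std::uint32_t>(d);
        if (value > 0x10FFFF) return false;  // beyond the Unicode range
        ++digits;
        ++i;
    }
    if (digits == 0 || i == text.size()) return false;  // empty or unterminated
    pos = i + 1;
    out = value;
    return true;
}
```

Because the value is checked after every digit, it never exceeds 0x10FFFF before the next multiplication, so the 32-bit accumulator cannot overflow no matter how many digits are supplied.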

>  - Question: can ANTLR 2's analysis engine deal with such values?

Hmm... I'm not sure it's 32-bit clean.  Probably not.

> Current bitset generation for unicode is quite expensive (parse times are
> long because of bitset generation)

Yep, horribly slow in the worst case.

> I guess we could gain lexer speed by dropping support for the ! operator
> in a lexer (or restricting its use to the start and end of the token
> text; that way we could probably deal only with indexes or pointers
> instead of all the copying that is happening now).

Good thought.  Yes, I think that having a common token that simply 
points into the ANTLRIntStream rather than doing a strcpy would be great.  
Better yet, we can notice that the text for every single "for" keyword 
can point to the same object, since keyword text is static. Strcmp becomes 
a pointer check and we save memory and a memcpy.  Should be super fast, right?
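
A minimal C++ sketch of that idea, under the assumption that the input is held as a buffer of 32-bit code points. Token, makeForToken, makeSlice and sameText are illustrative names only; they are not the ANTLR token classes, and ANTLRIntStream is not modelled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// All names here are hypothetical; none come from ANTLR itself.
typedef std::vector<std::uint32_t> CodePoints;

enum TokenType { ID = 1, FOR = 2 };

// A token that refers to its text instead of owning a copy of it.
struct Token {
    TokenType type;
    const std::uint32_t* text;  // into the input buffer, or at a shared literal
    std::size_t length;         // number of code points
};

// The single shared copy of the keyword text 'f' 'o' 'r'.
static const std::uint32_t kForText[] = { 0x66, 0x6F, 0x72 };

// Every "for" token points at kForText.
inline Token makeForToken() {
    Token t = { FOR, kForText, 3 };
    return t;
}

// Any other token is a slice [start, stop) of the input; nothing is copied.
// The input buffer must outlive the token and must not be resized.
inline Token makeSlice(TokenType type, const CodePoints& input,
                       std::size_t start, std::size_t stop) {
    assert(start <= stop && stop <= input.size());
    Token t = { type, input.data() + start, stop - start };
    return t;
}

// Text equality: a pointer check when both tokens share storage,
// with an element-wise comparison as the fallback.
inline bool sameText(const Token& a, const Token& b) {
    if (a.length != b.length) return false;
    if (a.text == b.text) return true;
    return std::equal(a.text, a.text + a.length, b.text);
}
```

Two tokens made by makeForToken() compare equal on the pointer check alone. The trade-off is lifetime: slices are only valid while the input buffer is alive and unchanged, which is the price for skipping the copy.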

Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
Cofounder, http://www.knowspam.net enjoy email again!
Cofounder, http://www.peerscope.com pure link sharing





 