[antlr-interest] Unicode handling

Wed Apr 21 17:36:23 PDT 2004

On Apr 21, 2004, at 4:31 PM, John D. Mitchell wrote:

>>>>>> "Mark" == Mark Lentczner <markl at glyphic.com> writes:
> [...]
>
>> Does anyone see any pitfalls to this other than increasing the lookahead
>> for the lexer?  Since in our source language, all the meaningful
>> punctuation is in the visible US-ASCII range, the only place the
>> difference between parsing Unicode characters vs. UTF-8 encoded Unicode
>> characters would be in things like the NAME token production.
>
>> This seems much more preferable to me than extending the C++ support with
>> some Unicode library (like IBM's ICU or some such).
>
> I concur.
>
> In fact, I almost took that same approach but I was able to dodge the
> Unicode bullet completely. :-)
>
> For Antlr v3, aside from my perennial haranguing for complete and proper
> hoisting support, I really want to get rid of all of this ridiculous use of
> in-band signalling.  Please join me in pestering Ter about this. :-)
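Mark's observation works because of how UTF-8 is designed: every code point above U+007F is encoded as a sequence of bytes that all lie in the range 0x80 to 0xFF, so no byte inside a multi-byte character can be mistaken for ASCII punctuation. A byte-oriented lexer therefore only has to accept high bytes inside the NAME rule. The C++ fragment below is a minimal hand-written sketch of that idea, not ANTLR-generated code; the helpers `isNameByte` and `scanName` and the NAME character set (ASCII letters, digits, `_`, plus any byte >= 0x80) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical NAME character set: ASCII letters, digits and '_', plus any
// byte >= 0x80. In UTF-8 every byte of a multi-byte sequence is >= 0x80,
// so non-ASCII letters are accepted without decoding them.
bool isNameByte(unsigned char b) {
    return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') ||
           (b >= '0' && b <= '9') || b == '_' || b >= 0x80;
}

// Scans the longest run of NAME bytes starting at pos, advances pos past it,
// and returns the matched bytes (still UTF-8 encoded). Returns an empty
// string if the byte at pos cannot be part of a NAME.
std::string scanName(const std::string& src, std::size_t& pos) {
    assert(pos <= src.size());
    const std::size_t start = pos;
    while (pos < src.size() &&
           isNameByte(static_cast<unsigned char>(src[pos]))) {
        ++pos;
    }
    return src.substr(start, pos - start);
}
```

For example, scanning the UTF-8 bytes of `café(x)` from `pos = 0` returns the 5-byte NAME `caf\xC3\xA9` and leaves `pos` on the `(`. Note that this sketch does not validate the UTF-8 it accepts; a malformed sequence would simply become part of a NAME.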

UNICODE will work well.  Note that 2.7.3 should do pretty well at 
UNICODE.  Give it a shot :)  \uFFFE is the max valid Unicode char, right?  -1 
shouldn't be a problem anymore.
Ter
--
Professor Comp. Sci., University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
Cofounder, http://www.knowspam.net enjoy email again!
Cofounder, http://www.peerscope.com pure link sharing
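On Ter's question: U+FFFE and U+FFFF are both permanently reserved noncharacters, and the Unicode code space actually extends to U+10FFFF, which needs more than 16 bits. Either way, as long as characters are returned as non-negative values in a type wide enough for every code point (an int, for example), -1 cannot collide with real input; with a signed 8-bit char, by contrast, the byte 0xFF typically converts to -1. John's objection to in-band signalling can be met more directly by reporting end of input outside the character's value space altogether. The sketch below shows one way to do that in C++17 with `std::optional`; the `CharSource` class is hypothetical and is not part of ANTLR's runtime.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical character source: end of input is signalled out of band
// (an empty optional) instead of by a sentinel value such as -1 that
// shares the character's value space.
class CharSource {
public:
    explicit CharSource(std::u32string text) : text_(std::move(text)) {}

    // Returns the next code point, or std::nullopt once input is exhausted.
    std::optional<char32_t> nextChar() {
        if (pos_ >= text_.size()) {
            return std::nullopt;
        }
        return text_[pos_++];
    }

private:
    std::u32string text_;
    std::size_t pos_ = 0;
};
```

A lexer loop can then read `while (auto c = src.nextChar()) { ... }`, and every value of `*c`, including U+FFFE, U+FFFF and U+10FFFF, remains an ordinary character rather than a possible end-of-input marker.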
