[antlr-interest] Unicode handling
Terence Parr
parrt at cs.usfca.edu
Wed Apr 21 17:36:23 PDT 2004
On Apr 21, 2004, at 4:31 PM, John D. Mitchell wrote:
>>>>>> "Mark" == Mark Lentczner <markl at glyphic.com> writes:
> [...]
>
>> Does anyone see any pitfalls to this other than increasing the lookahead
>> for the lexer? Since in our source language, all the meaningful
>> punctuation is in the visible US-ASCII range, the only place where the
>> difference between parsing Unicode characters vs. UTF-8 encoded Unicode
>> characters would matter is in things like the NAME token production.
>
>> This seems much more preferable to me than extending the C++ support with
>> some Unicode library (like IBM's ICU or some such).
>
> I concur.
>
> In fact, I almost took that same approach but I was able to dodge the
> Unicode bullet completely. :-)
>
> For Antlr v3, aside from my perennial haranguing for complete and proper
> hoisting support, I really want to get rid of all of this ridiculous use of
> in-band signalling. Please join me in pestering Ter about this. :-)
Unicode will work well. Note that 2.7.3 should already do pretty well with
Unicode. Give it a shot :) \uFFFE is the max valid Unicode value, right? -1
shouldn't be a problem anymore.
Ter
--
Professor Comp. Sci., University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com
Cofounder, http://www.knowspam.net enjoy email again!
Cofounder, http://www.peerscope.com pure link sharing
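
The approach Mark describes in the quoted text, lexing the UTF-8 bytes directly instead of decoding to Unicode first, can be sketched roughly as follows. This is only an illustration in Python, not ANTLR-generated code or anything from the thread: the token names, the punctuation set, and the function lex_utf8 are all assumptions. It relies on one property of UTF-8: every byte of a multi-byte sequence is in the range 0x80-0xFF, so ASCII punctuation can never appear inside an encoded non-ASCII character.

```python
import re

# Hypothetical token set: NAME, a few ASCII punctuation marks, whitespace.
# Bytes 0x80-0xFF occur only inside multi-byte UTF-8 sequences, so letting
# NAME accept them never swallows ASCII punctuation.
TOKEN_RE = re.compile(
    rb"(?P<NAME>[A-Za-z_\x80-\xff][A-Za-z0-9_\x80-\xff]*)"
    rb"|(?P<PUNCT>[=;(),])"
    rb"|(?P<WS>[ \t\r\n]+)"
)

def lex_utf8(data: bytes):
    """Tokenize raw UTF-8 bytes without decoding them first."""
    tokens = []
    pos = 0
    while pos < len(data):
        m = TOKEN_RE.match(data, pos)
        if m is None:
            raise ValueError(f"unexpected byte 0x{data[pos]:02x} at offset {pos}")
        if m.lastgroup != "WS":
            # Decoding happens only to make the result readable;
            # all matching above was done on bytes.
            tokens.append((m.lastgroup, m.group().decode("utf-8")))
        pos = m.end()
    return tokens
```

Note that this sketch treats any non-ASCII byte as a NAME character and does not check that the input is well-formed UTF-8; a stricter NAME rule would have to spell out the allowed multi-byte sequences, which is where the extra lexer lookahead mentioned above would come from.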