[antlr-interest] Unicode handling

Wed Apr 21 16:31:14 PDT 2004

>>>>> "Mark" == Mark Lentczner <markl at glyphic.com> writes:
[...]

> Does anyone see any pitfalls to this other than increasing the lookahead
> for the lexer?  Since in our source language all the meaningful
> punctuation is in the visible US-ASCII range, the only place where the
> difference between parsing Unicode characters and parsing UTF-8 encoded
> Unicode characters would show up is in things like the NAME token
> production.

> This seems far preferable to me than extending the C++ support with
> some Unicode library (like IBM's ICU or some such).

I concur.

In fact, I almost took that same approach but I was able to dodge the
Unicode bullet completely. :-)
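
For illustration only, here is roughly what the byte-level approach Mark
describes could look like in C++.  This is a sketch, not code from either
poster's project and not ANTLR-generated code.  The names isNameStart,
isNamePart and scanName are hypothetical, and it assumes that the input is
valid UTF-8 and that every non-ASCII character is acceptable inside a NAME.
It relies on the fact that every byte of a multi-byte UTF-8 sequence has
its high bit set, so such bytes can never be mistaken for ASCII
punctuation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: can this byte start a NAME?
// Bytes >= 0x80 are lead or continuation bytes of a multi-byte UTF-8
// sequence; this sketch simply accepts all of them (an assumption).
inline bool isNameStart(unsigned char b) {
    return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') ||
           b == '_' || b >= 0x80;
}

// Hypothetical helper: can this byte continue a NAME?
inline bool isNamePart(unsigned char b) {
    return isNameStart(b) || (b >= '0' && b <= '9');
}

// Returns the length in bytes of the NAME starting at pos,
// or 0 if no NAME starts there.
inline std::size_t scanName(const std::string& src, std::size_t pos) {
    if (pos >= src.size() ||
        !isNameStart(static_cast<unsigned char>(src[pos]))) {
        return 0;
    }
    std::size_t end = pos + 1;
    while (end < src.size() &&
           isNamePart(static_cast<unsigned char>(src[end]))) {
        ++end;
    }
    return end - pos;
}
```

Note that the length is counted in bytes, not characters, and that nothing
here rejects non-letter code points above U+007F; a stricter NAME rule
would have to decode the sequences.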

For ANTLR v3, aside from my perennial haranguing for complete and proper
hoisting support, I really want to get rid of all of this ridiculous use of
in-band signalling.  Please join me in pestering Ter about this. :-)

Have fun,
	John
