[antlr-interest] ANTLR C: Question regarding the portability of generated lexer C code

Mon Oct 12 15:21:12 PDT 2009

I recently noticed that the code generated from my lexer grammar
contains something like the following snippet:

            .
            .
            else if ( (((LA17_0 >= 'A') && (LA17_0 <= 'Z'))) )
            {
                alt17=2;
            }
            else if ( (((LA17_0 >= 'a') && (LA17_0 <= 'z'))) )
            {
                alt17=3;
            }
            else if ( (((LA17_0 >= 0x00A0) && (LA17_0 <= 0xD7FF))) )
            {
                alt17=4;
            }
            .
            .

The generated code freely uses character literals such as 'A' and 'Z'. This
may be a problem if, say, I compile the generated code in an IBM z/OS EBCDIC
environment. There 'A' is 0xC1 and 'Z' is 0xE9, so the ['A' .. 'Z'] range
contains more than just the 26 letters (the codes between 'I' at 0xC9 and 'J'
at 0xD1, for instance, are not letters), and the code values differ from
those in the Unicode character set.

I would expect something like the third expression, where 'A' is written
explicitly as 0x0041 (the Unicode code point for 'A').

Please confirm.

-Lego