[antlr-interest] ANTLR C: Question regarding the portability of generated lexer C code

Sat Oct 17 20:53:56 PDT 2009

Well, you could pay me to make an EBCDIC version ;) However, in practice there is no problem with mixing compilation modes; I have done it before on z/OS.

I think you need to look at this the other way round: it isn't that the ANTLR code isn't portable, but that your lexer specification isn't (and that EBCDIC is stupid).

Why are you specifying your rule as:

ID: 'a'..'z'

when that is not a contiguous range in your target environment?

Change the ranges to:

ID: 'a'..'k' | 'l'..'t' …

Or whatever the valid ranges are. ANTLR might be 'clever' here and, assuming ASCII, merge those ranges, so you might need to fold the ranges into fragments and so on. However, if you rework your lexer rules, I am sure this can be done in a portable fashion that does not require ASCII assumptions within the compiler.
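
To illustrate why a single 'a'..'z' range breaks on EBCDIC, here is a minimal sketch assuming EBCDIC code page 037 (CP037), where the lowercase letters sit in three separate runs: a-i at 0x81-0x89, j-r at 0x91-0x99 and s-z at 0xA2-0xA9. The function names are hypothetical and this is not ANTLR-generated code; it only shows that a split range check rejects the gap code points that a naive 0x81..0xA9 check would accept.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not ANTLR output): is byte c a lowercase Latin
 * letter in EBCDIC code page 037? The letters form three separate runs,
 * so the check is split into three sub-ranges. */
static bool ebcdic037_is_lower(uint8_t c)
{
    return (c >= 0x81 && c <= 0x89)   /* a..i */
        || (c >= 0x91 && c <= 0x99)   /* j..r */
        || (c >= 0xA2 && c <= 0xA9);  /* s..z */
}

/* Naive single-range check from 'a' (0x81) to 'z' (0xA9), shown only for
 * comparison: it also accepts the non-letter code points in the gaps. */
static bool ebcdic037_is_lower_naive(uint8_t c)
{
    return c >= 0x81 && c <= 0xA9;
}
```

For example, 0x8A lies between 'i' (0x89) and 'j' (0x91): ebcdic037_is_lower(0x8A) is false, while ebcdic037_is_lower_naive(0x8A) is true.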

Jim

From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Lego Haryanto
Sent: Friday, October 16, 2009 2:59 AM
To: David-Sarah Hopwood
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] ANTLR C: Question regarding the portability of generated lexer C code

Thanks for the response, ...

Unfortunately, it won't work in our situation without major changes. We already have legacy C code that is compiled with the default (native) settings, and while we could use a different compile option for the ANTLR-generated code, I'm not sure it is wise to move forward with mixed compilation rules.

The argument remains: this means the generated C lexer code has to be compiled by an ASCII-based compiler, which may not be that portable.
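
The underlying issue, as described further down the thread, is that a lexer reading a 32-bit Unicode stream must compare against code point values such as 0x0041, while a character literal like 'A' compiles to 0xC1 under a native EBCDIC compile. A minimal sketch of a charset-independent comparison, assuming the input arrives as uint32_t code points; the macro and function names are hypothetical and not part of the ANTLR C runtime, and whether the ANTLR code generator can be made to emit constants like these is not settled in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants: Unicode code points written as hex values keep
 * the same numeric value whatever execution character set the compiler
 * uses, unlike the literal 'A', which is 0xC1 when compiled as EBCDIC. */
#define UCP_UPPER_A 0x0041u
#define UCP_UPPER_Z 0x005Au

/* Tests a 32-bit Unicode code point (as read from the lexer's input
 * stream) for the Basic Latin uppercase range A..Z. */
static bool is_unicode_upper_latin(uint32_t cp)
{
    return cp >= UCP_UPPER_A && cp <= UCP_UPPER_Z;
}
```

With this form, is_unicode_upper_latin(0x41) is true on both ASCII and EBCDIC builds, and the EBCDIC byte value 0xC1 is correctly rejected as a Unicode code point.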

Best,
-Lego

On Thu, Oct 15, 2009 at 12:30 PM, David-Sarah Hopwood <david-sarah at jacaranda.org> wrote:

Lego Haryanto wrote:
> Jim, thanks for your response ...
>
> I know that in the EBCDIC system we feed a Unicode stream into the lexer,
> so I'm pretty sure that when the generated lexer code I pasted before is
> executed, it is already operating on the 32-bit Unicode stream.
>
> The problem is more about the native C compilation in an EBCDIC system like
> IBM z/OS mainframe.
>
> To see if a character from the Unicode stream is an 'A', we have to compare
> with a value 0x0041 ... If we match it with a native 'A' in the code, this
> will not be a match in an EBCDIC C compilation.

The z/OS C compiler is able to compile in a mode where string and character
literals are treated as ISO-8859-1.
<http://lists.gnupg.org/pipermail/gcrypt-devel/2009-July/001469.html>

--
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com


-- 
Fear of the LORD is the beginning of knowledge (Proverbs 1:7)
