[antlr-interest] Need help with EOL mess

Mon Dec 15 08:37:19 PST 2003

I am writing a PostScript interpreter based on ANTLR. I am having 
problems matching the correct end-of-line sequence. I would like to 
match CR-LF as a single EOL in files that contain this sequence, but 
PostScript mandates that all three EOL sequences (CR, LF and CR-LF) 
are supported.

Normally this would be a trivial problem - who cares if I matched a 
CR and then a LF if it is all being ignored by an interpreter? 
However I need this for two reasons. The first is that I would like 
an accurate line count for debugging purposes. The second is that 
PostScript allows user programs to read from the current file, 
essentially bypassing the interpreter. (This is how image data is 
embedded into PostScript programs.)

The issue I have is that the operator that reads from the current 
file (named "image") expects the data to be present immediately 
after the operator. For example, an image where four bytes of data 
are expected:

image<CR>1234 nextoperator

seems to work but

image<CR><LF>1234 nextoperator

does not. The data should begin with the "1" but in the second 
example it seems to begin with the LF because the scanner has 
matched the CR and not the CR-LF pair. 
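
To make the intended behaviour concrete, here is a small standalone 
Java sketch of what I would like the scanner to do. It is not ANTLR 
code, and the names EolSketch, consumeEol and dataAfterOperator are 
made up purely for illustration. It assumes one character of 
lookahead is available after a CR, and it consumes CR-LF as a single 
EOL so that the data starts at the "1" in every case:

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration only; none of these names come from ANTLR.
class EolSketch {

    // Consumes exactly one EOL sequence (CR-LF, CR or LF) if one is next.
    // Returns true if an EOL was consumed; otherwise nothing is consumed.
    static boolean consumeEol(PushbackReader in) throws IOException {
        int c = in.read();
        if (c == '\n') {
            return true;                      // Unix: LF
        }
        if (c == '\r') {
            int next = in.read();             // one character of lookahead
            if (next != '\n' && next != -1) {
                in.unread(next);              // lone CR: put the data byte back
            }
            return true;                      // Macintosh CR or Microsoft CR-LF
        }
        if (c != -1) {
            in.unread(c);                     // not an EOL at all
        }
        return false;
    }

    // Skips the operator name and one EOL, then reads `count` data characters.
    static String dataAfterOperator(String source, String operator, int count) {
        try {
            String rest = source.substring(operator.length());
            PushbackReader in = new PushbackReader(new StringReader(rest));
            consumeEol(in);
            char[] buf = new char[count];
            int n = in.read(buf, 0, count);
            return new String(buf, 0, Math.max(n, 0));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(dataAfterOperator("image\r1234 nextoperator", "image", 4));
        System.out.println(dataAfterOperator("image\r\n1234 nextoperator", "image", 4));
        System.out.println(dataAfterOperator("image\n1234 nextoperator", "image", 4));
    }
}
```

All three calls in main should yield the same four data characters. 
The key point is the extra character of lookahead after a CR, which 
is what my lexer rule does not seem to be getting.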

Here is my whitespace definition from the grammar file:

WHITESPACE
    // This rule matches and discards any whitespace.
    : ( ' '
      | '\t'
      | ( options { generateAmbigWarnings=false; }
          : "\r\n"          { newline(); }      // Microsoft
          | '\r'            { newline(); }      // Macintosh
          | '\n'            { newline(); }      // Unix
        )
      )+  { $setType(Token.SKIP); }
    ;

This *does* generate ambiguity warnings, but I don't know how to 
address them. Does anyone have any suggestions?

Regards,

   Steve
