[antlr-interest] Need help with EOL mess

Mon Dec 15 09:23:15 PST 2003

From the rule it looks like you should be handling everything ok.  I would
recommend running it through a debugger to see what is happening, or using
the -traceLexer option to antlr.Tool and diagnosing it that way.

If you can't find the problem then we'll need a small complete grammar with
test input to try out for ourselves to help you further.

On a tangent, how do you know the length of the binary data for the image?
Or is it not binary?  Anyhow, how do you know when it ends?

Monty

-----Original Message-----
From: skappskapp [mailto:skapp at rochester.rr.com] 
Sent: Monday, December 15, 2003 8:37 AM
To: antlr-interest at yahoogroups.com
Subject: [antlr-interest] Need help with EOL mess

I am writing a PostScript interpreter based upon ANTLR. I am having 
problems matching the correct end-of-line sequence. I would like to 
match CR-LF as a single unit in files that contain this sequence, but 
PostScript mandates that all three EOL sequences (CR, LF, and CR-LF) 
be supported.

Normally this would be a trivial problem - who cares if I matched a 
CR and then a LF if it is all being ignored by an interpreter? 
However I need this for two reasons. The first is that I would like 
an accurate line count for debugging purposes. The second is that 
PostScript allows user programs to read from the current file, 
essentially bypassing the interpreter. (This is how image data is 
embedded into PostScript programs).
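
To make the line-count requirement concrete, here is a minimal sketch in plain Java (not ANTLR code, and not taken from the interpreter under discussion). The class and method names are hypothetical. It counts line endings so that a CR-LF pair counts once, while a lone CR or a lone LF also counts once:

```java
public class LineCounter {
    /** Counts end-of-line sequences, treating CR-LF as a single line ending. */
    static int countLineEndings(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                // If an LF follows the CR, skip it so the pair counts once.
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                count++;
            } else if (c == '\n') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countLineEndings("a\r\nb\r\nc")); // Microsoft only
        System.out.println(countLineEndings("a\rb\nc\r\nd")); // all three mixed
    }
}
```

A scanner that matched the CR and the LF of a pair separately would report twice as many lines for a CR-LF file.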

The issue I have is that the operator that reads from the current 
file (named "image") expects the data to be present immediately 
after the operator. For example, an image where four bytes of data 
are expected:

image<CR>1234 nextoperator

seems to work but

image<CR><LF>1234 nextoperator

does not. The data should begin with the "1" but in the second 
example it seems to begin with the LF because the scanner has 
matched the CR and not the CR-LF pair. 
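
The intended behavior can be sketched in plain Java, independent of ANTLR. This is only an illustration under the assumption, taken from the two examples above, that exactly one EOL sequence separates the operator from its data. The names EolSkipper, eolLength, and readImageData are hypothetical and are not part of ANTLR or of the interpreter:

```java
public class EolSkipper {
    /** Returns the length (0, 1, or 2) of the EOL sequence starting at pos. */
    static int eolLength(String s, int pos) {
        if (pos >= s.length()) {
            return 0;
        }
        char c = s.charAt(pos);
        if (c == '\n') {
            return 1;                       // Unix
        }
        if (c == '\r') {
            boolean lfFollows = pos + 1 < s.length() && s.charAt(pos + 1) == '\n';
            return lfFollows ? 2 : 1;       // CR-LF pair, or lone CR
        }
        return 0;                           // not an EOL character
    }

    /** Returns the n characters that follow the first "image" and one EOL. */
    static String readImageData(String program, int n) {
        int pos = program.indexOf("image") + "image".length();
        pos += eolLength(program, pos);     // skip CR, LF, or CR-LF as one unit
        return program.substring(pos, pos + n);
    }

    public static void main(String[] args) {
        System.out.println(readImageData("image\r1234 nextoperator", 4));
        System.out.println(readImageData("image\r\n1234 nextoperator", 4));
        System.out.println(readImageData("image\n1234 nextoperator", 4));
    }
}
```

With the CR-LF pair consumed as a unit, all three inputs yield the same four data bytes. Skipping only the CR would leave the LF as the first data byte, which matches the symptom described above.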

Here is my whitespace definition from the grammar file:

WHITESPACE
    // This rule matches and discards any whitespace.
    : ( ' '
      | '\t'
      | ( options { generateAmbigWarnings=false; }
          : "\r\n"          { newline(); }      // Microsoft
          | '\r'            { newline(); }      // Macintosh
          | '\n'            { newline(); }      // Unix
        )
      )+  { $setType(Token.SKIP); }
    ;

This *does* generate ambiguity warnings but I don't know how to 
address this. Does anyone have any suggestions?

Regards,

   Steve
