[antlr-interest] Re: Need help with EOL mess

Mon Dec 15 21:00:22 PST 2003

I seem to have better luck after removing the CR-LF pair and just
dealing with CR or LF. (I am betting that I can correctly
postprocess a sequence of CR & LF characters to figure out the
appropriate line number.) However, after my image operator is done,
antlr reports finding a CR, CR, and LF sequence. There is only one
CR-LF sequence in the PS file - do I need to clear out the lookahead
characters?  And if so, what is the right way to do this?
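
To make the postprocessing idea concrete, here is a minimal sketch in Java (the language ANTLR generates by default). It assumes the CR and LF characters are available as one character sequence; the class and method names (EolCounter, countEols) are made up for illustration and are not part of antlr. A LF that immediately follows a CR is folded into the same line end, so CR-LF, a lone CR, and a lone LF each count once.

```java
public final class EolCounter {
    private EolCounter() {}

    /**
     * Counts end-of-line sequences in text.
     * CR-LF, a lone CR, and a lone LF each count as exactly one EOL.
     */
    public static int countEols(CharSequence text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                count++;
                // A LF directly after a CR belongs to the same EOL, so skip it.
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
            } else if (c == '\n') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // One CR-LF plus one trailing CR: two line ends.
        System.out.println(countEols("image\r\n1234 nextoperator\r")); // prints 2
    }
}
```

With this rule a CR, CR, LF run counts as two line ends (a lone CR followed by a CR-LF pair).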

On the tangent: PostScript uses reverse Polish notation, and all
operands are passed on an operand stack. The image width, height,
bits/sample, color components, etc., all precede the image
operator and sit on the operand stack, so the amount of data is
known before reading starts. The trick is then to keep
reading from the appropriate data source (the current file) until
enough data has been accumulated.
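
As a rough illustration of "enough data", the sketch below computes the byte count from the operands and then reads exactly that many bytes. It assumes raw (unfiltered) sample data with each row padded to a byte boundary, and it uses a plain InputStream as a stand-in for the current file; the names ImageDataReader, expectedBytes, and readFully are hypothetical, not part of antlr or any PostScript library.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public final class ImageDataReader {
    private ImageDataReader() {}

    /** Bytes of sample data, assuming each row starts on a byte boundary. */
    public static int expectedBytes(int width, int height,
                                    int bitsPerSample, int components) {
        long bitsPerRow = (long) width * bitsPerSample * components;
        long bytesPerRow = (bitsPerRow + 7) / 8; // round up to whole bytes
        return Math.toIntExact(bytesPerRow * height);
    }

    /** Keeps reading until exactly n bytes have been accumulated. */
    public static byte[] readFully(InputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int off = 0;
        while (off < n) {
            int got = in.read(buf, off, n - off);
            if (got < 0) {
                throw new EOFException("image data ended after " + off + " bytes");
            }
            off += got;
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the current file, positioned just after the EOL
        // that follows the image operator.
        InputStream in = new ByteArrayInputStream(
                "1234 nextoperator".getBytes(StandardCharsets.US_ASCII));
        int n = expectedBytes(4, 1, 8, 1); // 4 samples x 8 bits x 1 component
        byte[] data = readFully(in, n);
        System.out.println(new String(data, StandardCharsets.US_ASCII)); // prints 1234
    }
}
```

The stream is left positioned at the blank before "nextoperator", which is where the scanner would have to resume.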

   Steve

--- In antlr-interest at yahoogroups.com, mzukowski at y... wrote:
> From the rule it looks like you should be handling everything ok.  I would
> recommend running it through a debugger to see what is happening, or using
> the -traceLexer option to antlr.Tool and diagnosing it that way.
> 
> If you can't find the problem then we'll need a small complete grammar with
> test input to try out for ourselves to help you further.
> 
> On a tangent, how do you know the length of the binary data for the image?
> Or is it not binary?  Anyhow, how do you know when it ends?
> 
> Monty
> 
> -----Original Message-----
> From: skappskapp [mailto:skapp at r...] 
> Sent: Monday, December 15, 2003 8:37 AM
> To: antlr-interest at yahoogroups.com
> Subject: [antlr-interest] Need help with EOL mess
> 
> I am writing a PostScript interpreter based upon antlr. I am having
> problems matching the correct end-of-line sequence. I would like to
> match CR-LF on those files that contain this sequence, but
> PostScript mandates all three EOL sequences are supported.
> 
> Normally this would be a trivial problem - who cares if I matched a
> CR and then a LF if it is all being ignored by an interpreter?
> However I need this for two reasons. The first is that I would like
> an accurate line count for debugging purposes. The second is that
> PostScript allows user programs to read from the current file,
> essentially bypassing the interpreter. (This is how image data is
> embedded into PostScript programs).
> 
> The issue I have is that the operator that reads from the current
> file (named "image") expects the data to be present immediately
> after the operator. For example, an image where four bytes of data
> are expected:
> 
> image<CR>1234 nextoperator
> 
> seems to work but
> 
> image<CR><LF>1234 nextoperator
> 
> does not. The data should begin with the "1" but in the second 
> example it seems to begin with the LF because the scanner has 
> matched the CR and not the CR-LF pair. 
> 
> Here is my whitespace definition from the grammar file:
> 
> WHITESPACE
>     // This rule matches and discards any whitespace.
>     : ( ' '
>       | '\t'
>       | ( options { generateAmbigWarnings=false; }
>           : "\r\n"          { newline(); }      // Microsoft
>           | '\r'            { newline(); }      // Macintosh
>           | '\n'            { newline(); }      // Unix
>         )
>       )+  { $setType(Token.SKIP); }
>     ;
> 
> This *does* generate ambiguity warnings but I don't know how to
> address this. Does anyone have any suggestions?
> 
> Regards,
> 
>    Steve
