[antlr-interest] Re: Need help with EOL mess

lgcraymer lgc at mail1.jpl.nasa.gov
Mon Dec 15 20:27:47 PST 2003


There is an ambiguity because of the newline() action. "\r\n" can be 
interpreted as either
  "\r\n" { newline(); } or
  '\r' { newline(); } '\n' { newline(); }
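Here is a rough illustration of why the two readings matter. It is a hand-written Java sketch, not ANTLR output, and `EolReadings` and its methods are hypothetical names. It counts line breaks in the same input under each reading. The pair reading sees one line break in "image\r\n1234", while the split reading sees two, so the line count drifts.

```java
// Hypothetical illustration of the two readings of "\r\n"; not ANTLR-generated code.
public class EolReadings {
    // Reading 1: "\r\n" is one terminator, so it counts as a single line break.
    static int countAsPair(String s) {
        int lines = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\r') {
                lines++;
                // Swallow a following '\n' as part of the same terminator.
                if (i + 1 < s.length() && s.charAt(i + 1) == '\n') {
                    i++;
                }
            } else if (c == '\n') {
                lines++;
            }
        }
        return lines;
    }

    // Reading 2: '\r' and '\n' are matched separately, each one calling newline().
    static int countSeparately(String s) {
        int lines = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\r' || c == '\n') {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String input = "image\r\n1234";
        System.out.println(countAsPair(input));     // 1
        System.out.println(countSeparately(input)); // 2
    }
}
```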

I think that either phrasing the newline part of the whitespace rule as
  ('\r')? '\n' { newline(); }
  | '\r' { newline(); }

or setting the greedy option to "true" (preferred) will work.
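The next sketch is a hand-written Java analogue of the greedy choice. ANTLR 2 generates its own lexer code, and the `WhitespaceSkipper` class here is hypothetical. The loop skips spaces, tabs and line terminators, like the WHITESPACE rule quoted further down. When it sees a CR, it looks one character ahead and consumes a following LF as part of the same terminator, so each CR-LF pair bumps the line counter exactly once.

```java
// Hypothetical hand-written analogue of a whitespace rule with greedy CR-LF matching.
// ANTLR generates different code; this only mirrors the decision it has to make.
public class WhitespaceSkipper {
    private final String input;
    private int pos = 0;
    private int line = 1;

    WhitespaceSkipper(String input) {
        this.input = input;
    }

    // Skips spaces, tabs and line terminators from the current position.
    // A '\r' immediately followed by '\n' is consumed as ONE terminator.
    void skipWhitespace() {
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == ' ' || c == '\t') {
                pos++;
            } else if (c == '\r') {
                pos++;
                // Greedy choice: prefer the longer "\r\n" match when it is available.
                if (pos < input.length() && input.charAt(pos) == '\n') {
                    pos++;
                }
                line++; // one newline() per terminator
            } else if (c == '\n') {
                pos++;
                line++;
            } else {
                return; // first non-whitespace character
            }
        }
    }

    int position() { return pos; }
    int line() { return line; }

    public static void main(String[] args) {
        WhitespaceSkipper s = new WhitespaceSkipper(" \t\r\n\r\nfoo");
        s.skipWhitespace();
        System.out.println(s.position() + " " + s.line()); // 6 3
    }
}
```

Preferring the longer match at the CR means the pair is never split, so whatever reads next starts after the LF.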

--Loring


--- In antlr-interest at yahoogroups.com, mzukowski at y... wrote:
> From the rule it looks like you should be handling everything ok. I would
> recommend running it through a debugger to see what is happening, or using
> the -traceLexer option to antlr.Tool and diagnosing it that way.
> 
> If you can't find the problem then we'll need a small complete grammar with
> test input to try out for ourselves to help you further.
> 
> On a tangent, how do you know the length of the binary data for the image?
> Or is it not binary? Anyhow, how do you know when it ends?
> 
> Monty
> 
> -----Original Message-----
> From: skappskapp [mailto:skapp at r...] 
> Sent: Monday, December 15, 2003 8:37 AM
> To: antlr-interest at yahoogroups.com
> Subject: [antlr-interest] Need help with EOL mess
> 
> I am writing a PostScript interpreter based upon ANTLR. I am having
> problems matching the correct end-of-line sequence. I would like to
> match CR-LF in files that contain this sequence, but PostScript
> mandates that all three EOL sequences be supported.
> 
> Normally this would be a trivial problem: who cares if I matched a
> CR and then an LF if it is all being ignored by the interpreter?
> However, I need this for two reasons. The first is that I would like
> an accurate line count for debugging purposes. The second is that
> PostScript allows user programs to read from the current file,
> essentially bypassing the interpreter. (This is how image data is
> embedded into PostScript programs.)
> 
> The issue I have is that the operator that reads from the current
> file (named "image") expects the data to be present immediately
> after the operator. For example, for an image where four bytes of
> data are expected:
> 
> image<CR>1234 nextoperator
> 
> seems to work, but
> 
> image<CR><LF>1234 nextoperator
> 
> does not. The data should begin with the "1", but in the second
> example it seems to begin with the LF because the scanner has
> matched the CR and not the CR-LF pair.
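The symptom can be reproduced outside ANTLR with a small Java sketch. The `dataAfterOperator` helper and `ImageDataDemo` class are hypothetical. The helper skips exactly one end-of-line sequence after the operator name and then takes the requested number of characters. With `treatCrLfAsOne` set to `false`, it mimics a scanner that stopped after the CR, so the "data" picks up the stray LF. This sketch assumes the data follows a single EOL, as in the example above. It does not claim to match how any real PostScript interpreter defines that boundary.

```java
// Hypothetical reproduction of the "image<CR><LF>1234" symptom; not interpreter code.
public class ImageDataDemo {
    // Returns the `count` characters that follow `op` and one EOL sequence.
    // If treatCrLfAsOne is false, only the '\r' of a "\r\n" pair is consumed.
    static String dataAfterOperator(String src, String op, int count, boolean treatCrLfAsOne) {
        int pos = src.indexOf(op);
        if (pos < 0) {
            throw new IllegalArgumentException("operator not found: " + op);
        }
        pos += op.length();
        if (pos < src.length() && src.charAt(pos) == '\r') {
            pos++;
            if (treatCrLfAsOne && pos < src.length() && src.charAt(pos) == '\n') {
                pos++;
            }
        } else if (pos < src.length() && src.charAt(pos) == '\n') {
            pos++;
        }
        return src.substring(pos, Math.min(pos + count, src.length()));
    }

    public static void main(String[] args) {
        String mac = "image\r1234 nextoperator";
        String dos = "image\r\n1234 nextoperator";
        System.out.println(dataAfterOperator(mac, "image", 4, true));                   // 1234
        System.out.println(dataAfterOperator(dos, "image", 4, true));                   // 1234
        System.out.println(dataAfterOperator(dos, "image", 4, false).startsWith("\n")); // true
    }
}
```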
> 
> Here is my whitespace definition from the grammar file:
> 
> WHITESPACE
>     // This rule matches and discards any whitespace.
>     : ( ' '
>       | '\t'
>       | ( options { generateAmbigWarnings=false; }
>           : "\r\n"          { newline(); }      // Microsoft
>           | '\r'            { newline(); }      // Macintosh
>           | '\n'            { newline(); }      // Unix
>         )
>       )+  { $setType(Token.SKIP); }
>     ;
> 
> This *does* generate ambiguity warnings, but I don't know how to
> address them. Does anyone have any suggestions?
> 
> Regards,
> 
>    Steve