[antlr-interest] Re: Need help with EOL mess
lgcraymer
lgc at mail1.jpl.nasa.gov
Mon Dec 15 20:27:47 PST 2003
There is an ambiguity because of the newline() action. "\r\n" can be
interpreted as either

    "\r\n" { newline(); }

or

    '\r' { newline(); } '\n' { newline(); }
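The two readings can be made concrete with a small sketch. It is written in plain Java and is not ANTLR-generated lexer code. The class `EolReadings` and its methods are hypothetical names invented for this illustration. The sketch counts how many times newline() would fire for the same input under each interpretation. For a lone "\r\n" the first reading counts one line and the second counts two, which is why the choice matters for line numbering.

```java
// Hypothetical illustration only (plain Java, not ANTLR output): counts how
// many newline() calls each reading of an end-of-line sequence would make.
class EolReadings {
    // Reading 1: "\r\n" is a single terminator, so it triggers one newline().
    static int countPairAsOne(String s) {
        int lines = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\r') {
                lines++;
                // Swallow a following '\n' as part of the same terminator.
                if (i + 1 < s.length() && s.charAt(i + 1) == '\n') {
                    i++;
                }
            } else if (c == '\n') {
                lines++;
            }
        }
        return lines;
    }

    // Reading 2: '\r' and '\n' are separate terminators, each calling newline().
    static int countSeparately(String s) {
        int lines = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\r' || c == '\n') {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String eol = "\r\n";
        System.out.println(countPairAsOne(eol));  // 1
        System.out.println(countSeparately(eol)); // 2
    }
}
```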
I think that either phrasing the newline whitespace as

    ('\r')? '\n' { newline(); }
  | '\r' { newline(); }

or setting the greedy option to "true" (preferred) will work.
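The greedy behavior can be sketched in hypothetical plain Java. This is not the code ANTLR generates, and the names `GreedyEolSkipper` and `skipWhitespace` are invented for illustration. When the skipper sees '\r', it peeks at the next character. If that character is '\n', it consumes the pair as one end of line and calls its stand-in for newline() once. Applied to Steve's example "image<CR><LF>1234", it stops on the '1', so the raw image data would start where it should.

```java
// Hypothetical sketch (plain Java, not ANTLR output) of a whitespace skipper
// that matches "\r\n" greedily, as the WHITESPACE rule is meant to.
class GreedyEolSkipper {
    private final String input;
    private int pos;
    private int line = 1;

    GreedyEolSkipper(String input, int start) {
        this.input = input;
        this.pos = start;
    }

    // Stand-in for the lexer's newline(): advances the line counter.
    private void newline() {
        line++;
    }

    // Skips spaces, tabs and any of the three EOL conventions (CR LF, CR, LF).
    void skipWhitespace() {
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == ' ' || c == '\t') {
                pos++;
            } else if (c == '\r') {
                pos++;
                // Greedy choice: an LF right after a CR belongs to the same EOL.
                if (pos < input.length() && input.charAt(pos) == '\n') {
                    pos++;
                }
                newline();
            } else if (c == '\n') {
                pos++;
                newline();
            } else {
                break; // first non-whitespace character
            }
        }
    }

    int position() { return pos; }

    int line() { return line; }

    public static void main(String[] args) {
        String src = "image\r\n1234 nextoperator";
        GreedyEolSkipper s = new GreedyEolSkipper(src, "image".length());
        s.skipWhitespace();
        // The raw data now begins at '1', not at the LF.
        System.out.println(src.charAt(s.position())); // 1
        System.out.println(s.line());                 // 2
    }
}
```

Like the WHITESPACE rule it imitates, this loop skips a whole run of whitespace. It does not stop after exactly one end of line.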
--Loring
--- In antlr-interest at yahoogroups.com, mzukowski at y... wrote:
> From the rule it looks like you should be handling everything OK. I would
> recommend running it through a debugger to see what is happening, or using
> the -traceLexer option to antlr.Tool and diagnosing it that way.
>
> If you can't find the problem, then we'll need a small, complete grammar
> with test input to try out for ourselves to help you further.
>
> On a tangent, how do you know the length of the binary data for the image?
> Or is it not binary? Anyhow, how do you know when it ends?
>
> Monty
>
> -----Original Message-----
> From: skappskapp [mailto:skapp at r...]
> Sent: Monday, December 15, 2003 8:37 AM
> To: antlr-interest at yahoogroups.com
> Subject: [antlr-interest] Need help with EOL mess
>
> I am writing a PostScript interpreter based on ANTLR. I am having
> problems matching the correct end-of-line sequence. I would like to
> match CR-LF in files that contain this sequence, but PostScript
> mandates that all three EOL sequences be supported.
>
> Normally this would be a trivial problem: who cares if I matched a
> CR and then an LF if it is all being ignored by the interpreter?
> However, I need this for two reasons. The first is that I would like
> an accurate line count for debugging purposes. The second is that
> PostScript allows user programs to read from the current file,
> essentially bypassing the interpreter. (This is how image data is
> embedded in PostScript programs.)
>
> The issue I have is that the operator that reads from the current
> file (named "image") expects the data to be present immediately
> after the operator. For example, an image where four bytes of data
> are expected:
>
> image<CR>1234 nextoperator
>
> seems to work but
>
> image<CR><LF>1234 nextoperator
>
> does not. The data should begin with the "1", but in the second
> example it seems to begin with the LF because the scanner has
> matched the CR and not the CR-LF pair.
>
> Here is my whitespace definition from the grammar file:
>
> WHITESPACE
>     // This rule matches and discards any whitespace.
>     : ( ' '
>       | '\t'
>       | ( options { generateAmbigWarnings=false; }
>           : "\r\n" { newline(); }   // Microsoft
>           | '\r'   { newline(); }   // Macintosh
>           | '\n'   { newline(); }   // Unix
>         )
>       )+ { $setType(Token.SKIP); }
>     ;
>
> This *does* generate ambiguity warnings, but I don't know how to
> address them. Does anyone have any suggestions?
>
> Regards,
>
> Steve