[antlr-interest] Need help with EOL mess
mzukowski at yci.com
Mon Dec 15 09:23:15 PST 2003
From the rule it looks like you should be handling everything OK. I would
recommend running it through a debugger to see what is happening, or using
the -traceLexer option to antlr.Tool and diagnosing it that way.
If you can't find the problem then we'll need a small complete grammar with
test input to try out for ourselves to help you further.
On a tangent, how do you know the length of the binary data for the image?
Or is it not binary? Anyhow, how do you know when it ends?
Monty
-----Original Message-----
From: skappskapp [mailto:skapp at rochester.rr.com]
Sent: Monday, December 15, 2003 8:37 AM
To: antlr-interest at yahoogroups.com
Subject: [antlr-interest] Need help with EOL mess
I am writing a PostScript interpreter based upon antlr. I am having
problems matching the correct end-of-line sequence. I would like to
match CR-LF on those files that contain this sequence, but
PostScript mandates that all three EOL sequences (CR, LF and CR-LF) be supported.
Normally this would be a trivial problem - who cares if I matched a
CR and then a LF if it is all being ignored by an interpreter?
However I need this for two reasons. The first is that I would like
an accurate line count for debugging purposes. The second is that
PostScript allows user programs to read from the current file,
essentially bypassing the interpreter. (This is how image data is
embedded into PostScript programs).
The issue I have is that the operator that reads from the current
file (named "image") expects the data to be present immediately
after the operator. For example, an image where four bytes of data
are expected:
image<CR>1234 nextoperator
seems to work but
image<CR><LF>1234 nextoperator
does not. The data should begin with the "1" but in the second
example it seems to begin with the LF because the scanner has
matched the CR and not the CR-LF pair.
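To make the expected behaviour concrete, here is a minimal sketch in plain Java, outside of ANTLR. The class EolSketch and its methods skipOneEol and dataAfterEol are hypothetical names used only for illustration. It assumes that exactly one EOL sequence separates the operator from its data and that CR-LF counts as a single EOL:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of ANTLR or of any
// PostScript implementation.
class EolSketch {
    // Returns the index just past ONE EOL sequence starting at pos,
    // or pos unchanged if no EOL starts there.
    static int skipOneEol(byte[] buf, int pos) {
        if (pos >= buf.length) {
            return pos;
        }
        if (buf[pos] == '\n') {
            return pos + 1;                       // Unix: LF
        }
        if (buf[pos] == '\r') {
            // Look one byte ahead so that CR-LF is consumed as a pair.
            boolean crlf = pos + 1 < buf.length && buf[pos + 1] == '\n';
            return crlf ? pos + 2 : pos + 1;      // Microsoft: CR-LF, else Macintosh: CR
        }
        return pos;
    }

    // Skips one EOL at the start of text, then returns the next count bytes.
    // Assumes text holds at least count bytes after the EOL.
    static String dataAfterEol(String text, int count) {
        byte[] buf = text.getBytes(StandardCharsets.ISO_8859_1);
        int start = skipOneEol(buf, 0);
        return new String(buf, start, count, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Each input is what follows "image" in the examples above.
        System.out.println(dataAfterEol("\r1234 nextoperator", 4));
        System.out.println(dataAfterEol("\r\n1234 nextoperator", 4));
        System.out.println(dataAfterEol("\n1234 nextoperator", 4));
    }
}
```

With the one-byte lookahead after CR, all three inputs yield the same four data bytes. Without it, the CR-LF case would start the data at the LF, which is the symptom described above.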
Here is my whitespace definition from the grammar file:
WHITESPACE
    // This rule matches and discards any whitespace.
    :   (   ' '
        |   '\t'
        |   (   options { generateAmbigWarnings=false; }
            :   "\r\n"  { newline(); }  // Microsoft
            |   '\r'    { newline(); }  // Macintosh
            |   '\n'    { newline(); }  // Unix
            )
        )+  { $setType(Token.SKIP); }
    ;
This *does* generate ambiguity warnings but I don't know how to
address this. Does anyone have any suggestions?
Regards,
Steve