[antlr-interest] Need help with EOL mess
skappskapp
skapp at rochester.rr.com
Mon Dec 15 08:37:19 PST 2003
I am writing a PostScript interpreter based on ANTLR. I am having
problems matching the correct end-of-line sequence. I would like to
match CR-LF as a single EOL in files that contain this sequence, but
PostScript mandates that all three EOL sequences (CR, LF, and CR-LF)
be supported.
Normally this would be a trivial problem - who cares if I matched a
CR and then a LF if it is all being ignored by an interpreter?
However I need this for two reasons. The first is that I would like
an accurate line count for debugging purposes. The second is that
PostScript allows user programs to read from the current file,
essentially bypassing the interpreter. (This is how image data is
embedded into PostScript programs.)
The issue I have is that the operator that reads from the current
file (named "image") expects the data to be present immediately
after the operator. For example, with an image where four bytes of
data are expected:
image<CR>1234 nextoperator
seems to work but
image<CR><LF>1234 nextoperator
does not. The data should begin with the "1" but in the second
example it seems to begin with the LF because the scanner has
matched the CR and not the CR-LF pair.
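The behavior being asked for can be sketched in plain Java, outside of ANTLR. This is only an illustration under stated assumptions: the class EolSketch and its methods skipEol, countLines, and dataAfterOperator are hypothetical names, not part of the interpreter or of the ANTLR API. The idea is to look one byte past a CR so that CR-LF is consumed as a single EOL, which keeps both the line count and the start of the data correct.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ANTLR or of the interpreter.
final class EolSketch {

    // Returns the index just past one EOL starting at pos,
    // or pos itself if no EOL starts there.
    static int skipEol(byte[] buf, int pos) {
        if (pos >= buf.length) {
            return pos;
        }
        if (buf[pos] == '\r') {
            // Look one byte ahead so CR-LF counts as a single EOL.
            if (pos + 1 < buf.length && buf[pos + 1] == '\n') {
                return pos + 2;
            }
            return pos + 1; // lone CR
        }
        if (buf[pos] == '\n') {
            return pos + 1; // lone LF
        }
        return pos;
    }

    // Counts EOL sequences, treating CR-LF as one line break.
    static int countLines(byte[] buf) {
        int lines = 0;
        int i = 0;
        while (i < buf.length) {
            int next = skipEol(buf, i);
            if (next > i) {
                lines++;
                i = next;
            } else {
                i++;
            }
        }
        return lines;
    }

    // Returns the n bytes that follow the first occurrence of op and
    // the single EOL after it. Assumes op is present and that at
    // least n bytes follow the EOL.
    static String dataAfterOperator(String program, String op, int n) {
        // ISO-8859-1 maps each char to one byte, so indices line up.
        byte[] buf = program.getBytes(StandardCharsets.ISO_8859_1);
        int start = skipEol(buf, program.indexOf(op) + op.length());
        return new String(buf, start, n, StandardCharsets.ISO_8859_1);
    }
}
```

With this sketch, dataAfterOperator("image\r\n1234 nextoperator", "image", 4) and the same call with a lone "\r" both yield "1234", which is the result the scanner should produce for the two examples above.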
Here is my whitespace definition from the grammar file:
    WHITESPACE
        // This rule matches and discards any whitespace.
        : ( ' '
          | '\t'
          | ( options { generateAmbigWarnings=false; }
            : "\r\n" { newline(); } // Microsoft
            | '\r'   { newline(); } // Macintosh
            | '\n'   { newline(); } // Unix
            )
          )+ { $setType(Token.SKIP); }
        ;
This *does* generate ambiguity warnings, but I don't know how to
address them. Does anyone have any suggestions?
Regards,
Steve