[antlr-interest] TokenStreamRewriteEngine question

Scott Amort jsamort at rogers.com
Sun Mar 12 08:51:26 PST 2006


Hi All,

I am using a TokenStreamRewriteEngine to discard unwanted whitespace and
comments, while still retaining the original file contents for debug and
error messages.  However, I have noticed that my lexer itself
'prediscards' a number of other characters, such as double-quotes and
backslashes.  These characters are needed to recognize certain
tokens, but I don't want them passed on to the parser, so I
have lexer rules like:

TAG
: '\\'! IDENT
;

where IDENT is an alphanumeric identifier.  What I have noticed,
however, is that the backslash character never makes it to the rewrite
engine, and so it is missing from the output of originalToStream.
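
A minimal sketch of the effect described above, assuming the rewrite
engine can only reproduce the original input from the text stored in each
buffered token.  Nothing here is ANTLR API: SketchToken, lexTag and
joinTokenText are hypothetical names used only to illustrate that a
character dropped with '!' inside the lexer is gone before any later
stage sees the token.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a lexer token: just the text the lexer kept.
struct SketchToken {
    std::string text;
};

// Simulates a TAG rule applied to input such as "\name".
// keepBackslash == false mimics '\\'! IDENT, where the backslash is
// matched but left out of the token text.
// Assumes the input starts with a backslash followed by an identifier.
SketchToken lexTag(const std::string& input, bool keepBackslash) {
    return SketchToken{ keepBackslash ? input : input.substr(1) };
}

// Rebuilds the "original" text by joining the token text, which is all
// a token buffer downstream of the lexer has to work with.
std::string joinTokenText(const std::vector<SketchToken>& tokens) {
    std::string out;
    for (const SketchToken& t : tokens) {
        out += t.text;
    }
    return out;
}
```

With keepBackslash set to false, joining the tokens for the input \tag
gives tag, which matches the missing backslash seen in the output; with
it set to true the input is reproduced unchanged.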

One possible solution is to have my lexer do less 'parsing' and only
produce more basic token types, but once I do that I get a wide variety
of non-determinism errors.  There are actually only three characters
that I discard in the lexer: the equals sign, the double-quote and the
backslash.  Is there an easier way to have these included, or do I have
to redesign my lexer?  Thanks!

Best,
Scott


