[antlr-interest] Manipulating text in the lexer
Sam Barnett-Cormack
s.barnett-cormack at lancaster.ac.uk
Thu Feb 26 07:48:57 PST 2009
Hey again all,
So, having returned to ANTLR (as previously mentioned), I've been trying
to do things that used to be possible, and appear no longer to be so.
http://www.antlr.org/blog/antlr3/lexical.tml suggests that it's no
longer possible to make a token's text differ from what's on the
input at all. When crafting an ASN.1 grammar, this is rather a pain. As
well as the obvious matter of wanting to strip the '"' from each
end of a string literal, ASN.1 string literals have an odd requirement
on the handling of whitespace and newlines within them, hopefully
illustrated by these grammar fragments:
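// Intent: a newline inside a string, plus adjacent spaces/tabs, adds no text
fragment
CSTRINGNL : WSNONL* NL WSNONL* {setText("");};
// Body is zero or more items: a newline run, a doubled quote, or any non-quote
CSTRING : '"' ((CSTRINGNL)=> CSTRINGNL | '""' | ~'"')* '"';
WS : (WSNONL | NL) {$channel=HIDDEN;};
fragment
// '\u000B' is vertical tab, written as a Unicode escape in place of '\v'
NL : ('\n' | '\r' | '\u000B' | '\f');
fragment
WSNONL : (' ' | '\t');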
Ideally, I'd also want to turn the '""' that's found inside a string
literal into a single '"' before passing it on to the parser, as there's
no need whatsoever to hold onto that. However, it's a *requirement* to
discard newlines, along with any other whitespace immediately preceding
or succeeding each. It'd be really frustrating to have to change that at
a later stage in processing.
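To make the requirement concrete, the transformation I'm after could be
sketched in plain Java roughly as follows. This is only an illustration of
the desired result, not something ANTLR provides: the class name
CStringNormalizer and the method normalize are made up for this example,
and it assumes the raw token text still includes both surrounding quotes.

```java
public final class CStringNormalizer {
    private CStringNormalizer() {}

    /**
     * Hypothetical helper: turns the raw text of a quoted string literal
     * (including both surrounding '"' characters) into its logical value.
     */
    public static String normalize(String raw) {
        if (raw.length() < 2
                || raw.charAt(0) != '"'
                || raw.charAt(raw.length() - 1) != '"') {
            throw new IllegalArgumentException("expected a quoted literal: " + raw);
        }
        // Strip the opening and closing quote.
        String body = raw.substring(1, raw.length() - 1);
        // Discard each run of newline characters (\n, \r, vertical tab, \f)
        // together with any spaces or tabs immediately before or after it.
        body = body.replaceAll("[ \\t]*[\\n\\r\\x0B\\f]+[ \\t]*", "");
        // Collapse each doubled quote into a single quote.
        return body.replace("\"\"", "\"");
    }
}
```

For example, a literal split across two lines, with trailing spaces on the
first line and indentation on the second, would come out as one unbroken
string, while spaces that are not adjacent to a newline are kept.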
So, can anyone clarify this for me, or let me know of some sort of
workaround?
Thanks,
Sam Barnett-Cormack