[antlr-interest] Manipulating lexer text output
Terence Parr
parrt at cs.usfca.edu
Sun Apr 1 13:51:28 PDT 2007
I think it's in the FAQ:
http://www.antlr.org/wiki/pages/viewpage.action?pageId=1461
Ter
On Mar 31, 2007, at 4:26 PM, Gavin Lambert wrote:
> Ok, next question :)
>
> Is there some way for a lexer rule to manipulate the output text of
> the lexer token, when it's not the rule responsible for generating
> that token? (I'm using the C language target, if that makes a
> difference.)
>
> For example, imagine this grammar fragment:
>
> fragment
> EscapeSequence
>     : '\\'
>       ( '\\'
>       | 'n'
>       | ('\r' | '\n') WS?
>       )
>     ;
>
> STRING
>     : '"' (~('"' | '\\') | EscapeSequence)* '"'
>     ;
>
> This works as is, but the token text is identical to the source
> text, including all escape sequences and quotes. What I'd like
> instead is the semantic equivalent, i.e. a STRING token where the
> quotes are removed and the escape sequences have been resolved:
> \\ is converted to a single backslash, \n to a real newline
> character, and the final alt's text is removed entirely (that's a
> line-folding escape). That way the parsing only has to be done
> once, instead of having to reparse the token text outside of
> ANTLR.
>
> Rewrite rules sound like the sort of thing that would help here,
> but they don't seem to work in the lexer. I also tried calling
> emitNew from the subrule, but that replaced the entire string,
> not just the substring matched by the subtoken. Any ideas?
>
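For reference, the transformation asked for above can be sketched in plain C as a pass over the raw token text. This is only an illustration of the intended result, not the lexer-side approach the FAQ link refers to. unescape_string_token is a hypothetical helper, not part of the ANTLR C runtime, and since the grammar fragment does not show the WS rule, the sketch assumes WS means spaces, tabs, CR and LF.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an ANTLR runtime function): takes the raw STRING
 * token text, including the surrounding quotes, and returns a newly
 * malloc'd, NUL-terminated copy with the quotes removed and the escape
 * sequences resolved. Returns NULL if the text is not quoted or if
 * allocation fails. The caller frees the result. */
static char *unescape_string_token(const char *raw)
{
    size_t len = strlen(raw);
    if (len < 2 || raw[0] != '"' || raw[len - 1] != '"')
        return NULL;

    /* The resolved text is never longer than the raw text. */
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    size_t o = 0;
    size_t i = 1;          /* skip the opening quote */
    size_t end = len - 1;  /* stop before the closing quote */

    while (i < end) {
        char c = raw[i++];
        if (c != '\\' || i >= end) {
            out[o++] = c;  /* ordinary character */
            continue;
        }
        char e = raw[i++];
        switch (e) {
        case '\\': out[o++] = '\\'; break;  /* \\ becomes one backslash */
        case 'n':  out[o++] = '\n'; break;  /* \n becomes a real newline */
        case '\r':
        case '\n':
            /* Line folding: drop the line break and any following
             * whitespace (assumed definition of WS). */
            while (i < end && (raw[i] == ' ' || raw[i] == '\t' ||
                               raw[i] == '\r' || raw[i] == '\n'))
                i++;
            break;
        default:
            /* Not expected for text the STRING rule accepted; keep as-is. */
            out[o++] = '\\';
            out[o++] = e;
            break;
        }
    }
    out[o] = '\0';
    return out;
}
```

A function like this would be called on the STRING token's text after it has been retrieved from the token, which is exactly the second pass the question hopes to avoid, so it is a fallback rather than an answer to the lexer-side question.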