[antlr-interest] Bug in antlr-3.0.1 DOTTreeGenerator.getNodeST()
Pranab Dhar
pdhar at tibco.com
Mon Nov 5 12:21:27 PST 2007
Hi Ter,
All those \ characters can quickly become confusing. :-)
Let me try to clarify:
In general, what is needed is:
1) Choose an escape character.
-> Here, backslash. In Java: '\\'.
2) Start by escaping the escape character itself. This is what is
missing in the 3.0.1 DOTTreeGenerator code.
-> In Java: "\\" (string that contains one backslash character) needs to
become "\\\\" (string that contains two backslash characters).
3) Then, escape whatever other character needs escaping.
-> Here, double quotes. In Java: "\"" (string that contains only '"')
needs to become "\\\"" (string that contains one backslash character and
one double quote).
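The three steps above can also be sketched without any regular expression, as a single pass over the characters. This is only an illustration: the class and method names (EscapeSketch, escape) are made up for this example and are not part of ANTLR.

```java
/** Hypothetical helper, not part of ANTLR: escapes \ and " with a backslash. */
final class EscapeSketch {
    // Step 1: the chosen escape character.
    private static final char ESCAPE = '\\';

    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Steps 2 and 3: prefix the escape character itself and the double quote.
            if (c == ESCAPE || c == '"') {
                out.append(ESCAPE);
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

Because each input character is looked at exactly once, the order of steps 2 and 3 cannot go wrong here: a backslash added as a prefix is never visited again.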
Now, there are multiple ways to do that. Either 2 followed by 3, or 2
and 3 at the same time, which is what was proposed in the previous
email:
text = text.replaceAll("(\\\\|\")","\\\\$1"); // steps 2 and 3
You can try it, it works :-).
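To see the difference concretely, here is a small sketch that puts the 3.0.1 replacement and the proposed one side by side. Only the two replaceAll calls come from this thread; the class and method names are hypothetical.

```java
/** Hypothetical demo class; only the two replaceAll calls come from this thread. */
final class DotEscapeDemo {
    /** The replacement as it appears in the 3.0.1 code: escapes " only. */
    static String escape301(String text) {
        return text.replaceAll("\"", "\\\\\"");
    }

    /** The proposed replacement: escapes both \ and ". */
    static String escapeProposed(String text) {
        return text.replaceAll("(\\\\|\")", "\\\\$1");
    }
}
```

For a token text made of the four characters a, \, ", b, escape301 returns a\\"b (the original backslash is left alone), while escapeProposed returns a\\\"b.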
Let's have a look at both parameters:
"(\\\\|\")" is a regular expression that contains one matching group
(expression between parentheses) that can be referred to in the
replacement string. It matches, in the input string:
* either one single backslash, noted \\\\ (backslash is a special
character in regex so it needs to be escaped => doubled, and a special
character in java string so it needs to be escaped a second time => to
express one backslash you need to type four backslashes...),
* or one single double quote, noted \"
"\\\\$1" is the replacement string. It contains one backslash character,
noted \\\\ (as above: \ is a special character in both the replacement
text and the Java string, so it has to be escaped twice), followed by the
first group that matched in the regexp, noted $1.
To summarize:
We try to match one character which is \ or ", and for all we find
(replaceAll), we prefix them with a \
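The two layers of escaping can be made visible by counting characters: what is typed in the Java source is longer than what the regex engine actually receives. A minimal sketch, with made-up class and constant names:

```java
/** Hypothetical demo of the two escaping layers; names are made up for illustration. */
final class EscapeLayers {
    // Java source literal            what the regex engine receives
    static final String REGEX = "(\\\\|\")";     // 6 characters: ( \ \ | " )
    static final String REPLACEMENT = "\\\\$1";  // 4 characters: \ \ $ 1

    /** Applies the pattern; each matched \ or " comes back prefixed with one \. */
    static String apply(String text) {
        return text.replaceAll(REGEX, REPLACEMENT);
    }
}
```

The Java compiler removes the first layer (four typed backslashes become two characters), and the regex engine removes the second (two characters stand for one literal backslash).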
Alternatively, you can use two steps instead, but I am not sure that it
is more readable:
text = text.replaceAll("\\\\","\\\\\\\\") // step 2
           .replaceAll("\"","\\\\\""); // step 3
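One thing worth illustrating about the two-step variant is that the order matters: if the quotes are escaped first, the backslashes added in that pass get doubled by the second pass. The sketch below uses String.replace, which treats both arguments literally (no regex), as a swapped-in alternative to the replaceAll calls; the class and method names are hypothetical.

```java
/** Hypothetical two-step escaper using literal (non-regex) String.replace. */
final class TwoStepEscape {
    /** Step 2 then step 3: backslashes first, then double quotes. */
    static String escape(String text) {
        return text.replace("\\", "\\\\")   // \  becomes  \\
                   .replace("\"", "\\\"");  // "  becomes  \"
    }

    /** Wrong order, for comparison: the \ added for " is doubled afterwards. */
    static String escapeWrongOrder(String text) {
        return text.replace("\"", "\\\"")
                   .replace("\\", "\\\\");
    }
}
```

With a text consisting of a single double quote, escape returns \" while escapeWrongOrder returns \\". Since String.replace does no regex processing, each literal also needs only one level of escaping instead of two.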
Let us know if it helps... :-)
Nicolas and Pranab.
PS: Also, it may be more efficient to use a statically compiled regular
expression, something like:
// needs: import java.util.regex.Pattern;
private static final Pattern BACKSLASH_OR_DOUBLE_QUOTE_PATTERN =
    Pattern.compile("(\\\\|\")");
text = BACKSLASH_OR_DOUBLE_QUOTE_PATTERN.matcher(text).replaceAll("\\\\$1");
-----Original Message-----
From: Terence Parr [mailto:parrt at cs.usfca.edu]
Sent: Monday, November 05, 2007 11:17 AM
To: Pranab Dhar
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Bug in antlr-3.0.1
DOTTreeGenerator.getNodeST()
On Nov 3, 2007, at 9:17 AM, Pranab Dhar wrote:
> Hi,
>
> In the 3.0.1 release of antlr, the DOTTreeGenerator escapes the "
> character with \ but does not escape the \ character itself as \\ .
> As a result, a token string that includes \" is incorrectly escaped
> to \\" instead of \\\"
>
> The current code shows this
>
> if (text!=null) text = text.replaceAll("\"", "\\\\\"");
>
> and this should be
>
> if (text!=null) text = text.replaceAll("(\\\\|\")","\\\\$1");
hi. What's $1?
\\\\\" yields \\" in output; hmm...i guess we want \" in output so we
need \\\" instead?
Also I need a "\\\\ -> "\\" right?
Ter