[antlr-interest] Bug in antlr-3.0.1 DOTTreeGenerator.getNodeST()

Pranab Dhar pdhar at tibco.com
Mon Nov 5 12:21:27 PST 2007


Hi Ter,

All those \ characters can quickly become confusing. :-)
Let me try to clarify:


In general, what is needed is:

1) Choose an escape character.
-> Here, backslash. In Java: '\\'.

2) Start by escaping the escape character itself. This is what is
missing in the 3.0.1 DOTTreeGenerator code.
-> In Java: "\\" (string that contains one backslash character) needs to
become "\\\\" (string that contains two backslash characters).

3) Then, escape whatever other character needs escaping.
-> Here, double quotes. In Java: "\"" (string that contains only '"')
needs to become "\\\"" (string that contains one backslash character and
one double quote).
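
To see what goes wrong when step 2 is skipped, here is a minimal sketch.
It uses the literal (non-regex) String.replace instead of replaceAll,
purely to keep the backslash count down; it is not the DOTTreeGenerator
code, and the class and method names are made up for the illustration.

```java
class EscapeStepsDemo {
    // Step 3 only: quotes are escaped, backslashes are left alone.
    // This mirrors the behaviour reported for 3.0.1 in this thread.
    static String quotesOnly(String text) {
        return text.replace("\"", "\\\"");
    }

    // Step 2 first (escape the escape character), then step 3 (escape quotes).
    static String backslashesThenQuotes(String text) {
        return text.replace("\\", "\\\\")
                   .replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        String input = "\\\"";  // two characters: \ followed by "
        System.out.println(quotesOnly(input));             // \\"
        System.out.println(backslashesThenQuotes(input));  // \\\"
    }
}
```

With step 3 alone, the output \\" reads as an escaped backslash followed
by a bare double quote, which is the symptom described in the original
report.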


Now, there are multiple ways to do that. Either 2 followed by 3, or 2
and 3 at the same time, which is what was proposed in the previous
email:

    text.replaceAll("(\\\\|\")","\\\\$1");      // steps 2 and 3

You can try it, it works :-).
Let's have a look at both parameters:

"(\\\\|\")" is a regular expression that contains one capturing group
(the expression between parentheses) that can be referred to in the
replacement string. It matches, in the input string:
 * either one single backslash, written \\\\ (backslash is a special
character in regex so it needs to be escaped => doubled, and a special
character in a Java string so it needs to be escaped a second time => to
express one backslash you need to type four backslashes...),
 * or one single double quote, written \"

"\\\\$1" is the replacement string. It contains one backslash character,
written \\\\ (as above: \ is a special char in both the replacement
text and a Java string so it has to be escaped twice), followed by the
first group that matched in the regexp, written $1.
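
One way to check the two layers of escaping is to print the strings and
look at what the regex engine actually receives once the Java compiler
has processed the literals. A small sketch (the class and constant names
are only for illustration):

```java
class EscapeLayersDemo {
    // Same literals as in the replaceAll call discussed above.
    static final String REGEX = "(\\\\|\")";
    static final String REPLACEMENT = "\\\\$1";

    public static void main(String[] args) {
        // After the Java compiler has processed the literals,
        // four typed backslashes have become two characters.
        System.out.println(REGEX);                 // (\\|")
        System.out.println(REGEX.length());        // 6
        System.out.println(REPLACEMENT);           // \\$1
        System.out.println(REPLACEMENT.length());  // 4
    }
}
```

The regex engine then reads the remaining \\ as one literal backslash,
both in the pattern and in the replacement text.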


To summarize:
We match any single character that is \ or ", and prefix every
occurrence found (replaceAll) with a \
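
A minimal sketch of that single-pass call on a few inputs; the wrapper
class and the method name escape are hypothetical, only the replaceAll
arguments come from the proposal above:

```java
class SinglePassEscapeDemo {
    // Prefix every backslash and every double quote with a backslash.
    static String escape(String text) {
        return text.replaceAll("(\\\\|\")", "\\\\$1");
    }

    public static void main(String[] args) {
        System.out.println(escape("plain"));       // plain
        System.out.println(escape("a\\b"));        // a\\b
        System.out.println(escape("say \"hi\""));  // say \"hi\"
        System.out.println(escape("\\\""));        // \\\"
    }
}
```

The comments show the characters actually printed, not Java literals.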



Alternatively, you can use two steps instead, but I am not sure that it
is more readable:

    text.replaceAll("\\\\","\\\\\\\\")     // step 2
        .replaceAll("\"","\\\\\"");        // step 3
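
As a quick sanity check, the sketch below compares the two-step variant
with the single-pass one, and also shows why the two calls must not be
swapped: if quotes are handled first, the second call re-escapes the
backslashes the first call just added. Class and method names are
hypothetical.

```java
class TwoStepEscapeDemo {
    // Steps 2 and 3 in one regex.
    static String onePass(String text) {
        return text.replaceAll("(\\\\|\")", "\\\\$1");
    }

    // Step 2, then step 3.
    static String twoSteps(String text) {
        return text.replaceAll("\\\\", "\\\\\\\\")
                   .replaceAll("\"", "\\\\\"");
    }

    // Wrong order: step 3, then step 2.
    static String swapped(String text) {
        return text.replaceAll("\"", "\\\\\"")
                   .replaceAll("\\\\", "\\\\\\\\");
    }

    public static void main(String[] args) {
        String[] inputs = { "plain", "a\\b", "say \"hi\"", "\\\"" };
        for (String in : inputs) {
            // Both correct variants should agree on every input.
            System.out.println(onePass(in).equals(twoSteps(in)));  // true
        }
        System.out.println(twoSteps("\""));  // \"
        System.out.println(swapped("\""));   // \\"
    }
}
```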



Let us know if it helps... :-)
Nicolas and Pranab.



PS: Also, it may be more efficient to use a statically compiled regular
expression, something like:


    private static final Pattern BACKSLASH_OR_DOUBLE_QUOTE_PATTERN =
            Pattern.compile("(\\\\|\")");

    text = BACKSLASH_OR_DOUBLE_QUOTE_PATTERN.matcher(text).replaceAll("\\\\$1");
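
Put together as a self-contained class, this could look like the sketch
below. The class name DotLabelEscaper and the method escape are
hypothetical, not part of ANTLR; the null check follows the
"if (text!=null)" guard in the current code. String.replaceAll compiles
its regex on every call, which is the cost the static Pattern avoids.

```java
import java.util.regex.Pattern;

final class DotLabelEscaper {
    // Compiled once, when the class is loaded.
    private static final Pattern BACKSLASH_OR_DOUBLE_QUOTE_PATTERN =
            Pattern.compile("(\\\\|\")");

    // Returns text with every \ and " prefixed by a backslash; null stays null.
    static String escape(String text) {
        if (text == null) {
            return null;
        }
        return BACKSLASH_OR_DOUBLE_QUOTE_PATTERN.matcher(text).replaceAll("\\\\$1");
    }

    public static void main(String[] args) {
        System.out.println(escape("\\\""));  // \\\"
        System.out.println(escape(null));    // null
    }
}
```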



-----Original Message-----
From: Terence Parr [mailto:parrt at cs.usfca.edu] 
Sent: Monday, November 05, 2007 11:17 AM
To: Pranab Dhar
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Bug in antlr-3.0.1
DOTTreeGenerator.getNodeST()


On Nov 3, 2007, at 9:17 AM, Pranab Dhar wrote:

> Hi,
>
>      In the 3.0.1 release of antlr, the DOTTreeGenerator escapes
> " with \ but does not escape the \ character itself as \\ . As a
> result, a token string that includes \" is incorrectly escaped to
> \\" instead of \\\"
>
>
>
> The current code shows this
>
>
>
> if (text!=null) text = text.replaceAll("\"", "\\\\\"");
>
>
>
> and this should be
>
>
>
> if (text!=null) text = text.replaceAll("(\\\\|\")","\\\\$1");
hi. What's $1?

\\\\\" yields \\" in output; hmm...i guess we want \" in output so we  
need \\\" instead?

Also I need a "\\\\ -> "\\" right?

Ter


