[antlr-interest] C# parser grammar problem
Terence Parr
parrt at cs.usfca.edu
Tue Mar 6 14:05:43 PST 2007
On Mar 6, 2007, at 12:58 PM, Johannes Luber wrote:
> Terence Parr wrote:
>> Hi. That line in the code indicates a malformed \uxxxx char ref.
>> Do you see one in your code?
>
> No, I don't. :( I've searched through all the Unicode references:
> none has more or fewer than four hex digits, and none contains a
> disallowed character. I've checked that with a regular expression.
> The only reason I can think of for Java's complaint is that it
> doesn't accept one of the characters as a valid code point, which
> would mean that the Unicode version it accepts isn't the most
> current one. But I don't know which character that would be.
Weird. That line is the last one here:
    public static StringBuffer getUnescapedStringFromGrammarStringLiteral(String literal) {
        //System.out.println("escape: ["+literal+"]");
        StringBuffer buf = new StringBuffer();
        int last = literal.length()-1; // skip quotes on outside
        for (int i=1; i<last; i++) {
            char c = literal.charAt(i);
            if ( c=='\\' ) {
                i++;
                c = literal.charAt(i);
                if ( Character.toUpperCase(c)=='U' ) {
                    // \u0000
                    i++;
                    String unicodeChars = literal.substring(i,i+4);
Given
java.lang.StringIndexOutOfBoundsException: String index out of range: 7
Oh, when I debug, it says literal='\u'
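
To make the failure concrete, here is a minimal sketch that reduces the loop above to the part that reads the hex digits. The class name EscapeRepro and the method hexDigitsAfterEscape are made up for illustration and are not part of ANTLR; the sketch assumes the literal still carries its surrounding quotes, as in the excerpt.

```java
class EscapeRepro {
    // Hypothetical reduction of the loop above: returns the four
    // characters that follow the first \u or \U in a quoted literal.
    static String hexDigitsAfterEscape(String literal) {
        int last = literal.length() - 1; // skip quotes on outside
        for (int i = 1; i < last; i++) {
            char c = literal.charAt(i);
            if (c == '\\') {
                i++;
                c = literal.charAt(i);
                if (Character.toUpperCase(c) == 'U') {
                    i++;
                    // Throws when fewer than four characters remain.
                    return literal.substring(i, i + 4);
                }
            }
        }
        return null; // no \u escape found
    }

    public static void main(String[] args) {
        // Well-formed escape: the four hex digits are returned.
        System.out.println(hexDigitsAfterEscape("'\\u0041'"));
        try {
            // The literal seen in the debugger: '\u' with no digits.
            hexDigitsAfterEscape("'\\u'");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("malformed escape: " + e.getMessage());
        }
    }
}
```

With the four-character literal '\u', the substring call asks for indexes 3 to 7, which runs past the end of the string.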
So, here is your problem:
    fragment unicode_escape_sequence[string unicodeClasses]
        : '\u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
        | '\U' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
          HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
        ;
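:) You want 'u' and 'U'. Inside a grammar literal, '\u' is read as the
start of a \uxxxx escape, and here no hex digits follow it.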
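
A bounds check before the substring call would turn the raw StringIndexOutOfBoundsException into a message that names the bad literal. The following is only a sketch of that idea, not ANTLR's actual code: the class SafeUnicodeEscape and the method decodeUnicodeEscape are hypothetical, and it assumes the literal still has its outer quotes and that start is the index just past the 'u'.

```java
class SafeUnicodeEscape {
    // Hypothetical helper: decodes the four hex digits beginning at
    // index start of a quoted literal, or fails with a clear message.
    static char decodeUnicodeEscape(String literal, int start) {
        int end = start + 4;
        // The closing quote sits at length()-1, so the digits must end before it.
        if (end > literal.length() - 1) {
            throw new IllegalArgumentException(
                "malformed \\uxxxx escape in literal " + literal);
        }
        String hex = literal.substring(start, end);
        for (int k = 0; k < hex.length(); k++) {
            // Character.digit returns -1 for anything that is not a hex digit.
            if (Character.digit(hex.charAt(k), 16) == -1) {
                throw new IllegalArgumentException(
                    "non-hex digit in \\uxxxx escape in literal " + literal);
            }
        }
        return (char) Integer.parseInt(hex, 16);
    }

    public static void main(String[] args) {
        // Index 3 is the first character after the quote, backslash and 'u'.
        System.out.println(decodeUnicodeEscape("'\\u0041'", 3));
        try {
            decodeUnicodeEscape("'\\u'", 3);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```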
Ter