[antlr-interest] C# parser grammar problem

Tue Mar 6 14:05:43 PST 2007

On Mar 6, 2007, at 12:58 PM, Johannes Luber wrote:

> Terence Parr wrote:
>> Hi.  That line in the code indicates a malformed \uxxxx char ref.
>> Do you see one in your code?
>
> No, I don't. :( I've searched through all the Unicode references:
> none of them has more or fewer than four hex digits, and none
> contains a disallowed character. I checked that with a regular
> expression. The only reason I can think of for Java complaining is
> that it doesn't accept one of the characters as a valid code point,
> which would mean that the accepted Unicode version isn't the most
> current one. But I don't know which character that would be.

Weird.  That line is the last one here:

	public static StringBuffer getUnescapedStringFromGrammarStringLiteral(String literal) {
		//System.out.println("escape: ["+literal+"]");
		StringBuffer buf = new StringBuffer();
		int last = literal.length()-1; // skip quotes on outside
		for (int i=1; i<last; i++) {
			char c = literal.charAt(i);
			if ( c=='\\' ) {
				i++;
				c = literal.charAt(i);
				if ( Character.toUpperCase(c)=='U' ) {
					// \u0000
					i++;
					String unicodeChars = literal.substring(i,i+4);

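Given this exception:

java.lang.StringIndexOutOfBoundsException: String index out of range: 7

Oh, when I debug, it says literal='\u'.  That string is only four
characters long, so by the time the code reaches the substring call
i is 3 and substring(3,7) runs off the end.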

So, here is your problem:

fragment unicode_escape_sequence[string unicodeClasses]
         :       '\u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
         |       '\U' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
         ;

:)  You want 'u' and 'U'.
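
For anyone who wants to reproduce the exception outside ANTLR, here is a minimal sketch. It is a simplified stand-in for the loop quoted above, not the actual ANTLR source: the class name UnicodeEscapeDemo, the method unescape, and the checkLength flag are all made up for illustration. It shows that the four-character literal '\u' makes substring read past the end of the string, and how a length check before the substring call could turn that into a clearer error.

```java
public class UnicodeEscapeDemo {

    // Simplified stand-in for the quoted loop (hypothetical, not ANTLR's code).
    // With checkLength == false it behaves like the excerpt: no length check
    // before substring(). With checkLength == true it reports the bad escape.
    static String unescape(String literal, boolean checkLength) {
        StringBuilder buf = new StringBuilder();
        int last = literal.length() - 1; // skip quotes on outside
        for (int i = 1; i < last; i++) {
            char c = literal.charAt(i);
            if (c == '\\') {
                i++;
                c = literal.charAt(i);
                if (Character.toUpperCase(c) == 'U') {
                    i++;
                    if (checkLength && i + 4 > last) {
                        throw new IllegalArgumentException(
                            "malformed unicode escape in " + literal);
                    }
                    // Throws when fewer than four characters remain in the string.
                    String unicodeChars = literal.substring(i, i + 4);
                    buf.append((char) Integer.parseInt(unicodeChars, 16));
                    i += 3; // the loop's i++ then steps past the fourth hex digit
                    continue;
                }
            }
            buf.append(c);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        // Well-formed escape: quote, backslash, u, four hex digits, quote.
        System.out.println(unescape("'\\u0041'", false));

        // The literal from the grammar: quote, backslash, u, quote.
        try {
            unescape("'\\u'", false);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unchecked: " + e);
        }
        try {
            unescape("'\\u'", true);
        } catch (IllegalArgumentException e) {
            System.out.println("checked: " + e.getMessage());
        }
    }
}
```

The exact wording of the StringIndexOutOfBoundsException message depends on the JDK version, so the sketch prints it rather than assuming it.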

Ter