[antlr-interest] 8 bit ASCII and cpp source code

Jim O'Connor Jim.O'Connor at microfocus.com
Fri Jan 26 13:25:45 PST 2007


Hi all,
            Quick sanity check.  Background: Antlr 2.7.5 cpp source,
compiled into a library.  I am running into Swedish characters in my
input file octal 330, hex 0xd8, decimal 216.
 
 
I limited the charVocab in the test case to     
charVocabulary = '\3' .. '\177'|'\330';
 
Lexer checks for IDENTIFIERs
IDENTIFIER : 
       (SIMPLE_LETTER) (SIMPLE_LETTER | '_' | '0'..'9')* 
;
 
SIMPLE_LETTER : 
      'a'..'z' | '\330' 
;
 
the switch statement in the lexer is
 
void SqlCobolLexer::mSIMPLE_LETTER(bool _createToken) {
      int _ttype; ANTLR_USE_NAMESPACE(antlr)RefToken _token;
ANTLR_USE_NAMESPACE(std)string::size_type _begin = text.length();
      _ttype = SIMPLE_LETTER;
      ANTLR_USE_NAMESPACE(std)string::size_type _saveIndex;
      
      switch ( LA(1)) {
      case 0x61 /* 'a' */ :
      case 0x62 /* 'b' */ :
      case 0x63 /* 'c' */ :
      case 0x64 /* 'd' */ :
      case 0x65 /* 'e' */ :
      case 0x66 /* 'f' */ :
      case 0x67 /* 'g' */ :
      case 0x68 /* 'h' */ :
      case 0x69 /* 'i' */ :
      case 0x6a /* 'j' */ :
      case 0x6b /* 'k' */ :
      case 0x6c /* 'l' */ :
      case 0x6d /* 'm' */ :
      case 0x6e /* 'n' */ :
      case 0x6f /* 'o' */ :
      case 0x70 /* 'p' */ :
      case 0x71 /* 'q' */ :
      case 0x72 /* 'r' */ :
      case 0x73 /* 's' */ :
      case 0x74 /* 't' */ :
      case 0x75 /* 'u' */ :
      case 0x76 /* 'v' */ :
      case 0x77 /* 'w' */ :
      case 0x78 /* 'x' */ :
      case 0x79 /* 'y' */ :
      case 0x7a /* 'z' */ :
      {
            matchRange('a','z');
            break;
      }
      case 0xd8:
      {
            match(static_cast<unsigned char>('\330') /* charlit */ );
            break;
      }
 
 
 
 
 
 The charscanner and inputbuffer classes have a return type of int for
LA().  LA(1) returns 0Xffd8 for my "problem" character.  
 
 
Solution: change the LA() to return unsigned?  Ric hinted at such in a
2004 archive note.
 
Thanks for reading to here
 
Jim
 
Hope all is going well
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20070126/cb524ae5/attachment-0001.html 


More information about the antlr-interest mailing list