[antlr-interest] Bug with C++ charVocabulary option

Trey Spiva Trey.Spiva at embarcadero.com
Thu May 30 15:50:26 PDT 2002


When a character is read with the C++ istream::get method, it ends up in a
char.  On many compilers char is a signed type by default.
Therefore, the range of a char is -128...127.  When BitSet::member(int el)
is called, it uses the
test:
 
if ( el < 0 || static_cast<unsigned int>(el) >= storage.size()) 
 
to test if a character is in the charVocabulary.  When a character value is
greater than 127, it will 
be a negative number.  Therefore the static_cast<unsigned int> will return a
really large number.
Subsequently the result of the cast will be larger than the storage size.  
 
Example:
charVocabulary='\3'...'\377'
el = -107 (0x95) [unsigned value = 149]
 
el fits in the range of the vocabulary.  When el is cast to unsigned int
(32 bits) the value becomes 
4294967189 and fails the test.  So, BitSet::member returns that the
character is not in
the charVocabulary.
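
The arithmetic above can be reproduced with a small stand-alone sketch. Everything here is a hypothetical stand-in rather than the ANTLR source: memberAsPosted copies the test quoted above onto a std::vector<bool>, and makeVocabulary and the two widen helpers exist only for illustration. It assumes a 32-bit unsigned int and the usual two's-complement narrowing to signed char. Note that for a negative el the el < 0 clause already rejects the value before the cast is evaluated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vocabulary: bits first..last are set, like '\3'..'\377'.
std::vector<bool> makeVocabulary(int first, int last) {
    assert(0 <= first && first <= last);
    std::vector<bool> storage(static_cast<std::size_t>(last) + 1, false);
    for (int i = first; i <= last; ++i)
        storage[static_cast<std::size_t>(i)] = true;
    return storage;
}

// The membership test as quoted in the post, applied to a plain vector.
bool memberAsPosted(int el, const std::vector<bool>& storage) {
    if (el < 0 || static_cast<unsigned int>(el) >= storage.size())
        return false;
    return storage[static_cast<std::size_t>(el)];
}

// Widen a byte the way a signed char is widened to int (sign extension).
int widenAsSignedChar(unsigned char byte) {
    return static_cast<signed char>(byte);
}

// Widen the same byte without sign extension.
int widenAsUnsignedChar(unsigned char byte) {
    return byte;
}
```

With the vocabulary from the example, the byte 0x95 is rejected when it arrives as -107 and accepted when it arrives as 149.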
 
When BitSet::member is changed to look like
 
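bool BitSet::member(int el) const
{
   // Reinterpret el as an unsigned byte, so -107 maps back to 149.
   unsigned char tempEL = static_cast<unsigned char>(el);
   // tempEL can never be negative, so only the upper bound is checked.
   if ( static_cast<unsigned int>(tempEL) >= storage.size())
            return false;
 
   return storage[tempEL];
}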
 
Everything works OK.  However, this approach will not support Unicode,
because the cast throws away everything above the low 8 bits of el.
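
One possible alternative, offered only as a sketch and not as the ANTLR fix, is to leave BitSet::member alone and remove the sign extension where the character is read. Values above 255 would then still reach member() unchanged. The helper names toCharCode, nextCharCode and firstCharCode are hypothetical. The sketch relies on the standard behaviour that the no-argument std::istream::get() returns an int_type that is non-negative for every real character.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: widen a char to int without sign extension.
int toCharCode(char c) {
    return static_cast<unsigned char>(c);
}

// Hypothetical helper: read one character code from a stream.
// The no-argument get() already yields 0..255 for char streams;
// end of input is reported here as -1.
int nextCharCode(std::istream& in) {
    typedef std::istream::traits_type traits;
    traits::int_type c = in.get();
    if (traits::eq_int_type(c, traits::eof()))
        return -1;
    return static_cast<int>(c);
}

// Convenience wrapper for trying the above on an in-memory string.
int firstCharCode(const std::string& text) {
    std::istringstream in(text);
    return nextCharCode(in);
}
```

With either helper, the byte 0x95 reaches the vocabulary test as 149 rather than -107.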
 
 
 
Trey Spiva
Senior Software Engineer
trey.spiva at embarcadero.com
 