[antlr-interest] Bug with C++ charVocabulary option
Trey Spiva
Trey.Spiva at embarcadero.com
Thu May 30 15:50:26 PDT 2002
When the C++ istream::get method is used to read a character into a char,
the value has type char. On many compilers plain char is a signed type.
Therefore, the range of a char is -128...127. When BitSet::member(int el)
is called, it uses the test:
if ( el < 0 || static_cast<unsigned int>(el) >= storage.size())
to test if a character is in the charVocabulary. When a character value is
greater than 127 it will be a negative number, so the el < 0 check already
rejects it. Even without that check, the static_cast<unsigned int> would
return a really large number, and the result of the cast would be larger
than the storage size.
Example:
charVocabulary='\3'...'\377'
el = -107 (0x95) [unsigned value = 149]
el fits in the range of the vocabulary. When el is cast to unsigned int
(32 bits) the value becomes 4294967189 and fails the test. So
BitSet::member reports that the character is not in the charVocabulary.
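The arithmetic in the example above can be checked with a small sketch.
The helper names below are hypothetical and are not part of ANTLR; the
sketch assumes an 8-bit char and models the signed widening explicitly so
that it does not depend on whether the compiler's plain char is signed.

```cpp
#include <cstdint>

// Hypothetical helper: the int you get when a byte held in a *signed*
// 8-bit char is widened to int (0x95 becomes -107).
int widen_signed(unsigned char byte) {
    return byte >= 128 ? static_cast<int>(byte) - 256
                       : static_cast<int>(byte);
}

// Hypothetical helper: the int you get when the same byte is widened
// through unsigned char instead (0x95 stays 149).
int widen_unsigned(unsigned char byte) {
    return static_cast<int>(byte);
}

// The value an int takes when converted to a 32-bit unsigned type.
// The conversion is modular, so a negative input wraps around.
std::uint32_t as_unsigned32(int el) {
    return static_cast<std::uint32_t>(el);
}
```

With these, widen_signed(0x95) is -107, as_unsigned32(-107) is 4294967189,
and widen_unsigned(0x95) is 149, which is inside '\3'...'\377'.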
When BitSet::member is changed to look like

bool BitSet::member(int el) const
{
    // Reinterpret the value as an unsigned byte (0...255).
    unsigned char tempEL = (unsigned char)el;
    // tempEL can never be negative, so only the upper bound is checked.
    if ( static_cast<unsigned int>(tempEL) >= storage.size())
        return false;
    return storage[tempEL];
}
Everything works OK. However, this approach will not support Unicode,
because the cast truncates any value above 255.
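One possible alternative, shown here only as a sketch and not as ANTLR's
actual fix, is to leave the membership test alone and convert the char to
its 0...255 code before it is widened to int. SketchBitSet and
char_to_code are hypothetical names invented for this illustration; the
class is a minimal stand-in, not the real BitSet.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for BitSet with only what the example needs.
class SketchBitSet {
public:
    explicit SketchBitSet(std::size_t nbits) : storage(nbits, false) {}

    // Mark el as a member; out-of-range values are ignored.
    void add(int el) {
        if (el >= 0 && static_cast<std::size_t>(el) < storage.size())
            storage[static_cast<std::size_t>(el)] = true;
    }

    // Same test as in the original member(): no truncation, so values
    // above 255 are still looked up at their own index.
    bool member(int el) const {
        if (el < 0 || static_cast<std::size_t>(el) >= storage.size())
            return false;
        return storage[static_cast<std::size_t>(el)];
    }

private:
    std::vector<bool> storage;
};

// Hypothetical helper: map a narrow char to its code in 0...255 before
// widening, so bytes above 127 never become negative ints.
inline int char_to_code(char c) {
    return static_cast<unsigned char>(c);
}
```

Under these assumptions, a set holding 3...255 reports
member(char_to_code(c)) as true for the byte 0x95, while member(-107)
still returns false, and a larger set can still answer for values such as
0x416.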
Trey Spiva
Senior Software Engineer
trey.spiva at embarcadero.com