[antlr-interest] BitSet and big charVocabulary in C++

Vitaliy Akimov vitaliy.akimov at gmail.com
Fri Feb 16 01:44:49 PST 2007


Hi, I'm implementing a Unicode lexer with ANTLR v2.7 (C++ target), and
I've run into a problem with patterns that are translated to a BitSet.
With a big vocabulary, a noticeable amount of time is spent on BitSet
construction. Raising codeGenBitsetTestThreshold instead produces a
huge condition in the "if" statement (millions of symbols). Why doesn't
ANTLR generate conditions with the not operator (!) for simple
expressions such as "(~ ('a'| 'z'))"? And why does ANTLR copy the
generated bitset from an array of longs into a vector<bool>, which is
very time consuming? I think it would be more reasonable to keep a
reference to the packed array and unpack the bits in the match
function.
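
The two suggestions above can be sketched roughly as follows. This is
only an illustration, not code from the ANTLR runtime: the names
PackedBitSetView and notAOrZ are hypothetical, and the layout of 32 bits
per word (bit i of word w standing for character w*32 + i) is an
assumption that would have to match the generated table.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of the ANTLR 2.7 C++ runtime.
// It only refers to the generated packed table; nothing is copied
// into a vector<bool> at start-up.
struct PackedBitSetView {
    const std::uint32_t* words;  // packed table, assumed 32 bits per word
    std::size_t wordCount;       // number of words in the table

    // Unpack a single bit at match time.
    bool member(std::uint32_t c) const {
        std::size_t idx = c >> 5;            // c / 32 selects the word
        if (idx >= wordCount) return false;  // outside the table: not a member
        return ((words[idx] >> (c & 31u)) & 1u) != 0;  // c % 32 selects the bit
    }
};

// Hypothetical negated test for ~('a'|'z'): two comparisons and a "!",
// instead of a set covering the whole vocabulary.
inline bool notAOrZ(std::uint32_t c) {
    return !(c == 'a' || c == 'z');
}
```

With such a view the start-up cost no longer depends on the vocabulary
size, and each membership test is a shift and a mask.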

