[antlr-interest] XML Unicode char filtering

avoelker avoelker at yahoo.com
Fri Sep 6 17:19:18 PDT 2002


For a Unicode lexer, what is the best way to filter out a complex set
of character ranges?

I suppose my choices are:
1. Define a huge set of ranges in the CharVocabulary option.
2. Use a complex filter rule.

I'm filtering out all invalid Unicode characters, as defined by:
http://www.w3.org/TR/REC-xml#CharClasses
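The range test itself is independent of ANTLR, so here is a minimal sketch of it in Python (not ANTLR grammar syntax). It assumes the characters to keep are exactly those of the XML 1.0 `Char` production (section 2.2 of the same recommendation), which may not be the exact set intended here. The idea is to keep the allowed code points as a sorted table of inclusive ranges and test each character with a binary search, so the cost per character stays O(log n) however many ranges there are. `XML_CHAR_RANGES`, `is_xml_char`, and `filter_xml_chars` are hypothetical names used only for illustration.

```python
import bisect

# Assumption: "valid" means the XML 1.0 Char production (section 2.2):
#   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Ranges are inclusive, sorted by start, and non-overlapping.
XML_CHAR_RANGES = [
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]

# Start point of each range, used as the key list for binary search.
_STARTS = [lo for lo, _ in XML_CHAR_RANGES]


def is_xml_char(cp: int) -> bool:
    """Return True if code point cp lies inside one of the allowed ranges."""
    # Index of the last range whose start is <= cp, or -1 if cp precedes them all.
    i = bisect.bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= XML_CHAR_RANGES[i][1]


def filter_xml_chars(text: str) -> str:
    """Drop every character whose code point is not an XML 1.0 Char."""
    return "".join(ch for ch in text if is_xml_char(ord(ch)))
```

For example, `filter_xml_chars("a\x00b\x0bc\ufffe")` drops NUL, vertical tab, and U+FFFE and keeps `"abc"`. Keeping the ranges as data rather than as nested conditionals means the same table could, in principle, be transcribed into either a vocabulary set or a filter rule.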

Currently, I'm using the second choice, a filter rule. For the first 
choice, it would be great if the CharVocabulary option could take a 
lexer rule.

Thanks,
Andrew


 
