[antlr-interest] <feff> ??

ian eyberg ian at telematter.com
Tue May 5 10:03:06 PDT 2009


Hi,

  Someone has sent me a file to parse, and there are all sorts of
'<feff>' characters in it in arbitrary spots. Looking it up
online, it appears to be a character that indicates what
encoding the text is in: U+FEFF, the byte order mark (BOM).

  My question: what should I do with these? Should I accept that
some files are going to have them and convert them to spaces in a
pre-processing step, or should I take the easy way out and say
"we don't support this" ;)
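A minimal sketch of the pre-processing option, written in Python as an assumption (nothing here comes from the original post). It assumes the file has already been decoded correctly, so each stray mark shows up as the single character U+FEFF; the name `strip_boms` is hypothetical. Dropping the character is the default because, anywhere other than the start of a text, U+FEFF means ZERO WIDTH NO-BREAK SPACE and carries no visible content; passing `" "` gives the convert-to-spaces behaviour described above.

```python
# Hypothetical pre-processing step: run this over the decoded text
# before it reaches the lexer. U+FEFF at the very start of a file is a
# byte order mark; anywhere else it is a ZERO WIDTH NO-BREAK SPACE,
# which most grammars will not expect.
BOM = "\ufeff"


def strip_boms(text: str, replacement: str = "") -> str:
    """Replace every U+FEFF in an already-decoded string.

    With the default replacement the characters are simply dropped;
    pass " " to turn them into spaces instead, so that a grammar which
    already skips whitespace will ignore them.
    """
    return text.replace(BOM, replacement)


if __name__ == "__main__":
    sample = "\ufeffheader\ufeff line\ufeff"
    print(repr(strip_boms(sample)))       # 'header line'
    print(repr(strip_boms(sample, " ")))  # ' header  line '
```

Converting to spaces is the safer choice if a mark can sit between two tokens that must stay separate; dropping it is safer if a mark can land inside a single token.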

  The person who handed me the file says he never opened it in a text
editor; it was produced by a piece of software on an OS X box.

  Maybe if I detect a BOM in one of my documents, I can convert the
entire file to the appropriate encoding first?
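That idea could be sketched as follows, again in Python and again only as an illustration under stated assumptions: read the raw bytes, look for a BOM at the very start, decode with the matching encoding, then drop any U+FEFF still left inside the text. Several BOM-prefixed files glued together would produce marks in arbitrary spots, though that is only a guess about this particular file. The name `decode_with_bom` and the UTF-8 fallback for files without a BOM are hypothetical choices; the BOM byte sequences come from Python's standard `codecs` module.

```python
import codecs

# Longest BOMs first: the UTF-32-LE mark (FF FE 00 00) begins with the
# UTF-16-LE mark (FF FE), so checking UTF-16 first would misdetect it.
# (A UTF-16-LE file whose first real character is U+0000 would still be
# mistaken for UTF-32-LE; that edge case is ignored in this sketch.)
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode raw file bytes, using a leading BOM to pick the encoding.

    The leading BOM is removed, and any U+FEFF left further in is
    dropped as well. Files without a BOM are decoded with `default`
    (an assumption: UTF-8 unless told otherwise).
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            text = data[len(bom):].decode(encoding)
            break
    else:
        text = data.decode(default)
    # Stray marks in the middle of the text are not BOMs; drop them.
    return text.replace("\ufeff", "")
```

Reading the file in binary mode and passing the bytes to `decode_with_bom` yields a plain string that could then be handed to the lexer through a string-based input stream (in the ANTLR 3 runtime, `ANTLRStringStream`) instead of letting the runtime decode the file itself.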

thanks,

-- 
ian eyberg

