Hi Terry,

On Thu, Mar 8, 2012 at 2:36 PM, Reese, Terry
<[log in to unmask]> wrote:
> This is one of the reasons you really can't trust the information found in leader position 9.  It is also why, when I wrote MarcEdit, I used a mixed process for working with data and determining the character set -- a process that reads this byte and takes the information under advisement, but in the end treats it more as a suggestion and one part of a larger heuristic analysis of the record data to determine whether the information is in UTF8 or not.  Fortunately, determining whether a set of data is in UTF8 or something else is a fairly easy process.  Determining the something else is much more difficult, but generally not necessary.

Can you describe in a bit more detail how MarcEdit sniffs the record
to determine the encoding? This has come up enough times w/ pymarc to
make it worth implementing.
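
For pymarc I was picturing something roughly like the sketch below. To be
clear, this is only my guess at the general shape of the idea, not how
MarcEdit actually does it: guess_is_utf8 is a made-up name (it is not in
pymarc), and the rule "trust a strict UTF-8 decode first, fall back to
leader position 9 only when the record is pure ASCII" is my assumption.

```python
def guess_is_utf8(raw: bytes) -> bool:
    """Guess whether the bytes of one raw MARC record are UTF-8.

    Hypothetical helper for illustration only; not part of pymarc and
    not MarcEdit's actual algorithm.
    """
    # Leader position 9: 'a' means UCS/Unicode, blank means MARC-8.
    # Treated here as a hint, not as the final word.
    leader_says_utf8 = raw[9:10] == b"a"

    try:
        # A strict decode rejects byte sequences that are not well-formed UTF-8.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False

    # Valid UTF-8 that contains non-ASCII bytes is strong evidence on its own.
    if any(byte > 0x7F for byte in raw):
        return True

    # Pure ASCII is valid in both encodings, so fall back to the leader hint.
    return leader_says_utf8
```

The point of the ordering is that a record whose leader says MARC-8 but
whose data decodes cleanly as multi-byte UTF-8 still gets treated as UTF-8,
and one whose leader claims Unicode but fails the decode does not.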

//Ed