Hi Terry,
On Thu, Mar 8, 2012 at 2:36 PM, Reese, Terry wrote:
> This is one of the reasons you really can't trust the information found in position 9. It is also why, when I wrote MarcEdit, I used a mixed process for working with data and determining the character set -- a process that reads this byte and takes the information under advisement, but in the end treats it as a suggestion and as one part of a larger heuristic analysis of the record data to determine whether the information is in UTF-8 or not. Fortunately, determining whether a set of data is in UTF-8 or something else is a fairly easy process. Determining the something else is much more difficult, but generally not necessary.
Can you describe in a bit more detail how MarcEdit sniffs the record
to determine the encoding? This has come up enough times with pymarc to
make it worth implementing.
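
To make the question concrete, here is a rough sketch of what I imagine
such a check might look like. It is only my guess at the general idea,
not MarcEdit's actual logic, and sniff_marc_encoding is a made-up name,
not anything in pymarc today. It assumes the raw bytes of a single
MARC21 record, with the leader in the first 24 bytes:

```python
def sniff_marc_encoding(record_bytes: bytes) -> str:
    """Guess the encoding of one raw MARC21 record.

    Hypothetical helper for illustration; not part of pymarc.
    Returns "utf-8", "not-utf-8", or "ascii".
    """
    # Leader position 9 (0-based): 'a' means UCS/Unicode, blank means MARC-8.
    # Treat it as a hint only, since it is often wrong in practice.
    leader_says_utf8 = record_bytes[9:10] == b"a"

    # The easy part: does the whole record decode cleanly as UTF-8?
    try:
        record_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # Could be MARC-8, Latin-1, or something else; we don't try to say which.
        return "not-utf-8"

    # Valid UTF-8 that contains non-ASCII bytes is very unlikely to be
    # anything other than UTF-8, whatever the leader claims.
    if any(b > 0x7F for b in record_bytes):
        return "utf-8"

    # Pure ASCII is valid under both readings, so fall back on the leader.
    # Simplification: this ignores MARC-8 escape sequences (ESC, 0x1B).
    return "utf-8" if leader_says_utf8 else "ascii"
```

The point of ordering it this way is that the leader byte only breaks
ties; the bytes themselves get the final say.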
//Ed