On 11/20/13 11:40 AM, Scott Prater wrote:
> Not sure what the details of our issue was on Monday -- but we do have
> records that are supposedly encoded in UTF-8, but nonetheless contain
> invalid characters.

Oh, and I should clarify, in case you haven't figured it out already: if those are ISO 2709 binary records, you can tell the reader what to do with invalid bytes. This is already available in the current ruby-marc release:

    # raise an exception on invalid bytes:
    MARC::Reader.new("something.marc", :validate_encoding => true)

    # replace invalid bytes with the Unicode replacement character:
    MARC::Reader.new("something.marc", :invalid => :replace)

I would suggest one or the other. The default, which leaves bad bytes in your Ruby strings, is asking for trouble and probably isn't what you want; it was made the default only for backwards compatibility with older versions of ruby-marc. (See why I am reluctant to add another default that we think hardly anyone would actually want? :) )

Oh, and you may also want to specify the expected encoding explicitly, to avoid confusion:

    MARC::Reader.new("something.marc", :external_encoding => "UTF-8", :validate_encoding => true)

(This also works with any other encoding Ruby recognizes, for those with legacy, possibly international, data.)

This stuff is confusing to explain; there are so many permutations and combinations of circumstances involved. But I'll try to improve the ruby-marc docs on it as part of adding yet more options for MARC8 handling.
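To make the three behaviors concrete without needing the gem installed, here is a minimal sketch using only core Ruby String methods (`force_encoding`, `valid_encoding?`, and `scrub`, which needs Ruby 2.1+). The helper `handle_invalid_utf8` and its `:raise` / `:replace` / `:keep` modes are hypothetical names for illustration only; this is not ruby-marc's internal code, just an analogue of what `:validate_encoding => true`, `:invalid => :replace`, and the backwards-compatible default do to a string containing bad bytes.

```ruby
# Hypothetical helper, NOT part of ruby-marc: it mimics, with core Ruby
# String methods, the three ways of handling invalid bytes described above.
#
#   :raise   ~ :validate_encoding => true
#   :replace ~ :invalid => :replace
#   :keep    ~ the backwards-compatible default (bad bytes left in place)
#
# external_encoding loosely mirrors the :external_encoding option.
def handle_invalid_utf8(bytes, mode, external_encoding = Encoding::UTF_8)
  # Tag the raw bytes with the expected encoding; no transcoding happens here.
  str = bytes.dup.force_encoding(external_encoding)
  return str if str.valid_encoding?

  case mode
  when :raise
    raise EncodingError, "invalid byte sequence for #{external_encoding}"
  when :replace
    # With no argument, scrub uses U+FFFD for Unicode encodings ("?" otherwise).
    str.scrub
  else
    # Keep the invalid bytes; later string operations may blow up on them.
    str
  end
end

# "caf" followed by 0xE9: Latin-1 "e-acute", but not valid UTF-8.
raw = "caf\xE9".b

p handle_invalid_utf8(raw, :replace).codepoints.last.to_s(16) # prints "fffd"
p handle_invalid_utf8(raw, :keep).valid_encoding?             # prints false

begin
  handle_invalid_utf8(raw, :raise)
rescue EncodingError => e
  puts "raised: #{e.message}"
end

# Declaring the right legacy encoding makes the same bytes valid.
p handle_invalid_utf8(raw, :raise, Encoding::ISO_8859_1).valid_encoding? # prints true
```

The point of the last line is the same as the advice about `:external_encoding`: bytes are only "invalid" relative to the encoding you claim they are in, so stating it explicitly turns silent corruption into either a clear error or a visible replacement character.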