On Thu, Mar 8, 2012 at 3:32 PM, Godmar Back wrote:
> One side comment here; while smart handling/automatic detection of
> encodings would be a nice feature to have, it would help if pymarc could
> operate in an 'agnostic', or 'raw' mode where it would simply preserve the
> encoding that's there after a record has been read when writing the record.
>
> [ Right now, pymarc does not have such a mode - if leader[9] == 'a', the
> data is unconditionally utf8 encoded on output as per mbklein's patch. ]

Please feel free to write a patch and submit a pull request if you're able
to contribute code to do this.

On Thu, Mar 8, 2012 at 3:45 PM, Jonathan Rochkind wrote:
> The thing that's encoding as unicode on the way out? Instead of raising on
> an invalid char, it should have the option of silently eating it, replacing
> it with either empty string or the unicode "replacement character" ( "used
> to replace an incoming character whose value is unknown or unrepresentable
> in Unicode" [http://www.fileformat.info/info/unicode/char/fffd/index.htm] )

There is a way to do this in Python; see the discussion of the Unicode type
in [0].

[0] http://docs.python.org/howto/unicode.html

Mark
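A minimal sketch of the Python mechanism referred to above, assuming Python 3 and UTF-8 field data: the `errors` argument of `bytes.decode()` (also accepted by `str.encode()`, and by Python 2's `unicode()` and `.decode()`) chooses what happens to bytes that cannot be decoded. `"replace"` substitutes U+FFFD, `"ignore"` silently drops them, and the default `"strict"` raises `UnicodeDecodeError`. The helper `decode_field` and the sample bytes are hypothetical and for illustration only; this is not pymarc's API.

```python
# Hypothetical helper illustrating Python's codec error handlers.
# Not part of pymarc; it only shows the standard `errors` argument.

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_field(raw: bytes, policy: str = "strict") -> str:
    """Decode raw field bytes as UTF-8.

    `policy` is passed straight through as the codec `errors` argument:
      "strict"  -> raise UnicodeDecodeError on invalid bytes (default)
      "replace" -> substitute U+FFFD for undecodable bytes
      "ignore"  -> silently drop undecodable bytes
    """
    return raw.decode("utf-8", errors=policy)


if __name__ == "__main__":
    # 0xE9 is Latin-1 'e acute'; followed by a space it is not valid UTF-8.
    bad = b"Caf\xe9 au lait"

    replaced = decode_field(bad, "replace")
    print(REPLACEMENT_CHAR in replaced)  # the bad byte became U+FFFD

    dropped = decode_field(bad, "ignore")
    print(dropped)  # the bad byte was removed entirely

    try:
        decode_field(bad)  # strict: the default behaviour
    except UnicodeDecodeError as exc:
        print("strict mode raised:", exc.reason)
```

Both non-strict handlers are lossy: once a byte has been replaced or dropped, the original value cannot be recovered from the decoded string.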