On Thu, Mar 8, 2012 at 3:32 PM, Godmar Back wrote:
> One side comment here; while smart handling/automatic detection of
> encodings would be a nice feature to have, it would help if pymarc could
> operate in an 'agnostic', or 'raw' mode where it would simply preserve the
> encoding that's there after a record has been read when writing the record.
>
> [ Right now, pymarc does not have such a mode - if leader[9] == 'a', the
> data is unconditionally utf8 encoded on output as per mbklein's patch. ]
Please feel free to write a patch and submit a pull request if you're
able to contribute code to do this.
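
To make the request concrete, here is a minimal sketch of what a 'raw'
mode could look like. This is not pymarc's code: the function name
field_bytes_for_output and the raw flag are hypothetical, and the sketch
assumes field data is held as bytes exactly as read from the file. The
only detail taken from the thread is that leader[9] == 'a' marks a record
as UTF-8.

```python
def field_bytes_for_output(data: bytes, leader: str, raw: bool = False) -> bytes:
    """Hypothetical helper (not part of pymarc): choose the bytes to write."""
    if raw:
        # Raw mode: write back exactly what was read, whatever the encoding.
        return data
    if leader[9] == "a":
        # The leader claims UTF-8, so check that claim by decoding and
        # re-encoding. Mislabelled data raises UnicodeDecodeError here.
        return data.decode("utf-8").encode("utf-8")
    # Any other leader value: this sketch leaves the bytes untouched.
    return data


# A leader whose position 9 is 'a', paired with Latin-1 data (mislabelled).
LEADER = "00000cam a2200000 a 4500"
LATIN1_FIELD = "caf\u00e9".encode("latin-1")

# Raw mode preserves the original bytes instead of failing on them.
preserved = field_bytes_for_output(LATIN1_FIELD, LEADER, raw=True)
```

With raw=False the same call raises UnicodeDecodeError, which is the
situation a raw mode would avoid.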
On Thu, Mar 8, 2012 at 3:45 PM, Jonathan Rochkind wrote:
> The thing that's encoding as unicode on the way out? Instead of raising on
> an invalid char, it should have the option of silently eating it, replacing
> it with either empty string or the unicode "replacement character" ( "used
> to replace an incoming character whose value is unknown or unrepresentable
> in Unicode" [http://www.fileformat.info/info/unicode/char/fffd/index.htm] )
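There is a way to do this in Python: the encode() and decode() methods
take an errors argument, which can be set to 'ignore' or 'replace'
instead of the default 'strict'. See the discussion of the Unicode type
in [0].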
[0] http://docs.python.org/howto/unicode.html
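
As a small illustration of the error handlers described in [0], the
sketch below uses only built-in str and bytes methods in Python 3. The
sample values are made up for the example; nothing here is pymarc code.

```python
# A Latin-1 encoded byte string that is not valid UTF-8.
bad_utf8 = b"caf\xe9"

# 'replace' substitutes U+FFFD, the Unicode replacement character.
replaced = bad_utf8.decode("utf-8", errors="replace")

# 'ignore' silently drops the undecodable byte.
ignored = bad_utf8.decode("utf-8", errors="ignore")

# The same handlers work when encoding to a narrower character set.
text = "caf\u00e9"
ascii_replaced = text.encode("ascii", errors="replace")  # uses '?' as the stand-in
ascii_ignored = text.encode("ascii", errors="ignore")
```

The default handler, 'strict', raises UnicodeDecodeError or
UnicodeEncodeError instead.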
Mark