I'm not quite convinced that it's MARC-8 just because there's \xC2 ;).
Looking at a hex dump, I'm seeing a lot of what might be combining
characters.  The leader appears to have 'a' in the field that indicates
Unicode.  In the raw hex I'm seeing a lot of multi-byte sequences
like: 756c 69c3 83c2 a872 (culi....r).  If I knew my UTF-8 better, I
could guess what combining diacritics these are.  A quick lookup seems
to indicate that this might be UTF-8.
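For what it's worth, those bytes decode directly, and the pattern looks less like combining characters and more like double-encoded UTF-8: C3 A8 is UTF-8 for è, and re-encoding those two bytes as if they were Latin-1 gives exactly C3 83 C2 A8.  A minimal Python check (hex string copied from the dump above):

```python
# Bytes copied from the hex dump above: 756c 69c3 83c2 a872
data = bytes.fromhex("756c69c383c2a872")

# Decodes cleanly as UTF-8, but to "uliÃ¨r" -- the telltale "Ã¨" of
# UTF-8 that has been put through a Latin-1 -> UTF-8 conversion twice.
once = data.decode("utf-8")
print(once)   # uliÃ¨r

# Undo one layer: re-encode as Latin-1, then decode as UTF-8 again.
fixed = once.encode("latin-1").decode("utf-8")
print(fixed)  # ulièr
```

If that round-trip cleans up the rest of the file too, the records are double-encoded UTF-8 rather than MARC-8.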

When debugging any encoding issue, it's always good to know:

a) how the records were obtained
b) how they have been manipulated before you touched them (basically,
how many times they may have been converted by some bungling process)
c) what encoding they claim to be now
d) what encoding they actually are, if any
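For (c), a MARC 21 record states its claimed encoding in leader position 09: 'a' means UCS/Unicode, a blank means MARC-8.  A quick sketch to pull that out of a raw record (the function name and the sample leader are mine):

```python
def claimed_encoding(record: bytes) -> str:
    """Report what a raw MARC 21 record claims its encoding is.

    Leader position 09 is the character coding scheme:
    'a' = UCS/Unicode, blank = MARC-8.
    """
    flag = record[9:10]
    if flag == b"a":
        return "UTF-8"
    if flag == b" ":
        return "MARC-8"
    return "unknown (%r)" % flag

# Example with a synthetic leader:
print(claimed_encoding(b"00714cam a2200205 a 4500"))  # UTF-8
```

Of course, (c) and (d) disagreeing is exactly the situation you seem to be in.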

It's been a while since I used MARC::Batch.  Is there any reason
you're using that instead of just using MARC::Record?  I'd try just
creating a MARC::Record object.

I've seen people do really bizarre things to break MARC files, such as
editing the raw binary, thus invalidating the leader and the directory
(since the byte counts were no longer right).
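That kind of damage is cheap to detect, since leader positions 00-04 carry the record length and 12-16 the base address of data; a hand-edited record usually fails simple arithmetic.  A rough sketch (positions per MARC 21; the function is mine and is not exhaustive validation):

```python
def byte_counts_ok(record: bytes) -> bool:
    """Check the byte counts a raw-binary edit usually breaks:
    leader 00-04 (record length) and 12-16 (base address of data)."""
    if len(record) < 24 or not record.endswith(b"\x1d"):
        return False  # too short, or missing the record terminator
    try:
        declared_len = int(record[0:5])
        base_address = int(record[12:17])
    except ValueError:
        return False  # the counts aren't even digits any more
    # The directory runs from byte 24 up to a field terminator (0x1E)
    # immediately before the base address of data.
    return (declared_len == len(record)
            and record[base_address - 1:base_address] == b"\x1e")
```

Records that fail this check will confuse MARC::Record anyway, so it's worth running before blaming the encoding.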

I hate to say it, but we still come across files that are no longer in
any encoding due to too many bad conversions.  It's possible these are
as well.
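One cheap way to triage a batch like that is a strict per-record decode: records that have been through one bad conversion too many typically won't decode cleanly as UTF-8 at all.  A sketch (function name and sample bytes are mine):

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """True if `data` decodes in `encoding` without any errors."""
    try:
        data.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False

# A mixed batch shows up as some records passing and some failing:
records = [b"caf\xc3\xa9",  # UTF-8 "café"
           b"caf\xe9"]      # Latin-1 "café" -- not valid UTF-8
print([decodes_as(r, "utf-8") for r in records])  # [True, False]
```

It's no substitute for a real guesser like enca, but it splits the file into "plausibly UTF-8" and "something else happened here" quickly.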

The enca tool (I haven't used it much) guesses this is UTF-8 mixed
with "non-text data".