Gavin,

It looks like ExifTool is extracting the XML metadata but isn't translating it into text: 60 63 120 109 108 is the "<?xml" header, and I'm sure the rest of those values are the fulltext you're looking for. According to the FAQ (http://www.sno.phy.queensu.ca/~phil/exiftool/faq.html), the --charset exif=CHARSET option will tell it to convert to your character set of choice.

Regards,
Alex

On Tue, May 13, 2014 at 1:29 PM, Gavin Spomer <[log in to unmask]> wrote:
> Thanks for the suggestion. I have it downloaded and installed on my test
> server and have run it with various options on one of the tiff files. Even
> if this doesn't work for me, what a fantastic tool; I may have applications
> for it later. :)
>
> Can't seem to get the text out of the tiff file though. Here's what I was
> able to get:
>
> # exiftool -a -u 00000001.tif
> ExifTool Version Number         : 9.60
> File Name                       : 00000001.tif
> Directory                       : .
> File Size                       : 936 kB
> File Modification Date/Time     : 2013:07:17 15:59:31-07:00
> File Access Date/Time           : 2014:05:13 10:07:30-07:00
> File Inode Change Date/Time     : 2014:04:30 09:24:55-07:00
> File Permissions                : rw-r--r--
> File Type                       : TIFF
> MIME Type                       : image/tiff
> Exif Byte Order                 : Little-endian (Intel, II)
> Subfile Type                    : Full-resolution Image
> Image Width                     : 4802
> Image Height                    : 7189
> Bits Per Sample                 : 1
> Compression                     : T6/Group 4 Fax
> Photometric Interpretation      : WhiteIsZero
> Fill Order                      : Normal
> Document Name                   : The Observer
> Strip Offsets                   : (Binary data 195 bytes, use -b option to extract)
> Orientation                     : Horizontal (normal)
> Samples Per Pixel               : 1
> Rows Per Strip                  : 256
> Strip Byte Counts               : (Binary data 166 bytes, use -b option to extract)
> X Resolution                    : 300
> Y Resolution                    : 300
> Page Name                       : 1
> T6 Options                      : (none)
> Resolution Unit                 : inches
> Software                        : ResCarta SDK v3.1.6
> Modify Date                     : 2013:06:28 19:13:26
> Exif 0x1637                     : 60 63 120 109 108 32 118 101 114 115 105 111 110 61 34 [...]
> Exif 0x1638                     : 226 128 162 97 108 117 10 67 101 110 116 114 97 108 10 [...]
> Exif 0x1639                     : 133 156 203 142 38 57 110 133 247 245 44 57 64 232 70 7[...]
> Image Size                      : 4802x7189
>
> Not sure any of this helps me.
>
> - Gavin
>
> >>> "Reser, Gregory" <[log in to unmask]> 5/12/2014 3:30 PM >>>
> You might try http://www.sno.phy.queensu.ca/~phil/exiftool/, a Perl
> library to read and write embedded metadata.
>
> Greg Reser
> UC San Diego Library
> 9500 Gilman Drive, 0175K
> La Jolla, CA 92093-0175
>
> Phone: 858.246.0998
> Skype: gregreser
>
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Stuart Yeates
> Sent: Monday, May 12, 2014 3:26 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] Extracting Text From .tiff Files
>
> Your first step is to pin down the format. TIFF is a container format (like
> zip) and can contain pretty much anything. Likely candidates for your format
> include https://en.wikipedia.org/wiki/IPTC_Information_Interchange_Model and
> https://en.wikipedia.org/wiki/Extensible_Metadata_Platform
>
> Your second step is to find a library / tool for your platform that
> supports your format.
>
> Cheers
> stuart
>
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Gavin Spomer
> Sent: Tuesday, 13 May 2014 10:01 a.m.
> To: [log in to unmask]
> Subject: [CODE4LIB] Extracting Text From .tiff Files
>
> Hello folks,
>
> I'm in the process of migrating a student newspaper collection, currently
> implemented with ResCarta, into our new bepress institutional repository.
> ResCarta has each page of a newspaper stored as a tiff file. Not only does
> the tiff file contain the graphics data, but it has some metadata in xml
> format and the fulltext of the page. I know this because I opened up some
> of the tiffs with a plain-text editor (Vim).
>
> Although I can see the text in the file, I've only been about 90% accurate
> in extracting it with a script. Some of those "weird" characters seem to do
> some wonky things when doing file IO for some reason. Is there a more
> reliable way to extract text stored in a tiff file? I've Googled and
> Googled and have pulled up almost nothing. But there's got to be a way,
> since ResCarta stores it there and can extract it.
>
> Any ideas?
> Gavin Spomer
> Systems Programmer
> Brooks Library
> Central Washington University