Katherine,
Thanks for the suggestion. The AI metadata assistant might be useful for books, but the material we are ingesting is mostly early-education "kits", i.e., custom creations of objects related to an early educational development concept, combined into a small box. Some contain a book, but also toys, games, and other items for manipulation and stimulation of children's brains. Definitely NOT standard material.
Thanks,
Erich
On Monday, July 28, 2025 at 04:23, Katherine O'Brien eloquently inscribed:
> Likely something you've already thought of and ruled out, but it looks
> like you're on Alma/Primo so could you try using Ex Libris's AI Metadata
> Assistant<https://knowledge.exlibrisgroup.com/Alma/Product_Documentation/010Alma_Online_Help_(English)/Metadata_Management/005Introduction_to_Metadata_Management/The_AI_Metadata_Assistant_in_the_Metadata_Editor>?
>
> Katherine
>
> If using assistive software, message ends here. Email signature below.
>
> Katherine O’Brien (she/her<https://pronouns.org/what-and-why>) |
> Application Administrator, Online Services University Library |
> Fremantle Campus - located on Nyungar land The University of Notre Dame
> Australia T: +61 8 9433 0703 |
> [log in to unmask]<mailto:[log in to unmask]>
>
> I respect and acknowledge the Traditional owners of the land on which I live
> and work as the First People and Custodians of this country.
>
> ________________________________
> From: Code for Libraries <[log in to unmask]> on behalf of Hammer,
> Erich F <[log in to unmask]>
> Sent: 22 July 2025 10:46 PM
> To: [log in to unmask] <[log in to unmask]>
> Subject: Re: [CODE4LIB] Converting image of MARC to text MARC?
>
> It's a not-very-interesting story of disorganization, poor communication, too
> few employees and a touch of corporate greed:
>
> A nearby, small college shuttered. Our University decided to try to scoop up
> the well-regarded early-education program and snag the former library's
> unique collection of educational "kits". The former site was scheduled for
> deletion in short order, and ExLibris essentially tried to extort us for a
> ridiculously astronomical amount to give us the records. Nobody thought to
> ask our sole developer (who may have been able to scrape the records in a
> useable format) until they had just left for a 3-month parental leave, so
> someone assigned a student to manually bring up all the records to capture
> the information. Their solution was to generate PDFs of every page. The site
> and data are no more at this point, so we have what we have.
>
> The PDFs were generated with text, not OCR'd (as I originally
> suggested), so the text is accurate. However, the strings are broken
> up, and of course, PDF readers don't know how the text "fits" together.
> Thus, selected text is recognized in columns, but not of the same length
> due to wrapping. It's a mess.
>
> Erich
>
> On Monday, July 21, 2025 at 21:48, Kyle Banerjee eloquently inscribed:
>
>> On Mon, Jul 21, 2025 at 12:20 PM Hammer, Erich F <[log in to unmask]>
>> wrote:
>>
>>> Without going into details, we inherited a sizeable collection of physical
>>> materials from another library, and were only able to capture the unique
>>> MARC records in image (PDF) form.
>>
>> The details provide the parameters for the easiest/best methods (and
>> it's hard to imagine there's not a good story behind getting stuck with
>> images of records without actually having records). I assume there's a
>> reason you don't just do the conversion in Acrobat or use one of the
>> many utilities or services.
>>
>> A true OCR process is likely to be error-prone; I'd be concerned about
>> positional data and encoding issues even if the other stuff is right.
>> Parsing for identifiers and downloading actual MARC records might prove
>> faster and more reliable if these aren't local only.
>>
>> kyle
>