LISTSERV 16.5 - CODE4LIB Archives

On Feb 26, 2024, at 4:05 PM, Eric Lease Morgan <[log in to unmask]> wrote:

> Who out here in Code4Lib Land is practicing with either one or both of the following things: 1) fine-tuning large-language models, or 2) retrieval-augmented generation (RAG). If there is somebody out there, then I'd love to chat...


Many things.

  1. First of all, I'm happy there were a number of different replies. I learned something.

  2. Second, I believe the phrase "artificial intelligence" (AI) is a poor choice of words if not a misnomer. What is "intelligence" anyway, and why should I give any credence to fake intelligence? Is the ability to do mathematics very quickly intelligent? Is the ability to store and retrieve vast amounts of information intelligent? I say not, but some people call such things "smart". AI has ebbed & flowed over the course of computing history. In the 1990s AI was implemented as "expert systems". We are experiencing a flow.

  3. Third, computer technology evolves. Think of all the computer technology evolutions libraries have experienced. Cards to MARC. MARC to OPAC. Print indexes to indexes on CD-ROMs. Field searching to free text searching with relevancy ranking. Every time these things happen, some blindly embrace the evolution, some are skeptical, and some believe the evolution is a fad. This is natural. Generative AI is just another example; the current flavor of AI is a mash-up of natural language processing, image processing, and data science all on steroids.

  4. Fourth, with the incarnation of generative AI, for the first time in my life, I feel threatened by a computer. A computer can do some of my job. It can write software. It can summarize text. It can classify text. It can create MARC records. Yikes!?

  5. Fifth, I found a few places to discuss AI in libraries. First of all there is the AI4LAM Slack channel, and there are a couple of similar sub-channels in the Code4Lib Slack (#ai-dl-ml and #generative-ai).

  6. Sixth, a few projects were brought to my attention, and of particular interest to me were WARC-GPT, Talpa, and Daybooks of Susan B. Anthony. [1, 2, 3] In each of these cases the developers: 1) had a collection, 2) used large-language model technology to index/analyze the content, 3) provided a mechanism to query the collection/analysis, and 4) returned a useful result.
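
To make the four steps in item 6 concrete, here is a minimal sketch in Python. It is not how WARC-GPT, Talpa, or the Daybooks project are actually implemented. Everything in it is an assumption made for illustration: the three sample documents are invented, the function names are hypothetical, and a simple word-count vector with cosine similarity stands in for the embeddings a real large-language model pipeline would use. The last function only assembles a prompt; sending it to a model is left out.

```python
import math
import re
from collections import Counter

# Step 1: a collection. These documents are invented for illustration.
COLLECTION = {
    "doc1": "Susan B. Anthony wrote daily notes about suffrage meetings.",
    "doc2": "Web archives store captured pages in WARC files.",
    "doc3": "Library catalogs describe books with MARC records.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text):
    """Word-count vector; a stand-in (assumption) for a model embedding."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(collection):
    """Step 2: index the content, one vector per document."""
    return {doc_id: vectorize(text) for doc_id, text in collection.items()}


def search(index, query, top_k=1):
    """Step 3: query the index; return ids of the best-matching documents."""
    query_vector = vectorize(query)
    scored = sorted(
        ((cosine(query_vector, vec), doc_id) for doc_id, vec in index.items()),
        reverse=True,
    )
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


def build_prompt(collection, doc_ids, question):
    """Step 4: combine retrieved text and the question into one prompt."""
    context = "\n".join(collection[doc_id] for doc_id in doc_ids)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    index = build_index(COLLECTION)
    hits = search(index, "What are MARC records?")
    print(build_prompt(COLLECTION, hits, "What are MARC records?"))
```

The retrieval step is the part that makes this "retrieval-augmented": the model is asked to answer from the retrieved text rather than from whatever it absorbed during training. Swapping the word counts for real embeddings changes the quality of the matches, not the shape of the process.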

Finally, I see generative AI as a tool, and just like any other tool -- a hammer, for example -- one needs to practice in order to use it effectively. My toolbox is getting bigger.


Links

[1] WARC-GPT - https://github.com/harvard-lil/warc-gpt
[2] Talpa - https://www.talpa.ai/
[3] Daybooks of Susan B. Anthony - https://thisismattmiller.com/post/using-gpt-on-library-collections/

--
Eric Morgan <[log in to unmask]>
Navari Family Center for Digital Scholarship
Hesburgh Libraries
University of Notre Dame