I recently watched a recorded AI4LAM Community Call on RAG (retrieval-augmented generation) in libraries, and I cannot recommend it highly enough. It was excellent. [1]
The first presentation was on WARC-GPT where folks at Harvard took their WARC files, parsed out underlying text, vectorized ("indexed") the text, and provided a chat interface to the result. [2] In other words, they had a collection of collections, and they provided AI-augmented services against them to support learning. Kudos to the presenters (Matteo Cargnelutti and Kristi Mukk) as well as the Library Innovation Lab where they work. Three points I believe were particularly salient:
1. librarians and engineers must work together
2. think like a librarian (metacognition): reflect, refine, repeat
3. the librarianship of AI: the study of models, their implementation, usage, and behavior as a way of helping users make informed decisions and empowering them to use AI responsibly
The second presentation, by Daniel Hutchinson of Belmont Abbey College, described Nicolay, a RAG system using the writings of and about Abraham Lincoln as the underlying content. [3] Lincoln's writings were amassed, augmented with metadata, and modeled in a number of different ways. When a question is given, relevant documents are identified, and the results are reformulated as a response. This presentation outlined very well many of the concerns I hear, both here and from my colleagues, about the utility of generative AI.
The third presentation was by Adam Faci and Antoine Silvestre de Sacy of Le Huma-Num Lab. [4] They described how they wanted to get more meaningful results from bibliographic search. Their solution was to index documents in the traditional way, search, identify author and paper relationships, detect communities, extract paper semantics, and use all of these things as the input to a RAG interface. The attached image illustrates their overall approach. A succinct take-away from their presentation is, "RAG should be used on specific corpuses."
Unlike my toy implementations of a month ago, all of these implementations employed robust RAG pipelines, thus making their results much more useful.
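For readers new to the pattern, the retrieval step shared by all three systems can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the code of any of the presenters: the function names, the three sample documents, and the prompt wording are all hypothetical, simple word counts stand in for the learned embeddings a robust pipeline would use, and the final call to a language model is left out.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words vector (a stand-in for real embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    q = vectorize(question)
    ranked = sorted(documents, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, passages):
    """Combine retrieved passages and the question into one prompt string."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical sample collection; a real system would index a whole corpus.
documents = [
    "Web archives are stored as WARC files.",
    "Abraham Lincoln delivered the Gettysburg Address in 1863.",
    "Community detection groups related authors and papers.",
]

question = "What did Lincoln deliver in 1863?"
passages = retrieve(question, documents, k=1)
prompt = build_prompt(question, passages)
print(prompt)  # this string is what would be sent to a language model
```

The point of the sketch is the shape of the process: vectorize the collection, find the passages closest to the question, and hand only those passages to the model. The systems described above add a great deal on top of this, such as metadata, multiple models, and relationship graphs.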
Again, I was very impressed. The presentations demonstrated some things I believe will come to fruition in libraries. We have collections. We curate the collections. We then provide various services against these collections: search, consultation, exhibit, browse, borrow, read, digitize, print, etc. In the near future (say, a few years) I think we will want to provide an additional type of search -- RAG. RAG will not be a replacement for other services; instead, it will be a supplement. Moreover, I think we ought to learn how these technologies work before vendors begin to include them in their services.
Links
[1] recorded session - https://bit.ly/3R7XwA6
[2] WARC GPT - https://bit.ly/4c2IZOd
[3] Nicolay - https://bit.ly/4e4wRhe
[4] Le Huma-Num Lab - https://bit.ly/3yDVz86
--
Eric Lease Morgan
Navari Family Center for Digital Scholarship