I have been playing with a new toy -- a question and answer system. [1, 2]

Here's how it works. Save a document as a plain text file. The document can be just about any coherent piece of prose: a job posting, a conference announcement, or a journal article, for example. Apply a pretrained machine learning model to the document, and the result is a list of questions. Feed the list of questions and the document to a second model, and get back a list of answers. The models are embedded and configurable in a couple of Python scripts, as the links below outline. Most of the models are available from Hugging Face, a repository of models. [3]
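For the curious, the two-step process can be sketched in a few lines of Python. This is only an outline of the idea, not my actual scripts: the function names are made up for illustration, the two models are passed in as plain functions, and the default model name in the optional helper (deepset/roberta-base-squad2) is an assumption on my part. The helper needs the transformers library and downloads the model the first time it is called.

```python
from typing import Callable, Dict, List

# Hypothetical shapes for the two models: the first turns a document into
# questions, and the second finds an answer to a question in the document.
QuestionGenerator = Callable[[str], List[str]]
AnswerExtractor = Callable[[str, str], Dict]


def question_and_answer(
    document: str,
    generate: QuestionGenerator,
    answer: AnswerExtractor,
) -> List[Dict]:
    """Generate questions from a document, then answer each from the same text."""
    results = []
    for question in generate(document):
        found = answer(question, document)
        results.append(
            {
                "question": question,
                "answer": found["answer"],
                # Character offsets of the answer, when the model supplies them.
                "start": found.get("start"),
                "end": found.get("end"),
            }
        )
    return results


def hugging_face_answerer(model_name: str = "deepset/roberta-base-squad2") -> AnswerExtractor:
    """Wrap a Hugging Face question-answering pipeline as an AnswerExtractor.

    The model name is an assumption; any extractive question-answering
    model from the repository ought to work in its place.
    """
    from transformers import pipeline  # imported here so the sketch loads without it

    reader = pipeline("question-answering", model=model_name)
    return lambda question, document: reader(question=question, context=document)
```

Passing the models in as functions keeps the outline independent of any particular model, so swapping one out is a one-line change.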

I applied my implementation to a message sent to our list earlier today, and a few of the more interesting questions and answers include:

  How much do participants travel stipends?

     answer: up to $1000
    context: rous support from the Mellon Foundation, participant
             travel stipends (up to $1000) are available to offset air
             and/or ground transportation, parking, 

  What date will we follow up with you if your application is accepted?

     answer: February 3, 2023
    context: application is accepted, we will follow up with you no
             later than February 3, 2023. For more details, including an
              agenda, see the Event Website <ht

  What is a publication medium that is both a primary source and a networked
  container of primary sources?

     answer: the web
    context: is both a primary source and a networked container of
             primary sources, the web presents challenges of scale and
             complexity for those that seek to int


The full list of about twenty questions and answers is attached.
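The "context" lines above are simply a window of text surrounding each answer, which is why they begin and end in mid-word. A rough sketch of that formatting follows. It assumes the answering model reports the character offsets where the answer starts and ends, and the function names, margin, and line width are my own choices for illustration, not what the Haystack tools do internally.

```python
import textwrap


def context_window(document: str, start: int, end: int, margin: int = 50) -> str:
    """Return the answer plus up to `margin` characters on either side of it."""
    left = max(0, start - margin)
    right = min(len(document), end + margin)
    # Collapse line breaks and runs of spaces so the snippet wraps cleanly.
    return " ".join(document[left:right].split())


def format_result(
    question: str,
    document: str,
    start: int,
    end: int,
    margin: int = 50,
    width: int = 60,
) -> str:
    """Lay out a question, its answer, and the surrounding context as plain text."""
    answer = document[start:end]
    snippet = context_window(document, start, end, margin)
    wrapped = textwrap.wrap(snippet, width=width) or [""]
    lines = [
        "  " + question,
        "",
        "     answer: " + answer,
        "    context: " + wrapped[0],
    ]
    # Continuation lines are indented to sit under the first context line.
    lines += [" " * 13 + line for line in wrapped[1:]]
    return "\n".join(lines)
```

Because the window is cut by character count rather than at word boundaries, the snippet will usually begin and end with a word fragment.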

I did the same sort of thing against chapters of Moby Dick, asking questions like "Who is Ahab?", "Where did they sail?", and "What is whaling?" The answers are oftentimes quite plausible.

This sort of system can be applied more broadly in Library Land. Students, researchers, and scholars are suffering from information overload; we all continue to drink from the proverbial firehose. Given something like the system outlined above, librarians and libraries can go beyond providing access to data, information, and knowledge. More specifically, we can support the process of using and understanding data, information, and knowledge.

Fun with digital scholarship?


[1] generate questions - https://haystack.deepset.ai/tutorials/13_question_generation
[2] answer questions - https://haystack.deepset.ai/tutorials/01_basic_qa_pipeline
[3] Hugging Face - https://huggingface.co/models

--
Eric Lease Morgan
Navari Family Center for Digital Scholarship
Hesburgh Libraries
University of Notre Dame

https://cds.library.nd.edu