I have been playing with a new toy -- a question and answer system. [1, 2]
Here's how it works. Save a document as a plain text file. The document can be just about any self-contained piece of prose, such as a job posting, a conference announcement, or a journal article. Apply a previously trained machine learning model to the document, and the result is a list of questions. Feed the list of questions and the document to another model, and get back a list of answers. These models are embedded and configurable in a couple of Python scripts, as the links below outline. Most of the models are available from a repository of models called Hugging Face. [3]
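The two-step flow described above can be sketched in Python. This is only an illustration under stated assumptions, not the scripts themselves: the names GenerateFn, AnswerFn, Result, context_window, question_and_answer, and report are all hypothetical, and the two models are passed in as plain functions so the sketch runs without downloading anything. The 50-character margin used for the context snippet is likewise a guess.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not a real library's API):
# a generator maps a document to a list of questions; an answerer maps
# (question, document) to (answer text, character offset of the answer).
GenerateFn = Callable[[str], List[str]]
AnswerFn = Callable[[str, str], Tuple[str, int]]


@dataclass
class Result:
    question: str
    answer: str
    context: str


def context_window(document: str, start: int, length: int, margin: int = 50) -> str:
    """Return the answer span plus up to `margin` characters on each side."""
    lo = max(0, start - margin)
    hi = min(len(document), start + length + margin)
    # Collapse line breaks and runs of spaces so the snippet prints on one line.
    return " ".join(document[lo:hi].split())


def question_and_answer(document: str, generate: GenerateFn,
                        answer: AnswerFn, margin: int = 50) -> List[Result]:
    """Step 1: generate questions. Step 2: answer each one from the document."""
    results = []
    for question in generate(document):
        text, start = answer(question, document)
        results.append(Result(question, text,
                              context_window(document, start, len(text), margin)))
    return results


def report(results: List[Result]) -> str:
    """Format results as question / answer / context, separated by blank lines."""
    blocks = [f"{r.question}\nanswer: {r.answer}\ncontext: {r.context}"
              for r in results]
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Stand-ins for the real models, for demonstration only.
    document = "The workshop will be held in Pittsburgh, PA. Space is limited."

    def fake_generate(doc: str) -> List[str]:
        return ["Where is the workshop held?"]

    def fake_answer(question: str, doc: str) -> Tuple[str, int]:
        text = "Pittsburgh, PA"
        return text, doc.index(text)

    print(report(question_and_answer(document, fake_generate, fake_answer)))
```

In practice, the two stand-in functions would be replaced by thin wrappers around a question-generation model and a question-answering model, such as those configured in the tutorials linked at [1] and [2].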
I applied my implementation to a message sent to our list earlier today, and a few of the more interesting questions and answers include:
How much do participants travel stipends?
answer: up to $1000
context: rous support from the Mellon Foundation, participant
travel stipends (up to $1000) are available to offset air
and/or ground transportation, parking,
What date will we follow up with you if your application is accepted?
answer: February 3, 2023
context: application is accepted, we will follow up with you no
later than February 3, 2023. For more details, including an
agenda, see the Event Website <ht
What is a publication medium that is both a primary source and a networked container of primary sources?
answer: the web
context: is both a primary source and a networked container of
primary sources, the web presents challenges of scale and
complexity for those that seek to int
The full list of about twenty questions and answers is attached.
I did the same sort of thing against chapters of Moby Dick, asking questions like "Who is Ahab?", "Where did they sail?", and "What is whaling?" The answers are often quite plausible.
This sort of system can be applied more broadly in Library Land. Students, researchers, and scholars are suffering from information overload; we all continue to drink from the proverbial firehose. Given something like the system outlined above, librarians and libraries can go beyond providing access to data, information, and knowledge. More specifically, we can support the process of using and understanding data, information, and knowledge.
Fun with digital scholarship?
[1] generate questions - https://haystack.deepset.ai/tutorials/13_question_generation
[2] answer questions - https://haystack.deepset.ai/tutorials/01_basic_qa_pipeline
[3] Hugging Face - https://huggingface.co/models
--
Eric Lease Morgan
Navari Family Center for Digital Scholarship
Hesburgh Libraries
University of Notre Dame
https://cds.library.nd.edu
Questions and answers
This is a list of questions and answers rooted in a conference announcement posted to the Code4Lib mailing list. The announcement was fed to a machine learning model, which returned a list of questions. The questions and the announcement were then fed to another model, which returned answers. In this particular case, the answers are more than plausible, if not 100% accurate. Fun with digital scholarship. --Eric Lease Morgan, January 12, 2023
How much do participants travel stipends?
answer: up to $1000
context: rous support from the Mellon Foundation, participant
travel stipends (up to $1000) are available to offset air
and/or ground transportation, parking,
On what date will the workshop be held alongside the ACRL 2023 Conference?
answer: March 15, 2023
context: https://archive-it.org/blog/digital-scholarship-and-the-web/>
held on March 15, 2023 alongside the ACRL 2023 Conference
<https://acrl2023.us2.pathable.c
What date will we follow up with you if your application is accepted?
answer: February 3, 2023
context: application is accepted, we will follow up with you no
later than February 3, 2023. For more details, including an
agenda, see the Event Website <ht
What does the Event Website contain?
answer: an agenda
context: with you no later than February 3, 2023. For more
details, including an agenda, see the Event Website
<https://archive-it.org/blog/digital-scholarsh
What do participants gain familiarity with using web archives?
answer: web archive research use cases and how libraries support them
context: s as a primary source, gain familiarity with web
archive research use cases and how libraries support them; and
acquire hands-on experience creating w
What is a publication medium that is both a primary source and a networked container of primary sources?
answer: the web
context: is both a primary source and a networked container of
primary sources, the web presents challenges of scale and
complexity for those that seek to int
What is required to attend the workshop?
answer: applicants
context: The Internet Archive <https://archive.org/> invites
applicants to a daylong workshop Digital Scholarship and the
Web: An Introduction to Data Analysi
What is the acronym for Archives Research Compute Hub?
answer: ARCH
context: putationally analyzing web archives using Archives
Research Compute Hub (ARCH)
<https://webservices.archive.org/pages/arch>. Participant
Support This
What is the maximum amount of travel stipends?
answer: $1000
context: s support from the Mellon Foundation, participant
travel stipends (up to $1000) are available to offset air
and/or ground transportation, parking, two
What is the priority deadline for all applications?
answer: January 27, 2023
context: space is limited and the priority deadline for all
applications is January 27, 2023. If your application is
accepted, we will follow up with you no la
What kind of support does the Mellon Foundation provide?
answer: generous support from the Mellon Foundation, participant travel stipends
context: ever registration is limited, and with generous
support from the Mellon Foundation, participant travel stipends
(up to $1000) are available to offset
What type of production occurs globally?
answer: digital information
context: 023.us2.pathable.com/> in Pittsburgh, PA. Every day,
significant digital information production occurs globally,
much of it across the web (e.g., new
What will participants learn about web archives as a primary source?
answer: familiarity with web archive research use cases and how libraries support them
context: archives as a primary source, gain familiarity with
web archive research use cases and how libraries support them;
and acquire hands-on experience cr
Where can you send any questions?
answer: [log in to unmask]
context: genda, see the Event Website
<https://archive-it.org/blog/digital-scholarship-and-the-web/>.
Please direct any questions to [log in to unmask]
Where can you submit an application?
answer: The Internet Archive
context: The Internet Archive <https://archive.org/> invites
applicants to a daylong workshop Digital Scholarship and the
Web: An Introduction to Data Analysi
Where is the workshop held?
answer: Pittsburgh, PA
context: de the ACRL 2023 Conference
<https://acrl2023.us2.pathable.com/> in Pittsburgh, PA. Every
day, significant digital information production occurs glob
Who invites applicants to a daylong workshop on Digital Scholarship and the Web: An Introduction to Data Analysis and Instruction?
answer: The Internet Archive
context: The Internet Archive <https://archive.org/> invites
applicants to a daylong workshop Digital Scholarship and the
Web: An Introduction to Data Analysi