Issue 51 of the Code4Lib Journal has been published. Many thanks to the
authors and the editorial committee!

The new issue is available at:
https://journal.code4lib.org/issues/issues/issue51

Here are the abstracts from this issue:

Issue 51, 2021-06-14

Editorial: Closer to 100 than to 1
<https://journal.code4lib.org/articles/15971>

Edward M. Corrado

With the publication of Issue 51, the Code4Lib Journal is now closer to
Issue 100 than to Issue 1. We are also developing a name change policy.

Adaptive Digital Library Services: Emergency Access Digitization at the
University of Illinois at Urbana-Champaign During the COVID-19 Pandemic
<https://journal.code4lib.org/articles/15915>

Kyle R. Rimkus, Alex Dolski, Brynlee Emery, Rachael Johns, Patricia
Lampron, William Schlaack, Angela Waarala

This paper describes how the University of Illinois at Urbana-Champaign
Library provided access to circulating library materials during the 2020
COVID-19 pandemic. Specifically, it details how the library adapted
existing staff roles and digital library infrastructure to offer on-demand
digitization of and limited online access to library collection items
requested by patrons working in a remote teaching and learning environment.
The paper also provides an overview of the technology used, details how
dedicated staff with strong local control of technology were able to scale
up a university-wide solution, reflects on lessons learned, and analyzes
nine months of usage data to shed light on library patrons’ changing needs
during the pandemic.

Assessing High-volume Transfers from Optical Media at NYPL
<https://journal.code4lib.org/articles/15908>

Michelle Rothrock, Alison Rhonemus, and Nick Krabbenhoeft

NYPL’s workflow for transferring optical media to long-term storage was met
with a challenge: an acquisition of a collection containing thousands of
recordable CDs and DVDs. Many programs take a disk-by-disk approach to
imaging or transferring optical media, but to deal with a collection of
this size, NYPL developed a workflow using a Nimbie AutoLoader and a
customized version of KBNL’s open-source IROMLAB software to batch disks
for transfer. This workflow prioritized quantity, but, at the outset, it
was difficult to tell if every transfer was as accurate as it could be. We
discuss the process of evaluating the success of the mass transfer
workflow, and the improvements we made to identify and troubleshoot errors
that could occur during the transfer. A background of the institution and
other institutions’ approaches to similar projects is given, followed by an
in-depth discussion of the process of gathering and analyzing data. We
finish with a discussion of our takeaways from the project.
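
The article itself contains no code; purely as a hedged illustration of the
kind of check the evaluation describes, the sketch below hashes the files
from two independent transfer passes of the same disc and flags mismatches.
The directory names and the choice of MD5 are assumptions for the example,
not NYPL’s actual workflow.

    # Hypothetical sketch (not NYPL's code): compare two independent transfer
    # passes of the same disc by hashing every file and flagging mismatches.
    import hashlib
    from pathlib import Path

    def checksum(path, algorithm="md5", chunk_size=1 << 20):
        """Return the hex digest of a file, read in chunks to handle large files."""
        digest = hashlib.new(algorithm)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def compare_passes(pass_a, pass_b):
        """Yield relative paths that differ or are missing between the two passes."""
        a_files = {p.relative_to(pass_a): p for p in Path(pass_a).rglob("*") if p.is_file()}
        b_files = {p.relative_to(pass_b): p for p in Path(pass_b).rglob("*") if p.is_file()}
        for rel in sorted(set(a_files) | set(b_files)):
            if rel not in a_files or rel not in b_files:
                yield rel, "missing from one pass"
            elif checksum(a_files[rel]) != checksum(b_files[rel]):
                yield rel, "checksum mismatch"

    # Hypothetical directory names for two transfer passes of the same disc.
    for rel, problem in compare_passes("transfers/disc042_pass1", "transfers/disc042_pass2"):
        print(rel, problem)
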
Better Together: Improving the Lives of Metadata Creators with Natural
Language Processing <https://journal.code4lib.org/articles/15946>

Paul Kelley

DC Public Library has long held digital copies of the full run of the local
alternative weekly Washington City Paper, but had no official status as a
rights grantor to enable use. That recently changed when a full agreement
was reached with the publisher. One condition of that agreement, however,
was that issues become available with usable descriptive metadata and
subject access in time to celebrate the upcoming 40th anniversary of the
publication, which at that time was in six months.

One of the most time-intensive tasks our metadata specialists work on is
assigning description to digital objects. This paper details how we applied
Python’s Natural Language Toolkit and OpenRefine’s reconciliation functions
to the collection’s OCR text to simplify subject selection for staff with
no background in programming.
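
As a rough, hypothetical illustration of the kind of processing involved
(not the library’s actual pipeline), the sketch below uses NLTK to pull
frequent nouns out of an issue’s OCR text as candidate subject terms, which
could then be reconciled against a controlled vocabulary in OpenRefine. The
file name and thresholds are invented for the example.

    # Illustrative sketch only: extract candidate subject terms from OCR text
    # with NLTK, for later reconciliation in OpenRefine.
    import nltk
    from collections import Counter
    from nltk.corpus import stopwords

    # One-time downloads of the models NLTK needs here (tokenizer, tagger, stopwords).
    nltk.download("punkt")
    nltk.download("averaged_perceptron_tagger")
    nltk.download("stopwords")

    def candidate_subjects(ocr_text, top_n=20):
        """Return the most frequent nouns in the OCR text as candidate subject terms."""
        stops = set(stopwords.words("english"))
        tokens = [t.lower() for t in nltk.word_tokenize(ocr_text) if t.isalpha()]
        nouns = [word for word, tag in nltk.pos_tag(tokens)
                 if tag.startswith("NN") and word not in stops and len(word) > 2]
        return Counter(nouns).most_common(top_n)

    # Hypothetical OCR text file for a single digitized issue.
    with open("citypaper_issue.txt") as f:
        for term, count in candidate_subjects(f.read()):
            print(count, term)
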
Choose Your Own Educational Resource: Developing an Interactive OER Using
the Ink Scripting Language <https://journal.code4lib.org/articles/15721>

Stewart Baker

Learning games are games created with the purpose of educating, as well as
entertaining, players. This article describes the potential of interactive
fiction (IF), a type of text-based game, to serve as learning games. After
summarizing the basic concepts of interactive fiction and learning games,
the article describes common interactive fiction programming languages and
tools, including Ink, a simple markup language that can be used to create
choice-based text games that play in a web browser. The final section of
the article includes code putting the concepts of Ink, interactive fiction,
and learning games into action using part of an interactive OER created by
the author in December of 2020.
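
Ink itself is not reproduced here; as a loose Python analogue (an
illustration only, not the author’s OER or Ink syntax), the sketch below
models the same knot-and-choice structure the article describes: each node
has some text plus labelled choices that divert to other nodes.

    # Rough Python analogue (not Ink) of the knot/choice structure described above.
    # Each "knot" has some text and a set of labelled choices that divert elsewhere.
    STORY = {
        "start": {
            "text": "You need a peer-reviewed article for your paper.",
            "choices": {
                "Search the library discovery layer": "discovery",
                "Search the open web": "web",
            },
        },
        "discovery": {"text": "You find a relevant, peer-reviewed article. Well done!",
                      "choices": {}},
        "web": {"text": "You find a blog post. Is it peer reviewed? Try again.",
                "choices": {"Start over": "start"}},
    }

    def play(story, knot="start"):
        """Run the story in the terminal until a knot with no choices is reached."""
        while True:
            node = story[knot]
            print("\n" + node["text"])
            if not node["choices"]:
                break
            options = list(node["choices"].items())
            for i, (label, _) in enumerate(options, 1):
                print(f"  {i}. {label}")
            pick = int(input("> ")) - 1
            knot = options[pick][1]

    play(STORY)
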
Enhancing Print Journal Analysis for Shared Print Collections
<https://journal.code4lib.org/articles/15649>

Dana Jemison, Lucy Liu, Anna Striker, Alison Wohlers, Jing Jiang, and Judy
Dobry

The Western Regional Storage Trust (WEST
<https://cdlib.org/west/about-west/west-membership/>) is a distributed
shared print journal repository program serving research libraries, college
and university libraries, and library consortia in the Western Region of
the United States. WEST solicits serial bibliographic records and related
holdings biennially, which are evaluated and identified as candidates for
shared print archiving using a complex collection analysis process.
California Digital Library’s Discovery & Delivery
<https://cdlib.org/services/d2d/> WEST operations team (WEST-Ops) supports
the functionality behind this collection analysis process used by WEST
program staff (WEST-Staff) and members.

For WEST, proposals for shared print archiving have been historically
predicated on what is known as an Ulrich’s journal family
<https://www.ulrichsweb.com/ulrichsweb/faqs.asp>, which pulls together
related serial titles, for example, succeeding and preceding serial titles,
their supplements, and foreign-language parallel titles. Ulrich’s, while
invaluable, proves problematic in several ways, resulting in the omission of
approximately half of the journal titles submitted for collection analysis.

Part of WEST’s effectiveness in archiving hinges upon its ability to
analyze local serials data across its membership as holistically as
possible. The process that enables this analysis, and subsequent archiving
proposals, is dependent on Ulrich’s journal family, for which ISSN has been
traditionally used to match and cluster all related titles within a
particular family. As such, the process is limited: many journals,
especially older publications, have never been assigned ISSNs, and member
bibliographic records may lack an ISSN even when one exists in the
corresponding OCLC primary record.

Building a mechanism for matching on ISSNs that goes beyond the base set of
primary, former, and succeeding titles expands the number of eligible ISSNs
that facilitate Ulrich’s journal family matching. Furthermore, when
no matches in Ulrich’s can be made based on ISSN, other types of control
numbers within a bibliographic record may be used to match with records
that have been previously matched with an Ulrich’s journal family via ISSN,
resulting in a significant increase in the number of titles eligible for
collection analysis.

This paper will discuss problems in Ulrich’s journal family matching,
improved functional methodologies developed to address those problems, and
potential strategies to improve serial title clustering in the future.
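
The abstract describes this matching only at a high level; the sketch below
is a loose illustration (not WEST’s implementation) of the logic it
outlines: try to place a member record in an Ulrich’s journal family by
ISSN first, then fall back to other control numbers, such as OCLC numbers,
shared with records already matched by ISSN. The field names are assumed
for the example.

    # Loose illustration only: match member records to an Ulrich's journal
    # family by ISSN first, then fall back to other control numbers (e.g.,
    # OCLC numbers) shared with records already matched by ISSN.
    def build_indexes(matched_records):
        """Index family ids by the ISSNs and OCLC numbers of already-matched records."""
        issn_to_family, oclc_to_family = {}, {}
        for rec in matched_records:
            for issn in rec.get("issns", []):
                issn_to_family[issn] = rec["family_id"]
            for oclc in rec.get("oclc_numbers", []):
                oclc_to_family[oclc] = rec["family_id"]
        return issn_to_family, oclc_to_family

    def match_to_family(record, issn_to_family, oclc_to_family):
        """Return a journal-family id for a bibliographic record, or None."""
        for issn in record.get("issns", []):
            if issn in issn_to_family:
                return issn_to_family[issn]
        for oclc in record.get("oclc_numbers", []):
            if oclc in oclc_to_family:
                return oclc_to_family[oclc]
        return None
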
How We Built a Spatial Subject Classification Based on Wikidata
<https://journal.code4lib.org/articles/15875>

Adrian Pohl

From the fall of 2017 to the beginning of 2020, a project was carried out
to upgrade spatial subject indexing in the North Rhine-Westphalian
Bibliography (NWBib) from uncontrolled strings to controlled values. For
this purpose, a spatial classification with around 4,500 entries was created
from Wikidata and published as a SKOS (Simple Knowledge Organization System)
vocabulary. The article gives an overview of the initial problem and
outlines the different implementation steps.
Institutional Data Repository Development, a Moving Target
<https://journal.code4lib.org/articles/15821>

Colleen Fallaw, Genevieve Schmitt, Hoa Luong, Jason Colwell, and Jason
Strutz

At the end of 2019, the Research Data Service (RDS) at the University of
Illinois at Urbana-Champaign (UIUC) completed its fifth year as a
campus-wide service. In order to gauge the effectiveness of the RDS in
meeting the needs of Illinois researchers, RDS staff developed a five-year
review consisting of a survey and a series of in-depth focus group
interviews. As a result, Illinois Data Bank, our institutional data
repository developed in-house by University Library IT staff, was recognized
as our unit’s most useful service offering. When launched in 2016, storage
resources and web servers for Illinois Data Bank and supporting systems were
hosted on-premises at UIUC. As anticipated, researchers increasingly need to
share large and complex datasets. In a responsive
effort to leverage the potentially more reliable, highly available,
cost-effective, and scalable storage accessible to computation resources,
we migrated our item bitstreams and web services to the cloud. Our efforts
have met with success, but also with painful bumps along the way. This
article describes how we supported data curation workflows through
transitioning from on-premises to cloud resource hosting. It details our
approaches to ingesting, curating, and offering access to dataset files up
to 2 TB in size, which may be archive-type files (e.g., .zip or .tar)
containing complex directory structures.
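
Purely as a hedged sketch of one curation-adjacent task the abstract hints
at (not Illinois Data Bank code), the example below lists the directory
structure inside a deposited .zip or .tar file without extracting it, using
only the Python standard library. The file name is hypothetical.

    # Minimal sketch: list the contents of a deposited archive file without
    # fully extracting it, so the directory structure of a large dataset can
    # be inspected during curation.
    import tarfile
    import zipfile

    def list_archive(path):
        """Yield the member paths of a .zip or .tar(.gz) archive."""
        if zipfile.is_zipfile(path):
            with zipfile.ZipFile(path) as zf:
                yield from zf.namelist()
        elif tarfile.is_tarfile(path):
            with tarfile.open(path) as tf:
                for member in tf.getmembers():
                    yield member.name
        else:
            raise ValueError(f"Not a recognized archive: {path}")

    for name in list_archive("dataset_deposit.zip"):  # hypothetical file name
        print(name)
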
On the Nature of Extreme Close-Range Photogrammetry: Visualization and
Measurement of North African Stone Points
<https://journal.code4lib.org/articles/15769>

Michael J. Bennett

Image acquisition, visualization, and measurement are examined in the
context of extreme close-range photogrammetric data analysis. Manual
measurements commonly used in traditional stone artifact investigation are
used as a starting point to better gauge the usefulness of high-resolution
3D surrogates and the flexible digital tool sets that can work with them.
The potential of various visualization techniques is also explored in the
context of future teaching, learning, and research in virtual environments.
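
As a trivial, hedged illustration of the kind of measurement a 3D surrogate
enables (not the article’s toolset), the sketch below computes a
caliper-style distance between two landmark points picked on a model; the
coordinates and units are invented for the example.

    # Trivial illustration: a caliper-style measurement between two landmark
    # points picked on a 3D surrogate, e.g. the two ends of a stone point,
    # given as x, y, z coordinates in the model's units (assumed millimetres).
    import math

    def distance(p, q):
        """Euclidean distance between two 3D points."""
        return math.dist(p, q)

    tip = (12.41, 3.05, 7.88)   # hypothetical landmark coordinates
    base = (61.97, 5.12, 9.40)
    print(f"maximum length: {distance(tip, base):.2f} mm")
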
Optimizing Elasticsearch Search Experience Using a Thesaurus
<https://journal.code4lib.org/articles/15749>

Emmanuel Di Pretoro, Edwin De Roock, Wim Fremout, Erik Buelinckx, Stephanie
Buyle, Véronique Van der Stede

The Belgian Art Links and Tools (BALaT) (http://balat.kikirpa.be/) is the
continuously expanding online documentary platform of the Royal Institute
for Cultural Heritage (KIK-IRPA), Brussels (Belgium). BALaT contains not
only over 750,000 images of KIK-IRPA’s unique collection of photo negatives
on the cultural heritage of Belgium, but also the library catalogue, PDFs of
articles from KIK-IRPA’s Bulletin and other publications, an extensive
persons and institutions authority list, and several specialized thematic
websites, each of those collections being multilingual as Belgium has three
official languages. All these are interlinked to give the user easy access
to freely available information on the Belgian cultural heritage. In recent
years, KIK-IRPA has been working on a detailed and inclusive data management
plan. Through this plan, a new project, HESCIDA (Heritage Science Data
Archive), will upgrade BALaT to BALaT+, enabling
access to searchable registries of KIK-IRPA datasets and data
interoperability. BALaT+ will be a building block of DIGILAB, one of the
future pillars of the European Research Infrastructure for Heritage Science
(E-RIHS), which will provide online access to scientific data concerning
tangible heritage, following the FAIR principles (Findable, Accessible,
Interoperable, Reusable). It will include and enable
access to searchable registries of specialized digital resources (datasets,
reference collections, thesauri, ontologies, etc.). In the context of this
project, Elasticsearch has been chosen as the technology empowering the
search component of BALaT+. An essential feature of this search
functionality of BALaT+ is the need for linguistic equivalencies, meaning a
term query in French should also return matching results containing the
equivalent term in Dutch. Another important feature is to offer a mechanism
to broaden the search with elements of more precise terminology: a term
like “furniture” could also match records containing chairs, tables, etc.
This article will explain how a thesaurus developed in-house at KIK-IRPA
was used to obtain these functionalities, from the processing of that
thesaurus to the production of the configuration needed by Elasticsearch.
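
The article explains the actual processing in detail; the sketch below is
only a rough illustration of the general idea (not KIK-IRPA’s code), turning
a tiny, invented bilingual thesaurus into the Solr-style synonym rules
consumed by an Elasticsearch synonym token filter, so that equivalent terms
match each other and narrower terms also match their broader term.

    # Rough sketch only: derive Elasticsearch synonym rules from a tiny thesaurus.
    # Equivalent terms across languages become bidirectional synonym groups;
    # narrower terms expand to their broader term so that a query for
    # "furniture" also matches records containing "chaise" or "stoel".
    THESAURUS = {
        "furniture": {"equivalents": ["meubilair", "meuble"],
                      "narrower": ["chair", "table", "chaise", "stoel"]},
    }

    def synonym_rules(thesaurus):
        rules = []
        for term, entry in thesaurus.items():
            # "furniture, meubilair, meuble" -> all forms match each other
            rules.append(", ".join([term] + entry["equivalents"]))
            # "chair => chair, furniture" -> narrower term also indexed as broader
            for narrower in entry["narrower"]:
                rules.append(f"{narrower} => {narrower}, {term}")
        return rules

    # Analysis settings to pass when creating the index (creation call omitted).
    settings = {
        "analysis": {
            "filter": {
                "thesaurus_synonyms": {"type": "synonym",
                                       "synonyms": synonym_rules(THESAURUS)}
            },
            "analyzer": {
                "thesaurus_text": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "thesaurus_synonyms"],
                }
            },
        }
    }
    print(settings)
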
Pythagoras: Discovering and Visualizing Musical Relationships Using
Computer Analysis <https://journal.code4lib.org/articles/15949>

Brandon Bellanti

This paper presents an introduction to Pythagoras, an in-progress digital
humanities project using Python to parse and analyze XML-encoded music
scores. The goal of the project is to use recurring patterns of notes to
explore existing relationships among musical works and composers.

An intended outcome of this project is to give music performers, scholars,
librarians, and anyone else interested in digital humanities new insights
into musical relationships as well as new methods of data analysis in the
arts.
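
As a minimal, hedged sketch of the kind of analysis described (not the
Pythagoras codebase), the example below reads pitches from an uncompressed
MusicXML file with the standard library and counts recurring four-note
patterns; the file name and pattern length are assumptions for the example.

    # Minimal sketch: read pitches from an uncompressed MusicXML file and
    # count recurring patterns of notes (overlapping four-note sequences),
    # using only the Python standard library.
    import xml.etree.ElementTree as ET
    from collections import Counter

    def pitch_sequence(musicxml_path):
        """Return pitches like 'C4' or 'F#3' in document order; rests are skipped."""
        pitches = []
        for note in ET.parse(musicxml_path).getroot().iter("note"):
            pitch = note.find("pitch")
            if pitch is None:           # a rest has no <pitch> element
                continue
            step = pitch.findtext("step")
            alter = pitch.findtext("alter")
            octave = pitch.findtext("octave")
            accidental = {"1": "#", "-1": "b"}.get(alter, "")
            pitches.append(f"{step}{accidental}{octave}")
        return pitches

    def recurring_patterns(pitches, length=4, min_count=2):
        """Count overlapping n-note patterns that occur at least min_count times."""
        ngrams = zip(*(pitches[i:] for i in range(length)))
        counts = Counter(ngrams)
        return {p: c for p, c in counts.items() if c >= min_count}

    pitches = pitch_sequence("score.musicxml")   # hypothetical file name
    for pattern, count in sorted(recurring_patterns(pitches).items(),
                                 key=lambda kv: -kv[1]):
        print(count, " ".join(pattern))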