CODE4LIB Archives

CODE4LIB@LISTS.CLIR.ORG


CODE4LIB September 2021

Subject: Issue 52 of the Code4Lib Journal has been published
From: Mark Swenson
Reply-To: Code for Libraries
Date: Wed, 22 Sep 2021 19:42:59 +0000


Issue 52 of the Code4Lib Journal has been published.  Many thanks to the authors and the editorial committee!

The new issue is available at: https://journal.code4lib.org/issues/issues/issue52

Here are the abstracts from this issue:

Issue 52, 2021-09-22

Editorial: The Cost of Knowing Our Users<https://journal.code4lib.org/articles/16208>

Mark Swenson

Some musings on the difficulty of wanting to know our users’ secrets and simultaneously wanting to not know them.

Building and Maintaining Metadata Aggregation Workflows Using Apache Airflow<https://journal.code4lib.org/articles/16171>

Leanne Finnigan and Emily Toner

PA Digital is a Pennsylvania network that serves as the state’s service hub for the Digital Public Library of America (DPLA). The group developed a homegrown aggregation system in 2014, used to harvest digital collection records from contributing institutions, validate and transform their metadata, and deliver aggregated records to the DPLA. Since our initial launch, PA Digital has expanded significantly, harvesting from an increasing number of contributors with a variety of repository systems. With each new system, our highly customized aggregator software became more complex and difficult to maintain. By 2018, PA Digital staff had determined that a new solution was needed. From 2019 to 2021, a cross-functional team implemented a more flexible and scalable approach to metadata aggregation for PA Digital, using Apache Airflow for workflow management and Solr/Blacklight for internal metadata review. In this article, we will outline how we use this group of applications and the new workflows adopted, which afford our metadata specialists more autonomy to contribute directly to the ongoing development of the aggregator. We will discuss how this work fits into our broader sustainability planning as a network and how the team leveraged shared expertise to build a more stable approach to maintenance.

Closing the Gap between FAIR Data Repositories and Hierarchical Data Formats<https://journal.code4lib.org/articles/16223>

Connor B. Bailey, Fedor F. Balakirev, and Lyudmila L. Balakireva

Many in the scientific community, particularly in publicly funded research, are pushing to adhere to more accessible data standards to maximize the findability, accessibility, interoperability, and reusability (FAIR) of scientific data, especially with the growing prevalence of machine learning augmented research. Online FAIR data repositories, such as the Open Science Framework (OSF), help facilitate the adoption of these standards by providing frameworks for storage, access, search, APIs, and other features that create organized hubs of scientific data. However, the wider acceptance of such repositories is hindered by the lack of support of hierarchical data formats, such as Technical Data Management Streaming (TDMS) and Hierarchical Data Format 5 (HDF5), that many researchers rely on to organize their datasets. Various tools and strategies should be used to allow hierarchical data formats, FAIR data repositories, and scientific organizations to work more seamlessly together. A pilot project at Los Alamos National Laboratory (LANL) addresses the disconnect between them by integrating the OSF FAIR data repository with hierarchical data renderers, extending support for additional file types in their framework. The multifaceted interactive renderer displays a tree of metadata alongside a table and plot of the data channels in the file. This allows users to quickly and efficiently load large and complex data files directly in the OSF webapp. Users who are browsing files can quickly and intuitively see the files in the way they or their colleagues structured the hierarchical form and immediately grasp their contents. This solution helps bridge the gap between hierarchical data storage techniques and FAIR data repositories, making both of them more viable options for scientific institutions like LANL which have been put off by the lack of integration between them.

Conspectus: A Syllabi Analysis Platform for Leganto Data Sources<https://journal.code4lib.org/articles/15995>

David Massey, Thomas Sødring

In recent years, higher education institutions have implemented electronic solutions for the management of syllabi, resulting in new and exciting opportunities within the area of large-scale syllabi analysis. This article details an information pipeline that can be used to harvest, enrich and use such information.

Core Concepts and Techniques for Library Metadata Analysis<https://journal.code4lib.org/articles/16078>

Stacie Traill and Martin Patrick

Metadata analysis is a growing need in libraries of all types and sizes, as demonstrated in many recent job postings. Data migration, transformation, enhancement, and remediation all require strong metadata analysis skills. But there is no well-defined body of knowledge or competencies list for library metadata analysis, leaving library staff with analysis-related responsibilities largely on their own to learn how to do the work effectively. In this paper, two experienced metadata analysts will share what they see as core knowledge areas and problem solving techniques for successful library metadata analysis. The paper will also discuss suggested tools, though the emphasis is intentionally not to prescribe specific tools, software, or programming languages, but rather to help readers recognize tools that will meet their analysis needs. The goal of the paper is to help library staff and their managers develop a shared understanding of the skill sets required to meet their library’s metadata analysis needs. It will also be useful to individuals interested in pursuing a career in library metadata analysis and wondering how to enhance their existing knowledge and skills for success in analysis work.

Digitization Decisions: Comparing OCR Software for Librarian and Archivist Use<https://journal.code4lib.org/articles/16132>

Leanne Olson and Veronica Berry

This paper is intended to help librarians and archivists who are involved in digitization work choose optical character recognition (OCR) software. The paper provides an introduction to OCR software for digitization projects, and shares the method we developed for easily evaluating the effectiveness of OCR software on resources we are digitizing.

We tested three major OCR programs (Adobe Acrobat, ABBYY FineReader, Tesseract) for accuracy on three different digitized texts from our archives and special collections at the University of Western Ontario. Our test was divided into two parts: a word accuracy test (to determine how searchable the final documents were), and a test with a screen reader (to determine how accessible the final documents were). We share our findings from the tests and make recommendations for OCR work on digitized documents from archives and special collections.

Introducing SAGE: An Open-Source Solution for Customizable Discovery Across Collections<https://journal.code4lib.org/articles/15740>

David B. Lowe, James Creel, Elizabeth German, Douglas Hahn, and Jeremy Huff

Digital libraries at research universities make use of a wide range of unique tools to enable the sharing of eclectic sets of texts, images, audio, video, and other digital objects. Presenting these assorted local treasures to the world can be a challenge, since text is often siloed with text, images with images, and so on, such that per type, there may be separate user experiences in a variety of unique discovery interfaces. One common tool that has been developed in recent years to potentially unite them all is the Apache Solr index. Texas A&M University (TAMU) Libraries has harnessed Solr for internal indexing for repositories like DSpace, Fedora, and Avalon. Impressed by frameworks like Blacklight at peer institutions, TAMU Libraries wrote an analogous set of tools in Java, and thus was born SAGE, the Solr AGgregation Engine, with two primary functions: 1) aggregating Solr indices, or “cores,” from various local sources, and 2) presenting a search facility to the user in a discovery interface.

Leveraging a Custom Python Script to Scrape Subject Headings for Journals<https://journal.code4lib.org/articles/16080>

Shelly R. McDavid, Eric McDavid, and Neil E. Das

In our current library fiscal climate, with yearly inflationary cost increases of 2-6+% for many journals and journal package subscriptions, it is imperative that libraries strive to make our budgets go further to expand our suite of resources. As a result, most academic libraries annually undertake some form of electronic journal review, employing factors such as cost per use to inform budgetary decisions. In this paper we detail some tech-savvy processes we created to leverage a Python script to automate journal subject heading generation within OCLC’s WorldCat catalog, the MOBIUS (A Missouri Library Consortium) Catalog, and the VuFind Library Catalog, a now-retired catalog for CARLI (Consortium of Academic and Research Libraries in Illinois). We also describe the rationale for the inception of this project, the methodology we utilized, the current limitations, and details of our future work in automating our annual analysis of journal subject headings by use of an OCLC API.

On Two Proposed Metrics of Electronic Resource Use<https://journal.code4lib.org/articles/16087>

William Denton

There are many ways to look at electronic resource use, individually or aggregated. I propose two new metrics to help give a better understanding of comparative use across an online collection. Users per mille is a relative annual measure of how many users a platform had for every thousand potential users: this tells us how many people used a given platform. Interest factor is the average number of uses of a platform by people who used it more than once: this tells us how much people used a given platform. These two metrics are enough to give us good insight into collection use. Dividing each into quartiles allows a quadrant comparison of lows and highs on each metric, giving a quick view of platforms many people use a lot (the big expensive ones), many people use very little (a curious subset), a few people use a lot (very specific to a narrow subject) and a few people use very little (deserves attention). This helps understand collection use and informs collection management.
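As a rough illustration of the two metrics described in this abstract, here is a minimal Python sketch. It follows only the wording above: users per mille is the number of distinct users per thousand potential users, and interest factor is the mean number of uses among people who used a platform more than once. The function names, the per-user dictionary of use counts, and the sample numbers are all hypothetical; the article itself may compute these differently, and the quartile/quadrant comparison is not attempted here.

```python
def users_per_mille(distinct_users: int, potential_users: int) -> float:
    """Distinct users of a platform per thousand potential users."""
    if potential_users <= 0:
        raise ValueError("potential_users must be positive")
    return 1000 * distinct_users / potential_users


def interest_factor(uses_by_user: dict[str, int]) -> float:
    """Mean number of uses among users with more than one use.

    Returns 0.0 when nobody used the platform more than once
    (an assumption of this sketch, not something the abstract specifies).
    """
    repeat_counts = [n for n in uses_by_user.values() if n > 1]
    if not repeat_counts:
        return 0.0
    return sum(repeat_counts) / len(repeat_counts)


if __name__ == "__main__":
    # Hypothetical annual use counts for one platform, keyed by anonymized user ID.
    uses = {"u1": 1, "u2": 3, "u3": 5, "u4": 1}
    potential = 2000  # hypothetical size of the potential user population

    print("Users per mille:", users_per_mille(len(uses), potential))
    print("Interest factor:", interest_factor(uses))
```

Keeping the two calculations as separate functions mirrors the abstract's distinction between how many people used a platform and how much they used it.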

Using Low Code to Automate Public Service Workflows: Three Cases<https://journal.code4lib.org/articles/16096>

Dianna Morganti and Jess Williams

Public service librarians without coding experience or technical education may not always be aware of or consider automation to be an option to streamline their regular work tasks, but the new prevalence of enterprise-level low code solutions allows novices to take advantage of technology to make their work more efficient and effective. Low code applications apply a graphical user interface on top of a coding platform to make it easy for novices to leverage automation at work. This paper presents three cases of using low code solutions for automating public service problems using the prevalent Microsoft Power Automate application, available in many library workplaces that use the Microsoft Office ecosystem. From simplifying the communication and scheduling process for instruction classes to connecting our student workers’ hourly floor counts to our administrators’ dashboard of building occupancy, we’ve leveraged simple low code automation in a scalable and replicable manner. Pseudo-code examples are provided.

An XML-Based Migration from Digital Commons to Open Journal Systems<https://journal.code4lib.org/articles/15988>

Cara M. Key

The Oregon Library Association has produced its peer-reviewed journal, the OLA Quarterly (OLAQ), since 1995, and OLAQ was published in Digital Commons beginning in 2014. When the host institution undertook to move away from Bepress, their new repository solution was no longer a good match for OLAQ. Oregon State University and University of Oregon agreed to move the journal into their joint instance of Open Journal Systems (OJS), and a small team from OSU Libraries carried out the migration project. The OSU project team declined to use PKP’s existing migration plugin for a number of reasons, instead pursuing a metadata-centered migration pipeline from Digital Commons to OJS. We used custom XSLT to convert tabular data exported from Bepress into PKP’s Native XML schema, which we imported using the OJS Native XML Plugin. This approach provided a high degree of control over the journal’s metadata and a robust ability to test and make adjustments along the way. The article discusses the development of the transformation stylesheet, the metadata mapping and cleanup work involved, as well as advantages and limitations of using this migration strategy.
