Join us!

Workshop on Cyberinfrastructure and Machine Learning for Digital Libraries and Archives - June 3rd at JCDL 2018 - Final schedule

The workshop introduces a triptych model that connects digital libraries and archives, cyberinfrastructure, and machine learning to stimulate research and implementation of automated methods to describe, represent, preserve, and facilitate access and reuse of large-scale scholarly data. Cyberinfrastructure refers to shared online research environments, backed by advanced computing resources and supported by experts. Coupled with cyberinfrastructure, machine learning methods and tools can provide digital libraries and archives with powerful resources to enhance their ability to curate, organize, represent, and provide persistent access to large-scale collections, thus facilitating their discoverability and reuse.

The papers and activities address the combination of cyberinfrastructure and machine learning throughout the lifecycle of digital collections: data management planning, requirements gathering, description, preservation, access, and publication. Bring your laptop for a day of lectures by remarkable researchers and for exciting hands-on exercises.

9:00-9:30 Conference chairs: Welcome and overview of the workshop’s agenda.

9:30-10:00 Matthew McEniry, Jessica Trelogan, and Santi Thompson: Expanding Library Capacity and Facilitating Reuse through a Consortial Data Repository

10:00-10:30 Coffee break.

10:30-11:00 Tanya Clement, Jon Dunn, Juliet Hardesty, Chris Lacinak and Amy Rudersdorf, Audiovisual Metadata Platform Planning Project.

11:00-11:30 Will Thomas, Benjamin Galewsky, Gregory Jansen, Sandeep Satheesan, Richard Marciano, Shannon Bradley, Jong Lee, Luigi Marini and Kenton McHenry, Petabytes in Practice: Working with Collections as Data at Scale.

11:30-12:00 Maria Esteva, Hands-on tutorial: A Method for Modeling Large-scale Data Requirements to Cyberinfrastructure and Machine Learning.

After an introduction to how data modelling was used in the design of the Digital Rocks Portal (https://www.digitalrocksportal.org/), attendees working in multi-disciplinary groups will model large-scale data use cases, including analysis, curation, access, and publication functions, and will map those to cyberinfrastructure. You are welcome to share large-scale data curation and analysis cases to discuss and resolve during the workshop.

12:00-1:30 Lunch Break

1:30-2:00 Matt Lease: What can Machine Learning and Crowdsourcing do for you? Exploring New Tools for Scalable Data Processing.

2:00-2:30 Sachith Withana, Inna Kouper, and Beth Plale, Data Capsule Appliance for Restricted Data in Libraries.

2:30-3:00 Ruizhu Huang, Hands-on tutorial: Machine Learning on Cyberinfrastructure

Attendees will get a chance to learn how to log on to a supercomputer and start an interactive session on a big data cluster to explore how it can be used for a machine learning project.

3:00-3:30 Coffee Break

3:30-4:00 Dan Wu and Shaobo Liang, Predicting Library OPAC Users’ Cross-device Transitions.

4:00-4:30 Amit Gupta, Pankaj Jaiswal, Crispin Taylor, and Weijia Xu, Improve Accessibility of Biology Papers through Integration of Domain Information Extraction in the Publication Pipeline.

4:30-5:00 Closing Discussion