LISTSERV 16.5 - CODE4LIB Archives

Hi,
I’m Laurie Allen from the libraries at Penn that were mentioned in the news coverage yesterday. I’m working with a group led by the Penn Program in Environmental Humanities to create collaborative and distributed networks for backing up government datasets that are vulnerable to removal during the transition. We’ve been moving really quickly as the news coverage got a bit ahead of our capacity to coordinate and document, but we’re really eager for help and participation. And I, personally, am really hopeful for help from the code4lib community. I think this is the skillset we most need!

We’re updating our site http://www.ppehlab.org/datarefuge with the basic outline of the project.

As you likely know, government datasets are made available on the web in a huge range of ways. Some can be harvested with targeted crawls. For those, we are recommending that people contribute to the End of Term Harvest seed list and join that effort.
http://digital2.library.unt.edu/nomination/eth2016/

Then there is the variety of datasets, apps, databases, etc. that can’t be captured by web crawling. We’re tackling those in two ways. One is following the lead of colleagues at the University of Toronto <https://technoscienceunit.wordpress.com/2016/12/04/guerrilla-archiving-event-saving-environmental-data-from-trump/> and encouraging folks to set up data rescue events, or hackathons, where developers and scientists can come together to rescue particular datasets. The other is maintaining a spreadsheet <https://docs.google.com/spreadsheets/d/12-__RqTqQxuxHNOln3H5ciVztsDMJcZ2SVs1BrfqYCc/edit?usp=sharing> that Eric Holthaus and others started and then asked us to take over. It’s a little behind because the news coverage slammed it, but we’re combining it with a form that invites people to claim datasets and nominate new ones <https://docs.google.com/spreadsheets/d/1GGoZY15w9rEktMTR-dW-TuvvgdUjj_PHI5HxajPfDWs/edit#gid=1233435770> (and to offer other kinds of help). We’re also a little behind in moving entries from the form into the spreadsheet, but we’ll get to that today.

Over the next couple of days, we’re hoping to get a better sense of what institutional resources we can bring from Penn. That will help, I think. But folks from this community could be really helpful, especially any of you with expertise in web harvesting. Or documentation writing! What we really need right now is help creating documentation of the advice we want to give, or that people are asking for. (My beginning list of the questions we keep getting and facing is here <https://docs.google.com/document/d/1bgyL-E_cE9ymDd-6CptADVZV2ZZf-ITY8DjuArPEFOc/edit?usp=sharing>.) If you’d like to help with that, or with anything else, I’m happy to invite you to the Slack group at datamirror.slack.com.

Thanks very much. We’ve had a huge outpouring of interest from lots of companies (including the one on everyone’s mind when it comes to storing giant data) about offering free storage space and greasing the wheels to make it easier to get data in. I think with continued coordination, we can do a huge amount.

Thanks much.
Laurie

--
Assistant Director for Digital Scholarship
University of Pennsylvania Libraries
Room 122
3420 Walnut St
Philadelphia PA 19104
215-746-2662
@librlaurie