

All,

If you have wrestled with copyright, licensing, or ethical questions when compiling corpora for text data mining or other applications, or if you work with researchers who do, you might be interested in a new NEH-funded DH Institute on Legal Literacies for TDM, hosted at UC Berkeley. The Institute is open to DH researchers and professionals. Participation in the 4-day Institute comes with a stipend to cover travel and living expenses. The application deadline is December 20. We hope you will join us!

Building Legal Literacies for Text Data Mining
June 23-26, 2020
UC Berkeley
Application Deadline: December 20th

Please join us June 23-26, 2020 to gain the skills you need for navigating law, policy, ethics, and risk in digital humanities text and data mining projects. 

What will the Institute cover?
If you attend the Institute, you can expect to learn about how the following law and policy matters pertain to text data mining research:
In general, the Institute will teach foundational skills to help digital humanities researchers and professionals:
Why Is This Important?
Until now, humanities researchers conducting text data mining have had to navigate a thicket of legal issues without much guidance or assistance. For instance, imagine researchers need to scrape content about Egyptian artifacts from online databases in order to conduct automated analysis. Then imagine the researchers also want to share these content-rich data sets with others to encourage research reproducibility or to let other researchers query the data sets with new questions. This kind of work can raise issues of copyright, contract, and privacy law, not to mention ethical concerns if, say, indigenous knowledge or cultural heritage materials may be at risk. Indeed, in a recent study of humanities scholars’ text analysis needs, participants noted that access to and use of copyright-protected texts was a “frequent obstacle” to their ability to select appropriate texts for text data mining.

Potential legal hurdles do not just deter text data mining research; they also bias it toward particular topics and sources of data. In response to confusion over copyright, website terms of use, and other perceived legal roadblocks, some digital humanities researchers have gravitated to low-friction research questions and texts to avoid making decisions about rights-protected data. They use texts that have entered the public domain or materials that have been flexibly licensed through initiatives such as Creative Commons or Open Data Commons. When researchers limit themselves to such sources, their research is inevitably skewed, leaving important questions unanswered and making the resulting findings less broadly applicable. A growing body of research also demonstrates how race, gender, and other biases found in openly available texts have contributed to, and exacerbated, bias in the development of artificial intelligence tools.

Building Legal Literacies for Text Data Mining (“Building LLTDM”) is an Institute for Advanced Topics in the Digital Humanities, and has been made possible by a grant from the National Endowment for the Humanities.

On behalf of our project team,
Stacy Reardon


Stacy Reardon
Literatures and Digital Humanities Librarian
she/her
438 Doe Library | University of California, Berkeley | Berkeley, CA 94720


to manage your DLF-ANNOUNCE subscription, visit diglib.org/announce