Early Modern Data Curation Fellow
Carnegie Mellon University
Carnegie Mellon University's Department of English and University Libraries
jointly seek an Early Modern Data Curation Fellow to lead data curation
activities for the Six Degrees of Francis Bacon (SDFB) project, a digital
reconstruction of the early modern social network that scholars and students
can collaboratively expand, revise, curate, and critique. The fellow will
leverage expertise in early modern studies along with technical aptitude in
order to contribute meaningfully to a rich data lifecycle, including
collecting, processing, textmining, analyzing, and archiving data related to
the early modern social network.
The Carnegie Mellon Dept. of English is a growing hub for Early Modern
Studies, supporting not only SDFB but also the Pittsburgh Consortium of
Medieval and Renaissance Studies (PCMRS;
http://www.medren.org/). PCMRS, which was founded by the Department of English
at Carnegie Mellon, draws members from numerous disciplines (art history,
English literature and drama, history, music, philosophy, religious studies,
Romance and European Languages & Literatures) and a range of institutions
(Carnegie Mellon, Duquesne, the University of Pittsburgh, West Virginia
University, Chatham College and Slippery Rock). It
coordinates activities among the different campuses in the region and sponsors
an active speaker series.
Carnegie Mellon University Libraries has worked actively in partnership with
the University's top-ranked School of Computer Science (SCS) to achieve a
digital future for libraries. The successes of CMU
Libraries' Universal Library and Million Book Projects, acknowledged
inspirations for Google Books, are now being followed by work on equally
challenging initiatives including the Olive project
(https://olivearchive.org/) archiving "born digital" executable content such
as digital humanities projects, software, scientific models, and games under
the direction of PIs Mahadev Satyanarayanan and Gloriana St. Clair.
Six Degrees of Francis Bacon (SDFB;
http://sixdegreesoffrancisbacon.com/) is a collaborative, multidisciplinary
digital humanities project with wide utility for several subfields in early
modern studies. Historians, literary critics,
musicologists, art historians, and others have long studied the way that early
modern people associated with each other and participated in various kinds of
formal and informal groups. Yet their scholarship, published in countless
books and articles, is scattered and unsynthesized. By data-mining existing
scholarship and documents that describe relationships between early modern
persons, SDFB has created a unified, systematized representation of the way
people in early modern Britain were connected.
In SDFB's start-up stage, which has been supported by two Google Faculty
Research Awards, team members Christopher Warren, Daniel Shore, Cosma Shalizi,
Michael Finegold, and Lawrence Wang have mined a single source, the Oxford
Dictionary of National Biography (ODNB), to produce a preliminary data set of
10,000 individuals and to infer, with confidence estimates, a map of the
associations between them. Initial results, available
at http://sixdegreesoffrancisbacon.com/ and
http://www.viewur.com/sixdegrees/ (BETA), already make it possible to visualize
and understand the early modern social network in exciting new ways.
The Early Modern Data Curation fellow will bring his/her understanding of key
research and pedagogical questions in Early Modern Studies to enhance the
project's impact on the field in four specific ways. The Fellow will be
centrally involved in collaborative efforts (a.) documenting, archiving, and
refining existing data sets and workflows; (b.) curating the dynamic
crowdsourcing interface where users validate and annotate existing data; (c.)
coordinating with major text repositories including Google Books, Hathi Trust
Research Center, and the Institute for Historical Research, to develop
workflows, corpora, and data sets; and (d.) communicating findings in
multi-author and single-author publications.
The Fellow will be housed in the English Department in the Dietrich College of
Arts and Social Sciences but work jointly across English and the University
Libraries. Within the Department of English, the Fellow will work with Asst.
Prof. Christopher Warren, Principal Investigator of the Six Degrees of Francis
Bacon project. Within CMUL, s/he will be supervised by Gabrielle V. Michalek,
Principal Archivist and Head of Archives and Digital Library Initiatives. Data
description and metadata capture will be overseen by Gabrielle Michalek and
Data Services Librarian Steve van Tuyl. Day-to-day work
will involve a disciplinarily diverse and geographically disparate team of
SDFB collaborators, including literary historians, librarians, statisticians,
network scientists, and software developers.
Within CMU's larger academic community, the CLIR early modern data curation
Fellow will work alongside world leaders in digital strategies, data
initiatives, innovations, and design. The Fellow will be
encouraged to draw formally and informally from researchers and events across
the CMU Campus, including those associated with CMU's Language Technologies
Institute, a leader in text mining and the initial home of the reCAPTCHA
system; the Human Computer Interaction Institute, a leader in research related
to computer technology in support of human activity and society (supporting
labs like ProtoLab, which conducts research on social computing and design,
and Social Computing, which conducts research in the design of online
communities); and the School of Design, one of the most respected programs in
the country, with strengths in communication and interaction
design. The Fellow will have ample opportunities to
participate in events sponsored by the Pittsburgh Consortium of Medieval and
Renaissance Studies and the Medieval and Renaissance Studies Program at the
University of Pittsburgh, a short five-minute walk from the CMU campus.
The successful candidate will likely have a PhD in Early Modern English,
History, or Library and Information Science with demonstrated research
strengths in historicist approaches, digital humanities, book history, and/or
early modern networks broadly conceived. The ideal candidate will have a
strong technical aptitude and be willing to learn or apply skills in data
identification, data preparation, data ingest, and metadata generation.
* Develop and implement workflows for Six Degrees of Francis Bacon datasets including data identification, data preparation, data ingest, and metadata generation.
* Curate dynamic crowdsourcing interface where users validate and annotate existing data.
* Coordinate with major text repositories including Google Books, Hathi Trust Research Center, and the Institute for Historical Research to develop research workflows, corpora, and data sets.
* Communicate methodologies and research findings in multi-author and single-author publications.
Required Knowledge and Skills
* A PhD in a relevant subfield of early modern studies or Library and Information Sciences with demonstrated expertise in early modern studies.
* Demonstrated ability to work collaboratively and successfully in a team-based environment.
* Demonstrated willingness to learn and implement key standards in data curation.
* Active research in early modern studies, preferably animated by historicist methodologies, book history, and/or network studies.
* Excellent verbal and written communication skills.
Desired Knowledge and Skills
* Ability to blend early modern expertise with technical expertise in prevailing standards and best practices in the development of early modern data
* Familiarity with or demonstrated capacity to work within the HTRC research
* Proficiency in a current programming language such as Python and/or background in any of the following: Stanford CoreNLP, Gephi, SEASR, XML, R, Neo4j, Dublin
Brought to you by code4lib jobs: http://jobs.code4lib.org/job/11174/