LISTSERV 16.5 - CODE4LIB Archives

I have a small amount of money to work with and am looking for two people to help with extracting data from MARC records as described below. This is part of a larger project to develop a FRBR-based data store and discovery interface for moving images. Our previous work includes a consideration of the feasibility of the project from a cataloging perspective (http://www.olacinc.org/drupal/?q=node/27), a prototype end-user interface (https://blazing-sunset-24.heroku.com/, https://blazing-sunset-24.heroku.com/page/about) and a web form to crowdsource the parsing of movie credits (http://olac-annotator.org/#/about).

Planned work period: six months beginning around the second week of December (I can be somewhat flexible on the dates if you want to wait and start after the New Year).
Payment: flat sum of $2500 upon completion of the work.

Required skills and knowledge:

* Familiarity with the MARC 21 bibliographic format
* Familiarity with Natural Language Processing concepts (or willingness to learn)
* Experience with Java, Python, and/or Ruby programming languages

Description of work: Use language and text processing tools and provided strategies to write code to extract and normalize data in existing MARC bibliographic records for moving images. Refine code based on feedback from analysis of results obtained with a sample dataset.

Data to be extracted:

Tasks for Position 1:

* Titles (including the main title of the video, uniform titles, variant titles, series titles, television program titles and titles of contents)
* Authors and titles of related works on which an adaptation is based
* Duration
* Color
* Sound vs. silent

Tasks for Position 2:

* Format (DVD, VHS, film, online, etc.)
* Original language
* Country of production
* Aspect ratio
* Flag for whether a record represents multiple works or not

We have already done some work with dates, names and roles and have a framework to work in. I have the basic logic for the data extraction processes, but expect to need some iteration to refine these strategies.
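
To give a rough sense of the kind of extraction and normalization involved, here is a minimal Python sketch for one item on the list, duration. It is only an illustration, not the project's actual logic or framework: the function name, the regular expressions, and the assumption that the running time appears as free text in a physical description statement (such as MARC field 300) are all hypothetical, and real records will need more patterns than these.

```python
import re

# Hypothetical patterns for running times as they often appear in free-text
# extent statements, e.g. "1 videodisc (ca. 95 min.)" or "(1 hr., 35 min.)".
_HOURS = re.compile(r"(\d+)\s*(?:hrs?|hours?)\b", re.IGNORECASE)
_MINUTES = re.compile(r"(\d+)\s*(?:mins?|minutes?)\b", re.IGNORECASE)


def duration_minutes(extent):
    """Return the running time in whole minutes, or None if none is found."""
    hours = _HOURS.search(extent)
    minutes = _MINUTES.search(extent)
    if not hours and not minutes:
        return None
    total = 0
    if hours:
        total += int(hours.group(1)) * 60
    if minutes:
        total += int(minutes.group(1))
    return total


if __name__ == "__main__":
    for text in ("1 videodisc (ca. 95 min.)",
                 "1 videocassette (1 hr., 35 min.)",
                 "1 videocassette (VHS)"):
        print(text, "->", duration_minutes(text))
```

Returning None when no duration is found, rather than 0, keeps "not stated" separate from a genuine value, which matters when results are reviewed against a sample dataset.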

To apply please send me an email at kelleym@uoregon explaining why you are interested in this project, what relevant experience you would bring and any other reasons why I should hire you. If you have a preference for position 1 or 2, let me know (it's not necessary to have a preference). The deadline for applications is Monday, December 2, 2013. Let me know if you have any questions.

Thank you for your consideration.

Kelley

PS In the near future, I will also be looking for someone to help with work clustering based on title, name, date and identifier data from MARC records. This will not involve any direct interaction with MARC.

Kelley McGrath
Metadata Management Librarian
University of Oregon Libraries
541-346-8232