Hi Kelley, 

Thanks for posting this. When I began work on I was hoping it would encourage people to post short-term contracts. The thought was that it may be easier for some institutions to find money for projects than for full-time staff, and that it could encourage more open source collaboration between organizations, similar to what the Hydra Project are doing.

So, I added your post to [1]. Ordinarily the person who publishes a job posting is the only one who can edit it. But if you would like to make any changes to it, please let me know and I'll make you the editor.

Incidentally, I was curious about your decision to hire two programmers to do what appear to be very similar tasks. Was your intent to have two implementations to compare, to see which you liked better? Were the two developers supposed to work together or separately?



On Nov 11, 2013, at 10:58 PM, Kelley McGrath wrote:

> I have a small amount of money to work with and am looking for two people to help with extracting data from MARC records as described below. This is part of a larger project to develop a FRBR-based data store and discovery interface for moving images. Our previous work includes a consideration of the feasibility of the project from a cataloging perspective, a prototype end-user interface, and a web form to crowdsource the parsing of movie credits.
> Planned work period: six months beginning around the second week of December (I can be somewhat flexible on the dates if you want to wait and start after the New Year)
> Payment: flat sum of $2500 upon completion of the work
> Required skills and knowledge:
>  *   Familiarity with the MARC 21 bibliographic format
>  *   Familiarity with Natural Language Processing concepts (or willingness to learn)
>  *   Experience with Java, Python, and/or Ruby programming languages
> Description of work: Use language and text processing tools and provided strategies to write code to extract and normalize data in existing MARC bibliographic records for moving images. Refine code based on feedback from analysis of results obtained with a sample dataset.
> Data to be extracted:
> Tasks for Position 1:
>  *   Titles (including the main title of the video, uniform titles, variant titles, series titles, television program titles and titles of contents)
>  *   Authors and titles of related works on which an adaptation is based
>  *   Duration
>  *   Color
>  *   Sound vs. silent
> Tasks for Position 2:
>  *   Format (DVD, VHS, film, online, etc.)
>  *   Original language
>  *   Country of production
>  *   Aspect ratio
>  *   Flag for whether a record represents multiple works or not
> We have already done some work with dates, names and roles and have a framework to work in. I have the basic logic for the data extraction processes, but expect to need some iteration to refine these strategies.
> To apply please send me an email at kelleym@uoregon explaining why you are interested in this project, what relevant experience you would bring and any other reasons why I should hire you. If you have a preference for position 1 or 2, let me know (it's not necessary to have a preference). The deadline for applications is Monday, December 2, 2013. Let me know if you have any questions.
> Thank you for your consideration.
> Kelley
> PS In the near future, I will also be looking for someone to help with work clustering based on title, name, date and identifier data from MARC records. This will not involve any direct interaction with MARC.
> Kelley McGrath
> Metadata Management Librarian
> University of Oregon Libraries
> 541-346-8232