Is it out of the question to extract technical metadata from the
audiovisual materials themselves (via MediaInfo et al.)? It would minimize
the amount of MARC that needs to be processed and give more
accurate/complete data than relying on old cataloging records.
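
A rough sketch of what I mean, in Python. It assumes the MediaInfo command-line tool is installed and that its JSON report uses the usual media/track layout with string-valued fields; the function names and the choice of fields are made up for illustration and are not part of Kelley's specifications.

```python
import json
import subprocess


def run_mediainfo(path):
    """Run the MediaInfo CLI on one file and return its JSON report as text.

    Assumes a `mediainfo` binary on PATH that accepts --Output=JSON.
    """
    result = subprocess.run(
        ["mediainfo", "--Output=JSON", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def summarize(report_json):
    """Pull a few fields of interest out of a MediaInfo JSON report."""
    media = json.loads(report_json).get("media") or {}
    by_type = {}
    for track in media.get("track", []):
        # Keep only the first track of each type (General, Video, Audio, ...).
        by_type.setdefault(track.get("@type"), track)

    general = by_type.get("General", {})
    video = by_type.get("Video", {})
    duration = general.get("Duration")  # assumed to be seconds, as a string
    return {
        "duration_seconds": float(duration) if duration is not None else None,
        "aspect_ratio": video.get("DisplayAspectRatio"),
        "has_sound": "Audio" in by_type,
        "container": general.get("Format"),
    }
```

Calling summarize(run_mediainfo(path)) on each file would give duration, aspect ratio and sound vs. silent directly. This only helps for items that exist as digital files, and titles, original language and country of production would still have to come from the MARC records.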
On Mon, Dec 2, 2013 at 12:37 AM, Kelley McGrath <[log in to unmask]> wrote:
> I wanted to follow up on my previous post with a couple points.
> 1. This is probably too late for anybody thinking about applying, but I
> thought there may be some general interest. I have put up some more
> detailed specifications about what I am hoping to do at
> http://pages.uoregon.edu/kelleym/miw/. Data extraction overview.doc is
> the general overview and the other files contain supporting documents.
> 2. I replied some time ago to Heather's offer below about her website that
> will connect researchers with volunteer software developers. I have to
> admit that looking for volunteer software developers had not really
> occurred to me. However, I do have additional things that I would like to
> do for which I currently have no funding, so if you would be interested in
> volunteering in the future, let me know.
> On Tue, Nov 12, 2013 at 6:33 PM, Heather Claxton <[log in to unmask]> wrote:
> Hi Kelley,
> I might be able to help in your search. I'm in the process of starting a
> website that connects academic researchers with volunteer software
> developers. I'm looking for people to post programming projects on the
> website once it's launched in late January. I realize that may be a
> little late for you, but perhaps the project you mentioned in your PS
> ("clustering based on title, name, date etc.") would be perfect? The
> one caveat is that the website is targeting software developers who wish to
> volunteer. Anyway, if you're interested in posting, please send me an
> e-mail at [log in to unmask]
> I would greatly appreciate it.
> Oh and of course it would be free to post :) Best of luck in your
> hiring process,
> Heather Claxton-Douglas
> On Mon, Nov 11, 2013 at 9:58 PM, Kelley McGrath <[log in to unmask]> wrote:
> > I have a small amount of money to work with and am looking for two people
> > to help with extracting data from MARC records as described below. This is
> > part of a larger project to develop a FRBR-based data store and discovery
> > interface for moving images. Our previous work includes a consideration of
> > the feasibility of the project from a cataloging perspective (
> > http://www.olacinc.org/drupal/?q=node/27), a prototype end-user interface
> > (https://blazing-sunset-24.heroku.com/,
> > https://blazing-sunset-24.heroku.com/page/about) and a web form to
> > crowdsource the parsing of movie credits (
> > http://olac-annotator.org/#/about).
> > Planned work period: six months beginning around the second week of
> > December (I can be somewhat flexible on the dates if you want to wait and
> > start after the New Year)
> > Payment: flat sum of $2500 upon completion of the work
> > Required skills and knowledge:
> > * Familiarity with the MARC 21 bibliographic format
> > * Familiarity with Natural Language Processing concepts (or
> > willingness to learn)
> > * Experience with Java, Python, and/or Ruby programming languages
> > Description of work: Use language and text processing tools and provided
> > strategies to write code to extract and normalize data in existing MARC
> > bibliographic records for moving images. Refine code based on feedback and
> > analysis of results obtained with a sample dataset.
> > Data to be extracted:
> > Tasks for Position 1:
> > * Titles (including the main title of the video, uniform titles, variant
> > titles, series titles, television program titles and titles of contents)
> > * Authors and titles of related works on which an adaptation is based
> > * Duration
> > * Color
> > * Sound vs. silent
> > Tasks for Position 2:
> > * Format (DVD, VHS, film, online, etc.)
> > * Original language
> > * Country of production
> > * Aspect ratio
> > * Flag for whether a record represents multiple works or not
> > We have already done some work with dates, names and roles and have a
> > framework to work in. I have the basic logic for the data extraction
> > processes, but expect to need some iteration to refine these strategies.
> > To apply please send me an email at kelleym@uoregon explaining why you
> > are interested in this project, what relevant experience you would bring
> > and any other reasons why I should hire you. If you have a preference for
> > position 1 or 2, let me know (it's not necessary to have a preference). The
> > deadline for applications is Monday, December 2, 2013. Let me know if you
> > have any questions.
> > Thank you for your consideration.
> > Kelley
> > PS In the near future, I will also be looking for someone to help with
> > work clustering based on title, name, date and identifier data from MARC
> > records. This will not involve any direct interaction with MARC.
> > Kelley McGrath
> > Metadata Management Librarian
> > University of Oregon Libraries
> > 541-346-8232
> > [log in to unmask]