Is the Freebase data good enough for your purposes? It appears that it lists the most important characters first, but that may just be the order in which they were added. You may not be able to rely on that sequence.
A Tale of Two Cities: http://www.freebase.com/m/09c55p
Pride and Prejudice: http://www.freebase.com/m/060xy
-Josh
Joshua Gomez | Sr. Software Engineer
Getty Research Institute | Los Angeles, CA
(310) 440-7410
________________________________________
From: Code for Libraries <[log in to unmask]> on behalf of Joel Marchesoni <[log in to unmask]>
Sent: Tuesday, April 14, 2015 7:18 AM
To: Joshua Gomez; Code for Libraries
Subject: Re: [CODE4LIB] Protagonists
ISBNdb [1] was the closest thing I could find but is probably not filled out enough for what you're wanting to do. I also found RDF Book Mashup [2], but it's nowhere near as granular as what you are talking about and looks pretty much dead (no news since 2009).
I agree that this seems like it would fall to library workers to solve, or at the very least someone passionate about books. It is a little disappointing that I couldn't find the IMDB of the literary world. I think ISBNdb started out to be that but hasn't quite gotten there yet. Search results for "IMDB for books" mostly focused on the social aspects of IMDB and not the actual database part.
Reading the IMDB "origin story" [3], it started with a message much like yours on a Usenet...
[1] http://isbndb.com/
[2] http://wifo5-03.informatik.uni-mannheim.de/bizer/bookmashup/
[3] http://en.wikipedia.org/wiki/Internet_Movie_Database#History
Joel Marchesoni
Tech Support Analyst
Hunter Library, Western Carolina University
http://library.wcu.edu/
-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of davesgonechina
Sent: Monday, April 13, 2015 22:12
To: [log in to unmask]
Subject: [CODE4LIB] Protagonists
So I have this idea I'd like to do for a hobby project, but it requires finding a table that lists a classic novel, a Gutenberg.org link to an instance of that work (first listed, one with most downloads, whichever), the lead female character, and the lead male character (can be null). E.g.
Pride and Prejudice, http://www.gutenberg.org/ebooks/42671, Elizabeth Bennet, Mr. Darcy. Even leaving the Gutenberg part for another day, this has been really difficult to find.
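For concreteness, the table described above could be modeled roughly as follows. This is only a sketch of the row shape in the example, not an existing dataset or API: the names NovelRecord and parse_rows are made up for illustration, and it assumes each row has exactly four comma-separated fields, with an empty field standing in for a null character.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NovelRecord:
    """One row of the hoped-for table (hypothetical structure)."""
    title: str
    gutenberg_url: str
    lead_female: Optional[str]  # None when no lead female character is listed
    lead_male: Optional[str]    # None when no lead male character is listed


def parse_rows(text: str) -> List[NovelRecord]:
    """Parse comma-separated rows of: title, URL, lead female, lead male."""
    records = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        # Assumes exactly four fields; any other count raises ValueError.
        title, url, female, male = (cell.strip() for cell in row)
        # An empty character field is treated as null.
        records.append(NovelRecord(title, url, female or None, male or None))
    return records


SAMPLE = (
    "Pride and Prejudice, http://www.gutenberg.org/ebooks/42671, "
    "Elizabeth Bennet, Mr. Darcy\n"
)
records = parse_rows(SAMPLE)
```

Keeping both character fields optional reflects the "can be null" requirement; titles that themselves contain commas would need to be quoted in the input.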
I've had no success with DBpedia/Wikidata: there's no real standardized format for novels, characters are often associated more strongly with films or video games than with the original works (Cheshire Cat), and when characters are listed they are neither prioritized nor linked to a record that clearly states gender. And then there's the question of how to select some sort of "Western Canon" list. ISBNs are nowhere to be found, nor any other identifier that might help to corral a fair chunk of results.
I looked at OCLC, but WorldCat Works is still an experiment and frankly looks like too much work to query for too little return even if it had good coverage. Amazon? LibraryThing? Goodreads? No luck yet.
I raise this partly because a) I would like to make some toys with that list, and b) I feel this is a good test case for "what developers might want" from library data, linked or otherwise. It is the sort of request that includes many unspoken assumptions (that there is a canon, and it is well-defined) that app users, product managers, and developers typically want even if it is woefully incomplete or imperfect, so long as it matches expectations. While I appreciate what it takes to make such a list, I feel like this really ought to be a solved problem in the library space. Not "in the process of being solved, hopefully, by new emerging standards" solved, but like "we solved this ages ago, here ya go" solved.
I'm posting this basically in the hopes that someone will say "No, doofus, there's an easy way to do this, you just aren't very good at this - look:" and show me where I'm wrong.
D