Hmm. So, I'm a big fan of Wikipedia and would still go that way even
if the data can be haphazard. Wikipedia has a lot of classics with a
section called "Lead characters" (Pride and Prejudice included) where
the focus is the novel first. Those sections should be easy to fetch
and then trim with some simple text parsing to get basic
characterizations, like gender, possibly age, and place and purpose in
the story (main protagonist, antagonist, supporting character, etc.).

I'd start with a page like "Le Monde's 100 Books of the Century"
and visit each entry, scraping for "main characters" or
"characters" headings, then devise a small set of parsing rules to grab
the top ones and their properties. Sounds like a fun day or so.

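Here's a rough sketch of what I mean, in Python with only the standard
library. It assumes the MediaWiki "parse" API for getting raw wikitext,
and it assumes characters are listed as bullet or definition lines under
a heading containing the word "characters". The function names, the
User-Agent string and the splitting rules are all made up for
illustration; real pages vary a lot, so expect to tune the rules per
page and hand-check the results.

```python
import json
import re
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

# Wikitext headings look like "== Title ==" (2 to 6 equals signs).
HEADING = re.compile(r"^(={2,6})[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)
# List items: "* item" bullets or "; term" definition lines.
BULLET = re.compile(r"^[*;]+[ \t]*(.+)$", re.MULTILINE)
# [[Target]] or [[Target|Label]]; keep the visible label.
WIKILINK = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]")
# Assumed separators between a name and its description.
NAME_END = re.compile(r"\s+[\u2013\u2014-]\s+|:\s|,\s")


def fetch_wikitext(title):
    """Fetch the raw wikitext of one article (needs network access)."""
    params = urllib.parse.urlencode({
        "action": "parse", "page": title, "prop": "wikitext",
        "format": "json", "formatversion": "2", "redirects": "1",
    })
    # Hypothetical User-Agent; put your own contact details here.
    req = urllib.request.Request(
        f"{API}?{params}",
        headers={"User-Agent": "protagonist-sketch/0.1 (hobby script)"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["parse"]["wikitext"]


def character_section(wikitext):
    """Return the body of the first heading mentioning 'characters'."""
    headings = list(HEADING.finditer(wikitext))
    for i, h in enumerate(headings):
        if "characters" in h.group(2).lower():
            level = len(h.group(1))
            end = len(wikitext)
            # Stop at the next heading of the same or a higher level.
            for nxt in headings[i + 1:]:
                if len(nxt.group(1)) <= level:
                    end = nxt.start()
                    break
            return wikitext[h.end():end]
    return ""


def strip_markup(text):
    """Drop wikilink brackets and bold/italic quote marks."""
    text = WIKILINK.sub(r"\1", text)
    return text.replace("'''", "").replace("''", "")


def character_names(section, limit=5):
    """Take the leading name from each list item, in page order."""
    names = []
    for m in BULLET.finditer(section):
        name = NAME_END.split(strip_markup(m.group(1)), maxsplit=1)[0].strip()
        if name:
            names.append(name)
    return names[:limit]
```

Usage would be something like
character_names(character_section(fetch_wikitext("Pride and Prejudice"))),
looped over the titles on the list page. Gender, age and role aren't
handled here; those would need a second pass over each item's
description, and probably some manual cleanup.
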
On Tue, Apr 14, 2015 at 3:35 PM, McAulay, Elizabeth
<[log in to unmask]> wrote:
> Cool set of questions! Here's a funny "cheat" -- what about querying Amazon or the like for a list of "Cliff's Notes" and calling the subjects of the Cliff's Notes "the Canon"? That could serve as a canon list. Another idea would be to consult a reference work, but I can't think of a good source offhand. One example that's not perfect is the "Dictionary of Literary Biography." The Canon is created by what is included in the reference work.
> As for finding lead character names, that's something I don't have an immediate answer for.
> Good luck!
> Elizabeth "Lisa" McAulay
> Librarian for Digital Collection Development
> UCLA Digital Library Program
> email: emcaulay [at] library.ucla.edu
> From: Code for Libraries <[log in to unmask]> on behalf of davesgonechina <[log in to unmask]>
> Sent: Monday, April 13, 2015 7:12 PM
> To: [log in to unmask]
> Subject: [CODE4LIB] Protagonists
> So I have this idea I'd like to do for a hobby project, but it requires
> finding a table that lists a classic novel, a Gutenberg.org link to an
> instance of that work (first listed, one with most downloads, whichever),
> the lead female character, and the lead male character (can be null). E.g.
> Pride and Prejudice, http://www.gutenberg.org/ebooks/42671, Elizabeth
> Bennet, Mr. Darcy. Even leaving the Gutenberg part for another day, this
> has been really difficult to find.
> I've had no success with DBpedia/Wikidata since there's no real
> standardized format for novels, characters often are associated more
> strongly with films or video games than original works (Cheshire Cat), and
> when characters are listed they are neither prioritized nor linked to a
> record that clearly states gender. And then there's how to select some sort
> of "Western Canon" list. ISBNs are nowhere to be found, nor any other
> identifier that might help to corral a fair chunk of results.
> I looked at OCLC, but WorldCat Works is still an experiment and frankly
> looks like too much work to query for too little return even if it had good
> coverage. Amazon? LibraryThing? Goodreads? No luck yet.
> I raise this partly because a) I would like to make some toys with that
> list, and b) I feel this is a good test case for "what developers might
> want" from library data, linked or otherwise. It is the sort of request
> that includes many unspoken assumptions (that there is a canon, and it is
> well-defined) that app users, product managers, and developers typically
> want even if it is woefully incomplete or imperfect, so long as it matches
> expectations. While I appreciate what it takes to make such a list, I feel
> like this really ought to be a solved problem in the library space. Not "in
> the process of being solved, hopefully, by new emerging standards" solved,
> but like "we solved this ages ago, here ya go" solved.
> I'm posting this basically in the hopes that someone will say "No, doofus,
> there's an easy way to do this, you just aren't very good at this - look:"
> and show me where I'm wrong.
Project Wrangler, SOA, Info Alchemist, UX, RESTafarian, Topic Maps
http://shelter.nu/blog | google.com/+AlexanderJohannesen
http://xsiteable.org | http://www.linkedin.com/in/shelterit