Cool set of questions! Here's a funny "cheat" -- what about querying Amazon or the like for a list of "Cliff's Notes" and calling the subjects of the Cliff's Notes "the Canon"? That could serve as the canon list. Another idea would be to consult a reference work, but I can't think of a good source offhand. One imperfect example is the "Dictionary of Literary Biography"; in that case, the Canon is defined by what the reference work includes.
As for finding lead character names, that's something I don't have an immediate answer for.
Elizabeth "Lisa" McAulay
Librarian for Digital Collection Development
UCLA Digital Library Program
email: emcaulay [at] library.ucla.edu
From: Code for Libraries on behalf of davesgonechina
Sent: Monday, April 13, 2015 7:12 PM
To: [log in to unmask]
Subject: [CODE4LIB] Protagonists
So I have this idea I'd like to do for a hobby project, but it requires
finding a table that lists a classic novel, a Gutenberg.org link to an
instance of that work (first listed, one with most downloads, whichever),
the lead female character, and the lead male character (can be null). E.g.
Pride and Prejudice, http://www.gutenberg.org/ebooks/42671, Elizabeth
Bennet, Mr. Darcy. Even leaving the Gutenberg part for another day, this
has been really difficult to find.
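
To make the requested table concrete, here is a minimal Python sketch of one row and a parser for comma-separated lines shaped like the example above. The names `NovelRow`, `parse_rows`, and `SAMPLE` are hypothetical, and the four-column layout is only an assumption drawn from the description; nothing here queries Gutenberg or any other service.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NovelRow:
    """One row of the hypothetical table described above."""
    title: str
    gutenberg_url: str
    lead_female: str
    lead_male: Optional[str]  # "can be null", so an empty field becomes None


def parse_rows(text: str) -> List[NovelRow]:
    """Parse lines of the form: title, url, lead female, lead male."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        title, url, female, male = (field.strip() for field in record)
        rows.append(NovelRow(title, url, female, male or None))
    return rows


# Sample row taken from the example in the message; the data is illustrative only.
SAMPLE = (
    "Pride and Prejudice, http://www.gutenberg.org/ebooks/42671, "
    "Elizabeth Bennet, Mr. Darcy\n"
)

rows = parse_rows(SAMPLE)
print(rows[0])
```

Titles that contain commas would need to be quoted in the input for this layout to work.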
I've had no success with DBpedia/Wikidata: there's no real
standardized format for novels, characters are often associated more
strongly with films or video games than with the original works (Cheshire
Cat), and when characters are listed they are neither prioritized nor linked to a
record that clearly states gender. And then there's how to select some sort
of "Western Canon" list. ISBNs are nowhere to be found, nor any other
identifier that might help to corral a fair chunk of results.
I looked at OCLC, but WorldCat Works is still an experiment and frankly
looks like too much work to query for too little return even if it had good
coverage. Amazon? LibraryThing? Goodreads? No luck yet.
I raise this partly because a) I would like to make some toys with that
list, and b) I feel this is a good test case for "what developers might
want" from library data, linked or otherwise. It is the sort of request
that carries many unspoken assumptions (that there is a canon, and that it
is well-defined), and the sort of list that app users, product managers,
and developers typically want even if it is woefully incomplete or
imperfect, so long as it matches expectations. While I appreciate what it
takes to make such a list, I feel
like this really ought to be a solved problem in the library space. Not "in
the process of being solved, hopefully, by new emerging standards" solved,
but like "we solved this ages ago, here ya go" solved.
I'm posting this basically in the hopes that someone will say "No, doofus,
there's an easy way to do this, you just aren't very good at this - look:"
and show me where I'm wrong.