Cool set of questions!

Here's a funny "cheat" -- what about querying Amazon or the like for a list of "Cliff's Notes" and calling the subjects of the Cliff's Notes "the Canon"? That could serve as a canon list.

Another idea would be to consult a reference work, but I can't think of a good source offhand. One example that's not perfect is the "Dictionary of Literary Biography." The Canon is then defined by what the reference work includes.

As for finding lead character names, that's something I don't have an immediate answer for.

Good luck!

Best,
Lisa

-------------------------------------
Elizabeth "Lisa" McAulay
Librarian for Digital Collection Development
UCLA Digital Library Program
http://digital.library.ucla.edu/
email: emcaulay [at] library.ucla.edu

________________________________________
From: Code for Libraries on behalf of davesgonechina
Sent: Monday, April 13, 2015 7:12 PM
Subject: [CODE4LIB] Protagonists

So I have this idea I'd like to do for a hobby project, but it requires finding a table that lists a classic novel, a Gutenberg.org link to an instance of that work (first listed, one with most downloads, whichever), the lead female character, and the lead male character (can be null). E.g. Pride and Prejudice, http://www.gutenberg.org/ebooks/42671, Elizabeth Bennet, Mr. Darcy.

Even leaving the Gutenberg part for another day, this has been really difficult to find. I've had no success with DBpedia/Wikidata since there's no real standardized format for novels, characters are often associated more strongly with films or video games than with the original works (Cheshire Cat), and when characters are listed they are neither prioritized nor linked to a record that clearly states gender. And then there's how to select some sort of "Western Canon" list. ISBNs are nowhere to be found, nor any other identifier that might help to corral a fair chunk of results.
I looked at OCLC, but WorldCat Works is still an experiment and frankly looks like too much work to query for too little return, even if it had good coverage. Amazon? LibraryThing? Goodreads? No luck yet.

I raise this partly because a) I would like to make some toys with that list, and b) I feel this is a good test case for "what developers might want" from library data, linked or otherwise. It is the sort of request that carries many unspoken assumptions (that there is a canon, and that it is well-defined), and the sort of list that app users, product managers, and developers typically want even if it is woefully incomplete or imperfect, so long as it matches expectations.

While I appreciate what it takes to make such a list, I feel like this really ought to be a solved problem in the library space. Not "in the process of being solved, hopefully, by new emerging standards" solved, but like "we solved this ages ago, here ya go" solved.

I'm posting this basically in the hopes that someone will say "No, doofus, there's an easy way to do this, you just aren't very good at this - look:" and show me where I'm wrong.

D
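
For illustration only, the table described in the original message (novel, Gutenberg link, lead female character, lead male character that "can be null") could be sketched in Python roughly as below. The names NovelRow and parse_rows are hypothetical, the comma-separated layout is an assumption based on the "E.g." line, and the only data is the one example row from the post. Nothing here points to an actual source for such a list.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NovelRow:
    """One row of the requested table (hypothetical structure)."""
    title: str
    gutenberg_url: str
    lead_female: str
    lead_male: Optional[str]  # the post says this one "can be null"


def parse_rows(text: str) -> List[NovelRow]:
    """Parse comma-separated rows with exactly four fields each.

    Assumption: fields are separated by commas, as in the post's example.
    An empty last field is stored as None.
    """
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not rec:
            continue  # skip blank lines
        title, url, female, male = (field.strip() for field in rec)
        rows.append(NovelRow(title, url, female, male or None))
    return rows


# The single example row given in the original message.
SAMPLE = (
    "Pride and Prejudice, http://www.gutenberg.org/ebooks/42671, "
    "Elizabeth Bennet, Mr. Darcy\n"
)

if __name__ == "__main__":
    for row in parse_rows(SAMPLE):
        print(row)
```

A row with no lead male character would simply leave the last field empty, which parse_rows stores as None.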