I declare this scriptable and doable, just not by me, since I can't programme me way out of a wet paper bag. (Well, I prolly can at gunpoint, but yeah, that's what it would take.)
> So I have this idea I'd like to do for a hobby project, but it requires
> finding a table that lists a classic novel,
First, I'm afraid you'll have to define how you're choosing to categorise Classics. As someone charged with that task, I can tell you it sucks and it's not as straightforward as one might think. I'd encourage you to either lewt and pillage someone else's preextant classification or pick summat easier for a computer, like publication date or inclusion on one of those stuffy arse bibliographies of the 100 Greatest Books. (Please do mull over how white bespoke lists tend to be.)
> a Gutenberg.org link to an instance of that work (first listed, one with most downloads, whichever),
> the lead female character, and the lead male character (can be null). E.g.
> Pride and Prejudice, http://www.gutenberg.org/ebooks/42671, Elizabeth
> Bennet, Mr. Darcy. Even leaving the Gutenberg part for another day, this
> has been really difficult to find.
Might I suggest having your scraper haphazardly search through MARC 650 $a fields (the topical subject headings) for the phrase "Fictitious character"?
> I've had no success with Dbpedia/Wikidata since there's no real
> standardized format for novels, characters often are associated more
> strongly with films or video games than original works (Cheshire Cat), and
> when characters are listed they are neither prioritized nor link to a
> record that clearly states gender.
Thanks to the antiquated subject stuff that happens at dear olde LOC, picking through the same data for "Women" should also get you some gender data.
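For whoever does end up scripting it, here is a rough sketch of that 650 $a trawl. It assumes the records are available as MARCXML (the MARC21 "slim" schema) and uses only the Python standard library. The sample record is hand-typed for illustration, not pulled from a real catalogue, and the names `subject_headings` and `harvest` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by the MARCXML "slim" schema.
MARC_URI = "http://www.loc.gov/MARC21/slim"
MARC_NS = {"marc": MARC_URI}

# Hand-written illustrative record (an assumption, not real catalogue data).
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Pride and prejudice</subfield>
    </datafield>
    <datafield tag="650" ind1=" " ind2="0">
      <subfield code="a">Bennet, Elizabeth (Fictitious character)</subfield>
      <subfield code="v">Fiction.</subfield>
    </datafield>
    <datafield tag="650" ind1=" " ind2="0">
      <subfield code="a">Young women</subfield>
      <subfield code="z">England</subfield>
    </datafield>
  </record>
</collection>"""


def subject_headings(record):
    """Yield the text of every 650 $a subfield in one MARCXML record."""
    path = "marc:datafield[@tag='650']/marc:subfield[@code='a']"
    for sub in record.findall(path, MARC_NS):
        if sub.text:
            yield sub.text.strip()


def harvest(marcxml, phrases=("fictitious character", "women")):
    """Return (title, heading, matched_phrases) for each 650 $a that hits."""
    root = ET.fromstring(marcxml)
    hits = []
    for record in root.iter("{%s}record" % MARC_URI):
        # 245 $a is the title proper; it may be absent in a sloppy record.
        title_el = record.find(
            "marc:datafield[@tag='245']/marc:subfield[@code='a']", MARC_NS
        )
        title = None
        if title_el is not None and title_el.text:
            title = title_el.text.strip()
        for heading in subject_headings(record):
            lowered = heading.lower()
            matched = [p for p in phrases if p in lowered]
            if matched:
                hits.append((title, heading, matched))
    return hits


if __name__ == "__main__":
    for title, heading, matched in harvest(SAMPLE):
        print(title, "|", heading, "|", ", ".join(matched))
```

It is a plain substring match, so it will happily catch "Young women" and anything else with "women" in it. A heading that hits tells you a woman is a subject of the book, not that she is the lead, so a human still has to sort the results.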
You might scoff, but I do like the lists of lists at Wikipedia in terms of this hypothetical. For instance:
could be quite helpful. One could have one's bot check on that page for edits. Surely this is easier to sort than reinventing the wheel and being one person against a sea of publishers.
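The "check on that page for edits" bit could look summat like the sketch below. It assumes the bot polls the public MediaWiki API on en.wikipedia.org and compares the latest revision ID with the one it saw last time. The page title is a placeholder you would swap for whichever list you settle on, and the function names and the User-Agent string are invented for this example.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"


def revision_query_url(title):
    """Build a MediaWiki API URL asking for the newest revision of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp",
        "rvlimit": "1",
        "format": "json",
        "formatversion": "2",
    }
    return API + "?" + urllib.parse.urlencode(params)


def latest_revision(response):
    """Pull (revid, timestamp) out of a decoded API response, or None."""
    page = response["query"]["pages"][0]
    revisions = page.get("revisions")
    if not revisions:
        # Missing or deleted pages come back without a "revisions" key.
        return None
    rev = revisions[0]
    return rev["revid"], rev["timestamp"]


def fetch_latest_revision(title):
    """Do the actual network call (not exercised in the usage below)."""
    request = urllib.request.Request(
        revision_query_url(title),
        # Hypothetical User-Agent; identify your own script here.
        headers={"User-Agent": "classics-list-watcher/0.1 (hobby script)"},
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return latest_revision(json.load(resp))


def has_new_edit(last_seen_revid, latest):
    """True when the page exists and its newest revision is not the one seen."""
    return latest is not None and latest[0] != last_seen_revid
```

Stash the last seen revision ID in a file, run `fetch_latest_revision("Your list page title")` from cron once a day or so, and only re-harvest when `has_new_edit` says the page has moved.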
Were I you, I'd also be keen to hook in Open Library since closed datakeepers have a nasty tendency of waking up and deciding to charge or lock things away.
> And then there's how to select some sort of "Western Canon" list. ISBNs are nowhere to be found, nor any other
> identifier that might help to corral a fair chunk of results.
> I looked at OCLC, but WorldCat Works is still an experiment and frankly
> looks like too much work to query for too little return even if it had good
> coverage. Amazon? Librarything? Goodreads? No luck yet.
If you must try the proprietary DB route, did you try Novelist? I really think what you need to do is pick a good cataloguer's brain for a bit and come up with a brute force script that will harvest stuff for you and autoupdate on RSS or summat else, since effort begins with eh. Your data set isn't infinite, it's just not small. I wouldn't even properly call it large, given how much less rich and less problematic textual library data is in comparison to, say, audio or video files.