Eric,
I think loading them into a triplestore and trying to answer questions is a fine idea. From there, you might be able to create some visualizations, if you have those skills at hand. This also strikes me as the sort of data that could augment data in a research profiling system.
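To make that concrete, here is a rough sketch of the kind of thing I mean, assuming Python with rdflib and networkx; the file layout and the property choices are illustrative, not prescriptive:

import glob
import rdflib
import networkx as nx

DCTERMS = rdflib.Namespace("http://purl.org/dc/terms/")
FOAF = rdflib.Namespace("http://xmlns.com/foaf/0.1/")

# load every harvested RDF/XML file into one in-memory graph
graph = rdflib.Graph()
for path in glob.glob("rdf/*.rdf"):
    graph.parse(path, format="xml")

# build an author-to-journal network: who publishes where?
network = nx.Graph()
for article, creator in graph.subject_objects(DCTERMS.creator):
    name = graph.value(creator, FOAF.name)
    journal = graph.value(article, DCTERMS.isPartOf)
    title = graph.value(journal, DCTERMS.title) if journal else None
    if name and title:
        network.add_edge(str(name), str(title))

# degree centrality is a first hint at which nodes are "central"
ranked = sorted(nx.degree_centrality(network).items(), key=lambda x: -x[1])
print(ranked[:10])

From a network like that, networkx can hand the nodes and edges to a plotting library, which is where the visualizations would come in.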
If you'd like to see an example of what others have done with harvested linked data, check out CTSASearch: http://research.icts.uiowa.edu/polyglot/
Marijane White, M.S.L.I.S.
Data Librarian, Assistant Professor
Oregon Health & Science University Library
Phone: 503.494.3484
Email: [log in to unmask]
ORCiD: https://orcid.org/0000-0001-5059-4132
On 2019/01/15, 6:38 AM, "Code for Libraries on behalf of Eric Lease Morgan" <[log in to unmask] on behalf of [log in to unmask]> wrote:
How might I exploit & learn from a set of RDF files harvested from DOIs?
For a good time, I have written a suite of software to harvest bibliographic data from Web of Science, cache the results, and report on the whole. [1] Along the way I programmatically collect DOIs and then resolve them. The results include RDF streams. ("Thanks, Kevin Ford!") For example:
curl -i -L -H "Accept: application/rdf+xml" http://dx.doi.org/10.3352/jeehp.2013.10.3
And:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://purl.org/dc/terms/"
    xmlns:j.1="http://prismstandard.org/namespaces/basic/2.1/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:j.2="http://purl.org/ontology/bibo/"
    xmlns:j.3="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://dx.doi.org/10.3352/jeehp.2013.10.3">
    <j.0:isPartOf>
      <j.2:Journal rdf:about="http://id.crossref.org/issn/1975-5937">
        <owl:sameAs>urn:issn:1975-5937</owl:sameAs>
        <j.0:title>Journal of Educational Evaluation for Health Professions</j.0:title>
        <j.1:issn>1975-5937</j.1:issn>
        <j.2:issn>1975-5937</j.2:issn>
      </j.2:Journal>
    </j.0:isPartOf>
    <j.0:creator>
      <j.3:Person rdf:about="http://id.crossref.org/contributor/sun-huh-112veziy3vi1o">
        <j.3:name>Sun Huh</j.3:name>
        <j.3:familyName>Huh</j.3:familyName>
        <j.3:givenName>Sun</j.3:givenName>
      </j.3:Person>
    </j.0:creator>
    <j.0:title>Revision of the instructions to authors to require... </j.0:title>
    <j.1:doi>10.3352/jeehp.2013.10.3</j.1:doi>
    <j.0:date rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2013-04-30</j.0:date>
    <owl:sameAs rdf:resource="info:doi/10.3352/jeehp.2013.10.3"/>
    <j.0:identifier>10.3352/jeehp.2013.10.3</j.0:identifier>
    <j.2:volume>10</j.2:volume>
    <j.2:pageStart>3</j.2:pageStart>
    <j.1:startingPage>3</j.1:startingPage>
    <j.0:publisher>XMLArchive</j.0:publisher>
    <owl:sameAs rdf:resource="doi:10.3352/jeehp.2013.10.3"/>
    <j.1:volume>10</j.1:volume>
    <j.2:doi>10.3352/jeehp.2013.10.3</j.2:doi>
  </rdf:Description>
</rdf:RDF>
That's a pretty rich RDF stream! [2]
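Resolving DOIs in bulk, the way the curl example above does one at a time, might look something like this in Python. This is only a sketch; the requests library and the cache layout are my assumptions here, not how the suite in [1] actually does it:

import pathlib
import requests

CACHE = pathlib.Path("cache")
CACHE.mkdir(exist_ok=True)

def resolve(doi):
    # ask the DOI resolver for RDF/XML; redirects are followed by default
    response = requests.get(
        "http://dx.doi.org/" + doi,
        headers={"Accept": "application/rdf+xml"},
        timeout=30,
    )
    response.raise_for_status()
    # cache the RDF stream under a filesystem-safe name
    (CACHE / (doi.replace("/", "_") + ".rdf")).write_text(response.text)

resolve("10.3352/jeehp.2013.10.3")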
As of right now, I have about 8,000 of these streams representing publications of faculty here at my university. I can easily get tens of thousands more. How might I take advantage of this data? How can I go beyond parsing the RDF with XPath, stuffing the results into a database, and applying SQL to the result? How can I truly exploit the nature of the RDF and possibly manifest it as linked data?
To answer my own question, I might put the data into a triple store, and then try to answer questions such as: what authors are central, what journals are central, what authors are "related" to whom, etc.
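As a sketch of what that might look like, with rdflib's in-memory graph standing in for a real triple store (Fuseki, Virtuoso, etc.), a SPARQL query can already rank journals by how many of the harvested articles appeared in them:

import glob
import rdflib

# load the cached RDF/XML streams into one queryable graph
graph = rdflib.Graph()
for path in glob.glob("cache/*.rdf"):
    graph.parse(path, format="xml")

query = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?title (COUNT(?article) AS ?articles)
WHERE {
  ?article dcterms:isPartOf ?journal .
  ?journal dcterms:title ?title .
}
GROUP BY ?title
ORDER BY DESC(?articles)
LIMIT 10
"""

for row in graph.query(query):
    print(row.articles, row.title)

The same pattern, swapping in dcterms:creator and foaf:name, would get at the author questions.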
What do you think?
[1] https://github.com/ericleasemorgan/api-taskforce
[2] And this rich data does not even take into account the cool, sometimes full-text URLs/URIs found in the HTTP Link header!
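For the curious, a quick way to peek at those Link headers, assuming Python's requests library, which parses them into response.links:

import requests

response = requests.get(
    "http://dx.doi.org/10.3352/jeehp.2013.10.3",
    headers={"Accept": "application/rdf+xml"},
)
# each entry maps a link relation to its target URL
for rel, link in response.links.items():
    print(rel, link.get("url"))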
--
Eric Lease Morgan
Digital Initiatives Librarian, Navari Family Center for Digital Scholarship
Hesburgh Libraries
University of Notre Dame
250E Hesburgh Library
Notre Dame, IN 46556
o: 574-631-8604
e: [log in to unmask]
w: cds.library.nd.edu