On Feb 26, 2012, at 9:42 AM, Godmar Back wrote:
> May I ask a side question and make a side observation regarding the
> harvesting of full text of the object to which an OAI-PMH record refers?
>
> In general, is the idea to use the <dc:source>/text() element, treat it as
> a URL, and then expect to find the object there (provided that there was a
> suitable <dc:type> and <dc:format> element)?
>
> Example: http://scholar.lib.vt.edu/theses/OAI/cgi-bin/index.pl allows the
> harvesting of ETD metadata. Yet, its metadata reads:
>
> <ListRecords>
> ....
> <metadata>
> <dc>
> <type>text</type>
> <format>application/pdf</format>
> <source>
> http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/</source>
> ....
>
>
> When one visits
> http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/ however
> there is no 'text' document of type 'application/pdf' - rather, it's an
> HTML title page that embeds links to one or more PDF documents, such as
> http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/unrestricted/Walker_1.pdf
> to Walker_5.pdf.
>
> Is VT's ETD OAI implementation deficient, or is OAI-PMH simply not set up
> to allow the harvesting of full-text without what would basically amount to
> crawling the ETD title page, or other repository-specific mechanisms?
I don't know if it's the official method, and I've never actually
implemented OAI-PMH myself, but I'd be inclined to have <source>
point to an OAI-ORE document, which can then point to the PDF,
full text, or whatever else.
If it's not currently an ORE document, you might still be able to
do some creative redirection on the webserver: if you see the
appropriate Accept header, handle it as you would normal
content negotiation.
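A rough sketch of what that negotiation might look like on the server
side, in Python. The function names, the example URLs, and the choice of
a 303 redirect are all assumptions made for illustration; this is not how
any particular repository does it. Wildcards such as */* are deliberately
not treated as a request for the resource map, so ordinary browsers still
get the HTML title page.

```python
# Media types assumed here to identify an ORE resource map serialization.
ORE_TYPES = ("application/rdf+xml", "application/atom+xml")
HTML_TYPES = ("text/html", "application/xhtml+xml")


def parse_accept(header):
    """Return a list of (media_type, q) pairs from an Accept header value."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        prefs.append((media_type, q))
    return prefs


def choose_representation(accept_header, resource_map_url, html_url):
    """Return (status, url): redirect to the resource map only when an
    ORE type is explicitly requested and ranked at least as high as HTML."""
    prefs = dict(parse_accept(accept_header or ""))
    best_ore = max(prefs.get(t, 0.0) for t in ORE_TYPES)
    best_html = max(prefs.get(t, 0.0) for t in HTML_TYPES)
    if best_ore > 0 and best_ore >= best_html:
        return 303, resource_map_url
    return 200, html_url
```

A harvester sending "Accept: application/rdf+xml" would be redirected to
the (hypothetical) resource map URL, while a browser's usual Accept
header would fall through to the HTML page.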
You could also add a 'resourcemap' <link> element in the HTML
page to point to the ORE document. If it's XHTML, you could
add the appropriate ORE elements; I think the microformat
style HTML was deprecated, as it's not mentioned in the 1.0 spec:
http://www.openarchives.org/ore/1.0/
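On the harvesting side, picking up that <link> element could look
something like the sketch below. It uses only the Python standard
library; the class and function names are made up for illustration, and
it assumes the page advertises the map with rel="resourcemap" on a
<link> element.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceMapLinkFinder(HTMLParser):
    """Collect (absolute_url, media_type) for <link rel="resourcemap"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resource_maps = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing <link ... /> tags.
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "resourcemap" in rels and href:
            # Resolve relative hrefs against the page's own URL.
            self.resource_maps.append(
                (urljoin(self.base_url, href), attr.get("type"))
            )


def find_resource_maps(html_text, base_url):
    """Return every resource map link advertised in an HTML page."""
    finder = ResourceMapLinkFinder(base_url)
    finder.feed(html_text)
    return finder.resource_maps
```

A harvester would fetch the URL from <source>, run the HTML through
find_resource_maps(), and then fetch whatever map it finds instead of
crawling the title page for PDF links.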
-Joe