May I ask a side question and make a side observation regarding the
harvesting of the full text of the object to which an OAI-PMH record refers?
In general, is the idea to take the <dc:source>/text() value, treat it as
a URL, and then expect to find the object there (provided that the record
has suitable <dc:type> and <dc:format> elements)?
Example: http://scholar.lib.vt.edu/theses/OAI/cgi-bin/index.pl allows the
harvesting of ETD metadata. Yet, its metadata reads:
<ListRecords>
....
<metadata>
<dc>
<type>text</type>
<format>application/pdf</format>
<source>
http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/</source>
....
When one visits
http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/ however
there is no 'text' document of type 'application/pdf' - rather, it's an
HTML title page that embeds links to one or more PDF documents, such as
http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/unrestricted/Walker_1.pdf
to Walker_5.pdf.
Is VT's ETD OAI implementation deficient, or is OAI-PMH simply not set up
to allow the harvesting of full-text without what would basically amount to
crawling the ETD title page, or other repository-specific mechanisms?
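
To make the "crawling the title page" workaround concrete, here is a minimal Python sketch of the two steps a harvester would seem to need: read the dc:source value out of a record, then scan the landing page's HTML for links ending in .pdf. Everything in it is an assumption for illustration: the function names, the Dublin Core namespace on the sample record, and the sample landing-page HTML are made up and are not VT's actual markup. It does no network access; fetching the pages is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Assumption: records use the standard Dublin Core element namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"


def source_urls(record_xml):
    """Return the stripped text of every dc:source element in a record."""
    root = ET.fromstring(record_xml)
    return [
        el.text.strip()
        for el in root.iter(f"{{{DC_NS}}}source")
        if el.text and el.text.strip()
    ]


class _LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def pdf_links(landing_url, landing_html):
    """Return absolute URLs of links on the landing page that end in .pdf."""
    collector = _LinkCollector()
    collector.feed(landing_html)
    return [
        urljoin(landing_url, href)
        for href in collector.hrefs
        if href.lower().split("?")[0].endswith(".pdf")
    ]


# Hypothetical sample data, modeled loosely on the record quoted above.
SAMPLE_RECORD = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:type>text</dc:type>
  <dc:format>application/pdf</dc:format>
  <dc:source>
http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/</dc:source>
</metadata>"""

SAMPLE_LANDING_HTML = (
    '<html><body><a href="unrestricted/Walker_1.pdf">Walker_1.pdf</a> '
    '<a href="/theses/">Home</a></body></html>'
)

if __name__ == "__main__":
    for url in source_urls(SAMPLE_RECORD):
        print(url, pdf_links(url, SAMPLE_LANDING_HTML))
```

Note that the link-scanning step is exactly the repository-specific part: it relies on how one site happens to lay out its title pages, which is not something OAI-PMH itself describes.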
On a related note, regarding rights. As a faculty member, I regularly sign
ETD approval forms. At Tech, students have three options to choose from:
(a) open and immediate access, (b) restricted to VT for 1 year, (c)
withhold access completely for 1 year for patent/security purposes. The
current form does not allow student authors to address whether the
full-text of their dissertation may be harvested for the purposes of
full-text indexing in such indexes as Google or Summon, nor does it allow
them to restrict where copies are served from. Similarly, the dc:rights
section in the OAI-PMH records addresses copyright only. In practice, Google
crawls, indexes, and serves full-text copies of our dissertations.
- Godmar