LISTSERV 16.5 - CODE4LIB Archives

May I ask a side question and make a side observation regarding the
harvesting of full text of the object to which an OAI-PMH record refers?

In general, is the idea to use the <dc:source>/text() element, treat it as
a URL, and then expect to find the object there (provided that there was a
suitable <dc:type> and <dc:format> element)?

Example: http://scholar.lib.vt.edu/theses/OAI/cgi-bin/index.pl allows the
harvesting of ETD metadata.  Yet, its metadata reads:

<ListRecords>
   ....
   <metadata>
     <dc>
        <type>text</type>
        <format>application/pdf</format>
        <source>
http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/</source>
    ....


When one visits
http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/, however,
there is no 'text' document of type 'application/pdf' - rather, it's an
HTML title page that embeds links to one or more PDF documents, such as
http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/unrestricted/Walker_1.pdf
to Walker_5.pdf.

Is VT's ETD OAI implementation deficient, or is OAI-PMH simply not set up
to allow the harvesting of full-text without what would basically amount to
crawling the ETD title page, or other repository-specific mechanisms?
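
To make the "crawl the title page" workaround concrete, here is a minimal
Python sketch of what a harvester would have to do. It is only an
illustration under assumptions: that the records use the standard Dublin
Core namespace (http://purl.org/dc/elements/1.1/), and that the PDFs show up
on the title page as plain <a href="..."> links ending in .pdf. The function
names dc_sources and pdf_links are made up for this example, and nothing
here is part of OAI-PMH or of VT's implementation. Fetching the pages over
HTTP is left out.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin

# Assumption: records use the standard Dublin Core element namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_sources(oai_xml: str) -> List[str]:
    """Return the stripped text of every dc:source element in the response."""
    root = ET.fromstring(oai_xml)
    return [
        el.text.strip()
        for el in root.iter("{%s}source" % DC_NS)
        if el.text and el.text.strip()
    ]


class _PdfLinkParser(HTMLParser):
    """Collect href values of <a> tags whose path ends in .pdf."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the extension.
        if href and href.split("?")[0].lower().endswith(".pdf"):
            # Resolve relative links against the title page URL.
            self.links.append(urljoin(self.base_url, href))


def pdf_links(title_page_html: str, base_url: str) -> List[str]:
    """Return absolute URLs of the PDF links found on an HTML title page."""
    parser = _PdfLinkParser(base_url)
    parser.feed(title_page_html)
    return parser.links
```

A harvester would call dc_sources() on each ListRecords response, fetch each
returned URL, and pass the HTML together with that URL to pdf_links(). The
second step is exactly the repository-specific guesswork I am asking about:
it breaks as soon as a title page links to its files in some other way.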

On a related note, regarding rights. As a faculty member, I regularly sign
ETD approval forms.  At Tech, students have three options to choose from:
(a) open and immediate access, (b) restricted to VT for 1 year, (c)
withhold access completely for 1 year for patent/security purposes.  The
current form does not allow student authors to address whether the
full-text of their dissertation may be harvested for the purposes of
full-text indexing in such indexes as Google or Summon, nor does it allow
them to restrict where copies are served from.  Similarly, the dc:rights
section in the OAI-PMH records addresses copyright only.  In practice, Google
crawls, indexes, and serves full-text copies of our dissertations.

 - Godmar