Take a look at "Best Practices for Shareable Metadata": http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/ShareableMetadataPublic
There is a specific section on "Linking from a Record to a Resource and Other Linking Issues".
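
If it helps, here's a rough sketch (Python 3) of how you might check what a
given repository actually puts in its record links. The base URL and
identifier below are placeholders, not a real repository:

# Fetch one oai_dc record and list its dc:identifier values, which is
# where most providers put the landing-page and/or file URLs.
from urllib.request import urlopen
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE = "http://repository.example.edu/oai"   # placeholder base URL
params = {
    "verb": "GetRecord",
    "metadataPrefix": "oai_dc",
    "identifier": "oai:repository.example.edu:12345",   # placeholder id
}
with urlopen(BASE + "?" + urlencode(params)) as resp:
    tree = ET.parse(resp)

DC = "{http://purl.org/dc/elements/1.1/}"
for ident in tree.iter(DC + "identifier"):
    # Some of these will be landing pages, some direct PDF links --
    # exactly the ambiguity discussed in the thread below.
    print(ident.text)
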
Regards,
Tom
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Joe Hourcle
> Sent: Monday, February 27, 2012 10:43 AM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] "Repositories", OAI-PMH and web crawling
>
> On Feb 27, 2012, at 10:51 AM, Godmar Back wrote:
> > On Mon, Feb 27, 2012 at 8:31 AM, Diane Hillmann <[log in to unmask]> wrote:
> >> On Mon, Feb 27, 2012 at 5:25 AM, Owen Stephens <[log in to unmask]> wrote:
>
> >>> This issue is certainly not unique to VT - we've come across this as
> >>> part of our project. While the OAI-PMH record may point at the PDF,
> >>> it can also point to an intermediary page. This seems to be standard
> >>> practice in some instances - I think because there is a desire, or
> >>> even a requirement, that a user should see the intermediary page
> >>> (which may contain rights information etc.) before viewing the
> >>> full-text item. There may also be an issue where multiple files
> >>> exist for the same item - maybe several data files and a PDF of the
> >>> thesis attached to the same metadata record - as the metadata via
> >>> OAI-PMH may not describe each asset.
> >>>
> >>>
> >> This has been an issue since the early days of OAI-PMH, and many
> >> large providers provide such intermediate pages (arxiv.org, for
> >> instance). The other issue driving providers towards intermediate
> >> pages is that it allows them to continue to derive statistics from
> >> usage of their materials, which direct access URIs and multiple web
> >> caches don't. For providers dependent on external funding, this is a
> >> biggie.
> >>
> > Why do you place direct access URIs and multiple web caches into the
> > same category? I follow your argument re: usage statistics for web
> > caches, but as long as the item remains hosted in the repository,
> > direct access URIs should still be counted (provided proper
> > cache-control headers are sent). Perhaps it would require server-side
> > statistics rather than client-based GA.
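> >
> > Something along these lines, say -- just a sketch, with a hypothetical
> > Flask handler and made-up paths -- would make caches revalidate, so each
> > direct download still reaches the origin server and shows up in its logs:
> >
> > # Hypothetical sketch only: serve repository files with headers that
> > # force revalidation, so direct-link downloads still appear in the
> > # server's own access logs rather than being absorbed by caches.
> > from flask import Flask, send_from_directory
> >
> > app = Flask(__name__)
> > ASSET_DIR = "/srv/repository/files"   # made-up path
> >
> > @app.route("/download/<path:filename>")
> > def download(filename):
> >     resp = send_from_directory(ASSET_DIR, filename)
> >     resp.headers["Cache-Control"] = "private, no-cache, must-revalidate"
> >     return resp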
>
> I'd agree -- if you can't get good statistics from direct linking, something's
> wrong with the methods you're using to collect usage information. Google
> Analytics and similar tools might produce pretty reports, but they're really
> meant for tracking web sites and won't work when someone has JavaScript
> turned off or has specifically blacklisted the analytics server, or for
> anything that's not HTML.
>
> You *really* need to analyze the server logs directly, as you can't be sure
> that all accesses go through the intermediate 'landing pages', or that
> they'd be tracked even if they did.
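>
> Even a quick-and-dirty pass over the raw access log gets you most of the
> way there. A sketch -- it assumes an Apache/nginx combined log format,
> and the log path and file extensions are made up, so adjust for your own
> setup:
>
> # Count completed downloads straight from the server access log.
> import re
> from collections import Counter
>
> LOG = "/var/log/httpd/access_log"   # hypothetical path
> line_re = re.compile(r'"GET (\S+) [^"]+" (\d{3}) (\S+)')
>
> downloads = Counter()
> with open(LOG) as fh:
>     for line in fh:
>         m = line_re.search(line)
>         if not m:
>             continue
>         path, status, _size = m.groups()
>         # Count only successful or partial-content fetches of the assets
>         # themselves, not hits on landing pages.
>         if status in ("200", "206") and path.endswith((".pdf", ".fits")):
>             downloads[path] += 1
>
> for path, n in downloads.most_common(10):
>     print(n, path)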
>
> ...
>
> I admit, the stuff I'm serving is a little different from what most people
> on this list serve, but we also have the issue that the collections are so
> large that we don't want people retrieving the files unless they really
> need them. We serve
> multiple TB per day -- I'd rather a person figure out if they want a file
> *before* they retrieve it, rather than download a few GB of data and find
> out it won't serve their purposes.
>
> It might not help our 'look how much we serve!' metrics to justify our
> funding, but it helps keep our costs down, and I personally believe it builds
> goodwill in our designated community, since they don't spend a day (or more)
> downloading only to find it's not what they thought. (And it fits in with
> Ranganathan's 4th law better than just saving them an extra click.)
>
> -Joe