The argument I've tried to make to content vendors (just in casual
conversation, never in actual negotiations) is that we'll still send the
user to their platform for actually accessing the text; we just want the
"metadata" (possibly including the fulltext, but only as a search index)
for _searching_. So they can still meter and completely control actual
article access.
Sadly, even in casual conversation, I have not generally found this
argument persuasive to content vendors. Especially ones that are
metadata-only aggregators, without any fulltext in the first place, heh.
Publishers are more open to this -- but then publishers may be ensnared in
exclusive contracts with aggregators that leave them unable to do it even
if they wanted to. (See EBSCO.)
I wrote an article on this in Library Journal a couple of years ago. In
retrospect, I think the article is over-optimistic about the technical
feasibility: running a Solr instance isn't that bad, but maintaining the
regular flow of updates from dozens of content providers, and normalizing
all that data to go into the same index, are non-trivial problems, I now
think.
http://www.libraryjournal.com/article/CA6413442.html
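
(To give a flavor of the normalization problem: every provider's feed has
its own field names and record structures, so you end up writing and
maintaining a crosswalk per provider to get everything into one shared
schema. A minimal sketch in Python -- the provider formats and field names
here are entirely hypothetical, not any actual vendor's feed:)

    # Crosswalk two hypothetical provider record formats into one shared
    # schema, so everything can live in a single Solr index.

    def normalize_provider_a(rec):
        # Hypothetical Provider A: flat records with 'ti'/'au'/'py' tags.
        return {
            "id": "provA-" + rec["accession"],
            "title": rec.get("ti", ""),
            "author": rec.get("au", []),  # already a list of names
            "pub_year": rec.get("py"),
            "source": "provider_a",
        }

    def normalize_provider_b(rec):
        # Hypothetical Provider B: nested records, semicolon-separated
        # creator string, ISO-ish date.
        art = rec["article"]
        return {
            "id": "provB-" + art["doi"],
            "title": art.get("title", ""),
            "author": [a.strip() for a in art.get("creator", "").split(";")
                       if a.strip()],
            "pub_year": int(art["date"][:4]) if art.get("date") else None,
            "source": "provider_b",
        }

Multiply that by a few dozen providers, each with its own update schedule,
delivery mechanism, and quirks, and the ongoing maintenance burden becomes
clear.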
Owen Stephens wrote:
> As others have suggested, I think much of this is around the practicalities
> of negotiating access, and the server power & expertise needed to run the
> service - it is simply more efficient to do this in one place.
>
> For me, the change we need to open this up is for publishers to start
> pushing out a lot more of this data to all comers, rather than having to
> have this conversation several times over with individual sites or
> suppliers. How practical this is I'm not sure - especially as we are talking
> about indexing full-text where available (I guess). I think the Google News
> model (5-clicks free) is an interesting one - but I'm not sure whether this,
> or a similar approach, would work in a niche market which may not be so
> interested in total traffic.
>
> It seems (to me) so obviously in the publishers' interest for their content
> to be as easily discoverable as possible that I am optimistic they will
> gradually become more open to sharing more data that aids this - at least
> metadata. I'd hope that this would eventually open up the market to a
> broader set of suppliers, as well as to institutions doing their own thing.
>
> Owen
>
> On Thu, Jul 1, 2010 at 2:37 AM, Eric Lease Morgan <[log in to unmask]> wrote:
>
>
>> On Jun 30, 2010, at 8:43 PM, Blake, Miriam E wrote:
>>
>>
>>> We have locally loaded records from the ISI databases, INSPEC,
>>> BIOSIS, and the Department of Energy (as well as from full-text
>>> publishers, but that is another story and system entirely). Aside
>>> from the contracts, I can also attest to the major amount of
>>> work it has been. We have 95M bibliographic records, stored in
>>> more than 75TB of disk, and counting. It's all running on Solr,
>>> with a local interface and the distributed aDORe repository on the
>>> backend. ~2 FTE keep it running in production now.
>>>
>> I definitely think what is outlined above -- local indexing -- is the way
>> to go in the long run. Get the data. Index it. Integrate it into your other
>> systems. Know that you have it when you change or drop the license. No
>> renting of data. And, "We don't need no stinkin' interfaces!" I believe a
>> number of European institutions have been doing this for years, and I hear
>> a few of us in the United States are following suit. ++
>>
>> --
>> Eric Morgan
>> University of Notre Dame.
>>
>>
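
(Re: Eric's "Get the data. Index it." above -- for anyone wondering what
the "index it" step looks like at its most basic, here is a rough sketch of
posting normalized records to Solr's XML update handler. This is just an
illustration under assumptions: the URL points at a hypothetical local
core, the input dicts are the shared-schema records sketched earlier, and a
real loader would need batching and error handling on top:)

    import urllib.request
    from xml.sax.saxutils import escape

    # Hypothetical local Solr core; adjust to your own installation.
    SOLR_UPDATE_URL = "http://localhost:8983/solr/update"

    def to_add_xml(docs):
        # Serialize a list of dicts into Solr's <add><doc> XML update format.
        parts = ["<add>"]
        for doc in docs:
            parts.append("<doc>")
            for name, value in doc.items():
                for v in (value if isinstance(value, list) else [value]):
                    if v is not None:
                        parts.append('<field name="%s">%s</field>'
                                     % (name, escape(str(v))))
            parts.append("</doc>")
        parts.append("</add>")
        return "".join(parts)

    def index(docs):
        # POST the documents, then commit so they become searchable.
        for body in (to_add_xml(docs), "<commit/>"):
            req = urllib.request.Request(
                SOLR_UPDATE_URL,
                data=body.encode("utf-8"),
                headers={"Content-Type": "text/xml; charset=utf-8"},
            )
            urllib.request.urlopen(req).read()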