I am eager to see you try it, Cory. Please consider writing up your
results for the Code4Lib Journal. I'd be curious to hear the complete
story: the issues of getting the metadata, the technical
infrastructure, any metadata normalization you need to do, the issues of
continuing to get the metadata on a regular basis, etc.
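[Editor's note: as a concrete illustration of the "getting the metadata" step discussed here, the sketch below polls a vendor database over SRU, the HTTP-based cousin of Z39.50. It is hypothetical: the endpoint URL and query are invented, and it assumes the vendor actually exposes an SRU interface, which, as the thread notes, may itself require negotiation.]

```python
# Hypothetical sketch: harvesting records from a vendor database via SRU.
# The base URL and query below are invented for illustration.
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

SRU_NS = "{http://www.loc.gov/zing/srw/}"

def build_sru_url(base_url, query, start=1, batch=100):
    """Build a paged SRU searchRetrieve request URL."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": query,
        "startRecord": start,       # 1-based offset, for paging through the set
        "maximumRecords": batch,
    }
    return base_url + "?" + urlencode(params)

def parse_records(xml_text):
    """Pull the <recordData> payloads out of an SRU response."""
    root = ET.fromstring(xml_text)
    return [ET.tostring(el, encoding="unicode")
            for el in root.iter(SRU_NS + "recordData")]
```

A scheduled job would call `build_sru_url` with an increasing `start` until the response runs dry, caching each batch locally for indexing -- which is exactly the "on a regular basis" maintenance burden discussed below.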
Whether you succeed or fail, but especially if you succeed, your project
with just a couple of databases could serve as a useful "pilot" for people
considering doing it with more.
Jonathan
Cory Rockliff wrote:
> We're looking at an infrastructure based on MarkLogic running on Amazon
> EC2, so the scale of data to be indexed shouldn't actually be that big
> an issue. Also, as I said to Jonathan, I only see myself indexing a
> handful of highly relevant resources, so we're talking millions, rather
> than 100s of millions, of records.
>
> On 6/30/2010 4:22 PM, Walker, David wrote:
>
>> You might also need to factor an extra server or three (in the cloud or otherwise) into that equation, given that we're talking about 100s of millions of records that will need to be indexed.
>>
>>
>>
>>> companies like III and Ex Libris are the only ones with
>>> enough clout to negotiate access
>>>
>>>
>> I don't think III is doing any kind of aggregated indexing, hence their decision to try to leverage APIs. I could be wrong.
>>
>> --Dave
>>
>> ==================
>> David Walker
>> Library Web Services Manager
>> California State University
>> http://xerxes.calstate.edu
>> ________________________________________
>> From: Code for Libraries [[log in to unmask]] On Behalf Of Jonathan Rochkind [[log in to unmask]]
>> Sent: Wednesday, June 30, 2010 1:15 PM
>> To: [log in to unmask]
>> Subject: Re: [CODE4LIB] DIY aggregate index
>>
>> Cory Rockliff wrote:
>>
>>
>>> Do libraries opt for these commercial 'pre-indexed' services simply
>>> because they're a good value proposition compared to all the work of
>>> indexing multiple resources from multiple vendors into one local index,
>>> or is it that companies like III and Ex Libris are the only ones with
>>> enough clout to negotiate access to otherwise-unavailable database
>>> vendors' content?
>>>
>>>
>>>
>> A little bit of both, I think. A library probably _could_ negotiate
>> access to that content... but it would be a heck of a lot of work. Once
>> the staff time for negotiations is factored in, it becomes a good value
>> proposition, regardless of how much the licensing would cost you. And
>> yeah, then there's the staff time to actually ingest, normalize, and
>> troubleshoot data flows for all that stuff on a regular basis -- I've
>> heard stories of libraries that tried to do that in the early 90s, and it
>> was nightmarish.
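[Editor's note: a minimal sketch of the normalization step Jonathan describes -- mapping heterogeneous vendor records onto one local schema. The two vendors and their field names (`ti`, `au`, `Title`, `Creator`, etc.) are entirely invented; real feeds would each need their own mapping, which is where the maintenance cost comes from.]

```python
# Hypothetical sketch: normalizing records from two imaginary vendors
# onto one minimal common schema before indexing.
def normalize(record, source):
    """Map a raw vendor record (a dict) onto a common schema."""
    if source == "vendor_a":
        return {"title": record.get("ti", "").strip(),
                "author": record.get("au", ""),
                "year": record.get("yr"),
                "source": source}
    if source == "vendor_b":
        return {"title": record.get("Title", "").strip(),
                "author": record.get("Creator", ""),
                "year": record.get("PubYear"),
                "source": source}
    raise ValueError(f"unknown source: {source}")
```

Every new database added to the index means another branch like these, plus ongoing troubleshooting when a vendor silently changes its field names.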
>>
>> So, actually, I guess I've arrived at convincing myself that it's mostly
>> "good value proposition," in that a library probably can't afford to do
>> that on its own, with or without licensing issues.
>>
>> But I'd really love to see you try anyway, maybe I'm wrong. :)
>>
>>
>>
>>> Can I assume that if a database vendor has exposed their content to me
>>> as a subscriber, whether via z39.50 or a web service or whatever, that
>>> I'm free to cache and index all that metadata locally if I so choose? Is
>>> this something to be negotiated on a vendor-by-vendor basis, or is it an
>>> impossibility?
>>>
>>>
>>>
>> I doubt you can assume that. I don't think it's an impossibility.
>>
>> Jonathan