LISTSERV 16.5 - CODE4LIB Archives

On 7/1/10 9:44 AM, "Jonathan Rochkind" wrote:

> the technical issues of maintaining the regular flow of updates from
> dozens of content providers, and normalizing all data to go in the same
> index, are non-trivial, I think now.

This is very much one of the hardest parts, Jonathan.
Also, thinking about the kinds of services users want from this data, we've
found the biggest need is for citation references, if you can get them (e.g., ISI).
And if you think the bibliographic metadata is poor quality, try
matching on brief reference metadata (the kind that doesn't contain unique identifiers, of course).
Even with complex fuzzy string matching, the results are never really great.
(This is part of why citation counts are all over the map in the apps out there!)
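To make the matching problem concrete, here is a minimal sketch of comparing two brief references that lack a DOI or other unique identifier. The field names, the split into "exact" and "fuzzy" fields, and the unweighted average are all hypothetical choices for illustration, not anyone's production method; `difflib.SequenceMatcher` stands in for whatever fuzzy string comparison a real system would use.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical field names for a "brief reference" (no DOI or other unique ID).
EXACT_FIELDS = ("year", "volume", "start_page")   # expected to agree exactly
FUZZY_FIELDS = ("first_author", "journal")        # abbreviations and typos vary


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def field_score(a: str, b: str, fuzzy: bool) -> float:
    """Score one field in [0, 1]; a missing value on either side scores 0."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return 0.0
    if fuzzy:
        return SequenceMatcher(None, a, b).ratio()
    return 1.0 if a == b else 0.0


def reference_similarity(ref_a: dict, ref_b: dict) -> float:
    """Unweighted mean of the per-field scores (a hypothetical scoring rule)."""
    scores = [field_score(ref_a.get(f, ""), ref_b.get(f, ""), fuzzy=False)
              for f in EXACT_FIELDS]
    scores += [field_score(ref_a.get(f, ""), ref_b.get(f, ""), fuzzy=True)
               for f in FUZZY_FIELDS]
    return sum(scores) / len(scores)


cited = {"first_author": "Smith, J.", "year": "2009", "journal": "J Chem Phys",
         "volume": "130", "start_page": "104"}
candidate = {"first_author": "Smith J", "year": "2009", "journal": "J. Chem. Phys.",
             "volume": "130", "start_page": "104"}
print(reference_similarity(cited, candidate))  # 1.0
```

Normalization handles punctuation variants like "J. Chem. Phys." versus "J Chem Phys", but a plain character-level ratio does poorly when one source abbreviates a journal title and another spells it out in full. That gap is consistent with the point above that even elaborate fuzzy matching is never really great.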

My word to the wise: do NOT do local loading unless you have a lot of time and money.
Vendors who are doing it have economies of scale; individual institutions typically
do not. If the community were to agree on centralized management
at a few institutions for this kind of "open" dataset, maybe. But, as someone noted, the middlemen
("value-add" A&I producers: Thomson, EBSCO, etc.) are not going to love this idea.

Miriam Blake
Los Alamos National Laboratory Research Library