These comments are all good ones, and those of you who know me (and
Walter is in that number) know that I'm nothing if not practical. In my
defense I can only put forward the fact that I suggested a "profile"
idea that would, I hope, abstract by at least one level the kind of
maintenance that would be required. That is, I would not want to go
into (name your favorite language here) code every time a page changed
that we were basically screen-scraping. That's a recipe for disaster.
Rather, I was hoping we could come up with a method that would allow
virtually anyone (not just code jockeys) to update some key elements
that the program would then use to properly process the page. This of
course would still rely upon the very tenuous assumption that the typical
journal HTML makes any sense whatsoever.
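
To make the "profile" idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the profile field names (section_start, section_end, item_pattern), the choice of JSON as the profile format, the sample page, and the extract_section function. The point is only that the page-specific details live in a small text file that someone can edit without touching the program.

```python
import json
import re

# Hypothetical profile for one journal. The field names and the JSON
# format are assumptions for illustration, not an agreed standard.
PROFILE_JSON = r"""
{
  "journal": "Example Journal",
  "section_name": "In Brief",
  "section_start": "<h3>In Brief</h3>",
  "section_end": "<h3>",
  "item_pattern": "<a href=\"(?P<url>[^\"]+)\">(?P<title>[^<]+)</a>"
}
"""

# Invented sample page standing in for a journal's table of contents.
SAMPLE_PAGE = """
<h3>Articles</h3>
<p><a href="/articles/1.html">A Full-Length Article</a></p>
<h3>In Brief</h3>
<p><a href="/brief/a.html">First Short Notice</a></p>
<p><a href="/brief/b.html">Second Short Notice</a></p>
<h3>Clips</h3>
<p><a href="/clips/c.html">Something Else</a></p>
"""


def extract_section(html, profile):
    """Return the items in the profiled section as a list of dicts.

    Returns an empty list when the start marker is not found, which is
    the signal that the page changed and the profile needs updating.
    """
    start = html.find(profile["section_start"])
    if start == -1:
        return []
    start += len(profile["section_start"])

    # The section runs to the next end marker, or to the end of the page.
    end = html.find(profile["section_end"], start)
    if end == -1:
        end = len(html)

    section = html[start:end]
    return [m.groupdict() for m in re.finditer(profile["item_pattern"], section)]


if __name__ == "__main__":
    profile = json.loads(PROFILE_JSON)
    for item in extract_section(SAMPLE_PAGE, profile):
        print(profile["section_name"], "|", item["title"], "|", item["url"])
```

When a journal redesigns its page, the fix in this sketch would be a change to the markers in the profile, not to the code. It still breaks, of course, if the HTML has no consistent markers to point at.
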

But that's just one level of what I was after. I was also just trying
to make the _general_ point that we are not necessarily limited to
exactly what we find _in_situ_. We can, with imagination and the right
tools, manipulate what is there to our advantage. And that was really
the point I was trying to make (clumsily, admittedly). Let's think
imaginatively about how we might be able to take what we can easily get
and improve it with information from other sources, such as Walter's
good idea about snatching RSS feeds (good), or some kind of software
manipulation such as I suggested (less good).

Finally, as a practical man, I realize that we will never be successful
if we rely on journal publishers to do metadata, or page coding, the
way we wish them to. I mean, we may as well just give up now if that is
what is required. Therefore, if we wish to do this, we _must_ come up
with an infrastructure that can accommodate no metadata whatsoever.
That, my friends, is life. It's also why the "semantic web" is a
complete non-starter. So the sooner we start dealing with reality, the
better off we'll all be.

Roy

On Mar 3, 2004, at 6:06 PM, Dinberg Donna wrote:
> Responding to Roy's interesting suggestion and being mindful of
> Walter's/Cliff's cautions, my tale of woe in the hard-copy world was
> always wanting a way to get at that "In Brief" stuff without having to
> eyeball the journal. Today, online "In Brief" notices still need to be
> found efficiently by some of us for various reasons. Anything that
> improves retrieval of these smaller items would be welcomed by me. You
> are correct, Walter, that the best federated search results are those
> resulting from standards-based procedures; but I like Roy's idea, too,
> for the other, smaller stuff.
>
> Back to lurking now.
> Din.
>
> Donna Dinberg
> Systems Librarian/Analyst
> Virtual Reference Canada
> Library and Archives Canada
> Ottawa, ON K1A 0N4
> Voice: 613-995-9227
> E-mail: [log in to unmask]
>
> <Opinions all mine, of course. Usual disclaimers apply.>
>
>
>
>> -----Original Message-----
>> From: Walter Lewis [mailto:[log in to unmask]]
>> Sent: Wednesday, March 03, 2004 7:18 PM
>> To: [log in to unmask]
>> Subject: Re: [CODE4LIB] index of open access journals
>>
>>
>> Roy Tennant wrote:
>>
>>> [snip] There may be other ways to leverage more information out of
>>> what we're indexing. For example, a number of journals have sections,
>>> such as "In Brief" from D-Lib Magazine [snip] It would of course take
>>> more work to both set up and maintain, but the result would be better.
>>
>> I am reminded of a piece of advice Cliff Lynch offered at an
>> Access conference I attended in the early days of the web
>> ('95 in Fredericton) where he talked about the fundamental
>> fragility of programs that supplied web content by screen
>> scraping vt100 interfaces.
> <snip>
>> The best federated search results, IMHO, hang on standard search
>> and result protocols like Z39.50 where the underlying structure is
>> abstracted into standardized access points and published record syntax.
>