On 04-03-2004 05:31, "Roy Tennant" <[log in to unmask]> wrote:
> These comments are all good ones, and those of you who know me (and
> Walter is in that number) know that I'm nothing if not practical. In my
> defense I can only put forward the fact that I suggested a "profile"
> idea which would hopefully abstract to at least one level the kind of
> maintenance that would be required. That is, I would not want to go
> into (name your favorite language here) code every time a page changed
> that we were basically screen-scraping. That's a recipe for disaster.
> Rather, I was hoping we could come up with a method that would allow
> virtually anyone (not just code jockeys) to update some key elements
> that the program would then use to properly process the page. This of
> course would still rely upon the very tenuous fact that the typical
> journal HTML makes any sense whatsoever.

I think the best way to do this kind of thing would be to store a
regular expression (PCRE if possible) for each journal/section. Of course,
that raises the question of how accessible writing regular expressions is for
the average Joe. (Though I'd like to think that, with some basic instructions,
a well-trained librarian would take to the idea instantly.)
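
To make the idea concrete, here is a minimal sketch of what such a stored
"profile" could look like. It is only an illustration, not a finished design:
the profile key, the sample HTML, and the function name extract_entries are
all made up for the example, and it uses Python's built-in re module, which is
close to PCRE but not identical to it.

```python
import re

# Hypothetical profile table: one regular expression per journal/section.
# In practice this could live in a plain text file or a database row, so that
# it can be updated without touching the program code.
PROFILES = {
    "example-journal/toc": (
        r'<a href="(?P<url>[^"]+)" class="article">(?P<title>[^<]+)</a>'
    ),
}

def extract_entries(profile_name, html, profiles=PROFILES):
    """Apply the stored pattern for one journal/section to a page.

    Returns a list of dicts, one per match, keyed by the named groups
    in the pattern (here: url and title).
    """
    pattern = re.compile(profiles[profile_name], re.IGNORECASE | re.DOTALL)
    return [match.groupdict() for match in pattern.finditer(html)]

# Made-up table-of-contents fragment, standing in for a real journal page.
SAMPLE_HTML = """
<ul>
<li><a href="/vol1/a1.html" class="article">First Article</a></li>
<li><a href="/vol1/a2.html" class="article">Second Article</a></li>
</ul>
"""

if __name__ == "__main__":
    for entry in extract_entries("example-journal/toc", SAMPLE_HTML):
        print(entry["title"], "->", entry["url"])
```

When a journal changes its page layout, only the pattern stored under its
profile key would need editing; the program itself stays the same.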

--
Harold Bakker
webmaster virtuele mediatheek FSAO
http://virtmed.fsao.hvu.nl/