> When I tackled Ebsco, I ran into issues with site authentication via
> cookies that were passed to the search gateway but not on to the client
> browser. Peter Binkley, at the University of Alberta, recommended a proxy
> configuration to work around this issue. Essentially, those connections
> would have to keep operating inside the search gateway's proxied session.
Yeah... I haven't really tackled passing the searches through; that's why I
just tried to give the stable URLs from the result lists when I could get
them. Something to consider, though.
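The gateway-cookie arrangement described above can be sketched roughly like this: the gateway keeps one cookie jar per client session and replays it on every outbound request, so the target site's cookies never need to reach the browser. Python stands in for the thread's actual PHP/curl and Perl tools, and the class/function names here are made up for illustration:

```python
# Sketch: target-site cookies live on the gateway, keyed by client
# session, so the client browser never sees them. (Hypothetical
# names; not the actual gateway code discussed in the thread.)
import http.cookiejar
import urllib.request


class GatewaySession:
    """One user's cookie jar; every outbound request to a search
    target replays that user's accumulated cookies."""

    def __init__(self):
        self.jar = http.cookiejar.CookieJar()
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.jar))

    def fetch(self, url):
        # Cookies the target sets land in self.jar, not in the
        # client's browser; later fetches in this session reuse them.
        return self.opener.open(url, timeout=10).read()


# One jar per logged-in client session:
sessions = {}


def session_for(client_id):
    return sessions.setdefault(client_id, GatewaySession())
```

The point is just that session state is held per client on the gateway side; result links then have to stay routed through the gateway for the cookies to keep applying.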
> I don't know how the Perl tools stack up in terms of parallel search
> streams. The PHP/curl combination is purely serial, and the last targets
> will time out if there is a tardy responder in the middle of the serial
> queue.
Yeah... another reason for some of my decisions, especially the iframe
stuff; I'm not terribly pleased with it, but it avoided the concurrency
issue. There is an extension to Perl's LWP that allows parallel searching:
http://www2.inf.ethz.ch/~langhein/ParallelUA/
I haven't looked very closely, but it would probably be another solution.
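For what it's worth, the parallel idea can be sketched in a few lines. Python is used purely for illustration (target names and delays are invented); LWP::Parallel or curl's multi interface would play the same role in the tools actually discussed:

```python
# Sketch: fire all target searches concurrently so one tardy
# responder can't time out the rest of the queue. The fetch()
# below is a stand-in for a real HTTP search request.
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(name, delay):
    time.sleep(delay)  # pretend network latency
    return f"results from {name}"


targets = [("fast", 0.01), ("slow", 0.05), ("medium", 0.02)]
results, errors = {}, {}

# Each target gets its own worker; as_completed() yields whichever
# finishes first, and the timeout bounds the whole batch rather
# than compounding link by link as in a serial chain.
with ThreadPoolExecutor(max_workers=len(targets)) as pool:
    futures = {pool.submit(fetch, n, d): n for n, d in targets}
    for fut in as_completed(futures, timeout=5):
        name = futures[fut]
        try:
            results[name] = fut.result()
        except Exception as exc:
            errors[name] = repr(exc)
```

The key difference from the serial PHP/curl queue described above: a slow target only costs its own slot, and fast targets return while it is still pending.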
My main problem right now is the parsing... ugh, ugh... ugh...
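For that parsing step, a minimal screen-scraping sketch: pull stable record URLs out of a result list with a stdlib HTML parser. Again Python for illustration only, and the markup and link pattern below are invented, not any vendor's actual output:

```python
# Sketch: extract result links from a target's HTML result list.
# The page markup here is made up; real vendor pages are messier,
# which is exactly the parsing pain being complained about.
from html.parser import HTMLParser


class ResultLinks(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


page = ('<ul><li><a href="/rec?id=1">One</a></li>'
        '<li><a href="/rec?id=2">Two</a></li></ul>')
p = ResultLinks()
p.feed(page)
# p.links now holds the stable record URLs from the list
```

An event-driven parser like this tolerates unclosed tags better than a strict XML parse, which matters when the source HTML is far from valid.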
> Art Rhyno, at the University of Windsor, suggested that a parallel
> approach might be possible in a Cocoon environment. This has the
> advantage of passing all the inbound HTML pages through JTidy and giving
> you the XHTML/XML-compliant input stream you wanted (in most cases, even
> when the output from the target was some distance from compliance).
Another possible solution... Art's done some pretty cool things with
Cocoon, but I haven't tried that Kool-Aid yet ;)
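A very rough stand-in for the JTidy step, to make the idea concrete: re-serialize sloppy HTML as well-formed XML so downstream XSLT/XPath tooling can consume it. A real tidier handles far more (implicit closes, entities, encodings); this sketch only shows the shape of the transformation:

```python
# Sketch: turn tag-soup HTML into well-formed XML by re-emitting
# it with every element explicitly closed. This is a toy version
# of what JTidy does in the Cocoon pipeline described above.
from html.parser import HTMLParser

VOID = {"br", "hr", "img", "input", "meta", "link"}


class Tidier(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []

    def handle_starttag(self, tag, attrs):
        # NOTE: attribute values aren't escaped here; a real tidier
        # would handle quotes, entities, and much more.
        a = "".join(f' {k}="{v or ""}"' for k, v in attrs)
        if tag in VOID:
            self.out.append(f"<{tag}{a}/>")
        else:
            self.out.append(f"<{tag}{a}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Close any implicitly-open elements down to the match;
        # ignore stray closes with no matching open.
        if tag in self.stack:
            while self.stack:
                t = self.stack.pop()
                self.out.append(f"</{t}>")
                if t == tag:
                    break

    def handle_data(self, data):
        self.out.append(data.replace("&", "&amp;").replace("<", "&lt;"))

    def result(self):
        while self.stack:  # close anything still open at EOF
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


t = Tidier()
t.feed("<ul><li>one<li>two<br></ul>")
xhtml = t.result()  # parseable as XML, unlike the input
```

Once the stream is well-formed like this, the Cocoon/XSLT tooling can treat every target's output uniformly, which is the appeal of that pipeline.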