Just a follow-up to my mail yesterday: I received a few messages suggesting
I should have made some aspects clearer. One is that the parallel setup was
a very quick test. I tried putting a dozen targets in the same setup, but
to really nail down the extent of the efficiency added by concurrent
retrievals, you would have to work around browser caching and probably lots
of other factors. I also don't have any exception clauses built into the
generator, so if one target times out, you get the beauty of an unhelpful
message and the option of a Java stack trace. So, yes, it does break if the
network is slow. Choosing timeout values must be a fairly delicate
calculation for federated searching; I guess if you were trying to do some
sort of de-duping you would want to wait for every target, but there have
to be some tough trade-offs here.
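For what it's worth, the guard I'd eventually want around each retrieval
might look something like this plain-Java sketch (the class, method names,
and placeholder markup are all invented for illustration; this is not
Cocoon's generator API): catch the failure per target and emit something
readable instead of a stack trace.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Sketch only: a per-target guard so one timed-out target produces a
// readable placeholder instead of an unhelpful message and a stack trace.
// Class and method names are invented; this is not Cocoon's API.
public class TargetGuard {

    // Stand-in for the real retrieval; the 'slow' flag simulates a target
    // that fails to answer within the timeout.
    static String fetchTarget(String target, boolean slow) throws IOException {
        if (slow) {
            throw new SocketTimeoutException("no response from " + target);
        }
        return "<results from " + target + "/>";
    }

    // Wrap each retrieval so a failure degrades to an error fragment that
    // can still flow down the pipeline alongside the other results.
    static String safeFetch(String target, boolean slow) {
        try {
            return fetchTarget(target, slow);
        } catch (SocketTimeoutException e) {
            return "<error target=\"" + target + "\" reason=\"timeout\"/>";
        } catch (IOException e) {
            return "<error target=\"" + target + "\" reason=\"io\"/>";
        }
    }

    public static void main(String[] args) {
        System.out.println(safeFetch("catalog", false));
        System.out.println(safeFetch("journals", true));
    }
}
```

The point is just that the error becomes data in the result stream rather
than an exception that kills the whole response.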
As near as I can tell, the parallel processing is for collecting the
content; the aggregation piece in Cocoon seems to have to hear back from
all of the targets before pushing any content further down the pipeline.
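One blunt way around that, outside of Cocoon, is to give the whole fan-out
a single deadline and push along whatever has answered by then. This is
just a plain-Java sketch with sleeping tasks standing in for real
retrievals (the class name and delays are invented), not how Cocoon's
aggregator actually works:

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch only: run all the retrievals in parallel under one deadline and
// keep whatever finished; stragglers are cancelled rather than waited on.
public class DeadlineAggregate {

    // Each map entry is a target name and a simulated response delay.
    static List<String> collect(Map<String, Long> targetDelaysMs, long deadlineMs) {
        ExecutorService pool = Executors.newFixedThreadPool(targetDelaysMs.size());
        List<Callable<String>> tasks = new ArrayList<>();
        for (Map.Entry<String, Long> entry : targetDelaysMs.entrySet()) {
            tasks.add(() -> {
                Thread.sleep(entry.getValue()); // stand-in for the real retrieval
                return entry.getKey();
            });
        }
        List<String> results = new ArrayList<>();
        try {
            // invokeAll cancels any task still running when the deadline expires.
            for (Future<String> f :
                    pool.invokeAll(tasks, deadlineMs, TimeUnit.MILLISECONDS)) {
                try {
                    results.add(f.get());
                } catch (CancellationException | ExecutionException e) {
                    // a slow or broken target simply doesn't contribute
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Long> delays = new LinkedHashMap<>();
        delays.put("fast-target", 50L);
        delays.put("slow-target", 5000L);
        System.out.println(collect(delays, 500L)); // the slow target is dropped
    }
}
```

That trades completeness for responsiveness, which is exactly the de-duping
tension mentioned above.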
You could set timeouts in the generator, but I think Cocoon would still be
somewhat at the mercy of one unresponsive target. One part of federated
searching that probably shouldn't be underestimated is the browser itself.
If you can push some of the content out through some sort of iframe setup,
for example, the browser still shows the content surrounding the iframes
while doing a pretty good job of juggling the multiple requests. It's the
same as when your web page has a sequence of <img src=".."> elements: the
user still gets some content right away, and the images get added as they
become available or as they can be processed.
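To make the iframe idea concrete, a results page along these lines (the
paths and query string are invented for illustration) would render the
surrounding headings immediately, while each target's panel fills in on
its own schedule:

```html
<!-- Illustrative sketch only: each iframe fetches its target
     independently, so one slow target delays just its own panel,
     not the whole page. The URLs are made up. -->
<div class="results">
  <h2>Catalog</h2>
  <iframe src="/search/catalog?q=cocoon" title="Catalog results"></iframe>
  <h2>Journals</h2>
  <iframe src="/search/journals?q=cocoon" title="Journal results"></iframe>
</div>
```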