LISTSERV 16.5 - CODE4LIB Archives

Eric Lease Morgan wrote:
> How do I write a computer program that spawns many processes but
> returns one result?
>
> I suppose the classic example of my query is the federated search. Get
> user input. Send it to many remote indexes. Wait. Combine results.
> Return. In this scenario when one of the remote indexes is slow things
> grind to a halt.
>
> I have a more modern example. Suppose I want to take advantage of many
> Web Services. One might be spell checker. Another might be a
> thesaurus. Another might be an index. Another might be a user lookup
> function. Given this environment, where each Web Service will return
> different sets of streams, how do I query each of them simultaneously
> and then aggregate the result? I don't want to do this sequentially. I
> want to fork them all at once and wait for their return before a
> specific time out. In Perl I can use the system command to fork a
> process, but I must wait for it to return. There is another Perl
> command allowing me to fork a process and keep going but I don't
> remember what it is. Neither one of these solutions seems feasible. Is
> the idea of threading in Java supposed to be able to address this
> problem?
Yes.  Take a look at Brian Goetz's book, Java Concurrency in Practice.
It's the best resource I have found on creating multi-threaded applications.
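
For the "fork them all at once and wait until a timeout" case, here is
a rough sketch using java.util.concurrent. The class name FanOut, the
method queryAll, and the lambdas standing in for the Web Service calls
are all made up for illustration; real calls would go where the lambdas
are. The idea is that invokeAll() with a timeout runs every task in
parallel and cancels whatever has not finished by the deadline, so one
slow service cannot stall the rest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class FanOut {

    // Runs all tasks in parallel and returns the results that arrived
    // before the timeout, in task order. Tasks that timed out or threw
    // an exception are simply left out of the result.
    static List<String> queryAll(List<Callable<String>> tasks, long timeoutMillis)
            throws InterruptedException {
        // One thread per task so nothing waits in line (at least one thread,
        // because a fixed pool of size zero is not allowed).
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        try {
            // Blocks until every task is done or the timeout expires;
            // tasks still running at the deadline are cancelled.
            List<Future<String>> futures =
                    pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                try {
                    results.add(future.get());
                } catch (CancellationException | ExecutionException e) {
                    // Timed out or failed: skip this service.
                }
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-ins for remote services.
        List<Callable<String>> tasks = new ArrayList<>();
        tasks.add(() -> "spell checker: ok");
        tasks.add(() -> "thesaurus: 3 synonyms");
        tasks.add(() -> { Thread.sleep(5000); return "index: 10 hits"; }); // too slow
        System.out.println(queryAll(tasks, 500));
    }
}
```

With a 500 ms timeout the two fast answers come back and the slow
"index" task is dropped rather than holding everything up.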

A recent project I worked on has a workflow with several steps: it
takes a very large set of files, moves them from one server to another,
does some QA work and some data transformation, and finally stores a
set of artifacts in a digital repository.

I used ActiveMQ to build a message-based system so that all of this
work can occur simultaneously.  It may seem that simply transferring
files from one server to another would be a fairly basic operation, but
when you're dealing with hundreds of thousands of files that are
anywhere from 100 MB to over a GB in size, a sequential process just
can't handle the amount of data.
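
To give a feel for the message-based approach without setting up a
broker, here is a minimal in-process sketch. It does not use ActiveMQ:
a java.util.concurrent BlockingQueue stands in for the message queue,
and the class name PipelineSketch, the method run, and the "transferred"
step are invented for illustration. Several workers pull file names off
the queue and handle them concurrently instead of one after another.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PipelineSketch {

    // Special message telling a worker there is no more work.
    static final String DONE = "__DONE__";

    // Puts every file name on a queue and lets `workers` threads consume
    // them concurrently. Returns one record per processed file.
    static Queue<String> run(List<String> files, int workers)
            throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Queue<String> processed = new ConcurrentLinkedQueue<>();

        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    while (true) {
                        String file = queue.take(); // waits for the next message
                        if (file.equals(DONE)) {
                            return;
                        }
                        // Placeholder for the real step (transfer, QA, transform).
                        processed.add("transferred " + file);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }

        for (String file : files) {
            queue.put(file);
        }
        // One DONE message per worker so that each of them stops.
        for (int i = 0; i < workers; i++) {
            queue.put(DONE);
        }
        for (Thread thread : threads) {
            thread.join();
        }
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        Queue<String> result = run(Arrays.asList("a.tif", "b.tif", "c.tif"), 2);
        System.out.println(result.size()); // prints 3
    }
}
```

A real broker adds what this sketch leaves out, such as spreading the
workers across machines and keeping messages around if a worker dies.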