LISTSERV 16.5 - CODE4LIB Archives

On Wed, 2008-11-12 at 09:43 -0500, Ed Summers wrote:
> On Wed, Nov 12, 2008 at 9:30 AM, Phil Cryer wrote:
> > Thanks for the Tahoe mention, I hadn't heard of that.  Looking at it now
> > for differences from Hadoop.
> 
> I think the main difference is the locality-of-reference you get with
> Hadoop, which allows you to distribute processing as well as data.
> This matters in intensive data crunching exercises, where having the
> data you are working with right there on a local disk, rather than
> pulling it over the network, makes a real difference. But distributing
> processing in this way may not be important to you.
> 
> //Ed

Good point, Ed. My thought is that in the future, with all the data
we'll be consuming, we'll *need* distributed filesystems for
redundancy/fault tolerance (and sanity). Having something that can also
distribute big jobs around the cluster, like indexing huge datasets,
would be a bonus.

P
-- 
Phil Cryer | Open Source Dev Lead | web www.mobot.org | skype phil.cryer