On Wed, 2008-11-12 at 09:43 -0500, Ed Summers wrote:
> On Wed, Nov 12, 2008 at 9:30 AM, Phil Cryer wrote:
> > Thanks for the Tahoe mention, I hadn't heard of that. Looking at it now
> > for differences from Hadoop.
>
> I think the main difference is the locality-of-reference you get with
> Hadoop, which allows you to distribute processing as well as data.
> This can be important in intensive data crunching exercises, where
> having the data you are working with right there on a local disk,
> rather than coming over the network is important. But distributing
> processing in this way may not be important to you.
>
> //Ed

Good point, Ed. My thought is that in the future, with all the data
we'll be consuming, we'll *need* distributed filesystems for
redundancy/fault tolerance (and sanity). Having something that can also
distribute big jobs around, like indexing huge datasets, would be a
bonus.

P

--
Phil Cryer | Open Source Dev Lead | web www.mobot.org | skype phil.cryer