I saw an interesting article on Hacker News (news.ycombinator.com) yesterday, published by AT&T Interactive.

-- article summary -- 

A Graph Processing Stack
http://engineering.attinteractive.com/2010/12/a-graph-processing-stack/

"[...] AT&Ti along with other collaborators (see acknowledgments), have been working on an open-source, graph processing stack. This stack depends on the use of a graph database. There are numerous graph databases in the market today. To name a few, there exist Neo4j, OrientDB, DEX, InfiniteGraph, Sones, HyperGraphDB, and others."

"Blueprints can be considered the 'JDBC' for the graph database community." There is a Blueprints driver for the Sesame Sail quad store.
https://github.com/tinkerpop/blueprints/wiki/

Pipes provides low-level access to a graph in the database; it looks almost like DOM or SAX, but for graphs.
https://github.com/tinkerpop/pipes/wiki/

Gremlin is a higher-level access API and looks more like XPath.
https://github.com/tinkerpop/gremlin/wiki

"Finally, at the top of the stack, there exists Rexster. Rexster exposes any Blueprints-enabled graph database as a RESTful server."
https://github.com/tinkerpop/rexster/wiki/

-- end of article summary --

We just released the first public prototype for the Social Networks and Archival Context (SNAC) project. Right now you can search 123,920 EAC-CPF records with XTF.
http://socialarchive.iath.virginia.edu/prototype.html

So there is full text search of names and keywords, and some faceted browsing and basic search stuff.  Some of the really interesting data is in all the "correspondedWith" and "associatedWith" relationships we have in the EAC records, but putting those into XTF facet values does not seem to be too useful.

Now I think that loading these correspondedWith and associatedWith relationships into a graph database with a simple model and then plopping this graph processing stack on top of it might be the best way to index and search these relationships.  I had been trying to figure out RDF and how FOAF would map to our data, but I'm stuck on that and don't know how to start.  
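
To make the "simple model" idea concrete, here is a minimal sketch in plain Python. It does not use any of the graph databases or the Blueprints/Pipes/Gremlin libraries mentioned above. The record identifiers (rec-A and so on) and the RelationGraph class with its methods are hypothetical names made up for illustration; only the two edge labels come from our EAC data. It stores each relationship as a labeled edge and answers "who is connected to this record, directly or through one intermediary?"

```python
class RelationGraph:
    """Tiny in-memory graph with labeled edges (illustration only)."""

    def __init__(self):
        # record id -> list of (edge label, other record id)
        self._edges = {}

    def add_edge(self, source, label, target):
        # Store the edge under both endpoints so it can be followed
        # from either record (an assumption: relationships are treated
        # as symmetric for searching purposes).
        self._edges.setdefault(source, []).append((label, target))
        self._edges.setdefault(target, []).append((label, source))

    def neighbors(self, record, label=None):
        """Records directly linked to `record`, optionally by edge label."""
        return {
            other
            for edge_label, other in self._edges.get(record, [])
            if label is None or edge_label == label
        }

    def within_two_hops(self, record, label=None):
        """Records reachable in one or two steps, excluding `record`."""
        first = self.neighbors(record, label)
        second = set()
        for middle in first:
            second |= self.neighbors(middle, label)
        return (first | second) - {record}


# Hypothetical sample data; real ids would come from the EAC-CPF records.
graph = RelationGraph()
graph.add_edge("rec-A", "correspondedWith", "rec-B")
graph.add_edge("rec-B", "correspondedWith", "rec-C")
graph.add_edge("rec-A", "associatedWith", "rec-D")

print(sorted(graph.neighbors("rec-A", "correspondedWith")))
# ['rec-B']
print(sorted(graph.within_two_hops("rec-A", "correspondedWith")))
# ['rec-B', 'rec-C']
```

A real graph database would take the place of the dictionary here, and a traversal language such as Gremlin would take the place of the hand-written two-hop loop, but the shape of the data (records as vertices, relationship types as edge labels) would stay about the same.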

"Graph processing stack on top of a graph database" resonates with me more than "RDF store with SPARQL access", but I guess they are functionally saying the same thing? Maybe the "graph database" way of thinking about it is potentially a less interoperable, less linked-open-data way of doing it? Even so, I've always believed you have to operate before you can interoperate.

Anyway, I hope the recycled Hacker News link was interesting. If anyone has any ideas, suggestions, criticism, or advice on how to expose access to the social graph in the SNAC project prototype, please let me know.