Just out of curiosity, why are you focusing on a document store database?
It seems that a wide column store might also be appropriate for this type of
application given the similarity in both structure and content of many
fields.
In any case, the trick in my mind is how to represent MARC in JSON
(disclaimer: I haven't tried to do this yet). Breaking it into pieces that
index well but which also can be recombined without going through
contortions doesn't sound easy because the obvious solution of converting
each field into an object strikes me as more awkward than it should be. My
gut reaction would be to store the entire MARC record in MARCXML, and
normalize and index field values to facilitate search/retrieval.
JSON may be a great data exchange format, but it's not a markup language
like XML, so doing things like preserving field order or just getting a
bird's eye view of content across multiple fields or subfields becomes more
complex.
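
To make the "store the whole record, index normalized values" idea a bit
more concrete, here is a minimal sketch in JavaScript (chosen only because
that is what the MongoDB shell speaks). It is not tested against MongoDB.
The helper names (normalize, buildStoredRecord), the "idx" key, the "f"
prefix on tags, and the 650 sample value are all made up for illustration,
and it assumes the (tag, value) pairs have already been pulled out of the
record by whatever MARC parser you use.

```javascript
// Hypothetical helpers; not part of MongoDB or of any MARC library.

// Normalize a value for indexing: trim, lowercase, collapse whitespace.
function normalize(value) {
  return value.trim().toLowerCase().replace(/\s+/g, " ");
}

// Build the document to store: the untouched MARCXML string plus a flat
// "idx" object holding normalized copies of the values to be searched.
// `fields` is an array of [tag, value] pairs extracted elsewhere.
function buildStoredRecord(marcxml, fields) {
  const idx = {};
  for (const [tag, value] of fields) {
    const key = "f" + tag;            // e.g. "f001", "f650"
    if (!(key in idx)) idx[key] = [];
    idx[key].push(normalize(value));  // repeated tags accumulate in order
  }
  return { marcxml: marcxml, idx: idx };
}

// Sample data: the 001 value comes from the quoted message below;
// the 650 value is invented purely for illustration.
const sampleXml =
  '<record>' +
  '<controlfield tag="001">fst01312614</controlfield>' +
  '</record>';

const stored = buildStoredRecord(sampleXml, [
  ["001", "fst01312614"],
  ["650", "  Sample   Heading "],
]);

console.log(JSON.stringify(stored.idx));
```

With a document shaped like that, the dotted-path form of ensureIndex()
quoted below should apply directly, e.g. something along the lines of
db.records.ensureIndex({"idx.f001": 1}), where "records" is a hypothetical
collection name. The MARCXML string is never touched, so field order and
round-tripping are not a concern; only the derived index values live in
JSON.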
kyle
2010/5/13 Fernando Gómez <[log in to unmask]>
> There's been some talk in code4lib about using MongoDB to store MARC
> records in some kind of JSON format. I'd like to know if you have
> experimented with indexing those documents in MongoDB. From my limited
> exposure to MongoDB, it seems difficult, unless MongoDB supports some
> kind of "custom indexing" functionality.
>
> According to the MongoDB docs [1], "you can create an index by calling
> the ensureIndex() function, and providing a document that specifies
> one or more keys to index." Examples of this are:
>
> db.things.ensureIndex({"city": 1})
> db.things.ensureIndex({"address.city": 1})
>
> That is, you specify the keys giving a path from the root of the
> document to the data element you are interested in. Such a path acts
> both as the index's name, and as a specification of how to get the
> keys' values.
>
> In the case of two proposed MARC-JSON formats [2, 3], I can't see such
> a "path". For example, say you want an index on field 001. Simplifying,
> the JSON docs would look like this
>
> { "fields" : [ ["001", "001 value"], ... ] }
>
> or this
>
> { "controlfield" : [ { "tag" : "001", "data" : "fst01312614" }, ... ] }
>
> How would you specify field 001 to MongoDB?
>
> It would be nice to have some kind of custom indexing, where one could
> provide an index name and separately a JavaScript function specifying
> how to obtain the keys' values for that index.
>
> Any suggestions? Do other document-oriented databases offer a better
> solution for this?
>
>
> BTW, I fed MongoDB with the example MARC records in [2] and [3], and
> it choked on them. Both are missing some commas :-)
>
>
> [1] http://www.mongodb.org/display/DOCS/Indexes
> [2] http://robotlibrarian.billdueber.com/new-interest-in-marc-hash-json/
> [3] http://worldcat.org/devnet/wiki/MARC-JSON_Draft_2010-03-11
>
>
> --
> Fernando Gómez
> Biblioteca "Antonio Monteiro"
> INMABB (Conicet / Universidad Nacional del Sur)
> Av. Alem 1253
> B8000CPB Bahía Blanca, Argentina
> Tel. +54 (291) 459 5116
> http://inmabb.criba.edu.ar/
>
--
----------------------------------------------------------
Kyle Banerjee
Digital Services Program Manager
Orbis Cascade Alliance
[log in to unmask] / 503.999.9787