Hi Fernando,
Yesterday I switched my Ubuntu to the 64-bit version, because I'd like to try
out MongoDB for indexing library records, and the 32-bit version has a
limitation (a database cannot exceed about 2 GB). I haven't tried MARC yet, only
XC records, which are a derivative of MARC, but from what I read in the
documentation, the idea is entirely feasible.
This is an example from the MongoDB documentation [1]:
doc = { author: 'joe',
        created : new Date('03-28-2009'),
        title : 'Yet another blog post',
        text : 'Here is the text...',
        tags : [ 'example', 'joe' ],
        comments : [ { author: 'jim', comment: 'I disagree' },
                     { author: 'nancy', comment: 'Good post' }
                   ]
      }
db.posts.insert(doc)
db.posts.find( { "comments.author" : "jim" } )
The most exciting thing here, for me, is that it is not just a simple key-value
store (à la Lucene/Solr), but supports embedded fields, so you can confidently
insert subfields, indicators, etc. They will remain compact and findable.
So you can combine the relations known from traditional relational
databases with the flexibility and speed known from Solr.
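To make this a bit more concrete for your field 001 question, here is a rough sketch of how I imagine it could work. I have not run this against Mongo yet. The record layout follows the second format you quoted, the collection name "records" is only my assumption, and the 245 values are made-up placeholders. The small elemMatch() helper is just a local stand-in for what Mongo's $elemMatch operator checks, so the idea can be tried without a database:

```javascript
// Hypothetical MARC record as a nested document. The field names mirror the
// second format quoted below; the 245 values are made-up placeholders.
const record = {
  controlfield: [
    { tag: "001", data: "fst01312614" }
  ],
  datafield: [
    { tag: "245", ind1: "1", ind2: "0",
      subfield: [ { code: "a", data: "A made-up title" } ] }
  ]
};

// Query document for "field 001 equals fst01312614". $elemMatch requires
// both conditions to hold for the same array element.
const byControlNumber = {
  controlfield: { $elemMatch: { tag: "001", data: "fst01312614" } }
};

// In the mongo shell this would be used roughly as follows (not run here;
// the collection name "records" is an assumption):
//   db.records.insert(record)
//   db.records.ensureIndex({ "controlfield.data": 1 })
//   db.records.find(byControlNumber)

// Local stand-in for the $elemMatch check, plain equality conditions only.
function elemMatch(array, criteria) {
  return array.some(function (element) {
    return Object.keys(criteria).every(function (key) {
      return element[key] === criteria[key];
    });
  });
}

const criteria = byControlNumber.controlfield.$elemMatch;
console.log(elemMatch(record.controlfield, criteria)); // true
console.log(elemMatch(record.datafield, criteria));    // false
```

If this holds up, the "path" for the index would be the dotted key "controlfield.data", and the tag condition would go into the query rather than into the index definition.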
I will let you know as soon as I have inserted the first MARC records into Mongo.
[1] http://www.mongodb.org/display/DOCS/Inserting
regards,
Péter
eXtensible Catalog
----- Original Message -----
From: "Fernando Gómez" <[log in to unmask]>
To: <[log in to unmask]>
Sent: Thursday, May 13, 2010 2:59 PM
Subject: [CODE4LIB] Indexing MARC(-JSON) with MongoDB?
> There's been some talk in code4lib about using MongoDB to store MARC
> records in some kind of JSON format. I'd like to know if you have
> experimented with indexing those documents in MongoDB. From my limited
> exposure to MongoDB, it seems difficult, unless MongoDB supports some
> kind of "custom indexing" functionality.
>
> According to the MongoDB docs [1], "you can create an index by calling
> the ensureIndex() function, and providing a document that specifies
> one or more keys to index." Examples of this are:
>
> db.things.ensureIndex({"city": 1})
> db.things.ensureIndex({"address.city": 1})
>
> That is, you specify the keys giving a path from the root of the
> document to the data element you are interested in. Such a path acts
> both as the index's name and as a specification of how to get the
> keys' values.
>
> In the case of the two proposed MARC-JSON formats [2, 3], I can't see such
> a "path". For example, say you want an index on field 001. Simplifying,
> the JSON docs would look like this
>
> { "fields" : [ ["001", "001 value"], ... ] }
>
> or this
>
> { "controlfield" : [ { "tag" : "001", "data" : "fst01312614" }, ... ] }
>
> How would you specify field 001 to MongoDB?
>
> It would be nice to have some kind of custom indexing, where one could
> provide an index name and, separately, a JavaScript function specifying
> how to obtain the keys' values for that index.
>
> Any suggestions? Do other document oriented databases offer a better
> solution for this?
>
>
> BTW, I fed MongoDB with the example MARC records in [2] and [3], and
> it choked on them. Both are missing some commas :-)
>
>
> [1] http://www.mongodb.org/display/DOCS/Indexes
> [2] http://robotlibrarian.billdueber.com/new-interest-in-marc-hash-json/
> [3] http://worldcat.org/devnet/wiki/MARC-JSON_Draft_2010-03-11
>
>
> --
> Fernando Gómez
> Biblioteca "Antonio Monteiro"
> INMABB (Conicet / Universidad Nacional del Sur)
> Av. Alem 1253
> B8000CPB Bahía Blanca, Argentina
> Tel. +54 (291) 459 5116
> http://inmabb.criba.edu.ar/
>