I did something similar to this recently. I used Python with oaipmh + simplejson + couchdb (the Python library) to write a simple oai2json script, which converts OAI XML into JSON and inserts it into a CouchDB instance. I then used the CouchDB river service to index the CouchDB JSON in ElasticSearch. Now that I'm writing this, it occurs to me that there may be a way to remove the intermediary Couch step from the process if you never intend to change the records.
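To give a rough idea of the conversion step, here is a minimal sketch that turns one oai_dc record into JSON. It is not the oai2json script itself: it uses only the Python standard library rather than oaipmh/simplejson/couchdb, the sample record and the function name oai_dc_to_dict are made up for illustration, and it skips harvesting and the CouchDB insert entirely.

```python
import json
import xml.etree.ElementTree as ET

# Namespace of the unqualified Dublin Core elements used inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample record, for illustration only.
SAMPLE_RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example title</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:creator>Roe, Richard</dc:creator>
  <dc:identifier>http://repository.example.org/item/1</dc:identifier>
</oai_dc:dc>"""


def oai_dc_to_dict(xml_text):
    """Collect the Dublin Core elements of one oai_dc record into a dict.

    Every field maps to a list, because DC elements are repeatable.
    """
    root = ET.fromstring(xml_text)
    doc = {}
    for child in root:
        # Skip anything that is not in the Dublin Core namespace.
        if not child.tag.startswith("{" + DC_NS + "}"):
            continue
        name = child.tag.split("}", 1)[1]  # "{namespace}title" -> "title"
        value = (child.text or "").strip()
        if value:
            doc.setdefault(name, []).append(value)
    return doc


record = oai_dc_to_dict(SAMPLE_RECORD)
record_json = json.dumps(record, sort_keys=True)
print(record_json)
```

Keeping every field as a list means repeated elements such as multiple creators survive the conversion, and the resulting JSON document is the kind of thing you would then store in CouchDB.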
Once it was indexed, I just needed to write some simple JavaScript to query ElasticSearch and then display the records in a friendly way (e.g., creating that link into your system). The whole thing seems to be only ~160 lines of code, including the HTML and the curl mojo to set up CouchDB. If it is sufficient for your access control, you could block access to the ES indexer by IP via firewall rules. Since the app is all in JS, you don't even need a web server. There is a bit of work to get all of those other servers configured, though.
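For a sense of what the search request might look like, here is a small sketch that builds an ElasticSearch query_string request body, covering both "keyword anywhere" and a single-field search. It is written in Python rather than the JavaScript I actually used, the helper name build_search_body is made up, and it assumes the Dublin Core fields are indexed under names like "title". The body would be POSTed to the _search endpoint of whatever index you create.

```python
import json


def build_search_body(keywords, field=None, size=10):
    """Build a query_string search body.

    With field=None the keywords are matched against the index's default
    search field(s); otherwise the search is limited to the named field.
    """
    query_string = {"query": keywords}
    if field is not None:
        query_string["default_field"] = field
    return {"query": {"query_string": query_string}, "size": size}


# Keyword-anywhere search.
anywhere = build_search_body("railroad maps")

# Search restricted to one (assumed) field name.
title_only = build_search_body("railroad maps", field="title")

print(json.dumps(title_only, indent=2))
```

Each hit that comes back carries the stored record, so the display code can read the identifier field and turn it into the link back to the repository.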
If this sounds useful, drop me a line and I'll see about getting you the code.
Regards,
Alex Lemann
-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Edward M. Corrado
Sent: Wednesday, March 16, 2011 11:01 AM
To: [log in to unmask]
Subject: [CODE4LIB] Simple Web-based Dublin Core search engine?
Hi,
I [will soon] have a small set (< 1000 records) of Dublin Core metadata published in OAI_DC format that I want to be searchable via a Web browser. Normally we would use Ex Libris's Primo for this, but this particular set of data may have some confidential information and our repository only has minimal built-in search functions. While we still may go with Primo for these records, I am looking at other possibilities. The requirements as I see them are:
1) Can ingest records in OAI_DC format
2) Allow remote end-users who are familiar with the collection to search these ingested records via a Web browser.
3) Search should be keyword anywhere or individual fields, although it does not need to have every whizzbang feature out there. In other words, basic search features are fine.
4) Should support the ability to link to the display copy in our repository (probably goes without saying)
5) Should be simple to install and maintain (Thus, at least in my mind, eliminating something like Blacklight)
6) Preferably a LAMP application although a Windows server based solution is a possibility as well
7) Preferably Open Source, or at least no- or low-cost
I haven't been able to find anything searching the Web, but it seems like something people may have done before. Before I re-invent the wheel or shoe-horn something together, does anyone have any suggestions?
Edward