Thanks Roy,
I will look into Swish-e.
Edward
On Wed, Mar 16, 2011 at 11:32 AM, Roy Tennant <[log in to unmask]> wrote:
> These requirements fit Swish-e [1] to a "T". I've used it to index
> millions of XML records [2], and there are no particular requirements
> for the XML -- it just needs to be well-formed. You can have it
> automatically detect and index XML fields as well as index all words
> across all fields. This is all handled by a very simple text config
> file. The only downside is you will need to write the user interface
> (CGI) in your favorite language to interact with Swish-e.
>
> For example, here is my entire config file for Current Cites [3],
> where I store citations in my own XML format:
>
> DefaultContents XML*
> UndefinedMetaTags auto
> IndexDir /home/tennantr/public_html/currentcites/cites/
> ReplaceRules remove /home/tennantr/public_html/currentcites/cites/
> PropertyNames creator title description booktitle source
> IndexOnly .xml
>
> The first line tells Swish-e to expect XML. The line "UndefinedMetaTags
> auto" tells it to keep track of any XML tag it sees. The next two lines
> tell it where the files are and remove the path from the index, so I
> only get each file name back without the server path included.
> The "PropertyNames" line defines which elements are actually stored in
> the index, which I can then retrieve directly in the search results
> for display to the user. The "IndexOnly .xml" line tells Swish-e to
> ignore anything without that filename extension. Nothing could be
> easier.
> Roy
>
> [1] http://swish-e.org/
> [2] http://roytennant.com/proto/hathi/
> [3] http://lists.webjunction.org/currentcites/
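
The search side that Roy says has to be written by hand can be sketched
roughly as follows. This is only an illustration under stated assumptions,
not Roy's actual CGI code: it assumes the swish-e binary is on the PATH,
that the index was built from the config above into a file such as
index.swish-e, and that the -f, -w, -m, -H and -x command-line options
behave as described in the Swish-e documentation. The function names
build_search_command, parse_results and search are hypothetical.

```python
import subprocess

# Property names copied from the "PropertyNames" line in the config above.
PROPERTIES = ["creator", "title", "description", "booktitle", "source"]


def build_search_command(query, index_file, max_results=20):
    """Build the swish-e argument list for one search (no shell involved)."""
    # Ask for one tab-separated line per hit: the document path first,
    # then each stored property. Assumes the -x format option accepts
    # <swishdocpath> and <propertyname> placeholders.
    fmt = "\t".join(["<swishdocpath>"] + [f"<{p}>" for p in PROPERTIES]) + "\n"
    return [
        "swish-e",
        "-f", index_file,        # index file to search
        "-w", query,             # the user's search words
        "-m", str(max_results),  # cap on the number of hits
        "-H", "0",               # suppress header lines (assumed)
        "-x", fmt,               # custom output format
    ]


def parse_results(output):
    """Turn swish-e output into a list of dicts, one per hit."""
    hits = []
    for line in output.splitlines():
        # Skip blanks, header/comment lines, the end marker and errors.
        if not line or line.startswith("#") or line == "." or line.startswith("err:"):
            continue
        fields = line.split("\t")
        if len(fields) != len(PROPERTIES) + 1:
            continue  # not a result line in the format we asked for
        path, *values = fields
        hits.append({"path": path, **dict(zip(PROPERTIES, values))})
    return hits


def search(query, index_file, max_results=20):
    """Run swish-e and return the parsed hits."""
    cmd = build_search_command(query, index_file, max_results)
    # Passing a list (not a shell string) keeps user input out of the shell.
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return parse_results(proc.stdout)
```

A CGI script or small web handler could then call something like
search("digital libraries", "index.swish-e") and render each hit's title
and creator, turning the path into a link to the display copy. The sketch
also assumes property values contain no tabs or newlines.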
>
> On Wed, Mar 16, 2011 at 8:00 AM, Edward M. Corrado <[log in to unmask]> wrote:
>> Hi,
>>
>> I [will soon] have a small set (< 1000 records) of Dublin Core
>> metadata published in OAI_DC format that I want to be searchable via a
>> Web browser. Normally we would use Ex Libris's Primo for this, but
>> this particular set of data may have some confidential information and
>> our repository only has minimal built-in search functions. While we
>> still may go with Primo for these records, I am looking at other
>> possibilities. The requirements as I see them are:
>>
>> 1) Can ingest records in OAI_DC format
>> 2) Allow remote end-users who are familiar with the collection to
>> search these ingested records via a Web browser.
>> 3) Search should be keyword anywhere or individual fields although it
>> does not need to have every whizzbang feature out there. In other
>> words, basic search features are fine.
>> 4) Should support the ability to link to the display copy in our
>> repository (probably goes without saying)
>> 5) Should be simple to install and maintain (Thus, at least in my
>> mind, eliminating something like Blacklight)
>> 6) Preferably a LAMP application although a Windows server based
>> solution is a possibility as well
>> 7) Preferably Open Source, or at least no- or low-cost
>>
>> I haven't been able to find anything searching the Web, but it seems
>> like something people may have done before. Before I re-invent the
>> wheel or shoe-horn something together, does anyone have any
>> suggestions?
>>
>> Edward
>>
>