+1 to Dave's suggestions!

I've used eXist extensively to create websites drawn from XML data. Northeastern's Digital Scholarship Group is in the process of setting up a BaseX-specific server; the projects we support will be able to store XML documents and query them.

XML databases shine when it comes to XML-aware search. eXist has a powerful full-text indexing system using Lucene, which lets you selectively index elements: http://exist-db.org/exist/apps/doc/lucene
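
For a rough sense of what "XML-aware" means here, below is a minimal Python sketch using only the standard library. It is not eXist's Lucene index; it just shows the idea of restricting a text search to one element rather than the whole document. The sample records and the element names (`article`, `headline`, `body`) are invented for illustration and are not the actual NY Times schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the element names are assumptions for illustration.
SAMPLE = """
<articles>
  <article id="a1">
    <headline>Subway Fares Rise</headline>
    <body>City officials met on Tuesday.</body>
  </article>
  <article id="a2">
    <headline>City Budget Approved</headline>
    <body>The subway system will receive new funding.</body>
  </article>
</articles>
"""


def search_element(xml_text, element_name, term):
    """Return ids of articles whose named child element contains the term."""
    root = ET.fromstring(xml_text)
    term = term.lower()
    hits = []
    for article in root.iter("article"):
        child = article.find(element_name)
        # Case-insensitive substring match, limited to the chosen element.
        if child is not None and child.text and term in child.text.lower():
            hits.append(article.get("id"))
    return hits


# The same term matches different records depending on the element searched.
print(search_element(SAMPLE, "headline", "subway"))
print(search_element(SAMPLE, "body", "subway"))
```

A plain-text search for "subway" would match both records; scoping the search to an element is the kind of distinction an XML database can index for you.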

BaseX is lightweight and easy to set up. In my experience, its indexes make for *very* fast query responses, though its full-text search isn't as customizable as eXist's: https://docs.basex.org/wiki/Indexes. BaseX is also great if you want to use another programming language to interface with the database: https://docs.basex.org/wiki/Clients
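
Besides the language bindings on the Clients page, one way to reach BaseX from another language is its REST interface. Here is a minimal Python sketch under a few assumptions: the BaseX HTTP server is running with REST enabled at `http://localhost:8984/rest`, the database is named `nyt`, and the XQuery refers to an `article` element. All of those are placeholders, and the helper names are my own, not part of BaseX.

```python
import base64
import urllib.parse
import urllib.request


def build_query_url(base_url, database, xquery):
    """Build a REST URL that runs an XQuery against a database via ?query=."""
    db = urllib.parse.quote(database, safe="")
    params = urllib.parse.urlencode({"query": xquery})
    return f"{base_url.rstrip('/')}/{db}?{params}"


def run_query(url, user, password):
    """Send the request with HTTP Basic auth and return the response text."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Assumed endpoint, database name, and element name; adjust to your setup.
url = build_query_url("http://localhost:8984/rest/", "nyt", "count(//article)")
print(url)
# run_query(url, "your-user", "your-password")  # needs a running BaseX server
```

Only the URL construction runs as written; `run_query` is left commented out because it needs a live server and real credentials.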

Hope this helps!

Warmly,
Ash
________________________________
From: Code for Libraries <[log in to unmask]> on behalf of David Mayo <[log in to unmask]>
Sent: Thursday, December 17, 2020 3:15 PM
To: [log in to unmask] <[log in to unmask]>
Subject: Re: [CODE4LIB] Web app to search XML files

A lot of good suggestions; if you're looking for fast turnaround without
having to decompose and shift the data, it might be worth looking at
dedicated XML databases like eXist-db and BaseX:

http://exist-db.org/exist/apps/homepage/index.html
https://basex.org/

IIRC, eXist-db has built-in functionality for building applications;
even if you don't go that way, I've found these databases very useful for
analysis of XML corpuses prior to running other software to transform them.

- Dave Mayo (He/Him)
Software Dev @ Harvard LTS


On Thu, Dec 17, 2020 at 2:53 PM Stuart A. Yeates <[log in to unmask]> wrote:

> There's XML and XML.
>
> I suggest that you enquire about the exact format that you're going to
> be receiving and ask around for systems that support it out of the
> box.
>
> cheers
> stuart
>
>
> --
> ...let us be heard from red core to black sky
>
> On Fri, 18 Dec 2020 at 07:37, Pennington, Buddy D. <[log in to unmask]>
> wrote:
> >
> > Hi all,
> >
> > We're purchasing an XML dataset for the historical NY Times and I am
> > curious about any suggestions to quickly build a web app to search and
> > display those records for end users.
> >
> > Buddy Pennington
> > Head of Electronic Resources & Systems
> > University Libraries
> > University of Missouri - Kansas City
> > (he/him/his)
>