On Nov 28, 2006, at 3:28 PM, Andrew Nagy wrote:

> The major problem with it all is the ugly mess that is marcxml

This brings up an interesting point about just dropping our source XML data into an XML-savvy database and using XQuery on it. Maybe y'all have much cleaner data than I've seen, but my experience with the Rossetti Archive has involved many XML data hurdles.

When I came on board, Tamino was being used as the "search engine", with XPath queries all over the place. The raw data is not consistent, and a single-word query expanded into an enormous XPath query that looked at many elements and attributes; not to mention it was SLOW. After analyzing the user interface and the real-world searching needs, I wrote Java code that normalized the data for searching purposes into a much coarser-grained set of fields, indexed it into Lucene, and voila:

http://www.rossettiarchive.org/rose

The point is that even with super-fast full-text searching via XQuery, most of our archives are probably going to require hideous expressions to query them through their raw structure, especially if we have to account for data cleanup too (such as date-formatting issues, which we also have in the RA raw data).

I realize I'm sounding anti-XQuery, which is sorta true, but only because in the real world in which I work it works better to do some custom digesting of the raw data than to just toss it in and work with standards. Indexing is lossy: it's about keying things the way they need to be looked up.

If your data is clean, you're in better shape than me. And if XQuery on your raw data does what you need, by all means I recommend it.

Erik
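P.S. To make the "custom digesting" point concrete, here is a minimal sketch of the normalize-then-index idea. The actual Rossetti Archive code was Java feeding Lucene; this Python version, the element names, and the sample records are all hypothetical, just to show how inconsistent raw markup gets flattened into a coarse, lossy field set before indexing.

```python
# Hypothetical sketch: digest inconsistent raw XML records into a
# coarse-grained set of search fields (the real RA code was Java + Lucene).
import re
import xml.etree.ElementTree as ET

# Raw records are inconsistent: titles and dates live in different
# elements and formats from record to record.
RAW = """
<archive>
  <doc><head>The Blessed Damozel</head><created>1847</created></doc>
  <doc><title>Jenny</title><date>c. 1848</date></doc>
</archive>
"""

def normalize(doc):
    """Flatten one raw record into two fields, 'title' and 'year',
    regardless of which source elements or date formats it used.
    This is deliberately lossy: we key things the way they need
    to be looked up, not the way they were encoded."""
    title = next((doc.findtext(t) for t in ("title", "head")
                  if doc.findtext(t)), "")
    raw_date = next((doc.findtext(d) for d in ("date", "created")
                     if doc.findtext(d)), "")
    match = re.search(r"\d{4}", raw_date)  # keep only a 4-digit year
    return {"title": title, "year": match.group() if match else ""}

index = [normalize(doc) for doc in ET.fromstring(RAW).iter("doc")]
print(index)
# -> [{'title': 'The Blessed Damozel', 'year': '1847'},
#     {'title': 'Jenny', 'year': '1848'}]
```

A one-word search against the resulting `title` field replaces the sprawling multi-element XPath expression, which is the whole trade: you lose the raw structure but gain fast, predictable lookups.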