On Jan 19, 2007, at 4:07 AM, Erik Hatcher wrote:
> On Jan 17, 2007, at 3:26 PM, Andrew Nagy wrote:
>> One thing I am hoping that can come out of the preconference is a
>> standard XSLT doc. I sat down with my metadata librarian to develop
>> our XSLT doc -- determining what fields are to be searchable, what
>> fields should be left out to help speed up results, etc.
>>
>> It's pretty easy, I think you will be amazed how fast you can have a
>> functioning system with very little effort.
>
> You're quite right with that last statement.
>
> I am, however, skeptical of a purely MARC -> XSLT -> Solr solution.
> The MARC data I've seen requires some basic cleanup (removing dots at
> the end of subjects, normalizing dates, etc) in order to be useful as
> facets. While XSLT is powerful, this type of data manipulation is
> better (IMO) done with scripting languages that allow for easy
> tweaking in a succinct way. I'm sure XSLT could do everything that
> you'd want done; you can also drive screws in with a hammer :)

So the punctuation stripping has already been done in XSLT.
LoC has a MARCXML -> MODS XSLT stylesheet [1] which strips out the evil
ISBD punctuation. I've generally found mapping from MODS to be more
convenient than mapping from MARC, so while it's an extra step, it does
save a little programmer time, since some of the hidden hierarchy in the
MARC data is made explicit in the MODS structure.

If hopping through MODS is unacceptable, LoC has the
punctuation-stripping nicely tucked away in a MARC Conversion Utility
Stylesheet that you could use directly in a MARCXML -> Solr
transformation. [2]
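
For comparison with the scripting approach Erik mentions, here is a rough sketch of the same kind of cleanup (trailing punctuation, date normalization) in Python. It is not the logic of the LoC stylesheets; the function names and the set of characters treated as trailing punctuation are assumptions made up for illustration.

```python
import re

# Assumed set of trailing ISBD-style punctuation (plus space) to strip.
# Illustrative only; not taken from MARC21slimUtils.xsl.
TRAILING_PUNCT = " .,:;/"


def strip_trailing_punct(value: str) -> str:
    """Remove trailing whitespace and ISBD-style punctuation from a field value."""
    return value.rstrip(TRAILING_PUNCT)


def normalize_year(value: str):
    """Return the first four-digit run in a date string, or None if there is none."""
    match = re.search(r"\d{4}", value)
    return match.group(0) if match else None
```

A blanket strip like this would also eat the period from a value that legitimately ends in an abbreviation, so a real cleanup pass would probably need per-field rules.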
[1] http://www.loc.gov/standards/mods/v3/MARC21slim2MODS.xsl
[2] http://www.loc.gov/marcxml/xslt/MARC21slimUtils.xsl
Tod Olson
Programmer/Analyst
University of Chicago Library