LISTSERV 16.5 - CODE4LIB Archives

Well, I think that's a *bit* harsh. But the "YMMV" addition was appreciated, because mileage can and will vary. That is, if the XML is completely consistent, then Kyle's suggestion is an excellent one. If it isn't, then Kevin's link applies, IMHO. Since it appears from what we have been told that the records are consistent, I think Kyle's solution is not only workable but the most efficient, given the caveat stated above.
Roy

> On Jan 10, 2017, at 5:57 PM, Kevin S. Clarke wrote:
> 
> On the mention of parsing XML with string operations, I'm compelled to post one of my favorite StackOverflow responses:
> 
> http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
> 
> YMMV of course...
> 
> Kevin
> 
> -----Original message-----
>> From: Kyle Banerjee
>> Sent: Tuesday, January 10 2017, 5:44 pm
>> Subject: Re: [CODE4LIB] MARCXML help again
>> 
>> Howdy Julie,
>> 
>> Depending on your specific needs, it's often easier/faster to use string
>> rather than XML operations to work with XML.
>> 
>> Especially if you have a large number of files and/or the files are very
>> big, stripping the whitespace between elements and then performing a simple
>> string substitution would be a fast, low-tech way to remove the unwanted
>> fields.
>> 
>> kyle
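
For anyone following along, here is a rough Python sketch of what Kyle describes: collapse the whitespace between elements, then do a plain string substitution. It is only a sketch and assumes the records really are byte-for-byte consistent (same attribute order, same quoting, same subfield order). The names `TARGET` and `strip_target_field` are made up for illustration. One caveat: the whitespace step also empties any element whose content is whitespace only.

```python
import re

# The exact field to remove, written without any whitespace between tags.
# Assumption: every unwanted field is serialized exactly like this.
TARGET = (
    '<marc:datafield tag="710" ind1="2" ind2=" ">'
    '<marc:subfield code="a">Faux College</marc:subfield>'
    '<marc:subfield code="b">Special Collections</marc:subfield>'
    '</marc:datafield>'
)


def strip_target_field(xml_text: str) -> str:
    """Remove every occurrence of TARGET using string operations only."""
    # Drop whitespace that sits purely between a closing '>' and the next '<'.
    collapsed = re.sub(r">\s+<", "><", xml_text)
    # Plain substring replacement; no XML parsing is involved.
    return collapsed.replace(TARGET, "")
```

If even one record orders its attributes differently or uses single quotes, this will silently skip it, which is the "YMMV" part.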
>> 
>> On Tue, Jan 10, 2017 at 1:13 PM, Julie Swierczek wrote:
>> 
>>> Thanks to all who responded to my earlier plea for help.  I now have a new
>>> problem.  I'm not sure if I can do this with find and replace in Oxygen, or
>>> if this requires XSLT, or what.
>>> 
>>> I have a project of MARCXML records like this:
>>> 
>>> <?xml version="1.0" encoding="UTF-8" ?>
>>> <marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim"
>>>    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>    xsi:schemaLocation="http://www.loc.gov/MARC21/slim
>>> http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
>>>   <marc:record>
>>>     <!--Lots of other datafields here -->
>>>     <marc:datafield tag="710" ind1="2" ind2=" ">
>>>       <marc:subfield code="a">Faux College</marc:subfield>
>>>       <marc:subfield code="b">Special Collections</marc:subfield>
>>>     </marc:datafield>
>>>   </marc:record>
>>> </marc:collection>
>>> 
>>> I want to strip out all instances of:
>>>    <marc:datafield tag="710" ind1="2" ind2=" ">
>>>      <marc:subfield code="a">Faux College</marc:subfield>
>>>      <marc:subfield code="b">Special Collections</marc:subfield>
>>>    </marc:datafield>
>>> but I want to leave other <marc:datafield tag="710" ind1="2" ind2=" ">
>>> instances intact.  I only want to delete ones with both the Faux College
>>> and Special Collections text in the subfields.
>>> 
>>> Where would I go from here? I thought of doing an xsl:template match in an
>>> XSL stylesheet, and then not providing any instructions for replacing the
>>> match, but I don't know how to select for that specific text. My attempts
>>> to figure that out have not worked. You can only read so much W3C
>>> documentation and Stack Overflow before you need to just sit quietly and
>>> stare at a wall for a while.
>>> 
>>> Thanks in advance --
>>> 
>>> Julie
>>> 
>>
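
If the records turn out not to be perfectly consistent, an XML-aware pass is the safer route. The sketch below is not XSLT, which is what Julie asked about; it is the same "match it and emit nothing" idea done with Python's standard-library ElementTree. The function name `remove_matching_710` is hypothetical, and it assumes "matching" means a 710 field with subfield a equal to "Faux College" and subfield b equal to "Special Collections" (indicators are not checked).

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_NS}

# Keep the "marc:" prefix when the tree is written back out.
ET.register_namespace("marc", MARC_NS)


def remove_matching_710(xml_text: str) -> str:
    """Drop 710 fields whose $a is 'Faux College' and $b is 'Special Collections'."""
    root = ET.fromstring(xml_text)
    # Materialize the list first so removals do not disturb the iteration.
    for record in list(root.iter(f"{{{MARC_NS}}}record")):
        for field in record.findall("marc:datafield[@tag='710']", NS):
            subfields = {
                (sf.get("code"), (sf.text or "").strip())
                for sf in field.findall("marc:subfield", NS)
            }
            if ("a", "Faux College") in subfields and (
                "b", "Special Collections"
            ) in subfields:
                record.remove(field)
    return ET.tostring(root, encoding="unicode")
```

Other 710 fields are left alone, and whitespace or attribute-order differences in the source no longer matter. Comments in the input are dropped by the default parser, so this is a starting point rather than a round-trip-safe tool.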