Oh, and that StackOverflow post really is epic. I can only hope to achieve that level of artistry one day. 
Roy

> On Jan 10, 2017, at 5:57 PM, Kevin S. Clarke <[log in to unmask]> wrote:
> 
> On the mention of parsing XML with string operations, I'm compelled to post one of my favorite StackOverflow responses:
> 
> http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
> 
> YMMV of course...
> 
> Kevin 
> 
> 
> 
> -----Original message-----
>> From: Kyle Banerjee
>> Sent: Tuesday, January 10 2017, 5:44 pm
>> To: [log in to unmask]
>> Subject: Re: [CODE4LIB] MARCXML help again
>> 
>> Howdy Julie,
>> 
>> Depending on your specific needs, it's often easier/faster to use string
>> rather than XML operations to work with XML.
>> 
>> Especially if you have a large number of files and/or the files are very
>> big, stripping the whitespace between elements and then performing a simple
>> string substitution would be a fast, low-tech way to remove the unwanted
>> fields.
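Here is a minimal sketch of Kyle's suggestion, in Python using only the standard library. The function name `strip_unwanted_710` is hypothetical and does not come from the thread. The sketch assumes the unwanted field always appears with exactly this tag, these indicator values, this attribute order and quoting, and the `marc:` prefix, because a plain string match misses any variant. It also collapses every whitespace-only run between tags, so a subfield whose content is only spaces would be emptied as well.

```python
import re

# The exact field to drop, written with no whitespace between tags.
# Assumption: the source files serialize it identically (same prefix,
# attribute order, and quoting); otherwise the match silently fails.
UNWANTED = (
    '<marc:datafield tag="710" ind1="2" ind2=" ">'
    '<marc:subfield code="a">Faux College</marc:subfield>'
    '<marc:subfield code="b">Special Collections</marc:subfield>'
    '</marc:datafield>'
)

def strip_unwanted_710(xml_text: str) -> str:
    """Collapse whitespace between tags, then delete every exact copy of UNWANTED."""
    # Remove whitespace-only runs that sit between a '>' and the next '<'.
    compact = re.sub(r">\s+<", "><", xml_text)
    # Plain substring replacement; no XML parsing happens at all.
    return compact.replace(UNWANTED, "")
```

Because nothing is parsed, this is fast on large files but brittle. Any difference in serialization leaves the field in place without raising an error.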
>> 
>> kyle
>> 
>> On Tue, Jan 10, 2017 at 1:13 PM, Julie Swierczek <[log in to unmask]>
>> wrote:
>> 
>>> Thanks to all who responded to my earlier plea for help.  I now have a new
>>> problem.  I'm not sure if I can do this with find and replace in Oxygen, or
>>> if this requires XSLT, or what.
>>> 
>>> I have a project of MARCXML records like this:
>>> 
>>> <?xml version="1.0" encoding="UTF-8" ?>
>>> <marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim"
>>>    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>    xsi:schemaLocation="http://www.loc.gov/MARC21/slim
>>> http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
>>>  <marc:record>
>>> <!--Lots of other datafields here -->
>>>    <marc:datafield tag="710" ind1="2" ind2=" ">
>>>            <marc:subfield code="a">Faux College</marc:subfield>
>>>            <marc:subfield code="b">Special Collections</marc:subfield>
>>>        </marc:datafield>
>>>  </marc:record>
>>> </marc:collection>
>>> 
>>> I want to strip out all instances of:
>>>    <marc:datafield tag="710" ind1="2" ind2=" ">
>>>            <marc:subfield code="a">Faux College</marc:subfield>
>>>            <marc:subfield code="b">Special Collections</marc:subfield>
>>>        </marc:datafield>
>>> but I want to leave other <marc:datafield tag="710" ind1="2" ind2=" ">
>>> instances intact.  I only want to delete ones with both the Faux College
>>> and Special Collections text in the subfields.
>>> 
>>> Where would I go from here? I thought of doing an xsl:template match in an
>>> XSL stylesheet, and then not providing any instructions for replacing the
>>> match, but I don't know how to select for that specific text. My attempts
>>> to figure that out have not worked. You can only read so much W3C
>>> documentation and Stack Overflow before you need to just sit quietly and
>>> stare at a wall for a while.
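Julie's instinct matches the usual XSLT pattern. An identity template copies everything, and an empty template matches only the unwanted field, for example `marc:datafield[@tag='710'][marc:subfield[@code='a']='Faux College'][marc:subfield[@code='b']='Special Collections']` (with the `marc` prefix declared in the stylesheet). To keep the examples in one language, the sketch below applies the same test with Python's standard-library `xml.etree.ElementTree` instead of XSLT. It is an illustration, not something posted in the thread. The function names are hypothetical, and the sketch assumes the namespace `http://www.loc.gov/MARC21/slim` from the sample and exact text values in the `a` and `b` subfields.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_NS}

def is_unwanted(datafield: ET.Element) -> bool:
    """True for a 710 whose $a is 'Faux College' and whose $b is 'Special Collections'."""
    if datafield.get("tag") != "710":
        return False
    subfields = datafield.findall("marc:subfield", NS)
    has_a = any(sf.get("code") == "a" and sf.text == "Faux College" for sf in subfields)
    has_b = any(sf.get("code") == "b" and sf.text == "Special Collections" for sf in subfields)
    # Both conditions must hold; other 710s are left intact.
    return has_a and has_b

def remove_unwanted_710(xml_text: str) -> str:
    # Emit the "marc:" prefix on output instead of ElementTree's default "ns0:".
    ET.register_namespace("marc", MARC_NS)
    root = ET.fromstring(xml_text)
    for record in root.iter(f"{{{MARC_NS}}}record"):
        # Copy to a list first, since children are removed while looping.
        for datafield in list(record.findall("marc:datafield", NS)):
            if is_unwanted(datafield):
                record.remove(datafield)
    return ET.tostring(root, encoding="unicode")
```

Unlike a string substitution, this matches on parsed structure, so whitespace between tags and attribute order do not matter. The trade-off is that ElementTree loads the whole file into memory and does not write back the original XML declaration.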
>>> 
>>> Thanks in advance --
>>> 
>>> Julie
>>> 
>>