Oh, and that StackOverflow post really is epic. I can only hope to achieve that level of artistry one day.
Roy
> On Jan 10, 2017, at 5:57 PM, Kevin S. Clarke wrote:
>
> On the mention of parsing XML with string operations, I'm compelled to post one of my favorite StackOverflow responses:
>
> http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
>
> YMMV of course...
>
> Kevin
>
>
>
> -----Original message-----
>> From: Kyle Banerjee
>> Sent: Tuesday, January 10, 2017, 5:44 pm
>> Subject: Re: [CODE4LIB] MARCXML help again
>>
>> Howdy Julie,
>>
>> Depending on your specific needs, it's often easier/faster to use string
>> operations rather than XML operations to work with XML.
>>
>> Especially if you have a large number of files and/or the files are very
>> big, stripping the whitespace between elements and then performing a simple
>> string substitution would be a fast, low-tech way to remove the unwanted
>> fields.
>>
>> kyle
>>
>> On Tue, Jan 10, 2017 at 1:13 PM, Julie Swierczek wrote:
>>
>>> Thanks to all who responded to my earlier plea for help. I now have a new
>>> problem. I'm not sure if I can do this with find and replace in Oxygen, or
>>> if this requires XSLT, or what.
>>>
>>> I have a project made up of MARCXML records like this:
>>>
>>> <?xml version="1.0" encoding="UTF-8" ?>
>>> <marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim"
>>>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>     xsi:schemaLocation="http://www.loc.gov/MARC21/slim
>>>     http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
>>>   <marc:record>
>>>     <!--Lots of other datafields here -->
>>>     <marc:datafield tag="710" ind1="2" ind2=" ">
>>>       <marc:subfield code="a">Faux College</marc:subfield>
>>>       <marc:subfield code="b">Special Collections</marc:subfield>
>>>     </marc:datafield>
>>>   </marc:record>
>>> </marc:collection>
>>>
>>> I want to strip out all instances of:
>>> <marc:datafield tag="710" ind1="2" ind2=" ">
>>>   <marc:subfield code="a">Faux College</marc:subfield>
>>>   <marc:subfield code="b">Special Collections</marc:subfield>
>>> </marc:datafield>
>>> but I want to leave other <marc:datafield tag="710" ind1="2" ind2=" ">
>>> instances intact. I only want to delete ones with both the Faux College
>>> and Special Collections text in the subfields.
>>>
>>> Where would I go from here? I thought of doing an xsl:template match in an
>>> XSL stylesheet, and then not providing any instructions for replacing the
>>> match, but I don't know how to select for that specific text. My attempts
>>> to figure that out have not worked. You can only read so much W3C
>>> documentation and Stack Overflow before you need to just sit quietly and
>>> stare at a wall for a while.
>>>
>>> Thanks in advance --
>>>
>>> Julie
>>>
>>
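For anyone landing here with the same problem: the template idea in the question works once the match pattern tests the subfield contents, for example an identity transform plus an empty template matching `marc:datafield[@tag='710'][marc:subfield[@code='a']='Faux College'][marc:subfield[@code='b']='Special Collections']`, with the `marc` prefix bound to http://www.loc.gov/MARC21/slim in the stylesheet. The same XML-aware filter is sketched here in Python with the standard library's xml.etree.ElementTree. It is an illustration under assumptions, not code anyone in this thread posted: it drops a 710 only when some $a is "Faux College" and some $b is "Special Collections" (after trimming surrounding whitespace), ignores the indicators, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
NS = {"marc": MARC_NS}

# Serialize with the familiar "marc:"/"xsi:" prefixes instead of "ns0:"/"ns1:".
ET.register_namespace("marc", MARC_NS)
ET.register_namespace("xsi", XSI_NS)


def has_subfield(field, code, text):
    """True if any subfield with this code has exactly this (trimmed) text."""
    return any(
        sf.get("code") == code and (sf.text or "").strip() == text
        for sf in field.findall("marc:subfield", NS)
    )


def is_unwanted_710(field):
    """Hypothetical rule: a 710 carrying both $a Faux College and $b Special Collections."""
    return (
        field.get("tag") == "710"
        and has_subfield(field, "a", "Faux College")
        and has_subfield(field, "b", "Special Collections")
    )


def strip_unwanted_710s(xml_text):
    """Parse a MARCXML collection and return it without the matching 710s."""
    root = ET.fromstring(xml_text)
    for record in root.iter(f"{{{MARC_NS}}}record"):
        # findall() returns a list, so removing while looping over it is safe.
        for field in record.findall("marc:datafield", NS):
            if is_unwanted_710(field):
                record.remove(field)
    return ET.tostring(root, encoding="unicode")
```

Other 710s, such as one with $a Faux College but a different $b, are left in place. Note that ElementTree's default parser discards XML comments like the `<!--Lots of other datafields here -->` placeholder, and the returned string has no XML declaration, so add one when writing the result back to disk if your tools expect it.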
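Kyle's string-based route can be sketched the same way, again in Python and again with a hypothetical helper name: collapse whitespace that sits between a `>` and the next `<`, then delete an exact copy of the unwanted field. It needs no parser and is fast on large files, but, as Kevin's link warns, it only works if every unwanted 710 is serialized exactly like the pattern (same attribute order, quote style, `marc:` prefix, and no extra subfields). The collapse step also assumes whitespace between tags carries no meaning, which is normally the case in MARCXML.

```python
import re

# ASSUMPTION: every unwanted field is serialized exactly like this once the
# whitespace between tags is removed (attribute order, quotes, prefix).
UNWANTED_710 = (
    '<marc:datafield tag="710" ind1="2" ind2=" ">'
    '<marc:subfield code="a">Faux College</marc:subfield>'
    '<marc:subfield code="b">Special Collections</marc:subfield>'
    '</marc:datafield>'
)

# Whitespace-only runs sitting directly between a ">" and the next "<".
INTER_TAG_WS = re.compile(r">\s+<")


def strip_710_by_string(xml_text):
    """Collapse inter-element whitespace, then delete every exact copy of the 710."""
    compact = INTER_TAG_WS.sub("><", xml_text)
    return compact.replace(UNWANTED_710, "")
```

If a record writes the same field with single quotes or a different namespace prefix, the substitution silently skips it, so it is worth counting matches before and after on a sample of the real files.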