Do you think that Kaitlin Duck Sherwood actually asked Paris Hilton to write an operating system?

> On Jan 10, 2017, at 2:57 PM, Kevin S. Clarke <[log in to unmask]> wrote:
>
> On the mention of parsing XML with string operations, I'm compelled to post one of my favorite StackOverflow responses:
>
> http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
>
> YMMV of course...
>
> Kevin
>
>
>
> -----Original message-----
>> From: Kyle Banerjee
>> Sent: Tuesday, January 10 2017, 5:44 pm
>> To: [log in to unmask]
>> Subject: Re: [CODE4LIB] MARCXML help again
>>
>> Howdy Julie,
>>
>> Depending on your specific needs, it's often easier/faster to use string
>> rather than XML operations to work with XML.
>>
>> Especially if you have a large number of files and/or the files are very
>> big, stripping the whitespace between elements and then performing a simple
>> string substitution would be a fast low tech way to remove the unwanted
>> fields.
>>
>> kyle
>>
>> On Tue, Jan 10, 2017 at 1:13 PM, Julie Swierczek <[log in to unmask]>
>> wrote:
>>
>>> Thanks to all who responded to my earlier plea for help. I now have a new
>>> problem. I'm not sure if I can do this with find and replace in Oxygen, or
>>> if this requires XSLT, or what.
>>>
>>> I have a project of MARCXML records like this:
>>>
>>> <?xml version="1.0" encoding="UTF-8" ?>
>>> <marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim"
>>>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>     xsi:schemaLocation="http://www.loc.gov/MARC21/slim
>>>     http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
>>>   <marc:record>
>>>     <!--Lots of other datafields here -->
>>>     <marc:datafield tag="710" ind1="2" ind2=" ">
>>>       <marc:subfield code="a">Faux College</marc:subfield>
>>>       <marc:subfield code="b">Special Collections</marc:subfield>
>>>     </marc:datafield>
>>>   </marc:record>
>>> </marc:collection>
>>>
>>> I want to strip out all instances of:
>>>     <marc:datafield tag="710" ind1="2" ind2=" ">
>>>       <marc:subfield code="a">Faux College</marc:subfield>
>>>       <marc:subfield code="b">Special Collections</marc:subfield>
>>>     </marc:datafield>
>>> but I want to leave other <marc:datafield tag="710" ind1="2" ind2=" ">
>>> instances intact. I only want to delete ones with both the Faux College
>>> and Special Collections text in the subfields.
>>>
>>> Where would I go from here? I thought of doing an xsl:template match in an
>>> XSL stylesheet, and then not providing any instructions for replacing the
>>> match, but I don't know how to select for that specific text. My attempts
>>> to figure that out have not worked. You can only read so much W3C
>>> documentation and Stack Overflow before you need to just sit quietly and
>>> stare at a wall for a while.
>>>
>>> Thanks in advance --
>>>
>>> Julie
>>>
>>
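
For the question quoted above, here is a minimal XML-aware sketch in Python, offered as one possible approach and not something anyone on the thread used. It parses the collection with the standard library's xml.etree.ElementTree and removes only those 710 fields whose subfields are exactly $a "Faux College" followed by $b "Special Collections". The names strip_matching_710, TARGET, and the cut-down SAMPLE record are made up for illustration. It matches on the tag and subfields only and does not check the indicators. Note that ElementTree's default parser discards comments and the serialized output carries no XML declaration, so this may not suit files where those must be preserved.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_NS}

# Hypothetical target: the exact (code, text) pairs a 710 must carry to be removed.
TARGET = [("a", "Faux College"), ("b", "Special Collections")]

# Cut-down sample for illustration only (not a record from the original post).
SAMPLE = """<marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim">
  <marc:record>
    <marc:datafield tag="245" ind1="0" ind2="0">
      <marc:subfield code="a">Hypothetical title</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="710" ind1="2" ind2=" ">
      <marc:subfield code="a">Faux College</marc:subfield>
      <marc:subfield code="b">Special Collections</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="710" ind1="2" ind2=" ">
      <marc:subfield code="a">Faux College</marc:subfield>
      <marc:subfield code="b">Archives</marc:subfield>
    </marc:datafield>
  </marc:record>
</marc:collection>"""


def strip_matching_710(xml_text, target=TARGET):
    """Return (serialized XML, count removed) after dropping matching 710 fields."""
    # Keep the "marc:" prefix on output instead of an auto-generated one.
    ET.register_namespace("marc", MARC_NS)
    root = ET.fromstring(xml_text)
    removed = 0
    # list(...) so the tree is not modified while it is being walked.
    for record in list(root.iter(f"{{{MARC_NS}}}record")):
        for field in record.findall("marc:datafield[@tag='710']", NS):
            pairs = [
                (sf.get("code"), (sf.text or "").strip())
                for sf in field.findall("marc:subfield", NS)
            ]
            # Remove only when the subfields are exactly the target pairs.
            if pairs == target:
                record.remove(field)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed


if __name__ == "__main__":
    output, count = strip_matching_710(SAMPLE)
    print(f"removed {count} field(s)")
    print(output)
```

Comparing the full list of subfield pairs, instead of searching for the two strings anywhere in the field, keeps other 710 fields intact even when they share the "Faux College" $a.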