That is, if the XML is completely consistent AND you're guaranteed to never
encounter MARC data with XML special characters, then Kyle's suggestion is
an excellent one.

I really need to find an excuse to publish a document with a title starting
"<marc:datafield ..."

cheers
stuart

--
...let us be heard from red core to black sky

On Wed, Jan 11, 2017 at 12:06 PM, Roy Tennant <[log in to unmask]> wrote:

> Well, I think that's a *bit* harsh. But the "YMMV" addition was
> appreciated, because it can and will. That is, if the XML is completely
> consistent, then Kyle's suggestion is an excellent one. If it isn't, then
> Kevin's link applies, IMHO. Since it appears from what we have been told
> that the records are consistent, I think Kyle's solution is not only
> workable but the most efficient. Given the caveat stated above.
> Roy
>
> > On Jan 10, 2017, at 5:57 PM, Kevin S. Clarke <[log in to unmask]> wrote:
> >
> > On the mention of parsing XML with string operations, I'm compelled to
> > post one of my favorite StackOverflow responses:
> >
> > http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
> >
> > YMMV of course...
> >
> > Kevin
> >
> > -----Original message-----
> >> From: Kyle Banerjee
> >> Sent: Tuesday, January 10 2017, 5:44 pm
> >> To: [log in to unmask]
> >> Subject: Re: [CODE4LIB] MARCXML help again
> >>
> >> Howdy Julie,
> >>
> >> Depending on your specific needs, it's often easier/faster to use string
> >> rather than XML operations to work with XML.
> >>
> >> Especially if you have a large number of files and/or the files are very
> >> big, stripping the whitespace between elements and then performing a
> >> simple string substitution would be a fast low tech way to remove the
> >> unwanted fields.
> >>
> >> kyle
> >>
> >> On Tue, Jan 10, 2017 at 1:13 PM, Julie Swierczek <[log in to unmask]>
> >> wrote:
> >>
> >>> Thanks to all who responded to my earlier plea for help. I now have a
> >>> new problem. I'm not sure if I can do this with find and replace in
> >>> Oxygen, or if this requires XSLT, or what.
> >>>
> >>> I have a project of MARCXML records like this:
> >>>
> >>> <?xml version="1.0" encoding="UTF-8" ?>
> >>> <marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim"
> >>>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >>>     xsi:schemaLocation="http://www.loc.gov/MARC21/slim
> >>>     http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
> >>>   <marc:record>
> >>>     <!--Lots of other datafields here -->
> >>>     <marc:datafield tag="710" ind1="2" ind2=" ">
> >>>       <marc:subfield code="a">Faux College</marc:subfield>
> >>>       <marc:subfield code="b">Special Collections</marc:subfield>
> >>>     </marc:datafield>
> >>>   </marc:record>
> >>> </marc:collection>
> >>>
> >>> I want to strip out all instances of:
> >>>
> >>>     <marc:datafield tag="710" ind1="2" ind2=" ">
> >>>       <marc:subfield code="a">Faux College</marc:subfield>
> >>>       <marc:subfield code="b">Special Collections</marc:subfield>
> >>>     </marc:datafield>
> >>>
> >>> but I want to leave other <marc:datafield tag="710" ind1="2" ind2=" ">
> >>> instances intact. I only want to delete ones with both the Faux College
> >>> and Special Collections text in the subfields.
> >>>
> >>> Where would I go from here? I thought of doing an xsl:template match in
> >>> an XSL stylesheet, and then not providing any instructions for replacing
> >>> the match, but I don't know how to select for that specific text. My
> >>> attempts to figure that out have not worked. You can only read so much
> >>> W3C documentation and Stack Overflow before you need to just sit quietly
> >>> and stare at a wall for a while.
> >>>
> >>> Thanks in advance --
> >>>
> >>> Julie
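
For anyone finding this thread later: below is a minimal sketch of the parser-based route, for the case where the records are not guaranteed to be consistent or may contain XML special characters. It is not anyone's solution from the thread. It swaps in Python's standard-library xml.etree.ElementTree in place of the XSLT or string substitution discussed above. The function names (is_target, strip_unwanted_710) and the second 710 field in SAMPLE ("Archives & Records") are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_NS}
ET.register_namespace("marc", MARC_NS)  # keep the marc: prefix when serializing


def is_target(datafield):
    """True for a 710 field whose subfields a and b hold the unwanted text."""
    if datafield.get("tag") != "710":
        return False
    # Assumption: each subfield code appears at most once in the field;
    # with repeated codes, only the last value per code is compared.
    subfields = {
        sf.get("code"): (sf.text or "").strip()
        for sf in datafield.findall("marc:subfield", NS)
    }
    return (
        subfields.get("a") == "Faux College"
        and subfields.get("b") == "Special Collections"
    )


def strip_unwanted_710(xml_text):
    """Return (cleaned XML string, number of datafields removed)."""
    root = ET.fromstring(xml_text)
    removed = 0
    for record in root.iter(f"{{{MARC_NS}}}record"):
        # Copy the child list first so removal does not disturb iteration.
        for datafield in list(record.findall("marc:datafield", NS)):
            if is_target(datafield):
                record.remove(datafield)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed


# Hypothetical sample: the first 710 should go, the second should stay.
SAMPLE = """<marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim">
  <marc:record>
    <marc:datafield tag="710" ind1="2" ind2=" ">
      <marc:subfield code="a">Faux College</marc:subfield>
      <marc:subfield code="b">Special Collections</marc:subfield>
    </marc:datafield>
    <marc:datafield tag="710" ind1="2" ind2=" ">
      <marc:subfield code="a">Faux College</marc:subfield>
      <marc:subfield code="b">Archives &amp; Records</marc:subfield>
    </marc:datafield>
  </marc:record>
</marc:collection>"""

if __name__ == "__main__":
    cleaned, removed = strip_unwanted_710(SAMPLE)
    print(f"removed {removed} datafield(s)")
    print(cleaned)
```

Because the comparison runs on parsed text rather than raw markup, differences in whitespace, attribute order, or entity escaping between records should not affect the match. Comments and the exact original formatting are not preserved by ElementTree, so if byte-for-byte fidelity of the rest of the file matters, the string approach (with its caveats) or an XSLT identity transform may be a better fit.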