LISTSERV 16.5 - CODE4LIB Archives

That is, if the XML is completely consistent AND you're guaranteed to never
encounter MARC data with XML special characters, then Kyle's suggestion is
an excellent one.
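
To make the caveat concrete, here is a minimal Python sketch of the string approach as I understand it: collapse the whitespace between tags, then delete exact copies of the unwanted block. The names `TARGET`, `strip_by_string`, `consistent` and `reordered` are made up for illustration and are not from anyone's actual script.

```python
import re

# The unwanted block, written with no whitespace between elements so it can
# be matched after the input has been compacted the same way.
TARGET = (
    '<marc:datafield tag="710" ind1="2" ind2=" ">'
    '<marc:subfield code="a">Faux College</marc:subfield>'
    '<marc:subfield code="b">Special Collections</marc:subfield>'
    '</marc:datafield>'
)


def strip_by_string(xml_text):
    """Collapse whitespace between tags, then delete exact copies of TARGET."""
    # Caution: this also empties any element whose content is whitespace only.
    compact = re.sub(r'>\s+<', '><', xml_text)
    return compact.replace(TARGET, '')


# A record serialized exactly the way TARGET expects.
consistent = '''<marc:record>
  <marc:datafield tag="710" ind1="2" ind2=" ">
    <marc:subfield code="a">Faux College</marc:subfield>
    <marc:subfield code="b">Special Collections</marc:subfield>
  </marc:datafield>
</marc:record>'''

# The same record as far as XML is concerned, with the attributes reordered.
reordered = consistent.replace('tag="710" ind1="2"', 'ind1="2" tag="710"')

cleaned = strip_by_string(consistent)       # the 710 field is removed
still_there = strip_by_string(reordered)    # the 710 field is left in place
```

The second call leaves the field in place because the text no longer matches byte for byte, even though the XML is equivalent. The same thing would happen if a character in the data were written as an entity or character reference in some records but not others.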

I really need to find an excuse to publish a document with a title starting
"<marc:datafield ..."

cheers
stuart

--
...let us be heard from red core to black sky

On Wed, Jan 11, 2017 at 12:06 PM, Roy Tennant <[log in to unmask]> wrote:

> Well, I think that's a *bit* harsh. But the "YMMV" addition was
> appreciated, because it can and will. That is, if the XML is completely
> consistent, then Kyle's suggestion is an excellent one. If it isn't, then
> Kevin's link applies, IMHO. Since it appears from what we have been told
> that the records are consistent, I think Kyle's solution is not only
> workable but the most efficient, given the caveat stated above.
> Roy
>
> > On Jan 10, 2017, at 5:57 PM, Kevin S. Clarke <[log in to unmask]> wrote:
> >
> > On the mention of parsing XML with string operations, I'm compelled to
> > post one of my favorite StackOverflow responses:
> >
> > http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
> >
> > YMMV of course...
> >
> > Kevin
> >
> >
> >
> > -----Original message-----
> >> From:Kyle Banerjee
> >> Sent: Tuesday, January 10 2017, 5:44 pm
> >> To: [log in to unmask]
> >> Subject: Re: [CODE4LIB] MARCXML help again
> >>
> >> Howdy Julie,
> >>
> >> Depending on your specific needs, it's often easier/faster to use string
> >> rather than XML operations to work with XML.
> >>
> >> Especially if you have a large number of files and/or the files are very
> >> big, stripping the whitespace between elements and then performing a simple
> >> string substitution would be a fast low tech way to remove the unwanted
> >> fields.
> >>
> >> kyle
> >>
> >> On Tue, Jan 10, 2017 at 1:13 PM, Julie Swierczek <[log in to unmask]> wrote:
> >>
> >>> Thanks to all who responded to my earlier plea for help.  I now have a new
> >>> problem.  I'm not sure if I can do this with find and replace in Oxygen, or
> >>> if this requires XSLT, or what.
> >>>
> >>> I have a project of MARCXML records like this:
> >>>
> >>> <?xml version="1.0" encoding="UTF-8" ?>
> >>> <marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim"
> >>>    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >>>    xsi:schemaLocation="http://www.loc.gov/MARC21/slim
> >>> http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
> >>>  <marc:record>
> >>> <!--Lots of other datafields here -->
> >>>    <marc:datafield tag="710" ind1="2" ind2=" ">
> >>>            <marc:subfield code="a">Faux College</marc:subfield>
> >>>            <marc:subfield code="b">Special Collections</marc:subfield>
> >>>        </marc:datafield>
> >>>  </marc:record>
> >>> </marc:collection>
> >>>
> >>> I want to strip out all instances of:
> >>>    <marc:datafield tag="710" ind1="2" ind2=" ">
> >>>            <marc:subfield code="a">Faux College</marc:subfield>
> >>>            <marc:subfield code="b">Special Collections</marc:subfield>
> >>>        </marc:datafield>
> >>> but I want to leave other <marc:datafield tag="710" ind1="2" ind2=" ">
> >>> instances intact.  I only want to delete ones with both the Faux College
> >>> and Special Collections text in the subfields.
> >>>
> >>> Where would I go from here? I thought of doing an xsl:template match in an
> >>> XSL stylesheet, and then not providing any instructions for replacing the
> >>> match, but I don't know how to select for that specific text. My attempts
> >>> to figure that out have not worked. You can only read so much W3C
> >>> documentation and Stack Overflow before you need to just sit quietly and
> >>> stare at a wall for a while.
> >>>
> >>> Thanks in advance --
> >>>
> >>> Julie
> >>>
> >>
>
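
For the original question at the bottom of the thread (how to select a datafield by the text in its subfields), a parser-based route is one option. What follows is a rough sketch in Python with the standard library's ElementTree, not the XSLT that was asked about. The helper names `is_unwanted` and `remove_unwanted` are made up for illustration. It assumes a match means a 710 field that has a $a of "Faux College" and a $b of "Special Collections", that the indicators are not checked, and that any extra subfields do not prevent a match. Adjust if your records need a stricter test.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_NS}
ET.register_namespace("marc", MARC_NS)  # keep the marc: prefix when writing


def is_unwanted(field):
    """True for a 710 field with $a 'Faux College' and $b 'Special Collections'."""
    if field.get("tag") != "710":
        return False
    # Collect (code, text) pairs; the parser has already decoded any entities.
    subs = {
        (sf.get("code"), (sf.text or "").strip())
        for sf in field.findall("marc:subfield", NS)
    }
    return {("a", "Faux College"), ("b", "Special Collections")} <= subs


def remove_unwanted(root):
    """Delete matching datafields from every record; return how many were removed."""
    removed = 0
    for record in root.iter(f"{{{MARC_NS}}}record"):
        # Iterate over a copy so that removing children is safe.
        for field in list(record.findall("marc:datafield", NS)):
            if is_unwanted(field):
                record.remove(field)
                removed += 1
    return removed
```

For a file on disk this could be used roughly as `tree = ET.parse("records.xml")`, then `remove_unwanted(tree.getroot())`, then `tree.write("records-clean.xml", encoding="UTF-8", xml_declaration=True)`, where both filenames are placeholders. Note that ElementTree's default parser discards XML comments, so this is not a byte-for-byte round trip.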