On the mention of parsing XML with string operations, I'm compelled to post one of my favorite StackOverflow responses:

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

YMMV of course...
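
For the deletion Julie describes below, a parser-based approach could look roughly like the following. This is only a minimal sketch using Python's standard-library ElementTree, not a tested production script. The function name `strip_matching_710` and its parameters are made up for illustration, and it assumes each subfield code appears at most once per datafield and that matching on tag 710 plus the exact subfield text is enough (it does not check the indicators). Note that ElementTree's default parser drops XML comments when it reads a document.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
NS = {"marc": MARC_NS}


def strip_matching_710(xml_text, sub_a="Faux College", sub_b="Special Collections"):
    """Return the XML with 710 datafields removed when $a and $b both match.

    Hypothetical helper for illustration; other 710 fields are left intact.
    """
    # Keep the familiar prefixes when the tree is serialized again.
    ET.register_namespace("marc", MARC_NS)
    ET.register_namespace("xsi", XSI_NS)

    root = ET.fromstring(xml_text)

    # Materialize the record list first so removals don't disturb iteration.
    for record in list(root.iter(f"{{{MARC_NS}}}record")):
        for field in record.findall("marc:datafield[@tag='710']", NS):
            # Map subfield code -> text (assumes codes are not repeated).
            values = {
                sf.get("code"): (sf.text or "").strip()
                for sf in field.findall("marc:subfield", NS)
            }
            if values.get("a") == sub_a and values.get("b") == sub_b:
                record.remove(field)

    return ET.tostring(root, encoding="unicode")
```

To use it on a file, one could read the file's contents, pass them to `strip_matching_710`, and write the returned string back out. For very large files a streaming parser would probably be a better fit than loading the whole tree.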

Kevin 


 
-----Original message-----
> From: Kyle Banerjee
> Sent: Tuesday, January 10 2017, 5:44 pm
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] MARCXML help again
> 
> Howdy Julie,
> 
> Depending on your specific needs, it's often easier and faster to use string
> operations rather than XML operations to work with XML.
> 
> This is especially true if you have a large number of files and/or the files
> are very big: stripping the whitespace between elements and then performing
> a simple string substitution would be a fast, low-tech way to remove the
> unwanted fields.
> 
> kyle
> 
> On Tue, Jan 10, 2017 at 1:13 PM, Julie Swierczek <[log in to unmask]>
> wrote:
> 
> > Thanks to all who responded to my earlier plea for help.  I now have a new
> > problem.  I'm not sure if I can do this with find and replace in Oxygen, or
> > if this requires XSLT, or what.
> >
> > I have a project of MARCXML records like this:
> >
> > <?xml version="1.0" encoding="UTF-8" ?>
> > <marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim"
> >     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> >     xsi:schemaLocation="http://www.loc.gov/MARC21/slim
> > http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
> >   <marc:record>
> > <!--Lots of other datafields here -->
> >     <marc:datafield tag="710" ind1="2" ind2=" ">
> >       <marc:subfield code="a">Faux College</marc:subfield>
> >       <marc:subfield code="b">Special Collections</marc:subfield>
> >     </marc:datafield>
> >   </marc:record>
> > </marc:collection>
> >
> > I want to strip out all instances of:
> >     <marc:datafield tag="710" ind1="2" ind2=" ">
> >       <marc:subfield code="a">Faux College</marc:subfield>
> >       <marc:subfield code="b">Special Collections</marc:subfield>
> >     </marc:datafield>
> > but I want to leave other <marc:datafield tag="710" ind1="2" ind2=" ">
> > instances intact.  I only want to delete ones with both the Faux College
> > and Special Collections text in the subfields.
> >
> > Where would I go from here? I thought of doing an xsl:template match in an
> > XSL stylesheet, and then not providing any instructions for replacing the
> > match, but I don't know how to select for that specific text. My attempts
> > to figure that out have not worked. You can only read so much W3C
> > documentation and Stack Overflow before you need to just sit quietly and
> > stare at a wall for a while.
> >
> > Thanks in advance --
> >
> > Julie
> >
>