Thanks Nate, I worked out pretty quickly about
http://id.loc.gov/authorities/names/collection_NamesUndifferentiated and
http://id.loc.gov/authorities/names/collection_NamesAuthorizedHeadings
I'll look at the converter
cheers
stuart
--
...let us be heard from red core to black sky
On Tue, 5 May 2026 at 10:37, Trail, Nate <[log in to unmask]> wrote:
>
> Thanks for pointing this out; I'll look at the mads:affiliation; may be an
> oversight. Is it better in madsrdf?
>
> We run the html display from the MADSRDF; here's the conversion from MARC
> to MADSRDF:
> https://github.com/lcnetdev/marcauth-to-madsrdf
>
> If you look at the MADSRDF, we've taken a lot of MARC controlfield data
> and converted it to collections:
>
> From: https://id.loc.gov/authorities/names/n2025504122.madsrdf.rdf
> <madsrdf:isMemberOfMADSCollection rdf:resource="
> http://id.loc.gov/authorities/names/collection_NamesAuthorizedHeadings"/>
> <madsrdf:isMemberOfMADSScheme rdf:resource="
> http://id.loc.gov/authorities/names"/>
> <madsrdf:isMemberOfMADSCollection rdf:resource="
> http://id.loc.gov/authorities/names/collection_LCNAF"/>
>
> Some are more important to pay attention to, like:
> <madsrdf:isMemberOfMADSCollection rdf:resource="
> http://id.loc.gov/authorities/names/collection_NamesUndifferentiated"/>
>
> Undifferentiated names are not for a single person. I looked up "Smith,
> Jim" and found https://id.loc.gov/authorities/names/n87147358; not a real
> person but a placeholder. You might not want to put much effort into a
> generic name like that.
>
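For anyone who wants to test for that collection without a triple store, here is a minimal sketch in Python. It assumes the record has been fetched as .madsrdf.rdf and saved or read into a string. The function names and the inline sample record are made up for illustration, and the check looks at every isMemberOfMADSCollection element in the file, not only the ones on the top-level authority.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MADSRDF_NS = "http://www.loc.gov/mads/rdf/v1#"
UNDIFFERENTIATED = (
    "http://id.loc.gov/authorities/names/collection_NamesUndifferentiated"
)

# Hypothetical, cut-down record used only to exercise the functions below.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:madsrdf="http://www.loc.gov/mads/rdf/v1#">
  <madsrdf:PersonalName rdf:about="http://id.loc.gov/authorities/names/n87147358">
    <madsrdf:isMemberOfMADSCollection rdf:resource="
http://id.loc.gov/authorities/names/collection_NamesUndifferentiated"/>
    <madsrdf:isMemberOfMADSCollection rdf:resource="
http://id.loc.gov/authorities/names/collection_LCNAF"/>
  </madsrdf:PersonalName>
</rdf:RDF>"""


def collections(madsrdf_xml: str) -> set[str]:
    """Return every collection URI mentioned anywhere in the document."""
    root = ET.fromstring(madsrdf_xml)
    tag = f"{{{MADSRDF_NS}}}isMemberOfMADSCollection"
    attr = f"{{{RDF_NS}}}resource"
    # strip() because the attribute values can carry stray whitespace
    return {el.get(attr, "").strip() for el in root.iter(tag)}


def is_undifferentiated(madsrdf_xml: str) -> bool:
    """True if the record is in the NamesUndifferentiated collection."""
    return UNDIFFERENTIATED in collections(madsrdf_xml)
```

Records for which is_undifferentiated() is true could then be set aside rather than matched to a single person.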
> Nate
>
> -----Original Message-----
> From: Code for Libraries <[log in to unmask]> On Behalf Of Stuart A.
> Yeates
> Sent: Monday, May 04, 2026 6:06 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] Regexp for rewriting LoC LCCN authorised personal
> names
>
> CAUTION: This email message has been received from an external source.
> Please use caution when opening attachments, or clicking on links.
>
> Either not all of the information in the HTML is in the MARCXML, or the
> documentation at https://www.loc.gov/marc/authority/index.html is
> insufficient for me to extract it.
>
> For example "Collection Membership(s) - Names Collection - Authorized
> Headings" is not obviously apparent in the MARCXML using the documentation.
>
> If the HTML is generated from the MARCXML, can you point me to the code
> for that?
>
> Note that only one of four 373a / mads:affiliation tags comes into the
> mads XML from the MARCXML, is that deliberate?
>
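One way to see what the MARCXML itself carries, before any conversion, is to list the 373 $a subfields directly. A rough Python sketch, assuming the standard MARCXML namespace. The helper name and the sample record (with placeholder affiliations) are invented for illustration and are not taken from a real authority record.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented sample record; the affiliation strings are placeholders.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="373" ind1=" " ind2=" ">
    <subfield code="a">Example University</subfield>
  </datafield>
  <datafield tag="373" ind1=" " ind2=" ">
    <subfield code="a">Example Library</subfield>
    <subfield code="2">naf</subfield>
  </datafield>
</record>"""


def subfields(marcxml: str, tag: str, code: str) -> list[str]:
    """Return the text of every subfield `code` in every datafield `tag`."""
    root = ET.fromstring(marcxml)
    path = f".//marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [(sf.text or "").strip() for sf in root.iterfind(path, MARC_NS)]
```

Comparing len(subfields(record, "373", "a")) with the number of affiliation elements in the MADS output for the same record would show which records lose affiliations in conversion.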
> cheers
> stuart
> --
> ...let us be heard from red core to black sky
>
>
> On Tue, 5 May 2026 at 09:12, Trail, Nate <[log in to unmask]> wrote:
>
> > MARC xml and MADS xml are listed at the bottom of each Name page
> > under "Alternate Formats". Since you are using XSL those should work
> > for you way better than a scraped html page in a warc file.
> >
> > If you know the lccn, you can fetch the single page in the
> > serialization you like:
> >
> > https://id.loc.gov/authorities/names/n2001028682.madsxml.xml
> > https://id.loc.gov/authorities/names/n2001028682.marcxml.xml
> >
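The URL pattern above is easy to script. A small sketch in Python, assuming the suffixes seen in this thread (.html, .madsxml.xml, .marcxml.xml, .madsrdf.rdf) apply to every name record. The function and dictionary names are hypothetical, and nothing here fetches anything.

```python
BASE = "https://id.loc.gov/authorities/names/"

# Suffixes as they appear in the URLs quoted in this thread.
SUFFIXES = {
    "html": ".html",
    "madsxml": ".madsxml.xml",
    "marcxml": ".marcxml.xml",
    "madsrdf": ".madsrdf.rdf",
}


def alternate_format_url(lccn: str, fmt: str) -> str:
    """Build the URL of one serialization of a name authority record."""
    if fmt not in SUFFIXES:
        raise ValueError(f"unknown format: {fmt!r}")
    # Assumption: the identifier is used in its compact form, without spaces.
    return f"{BASE}{lccn.replace(' ', '')}{SUFFIXES[fmt]}"
```

For example, alternate_format_url("n2001028682", "marcxml") gives the second URL listed above.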
> > Nate
> > -----Original Message-----
> > From: Code for Libraries <[log in to unmask]> On Behalf Of Stuart A.
> > Yeates
> > Sent: Monday, May 04, 2026 4:45 PM
> > To: [log in to unmask]
> > Subject: Re: [CODE4LIB] Regexp for rewriting LoC LCCN authorised
> > personal names
> >
> > I've got many pages like
> > https://id.loc.gov/authorities/names/n2001028682.html (stored in WARC
> > files)
> >
> > I've got names.madsrdf.xml.gz which is all the names in madsrdf, but
> > it's disaggregated rather than in the format exampled in
> > https://www.loc.gov/standards/mads/rdf/ so it's not really amenable to
> > processing in XSL. I'd prefer not to spin up a triple store and
> > reasoner of any kind.
> >
> > I suspect that what I need is the MARCXML, which I'm familiar with
> > manipulating with XSL and has all the subfields I need explicitly marked.
> >
> > As I work, I've been documenting the differences I find between LoC
> > and wikidata, on the understanding that bridging LCCNs and wikidata is
> > unlikely to be the work of a single person, see
> > https://www.wikidata.org/wiki/User:Stuartyeates/Wikidata_-_LoC_ontological_mismatches
> >
> > cheers
> > stuart
> > --
> > ...let us be heard from red core to black sky
> >
> >
> > On Tue, 5 May 2026 at 07:40, Michael Monaco <
> > [log in to unmask]> wrote:
> >
> > > As Kevin mentioned, there are in fact many possible patterns for
> > > names to appear in, so it's probably not possible to un-invert all
> > > the names in the NAF with a single RegEx.
> > >
> > > You mention that you've downloaded the records in bulk -- what
> > > format are the records in? Could you provide some examples?
> > >
> > > Thanks,
> > >
> > > Mike Monaco
> > > Head, Technical Services & Coordinator, Cataloging Services
> > > Associate Professor of Bibliography
> > > University Libraries Technical Services
> > > 261B Bierce Library
> > > The University of Akron
> > > Akron, Ohio 44325-1712
> > > He/him/his
> > > Office: 330-972-2446
> > > [log in to unmask]
> > > ORCID: 0000-0001-7244-5154
> > > https://www.uakron.edu/libraries
> > >
> > >
> > > -----Original Message-----
> > > From: Code for Libraries <[log in to unmask]> On Behalf Of
> > > Stuart A. Yeates
> > > Sent: Monday, May 4, 2026 3:07 PM
> > > To: [log in to unmask]
> > > Subject: Re: [CODE4LIB] Regexp for rewriting LoC LCCN authorised
> > > personal names
> > >
> > > CAUTION:This email originated from outside of The University of Akron.
> > > Use caution when opening attachments, clicking links or responding
> > > to requests for information.
> > >
> > >
> > >
> > > As it happens, I have already downloaded the records in bulk. What I
> > > need is a regexp to parse the "quoted text"
> > >
> > > cheers
> > > stuart
> > >
> > > --
> > > ...let us be heard from red core to black sky
> > >
> > >
> > > On Tue, 5 May 2026 at 06:33, Trail, Nate <[log in to unmask]> wrote:
> > >
> > > > Stuart,
> > > >
> > > > You could download the entire Names file in "nt" serialization,
> > > > then there's one line for each name you can filter on:
> > > >
> > > >
> > > > <http://id.loc.gov/authorities/names/nr2001046558> <http://www.loc.gov/mads/rdf/v1#authoritativeLabel> "Smith, Jim, 1940 October 17-" .
> > > >
> > > > Then you can do what you want with the quoted text.
> > > >
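A sketch of that filtering step in Python, for what it is worth. It assumes each authoritativeLabel statement sits on one line of the .nt file with a name URI as its subject, as in the example line above. The pattern and function names are my own, and escape sequences inside the label (such as \" or \uXXXX) are passed through undecoded.

```python
import re
from typing import Iterable, Iterator

# One N-Triples statement: <subject> <predicate> "literal" .
LABEL_LINE = re.compile(
    r'^<(?P<uri>http://id\.loc\.gov/authorities/names/[^>]+)>\s+'
    r'<http://www\.loc\.gov/mads/rdf/v1#authoritativeLabel>\s+'
    r'"(?P<label>(?:[^"\\]|\\.)*)"'        # quoted text, allowing escapes
    r'(?:@[A-Za-z0-9-]+|\^\^<[^>]+>)?'     # optional language tag or datatype
    r'\s*\.\s*$'
)


def authoritative_labels(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (uri, label) for every authoritativeLabel line; skip the rest."""
    for line in lines:
        m = LABEL_LINE.match(line)
        if m:
            yield m.group("uri"), m.group("label")


EXAMPLE = (
    '<http://id.loc.gov/authorities/names/nr2001046558> '
    '<http://www.loc.gov/mads/rdf/v1#authoritativeLabel> '
    '"Smith, Jim, 1940 October 17-" .'
)
```

Passing an open file object as lines keeps memory use flat, since the file is read one line at a time.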
> > > > Saves bandwidth for you and us.
> > > >
> > > > https://id.loc.gov/download/
> > > >
> > > > Good luck,
> > > >
> > > > Nate
> > > >
> > > >
> > > > -----------------------------------------
> > > > Nate Trail
> > > > Network Development & MARC Standards Office LCSG/DPS/ABA/NDMSO
> > > > Library of Congress Washington DC 20540
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Code for Libraries <[log in to unmask]> On Behalf Of
> > > > Kevin Hawkins
> > > > Sent: Monday, May 04, 2026 2:08 PM
> > > > To: [log in to unmask]
> > > > Subject: Re: [CODE4LIB] Regexp for rewriting LoC LCCN authorised
> > > > personal names
> > > >
> > > > Hello Stuart,
> > > >
> > > > Do you mean that you want to convert LCNAF personal names from
> > > > this sort of order:
> > > >
> > > > Mudge, Lewis Seymour, 1868-1945
> > > >
> > > > to something like this:
> > > >
> > > > Lewis Seymour Mudge, 1868-1945
> > > >
> > > > ? But then also deal with authorized forms containing no commas,
> > > > forms with more than two commas, and occasional use of parentheses.
> > > > So, as you know, it gets complicated.
> > > >
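To make the complication concrete, here is a deliberately narrow sketch in Python. It handles only the "Surname, Forenames" and "Surname, Forenames, dates" shapes and returns None for everything else (no comma, extra commas, parentheses, titles), so those cases can be routed elsewhere. The pattern is an assumption of mine, not a description of LCNAF rules.

```python
import re
from typing import Optional

# Surname, Forenames[, dates]; the dates part must contain a digit, and no
# part may contain a comma or parenthesis.
SIMPLE = re.compile(
    r'^(?P<surname>[^,()]+), (?P<forenames>[^,()]+?)'
    r'(?:, (?P<dates>[^,()]*\d[^,()]*))?$'
)


def uninvert(label: str) -> Optional[str]:
    """Rewrite a simple inverted name to direct order, or return None."""
    m = SIMPLE.match(label)
    if m is None:
        return None
    name = f"{m['forenames']} {m['surname']}"
    return f"{name}, {m['dates']}" if m['dates'] else name
```

With this, uninvert("Mudge, Lewis Seymour, 1868-1945") returns "Lewis Seymour Mudge, 1868-1945", while a label with parentheses or a third comma-separated element comes back as None.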
> > > > I wonder if a different approach might make more sense here:
> > > >
> > > > 1. Query the inverted LCNAF form at
> > > > https://id.loc.gov/
> > > >
> > > > 2. Retrieve the URI, extracting the identifier (beginning with
> > > > "n")
> > > >
> > > > 3. Query Wikidata using this identifier.
> > > >
> > > > 4. Retrieve Wikidata's form of the name, which is not inverted.
> > > >
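Steps 2 and 3 above could be sketched like this in Python. It only builds the query text and sends nothing. It assumes Wikidata property P244 (Library of Congress authority ID) holds the identifier in the same compact form as the URI, and that the query would be run on the Wikidata Query Service, where the wdt: and rdfs: prefixes are predefined. The function names are hypothetical.

```python
import re
from typing import Optional

LCCN = re.compile(r"n[a-z]{0,2}\d+")
LCCN_URI = re.compile(
    r"https?://id\.loc\.gov/authorities/names/(n[a-z]{0,2}\d+)"
)


def lccn_from_uri(uri: str) -> Optional[str]:
    """Step 2: pull the identifier (beginning with "n") out of a name URI."""
    m = LCCN_URI.fullmatch(uri.strip())
    return m.group(1) if m else None


def wikidata_query(lccn: str, lang: str = "en") -> str:
    """Step 3: SPARQL text that looks an item up by its P244 value."""
    if not LCCN.fullmatch(lccn):
        raise ValueError(f"not a name authority identifier: {lccn!r}")
    return (
        "SELECT ?item ?label WHERE {\n"
        f'  ?item wdt:P244 "{lccn}" ;\n'
        "        rdfs:label ?label .\n"
        f'  FILTER(LANG(?label) = "{lang}")\n'
        "}"
    )
```

The ?label binding in the result would be the non-inverted form mentioned in step 4, where Wikidata has an item for the identifier.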
> > > > --Kevin
> > > >
> > > > On 5/3/26 1:25 PM, Stuart A. Yeates wrote:
> > > > > Does anyone know of somewhere that describes LCCN authorised
> > > > > personal names as regexps? I want to be able to rewrite them at
> > > > > scale to 'normal' order.
> > > > >
> > > > > AI appears to be actively undermining the functionality of
> > > > > search engines.
> > > > >
> > > > > cheers
> > > > > stuart
> > > > > --
> > > > > ...let us be heard from red core to black sky
> > > >
> > >
> >
>