Thanks for pointing this out. I'll look into the mads:affiliation issue; it may be an oversight. Is it any better in the MADSRDF?
We run the html display from the MADSRDF; here's the conversion from MARC to MADSRDF:
https://github.com/lcnetdev/marcauth-to-madsrdf
If you look at the MADSRDF, we've taken a lot of MARC controlfield data and converted it to collections:
From: https://id.loc.gov/authorities/names/n2025504122.madsrdf.rdf
<madsrdf:isMemberOfMADSCollection rdf:resource="http://id.loc.gov/authorities/names/collection_NamesAuthorizedHeadings"/>
<madsrdf:isMemberOfMADSScheme rdf:resource="http://id.loc.gov/authorities/names"/>
<madsrdf:isMemberOfMADSCollection rdf:resource="http://id.loc.gov/authorities/names/collection_LCNAF"/>
Some are more important to pay attention to, like:
<madsrdf:isMemberOfMADSCollection rdf:resource="http://id.loc.gov/authorities/names/collection_NamesUndifferentiated"/>
Undifferentiated names do not identify a single person. I looked up "Smith, Jim" and found https://id.loc.gov/authorities/names/n87147358, which is not a real person but a placeholder. You might not want to put much effort into a generic name like that.
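If it helps, here is a minimal sketch in Python of checking that collection membership before spending effort on a record. It assumes the record has already been fetched as MADSRDF XML. The function names and the trimmed SAMPLE record are made up for illustration; only the element name, the rdf:resource attribute, and the collection URIs come from the snippets above.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MADS_NS = "http://www.loc.gov/mads/rdf/v1#"
UNDIFFERENTIATED = (
    "http://id.loc.gov/authorities/names/collection_NamesUndifferentiated"
)


def collections(madsrdf_xml):
    """Return the set of MADS collection URIs named anywhere in the record."""
    root = ET.fromstring(madsrdf_xml)
    tag = f"{{{MADS_NS}}}isMemberOfMADSCollection"
    attr = f"{{{RDF_NS}}}resource"
    return {el.get(attr) for el in root.iter(tag) if el.get(attr)}


def is_undifferentiated(madsrdf_xml):
    """True if the record belongs to the undifferentiated-names collection."""
    return UNDIFFERENTIATED in collections(madsrdf_xml)


# Hand-written, heavily trimmed record (an assumption, not real LoC output).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:madsrdf="http://www.loc.gov/mads/rdf/v1#">
  <madsrdf:PersonalName rdf:about="http://example.org/placeholder-record">
    <madsrdf:isMemberOfMADSCollection rdf:resource="http://id.loc.gov/authorities/names/collection_NamesUndifferentiated"/>
    <madsrdf:isMemberOfMADSCollection rdf:resource="http://id.loc.gov/authorities/names/collection_LCNAF"/>
  </madsrdf:PersonalName>
</rdf:RDF>"""

if __name__ == "__main__":
    print(is_undifferentiated(SAMPLE))
```

Records that come back True could simply be skipped or set aside for manual review.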
Nate
-----Original Message-----
From: Code for Libraries <[log in to unmask]> On Behalf Of Stuart A. Yeates
Sent: Monday, May 04, 2026 6:06 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] Regexp for rewriting LoC LCCN authorised personal names
CAUTION: This email message has been received from an external source. Please use caution when opening attachments, or clicking on links.
Either not all of the information in the HTML is in the MARCXML, or the documentation at https://www.loc.gov/marc/authority/index.html is insufficient for me to extract it.
For example, "Collection Membership(s) - Names Collection - Authorized Headings" is not apparent in the MARCXML from the documentation.
If the HTML is generated from the MARCXML, can you point me to the code for that?
Note that only one of the four 373a / mads:affiliation tags makes it from the MARCXML into the MADS XML. Is that deliberate?
cheers
stuart
--
...let us be heard from red core to black sky
On Tue, 5 May 2026 at 09:12, Trail, Nate <[log in to unmask]> wrote:
> MARC xml and MADS xml are listed at the bottom of each Name page
> under "Alternate Formats". Since you are using XSL those should work
> for you way better than a scraped html page in a warc file.
>
> If you know the lccn, you can fetch the single page in the
> serialization you like:
>
> https://id.loc.gov/authorities/names/n2001028682.madsxml.xml
> https://id.loc.gov/authorities/names/n2001028682.marcxml.xml
>
> Nate
> -----Original Message-----
> From: Code for Libraries <[log in to unmask]> On Behalf Of Stuart A.
> Yeates
> Sent: Monday, May 04, 2026 4:45 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] Regexp for rewriting LoC LCCN authorised
> personal names
>
> I've got many pages like
> https://id.loc.gov/authorities/names/n2001028682.html (stored in WARC
> files)
>
> I've got names.madsrdf.xml.gz which is all the names in madsrdf, but
> it's disaggregated rather than in the format exampled in
> https://www.loc.gov/standards/mads/rdf/ so it's not really amenable to
> processing in XSL. I'd prefer not to spin up a triple store and
> reasoner of any kind.
>
> I suspect that what I need is the MARCXML, which I'm familiar with
> manipulating with XSL and has all the subfields I need explicitly marked.
>
> As I work, I've been documenting the differences I find between LoC
> and wikidata, on the understanding that bridging LCCNs and wikidata is
> unlikely to be the work of a single person, see
> https://www.wikidata.org/wiki/User:Stuartyeates/Wikidata_-_LoC_ontological_mismatches
>
> cheers
> stuart
> --
> ...let us be heard from red core to black sky
>
>
> On Tue, 5 May 2026 at 07:40, Michael Monaco <
> [log in to unmask]> wrote:
>
> > As Kevin mentioned, there are in fact many possible patterns for
> > names to appear in, so it's probably not possible to un-invert all
> > the names in the NAF with a single RegEx.
> >
> > You mention that you've downloaded the records in bulk -- what
> > format are the records in? Could you provide some examples?
> >
> > Thanks,
> >
> > Mike Monaco
> > Head, Technical Services & Coordinator, Cataloging Services
> > Associate Professor of Bibliography
> > University Libraries Technical Services
> > 261B Bierce Library
> > The University of Akron
> > Akron, Ohio 44325-1712
> > He/him/his
> > Office: 330-972-2446
> > [log in to unmask]
> > ORCID: 0000-0001-7244-5154
> > https://www.uakron.edu/libraries
> >
> >
> > -----Original Message-----
> > From: Code for Libraries <[log in to unmask]> On Behalf Of
> > Stuart A. Yeates
> > Sent: Monday, May 4, 2026 3:07 PM
> > To: [log in to unmask]
> > Subject: Re: [CODE4LIB] Regexp for rewriting LoC LCCN authorised
> > personal names
> >
> > As it happens, I have already downloaded the records in bulk. What I
> > need is a regexp to parse the "quoted text"
> >
> > cheers
> > stuart
> >
> > --
> > ...let us be heard from red core to black sky
> >
> >
> > On Tue, 5 May 2026 at 06:33, Trail, Nate <[log in to unmask]> wrote:
> >
> > > Stuart,
> > >
> > > You could download the entire Names file in "nt" serialization,
> > > then there's one line for each name you can filter on:
> > >
> > >
> > > <http://id.loc.gov/authorities/names/nr2001046558>
> > > <http://www.loc.gov/mads/rdf/v1#authoritativeLabel>
> > > "Smith, Jim, 1940 October 17-" .
> > >
> > > Then you can do what you want with the quoted text.
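One way to do that filtering, sketched in Python. This is an illustration only: it assumes the bulk file is plain N-Triples with one triple per line, and the names TRIPLE and labels are made up here. The extracted label is returned as written in the file, so any N-Triples escapes such as \" or \uXXXX are left undecoded.

```python
import re

LABEL = "<http://www.loc.gov/mads/rdf/v1#authoritativeLabel>"

# Subject URI, the authoritativeLabel predicate, then a quoted literal that
# may carry an optional language tag or datatype, then the closing full stop.
TRIPLE = re.compile(
    r'^<(?P<uri>[^>]+)>\s+' + re.escape(LABEL) +
    r'\s+"(?P<label>(?:[^"\\]|\\.)*)"(?:@[\w-]+|\^\^<[^>]+>)?\s*\.\s*$'
)


def labels(lines):
    """Yield (uri, label) for each authoritativeLabel triple on a name URI."""
    for line in lines:
        m = TRIPLE.match(line)
        if m and "/authorities/names/" in m["uri"]:
            yield m["uri"], m["label"]


if __name__ == "__main__":
    sample = [
        '<http://id.loc.gov/authorities/names/nr2001046558> '
        '<http://www.loc.gov/mads/rdf/v1#authoritativeLabel> '
        '"Smith, Jim, 1940 October 17-" .\n',
    ]
    for uri, label in labels(sample):
        print(uri, label)
```

Because labels() takes any iterable of lines, it can be fed an open file (for example from gzip.open in text mode) without loading the whole download into memory.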
> > >
> > > Saves bandwidth for you and us.
> > >
> > > https://id.loc.gov/download/
> > >
> > > Good luck,
> > >
> > > Nate
> > >
> > >
> > > -----------------------------------------
> > > Nate Trail
> > > Network Development & MARC Standards Office LCSG/DPS/ABA/NDMSO
> > > Library of Congress Washington DC 20540
> > >
> > >
> > > -----Original Message-----
> > > From: Code for Libraries <[log in to unmask]> On Behalf Of
> > > Kevin Hawkins
> > > Sent: Monday, May 04, 2026 2:08 PM
> > > To: [log in to unmask]
> > > Subject: Re: [CODE4LIB] Regexp for rewriting LoC LCCN authorised
> > > personal names
> > >
> > > Hello Stuart,
> > >
> > > Do you mean that you want to convert LCNAF personal names from
> > > this sort of order:
> > >
> > > Mudge, Lewis Seymour, 1868-1945
> > >
> > > to something like this:
> > >
> > > Lewis Seymour Mudge, 1868-1945
> > >
> > > ? But then also deal with authorized forms containing no commas,
> > > forms with more than two commas, and occasional use of parentheses.
> > > So, as you know, it gets complicated.
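To make the complication concrete, here is a rough Python sketch that rewrites only the simplest pattern (surname, forenames, optional dates) and declines everything else. The pattern and the function name uninvert are assumptions for illustration, not a description of how LCNAF headings are formally constructed; headings with no comma, extra comma-separated elements, or parentheses return None so they can be reviewed by hand.

```python
import re

# "Surname, Forenames" with an optional ", dates" element containing a digit.
# Commas and parentheses are excluded from every part, so anything more
# complicated fails to match instead of being rewritten wrongly.
SIMPLE = re.compile(
    r'^(?P<surname>[^,()]+),\s*(?P<forenames>[^,()]+?)'
    r'(?:,\s*(?P<dates>[^,()]*\d[^,()]*))?$'
)


def uninvert(heading):
    """Return the heading in direct order, or None if it is not simple."""
    m = SIMPLE.match(heading.strip())
    if not m:
        return None
    name = f"{m['forenames'].strip()} {m['surname'].strip()}"
    if m["dates"]:
        return f"{name}, {m['dates'].strip()}"
    return name


if __name__ == "__main__":
    for heading in ["Mudge, Lewis Seymour, 1868-1945", "Plato"]:
        print(heading, "->", uninvert(heading))
```

Counting how many headings come back as None gives a quick measure of how far a single pattern gets before the other cases need their own handling.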
> > >
> > > I wonder if a different approach might make more sense here:
> > >
> > > 1. Query the inverted LCNAF form at
> > > https://id.loc.gov/
> > >
> > > 2. Retrieve the URI, extracting the identifier (beginning with
> > > "n")
> > >
> > > 3. Query Wikidata using this identifier.
> > >
> > > 4. Retrieve Wikidata's form of the name, which is not inverted.
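Steps 2 to 4 could be sketched roughly as follows in Python. This only builds the request URL and makes no network call. It assumes the Wikidata Query Service endpoint and property P244 (Library of Congress authority ID); the function names and the identifier check (an "n", an optional second letter, then digits) are assumptions based on the identifiers seen in this thread. Step 1, looking up the heading at id.loc.gov, is not covered.

```python
import re
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"


def lccn_from_uri(uri):
    """Take the last path segment of an id.loc.gov name URI as the identifier."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def wikidata_query_url(lccn):
    """Build a query URL asking Wikidata for the item holding this identifier."""
    # Assumed identifier shape; rejecting anything else also keeps stray
    # quotes out of the SPARQL string.
    if not re.fullmatch(r"n[a-z]?\d+", lccn):
        raise ValueError(f"unexpected identifier: {lccn!r}")
    query = (
        "SELECT ?item ?itemLabel WHERE { "
        f'?item wdt:P244 "{lccn}" . '
        'SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } }'
    )
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    lccn = lccn_from_uri("https://id.loc.gov/authorities/names/n2001028682")
    print(wikidata_query_url(lccn))
```

The itemLabel in the response would be the non-inverted form, where Wikidata has an item for that identifier.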
> > >
> > > --Kevin
> > >
> > > On 5/3/26 1:25 PM, Stuart A. Yeates wrote:
> > > > Does anyone know of somewhere that describes LCCN authorised
> > > > personal names as regexps? I want to be able to rewrite them at
> > > > scale to 'normal' order.
> > > >
> > > > AI appears to be actively undermining the functionality of
> > > > search engines.
> > > >
> > > > cheers
> > > > stuart
> > > > --
> > > > ...let us be heard from red core to black sky
> > >
> >
>