I've got many pages like
https://id.loc.gov/authorities/names/n2001028682.html (stored in WARC
files).
I've also got names.madsrdf.xml.gz, which is all the names in madsrdf, but
it's disaggregated rather than in the format shown at
https://www.loc.gov/standards/mads/rdf/ so it's not really amenable to
processing in XSL. I'd prefer not to spin up a triple store or reasoner of
any kind.
I suspect that what I need is the MARCXML, which I'm used to manipulating
with XSL and which has all the subfields I need explicitly marked.
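
To illustrate what "explicitly marked" means here, a minimal sketch of
pulling the subfields of the personal name heading (field 100) out of one
MARCXML record. It is in Python rather than XSL purely for illustration. The
sample record is hand-written, not copied from LoC data, and the function
name is made up for this example.

```python
import xml.etree.ElementTree as ET

# MARCXML ("MARC21 slim") namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hand-written sample record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Mudge, Lewis Seymour,</subfield>
    <subfield code="d">1868-1945</subfield>
  </datafield>
</record>"""


def personal_name_subfields(record_xml):
    """Return (code, text) pairs for the subfields of field 100, in order."""
    root = ET.fromstring(record_xml)
    path = ".//marc:datafield[@tag='100']/marc:subfield"
    return [(sf.get("code"), sf.text or "") for sf in root.findall(path, MARC_NS)]


if __name__ == "__main__":
    for code, text in personal_name_subfields(SAMPLE):
        print(code, text)
```

With the name in $a and the dates in $d already separated, only $a needs
reordering, which is a much smaller problem than parsing the whole label.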
As I work, I've been documenting the differences I find between LoC and
Wikidata, on the understanding that bridging LCCNs and Wikidata is unlikely
to be the work of a single person. See
https://www.wikidata.org/wiki/User:Stuartyeates/Wikidata_-_LoC_ontological_mismatches
cheers
stuart
--
...let us be heard from red core to black sky
On Tue, 5 May 2026 at 07:40, Michael Monaco <
[log in to unmask]> wrote:
> As Kevin mentioned, there are in fact many possible patterns for names to
> appear in, so it's probably not possible to un-invert all the names in the
> NAF with a single RegEx.
>
> You mention that you've downloaded the records in bulk -- what format are
> the records in? Could you provide some examples?
>
> Thanks,
>
> Mike Monaco
> Head, Technical Services & Coordinator, Cataloging Services
> Associate Professor of Bibliography
> University Libraries Technical Services
> 261B Bierce Library
> The University of Akron
> Akron, Ohio 44325-1712
> He/him/his
> Office: 330-972-2446
> [log in to unmask]
> ORCID: 0000-0001-7244-5154
> https://www.uakron.edu/libraries
>
>
> -----Original Message-----
> From: Code for Libraries <[log in to unmask]> On Behalf Of Stuart A.
> Yeates
> Sent: Monday, May 4, 2026 3:07 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] Regexp for rewriting LoC LCCN authorised personal
> names
>
> As it happens, I have already downloaded the records in bulk. What I need
> is a regexp to parse the "quoted text"
>
> cheers
> stuart
>
> --
> ...let us be heard from red core to black sky
>
>
> On Tue, 5 May 2026 at 06:33, Trail, Nate <[log in to unmask]> wrote:
>
> > Stuart,
> >
> > You could download the entire Names file in "nt" serialization, then
> > there's one line for each name you can filter on:
> >
> >
> > <http://id.loc.gov/authorities/names/nr2001046558> <http://www.loc.gov/mads/rdf/v1#authoritativeLabel> "Smith, Jim, 1940 October 17-" .
> >
> > Then you can do what you want with the quoted text.
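
A minimal Python sketch of that filtering step, assuming each label sits on
one line shaped like the example above. The function name is invented for
illustration. N-Triples escapes inside the label (such as \" or \uXXXX) are
left undecoded, and lines whose subject is a blank node are skipped.

```python
import re

# One N-Triples line: <subject> <madsrdf:authoritativeLabel> "label" ...
# Anything after the closing quote (language tag, datatype, final dot) is ignored.
LABEL_LINE = re.compile(
    r'^<(?P<uri>[^>]+)>\s+'
    r'<http://www\.loc\.gov/mads/rdf/v1#authoritativeLabel>\s+'
    r'"(?P<label>(?:[^"\\]|\\.)*)"'
)


def extract_labels(lines):
    """Yield (uri, label) for every authoritativeLabel line in an iterable of lines."""
    for line in lines:
        m = LABEL_LINE.match(line)
        if m:
            yield m.group("uri"), m.group("label")


if __name__ == "__main__":
    sample = [
        '<http://id.loc.gov/authorities/names/nr2001046558> '
        '<http://www.loc.gov/mads/rdf/v1#authoritativeLabel> '
        '"Smith, Jim, 1940 October 17-" .',
    ]
    for uri, label in extract_labels(sample):
        print(uri, label)
```

Because it takes any iterable of lines, the same function can be fed an open
file handle, so the full download never has to be held in memory.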
> >
> > Saves bandwidth for you and us.
> >
> > https://id.loc.gov/download/
> >
> > Good luck,
> >
> > Nate
> >
> >
> > -----------------------------------------
> > Nate Trail
> > Network Development & MARC Standards Office LCSG/DPS/ABA/NDMSO Library
> > of Congress Washington DC 20540
> >
> >
> > -----Original Message-----
> > From: Code for Libraries <[log in to unmask]> On Behalf Of Kevin
> > Hawkins
> > Sent: Monday, May 04, 2026 2:08 PM
> > To: [log in to unmask]
> > Subject: Re: [CODE4LIB] Regexp for rewriting LoC LCCN authorised
> > personal names
> >
> > Hello Stuart,
> >
> > Do you mean that you want to convert LCNAF personal names from this
> > sort of order:
> >
> > Mudge, Lewis Seymour, 1868-1945
> >
> > to something like this:
> >
> > Lewis Seymour Mudge, 1868-1945
> >
> > ? But then also deal with authorized forms containing no commas,
> > forms with more than two commas, and occasional use of parentheses.
> > So, as you know, it gets complicated.
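
As a rough sketch of how far a single pattern gets, the Python below handles
only the simplest shape, "Surname, Forenames" with an optional date part that
starts with a digit. Everything else (no commas, extra commas, parentheses,
dates that start with a word) returns None so it can be routed to another
rule or to manual review. The pattern and function name are illustrative
assumptions, not a description of LoC practice.

```python
import re

# Simplest case only: "Surname, Forenames" plus an optional ", <digit>..." tail.
SIMPLE = re.compile(
    r'^(?P<surname>[^,()]+), (?P<forenames>[^,()]+)(?P<rest>, \d.*)?$'
)


def uninvert(label):
    """Return the label in 'normal' order, or None if it is not the simple shape."""
    m = SIMPLE.match(label)
    if m is None:
        return None
    rest = m.group("rest") or ""
    return f"{m.group('forenames')} {m.group('surname')}{rest}"


if __name__ == "__main__":
    for name in ["Mudge, Lewis Seymour, 1868-1945", "Plato"]:
        print(name, "->", uninvert(name))
```

Returning None rather than guessing keeps the unhandled patterns countable,
which helps decide which shape is worth writing the next rule for.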
> >
> > I wonder if a different approach might make more sense here:
> >
> > 1. Query the inverted LCNAF form at
> > https://id.loc.gov/
> >
> > 2. Retrieve the URI, extracting the identifier (beginning with "n")
> >
> > 3. Query Wikidata using this identifier.
> >
> > 4. Retrieve Wikidata's form of the name, which is not inverted.
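
Steps 2 and 3 might look something like the Python below. It makes no network
calls: it only extracts the identifier from a URI and builds a SPARQL query
string for the Wikidata Query Service. P244 is Wikidata's "Library of Congress
authority ID" property. The identifier pattern (an "n", up to two more
letters, then digits) and both function names are assumptions made for this
sketch.

```python
import re

# Assumed shape of a name authority identifier, e.g. n2001028682 or nr2001046558.
LCCN = re.compile(r"^n[a-z]{0,2}\d+$")


def lccn_from_uri(uri):
    """Extract the identifier from an id.loc.gov names URI (with or without .html)."""
    tail = uri.rstrip("/").rsplit("/", 1)[-1]
    if tail.endswith(".html"):
        tail = tail[: -len(".html")]
    if not LCCN.match(tail):
        raise ValueError(f"not a recognised name identifier: {tail!r}")
    return tail


def wikidata_query(lccn):
    """Build a SPARQL query for the item whose P244 value equals the identifier."""
    if not LCCN.match(lccn):
        raise ValueError(f"not a recognised name identifier: {lccn!r}")
    return (
        "SELECT ?item ?itemLabel WHERE { "
        f'?item wdt:P244 "{lccn}" . '
        'SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } }'
    )


if __name__ == "__main__":
    ident = lccn_from_uri("https://id.loc.gov/authorities/names/n2001028682.html")
    print(wikidata_query(ident))
```

Validating the identifier before it goes into the query string guards against
malformed input when this is run over a bulk file.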
> >
> > --Kevin
> >
> > On 5/3/26 1:25 PM, Stuart A. Yeates wrote:
> > > Does anyone know of somewhere that describes LCCN authorised
> > > personal names as regexps? I want to be able to rewrite them at scale
> > > to 'normal' order.
> > >
> > > AI appears to be actively undermining the functionality of search
> > > engines.
> > >
> > > cheers
> > > stuart
> > > --
> > > ...let us be heard from red core to black sky
> >
>