We've also been seeing some traffic from inconsiderate AI bots. One of my colleagues came across this site, which tracks and documents AI bots:

https://darkvisitors.com/

-- Scott

--
Scott Prater
Digital Library Architect
UW Digital Collections Center
University of Wisconsin - Madison

________________________________________
From: Code for Libraries <[log in to unmask]> on behalf of Lolis, John <[log in to unmask]>
Sent: Wednesday, April 10, 2024 12:15 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] blocking GPTBot?

This *sounds* as if it should help:
https://searchengineland.com/google-extended-crawler-432636

John Lolis
Coordinator of Computer Systems
100 Martine Avenue
White Plains, NY 10601
tel: 1.914.422.1497
fax: 1.914.422.1452
https://whiteplainslibrary.org/

*“I would rather have questions that can’t be answered than answers that can’t be questioned.”*
-- Richard Feynman <https://en.wikipedia.org/wiki/Richard_Feynman>, theoretical physicist and recipient of the Nobel Prize in Physics in 1965

On Mon, 8 Apr 2024 at 16:31, Jason Casden <[log in to unmask]> wrote:

> Thanks for bringing this up, Eben. We've been having a horrible time with
> these bots, including those from previously fairly well-behaved sources
> like Google. They've caused issues ranging from slow response times and
> high system load all the way up to outages for some older systems.
> So far, our systems folks have been playing whack-a-mole with a
> combination of IP range blocks and increasingly detailed robots.txt
> statements. A group is being convened to investigate more comprehensive
> options, so I will be watching this thread closely.
>
> Jason
>
> On Mon, Apr 8, 2024 at 4:18 PM Eben English <[log in to unmask]> wrote:
>
> > Hi all,
> >
> > I'm wondering if other folks are seeing AI and/or ML-related crawlers
> > like GPTBot accessing your library's website, catalog, digital
> > collections, or other sites.
> >
> > If so, are you blocking or disallowing these crawlers? Has anyone come up
> > with any policies around this?
> >
> > We're debating whether to allow these types of bots to crawl our digital
> > collections, many of which contain large amounts of copyrighted or "no
> > derivatives"-licensed materials. On one hand, these materials are
> > available for public view, but on the other hand the type of use that
> > GPTBot and the like are after (integrating the content into their models)
> > could be characterized as creating a derivative work, which is expressly
> > discouraged.
> >
> > Thanks,
> >
> > Eben English (he/him/his)
> > Digital Repository Services Manager
> > Boston Public Library
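For anyone following up on the two measures Jason mentions (robots.txt rules and IP range blocks), here is a minimal Python 3 sketch using only the standard-library `urllib.robotparser` and `ipaddress` modules. It is an illustration under stated assumptions, not a recommended policy. The user-agent tokens `GPTBot` and `Google-Extended` come from this thread. The CIDR ranges are hypothetical placeholders (RFC 5737 documentation addresses), not real crawler ranges. Keep in mind that robots.txt is advisory and only restrains crawlers that choose to honor it, which is why IP blocks end up as the fallback for the inconsiderate ones. Also, per Google's documentation, `Google-Extended` is a robots.txt token only. It controls whether crawled content may be used for Google's generative AI models and does not change how much Googlebot itself crawls.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Sketch of a robots.txt that opts two AI-related tokens out of the whole site
# while leaving every other crawler unrestricted. The token list is a local
# policy choice; GPTBot and Google-Extended are simply the two named here.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

# HYPOTHETICAL blocklist: RFC 5737 documentation ranges used as placeholders.
# Replace with the ranges you actually observe in your own logs.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr) for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network.

    Works for IPv4 and IPv6; an address never matches a network of the
    other IP version.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    # can_fetch() expects the robots.txt product token (e.g. "GPTBot"),
    # not the full User-Agent header a crawler sends.
    for token in ("GPTBot", "Google-Extended", "Googlebot"):
        print(token, parser.can_fetch(token, "/digital/collections/item/1"))
    for ip in ("192.0.2.15", "203.0.113.7"):
        print(ip, is_blocked(ip))
```

Run as a script, this reports GPTBot and Google-Extended as disallowed, Googlebot as allowed, and only 192.0.2.15 as blocked. In practice the IP check would live in the web server or firewall rather than in Python. The sketch just makes the matching logic explicit.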