The URBE Consortium is planning to apply fail2ban rules to detect and block high-rate accesses.

sb
--
Dott. Stefano Bargioni
Pontificia Universita' della Santa Croce - Roma
Vicedirettore della Biblioteca
<mailto:[log in to unmask]> <http://www.pusc.it>
---
"Non refert quam multos habeas libros, sed bonos" (Seneca)
---

> On 9 Apr 2024, at 00:15, Bruce Orcutt <[log in to unmask]> wrote:
>
> Also following, as we have also found some crawlers becoming less and less well behaved: ignoring robots.txt, scanning fast enough to impact performance, etc. It used to be just a handful of badly behaved bots, but the number is definitely growing of late.
>
> Bruce Orcutt
> UTSA Libraries: Systems
> (210) 458-6192
> ________________________________
> From: Code for Libraries <[log in to unmask]> on behalf of Jason Casden <[log in to unmask]>
> Sent: Monday, April 8, 2024 3:30:14 PM
> To: [log in to unmask] <[log in to unmask]>
> Subject: [EXTERNAL] Re: [CODE4LIB] blocking GPTBot?
>
> Thanks for bringing this up, Eben. We've been having a horrible time with
> these bots, including those from previously fairly well-behaved sources
> like Google. They've caused issues ranging from slow response times and
> high system load all the way up to outages for some older systems. So far,
> our systems folks have been playing whack-a-mole with a combination of IP
> range blocks and increasingly detailed robots.txt statements. A group is
> being convened to investigate more comprehensive options, so I will be
> watching this thread closely.
>
> Jason
>
> On Mon, Apr 8, 2024 at 4:18 PM Eben English <[log in to unmask]> wrote:
>
>> Hi all,
>>
>> I'm wondering if other folks are seeing AI and/or ML-related crawlers like
>> GPTBot accessing your library's website, catalog, digital collections, or
>> other sites.
>>
>> If so, are you blocking or disallowing these crawlers? Has anyone come up
>> with any policies around this?
>>
>> We're debating whether to allow these types of bots to crawl our digital
>> collections, many of which contain large amounts of copyrighted or "no
>> derivatives"-licensed materials. On one hand, these materials are available
>> for public view; on the other hand, the type of use that GPTBot and the
>> like are after (integrating the content into their models) could be
>> characterized as creating a derivative work, which is expressly
>> discouraged.
>>
>> Thanks,
>>
>> Eben English (he/him/his)
>> Digital Repository Services Manager
>> Boston Public Library
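
[Editor's note: the two approaches discussed in the thread — a robots.txt disallow rule and fail2ban rate-based blocking — can be sketched as below. The GPTBot user-agent token is the one OpenAI documents for its crawler; the fail2ban filter name, log path, and thresholds are illustrative assumptions, not a tested configuration.]

A robots.txt rule only works for crawlers that choose to honor it:

```
# robots.txt at the site root
# Disallow OpenAI's GPTBot crawler site-wide (honored only by
# well-behaved bots)
User-agent: GPTBot
Disallow: /
```

A fail2ban setup for high-rate accesses, as URBE plans, might look roughly like this; fail2ban counts failregex matches per IP within findtime and bans on maxretry:

```ini
# /etc/fail2ban/filter.d/http-highrate.conf  (hypothetical filter name)
[Definition]
# Match every request line in a combined-format access log;
# the jail's findtime/maxretry settings turn this into a rate limit.
failregex = ^<HOST> -.*"(GET|POST|HEAD)
ignoreregex =

# /etc/fail2ban/jail.local  (illustrative thresholds: more than
# 120 requests per minute from one IP earns a one-hour ban)
[http-highrate]
enabled  = true
port     = http,https
filter   = http-highrate
# Adjust logpath to your web server's access log
logpath  = /var/log/nginx/access.log
findtime = 60
maxretry = 120
bantime  = 3600
```

The thresholds need tuning per site: a catalog with faceted search can generate many legitimate requests per minute from a single patron, so start with a high maxretry and watch the ban log before tightening.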