The URBE Consortium is planning to apply fail2ban rules to detect and block high-rate access.
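For example, a minimal sketch of such a rule, assuming an Nginx access log in the default combined format (the jail name, thresholds, and paths here are illustrative, not a final configuration):

  # /etc/fail2ban/filter.d/http-flood.conf
  [Definition]
  # Match every request line; the rate limit itself comes from the
  # jail's maxretry/findtime settings below, not from this regex.
  failregex = ^<HOST> -.*"(GET|POST|HEAD)
  ignoreregex =

  # /etc/fail2ban/jail.local
  [http-flood]
  enabled  = true
  port     = http,https
  filter   = http-flood
  logpath  = /var/log/nginx/access.log
  findtime = 60
  maxretry = 300
  bantime  = 3600

With these settings, fail2ban bans for one hour any IP that makes more than 300 requests within 60 seconds.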
sb
--
Dott. Stefano Bargioni
Pontificia Università della Santa Croce - Roma
Vice-Director of the Library
<mailto:[log in to unmask]> <http://www.pusc.it>
--- "Non refert quam multos habeas libros, sed bonos" (Seneca) ---
> On 9 Apr 2024, at 00:15, Bruce Orcutt <[log in to unmask]> wrote:
>
> Also following, as we have also found some crawlers becoming less and less well behaved: ignoring robots.txt, scanning fast enough to impact performance, etc. It used to be just a handful of badly behaving bots, but the number is definitely growing of late.
>
> Bruce Orcutt
> UTSA Libraries: Systems
> (210) 458-6192
> ________________________________
> From: Code for Libraries <[log in to unmask]> on behalf of Jason Casden <[log in to unmask]>
> Sent: Monday, April 8, 2024 3:30:14 PM
> To: [log in to unmask] <[log in to unmask]>
> Subject: [EXTERNAL] Re: [CODE4LIB] blocking GPTBot?
>
> Thanks for bringing this up, Eben. We've been having a horrible time with
> these bots, including those from previously fairly well-behaved sources
> like Google. They've caused issues ranging from slow response times and
> high system load all the way up to outages for some older systems. So far,
> our systems folks have been playing whack-a-mole with a combination of IP
> range blocks and increasingly detailed robots.txt statements. A group is
> being convened to investigate more comprehensive options so I will be
> watching this thread closely.
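>
> For reference, a minimal robots.txt along these lines might look like the
> following (GPTBot and CCBot are the user-agent tokens documented by OpenAI
> and Common Crawl; which bots to list is of course a local policy decision):
>
>   User-agent: GPTBot
>   Disallow: /
>
>   User-agent: CCBot
>   Disallow: /
>
> Keep in mind that robots.txt is purely advisory, so the badly behaved
> crawlers mentioned above still need IP-range blocks or rate limiting at
> the server level.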
>
> Jason
>
> On Mon, Apr 8, 2024 at 4:18 PM Eben English <[log in to unmask]> wrote:
>
>> Hi all,
>>
>> I'm wondering if other folks are seeing AI and/or ML-related crawlers like
>> GPTBot accessing your library's website, catalog, digital collections, or
>> other sites.
>>
>> If so, are you blocking or disallowing these crawlers? Has anyone come up
>> with any policies around this?
>>
>> We're debating whether to allow these types of bots to crawl our digital
>> collections, many of which contain large amounts of copyrighted or "no
>> derivatives"-licensed materials. On one hand, these materials are available
>> for public view, but on the other hand the type of use that GPTBot and the
>> like are after (integrating the content into their models) could be
>> characterized as creating a derivative work, which is expressly
>> discouraged.
>>
>> Thanks,
>>
>> Eben English (he/him/his)
>> Digital Repository Services Manager
>> Boston Public Library
>>