Also following this thread, as we have also been finding some crawlers becoming less and less well-behaved: ignoring robots.txt, scanning fast enough to impact performance, etc. It used to be just a handful of badly behaving bots, but the number has definitely been growing of late.
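Because these crawlers ignore robots.txt, throttling has to be enforced on the server side. Below is a minimal sketch of a per-client sliding-window limiter, assuming requests can be keyed by client IP. The class name, thresholds, and example addresses are hypothetical and do not reflect any particular server's or library's API.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Reject a client once it exceeds max_requests within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        # client key (e.g. an IP address) -> timestamps of accepted requests
        self.hits = defaultdict(deque)

    def allow(self, client: str, now: float) -> bool:
        q = self.hits[client]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # over the limit; the rejected hit is not recorded
        q.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=1.0)
print([limiter.allow("203.0.113.7", t) for t in (0.0, 0.1, 0.2, 0.3, 1.5)])
```

In this sketch rejected requests are not recorded, so a client is let back in once its earlier requests age out of the window. Counting rejected hits as well would penalize persistent clients more harshly. In practice, this kind of limit is usually configured at the web server or proxy rather than in application code.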

Bruce Orcutt
UTSA Libraries: Systems
(210) 458-6192
________________________________
From: Code for Libraries <[log in to unmask]> on behalf of Jason Casden <[log in to unmask]>
Sent: Monday, April 8, 2024 3:30:14 PM
To: [log in to unmask] <[log in to unmask]>
Subject: [EXTERNAL] Re: [CODE4LIB] blocking GPTBot?

Thanks for bringing this up, Eben. We've been having a horrible time with
these bots, including those from previously fairly well-behaved sources
like Google. They've caused issues ranging from slow response times and
high system load all the way up to outages for some older systems. So far,
our systems folks have been playing whack-a-mole with a combination of IP
range blocks and increasingly detailed robots.txt statements. A group is
being convened to investigate more comprehensive options so I will be
watching this thread closely.
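Both measures described here, robots.txt statements and IP range blocks, can be checked with Python's standard library: `urllib.robotparser` tests which user agents a robots.txt disallows, and `ipaddress` tests membership in blocked ranges. GPTBot is the user-agent token OpenAI uses for its crawler. The rest of the robots.txt and the blocked ranges are placeholders, and the ranges come from the RFC 5737 documentation blocks, so they are not any real crawler's addresses. Keep in mind that robots.txt only helps with crawlers that honor it.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep GPTBot out entirely, keep other bots out of /search.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /search
"""

# Placeholder ranges (RFC 5737 documentation blocks), not real crawler ranges.
BLOCKED_RANGES = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.0/24")]


def make_parser(text: str) -> RobotFileParser:
    """Build a robots.txt parser from in-memory text instead of fetching a URL."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


def is_blocked_ip(addr: str) -> bool:
    """True if addr falls inside any of the blocked networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLOCKED_RANGES)


rp = make_parser(ROBOTS_TXT)
print(rp.can_fetch("GPTBot", "https://example.org/collections/1"))     # disallowed
print(rp.can_fetch("Googlebot", "https://example.org/collections/1"))  # allowed
print(is_blocked_ip("192.0.2.15"))                                     # in a blocked range
```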

Jason

On Mon, Apr 8, 2024 at 4:18 PM Eben English <[log in to unmask]> wrote:

> Hi all,
>
> I'm wondering if other folks are seeing AI and/or ML-related crawlers like
> GPTBot accessing your library's website, catalog, digital collections, or
> other sites.
>
> If so, are you blocking or disallowing these crawlers? Has anyone come up
> with any policies around this?
>
> We're debating whether to allow these types of bots to crawl our digital
> collections, many of which contain large amounts of copyrighted or "no
> derivatives"-licensed materials. On one hand, these materials are available
> for public view, but on the other hand the type of use that GPTBot and the
> like are after (integrating the content into their models) could be
> characterized as creating a derivative work, which is expressly
> discouraged.
>
> Thanks,
>
> Eben English (he/him/his)
> Digital Repository Services Manager
> Boston Public Library
>
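Server logs can answer the first question, whether crawlers like GPTBot are showing up at all. Below is a rough sketch that counts requests whose user agent contains a known crawler token. It assumes access logs in the Combined Log Format, where the user agent is the last double-quoted field. The token list is only an example and is not exhaustive. User agents can also be spoofed, so the counts reflect only what clients claim to be.

```python
import re
from collections import Counter

# Example crawler tokens only; not an exhaustive list of AI/ML crawlers.
AI_CRAWLER_TOKENS = ("GPTBot", "CCBot")

# In the Combined Log Format, the user agent is the last double-quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_ai_crawler_hits(lines):
    """Count log lines per crawler token found in the user-agent field."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # not a Combined Log Format line
        user_agent = match.group(1).lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in user_agent:
                counts[token] += 1
    return counts
```

Usage: pass in the lines of an access log, e.g. `count_ai_crawler_hits(open("access.log"))`, where the filename is hypothetical.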