Thanks for bringing this up, Eben. We've been having a horrible time with
these bots, including those from previously fairly well-behaved sources
like Google. They've caused issues ranging from slow response times and
high system load all the way up to outages for some older systems. So far,
our systems folks have been playing whack-a-mole with a combination of IP
range blocks and increasingly detailed robots.txt directives. A group is
being convened to investigate more comprehensive options, so I will be
watching this thread closely.
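For reference, the robots.txt side of that whack-a-mole looks something like the following. This is only an illustrative sketch, not our actual file, and the user agents listed (GPTBot, Google-Extended, CCBot) are just a few of the publicly documented AI-related crawlers, not an exhaustive list:

```
# Block OpenAI's crawler
User-agent: GPTBot
Disallow: /

# Opt out of Google's AI training uses (does not affect Google Search indexing)
User-agent: Google-Extended
Disallow: /

# Block Common Crawl, whose corpus is widely used for model training
User-agent: CCBot
Disallow: /
```

Worth keeping in mind that robots.txt compliance is entirely voluntary, which is why we've had to pair it with IP range blocks for the bots that ignore it.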
Jason
On Mon, Apr 8, 2024 at 4:18 PM Eben English <[log in to unmask]> wrote:
> Hi all,
>
> I'm wondering if other folks are seeing AI and/or ML-related crawlers like
> GPTBot accessing your library's website, catalog, digital collections, or
> other sites.
>
> If so, are you blocking or disallowing these crawlers? Has anyone come up
> with any policies around this?
>
> We're debating whether to allow these types of bots to crawl our digital
> collections, many of which contain large amounts of copyrighted or "no
> derivatives"-licensed materials. On one hand, these materials are available
> for public view, but on the other hand, the type of use that GPTBot and the
> like are after (integrating the content into their models) could be
> characterized as creating a derivative work, which is expressly
> discouraged.
>
> Thanks,
>
> Eben English (he/him/his)
> Digital Repository Services Manager
> Boston Public Library
>