We've also been seeing some traffic from inconsiderate AI bots.

One of my colleagues came across this site, which tracks and documents AI bots:

https://darkvisitors.com/

-- Scott

-- 
Scott Prater
Digital Library Architect
UW Digital Collections Center
University of Wisconsin - Madison



________________________________________
From: Code for Libraries <[log in to unmask]> on behalf of Lolis, John <[log in to unmask]>
Sent: Wednesday, April 10, 2024 12:15 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] blocking GPTBot?

This *sounds* as if it should help:
https://urldefense.com/v3/__https://searchengineland.com/google-extended-crawler-432636__;!!Mak6IKo!Pm6vbeyDLkzwxaEhcIBmaI0pK1d7U0GtguiIAgWmNzfNyOyR1m3n9iypyhqwZH3QxxIfNMIETf94S2_ioTtPPtfncyM$

John Lolis
Coordinator of Computer Systems

100 Martine Avenue
White Plains, NY  10601
tel: 1.914.422.1497
fax: 1.914.422.1452

https://urldefense.com/v3/__https://whiteplainslibrary.org/__;!!Mak6IKo!Pm6vbeyDLkzwxaEhcIBmaI0pK1d7U0GtguiIAgWmNzfNyOyR1m3n9iypyhqwZH3QxxIfNMIETf94S2_ioTtPwb7-RSk$

*“I would rather have questions that can’t be answered than answers that
can’t be questioned.”*
— Richard Feynman
<https://urldefense.com/v3/__https://click.fourhourmail.com/5qure95xkf7hvvo93wh2/7qh7h8h05vr4zrtz/aHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvUmljaGFyZF9GZXlubWFu__;!!Mak6IKo!Pm6vbeyDLkzwxaEhcIBmaI0pK1d7U0GtguiIAgWmNzfNyOyR1m3n9iypyhqwZH3QxxIfNMIETf94S2_ioTtP3X91XJ0$ >,
theoretical physicist and recipient of the Nobel Prize in Physics in 1965


On Mon, 8 Apr 2024 at 16:31, Jason Casden <[log in to unmask]> wrote:

> Thanks for bringing this up, Eben. We've been having a horrible time with
> these bots, including those from previously fairly well-behaved sources
> like Google. They've caused issues ranging from slow response times and
> high system load all the way up to outages for some older systems. So far,
> our systems folks have been playing whack-a-mole with a combination of IP
> range blocks and increasingly detailed robots.txt statements. A group is
> being convened to investigate more comprehensive options so I will be
> watching this thread closely.
>
> Jason
>
> On Mon, Apr 8, 2024 at 4:18 PM Eben English <[log in to unmask]>
> wrote:
>
> > Hi all,
> >
> > I'm wondering if other folks are seeing AI and/or ML-related crawlers like
> > GPTBot accessing your library's website, catalog, digital collections, or
> > other sites.
> >
> > If so, are you blocking or disallowing these crawlers? Has anyone come up
> > with any policies around this?
> >
> > We're debating whether to allow these types of bots to crawl our digital
> > collections, many of which contain large amounts of copyrighted or "no
> > derivatives"-licensed materials. On one hand, these materials are available
> > for public view, but on the other hand the type of use that GPTBot and the
> > like are after (integrating the content into their models) could be
> > characterized as creating a derivative work, which is expressly
> > discouraged.
> >
> > Thanks,
> >
> > Eben English (he/him/his)
> > Digital Repository Services Manager
> > Boston Public Library
> >
>
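
For anyone trying the robots.txt route mentioned in this thread, here is a minimal sketch in Python of how a set of rules could be checked before deploying them. The rules, the example.org URLs, and the helper name is_allowed are illustrative assumptions, not anyone's production configuration. GPTBot and Google-Extended are the user-agent tokens named in the thread and in the linked article. Note that robots.txt is advisory: it only affects crawlers that choose to honor it, which is why IP range blocks were also mentioned above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow two AI-related crawlers everywhere,
# and keep a narrower default rule for all other user agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: /admin/
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    item_url = "https://example.org/collections/item/1"  # placeholder URL
    for agent in ("GPTBot", "Google-Extended", "Googlebot"):
        print(agent, is_allowed(agent, item_url))
```

Run as a script, this prints one line per user agent showing whether the sample rules would let it fetch the placeholder item URL. Swapping in your own robots.txt text makes it a quick sanity check for the "increasingly detailed" rule sets described earlier in the thread.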