RIP ThingISBN?
Just incredible the damage I'm seeing around the internet. All sorts of good things have stopped working.
Eric
> On Mar 26, 2025, at 10:25 AM, Tim Spalding <[log in to unmask]> wrote:
>
> Not a library, but we run several library products and have several bookish
> websites with many millions of pages.
>
> * We've seen an overall rise in scraping over the last two years. We and
> others attribute the rise to bots scraping for LLM development.
> * We have anti-LLM stuff in our robots.txt, but it doesn't matter. The
> problem is the bad actors.
> * We put ourselves behind Cloudflare several years ago after a multi-day DDoS
> attack—a real one, with actual extortion demands. The rise of AI scraping
> has meant we spend time tweaking our Cloudflare settings. CF is free, but
> we pay for a higher level of service.
> * Much or most of the traffic is from China and Singapore, which is where a lot of
> cloud-computing resources are located. On several occasions we'd literally
> shut down all traffic from China, but, alas, we have a big customer in
> Singapore.
> * We reduced our attack surface. In our case this meant killing off our
> many translated language sites (LibraryThing.fr <http://librarything.fr/>, LibraryThing.de <http://librarything.de/>,
> dk.LibraryThing.com <http://dk.librarything.com/>) in favor of having language-pickers on the main site.
> * Cloudflare has specific anti-AI filters, as well as a new "maze" feature
> to lead bots on a merry chase forever.
>
> Tim
>
> On Wed, Mar 26, 2025 at 10:08 AM Tod Olson <[log in to unmask]> wrote:
>
>> I can also say that we've seen a fair amount of this sort of
>> scraping/DDoS; it's been happening since late December. (We've also had one
>> or two incidents in that timeframe of harvesting coming from single IPs,
>> which of course are easier to deal with.) We're a FOLIO shop running VuFind
>> locally, and have also seen similar scraping/DDoS against our image
>> database.
>>
>> We have also implemented Cloudflare's Turnstile to good effect. We are
>> also exploring some Web Application Firewall options, in case things evolve
>> past the point of where Turnstile is effective.
>>
>> Best,
>>
>> -Tod
>>
>> Tod Olson <[log in to unmask]> (he/him)
>> Director of Integrated Library Systems
>> University of Chicago Library
>>
>> Local Host Committee, Open Repositories 2025
>> <https://or2025.openrepositories.org>
>>
>> On Mar 26, 2025, at 6:56 AM, Esmé Cowles <[log in to unmask]> wrote:
>>
>> Eric-
>>
>> We have seen a lot of bot traffic in the last few weeks, and we are a
>> Clarivate (Alma) shop, though our discovery layer is Blacklight. Something
>> we've noticed as we've tried to block the bot traffic is that the spikes
>> of bot activity that have been DOSing us for many months now are only part
>> of the picture, and we actually have a very high baseline level of bot
>> activity at all times. So much so that we're reconsidering our analytics
>> picture because so much of our recent historical traffic is undetected bots
>> (e.g., in one report China represented about 90% of our traffic).
>>
>> We've also heard of similar levels of problems from digital collections
>> and other kinds of sites (e.g., SourceHut
>> https://status.sr.ht/issues/2025-03-17-git.sr.ht-llms/). So my general
>> impression is that this isn't targeted at one technology stack or
>> libraries, but is basically everybody with any content on the internet.
>>
>> The thing we've implemented recently, which is the first thing that's been
>> really successful, is using Turnstile. Jonathan Rochkind wrote up this
>> approach:
>>
>>
>> https://bibwild.wordpress.com/2025/01/16/using-cloudflare-turnstile-to-protect-certain-pages-on-a-rails-app/
>>
>> And we adapted that to our setup using Traefik:
>>
>> https://github.com/pulibrary/princeton_ansible/tree/main/nomad/traefik-wall
>>
>> There has been a fair amount of discussion of this on the Code4Lib and
>> Samvera Slack workspaces (in the #bots channel in each), so I'd encourage
>> anyone who's battling this to check those out.
>>
>> -Esmé
>> --
>> Esmé Cowles <[log in to unmask]>
>> Asst. Director, Library Software Engineering
>> Princeton University Library
>>
>> On Mar 26, 2025, at 7:26 AM, Eric Blevins <
>> [log in to unmask]> wrote:
>>
>> Good morning,
>>
>> First time posting to Code4Lib, but have been a watcher for several years.
>> I'm curious from strictly a numbers standpoint how many libraries might've
>> been impacted recently (say the last couple of weeks or so) by massive bot
>> harvesting of data, basically resulting in a DDoS attack, against your ILS,
>> Discovery Layers, or other systems. I'm actually also curious if
>> non-Innovative/Clarivate product libraries are seeing similar issues. We
>> are an Innovative/Clarivate product shop, so we have some awareness that
>> others with those products were impacted. Again, aside from curiosity if
>> you're a non-Clarivate shop, I'm not looking for specifics, just wondering
>> about the scope of the attacks against other institutions/orgs.
>>
>> Regards,
>>
>> Eric C. Blevins
>> Sr. Manager of Library Technology
>> RIT Libraries
>> Rochester Institute of Technology
>> Email: [log in to unmask]
>>
>>
>
> --
> Check out my library at https://www.librarything.com/profile/timspalding