I can also say that we've seen a fair amount of this sort of scraping/DDoS; it has been happening since late December. (We've also had one or two incidents in that timeframe of harvesting from single IPs, which of course are easier to deal with.) We're a FOLIO shop running VuFind locally, and we have also seen similar scraping/DDoS against our image database.
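Single-IP harvesting is easier to deal with because one address quickly stands out. A minimal sketch of a per-IP sliding-window limiter shows why. The class name and the thresholds (60 requests per 60 seconds) are hypothetical and chosen only for illustration. This is not taken from any of the systems mentioned in this thread.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerIpRateLimiter:
    """Flag any single IP that exceeds max_requests within window_seconds.

    Hypothetical helper for illustration; the default thresholds are assumptions.
    """

    def __init__(self, max_requests: int = 60, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        # One deque of request timestamps per IP. A real deployment would
        # also evict idle IPs so this map does not grow without bound.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        """Record a request from ip; return False if it is over the limit."""
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # rejected requests are not counted toward the window
        hits.append(now)
        return True
```

The same limiter does little against a distributed crawl. Thousands of addresses that each make a handful of requests never trip it, which is one reason challenge-based tools like Turnstile come up in this thread.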
We have also implemented Cloudflare's Turnstile to good effect, and we are exploring some Web Application Firewall options in case things evolve past the point where Turnstile is effective.
Best,
-Tod
Tod Olson <[log in to unmask]> (he/him)
Director of Integrated Library Systems
University of Chicago Library
Local Host Committee, Open Repositories 2025 <https://or2025.openrepositories.org>
On Mar 26, 2025, at 6:56 AM, Esmé Cowles <[log in to unmask]> wrote:
Eric-
We have seen a lot of bot traffic in the last few weeks. We are a Clarivate (Alma) shop, though our discovery layer is Blacklight. While trying to block the bot traffic, we noticed that the spikes of bot activity that have been DoSing us for many months now are only part of the picture: we actually have a very high baseline level of bot activity at all times. It is so high that we're reconsidering our analytics picture, because so much of our recent historical traffic is undetected bots (e.g., in one report China represented about 90% of our traffic).
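One way to see the point about spikes versus baseline is to compare each hour's request count with a typical hour. The sketch below is a hypothetical analysis. It assumes that hourly request counts and per-country totals have already been extracted from access logs. Using the median as the "baseline" and 3x the median as the spike threshold are arbitrary choices, and the function names are invented for illustration.

```python
from statistics import median
from typing import Dict, List, Tuple


def split_baseline_and_spikes(hourly_counts: List[int],
                              spike_factor: float = 3.0) -> Tuple[float, List[int]]:
    """Return (baseline, indexes of spike hours).

    Baseline is the median hourly request count; an hour is a "spike" if it
    exceeds spike_factor * baseline. Both choices are assumptions.
    """
    if not hourly_counts:
        raise ValueError("need at least one hour of data")
    baseline = median(hourly_counts)
    spikes = [i for i, n in enumerate(hourly_counts) if n > spike_factor * baseline]
    return baseline, spikes


def country_shares(requests_by_country: Dict[str, int]) -> Dict[str, float]:
    """Fraction of all requests attributed to each country (empty if no data)."""
    total = sum(requests_by_country.values())
    if total == 0:
        return {}
    return {country: n / total for country, n in requests_by_country.items()}
```

With made-up counts `[1000, 1100, 950, 9000, 1050]`, the baseline is 1050 and only hour 3 is flagged. Blocking just the flagged hours leaves the baseline untouched. If that baseline is itself mostly bots, the analytics stay skewed, and a lopsided country share is one warning sign.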
We've also heard of similar problems from digital collections and other kinds of sites (e.g., SourceHut https://status.sr.ht/issues/2025-03-17-git.sr.ht-llms/). So my general impression is that this isn't targeted at one technology stack, or even at libraries, but at basically everybody with any content on the internet.
The first thing we've implemented recently that has been really successful is Cloudflare Turnstile. Jonathan Rochkind wrote up this approach:
https://bibwild.wordpress.com/2025/01/16/using-cloudflare-turnstile-to-protect-certain-pages-on-a-rails-app/
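Whatever the framework, a Turnstile integration has to check the token that the browser submits against Cloudflare's siteverify endpoint on the server side. Below is a minimal standard-library sketch. The function names are hypothetical. The request fields (`secret`, `response`, optional `remoteip`) and the `success`/`hostname` response fields follow Cloudflare's Turnstile server-side validation documentation, so check the current docs before relying on this.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

SITEVERIFY_URL = "https://challenges.cloudflare.com/turnstile/v0/siteverify"


def build_siteverify_request(secret: str, token: str,
                             remote_ip: Optional[str] = None) -> urllib.request.Request:
    """Build the form-encoded POST that checks a Turnstile token server-side."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional field
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        SITEVERIFY_URL,
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_siteverify_response(body: bytes,
                              expected_hostname: Optional[str] = None) -> bool:
    """True only if Cloudflare reports success (and, optionally, the right hostname)."""
    result = json.loads(body)
    if not result.get("success", False):
        return False
    if expected_hostname is not None and result.get("hostname") != expected_hostname:
        return False
    return True


def verify_turnstile_token(secret: str, token: str,
                           remote_ip: Optional[str] = None,
                           timeout: float = 5.0) -> bool:
    """Network call; any failure to reach or parse Cloudflare counts as 'not verified'."""
    if not token:
        return False
    request = build_siteverify_request(secret, token, remote_ip)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return parse_siteverify_response(resp.read())
    except (OSError, ValueError):  # URLError is an OSError; bad JSON is a ValueError
        return False
```

This sketch fails closed on network errors. Failing open instead would keep the site usable during a Cloudflare outage but would let bots through, so that is a judgment call.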
And we adapted that to our setup using Traefik:
https://github.com/pulibrary/princeton_ansible/tree/main/nomad/traefik-wall
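The general shape of this kind of gate can be sketched without reference to any framework. Only certain paths are protected, a visitor without a recent pass is sent to a challenge page, and a successful challenge that has been verified on the server is remembered in the session for a while. This is a hypothetical illustration, not code from the blog post or the Princeton repository. The path prefixes, session key, and 24-hour pass lifetime are all assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, MutableMapping, Tuple

# Hypothetical session key; a real app would use its framework's session store.
PASSED_AT_KEY = "turnstile_passed_at"


@dataclass
class ChallengeGate:
    """Decide whether a request may proceed or must go to a challenge page.

    All names and defaults here are illustrative assumptions.
    """
    protected_prefixes: Tuple[str, ...] = ("/catalog", "/search")
    pass_ttl_seconds: float = 24 * 3600
    challenge_path: str = "/challenge"
    clock: Callable[[], float] = time.time

    def needs_challenge(self, path: str, session: MutableMapping[str, float]) -> bool:
        if path.startswith(self.challenge_path):
            return False  # never gate the challenge page itself
        if not path.startswith(self.protected_prefixes):
            return False  # unprotected pages are served normally
        passed_at = session.get(PASSED_AT_KEY)
        # Challenge if there is no pass yet, or the pass has expired.
        return passed_at is None or self.clock() - passed_at > self.pass_ttl_seconds

    def record_pass(self, session: MutableMapping[str, float]) -> None:
        """Call only after the Turnstile token has been verified server-side."""
        session[PASSED_AT_KEY] = self.clock()
```

In a Rails app, a check like this might sit in a controller filter. In a proxy-layer setup such as the Traefik adaptation, equivalent logic would run in front of the application instead.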
There has been a fair amount of discussion of this on the Code4Lib and Samvera Slack workspaces (in the #bots channel in each), so I'd encourage anyone who's battling this to check those out.
-Esmé
--
Esmé Cowles <[log in to unmask]>
Asst. Director, Library Software Engineering
Princeton University Library
On Mar 26, 2025, at 7:26 AM, Eric Blevins <[log in to unmask]> wrote:
Good morning,
This is my first time posting to Code4Lib, but I have been a watcher for several years. Strictly from a numbers standpoint, I'm curious how many libraries might have been impacted recently (say, in the last couple of weeks) by massive bot harvesting of data against your ILS, discovery layers, or other systems, to the point of effectively being a DDoS attack. I'm also curious whether libraries using non-Innovative/Clarivate products are seeing similar issues. We are an Innovative/Clarivate shop, so we have some awareness that others with those products were impacted. Again, beyond knowing whether you're a non-Clarivate shop, I'm not looking for specifics, just wondering about the scope of the attacks against other institutions/orgs.
Regards,
Eric C. Blevins
Sr. Manager of Library Technology
RIT Libraries
Rochester Institute of Technology
Email: [log in to unmask]