> On Apr 30, 2026, at 2:36 PM, Lucky, Shannon <[log in to unmask]> wrote:
>
> Hi all,
>
> I am curious what methods folks are using to deal with aggressive AI harvesting on websites - particularly digital project sites. Many of our servers are being hammered with traffic that impacts our service delivery and the methods we have been using cannot keep up.
>
> Specifically I am wondering who is using services like Cloudflare or implementing OS solutions like Anubis, or are you using something else? I'm gathering information about what services or methods are being used at academic libraries hosting DH/digital projects so we can look at investing in some kind of service or process solution.
>
> What are you using? Are you happy with it? What kinds of costs are associated?
Back when I used to manage web servers, I would add iptables rules to limit the number of simultaneous connections that an individual IP or block of IPs could make to our server:
-A INPUT -p tcp -m tcp --dport 80 --tcp-flags FIN,SYN,RST,ACK SYN -m connlimit --connlimit-above 5 --connlimit-mask 32 -j REJECT --reject-with tcp-reset
-A INPUT -p tcp -m tcp --dport 80 --tcp-flags FIN,SYN,RST,ACK SYN -m connlimit --connlimit-above 21 --connlimit-mask 24 -j REJECT --reject-with tcp-reset
This would limit a single IP address to 5 simultaneous HTTP connections, and a class C range (a /24 block of 256 IP addresses) to 21 connections, since --connlimit-above N rejects only once the count goes above N. (You'd most likely want to change it to --dport 443 these days, since most traffic is HTTPS.)
I also denied a lot of specific IPs and User-Agent strings, but that doesn't tend to work for very long against this type of malicious user.
If you can add similar rules at your router, you can limit their ability to spray attempts across servers... but you really need to collect some statistics first to see what's "normal" traffic for your servers, so you don't set the limits too low and cause problems for legitimate traffic. (The per-IP limit is probably okay... but the limits per IP block might need to be adjusted, or you might need to make sure that local traffic is allowed before that rule.)
(These rules also wouldn't limit folks connecting over IPv6: iptables only handles IPv4, so you'd need matching ip6tables rules for that.)
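For the "collect some statistics first" step, here is a rough sketch of how one might summarize current connections per IPv4 address and per /24 block before picking limits. The function names per_ip_counts and per_block_counts are made up for illustration, and the sketch assumes the input is one peer "address:port" per line (for example, pulled from ss on a Linux box). Treat it as a starting point rather than a tested monitoring setup:

```shell
#!/bin/sh
# Sketch only: summarize connections per IPv4 address and per /24 block.
# Input on stdin: one peer "address:port" per line.

# Count connections per individual address, busiest first.
per_ip_counts() {
    sed 's/:[0-9]*$//' |
        sort | uniq -c | sort -rn |
        awk '{print $1, $2}'
}

# Count connections per /24 block, busiest first.
# Lines that are not plain dotted-quad IPv4 addresses are skipped.
per_block_counts() {
    sed 's/:[0-9]*$//' |
        grep -E '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$' |
        awk -F. '{print $1 "." $2 "." $3 ".0/24"}' |
        sort | uniq -c | sort -rn |
        awk '{print $1, $2}'
}

# Possible usage (assumes iproute2's ss and a site listening on port 443):
#   ss -Htn state established '( sport = :443 )' | awk '{print $NF}' | per_ip_counts | head
#   ss -Htn state established '( sport = :443 )' | awk '{print $NF}' | per_block_counts | head
```

Running something like this at intervals over a few typical days should show how many simultaneous connections legitimate clients (including campus NAT gateways that put many users behind one address) actually hold open, which is the number the connlimit thresholds need to stay above.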
-Joe