There are essentially three kinds of web thing that we, as libraries,
host:
(a) mass-produced content surfaced in our LMSs (digital analogues of
'printed content'). We can, and should, use strong blocks around this
content, because sharing, preserving, or managing it is not our
responsibility (except on behalf of our users).
(b) unique holdings (digital analogues of 'archival content'). In most
cases this should be held in robustly cached software systems. Systems like
OJS and DSpace have been tuned over decades to serve their primary content
only fractionally slower than it can be read off the flat files on the
disk, even under extreme load. We should be using policies grounded in our
local knowledge of this content to decide whether it should be accessible
to the public, to bots, etc.
(c) authentication systems (ADFS, OpenAthens, etc.) for accessing content
elsewhere. These systems contain personal information. Bots, crawlers and
AI-nonsense should be kept well clear. Fortunately these systems typically
have very small surfaces exposed to unauthenticated users, for security
reasons.
In my experience most issues arise from conflation between these three.
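To make "don't conflate them" concrete, here is a minimal Python sketch of a per-kind crawler policy. The path prefixes, the names `ContentKind`, `classify` and `allow_crawler`, and the default-deny rule for unrecognised paths are all hypothetical; a real site would key on its own hostnames and routes, and would more likely express this in proxy configuration than in application code.

```python
from enum import Enum
from typing import Optional


class ContentKind(Enum):
    """The three kinds of web thing a library hosts."""
    LICENSED = "licensed"  # (a) mass-produced content surfaced in the LMS
    UNIQUE = "unique"      # (b) unique holdings (OJS, DSpace, ...)
    AUTH = "auth"          # (c) authentication systems (ADFS, OpenAthens, ...)


# Hypothetical mapping from URL path prefix to content kind.
PATH_KINDS = {
    "/ejournals/": ContentKind.LICENSED,
    "/dspace/": ContentKind.UNIQUE,
    "/ojs/": ContentKind.UNIQUE,
    "/login/": ContentKind.AUTH,
}


def classify(path: str) -> Optional[ContentKind]:
    """Return the content kind for a path, or None if it is unrecognised."""
    for prefix, kind in PATH_KINDS.items():
        if path.startswith(prefix):
            return kind
    return None


def allow_crawler(path: str, unique_open_to_bots: bool = True) -> bool:
    """Decide whether a request already identified as a crawler is served."""
    kind = classify(path)
    if kind is ContentKind.LICENSED:
        return False  # strong blocks: not ours to share or preserve
    if kind is ContentKind.AUTH:
        return False  # personal information: keep bots well clear
    if kind is ContentKind.UNIQUE:
        return unique_open_to_bots  # a local policy decision per collection
    return False  # unrecognised paths: default-deny (an assumption)
```

The point of the single `classify` step is that each request gets exactly one kind, so a policy written for licensed content never leaks onto the repository or the login pages.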
Personally, I've been pondering the idea of redirecting obvious crawlers to
recent WARCs of our OJS / DSpace content, generated by
https://archive-it.org/, and tar-pitting the rest.
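The redirect / tar-pit idea might look something like the following WSGI middleware sketch in Python. Everything specific here is an assumption for illustration: the user-agent tokens, the `REPLAY_BASE` URL (not a real Archive-It address; actual replay URLs depend on the collection), the caller-supplied `is_suspect` heuristic, and the reading of "the rest" as suspected bots that don't identify themselves.

```python
import time
from typing import Callable

# Illustrative user-agent substrings treated as "obvious" crawlers.
OBVIOUS_CRAWLERS = ("gptbot", "ccbot", "bytespider", "claudebot")

# Hypothetical base URL for replaying the archived WARCs; not a real endpoint.
REPLAY_BASE = "https://replay.example.org/ourlibrary/"


def is_obvious_crawler(user_agent: str) -> bool:
    """True if the User-Agent contains one of the known crawler tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in OBVIOUS_CRAWLERS)


class CrawlerTriage:
    """WSGI middleware: redirect obvious crawlers, tar-pit suspected bots."""

    def __init__(self, app, is_suspect: Callable[[dict], bool], delay: float = 10.0):
        self.app = app
        self.is_suspect = is_suspect  # hypothetical heuristic, e.g. request rate
        self.delay = delay            # seconds to stall a suspected bot

    def __call__(self, environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if is_obvious_crawler(ua):
            # Send the crawler to the archived copy instead of the live system.
            location = REPLAY_BASE + environ.get("PATH_INFO", "").lstrip("/")
            start_response("302 Found", [("Location", location)])
            return [b""]
        if self.is_suspect(environ):
            # Tar-pit: stall before answering so the bot's crawl rate drops.
            time.sleep(self.delay)
        return self.app(environ, start_response)
```

A blocking `time.sleep` ties up one of your own workers for every tar-pitted request, so in practice the delay would more likely live at the reverse proxy or in an async server; the middleware form just keeps the decision logic visible.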
cheers
stuart
--
...let us be heard from red core to black sky
On Thu, 27 Mar 2025 at 00:26, Eric Blevins <[log in to unmask]> wrote:
> Good morning,
>
> First time posting to Code4Lib, but have been a watcher for several years.
> I'm curious, from a strictly numbers standpoint, how many libraries
> might've been impacted recently (say, the last couple of weeks) by massive
> bot harvesting of data, essentially amounting to a DDoS attack, against
> your ILS, discovery layers, or other systems. I'm also curious whether
> non-Innovative/Clarivate product libraries are seeing similar issues. We
> are an Innovative/Clarivate product shop, so we have some awareness that
> others with those products were impacted. Again, beyond that curiosity
> about non-Clarivate shops, I'm not looking for specifics; I'm just
> wondering about the scope of the attacks against other institutions/orgs.
>
> Regards,
>
> Eric C. Blevins
> Sr. Manager of Library Technology
> RIT Libraries
> Rochester Institute of Technology
> Email: [log in to unmask]
>