Personally, I'd be tempted to go the IP lockout route myself since the
patterns should be clear in the logs, but be aware that the megabyte limit
(-MB) gives a reasonable level of control because you can set it to log
rather than lock out. I think the risk of locking out legitimate users is
low. Although people can download mixed materials, my guess is that your
abusing accounts are not watching loads of video.
There are things you can do with user names that would make it easy enough
to uncover abuse without unduly compromising privacy. For example, you
could flush your logs frequently while extracting the number of downloads
you're interested in from individual users. Abuse accounts will be immediately
obvious. BTW, you can do some funky things with EZP that include
conditional logic, regexp searches, and rewriting that might be helpful.
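To make the log idea concrete, here is a rough Python sketch of the kind of
extraction I mean. It assumes your log lines follow the common NCSA-style
layout (host, ident, user, timestamp, request, status, bytes); if your
LogFormat differs, the regexp needs adjusting. The function names, the salted
hash used to pseudonymize user names, and the threshold are all illustrative
assumptions on my part, not anything built into EZP.

```python
import hashlib
import re
from collections import Counter

# Assumed layout: host ident user [timestamp] "METHOD url PROTO" status bytes
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<bytes>\S+)'
)


def pseudonym(user, salt):
    """Replace a user name with a short salted hash (hypothetical helper)."""
    return hashlib.sha256((salt + user).encode("utf-8")).hexdigest()[:12]


def count_pdf_downloads(lines, salt):
    """Count successful PDF requests per pseudonymized user."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group("user") == "-":
            continue  # unparseable line or unauthenticated request
        if m.group("status") != "200":
            continue  # only count successful transfers
        path = m.group("url").split("?", 1)[0].lower()
        if path.endswith(".pdf"):
            counts[pseudonym(m.group("user"), salt)] += 1
    return counts


def flag_heavy_users(counts, threshold):
    """Return (pseudonym, count) pairs at or above the threshold, largest first."""
    flagged = [(u, n) for u, n in counts.items() if n >= threshold]
    return sorted(flagged, key=lambda pair: -pair[1])
```

Run it over each log before you flush it and keep only the counts. If you
hold on to the salt, you can map a flagged hash back to the account that
needs attention; everyone else's activity never gets stored under a real
name.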
Any path you take will protect user privacy far more than just about any
other site they visit. Plus, whoever maintains your network will
occasionally need to monitor specific computers to mitigate a wide variety
of problems. Systems used as a platform for abusive behavior, harassment,
or activity that causes harm to others get locked out and/or blacklisted,
which will really hose your users. Getting that kind of thing cleared up
takes time because most places aren't nearly as forgiving as libraries.
kyle
On Wed, Nov 19, 2014 at 8:47 PM, Dan Scott <[log in to unmask]> wrote:
> On Wed, Nov 19, 2014 at 4:06 PM, Kyle Banerjee <[log in to unmask]>
> wrote:
>
> > There are a number of technical approaches that could be used to identify
> > which accounts have been compromised.
> >
> > But it's easier to just make the problem go away by setting usage limits
> > so EZP locks the account out after it downloads too much.
> >
>
> But EZProxy still doesn't let you set limits based on the type of download.
> You therefore have two very blunt sledge hammers with UsageLimit:
>
> - # of downloads (-transfers)
> - # of megabytes downloaded (-MB)
>
> # of downloads is effectively useless because many of our electronic
> resource platforms (hi Proquest and EBSCOHost) make between 50 and 150
> requests for JavaScript, CSS, and images per page, so you have to set your
> thresholds incredibly high to avoid locking out users who might be actively
> paging through search results. Any savvy abuser will just script their
> requests to avoid all of the JS/CSS/images to derive a list of PDFs, and
> then download just the PDFs, thereby staying well under the usage limits
> that legit users require... and I've seen exactly that happen through our
> proxy.
>
> # of megabytes downloaded is a pretty blunt tool as well, given that our
> multimedia-enriched databases now often serve up video and audio as well as
> HTML, images, and PDF files. For the pure audio and video streaming sites
> such as Naxos or Curio, you can set higher limits; but as vendors
> increasingly enrich their databases with audio and video, you're going to
> have to increase your general limits as well... and you can pull down a ton
> of PDFs under that cover.
>
> So no, I don't think it's easy to make the problem go away through the
> suggested approach, unless you're willing to err on the side of locking out
> legitimate users.
>