This is amazing!
Maybe a GitHub repo for config blocks is in order? I figure the only way
to work out the myriad kinks in this would be at scale.
On Wed, Jan 29, 2014 at 6:00 PM, Andrew Anderson wrote:
> When OCLC first announced their purchase of EZproxy a few years ago, we
> started a low-priority research project to see what the alternatives
> were, and what it would take to bring them into a production-ready state.
> The two open source solutions we evaluated were Squid and Apache HTTPd.
> We considered other options (e.g. Apache Traffic Server), but limited the
> research to these two pieces of software since they are already widely used
> and familiar to most system administrators.
> Long story short, Squid did not support URL rewriting in a way that we
> felt could be supported well: it required either patches to the core C++
> server code, an external rewriting process, or an ICAP server
> implementation. Some of that has improved a bit since the original
> evaluation, but the built-in support for URL rewriting may still need some
> time to mature. Another aspect of Squid that did not seem to be a good fit
> was that it is somewhat limited in its authentication mechanisms vs. Apache
> HTTPd.
> So we moved on to evaluating Apache HTTPd with the mod_proxy family of
> modules. While Apache HTTPd does not support the same advanced cache
> federation features as Squid, it has grown to be a robust proxy solution
> in its own right, and the 2.4 release appears to have all of the required
> pieces out of the box, with the mod_proxy_html module functionality. In
> addition to
> basic URL rewriting support, you get full HTTP protocol support, mature
> IPv6 support, GZIP support, just about any authentication mechanism you
> need, a server that you can self-host content with easily, as well as a
> built-in HTTP object cache.
> How would it work?
> Here's the current EZproxy stanza for ProQuest:
> HTTPHeader X-Requested-With
> HTTPHeader Accept-Encoding
> Title ProQuest
> URL http://search.proquest.com/ip
> DJ proquest.com
> HJ gateway.proquest.com
> DJ umi.com
> HJ fedsearch.proquest.com
> HJ literature.proquest.com
> DJ conquest-leg-insight.com
> DJ conquestsystems.com
> DJ m.search.proquest.com
> DJ media.proquest.com
> NeverProxy order.proquest.com
> NeverProxy rss.proquest.com
> Here's an Apache HTTPd configuration using ProQuest that accomplishes much
> of the same functionality for the main search.proquest.com interface:
> <VirtualHost _default_:80>
>     ServerName search.proquest.com.fqdn
>     ProxyRequests Off
>     ProxyVia On
>     RewriteEngine On
>     RewriteRule ^/(.*) http://search.proquest.com/$1 [P]
>     <Location "/">
>         AllowMethods GET POST OPTIONS
>         ProxyPassReverse http://search.proquest.com/
>         ProxyPassReverseCookieDomain search.proquest.com search.proquest.com.fqdn
>         CacheEnable disk
>         SetOutputFilter INFLATE;DEFLATE
>         Header Append Vary User-Agent env=!dont-vary
>         # Put Authentication directives here
>         ErrorDocument 401 /path/to/login
>         Require valid-user
>     </Location>
> </VirtualHost>
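For anyone trying the ProQuest block above on a stock 2.4 install, here is a rough sketch of the modules its directives depend on. The module file paths follow the usual Fedora/RHEL layout and the CacheRoot path is an assumption of mine, not something from Andrew's setup; most distributions already load several of these by default.

```apache
# Sketch only: modules the ProQuest VirtualHost example relies on.
# Paths assume a Fedora/RHEL-style layout; adjust for your distribution.

# RewriteRule ... [P], ProxyVia, ProxyPassReverse*
LoadModule proxy_module        modules/mod_proxy.so
LoadModule proxy_http_module   modules/mod_proxy_http.so
LoadModule rewrite_module      modules/mod_rewrite.so

# AllowMethods
LoadModule allowmethods_module modules/mod_allowmethods.so

# CacheEnable disk
LoadModule cache_module        modules/mod_cache.so
LoadModule cache_disk_module   modules/mod_cache_disk.so

# INFLATE / DEFLATE filters and the Header directive
LoadModule deflate_module      modules/mod_deflate.so
LoadModule headers_module      modules/mod_headers.so

# Require valid-user (plus whichever authentication module you choose)
LoadModule authz_core_module   modules/mod_authz_core.so
LoadModule authz_user_module   modules/mod_authz_user.so

# mod_cache_disk needs a directory the server can write to.
# This path is an assumption, not taken from the original message.
CacheRoot /var/cache/httpd/proxy
```

Running "apachectl configtest" after adding the VirtualHost should flag any directive whose module is still missing.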
> A few notes on this:
> - There is no need for NeverProxy: if you do not define a VirtualHost for
> the hostname, it is not proxied. So instead of HJ and DJ lines, you add a
> new VirtualHost block for each hostname that needs to be proxied. The
> astute will ask "what about services that have dozens or hundreds of host
> entries, like Sage?" Those can be handled by the ProxyExpress features in
> Apache HTTPd.
> - There is no need for HTTPHeader: since Apache HTTPd is a full HTTP
> proxy/server, it supports all HTTP headers natively.
> - Some of the hostnames that are in EZproxy stanzas are not needed, and
> some are legacy hostnames that are no longer used by the vendor.
> - Some of the hostnames that are in EZproxy stanzas are for CDN-hosted
> assets that make up the vendor's user interface. Another example: how
> many of you have "DJ google.com" in one of your stanzas? Now how many of
> you registered your IP addresses with Google in any way? Outside of Google
> Scholar, I suspect the answers to those questions are "nearly everyone" and
> "nearly no one", respectively.
> - Some of the hostnames are for things that no sane person would do: How
> many people run their discovery services through their EZproxy server vs.
> authenticating their discovery platform by IP address with vendors directly?
> - Something that this configuration does that EZproxy does not do is
> enable object caching. This can easily save 30-50% of your upstream
> bandwidth usage (Proxy/ProxySSL in EZproxy can achieve the same result with
> an external caching proxy server).
> - More complex vendor platforms (e.g. Gale Cengage) need ProxyHTML
> directives and ProxyHTMLURLMap configured, and multiple VirtualHost
> sections to get them fully working. These can be a little fun to get
> working initially.
> - Some services need redirects edited to work correctly, and not break out
> of the proxy:
> Header edit Location http://vendor/ http://vendor.fqdn/
> - Some vendors send wrong HTTP headers for the MIME type, and
> mod_proxy_html exposes this in some cases as it rewrites the page. There
> may be a better way to do this, but this is what I threw together for
> # ExtFilterDefine is only valid at server level, so it sits outside <Location>
> ExtFilterDefine dummy-html-to-plain mode=output intype=text/html \
>     outtype=text/plain cmd="/bin/cat -"
> <Location "/badpath">
>     ProxyHTMLEnable Off
>     SetOutputFilter INFLATE;dummy-html-to-plain
>     ExtFilterOptions LogStdErr Onfail=remove
> </Location>
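Two of the notes above (ProxyExpress for vendors with many hostnames, and ProxyHTML for the more complex platforms) might look roughly like the sketch below. Treat it as a starting point under stated assumptions, not a tested vendor configuration: "vendor", "journals.vendor", the ".fqdn" suffix and the map path are placeholders, and I use ProxyPass here where Andrew's example uses RewriteRule with [P].

```apache
# --- Sketch 1: many hostnames behind one block (mod_proxy_express) ---
# Requires mod_proxy, mod_proxy_http and mod_proxy_express.
<VirtualHost _default_:80>
    ServerName vendor.fqdn
    ServerAlias *.vendor.fqdn
    ProxyRequests Off
    ProxyExpressEnable On
    # DBM map from the requested Host name to the backend URL.
    # The text source has one pair per line, for example:
    #   journals.vendor.fqdn  http://journals.vendor
    # and can be compiled with:
    #   httxt2dbm -i vendor-express-map.txt -o vendor-express-map
    # The on-disk file name may vary with the DBM type in use.
    ProxyExpressDBMFile /path/to/vendor-express-map
</VirtualHost>

# --- Sketch 2: rewriting absolute links in pages (mod_proxy_html) ---
# Requires mod_proxy, mod_proxy_http, mod_proxy_html, mod_xml2enc
# and mod_headers.
<VirtualHost _default_:80>
    ServerName vendor.fqdn
    ProxyRequests Off
    ProxyPass        / http://vendor/
    ProxyPassReverse / http://vendor/

    # Ask the backend for uncompressed pages so the HTML can be parsed.
    RequestHeader unset Accept-Encoding

    ProxyHTMLEnable On
    # Also look inside inline scripts and stylesheets.
    ProxyHTMLExtended On
    # In 2.4 the link attributes to rewrite must be declared explicitly;
    # the proxy-html.conf sample shipped with HTTPd has the full list.
    ProxyHTMLLinks a      href
    ProxyHTMLLinks link   href
    ProxyHTMLLinks img    src longdesc usemap
    ProxyHTMLLinks form   action
    ProxyHTMLLinks script src for
    # Keep absolute vendor URLs inside the proxy.
    ProxyHTMLURLMap http://vendor/ http://vendor.fqdn/
</VirtualHost>
```

The two sketches are alternatives for the same ServerName, so only one of them would be enabled for a given vendor. A platform that spreads its interface over several hostnames would need one ProxyHTMLURLMap line per hostname, which is where the "multiple VirtualHost sections" come in.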
> So what's currently missing in the Apache HTTPd solution?
> - Services that use an authentication token (predominantly ebook vendors)
> need special support written. I have been entertaining using mod_lua for
> this to make this support relatively easy for someone who is not hard-core
> technical to maintain.
> - Services that are not IP authenticated, but use one of the form-based
> authentication variants. I suspect that injecting a script that fills in
> and submits the form might be a sane approach here. This should also
> cleanly deal with the ASP.NET abominations that use __VIEWSTATE to store
> sessions client-side instead of server-side.
> - EZproxy's built-in DNS server (enabled with the "DNS" directive) would
> need to be handled using a separate DNS server (there are several options
> to choose from).
> - In this setup, standard systems-level management and reporting tools
> would be used instead of the /admin interface in EZproxy.
> - In this setup, the functionality of the EZproxy /menu URL would need to
> be handled externally. This may not be a real issue, as many academic
> sites already use LMS or portal systems instead of EZproxy to direct
> students to resources, so this feature may not be as critical to replicate.
> - And of course, extensive testing. While the above ProQuest stanza works
> for the main ProQuest search interface, it won't work for everyone,
> everywhere just yet.
> Bottom line: Yes, Apache HTTPd is a viable EZproxy alternative if you have
> a system administrator who knows their way around Apache HTTPd, and are
> willing to spend some time getting to know your vendor services intimately.
> All of this testing was done on Fedora 19 for the 2.4 version of HTTPd,
> which should be available in RHEL7/CentOS7 soon, so about the time that
> hard decisions are to be made regarding EZproxy vs something else, that
> something else may very well be Apache HTTPd with vendor-specific
> configuration files.
> Andrew Anderson, Director of Development, Library and Information
> Resources Network, Inc.
> http://www.lirn.net/ | http://www.twitter.com/LIRNnotes
> On Jan 29, 2014, at 14:42, Margo Duncan wrote:
> > Would you *have* to be hosted? We're in a rural part of the USA and
> > network connections from here to anywhere aren't great, so we try to
> > host most everything we can. EZProxy really is "EZ" to host yourself.
> > Margo
> > -----Original Message-----
> > From: Code for Libraries On Behalf Of stuart yeates
> > Sent: Wednesday, January 29, 2014 1:40 PM
> > Subject: Re: [CODE4LIB] EZProxy changes / alternatives ?
> > The text I've seen talks about "[e]xpanded reporting capabilities to
> > support management decisions" in forthcoming versions and encourages
> > a move towards the hosted solution.
> > Since we're in .nz, they'd put our hosted proxy server in .au, but the
> > network connection between .nz and .au is via the continental .us, which
> > puts an extra trans-pacific network loop in 99% of our proxied network
> > traffic.
> > cheers
> > stuart
> > On 30/01/14 03:14, Ingraham Dwyer, Andy wrote:
> >> OCLC announced in April 2013 the changes in their license model for
> >> North America. EZProxy's license moves from requiring a one-time
> >> purchase of US$495 to an *annual* fee of $495, or through their hosted
> >> service, with the fee depending on scale of service. The old one-time
> >> purchase license is no longer offered for sale as of July 1, 2013. I
> >> don't have any details about pricing for other parts of the world.
> >> An important thing to recognize here is that they cannot legally
> >> change the terms of a license that is already in effect. The software
> >> you have purchased under the old license is still yours to use,
> >> indefinitely. OCLC has even released several maintenance updates during
> >> 2013 that are available to current license-holders. In fact, they
> >> released V5.7 in early January 2014, and made that available to all
> >> license-holders. However, all updates after that version are only
> >> available to holders of the yearly subscription. The hosted product is
> >> updated to the most current version.
> >> My recommendation is: If your installation of EZProxy works, don't
> >> change it. Yet. Upgrade your installation to the last version available
> >> under the old license, and use that for as long as you can. At this
> >> point, there are no world-changing new features that have been added to
> >> the product. There is speculation that IPv6 support will be the next
> >> big feature-add, but I haven't heard anything official. Start planning
> >> and budgeting for a change, either to the yearly fee, or the cost of
> >> hosted, or to some as-yet-undetermined alternative. But I see no need
> >> to start paying now for updates you don't need.
> >> -Andy
> >> Andy Ingraham Dwyer
> >> Infrastructure Specialist
> >> State Library of Ohio
> >> 274 E. 1st Avenue
> >> Columbus, OH 43201
> >> library.ohio.gov
> >> -----Original Message-----
> >> From: Code for Libraries On Behalf Of stuart yeates
> >> Sent: Tuesday, January 28, 2014 10:03 PM
> >> Subject: Re: [CODE4LIB] EZProxy changes / alternatives ?
> >> I probably should have been more specific.
> >> Does anyone have experience switching from EzProxy to anything else?
> >> Is anyone else aware of the coming OCLC changes and considering
> >> Does anyone have a worked example like: "My EzProxy config for site Y
> >> looked like A; after the switch, my X config for site Z looked like B"?
> >> I'm aware of this good article:
> >> http://journal.code4lib.org/articles/7470
> >> cheers
> >> stuart
> >> On 29/01/14 15:24, stuart yeates wrote:
> >>> We've just received notification of forthcoming changes to EZProxy,
> >>> which will require us to pay an arm and a leg for future versions to
> >>> install locally and/or host with OCLC AU with a ~ 10,000km round trip.
> >>> What are the alternatives?
> >>> cheers
> >>> stuart
> >> --
> >> Stuart Yeates
> >> Library Technology Services http://www.victoria.ac.nz/library/
> > --
> > Stuart Yeates
> > Library Technology Services http://www.victoria.ac.nz/library/