When OCLC first announced their purchase of EZproxy a few years ago, we started a low-priority research project to see what the alternatives were and what it would take to bring them into a production-ready state.  The two open source solutions we evaluated were Squid and Apache HTTPd.  We considered other options (e.g. Apache Traffic Server), but limited the research to these two pieces of software since they are already widely used and familiar to most system administrators.

Long story short, Squid did not support URL rewriting in a way that we felt we could support well: it required either patches to the core C++ server code, an external rewriting process, or an ICAP server implementation.  Some of that has improved a bit since the original evaluation, but the built-in support for URL rewriting may still need some time to mature.  Squid also did not seem to be a good fit because its authentication mechanisms are somewhat limited compared to Apache HTTPd's.

So we moved on to evaluating Apache HTTPd with the mod_proxy family of modules.  While Apache HTTPd does not support the advanced cache federation features that Squid does, it has grown to be a robust proxy solution in its own right, and the 2.4 release appears to have all of the required pieces out of the box, thanks to the bundled mod_proxy_html module.  In addition to basic URL rewriting support, you get full HTTP protocol support, mature IPv6 support, GZIP support, just about any authentication mechanism you need, a server that can easily host your own content, and a built-in HTTP object cache.

How would it work?  

Here’s the current EZproxy stanza for ProQuest:

HTTPHeader X-Requested-With
HTTPHeader Accept-Encoding
Title ProQuest
URL http://search.proquest.com/ip
DJ proquest.com
HJ gateway.proquest.com
DJ umi.com
HJ fedsearch.proquest.com
HJ literature.proquest.com
DJ conquest-leg-insight.com
DJ conquestsystems.com
DJ m.search.proquest.com
DJ media.proquest.com
NeverProxy order.proquest.com
NeverProxy rss.proquest.com

Here’s an Apache HTTPd configuration for ProQuest that accomplishes much of the same functionality for the main search.proquest.com interface:

<VirtualHost _default_:80>
 ServerName search.proquest.com.fqdn

 ProxyRequests Off
 ProxyVia On

 RewriteEngine On
 RewriteRule ^/(.*) http://search.proquest.com/$1 [P]

 <Location "/">
  AllowMethods GET POST OPTIONS
  ProxyPassReverse http://search.proquest.com/
  ProxyPassReverseCookieDomain search.proquest.com search.proquest.com.fqdn
  CacheEnable disk
  SetOutputFilter INFLATE;DEFLATE
  Header Append Vary User-Agent env=!dont-vary
  # Put Authentication directives here
  ErrorDocument 401 /path/to/login
  Require valid-user
 </Location>
</VirtualHost>

A few notes on this:

- There is no need for NeverProxy: if you do not define a VirtualHost for the hostname, it is not proxied.  So instead of HJ and DJ lines, you add a new VirtualHost block for each hostname that needs to be proxied.  The astute will ask “what about services that have dozens or hundreds of host entries, like Sage?”  Those can be handled by the ProxyExpress directives (mod_proxy_express) in Apache HTTPd.

- There is no need for HTTPHeader: since Apache HTTPd is a full HTTP proxy/server, it supports all HTTP headers natively.

- Some of the hostnames that are in EZproxy stanzas are not needed, and some are legacy hostnames that are no longer used by the vendor.

- Some of the hostnames that are in EZproxy stanzas are for CDN hosted content that requires no special access (e.g. JavaScript/CSS/graphics assets that make up the vendor’s user interface).  Another example: how many of you have “DJ google.com” in one of your stanzas? Now how many of you registered your IP addresses with Google in any way?  Outside of Google Scholar, I suspect the answers to those questions are “nearly everyone” and “nearly no one”, respectively.

- Some of the hostnames are for things that no sane person would do: How many people run their discovery services through their EZproxy server vs. authenticating their discovery platform by IP address with vendors directly?

- Something that this configuration does that EZproxy does not do is enable object caching.  This can easily save 30-50% of your upstream bandwidth usage (Proxy/ProxySSL in EZproxy can achieve the same result with an external caching proxy server).

- More complex vendor platforms (e.g. Gale Cengage) need ProxyHTML directives and ProxyHTMLURLMap configured, and multiple VirtualHost sections to get them fully working.  These can be a little fun to get working initially.

- Some services need redirects edited to work correctly, and not break out of the proxy:

	Header edit Location http://vendor/ http://vendor.fqdn/

- Some vendors send wrong HTTP headers for the MIME type, and mod_proxy_html exposes this in some cases as it rewrites the page.  There may be a better way to do this, but this is what I threw together for testing:

	<Location "/badpath">
		ProxyHTMLEnable Off
		SetOutputFilter INFLATE;dummy-html-to-plain
		ExtFilterOptions LogStdErr Onfail=remove
	</Location>
	ExtFilterDefine dummy-html-to-plain mode=output intype=text/html outtype=text/plain cmd="/bin/cat -"
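
For the many-hostnames case raised in the first note, mod_proxy_express looks up the incoming Host header in a DBM file and proxies to the matching backend, so one map file can stand in for a long run of VirtualHost blocks.  A rough sketch follows; the journal hostnames and file paths are invented for illustration and were not part of the testing described here:

```apache
# express-map.txt: plain text, one "proxy-hostname backend-URL" pair per line.
# These hostnames are made-up placeholders:
#   journal1.vendor.com.fqdn http://journal1.vendor.com
#   journal2.vendor.com.fqdn http://journal2.vendor.com
#
# Convert the text map to a DBM file with the httxt2dbm tool shipped with HTTPd:
#   httxt2dbm -i express-map.txt -o /etc/httpd/conf/express-map

# Enable Host-header based mapping and point it at the DBM file (path is an assumption)
ProxyExpressEnable On
ProxyExpressDBMFile /etc/httpd/conf/express-map
```

Authentication, caching, and any HTML rewriting would still need to be layered on top of this, so treat it as a starting point rather than a finished stanza.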

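For the more complex platforms mentioned above, the mod_proxy_html directives might be combined roughly as follows.  This is only a sketch: "vendor" and "vendor.fqdn" are the same placeholder hostnames used in the Header edit example, and a real platform will likely need several such maps.

```apache
<Location "/">
  # Turn on HTML link rewriting for proxied pages
  ProxyHTMLEnable On
  # Rewrite absolute links in the HTML so they stay on the proxy hostname
  ProxyHTMLURLMap http://vendor/ http://vendor.fqdn/
  # Also rewrite URLs found in inline scripts and stylesheets
  ProxyHTMLExtended On
</Location>
```

mod_proxy_html can only rewrite uncompressed HTML, so compressed vendor responses would have to be inflated first (or compression declined upstream) for this to have any effect.
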
So what’s currently missing in the Apache HTTPd solution?

- Services that use an authentication token (predominantly ebook vendors) need special support written.  I have been entertaining using mod_lua for this to make this support relatively easy for someone who is not hard-core technical to maintain.

- Services that are not IP authenticated, but use one of the Form-based authentication variants.  I suspect that an approach that injects a script tag into the page pointing to JavaScript that handles the form fill/submission might be a sane approach here.  This should also cleanly deal with the ASP.NET abominations that use __VIEWSTATE to store sessions client-side instead of server-side.

- EZproxy’s built-in DNS server (enabled with the “DNS” directive) would need to be handled using a separate DNS server (there are several options to choose from).

- In this setup, standard systems-level management and reporting tools would be used instead of the /admin interface in EZproxy.

- In this setup, the functionality of the EZproxy /menu URL would need to be handled externally.  This may not be a real issue, as many academic sites already use LMS or portal systems instead of EZproxy to direct students to resources, so this feature may not be as critical to replicate.

- And of course, extensive testing.  While the above ProQuest stanza works for the main ProQuest search interface, it won’t work for everyone, everywhere just yet.
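
As an illustration of the script-injection idea for form-based logins, one untested possibility is mod_substitute, which can splice a script tag into proxied HTML.  Everything named here is hypothetical: the /login path, the /proxy-assets/formfill.js script, and the assumption that the vendor page contains a closing head tag to hook onto.

```apache
<Location "/login">
  # Ask the vendor for uncompressed pages so the text substitution can see the HTML
  RequestHeader unset Accept-Encoding
  # Run the substitution filter on HTML responses only
  AddOutputFilterByType SUBSTITUTE text/html
  # Insert a script tag just before </head>; formfill.js is a placeholder name
  Substitute "s|</head>|<script src='/proxy-assets/formfill.js'></script></head>|i"
</Location>
```

The script itself would have to be served by the proxy host, which means excluding its path from the catch-all RewriteRule in the VirtualHost.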

Bottom line: Yes, Apache HTTPd is a viable EZproxy alternative if you have a system administrator who knows their way around Apache HTTPd, and are willing to spend some time getting to know your vendor services intimately.

All of this testing was done on Fedora 19 with the 2.4 version of HTTPd, which should be available in RHEL7/CentOS7 soon.  So by the time hard decisions have to be made regarding EZproxy vs. something else, that something else may very well be Apache HTTPd with vendor-specific configuration files.

-- 
Andrew Anderson, Director of Development, Library and Information Resources Network, Inc.
http://www.lirn.net/ | http://www.twitter.com/LIRNnotes | http://www.facebook.com/LIRNnotes

On Jan 29, 2014, at 14:42, Margo Duncan <[log in to unmask]> wrote:

> Would you *have* to be hosted? We're in a rural part of the USA and network connections from here to anywhere aren't great, so we try to host most everything we can.  EZProxy really is "EZ" to host yourself.
> 
> Margo
> 
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of stuart yeates
> Sent: Wednesday, January 29, 2014 1:40 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] EZProxy changes / alternatives ?
> 
> The text I've seen talks about "[e]xpanded reporting capabilities to support management decisions" in forthcoming versions and encourages towards the hosted solution.
> 
> Since we're in .nz, they'd put our hosted proxy server in .au, but the network connection between .nz and .au is via the continental .us, which puts an extra trans-pacific network loop in 99% of our proxied network connections.
> 
> cheers
> stuart
> 
> On 30/01/14 03:14, Ingraham Dwyer, Andy wrote:
>> OCLC announced in April 2013 the changes in their license model for North America.  EZProxy's license moves from requiring a one-time purchase of US$495 to an *annual* fee of $495, or through their hosted service, with the fee depending on scale of service.  The old one-time purchase license is no longer offered for sale as of July 1, 2013.  I don't have any details about pricing for other parts of the world.
>> 
>> An important thing to recognize here, is that they cannot legally change the terms of a license that is already in effect.  The software you have purchased under the old license is still yours to use, indefinitely.  OCLC has even released several maintenance updates during 2013 that are available to current license-holders.  In fact, they released V5.7 in early January 2014, and made that available to all license-holders.  However, all updates after that version are only available to holders of the yearly subscription.  The hosted product is updated to the most current version automatically.
>> 
>> My recommendation is:  If your installation of EZProxy works, don't change it.  Yet.  Upgrade your installation to the last version available under the old license, and use that for as long as you can.  At this point, there are no world-changing new features that have been added to the product.  There is speculation that IPv6 support will be the next big feature-add, but I haven't heard anything official.  Start planning and budgeting for a change, either to the yearly fee, or the cost of hosted, or to some as-yet-undetermined alternative.  But I see no need to start paying now for updates you don't need.
>> 
>> -Andy
>> 
>> 
>> 
>> Andy Ingraham Dwyer
>> Infrastructure Specialist
>> State Library of Ohio
>> 274 E. 1st Avenue
>> Columbus, OH 43201
>> library.ohio.gov
>> 
>> 
>> -----Original Message-----
>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf 
>> Of stuart yeates
>> Sent: Tuesday, January 28, 2014 10:03 PM
>> To: [log in to unmask]
>> Subject: Re: [CODE4LIB] EZProxy changes / alternatives ?
>> 
>> I probably should have been more specific.
>> 
>> Does anyone have experience switching from EzProxy to anything else?
>> 
>> Is anyone else aware of the coming OCLC changes and considering switching?
>> 
>> Does anyone have a worked example like: "My EzProxy config for site Y looked like A; after the switch, my X config for site Z looked like B"?
>> 
>> I'm aware of this good article:
>> http://journal.code4lib.org/articles/7470
>> 
>> cheers
>> stuart
>> 
>> 
>> On 29/01/14 15:24, stuart yeates wrote:
>>> We've just received notification of forth-coming changes to EZProxy, 
>>> which will require us to pay an arm and a leg for future versions to 
>>> install locally and/or host with OCLC AU with a ~ 10,000km round trip.
>>> 
>>> What are the alternatives?
>>> 
>>> cheers
>>> stuart
>> 
>> 
>> --
>> Stuart Yeates
>> Library Technology Services http://www.victoria.ac.nz/library/
>> 
> 
> 
> --
> Stuart Yeates
> Library Technology Services http://www.victoria.ac.nz/library/