This is amazing!

Maybe a github repo for config blocks is in order?  I figure the only way
to work out the myriad kinks in this would be scale.

-Ross.


On Wed, Jan 29, 2014 at 6:00 PM, Andrew Anderson <[log in to unmask]> wrote:

> When OCLC first announced their purchase of EZproxy a few years ago, we
> started a low priority research project to see what the alternatives were,
> and what it would take to bring them into a production-ready state.
>  The two open source solutions we evaluated were Squid and Apache HTTPd.
>  We considered other options (e.g. Apache Traffic Server), but limited the
> research to these two pieces of software since they are already widely used
> and familiar to most system administrators.
>
> Long story short, Squid did not support URL rewriting in a way that we
> felt could be supported well: it required either patches to the core C++
> server code, an external rewriting process, or an ICAP server
> implementation.  Some of that has improved a bit since the original
> evaluation, but the built-in support for URL rewriting may still need some
> time to mature.  Another aspect of Squid that did not seem to be a good fit
> was that it is somewhat limited in its authentication mechanisms vs Apache
> HTTPd.
>
> So we moved on to evaluating Apache HTTPd with the mod_proxy family of
> modules.  While Apache HTTPd does not support the same advanced cache
> federation features as Squid, it has grown to be a robust proxy solution
> in its own right, and the 2.4 release appears to have all of the required
> pieces out of the box, with the mod_proxy_html module functionality.  In
> addition to
> basic URL rewriting support, you get full HTTP protocol support, mature
> IPv6 support, GZIP support, just about any authentication mechanism you
> need, a server that you can self-host content with easily, as well as a
> built-in HTTP object cache.
>
> How would it work?
>
> Here's the current EZproxy stanza for ProQuest:
>
> HTTPHeader X-Requested-With
> HTTPHeader Accept-Encoding
> Title ProQuest
> URL http://search.proquest.com/ip
> DJ proquest.com
> HJ gateway.proquest.com
> DJ umi.com
> HJ fedsearch.proquest.com
> HJ literature.proquest.com
> DJ conquest-leg-insight.com
> DJ conquestsystems.com
> DJ m.search.proquest.com
> DJ media.proquest.com
> NeverProxy order.proquest.com
> NeverProxy rss.proquest.com
>
> Here's an Apache HTTPd configuration using ProQuest that accomplishes much
> of the same functionality for the main search.proquest.com interface:
>
> <VirtualHost _default_:80>
>  ServerName search.proquest.com.fqdn
>
>  ProxyRequests Off
>  ProxyVia On
>
>  RewriteEngine On
>  RewriteRule ^/(.*) http://search.proquest.com/$1 [P]
>
>  <Location "/">
>   AllowMethods GET POST OPTIONS
>   ProxyPassReverse http://search.proquest.com/
>   ProxyPassReverseCookieDomain search.proquest.com search.proquest.com.fqdn
>   CacheEnable disk
>   SetOutputFilter INFLATE;DEFLATE
>   Header Append Vary User-Agent env=!dont-vary
>   # Put Authentication directives here
>   ErrorDocument 401 /path/to/login
>   Require valid-user
>  </Location>
> </VirtualHost>
>
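
For the "# Put Authentication directives here" placeholder in the configuration above, a minimal sketch using form-based login might look like the following. It assumes mod_auth_form, mod_request, mod_session and mod_session_cookie are loaded; the password file path, the realm name and the login page URL are hypothetical placeholders, not part of the original configuration.

```apache
<Location "/">
  # Form-based login backed by a flat password file (hypothetical path)
  AuthType form
  AuthName "Library Resources"
  AuthFormProvider file
  AuthUserFile "/etc/httpd/conf/proxy-users.passwd"

  # Send unauthenticated users to a login page (hypothetical URL)
  AuthFormLoginRequiredLocation "http://login.fqdn/login.html"

  # Keep the authenticated state in a cookie-backed session
  Session On
  SessionCookieName session path=/

  Require valid-user
</Location>
```

Any other authentication provider available to Apache HTTPd (LDAP, a database, and so on) could be swapped in for the file provider. A production setup would probably also want the session cookie encrypted, for example with mod_session_crypto.
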
> A few notes on this:
>
> - There is no need for NeverProxy: if you do not define a VirtualHost for
> the hostname, it is not proxied.  So instead of HJ and DJ lines, you add a
> new VirtualHost block for each hostname that needs to be proxied.  The
> astute will ask "what about services that have dozens or hundreds of host
> entries, like Sage?"  Those can be handled by the mod_proxy_express
> (ProxyExpress*) features in Apache HTTPd.
>
> - There is no need for HTTPHeader: since Apache HTTPd is a full HTTP
> proxy/server, it supports all HTTP headers natively.
>
> - Some of the hostnames that are in EZproxy stanzas are not needed, and
> some are legacy hostnames that are no longer used by the vendor.
>
> - Some of the hostnames that are in EZproxy stanzas are for CDN hosted
> content that requires no special access (e.g. JavaScript/CSS/graphics
> assets that make up the vendor's user interface).  Another example: how
> many of you have "DJ google.com" in one of your stanzas? Now how many of
> you registered your IP addresses with Google in any way?  Outside of Google
> Scholar, I suspect the answers to those questions are "nearly everyone" and
> "nearly no one", respectively.
>
> - Some of the hostnames are for things that no sane person would do: How
> many people run their discovery services through their EZproxy server vs.
> authenticating their discovery platform by IP address with vendors directly?
>
> - Something that this configuration does that EZproxy does not do is
> enable object caching.  This can easily save 30-50% of your upstream
> bandwidth usage (Proxy/ProxySSL in EZproxy can achieve the same result with
> an external caching proxy server).
>
> - More complex vendor platforms (e.g. Gale Cengage) need ProxyHTML
> directives and ProxyHTMLURLMap configured, and multiple VirtualHost
> sections to get them fully working.  These can be a little fun to get
> working initially.
>
> - Some services need redirects edited to work correctly, and not break out
> of the proxy:
>
>         Header edit Location http://vendor/ http://vendor.fqdn/
>
> - Some vendors send wrong HTTP headers for the MIME type, and
> mod_proxy_html exposes this in some cases as it rewrites the page.  There
> may be a better way to do this, but this is what I threw together for
> testing:
>
>         <Location "/badpath">
>                 ProxyHTMLEnable Off
>                 SetOutputFilter INFLATE;dummy-html-to-plain
>                 ExtFilterOptions LogStdErr Onfail=remove
>         </Location>
>         ExtFilterDefine dummy-html-to-plain mode=output intype=text/html \
>                 outtype=text/plain cmd="/bin/cat -"
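
As a rough illustration of the ProxyHTML note above, a vendor that embeds absolute links to more than one of its own hostnames might be handled along these lines. The hostnames vendor.example.com and images.vendor.example.com are hypothetical, and this is a sketch rather than a tested configuration for any real platform. It assumes mod_proxy_html, mod_xml2enc and mod_headers are loaded, and that ProxyHTMLLinks definitions are available (for instance from the sample proxy-html.conf shipped with httpd).

```apache
<VirtualHost _default_:80>
  ServerName vendor.example.com.fqdn

  ProxyRequests Off
  RewriteEngine On
  RewriteRule ^/(.*) http://vendor.example.com/$1 [P]

  <Location "/">
    ProxyPassReverse http://vendor.example.com/
    # Ask the vendor for uncompressed pages so the HTML can be parsed
    RequestHeader unset Accept-Encoding
    ProxyHTMLEnable On
    # Also rewrite URLs found in inline scripts and stylesheets
    ProxyHTMLExtended On
    # Rewrite absolute links so they stay inside the proxy
    ProxyHTMLURLMap http://vendor.example.com/ http://vendor.example.com.fqdn/
    ProxyHTMLURLMap http://images.vendor.example.com/ http://images.vendor.example.com.fqdn/
  </Location>
</VirtualHost>
```

Each additional hostname that appears in a ProxyHTMLURLMap target (here images.vendor.example.com.fqdn) would need its own VirtualHost block, otherwise the rewritten links point at a name that is not proxied.
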
>
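
For the "dozens or hundreds of host entries" case mentioned in the first note, a minimal mod_proxy_express sketch could look like this. The hostnames, the map entries and the file paths are hypothetical, and it assumes mod_proxy, mod_proxy_http and mod_proxy_express are loaded.

```apache
# Plain-text map (hypothetical entries), one "proxy hostname  vendor URL" pair per line:
#   journals.vendor.example.com.fqdn  http://journals.vendor.example.com
#   books.vendor.example.com.fqdn     http://books.vendor.example.com
# Convert it to a DBM file before use, e.g.:
#   httxt2dbm -i express-map.txt -o /etc/httpd/conf/express-map

<VirtualHost _default_:80>
  ServerName vendor.example.com.fqdn
  ServerAlias *.vendor.example.com.fqdn

  # Look up the request's Host: header in the map and proxy to the matching URL
  ProxyExpressEnable On
  ProxyExpressDBMFile /etc/httpd/conf/express-map
</VirtualHost>
```

The trade-off is that every hostname in the map gets the same treatment, so vendors that need per-host HTML rewriting or header edits would probably still need individual VirtualHost blocks.
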
> So what's currently missing in the Apache HTTPd solution?
>
> - Services that use an authentication token (predominantly ebook vendors)
> need special support written.  I have been entertaining using mod_lua for
> this to make this support relatively easy for someone who is not hard-core
> technical to maintain.
>
> - Services that are not IP authenticated, but use one of the Form-based
> authentication variants.  I suspect that an approach that injects a script
> tag into the page pointing to javascript that handles the form
> fill/submission might be a sane approach here.  This should also cleanly
> deal with the ASP.NET abominations that use __VIEWSTATE to store sessions
> client-side instead of server-side.
>
> - EZproxy's built-in DNS server (enabled with the "DNS" directive) would
> need to be handled using a separate DNS server (there are several options
> to choose from).
>
> - In this setup, standard systems-level management and reporting tools
> would be used instead of the /admin interface in EZproxy.
>
> - In this setup, the functionality of the EZproxy /menu URL would need to
> be handled externally.  This may not be a real issue, as many academic
> sites already use LMS or portal systems instead of EZproxy to direct
> students to resources, so this feature may not be as critical to replicate.
>
> - And of course, extensive testing.  While the above ProQuest stanza works
> for the main ProQuest search interface, it won't work for everyone,
> everywhere just yet.
>
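
The script-injection idea for form-based vendors could be sketched with mod_substitute, roughly as below. This is only an assumption about how it might be wired up: the /login path and the script URL are hypothetical, the script itself (which would fill in and submit the form) is not shown, and mod_substitute and mod_deflate are assumed to be loaded.

```apache
<Location "/login">
  # Decompress the vendor's response, apply the substitution, then recompress
  SetOutputFilter INFLATE;SUBSTITUTE;DEFLATE
  # Inject a (hypothetical) locally hosted script just before </head>
  Substitute "s|</head>|<script src='http://login.fqdn/form-login.js'></script></head>|i"
</Location>
```

The script is referenced by an absolute URL on a separate hostname so that the request for it is not itself passed through to the vendor by the catch-all RewriteRule.
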
> Bottom line: Yes, Apache HTTPd is a viable EZproxy alternative if you have
> a system administrator who knows their way around Apache HTTPd, and are
> willing to spend some time getting to know your vendor services intimately.
>
> All of this testing was done on Fedora 19 for the 2.4 version of HTTPd,
> which should be available in RHEL7/CentOS7 soon, so about the time that
> hard decisions are to be made regarding EZproxy vs something else, that
> something else may very well be Apache HTTPd with vendor-specific
> configuration files.
>
> --
> Andrew Anderson, Director of Development, Library and Information
> Resources Network, Inc.
> http://www.lirn.net/ | http://www.twitter.com/LIRNnotes |
> http://www.facebook.com/LIRNnotes
>
> On Jan 29, 2014, at 14:42, Margo Duncan <[log in to unmask]> wrote:
>
> > Would you *have* to be hosted? We're in a rural part of the USA and
> > network connections from here to anywhere aren't great, so we try to host
> > most everything we can.  EZProxy really is "EZ" to host yourself.
> >
> > Margo
> >
> > -----Original Message-----
> > From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> > stuart yeates
> > Sent: Wednesday, January 29, 2014 1:40 PM
> > To: [log in to unmask]
> > Subject: Re: [CODE4LIB] EZProxy changes / alternatives ?
> >
> > The text I've seen talks about "[e]xpanded reporting capabilities to
> > support management decisions" in forthcoming versions and encourages a
> > move towards the hosted solution.
> >
> > Since we're in .nz, they'd put our hosted proxy server in .au, but the
> > network connection between .nz and .au is via the continental .us, which
> > puts an extra trans-pacific network loop in 99% of our proxied network
> > connections.
> >
> > cheers
> > stuart
> >
> > On 30/01/14 03:14, Ingraham Dwyer, Andy wrote:
> >> OCLC announced in April 2013 the changes in their license model for
> >> North America.  EZProxy's license moves from requiring a one-time
> >> purchase of US$495 to an *annual* fee of $495, or through their hosted
> >> service, with the fee depending on scale of service.  The old one-time
> >> purchase license is no longer offered for sale as of July 1, 2013.  I
> >> don't have any details about pricing for other parts of the world.
> >>
> >> An important thing to recognize here is that they cannot legally
> >> change the terms of a license that is already in effect.  The software
> >> you have purchased under the old license is still yours to use,
> >> indefinitely.  OCLC has even released several maintenance updates during
> >> 2013 that are available to current license-holders.  In fact, they
> >> released V5.7 in early January 2014, and made that available to all
> >> license-holders.  However, all updates after that version are only
> >> available to holders of the yearly subscription.  The hosted product is
> >> updated to the most current version automatically.
> >>
> >> My recommendation is:  If your installation of EZProxy works, don't
> >> change it.  Yet.  Upgrade your installation to the last version
> >> available under the old license, and use that for as long as you can.
> >> At this point, there are no world-changing new features that have been
> >> added to the product.  There is speculation that IPv6 support will be
> >> the next big feature-add, but I haven't heard anything official.  Start
> >> planning and budgeting for a change, either to the yearly fee, or the
> >> cost of hosted, or to some as-yet-undetermined alternative.  But I see
> >> no need to start paying now for updates you don't need.
> >>
> >> -Andy
> >>
> >>
> >>
> >> Andy Ingraham Dwyer
> >> Infrastructure Specialist
> >> State Library of Ohio
> >> 274 E. 1st Avenue
> >> Columbus, OH 43201
> >> library.ohio.gov
> >>
> >>
> >> -----Original Message-----
> >> From: Code for Libraries [mailto:[log in to unmask]] On Behalf
> >> Of stuart yeates
> >> Sent: Tuesday, January 28, 2014 10:03 PM
> >> To: [log in to unmask]
> >> Subject: Re: [CODE4LIB] EZProxy changes / alternatives ?
> >>
> >> I probably should have been more specific.
> >>
> >> Does anyone have experience switching from EzProxy to anything else?
> >>
> >> Is anyone else aware of the coming OCLC changes and considering
> >> switching?
> >>
> >> Does anyone have a worked example like: "My EzProxy config for site Y
> >> looked like A; after the switch, my X config for site Z looked like B"?
> >>
> >> I'm aware of this good article:
> >> http://journal.code4lib.org/articles/7470
> >>
> >> cheers
> >> stuart
> >>
> >>
> >> On 29/01/14 15:24, stuart yeates wrote:
> >>> We've just received notification of forthcoming changes to EZProxy,
> >>> which will require us to pay an arm and a leg for future versions to
> >>> install locally and/or host with OCLC AU with a ~ 10,000km round trip.
> >>>
> >>> What are the alternatives?
> >>>
> >>> cheers
> >>> stuart
> >>
> >>
> >> --
> >> Stuart Yeates
> >> Library Technology Services http://www.victoria.ac.nz/library/
> >>
> >
> >
> > --
> > Stuart Yeates
> > Library Technology Services http://www.victoria.ac.nz/library/
>