LISTSERV 16.5 - CODE4LIB Archives

I have seen three basic approaches to rewriting proxy servers in the wild, each with its own strengths and weaknesses:

1) Proxy by port

This is the original EZproxy model, where each proxied resource gets its own port number.  It runs afoul of firewall rules that block traffic to ports other than 80/443, and it creates a problem for SSL access: clients try both HTTP and HTTPS against the same port number, and EZproxy is not set up to differentiate between the two protocols on a single port.  With more and more resources moving to HTTPS, the end of this approach as a viable option is in sight.

2) Proxy by hostname

This is the currently preferred EZproxy model, as it addresses the HTTP(S) port issue.  As you have identified, though, it creates a hostname mangling issue instead, and now I’m curious myself about how EZproxy handles an SSL site whose hostname already contains a hyphen when HttpsHyphens is enabled.  I /think/ it does the right thing by mapping the hostname back to the original internally, since a “-” in hostnames for release versioning is how the Google App Engine platform works, but I have not explicitly investigated that.

3) Proxy by path

A different proxy product that we use, Muse Proxy from Edulib, leverages proxy by path, where the original website URL is deconstructed and passed to the proxy server as query arguments.  This approach has worked fairly well as it cleanly avoids the hostname mangling issues, though some of the new “single page web apps” that use JavaScript routing patterns can be interesting, so the vendor has added proxy by hostname support as an option for those sites as a fallback.
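
To make the proxy by path idea concrete, here is a minimal sketch of the general technique. It is not Muse Proxy's actual URL format, which I have not reproduced here; the endpoint proxy.example.org/fetch and the argument names scheme, host, path and query are all made up for illustration. The point is only that the original hostname travels as data, so the proxy's own hostname never changes.

```python
from urllib.parse import urlsplit, urlunsplit, urlencode, parse_qs

# Hypothetical proxy endpoint; not a real Muse Proxy URL.
PROXY_BASE = "https://proxy.example.org/fetch"


def to_proxy_url(target: str) -> str:
    """Deconstruct the target URL and pass its pieces as query arguments."""
    parts = urlsplit(target)
    args = {
        "scheme": parts.scheme,
        "host": parts.netloc,
        "path": parts.path or "/",
        "query": parts.query,
    }
    # urlencode percent-escapes each value, so the target's own
    # "?" and "&" cannot be confused with the proxy's arguments.
    return PROXY_BASE + "?" + urlencode(args)


def from_proxy_url(proxied: str) -> str:
    """Rebuild the original target URL from the proxy's query arguments."""
    args = parse_qs(urlsplit(proxied).query, keep_blank_values=True)
    return urlunsplit((
        args["scheme"][0],
        args["host"][0],
        args["path"][0],
        args["query"][0],
        "",  # fragments never reach the server, so none is restored
    ))


if __name__ == "__main__":
    original = "https://www-somedb.com/search?q=proxy&page=2"
    proxied = to_proxy_url(original)
    print(proxied)
    print(from_proxy_url(proxied))
```

Since every proxied page lives under the single proxy hostname, one ordinary certificate covers everything. The cost is that the proxy has to rewrite every link in the page, which is where the JavaScript-routed sites get awkward.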

So there is no perfect solution, but some work better than others.  I’m looking forward to expanding our use of the proxy by path approach, as that is a very clean approach to this problem, and it seems to have fewer caveats than the other two approaches.
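
Coming back to the hostname approach for a moment, here is a rough sketch of why the hyphenated form works with a simple wildcard cert, and of the collision Stuart is worried about. Two caveats: wildcard_matches is a simplified version of the RFC 6125 rule (the wildcard stands for exactly one left-most label), and hyphenated_name is a naive dot-to-hyphen replacement that I am assuming purely for illustration. It is not necessarily what EZproxy does internally.

```python
# Proxy suffix taken from the example hostnames in this thread.
PROXY_SUFFIX = "ezproxy.yourlib.org"


def dotted_name(host: str) -> str:
    """Proxy hostname that keeps the dots (NoHttpsHyphens-style)."""
    return f"{host}.{PROXY_SUFFIX}"


def hyphenated_name(host: str) -> str:
    """Proxy hostname with dots flattened to hyphens (HttpsHyphens-style).

    Assumption: a naive replacement, used only to show the ambiguity.
    """
    return host.replace(".", "-") + "." + PROXY_SUFFIX


def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified RFC 6125 check: '*' may only be the whole left-most
    label, and it matches exactly one label."""
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] != "*" and p_labels[0] != h_labels[0]:
        return False
    return p_labels[1:] == h_labels[1:]


if __name__ == "__main__":
    cert = "*." + PROXY_SUFFIX
    print(wildcard_matches(cert, dotted_name("www.somedb.com")))      # too many labels
    print(wildcard_matches(cert, hyphenated_name("www.somedb.com")))  # one label
    # With the naive mapping, two different origins get the same proxy name:
    print(hyphenated_name("www.somedb.com") == hyphenated_name("www-somedb.com"))
```

Under the naive mapping, www.somedb.com and www-somedb.com end up at the same proxy hostname, so a real implementation needs either an escaping rule for literal hyphens or a lookup table back to the original host.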

-- 
Andrew Anderson, Director of Development, Library and Information Resources Network, Inc.
http://www.lirn.net/ | http://www.twitter.com/LIRNnotes | http://www.facebook.com/LIRNnotes

On Dec 18, 2014, at 17:04, Stuart A. Yeates <[log in to unmask]> wrote:

> It appears that the core of my problem was that I was unaware of
> 
> Option HttpsHyphens / NoHttpsHyphens
> 
> which toggle between proxying on
> 
> https://www.somedb.com.ezproxy.yourlib.org
> 
> and
> 
> https://www-somedb-com.ezproxy.yourlib.org
> 
> and allow arbitrarily nested domains to be proxied using a simple
> wildcard cert by collapsing the dots in the hostname into hyphens.
> 
> The paranoid in me is screaming that there's an interesting brokenness
> in here when a separate hosted resource is at https://www-somedb.com/,
> but I'm trying to overlook that.
> 
> cheers
> stuart
> --
> ...let us be heard from red core to black sky
> 
> 
> On Mon, Dec 15, 2014 at 9:24 AM, Stuart A. Yeates <[log in to unmask]> wrote:
>> Some resources are available only via HTTPS. Previously we used a
>> wildcard certificate; I can't swear that it was ever tested as
>> working, but we weren't getting any complaints.
>> 
>> Recently browser security has been tightened and RFC 6125 has appeared
>> and been implemented, and proxying of HTTPS resources with a naive
>> wildcard cert no longer works (we're getting complaints and are able
>> to duplicate the issues).
>> 
>> At https://security.stackexchange.com/questions/10538/what-certificates-are-needed-for-multi-level-subdomains
>> there is an interesting solution with multiple wildcards in the same
>> cert:
>> 
>> foo.com
>> *.foo.com
>> *.*.foo.com
>> ...
>> 
>> There is also the possibility that we can just grep the logs for every
>> machine name ever accessed and generate a huge list.
>> 
>> Has anyone tried these options? Successes? Failures? Thoughts?
>> 
>> cheers
>> stuart
>> 
>> 
>> --
>> ...let us be heard from red core to black sky