Apologies for coming into this thread so late. I am one of the developers who did an initial technical review for our institution. I thought it might be a good idea to go into more detail about our findings.
>I think we need to clear (and careful) in this discussion about what user
>data we are discussing. With authentication being done by the library /
>university, Lean Library doesn’t actually have personally identifiable
>information (PII). While IP addresses can be traced, is that any more a
>concern than an user’s ISP tracking all of users traffic already, since
>Lean Library is only effective from off campus IP addresses?
While it's true that authentication occurs on library servers, my concern about PII stems from the fact that the plugin can send detailed patron browsing activity to Lean Library servers. This behavior appears to be enabled for about 100 of the roughly 170 institutions that subscribe to Lean Library. More troubling, the plugin appears to send browsing activity even when it reports itself as "inactive" because the patron is on campus.
I've copied a portion of my report below. It would be great if anyone who has done a similar review could confirm (or, even better, refute) some of the issues we came across. (I originally wrote this in markdown, so I apologize for the odd formatting.)
I used the Firefox Add-on debugger to observe network traffic and plugin activity. More info on using the add-on debugger can be found here:
### Desktop setup
- Browser: Firefox 61.0.2
- Operating System: macOS v10.13.6
- Lean Library Plugin: 2.8.1
## Lean Library API Endpoints
- Plugin communicates with lean-library through a handful of "endpoints"
- Base URL for requests is `https://app.leanlibrary.com/?r=api`
- All endpoints are served over HTTPS. However, they do not appear to be restricted or authenticated with a token or API key, so these calls can come from any source. For example, I used the "api/institutes" and "api/resourceDomains" endpoints to determine the IP ranges and database listings for a handful of institutions that I do not belong to.
- Note: this is not an exhaustive list, but rather some of the more pertinent endpoints that we came across.
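
To make the "no token or API key" point concrete, here is a minimal Python sketch of how a request URL for one of these endpoints might be built. The exact routing is an assumption on my part: I am guessing that the endpoint name is appended to the `r=api` route as `api/<name>`. `endpoint_url` is a hypothetical helper, not part of the plugin. Note that nothing in the URL carries a credential.

```python
from urllib.parse import urlencode

# Host taken from the base URL noted above.
BASE_URL = "https://app.leanlibrary.com/"


def endpoint_url(name: str) -> str:
    """Build a request URL for an endpoint such as "institutes".

    ASSUMPTION: the endpoint name is appended to the "r=api" route as
    "api/<name>". This is a guess based on the base URL, not documented.
    """
    # safe="/" keeps the slash in the route readable instead of "%2F".
    return BASE_URL + "?" + urlencode({"r": f"api/{name}"}, safe="/")


if __name__ == "__main__":
    # No token, API key, or session cookie is involved in building the URL.
    for name in ("institutes", "resourceDomains"):
        print(endpoint_url(name))
```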
### Endpoint Notes: /api/logAction
The client sends the user's current URL (hostname, path, and querystring) to the API server. This API call occurs whenever the user types a URL into the browser "address bar" or clicks on a link on a web page. The request payload also includes the UserAgent, Institution ID, and "Client ID" (I believe this uniquely identifies this plugin instance).
- NOTE: This behavior does not occur for every institution. In my testing, "logAction" calls only occur when the institution utilizes "IP range validation", which is about 100 of the 171 institutions that appear in the "Select Your Library" config screen (see endpoint notes for "/api/institutes" further down).
- This API call occurs any time the user clicks on a link or types an address into the browser's address bar. The request payload includes hostname, path, and querystring.
- URLs can often include sensitive, personally identifying information. Such information could be used by a bad actor to facilitate phishing attacks, among other things.
- This behavior occurs even when the plugin claims to be "inactive". For example, if I click on the Library Access button from an on-campus IP, the plugin opens a popup message which states "You are logged in on a campus network, so our extension is inactive. Keep calm and study on!"
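
To illustrate why sending the full URL matters, here is a small Python sketch that splits a URL into the three parts described above and shows what a querystring can reveal. The dictionary keys and the example URL are made up for illustration; I have not confirmed the actual field names in the "logAction" payload.

```python
from urllib.parse import parse_qs, urlsplit


def url_parts_logged(url: str) -> dict:
    """Split a URL into the three parts the report says are transmitted.

    The key names here are hypothetical, not the plugin's real field names.
    """
    parts = urlsplit(url)
    return {
        "hostname": parts.hostname,
        "path": parts.path,
        "querystring": parts.query,
    }


def query_params(url: str) -> dict:
    """Decode the querystring so any embedded personal data is visible."""
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    # Made-up example of a URL that carries personal data in its querystring.
    example = "https://example.org/account/reset?email=patron%40example.edu&token=abc123"
    print(url_parts_logged(example))
    print(query_params(example))
```

A password-reset or search URL like the one above is exactly the kind of address a patron might click on or paste into the address bar.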
### Endpoint Notes: /api/institutes
This API call returns a JSON object containing a list of "institutes" (i.e. libraries that subscribe to Lean Library) as well as their configuration information. Presumably the plugin makes this API call on startup to render the "Select Your Library" dropdown on the plugin configuration page.
- One of the fields, "enableIpRangeValidation", presumably indicates whether the plugin should attempt to determine if the user is "on campus". Notably, whenever this field is set to "true", the plugin utilizes the "logAction" api call to send patron browsing activity to Lean Library.
- The response payload appears to contain the configuration information for every library that subscribes to Lean Library, including the address of the institution's EZproxy server and a map of "on-campus" IP ranges.
- While this information is not necessarily private or sensitive, institutions might not be thrilled with the idea of it being publicly available.
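
For anyone who wants to reproduce the "about 100 of 171" count, here is a rough Python sketch of the tally. The only field name taken from the actual response is "enableIpRangeValidation"; the list-of-objects shape, the "name" key, and the sample data are placeholders I made up, so adjust them to whatever the real payload looks like.

```python
import json

# Made-up sample data. Only "enableIpRangeValidation" is a real field name;
# the overall structure and the "name" key are assumptions.
SAMPLE_RESPONSE = """
[
  {"name": "Example University A", "enableIpRangeValidation": true},
  {"name": "Example College B", "enableIpRangeValidation": false},
  {"name": "Example Institute C", "enableIpRangeValidation": "true"}
]
"""


def uses_ip_validation(institute: dict) -> bool:
    """Return True if the flag is set, whether it arrives as a boolean or a string."""
    return str(institute.get("enableIpRangeValidation", "")).lower() == "true"


def count_ip_validation(institutes: list) -> tuple:
    """Return (number with the flag enabled, total number of institutes)."""
    enabled = sum(1 for inst in institutes if uses_ip_validation(inst))
    return enabled, len(institutes)


if __name__ == "__main__":
    enabled, total = count_ip_validation(json.loads(SAMPLE_RESPONSE))
    print(f"{enabled} of {total} institutes have IP range validation enabled")
```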
### Endpoint Notes: /api/resourceDomains
Returns a list of databases available to a particular institution. Basically a list of "Starting Point URLs" in EZproxy-speak.
## Other Server notes
- OS: CentOS
- HTTP Server: Apache/2.2.15 [released: March 2010]
- App server: PHP/7.1.9 [7.1 series first released: December 2016]
Developer, ASU Library