I agree with all of the below; for quite some time, we were reduced to scraping our Serial Solutions journal title search results for use in our Bento, because SS did not have a search or discovery API. Fortunately, the SS page was semantically very simple, and SS hadn't changed the interface in literally 10-12 years, so it was as stable a choice as these things can be. It is a viable option given the right circumstances, and it has its own advantages, as noted below. However, response times for that page were highly variable -- but then again, APIs can have connectivity issues as well.

Steven Turner, MLIS
Manager, Web Technologies and Development, Assistant Professor
University Libraries
The University of Alabama <https://www.ua.edu/>
416 Gorgas Library | Box 870266, Tuscaloosa, AL 35487-0266
office 205-348-1638
steven.j.turner<mailto:[log in to unmask]>@ua.edu | http://www.lib.ua.edu/

On Nov 28, 2017, at 1:27 PM, Kyle Banerjee <[log in to unmask]<mailto:[log in to unmask]>> wrote:

Howdy Brad,

Jason hit the nail on the head. Scraping is what you're reduced to when APIs, extractions, DB calls, shipping drives, mounting data on shared infrastructure, and the like aren't viable options. Also, scraping sometimes gets you precombined or preprocessed data that would otherwise be a pain to generate.

I find your question interesting. I avoid scraping like the plague, as it gives me heartburn just thinking about it -- i.e., I'm much more inclined to figure out how not to use the method than how to use it. Having said that, I have personally used scraping to migrate ILS and digital collections data, identify corrupted digital assets on systems, verify embargo compliance, and generate ILL pull lists sorted in correct order with availability.
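A semantically simple results page like the one Steven describes can be scraped with nothing beyond the standard library. The sketch below is hypothetical -- the markup, class names, and URLs stand in for the real Serial Solutions page, which is not reproduced here -- but it shows the basic pattern of walking the HTML and keeping only the title links:

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a semantically simple
# journal-title results page; not the actual vendor HTML.
SAMPLE = """
<html><body>
  <div class="results">
    <div class="result"><a class="title" href="/j/1">Journal of Testing</a></div>
    <div class="result"><a class="title" href="/j/2">Archives Quarterly</a></div>
  </div>
</body></html>
"""

class TitleScraper(HTMLParser):
    """Collect (href, text) pairs for every anchor with class="title"."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._in_title = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "title":
            self._in_title = True
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_title:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_title:
            self.results.append((self._href, "".join(self._text).strip()))
            self._in_title = False

scraper = TitleScraper()
scraper.feed(SAMPLE)
```

When the page layout is this flat and this stable, the scraper stays small and survives for years; the fragility only shows up when the vendor finally redesigns.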
I expect to resort to scraping in the not-too-distant future to solve some consortial collection analysis problems that cannot be solved using the provided analytical tools and extracts. The Orbis Cascade Alliance used to rely on web scraping to support consortial borrowing among dozens of standalone systems that did not support NCIP.

In addition to the obvious stability issues, there are a number of other issues to be mindful of when scraping: the method may violate TOS, look like a DOS attack, be very slow and/or resource intensive, get mucked up by spider traps (including unintentional ones), and be much harder or easier depending on what headers you send. Before harvesting someone else's systems, be sure to call and make sure they're cool with it and that there's not some other undocumented mechanism that will serve you better.

Web scraping is all about parsing and cleaning, and the best methods and tools will vary with the specific application. As is the case with many "hacky" methods, it's fun to do despite its deficiencies. And it works better than one would think -- you'd be surprised how reliable a process that scrapes millions of pages can be if you set it up right.

kyle

On Tue, Nov 28, 2017 at 10:59 AM, Jason Bengtson <[log in to unmask]<mailto:[log in to unmask]>> wrote:

I use web scraping sometimes to extract data from systems that lack APIs. I'm doing this to get current library job openings from our University jobs application, for instance. I use the structure of their website in a way similar to an API query, scrape the results, and extract only what I need. I jokingly call it a FIFIO API (Fine, I'll Figure It Out). Obviously, such a tool is inherently unstable and has to be closely managed. When used with things like the jobs application, which maintains a relatively stable URI structure over time, however, it can be a pretty good tool when you have nothing else.
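Kyle's points about headers and not looking like a DOS attack can be sketched concretely. This is a minimal illustration, not anyone's production harvester: the User-Agent string, contact address, and URLs are made up, and the fetch function is injected so the pacing logic can be shown without touching the network:

```python
import time
import urllib.request

def build_request(url, user_agent="LibraryHarvester/0.1 (contact: admin@example.org)"):
    """Build a request that identifies the harvester honestly rather than
    masquerading as a browser; many servers respond differently (or block
    outright) depending on the User-Agent and Accept headers they see."""
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": user_agent,
            "Accept": "text/html",
        },
    )

def polite_fetch(urls, fetch, delay=1.0):
    """Fetch each URL with a fixed pause between requests so a large harvest
    cannot be mistaken for a denial-of-service attack. `fetch` is passed in
    (e.g. urllib.request.urlopen) rather than hard-coded."""
    pages = []
    for i, url in enumerate(urls):
        if i:
            time.sleep(delay)
        pages.append(fetch(build_request(url)))
    return pages
```

Throttling and honest headers cost almost nothing to add, and they are usually the difference between a harvest the host never notices and one that gets your IP range blocked.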
I also used screen scraping as part of a tool I built years ago to allow authorized staff to create announcements within a special LibGuide that they then pushed to the EZproxy login page. I wrote a book chapter on that one: "Leveraging LibGuides as an EZProxy Notifications Interface." Innovative LibGuides Applications: Real World Examples. New York: Rowman & Littlefield, 2016.

Best regards,

Jason Bengtson
http://www.jasonbengtson.com/

On Tue, Nov 28, 2017 at 12:24 PM, Brad Coffield <[log in to unmask]<mailto:[log in to unmask]>> wrote:

I think there are likely a lot of possibilities out there, and I was hoping to hear examples of web scraping for libraries. Your example might just inspire me or another reader to do something similar. At the very least, the ideas will be interesting!

Brad

--
Brad Coffield, MLIS
Assistant Information and Web Services Librarian
Saint Francis University
814-472-3315
[log in to unmask]
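The LibGuides-to-EZproxy pattern Jason describes -- scrape an announcement out of a guide page, re-emit it as a fragment the login page can include -- can be sketched in a few lines. Everything here is hypothetical: the container id, the output markup, and the assumption that the container holds no nested divs (which is the only reason a regex is defensible); real LibGuides markup and EZproxy configuration are more involved:

```python
import re

def extract_announcement(page_html):
    """Pull the contents of a hypothetical flat announcement container out
    of a scraped guide page. Returns None when no announcement is posted."""
    m = re.search(
        r'<div id="ezproxy-announcement">(.*?)</div>',
        page_html,
        re.DOTALL,
    )
    return m.group(1).strip() if m else None

def render_login_notice(text):
    """Wrap the scraped text in a fragment a login page template could
    include; the surrounding markup is illustrative, not EZproxy's own."""
    return '<div class="notice">%s</div>' % text
```

The appeal of the approach is that staff edit announcements in a tool they already know, and the scraper quietly bridges the gap to a system that has no editing interface of its own.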