I scrape data from a large book retailer we sometimes order from to make
our acquisitions workflow a bit easier. In particular, I ask people to make
wishlists and scrape those (I can do individual items too, but the large
retailer doesn't like that, even though we're trying to purchase from
them), check those titles against our holdings, then line them up with some
info in a spreadsheet for people. It is NOT the way I would suggest going
about things if you can help it: pages change frequently, large retailers
block IP addresses, large retailers time out a lot, and so on. But
sometimes you're stuck between a rock and a hard place.
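If it helps anyone, here's roughly the shape of that workflow as a minimal
Python sketch using requests and BeautifulSoup. The URL, the CSS selectors,
and the in_holdings() stub are all made up for illustration -- every
retailer's markup is different (and changes often), and the holdings check
depends entirely on your catalog.

import csv
import requests
from bs4 import BeautifulSoup

# Hypothetical wishlist URL -- substitute a real one.
WISHLIST_URL = "https://retailer.example.com/wishlist/ABC123"

def scrape_wishlist(url):
    """Return (title, isbn) pairs from a wishlist page (selectors are guesses)."""
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    items = []
    for node in soup.select("li.wishlist-item"):      # hypothetical selector
        title = node.select_one("a.item-title")       # hypothetical selector
        if title:
            items.append((title.get_text(strip=True),
                          node.get("data-isbn", "")))
    return items

def in_holdings(isbn):
    """Stub -- replace with a lookup against your ILS or holdings export."""
    return False

rows = [{"title": t, "isbn": i, "held": "yes" if in_holdings(i) else "no"}
        for t, i in scrape_wishlist(WISHLIST_URL)]

with open("wishlist_report.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "isbn", "held"])
    writer.writeheader()
    writer.writerows(rows)

From there the CSV drops straight into the spreadsheet people are already
working from.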
Not a library use case, but I also recently had to put together a scraper
while looking for daycares in Massachusetts, since the page listing
daycares was helpful but really, really clunky. It saved me quite a bit of
sanity.
On Wed, Nov 29, 2017 at 8:50 AM, Ross Singer <[log in to unmask]> wrote:
> Due to the absence of APIs, we have to scrape III WebBridge and EBSCO
> LinkSource link resolver results to determine electronic holdings for
> things.
>
> Neither of them makes it particularly easy, since they don't provide many
> semantic clues in the markup as to what you're looking at and there are all
> kinds of other conditions you have to account for (e.g. direct linking to
> certain sources, etc.).
>
> It's generally one of those things I avoid at all costs, since the pages
> you want/need to scrape tend to be the most frustrating to work with.
>
> -Ross.
>
> On Tue, Nov 28, 2017 at 1:26 PM Brad Coffield <[log in to unmask]> wrote:
>
> > I think there are likely a lot of possibilities out there and was hoping to
> > hear examples of web scraping for libraries. Your example might just
> > inspire me or another reader to do something similar. At the very least,
> > the ideas will be interesting!
> >
> > Brad
> >
> >
> > --
> > Brad Coffield, MLIS
> > Assistant Information and Web Services Librarian
> > Saint Francis University
> > 814-472-3315
> > [log in to unmask]
> >
>