Yes, I did ask, and ask, and ask, and waited for 2 months. There was
something political going on internally with that group that was well
beyond my pay grade.
I did explain the potential problems to my boss and she was providing cover.
I did it in batches, since Google Sheets limits the number of IMPORTXML
calls you can make in a 24-hour span, so I wasn't hammering anyone's web
server into oblivion.
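For anyone curious, here's a rough sketch of the same idea outside of Sheets: IMPORTXML(url, xpath) is just "fetch a page, apply an XPath, spill the matches into cells," and the batching is what keeps you under the quota (and off the server admin's radar). The HTML fragment, XPath, and function names below are all invented for illustration; a real catalog page would need its own XPath.

```python
# Sketch of the IMPORTXML(url, xpath) pattern in plain Python, with explicit
# batching/throttling. Uses only the standard library; the page markup and
# XPath are made up, not taken from any real catalog.
import time
import xml.etree.ElementTree as ET

# Stand-in for one fetched catalog page (IMPORTXML's first argument).
CATALOG_PAGE = """
<html>
  <body>
    <table>
      <tr class="holding"><td>QA76.9 .D3</td><td>Off-site storage</td></tr>
      <tr class="holding"><td>Z678 .L5</td><td>Central campus</td></tr>
    </table>
  </body>
</html>
"""

def scrape_holdings(html, xpath=".//tr[@class='holding']"):
    """Apply an XPath (IMPORTXML's second argument); return (call number, location) pairs."""
    root = ET.fromstring(html)
    return [(row[0].text, row[1].text) for row in root.findall(xpath)]

def scrape_in_batches(pages, per_batch=50, pause=1.0):
    """Pause between batches, the way a daily quota forces you to anyway."""
    results = []
    for i, page in enumerate(pages):
        if i and i % per_batch == 0:
            time.sleep(pause)  # be polite to the catalog's web server
        results.extend(scrape_holdings(page))
    return results
```

The win over doing it manually is the same as in Sheets: one XPath expression per column, applied to hundreds of pages keyed off the call numbers.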
It's funny: back in 2010-2011 I actually had to do a fair amount of work
to stop the old V1 LibGuides link checker from hammering my ILS offline.
On Tue, Nov 28, 2017 at 2:18 PM, Bill Dueber <[log in to unmask]> wrote:
> Brett, did you ask the folks at the Large University Library if they could
> set something up for you? I don't have a good sense of how other
> institutions deal with things like this.
>
> In any case, I know I'd much rather talk about setting up an API or a
> nightly dump or something rather than have my analytics (and bandwidth!)
> blown by a screen scraper. I might say "no," but at least it would be an
> informed "no" :-)
>
> On Tue, Nov 28, 2017 at 2:08 PM, Brett <[log in to unmask]> wrote:
>
> > I leveraged the IMPORTXML() and xpath features in Google Sheets to pull
> > information from a large university website to help create a set of
> > weeding lists for a branch campus. They needed extra details about what
> > was in off-site storage and what was held at the central campus library.
> >
> > This was very much like Jason's FIFO API: the central reporting group
> > had sent me a spreadsheet with horrible data that I would have had to
> > sort out almost completely manually, but the call numbers were pristine.
> > I used the call numbers as a key to query the catalog with limits for
> > each campus I needed to check, and then it dumped all of the necessary
> > content (holdings, dates, etc.) into the spreadsheet.
> >
> > I've also used Feed43 as a way to modify certain RSS feeds and scrape
> > websites to only display the content I want.
> >
> > Brett Williams
> >
> >
> > On Tue, Nov 28, 2017 at 1:24 PM, Brad Coffield <[log in to unmask]> wrote:
> >
> > > I think there are likely a lot of possibilities out there, and I was
> > > hoping to hear examples of web scraping for libraries. Your example
> > > might just inspire me or another reader to do something similar. At
> > > the very least, the ideas will be interesting!
> > >
> > > Brad
> > >
> > >
> > > --
> > > Brad Coffield, MLIS
> > > Assistant Information and Web Services Librarian
> > > Saint Francis University
> > > 814-472-3315
> > > [log in to unmask]
> > >
> >
>
>
>
> --
> Bill Dueber
> Library Systems Programmer
> University of Michigan Library
>