Hi Alex,
It might really depend on the kind of authentication used, but a number of years ago I had to do something similar for a site protected by university (CAS) authentication. If I recall correctly, I logged into the site with Firefox and then told wget to use Firefox's cookies, more or less like the "easy" version of the accepted answer here:

http://askubuntu.com/questions/161778/how-do-i-use-wget-curl-to-download-from-a-site-i-am-logged-into
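Something along these lines, assuming you've already exported the logged-in Firefox session's cookies to a cookies.txt file (there are browser add-ons that do this) and substituting your own URL for the placeholder:

    # Point wget at the exported cookies so the request reuses your
    # authenticated browser session, and grab each protected page plus
    # the images/CSS it needs for offline viewing.
    wget --load-cookies cookies.txt \
         --page-requisites --convert-links --adjust-extension \
         https://protected.example.edu/some/protected/page

You could do the same with a list of URLs (wget -i urls.txt) if it's only a handful of selected pages rather than a whole site.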

Mike

Mike Hagedon | Team Lead for Software & Web Development (Dev) | Technology Strategy & Services | University of Arizona Libraries


-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Alex Armstrong
Sent: Friday, January 13, 2017 12:42 AM
To: [log in to unmask]
Subject: [CODE4LIB] How to archive selected pages from a site requiring authentication

Has anyone had to archive selected pages from a login-protected site? How did you do it?

I've used the CLI tool httrack in the past for archiving sites. But in this case, accessing the pages requires logging in. There's some vague documentation about how to do this with httrack, but I haven't cracked it yet. (The instructions are better for the Windows version of the application, but I only have ready access to a Mac.)

Before I go on a wild goose chase, any help would be much appreciated.

Alex

--
Alex Armstrong
Web Developer & Digital Strategist, AMICAL Consortium [log in to unmask]