Hi Mike & Tom,

I didn’t clarify in my original question that I’m looking to access a site that uses form-based authentication.

You’re both pointing me to the same approach, which is to provide cookies to a CLI tool. You suggest wget, I began by looking at httrack, and someone off-list suggested curl. All of these should work :)

I’ve been too swamped by other work to try this yet, but my next steps are clearer now. Thanks, folks!
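
For reference, a rough sketch of the cookie-based approach with curl for a form-based login (the login URL, form field names, and cookies.txt filename below are placeholders, not the actual site's):

    # Submit the login form and save the session cookie to a file (placeholder URL and fields)
    curl -c cookies.txt -d "username=me" -d "password=secret" https://example.org/login
    # Reuse the saved cookies to fetch a protected page
    curl -b cookies.txt -o page.html https://example.org/protected/page.html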

Alex

On 15 January 2017 at 01:49:20, Hagedon, Mike - (mhagedon) ([log in to unmask]) wrote:

Hi Alex,  
It might really depend on the kind of authentication used, but a number of years ago I had to do something similar for a site protected by university (CAS) authn. If I recall correctly, I logged into the site with Firefox, and then told wget to use the Firefox cookies. More or less like the "easy" version of the accepted answer here:

http://askubuntu.com/questions/161778/how-do-i-use-wget-curl-to-download-from-a-site-i-am-logged-into  
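(For illustration, that "easy" version amounts to something like the following, assuming the browser's cookies have been exported to a cookies.txt file; the URL is a placeholder:)

    # Crawl the protected pages using the cookies from the logged-in browser session
    wget --load-cookies cookies.txt --recursive --level=1 --page-requisites --convert-links https://example.org/protected/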

Mike  

Mike Hagedon | Team Lead for Software & Web Development (Dev) | Technology Strategy & Services | University of Arizona Libraries  


-----Original Message-----  
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Alex Armstrong  
Sent: Friday, January 13, 2017 12:42 AM  
To: [log in to unmask]  
Subject: [CODE4LIB] How to archive selected pages from a site requiring authentication  

Has anyone had to archive selected pages from a login-protected site? How did you do it?  

I've used the CLI tool httrack in the past for archiving sites. But in this case, accessing the pages requires logging in. There's some vague documentation about how to do this with httrack, but I haven't cracked it yet. (The instructions are better for the Windows version of the application, but I only have ready access to a Mac.)

Before I go on a wild goose chase, any help would be much appreciated.  

Alex  

--  
Alex Armstrong  
Web Developer & Digital Strategist, AMICAL Consortium [log in to unmask]