https://archive-it.org/, the subscription service of https://archive.org/, handles login-protected sites. We've found them to be very helpful and the software to just work, but we've never done any password-protected sites.

cheers
stuart
--
...let us be heard from red core to black sky

On Thu, Jan 19, 2017 at 5:54 AM, Nicholas Taylor <[log in to unmask]> wrote:

> Hi Alex,
>
> If you don't mind having your data in WARC format, you could use:
> * The Webrecorder web service (https://webrecorder.io/), which records the
> pages you browse into an archive. It works well if you only have a small
> number of pages to archive, and has the advantage that it can archive
> whatever you can access via your browser. Just make sure to set the
> collection to private and/or download and delete it once completed.
> * The Heritrix archival crawler, which supports HTTP authentication
> (https://webarchive.jira.com/wiki/display/Heritrix/Credentials), much like
> HTTrack or wget, with the added advantage of storing the files in WARC.
>
> ~Nicholas
>
> -----Original Message-----
> From: Alex Armstrong [mailto:[log in to unmask]]
> Sent: Tuesday, January 17, 2017 7:09 AM
> Subject: Re: How to archive selected pages from a site requiring
> authentication
>
> Hi Mike & Tom,
>
> I didn’t clarify in my original question that I’m looking to access a site
> that uses form-based authentication.
>
> You’re both pointing me to the same approach, which is to provide cookies
> to a CLI tool. You suggest wget, I began by looking at httrack, and
> someone off-list suggested curl. All of these should work :)
>
> I’ve been too swamped by other work to try this, but my next steps are
> surer now. Thanks, folks!
>
> Alex
>
> On 15 January 2017 at 01:49:20, Hagedon, Mike - (mhagedon) (
> [log in to unmask]) wrote:
>
> Hi Alex,
> It might really depend on the kind of authentication used, but a number of
> years ago I had to do something similar for a site protected by university
> (CAS) authn. If I recall correctly, I logged into the site with Firefox
> and then told wget to use Firefox's cookies, more or less like the "easy"
> version of the accepted answer here:
>
> http://askubuntu.com/questions/161778/how-do-i-use-wget-curl-to-download-from-a-site-i-am-logged-into
>
> Mike
>
> Mike Hagedon | Team Lead for Software & Web Development (Dev) | Technology
> Strategy & Services | University of Arizona Libraries
>
>
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Alex Armstrong
> Sent: Friday, January 13, 2017 12:42 AM
> To: [log in to unmask]
> Subject: [CODE4LIB] How to archive selected pages from a site requiring
> authentication
>
> Has anyone had to archive selected pages from a login-protected site? How
> did you do it?
>
> I've used the CLI tool httrack in the past for archiving sites. But in
> this case, accessing the pages requires logging in. There's some vague
> documentation about how to do this with httrack, but I haven't cracked it
> yet. (The instructions are better for the Windows version of the
> application, but I only have ready access to a Mac.)
>
> Before I go on a wild goose chase, any help would be much appreciated.
>
> Alex
>
> --
> Alex Armstrong
> Web Developer & Digital Strategist, AMICAL Consortium
> [log in to unmask]
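The cookie-replay approach the thread converges on can be sketched like this. Everything here is illustrative: the domain, cookie name, and value are made up, and in practice you'd export cookies.txt from a real browser session (as in the askubuntu answer Mike linked) or obtain it with `curl --cookie-jar` against the site's actual login form, rather than writing it by hand.

```shell
# Step 1: obtain a logged-in session cookie. The easiest route is to
# log in with your browser and export its cookies to cookies.txt in
# Netscape format (tab-separated: domain, include-subdomains flag,
# path, secure flag, expiry, name, value). For illustration we write
# one such file by hand with a fabricated session cookie.
printf '# Netscape HTTP Cookie File\n' > cookies.txt
printf '.example.org\tTRUE\t/\tTRUE\t2147483647\tsessionid\tabc123\n' >> cookies.txt

# Step 2: replay the cookie while mirroring the protected pages.
# wget reads the Netscape format directly; --page-requisites pulls in
# CSS/images and --convert-links rewrites links for offline browsing.
wget --load-cookies cookies.txt \
     --page-requisites --convert-links --no-parent \
     https://example.org/protected/ || true  # hypothetical URL; fails offline
```

curl can replay the same file with `curl --cookie cookies.txt <url>` if you only need a handful of individual pages rather than a mirror.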