LISTSERV 16.5 - CODE4LIB Archives

Hi Eric,

I have created static versions of several WordPress sites. Here's a link to
one of the sites:

http://futureofthebook.org/occurrence/

As you will see, some of the functionality is lost, such as the search and
commenting features. But the content is preserved, and now I don't have to
maintain WordPress for this site (whose need for interactivity is long
past).

Here is the wget command I used:

wget \
     --recursive \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --restrict-file-names=windows \
     --include /occurrence \
     --no-parent \
     --domains www.futureofthebook.org \
     http://www.futureofthebook.org/occurrence/

I'm not certain that I needed all of these switches, but some of them were
necessary.
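
For anyone unsure which switches matter, here is a minimal sketch that
spells out the same options one per line, with a note on what each does.
It only prints the command; it does not run it. print_mirror_cmd is a
hypothetical helper name, not part of wget. The sketch assumes a wget
recent enough to know --adjust-extension (the current name for
--html-extension) and uses --include-directories, the full name of the
option abbreviated as --include above.

```shell
#!/bin/sh
# Sketch: print (do not run) a wget command that mirrors one directory of a site.
# print_mirror_cmd is a hypothetical helper, not something wget provides.
print_mirror_cmd() {
    host=$1   # site host name, e.g. www.futureofthebook.org
    dir=$2    # directory to mirror, leading slash, no trailing slash

    set -- wget
    set -- "$@" --recursive                     # follow links found in each page
    set -- "$@" --no-clobber                    # skip files that already exist locally
    set -- "$@" --page-requisites               # also fetch images, CSS, and scripts
    set -- "$@" --adjust-extension              # save HTML pages with an .html suffix
    set -- "$@" --convert-links                 # rewrite links to point at local copies
    set -- "$@" --restrict-file-names=windows   # escape characters invalid on Windows
    set -- "$@" "--include-directories=${dir}"  # only descend into this directory
    set -- "$@" --no-parent                     # never climb above the start directory
    set -- "$@" "--domains=${host}"             # stay on this host
    set -- "$@" "http://${host}${dir}/"         # start URL
    echo "$*"
}

print_mirror_cmd www.futureofthebook.org /occurrence
```

Printing first makes it easy to review the line, drop a switch, and see
what changes before pasting it into a shell.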

After I did the wget, I put the set of files into a new location and then
tested, tested, tested. Some links didn't work properly, and so I had to do
some manual work to get a fully functioning site. Nothing is perfect.
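
Part of that testing can be roughed out with a script. With
--convert-links, wget rewrites links to files it downloaded as relative
links and leaves links to files it did not download as absolute URLs, so
absolute URLs that still name the original host are a reasonable place to
start looking. The sketch below lists them. list_leftover_links is a
hypothetical helper name, and the check assumes the mirrored pages end in
.html; it will not catch resources requested by JavaScript at run time.

```shell
#!/bin/sh
# Sketch: list lines in mirrored .html files that still point at the live host.
# list_leftover_links is a hypothetical helper name.
# Usage: list_leftover_links MIRROR_DIR ORIGINAL_HOST
list_leftover_links() {
    # -F treats the pattern as a fixed string; "//host/" matches http, https,
    # and protocol-relative URLs. /dev/null forces grep to print file names
    # even when find passes it a single file.
    find "$1" -type f -name '*.html' \
        -exec grep -n -F "//$2/" /dev/null {} + || true
}

# Example (paths are placeholders):
#   list_leftover_links ./www.futureofthebook.org www.futureofthebook.org
```

Each output line has the form file:line-number:text, which gives a list
of places to fix by hand.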

Once I had everything working the way I wanted, I pointed my Web server to
the new location of the site, backed up my WordPress database and files,
and saved everything as a tar file, just in case.
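
The backup step might look something like the sketch below. It is not the
exact procedure used here: make_backup is a hypothetical helper name, the
mysqldump line uses placeholder user and database names and assumes a
MySQL-backed WordPress install, and tar is assumed to support -z for gzip
compression.

```shell
#!/bin/sh
# Step 1 (not run here): dump the WordPress database to a file.
# WP_USER and WP_DATABASE are placeholders; --password makes mysqldump prompt.
#   mysqldump --user=WP_USER --password WP_DATABASE > wordpress-db.sql

# Step 2: bundle the WordPress directory and the dump into one tar file.
# make_backup is a hypothetical helper name.
# Usage: make_backup WORDPRESS_DIR DB_DUMP_FILE ARCHIVE
make_backup() {
    wp_dir=$1     # WordPress install directory
    db_dump=$2    # SQL dump produced in step 1
    archive=$3    # output file, e.g. occurrence-backup.tar.gz

    tar -czf "$archive" "$wp_dir" "$db_dump" || return 1
    # Reading the archive back is a cheap check that it is not truncated.
    tar -tzf "$archive" > /dev/null
}

# Example (paths are placeholders):
#   make_backup ./occurrence ./wordpress-db.sql ./occurrence-backup.tar.gz
```

Write the archive somewhere outside the directory being archived so that
tar does not try to include its own output.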

Good luck!

Best wishes,

Carol

On Mon, Oct 6, 2014 at 2:44 AM, Eric Phetteplace wrote:

> Hey C4L,
>
> If I wanted to archive a WordPress site, how would I do so?
>
> More elaborate: our library recently got a "donation" of a remote WordPress
> site, sitting one directory below the root of a domain. I can tell from a
> cursory look it's a WordPress site. We've never archived a website before
> and I don't need to do anything fancy, just download a workable copy as it
> presently exists. I've heard this can be as simple as:
>
> wget -m $PATH_TO_SITE_ROOT
>
> but that's not working as planned. Wget's convert links feature doesn't
> seem to be quite so simple; if I download the site, disable my network
> connection, then host locally, some 20 resources aren't available. Mostly
> images which are under the same directory. Possibly loaded via AJAX.
> Advice?
>
> (Anticipated) pertinent advice: I shouldn't be doing this at all, we should
> outsource to Archive-It or similar, who actually know what they're doing.
> Yes/no?
>
> Best,
> Eric
>



-- 
Carol Kassel
NYU Digital Library Technology Services
(212) 992-9246
dlib.nyu.edu