One thing to be careful of when web harvesting a wiki is that it may harvest more than you bargained for.

Most wikis (I don't know JSPWiki, sorry) can present earlier versions of pages, diffs, indexes, and sometimes the same pages under different URLs. This may or may not be what you want.
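
If those extra URLs follow a recognizable pattern, a reasonably recent wget (1.14 or newer) can be told to skip them with --reject-regex. This is only a rough sketch; the Diff.jsp and PageInfo.jsp patterns are a guess at how JSPWiki names its diff/history pages, so substitute whatever URLs your own wiki actually uses:

$ wget -r -k --html-extension --reject-regex '(Diff|PageInfo|Edit)\.jsp' <wiki-url>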

For taking a snapshot of the current versions of all pages on a wiki, I have had luck using an index page (that is, a list of all pages in the wiki, not index.html) and harvesting that page with a recursion depth of one. This may or may not help you :-)

$ wget --html-extension -r -k -l1 <wiki-index-page-url>
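
In case the flags are unfamiliar (this is GNU Wget; newer releases spell --html-extension as --adjust-extension):

   --html-extension   save pages with an .html suffix so a browser opens them locally
   -r                 recursive retrieval
   -k                 convert links so the mirrored copy works offline
   -l1                limit recursion to one level: the index page plus everything it links to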

Best of luck,
  Kåre

> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On
> Behalf Of Tom Keays
> Sent: Wednesday, May 23, 2012 3:27 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] archiving a wiki
> 
> I haven't tried it on a wiki, but the command-line Unix utility wget can
> be used to mirror a website.
> 
> http://www.gnu.org/software/wget/manual/html_node/Advanced-Usage.html
> 
> I usually call it like this:
> 
> wget -m -p http://www.site.com/
> 
> common flags:
>    -m = mirroring on/off
>    -p = page_requisites on/off
>    -c = continue - when download is interrupted
>    -l5 = reclevel - Recursion level (depth) default = 5
> 
> On Tue, May 22, 2012 at 5:04 PM, Carol Hassler <[log in to unmask]> wrote:
> 
> > My organization would like to archive/export our internal wiki in some
> > kind of end-user friendly format. The concept is to copy the wiki
> > contents annually to a format that can be used on any standard computer
> > in case of an emergency (i.e. saved as an HTML web-style archive, saved
> > as PDF files, saved as Word files).
> >
> > Another way to put it is that we are looking for a way to export the
> > contents of the wiki into a printer-friendly format - to a document that
> > maintains some organization and formatting and can be used on any
> > standard computer.
> >
> > Is anybody aware of a tool out there that would allow for this sort of
> > automated, multi-page export? Our wiki is large and we would prefer not
> > to do this type of backup one page at a time. We are using JSPwiki, but
> > I'm open to any option you think might work. Could any of the web
> > harvesting products be adapted to do the job? Has anyone else backed up
> > a wiki to an alternate format?
> >
> > Thanks!
> >
> >
> > Carol Hassler
> > Webmaster / Cataloger
> > Wisconsin State Law Library
> > (608) 261-7558
> > http://wilawlibrary.gov/
> >
> >