Thanks Kurt,

Some of those pre-built utilities look interesting although they don't
seem to solve my immediate problem. However, they should prove useful
in the future.

Edward

On Wed, Nov 23, 2011 at 9:40 PM, Nordstrom, Kurt <[log in to unmask]> wrote:
> Hi Edward,
>
> We're currently using the warc-tools library for WARC creation. It's written in Python, but there are a few pre-built utilities that come with the package that might suit your needs?
>
> http://code.hanzoarchives.com/warc-tools
>
> -Kurt
> ________________________________________
> From: Code for Libraries [[log in to unmask]] on behalf of Edward M. Corrado [[log in to unmask]]
> Sent: Wednesday, November 23, 2011 5:30 PM
> To: [log in to unmask]
> Subject: [CODE4LIB] Web archiving and WARC
>
> Hello All,
>
> I need to harvest a few Web sites in order to preserve them. I'd
> really like to preserve them using the WARC file format [1] since it
> is a standard for digital preservation. I looked at Web
> Curator Tool (WCT) and Heritrix and they seem to be good at what they
> do but are built to work on a much larger scale than what I'd like to
> do -- and that comes with a cost of increased complexity. Tools like
> wget are simple to use and can easily be scripted to accomplish my
> limited task, except the standard wget and similar tools I am familiar
> with do not support WARC. Also, I haven't been able to find a tool
> that can convert zipped files created with wget to WARC.
>
> I did find a version of wget with warc support built in [1] from the
> Archive Team so that may be my solution, but compiling software with
> "dirty" written into the name of the zip file is maybe not the best
> long-term solution. Does anyone know of any other simple tools to
> create a WARC file (either from harvesting or converting a wget or
> similar mirror/archive)?
>
> Edward
>
> [1] http://archiveteam.org/index.php?title=Wget_with_WARC_output
>