Hi Demian,
Any chance that site was crawled/archived by archive.org? It's not perfect,
but if there's a current-ish version in the Wayback Machine, that could work.
I have also had some success with SiteSucker
(https://ricks-apps.com/osx/sitesucker/), but if the site's no longer up
that may be moot.
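
In case it helps, here is a rough Python sketch of the Wayback check. It
assumes the public availability endpoint at
https://archive.org/wayback/available and a JSON response shaped like
{"archived_snapshots": {"closest": {...}}}. The function names are just
illustrative, so treat it as a starting point rather than a finished tool.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the Wayback Machine availability API.
API = "https://archive.org/wayback/available"


def build_query(site_url, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD string."""
    params = {"url": site_url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urllib.parse.urlencode(params)


def closest_snapshot(payload):
    """Return (snapshot_url, timestamp) from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def lookup(site_url, timestamp=None):
    """Ask the API for the capture closest to timestamp (needs network)."""
    query = build_query(site_url, timestamp)
    with urllib.request.urlopen(query, timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Calling lookup() with the old site's hostname and a date such as "20200304"
should return the closest capture if one exists, or None if the site was
never archived.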
Good luck!
Derek

On Wed, Mar 4, 2020 at 10:37 AM Demian Katz <[log in to unmask]>
wrote:

> Hello, everyone –
>
> I’ve been struggling with a use case that feels like it can’t be unique to
> my situation. Wondering if anyone else has solved this!
>
> We’ve decommissioned an old dynamic site, and we still want to make the
> content available in a static form. It was a large and complex site with a
> lot of pages, and after trying a variety of solutions, we ended up
> harvesting it all into a WARC file. This is great for archival purposes,
> but we’re struggling with presentation.
>
> The problem with serving content from a WARC is that it seems to be
> unbearably slow in every solution we try. (And when I say unbearably, I
> mean “40 minutes to load one page using pywb” – not kidding).
>
> I assume that this slowness has to do with dynamically navigating around
> in a multi-gigabyte file to retrieve things… but really all we want to do
> is serve up static content.
>
> Is there some tool that can simply unpack a WARC into a directory of
> static files that can be navigated quickly? It seems like this should be
> possible, but I’m coming up empty in searching.
>
> And just to be clear: I understand that unpacking a WARC probably won’t
> retain all of the richness of detail that dynamic retrieval from the WARC
> can provide, and I certainly don’t plan to throw away the WARC… but for
> people who just want to quickly navigate content from the most
> recently-crawled version of the old site, I want a solution that will
> perform acceptably, and I haven’t found it yet.
>
> Thanks for any and all advice! 😊
>
> - Demian
>