Hi Demian,
What kind of dynamic site is it? I've decommissioned old Drupal and
WordPress sites by essentially running one big recursive wget crawl and
saving the result as a set of static HTML pages. It's not perfect, and it
depends on the site in question, but it has worked well enough. I wonder
if that could serve as a Plan B for you.
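
In case it helps, here is a minimal sketch of the kind of wget crawl
described above, wrapped in Python so the options are easy to tweak. It
assumes wget is installed and that the site (or a replay of it) is still
reachable over HTTP. The URL, the output directory, and the function
names are hypothetical, and the particular set of flags is one reasonable
choice rather than a record of what was actually run.

```python
import shutil
import subprocess
from typing import List


def build_wget_mirror_command(start_url: str, output_dir: str) -> List[str]:
    """Build a wget command that crawls a site into a tree of static files."""
    return [
        "wget",
        "--mirror",            # recursive crawl, unlimited depth, with timestamping
        "--page-requisites",   # also fetch the CSS, JS and images each page needs
        "--convert-links",     # rewrite links so they work from the local copy
        "--adjust-extension",  # save HTML pages with an .html suffix
        "--no-parent",         # never climb above the starting directory
        f"--directory-prefix={output_dir}",  # where the static copy is written
        start_url,
    ]


def mirror_site(start_url: str, output_dir: str) -> int:
    """Run the crawl and return wget's exit code."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget is not installed or not on PATH")
    command = build_wget_mirror_command(start_url, output_dir)
    return subprocess.run(command).returncode


if __name__ == "__main__":
    # Hypothetical URL and directory, for illustration only.
    # This only prints the command; call mirror_site() to actually crawl.
    print(" ".join(build_wget_mirror_command(
        "https://old-site.example.org/", "static-copy")))
```

The resulting directory can then be served by any plain web server. How
well this works depends on the site: pages that rely on search forms or
on heavy client-side scripting tend not to survive the conversion.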
Best wishes,
Carol
On Wed, Mar 4, 2020 at 10:37 AM Demian Katz wrote:
> Hello, everyone –
>
> I’ve been struggling with a use case that feels like it can’t be unique to
> my situation. Wondering if anyone else has solved this!
>
> We’ve decommissioned an old dynamic site, and we still want to make the
> content available in a static form. It was a large and complex site with a
> lot of pages, and after trying a variety of solutions, we ended up
> harvesting it all into a WARC file. This is great for archival purposes,
> but we’re struggling with presentation.
>
> The problem with serving content from a WARC is that it seems to be
> unbearably slow in every solution we try. (And when I say unbearably, I
> mean “40 minutes to load one page using pywb” – not kidding).
>
> I assume that this slowness has to do with dynamically navigating around
> in a multi-gigabyte file to retrieve things… but really all we want to do
> is serve up static content.
>
> Is there some tool that can simply unpack a WARC into a directory of
> static files that can be navigated quickly? It seems like this should be
> possible, but I’m coming up empty in searching.
>
> And just to be clear: I understand that unpacking a WARC probably won’t
> retain all of the richness of detail that dynamic retrieval from the WARC
> can provide, and I certainly don’t plan to throw away the WARC… but for
> people who just want to quickly navigate content from the most
> recently-crawled version of the old site, I want a solution that will
> perform acceptably, and I haven’t found it yet.
>
> Thanks for any and all advice! 😊
>
> - Demian
>
--
Carol Kassel
Senior Manager, Digital Library Infrastructure
NYU Digital Library Technology Services
she/her/hers
(212) 992-9246
dlib.nyu.edu