Hi all,
There's a new collection of Jupyter notebooks to help researchers use web
archives in the GLAM Workbench:
https://glam-workbench.github.io/web-archives/
There's a mix of examples, explorations, apps and tools. For example, there
are tools to find when a word or phrase appears on (or disappears from) a web
page, to compare the text content of a page over time, to create full-page
screenshots, and more. If you want to get deeper into the data, there's
detailed documentation and examples of the sorts of data that's available
and how you can get it. Web archives covered include the Australian Web Archive,
the NZ Web Archive, the UK Web Archive, and the Internet Archive, but many
of the notebooks could be modified to work with other Memento-compliant web
archives.
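As a rough illustration of the "when did a phrase appear or disappear" idea, here is a minimal sketch. It is not the notebooks' own code. It assumes you already have a page's captures as (timestamp, text) pairs sorted oldest first. It also assumes that a plain case-insensitive substring match is good enough. The function name `find_changes` and the sample data are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def find_changes(captures: Iterable[Tuple[str, str]], phrase: str) -> List[Tuple[str, str]]:
    """Report when `phrase` appears in or disappears from a page.

    Hypothetical helper: `captures` is assumed to be (timestamp, page_text)
    pairs sorted oldest first; matching is a simple case-insensitive
    substring test, not the method used in the notebooks.
    """
    needle = phrase.lower()
    events: List[Tuple[str, str]] = []
    previous: Optional[bool] = None  # presence in the prior capture; None before the first
    for timestamp, text in captures:
        present = needle in text.lower()
        if present and previous is not True:
            # Present now but absent (or not yet seen) before: first appearance or a return.
            events.append((timestamp, "appeared"))
        elif not present and previous is True:
            # Was present in the previous capture, gone now.
            events.append((timestamp, "disappeared"))
        previous = present
    return events


# Made-up captures using 14-digit, Wayback-style timestamps.
captures = [
    ("20100101000000", "Welcome"),
    ("20120101000000", "Welcome to the Web Archive"),
    ("20130101000000", "Welcome to the Web Archive"),
    ("20150101000000", "Home"),
]
print(find_changes(captures, "web archive"))
# [('20120101000000', 'appeared'), ('20150101000000', 'disappeared')]
```

Only changes between consecutive captures are reported. Gaps between captures therefore limit how precisely you can date an appearance.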
The focus is on data that is readily accessible and able to be used without
the need for special equipment. The notebooks use existing APIs to get data
in manageable chunks. But many of the examples demonstrated can also be
scaled up to build substantial datasets for analysis – you just have to be
patient!
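For the Memento-compliant archives mentioned above, one of those existing APIs is the TimeMap. A TimeMap lists every capture of a URL in the `application/link-format` described in RFC 7089. The sketch below parses a TimeMap body into (capture date, capture URL) pairs, oldest first. It is a sketch under stated assumptions, not code from the notebooks. It assumes attribute values are double-quoted, and the `archive.example` URLs in the sample are made up. In practice you would fetch the body with any HTTP client from the archive's TimeMap endpoint, and the exact URL pattern differs between archives.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import List, Tuple

# A link entry: <URL> followed by its ;-separated attributes, up to the next '<'.
# Splitting on commas would break, because the datetime values contain commas.
LINK_RE = re.compile(r"<([^>]+)>([^<]*)")
# Quoted attributes such as rel="memento" (assumption: values are always quoted).
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_timemap(body: str) -> List[Tuple[datetime, str]]:
    """Return (capture datetime, memento URL) pairs from a link-format TimeMap, oldest first."""
    mementos = []
    for url, attr_text in LINK_RE.findall(body):
        attrs = dict(ATTR_RE.findall(attr_text))
        # rel can hold several space-separated values, e.g. "first memento".
        if "memento" in attrs.get("rel", "").split() and "datetime" in attrs:
            # Memento datetimes use the HTTP date format, e.g. "Thu, 01 Jan 2015 00:00:00 GMT".
            mementos.append((parsedate_to_datetime(attrs["datetime"]), url))
    return sorted(mementos)


# Made-up TimeMap body; the "original" and "self" links are skipped.
sample = """<http://example.com/>; rel="original",
<https://archive.example/timemap/link/http://example.com/>; rel="self"; type="application/link-format",
<https://archive.example/web/20150101000000/http://example.com/>; rel="memento"; datetime="Thu, 01 Jan 2015 00:00:00 GMT",
<https://archive.example/web/20100601120000/http://example.com/>; rel="first memento"; datetime="Tue, 01 Jun 2010 12:00:00 GMT"
"""
for captured, url in parse_timemap(sample):
    print(captured.isoformat(), url)
```

A list of captures like this is a natural starting point for working through a page's history in manageable chunks. For example, you could fetch only the captures in a given year.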
The development of these notebooks was supported by the International
Internet Preservation Consortium's Discretionary Funding Programme 2019-2020,
with the participation of the British Library, the National Library of
Australia, and the National Library of New Zealand.
Cheers, Tim
--
Tim Sherratt
timsherratt.org
@wragge on Twitter