+1 to Alex's suggestion to use WARC for the preservation master and generate PDFs for access. While I agree with Kyle that it's ultimately the "content" that's important and that hypothetical researcher needs are inexhaustible, I do think there's an advantage to preserving web content in a web-native way.

Aside from verisimilitude, there's Memento (http://mementoweb.org/) to look ahead to: a mechanism for adding temporal navigation to the web through federated discovery of resources preserved in distributed web archives. Once it is implemented, data stored in WARC will be better integrated into the fabric of the web than PDFs siloed in an individual institutional repository.

I also wanted to mention (and encourage additions to!) the Wikipedia list of web archiving initiatives: http://en.wikipedia.org/wiki/List_of_Web_archiving_initiatives

It provides a good overview of many web archiving institutions' programs, data formats, technology stacks, and access provisions (including links to their Wayback implementations).

~Nicholas

--
Nicholas Taylor
Web Archiving Service Manager
Stanford University Libraries