On Tue, May 19, 2009 at 8:26 AM, Boheemen, Peter van wrote:
> Clever idea to put the TicToc stuff 'in the cloud'. How are you going to
> keep it up-to-date ?

By periodically reuploading the entire set (which takes about 15-20 mins), new or changed records can be updated. A changed record is one with a new RSS feed for the same ISSN + Title combination; the data is keyed by ISSN+Title. This process can be optimized by uploading only the delta (you upload .csv files, so the delta can be obtained easily via comm(1)).

Removing records is a bit of a hassle since GAE does not provide an easy-to-use interface for that. It's possible to wipe an entire table clean by repeatedly deleting 500 records at a time (the entire set is about 19,000 records), then doing a fresh import. This can be done by uploading a "console" application into the cloud (http://con.appspot.com/console/help/about).

Alternatively, smaller sets of records can be deleted via a "remove" handler, which I haven't implemented yet. A script will need to post the data to be removed against the handler. Will do that though if anybody uses it. User impact is low if old records aren't removed.

A possible alternative is to have the GAE app periodically verify the validity of each requested record with a server we'd have to run. (Pulling the data straight from tictocs.ac.uk doesn't work since it's larger than what you're allowed to fetch.) This approach would somewhat defeat the idea of the cloud since we'd have to rely on keeping that server operational, albeit at a lower degree of availability and load.

Another potential issue is the quota Google provides: you get 10GBytes and 1.3M requests free per 24 hour period, then they start charging you ($.12 per GByte).

I think I mentioned in my post that I included a non-GAE version of the server that only requires mod_wsgi.
For that standalone version, keeping the data set up to date is implemented by checking the last mod time of its local copy: it will reread its data when it detects a more recent jrss.txt in its current directory, so keeping its data up to date is as simple as periodically curling http://www.tictocs.ac.uk/text.php

 - Godmar
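
P.S. The reload-on-change check described above could look roughly like the sketch below. This is not the code from the actual server, just an illustration of the idea. The class name ReloadingTable, the parse parameter, and the treatment of jrss.txt as one record per line are all assumptions; the real file format may need a different parser.

```python
import os


class ReloadingTable:
    """Holds the parsed lines of a data file and rereads the file
    whenever its modification time is newer than the copy in memory.

    Hypothetical helper: the name and the line-per-record format are
    assumptions, not taken from the real server.
    """

    def __init__(self, path, parse=None):
        self.path = path
        # By default each non-empty line is kept as an opaque string.
        self.parse = parse or (lambda line: line)
        self.loaded_mtime = None  # mtime (ns) of the copy currently in memory
        self.records = []

    def get(self):
        """Return the records, rereading the file first if it changed."""
        mtime = os.stat(self.path).st_mtime_ns
        if self.loaded_mtime is None or mtime > self.loaded_mtime:
            with open(self.path, encoding="utf-8") as f:
                self.records = [
                    self.parse(line.rstrip("\n")) for line in f if line.strip()
                ]
            self.loaded_mtime = mtime
        return self.records
```

A request handler would call get() on something like ReloadingTable("jrss.txt") at the start of each lookup, so a freshly downloaded file is picked up on the next request without restarting the server. The cost is one stat call per request.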