> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Edward M. Corrado
> Sent: Tuesday, September 01, 2009 3:57 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] FW: PURL Server Update 2
> 
> This should be a lesson to each and every one of us who are in charge of
> maintaining systems: actually try to restore a system from your backups
> on a somewhat regular basis to ensure that you can get important
> systems up and running in a timely manner. From personal experience, I
> can tell you that you can learn a lot from these tests, and on at least
> one occasion they saved my ____ when I had a critical system fail.

I would be surprised if GPO wasn't backing up their data or even imaging
their server drives on a periodic basis.  However, in my experience with
operations folks over many years, it isn't as simple as grabbing a backup
tape, putting it in a drive, and copying the data.  Just copying the
data could take some time, depending on how large it is.
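To put a rough number on "could take some time": raw copy time is just data size divided by sustained throughput. This is a back-of-the-envelope Python sketch, not anything GPO uses; the sizes and rates are made-up assumptions, and real restores add tape seeks, verification, and rebuild work on top.

```python
def estimated_copy_hours(data_gb: float, throughput_mb_per_s: float) -> float:
    """Rough wall-clock hours to copy data_gb at a constant rate.

    Assumes decimal units (1 GB = 1000 MB) and a steady throughput;
    it ignores tape seek time, verification passes, and reinstall work.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = data_gb * 1000 / throughput_mb_per_s
    return seconds / 3600


# Hypothetical example: 720 GB restored at a sustained 100 MB/s.
print(estimated_copy_hours(720, 100))  # 2.0
```

And that is only the copy, before anyone works out what changed since the backup was taken.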

1) Data drive fails. Grab a new drive from stock, reformat the drive,
   grab the backup tape, reload the data, then figure out what was lost
   between the time it was backed up and the time it went down.  If it is
   a SQL database, can you recover from any intact journals?  If so, apply
   the journals.  And if this isn't the only service running on the
   server, you have to restore the data for those applications too.

2) OS drive fails. Grab a new drive from stock, reformat the drive, grab
   the backup tape or image and reload the OS, then figure out what
   security or application patches were applied between the time the 
   backup/image was taken and today.  Reinstall the missing patches.

3) Total hardware failure.  Do (1) and (2) above.  Your organization
   might keep pre-built spare servers around, but you still have to make
   sure that they are up-to-date with security and application patches
   and that they have the correct applications installed, and you need to
   change the IP address, the server name and other little details like that.

4) Server compromised.  Worst case scenario.  They need to preserve all
   the drives so they can analyze them and turn over information to the
   police.  They are not going to trust the backup/image since they don't
   know how long the server was compromised.  So they are most likely
   going to rebuild the server from scratch and ensure that it has *all*
   the latest security and application patches, in addition to doing (1).
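The "figure out what was lost" part of (1) amounts to point-in-time recovery. You restore the last backup and then replay only the journal entries written after that backup was taken. Here is a minimal Python sketch of the idea. It assumes a hypothetical journal of timestamped key/value changes (`JournalEntry` and `replay_journal` are made-up names). Real databases have their own log formats and recovery tools, so treat it as an illustration only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class JournalEntry:
    """Hypothetical journal record: one change to one key."""
    timestamp: datetime
    key: str
    value: Optional[str]  # None means the key was deleted


def replay_journal(snapshot: Dict[str, str],
                   backup_time: datetime,
                   journal: List[JournalEntry]) -> Dict[str, str]:
    """Apply journal entries newer than backup_time to a restored snapshot."""
    restored = dict(snapshot)  # leave the backup copy itself untouched
    for entry in sorted(journal, key=lambda e: e.timestamp):
        if entry.timestamp <= backup_time:
            continue  # already captured in the backup
        if entry.value is None:
            restored.pop(entry.key, None)
        else:
            restored[entry.key] = entry.value
    return restored


# Hypothetical usage: backup taken at 02:00, failure later that day.
backup_time = datetime(2009, 9, 1, 2, 0)
snapshot = {"a": "old", "c": "gone-later"}
journal = [
    JournalEntry(datetime(2009, 9, 1, 1, 0), "a", "old"),
    JournalEntry(datetime(2009, 9, 1, 10, 0), "a", "new"),
    JournalEntry(datetime(2009, 9, 1, 11, 0), "b", "added"),
    JournalEntry(datetime(2009, 9, 1, 12, 0), "c", None),
]
print(replay_journal(snapshot, backup_time, journal))
```

Anything written after the last intact journal entry is simply gone. In case (4) the snapshot itself can't be trusted, so this shortcut doesn't apply.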

We had a research server get compromised a few years ago and it took
several weeks to get it back online due to rebuilding it from scratch.
Nothing is as simple as it seems...


Andy.