On Sep 1, 2009, at 9:36 PM, Ross Singer <[log in to unmask]> wrote:
> On Tue, Sep 1, 2009 at 7:51 PM, Edward M. Corrado <[log in to unmask]> wrote:
>> Thus I have to believe them that they did not have a compromised
>> server and instead they had a hardware failure. I have no idea why
>> they couldn't just restore from backup, which would at least have gotten
>> them back to where they were from the last backup (which presumably
>> was at most a week ago; if not, someone should have a lot of
>> explaining to do to someone).
> I didn't want to join this speculation party, but here goes.
> It's quite possible that part of the problem here is that the
> "significant hardware failure" meant that the replacement was a
> completely different architecture (let's say for argument's sake that
> the server that failed was AS/400 and the replacement was Solaris on
> an Intel server) because IT policy (or, you know, reality) dictated
> that the old hardware would be replaced if it failed.
> So then we're not just talking about restoring from tape -- things
> need to be compiled -- there are perhaps problems with legacy C
> libraries, character sets, *whatever*.
> When I was working at Emory, we had a grant funded project that
> indexed a handful of collections of SGML EAD files in an app called
> iSearch (http://www.etymon.com/tr.html#). When the (admittedly
> neglected) VA Linux server it ran on had a major problem it was
> insanely non-trivial to get this completely orphaned application
> running in a contemporary operating system (in this case, RedHat).
> Old versions of iSearch /would not under any circumstances/ compile --
> new ones couldn't read the old data. The application was down for --
> I don't know -- months, IIRC. Granted, this was nowhere near the
> priority of GPO's PURL server -- but you can't stop time to solve
> these sorts of Catch-22s, either.
> Things happen. Catastrophes generally have the added advantage of
> ensuring they don't happen again for a while.
For argument's sake, let's say your speculation is correct. In that
case, this would be a huge management failure. If you are running a
major service, you need to plan for hardware failure. This isn't rocket
science; this is sysadmin 101. Hardware fails.