

For what it's worth, I share your feelings. Sometimes I would wander into the machine room (when I was still responsible for a machine room) and gaze in amazement that all of the blinking lights were blinking in just the right sequence to mean that all of the varied services used by our patrons were functioning as they should. It is really kinda majestic if you stop and think about it. 

Then I'd go back and hack on some really ugly code...


On Nov 9, 2011, at 17:07, "Yitzchak Schaffer" wrote:

> This is just a reflection on the earlier name resolution incident. I 
> find it remarkable how much goes into solving a problem, and the 
> corollary, how much impact a simple problem can have. Just my braindump 
> as a relatively novice sysadmin.
> Here's the chain of events:
> - This morning at 9am, our web server choked. I saw that Apache had 
> hit its MaxClients limit
> - After poking around the various daemons and looking at logs, I figured 
> out that everything was running correctly
> - I somehow narrowed it down to the script that pings the OCLC chat 
> availability service: it was waiting for 20+ seconds and finally timing 
> out, *despite* the fact that I thought it was set up with a 2-second 
> timeout (I don't remember how I got it down to that)
> - I shut that down temporarily and disabled our chat function, which got 
> the server back to normal.
> - I browsed the service manually, which worked, and tried two different 
> techniques in the PHP (file_get_contents() and curl), both of which failed.
> - I went to Brooklyn to do some vigilante digitization and have lunch 
> with my boss
> - I got back to the office, saw nothing had changed, and started digging 
> deeper into the curl request
> - I found the name resolution error, which blew my mind
> - I tried resolving the name multiple ways, and when that failed, came here
> Thanks to all who contributed ideas... amazing how one change to a 
> vendor DNS server can lead to our web server DoS'ing itself. More 
> networking knowledge... must get more networking knowledge...
> -- 
> Yitzchak Schaffer
> Systems Manager
> Touro College Libraries
> 212.742.8770 ext. 2432
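
One plausible reading of the chain of events above is that the 2-second timeout covered only the HTTP transfer, not the DNS lookup that happens before it, so a slow resolver could still hold a worker for 20+ seconds. That is an assumption, not something the thread confirms. As a rough sketch of one way to cap the whole step, here is a Python version (not the PHP the original script used) that runs the lookup in a worker thread and gives up after a deadline. The names call_with_deadline and resolve are made up for illustration.

```python
import concurrent.futures
import socket


def call_with_deadline(func, *args, deadline_seconds=2.0):
    """Run func(*args) in a worker thread.

    Returns the result, or None if the call raises OSError (which
    includes DNS failures) or does not finish within deadline_seconds.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return future.result(timeout=deadline_seconds)
    except (concurrent.futures.TimeoutError, OSError):
        return None
    finally:
        # Do not wait for a stuck lookup; the worker thread is left to
        # finish on its own in the background.
        executor.shutdown(wait=False)


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})
```

A caller would treat a None from call_with_deadline(resolve, hostname) as "chat unavailable" and skip the remote check instead of tying up a web server worker. Note that the abandoned lookup still runs to completion in its thread, so this bounds how long the caller waits, not the work the resolver does.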