This is just a reflection on the earlier name resolution incident. I 
find it remarkable how much goes into solving a problem and, as a 
corollary, how much impact a simple problem can have. Just my braindump 
as a relatively novice sysadmin.

Here's the chain of events:
- This morning at 9am, our web server choked. I saw that Apache had 
used up its MaxClients.
- After poking around the various daemons and looking at logs, I figured 
out that everything was running correctly.
- I somehow narrowed it down to the script that pings the OCLC chat 
availability service, which was waiting 20+ seconds and finally timing 
out, *despite* the fact that I thought it was set up with a 2-second 
timeout (I don't remember how I got it down to that).
- I shut that down temporarily and disabled our chat function, which got 
the server back to normal.
- I browsed the service manually, which worked, and tried two different 
techniques in the PHP (file_get_contents() and curl), both of which failed.
- I went to Brooklyn to do some vigilante digitization and have lunch 
with my boss.
- I got back to the office, saw nothing had changed, and started digging 
deeper into the curl request.
- I found the name resolution error, which blew my mind.
- I tried resolving multiple ways, and failing that, came here.
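
For anyone who hits the same thing: one plausible reason a 2-second 
timeout can turn into 20+ seconds is that the timeout only covers the 
connection or the read, not the DNS lookup that happens first. I have 
not confirmed that this is what happened in our PHP script. Below is a 
minimal sketch of the idea in Python rather than PHP, with the lookup 
run in a helper thread so the caller can give up after a hard deadline. 
The function name resolve_with_deadline and the resolver parameter are 
made up for illustration.

```python
import socket
import threading
import time


def resolve_with_deadline(host, seconds, resolver=socket.gethostbyname):
    """Return an IP address for host, or None if the lookup fails
    or does not finish within `seconds`.

    `resolver` is a hypothetical hook so the lookup can be swapped out;
    by default it is the blocking socket.gethostbyname call.
    """
    result = []

    def worker():
        try:
            result.append(resolver(host))
        except OSError:
            pass  # lookup failed; leave result empty

    # Daemon thread: a stuck lookup will not keep the process alive.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(seconds)  # wait at most `seconds` for the lookup
    return result[0] if result else None


if __name__ == "__main__":
    started = time.monotonic()
    address = resolve_with_deadline("www.example.com", 2.0)
    elapsed = time.monotonic() - started
    print("resolved to", address, "after", round(elapsed, 2), "seconds")
```

If this returns None, the caller can treat the chat service as 
unavailable right away instead of holding an Apache worker while the 
resolver retries. The abandoned lookup still runs in the background 
until the resolver gives up, so this caps the wait seen by the request, 
not the work done by the resolver.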

Thanks to all who contributed ideas... amazing how one change to a 
vendor DNS server can lead to our web server DoS'ing itself. More 
networking knowledge... must get more networking knowledge...

-- 
Yitzchak Schaffer
Systems Manager
Touro College Libraries
212.742.8770 ext. 2432
http://www.tourolib.org/
