On 10/16/07 7:22 AM, "Jeremy Frumkin" <[log in to unmask]>
spake:

> Hi Folks -
>
> Our apologies - code4lib.org is currently down due to a non-responding
> database cluster. We are aware of the problem and are working to resolve it.

And it gets better and better.

Higher than normal database traffic produced higher than normal amounts of
database logging, filling up our database transaction logging disk on both
nodes of our database cluster. A few milliseconds after that, MySQL
sputtered, coughed and cursed my name... and then hung waiting for me to
clean up the logs.

After some poking, prodding and cursing of my own, the cluster is back up and
running.

Today I am going to be working on a better log cleanup script that purges
any logs already relayed to the other cluster node(s) and already written to
tape. That should help prevent the problem in the future. And I'm working on
some finer-grained monitoring to detect the problem a little sooner so I can
clean it up before it gets to this point.
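
The cleanup rule and the early-warning check described above could be
sketched roughly as follows. This is only an illustration, not the actual
script: the function names, the `txlog.NNNNNN` file names, and the 80%
threshold are all assumptions, and how the "relayed" and "on tape" sets get
built (from the database and the backup system) is left out entirely.

```python
import shutil


def purgeable_logs(log_names, relayed, on_tape, keep_newest=1):
    """Return the log names that are safe to delete, oldest first.

    A log qualifies only if it has been relayed to the other node(s) AND
    written to tape. Assumes names sort in creation order (hypothetical
    zero-padded sequence numbers such as txlog.000001).
    """
    ordered = sorted(log_names)
    # Never touch the newest file(s): the server may still be writing to them.
    candidates = ordered[:-keep_newest] if keep_newest > 0 else ordered
    return [name for name in candidates if name in relayed and name in on_tape]


def disk_usage_fraction(path):
    """Fraction (0.0 to 1.0) of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def needs_attention(path, threshold=0.80):
    """True once the log disk is at or above the (assumed) warning threshold."""
    return disk_usage_fraction(path) >= threshold


if __name__ == "__main__":
    logs = ["txlog.000003", "txlog.000001", "txlog.000002", "txlog.000004"]
    relayed = {"txlog.000001", "txlog.000002", "txlog.000003"}
    on_tape = {"txlog.000001", "txlog.000003", "txlog.000004"}
    print(purgeable_logs(logs, relayed, on_tape))
    print(needs_attention("."))
```

Requiring both conditions means a log that is on tape but not yet relayed
(or the reverse) is left alone, and running the usage check from cron at a
short interval would give a warning well before the disk is actually full.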

Ryan

--
Ryan Ordway                          E-mail:   [log in to unmask]
Unix Systems Administrator             [log in to unmask]
OSU Libraries, Corvallis, OR 97370        Office: Valley Library #4657