monitoring cluster for failures
I am interested in monitoring a cluster running Linux for various failures
(hardware and software). I basically want to quantify the failures
(including device failures) that occur on the cluster over a period of a
month or so. For this purpose I need to periodically scan the
syslogd and klogd messages to detect the failures. The problem is
that the volume of messages is quite large and I am not sure exactly what
I am looking for. If people on the list could post some of the major
error/panic/warning messages that I should parse for (to achieve the
objective described above), I would be very grateful.
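
To make the question concrete, here is a rough sketch of the kind of scan I have in mind. The category names and the four patterns are only placeholder guesses on my part, not a vetted list (that list is exactly what I am asking for), and the log path passed to the hypothetical scan_file() helper would depend on how syslogd is configured on each node.

```python
import re
from collections import Counter

# Placeholder patterns only: assumptions for illustration, to be replaced
# with whatever messages turn out to matter on the actual cluster.
PATTERNS = {
    "kernel_panic": re.compile(r"kernel panic", re.IGNORECASE),
    "oops": re.compile(r"\bOops\b"),
    "io_error": re.compile(r"I/O error", re.IGNORECASE),
    "out_of_memory": re.compile(r"out of memory", re.IGNORECASE),
}


def count_failures(lines):
    """Return a Counter of how many log lines match each category."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_file(path):
    """Count matches in one log file (hypothetical helper; path is site-specific)."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, errors="replace") as log:
        return count_failures(log)
```

Running something like scan_file() from cron on each node and summing the counters over a month would presumably give the per-category totals I am after.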
Thanks in advance,
Re: monitoring cluster for failures
email@example.com (Pirabhu) wrote in message
> I am interested in monitoring