Minor page faults go through the roof
This is very curious. Five days ago, the number of minor page faults
on all but one of our servers started to climb in a linear
fashion. It isn't related to user load (linear increases seldom
are...) and we can't tie it to any system changes.
When I say "scale up", I mean last Thursday page faults were in the
hundreds per second and just now they average around 3000 per second.
This is happening on 21 servers that perform unrelated functions. Some
run Java applications, some are database servers, two are file servers,
and some only serve web traffic, but all of them are showing the same
linear increase in minor page faults.
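
For anyone who wants to compare numbers, here is a rough sketch of how a minor-fault rate like the one above could be sampled. It assumes a Linux host where /proc/vmstat exposes the pgfault (all faults) and pgmajfault (major faults only) counters; the function names and the 5-second interval are only illustrative, and this is not necessarily how our RRD collector gets its figures.

```python
import time


def parse_vmstat(text):
    """Turn the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def minor_faults(counters):
    # Assumption: pgfault counts every fault and pgmajfault only the major
    # ones, so the difference is the minor-fault count.
    return counters["pgfault"] - counters["pgmajfault"]


def minor_fault_rate(before, after, seconds):
    """Minor faults per second between two counter snapshots."""
    return (minor_faults(after) - minor_faults(before)) / seconds


def sample(path="/proc/vmstat"):
    with open(path) as f:
        return parse_vmstat(f.read())


if __name__ == "__main__":
    interval = 5.0  # seconds between samples (arbitrary choice)
    first = sample()
    time.sleep(interval)
    second = sample()
    rate = minor_fault_rate(first, second, interval)
    print(f"minor page faults/s: {rate:.1f}")
```

Run on each host at the same time of day, this gives a figure that can be compared across machines without relying on the graphing layer.
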
This has been accompanied by a (smaller) linear increase in context
switches and system calls. So far, we haven't seen any corresponding
increase in other types of I/O or CPU.
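
One way to narrow this down would be to see which processes are actually generating the faults. The sketch below assumes Linux, where field 10 of /proc/<pid>/stat is the per-process minor-fault counter (minflt); the helper names are hypothetical, and processes that start or exit between the two snapshots are simply ignored.

```python
import os
import time


def parse_minflt(stat_line):
    """Return (pid, comm, minflt) from one /proc/<pid>/stat line."""
    # comm may contain spaces or parentheses, so split on the LAST ')'.
    head, _, rest = stat_line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    fields = rest.split()
    # After comm: state ppid pgrp session tty_nr tpgid flags minflt ...
    return int(pid_str), comm, int(fields[7])


def snapshot(proc="/proc"):
    """Map pid -> (comm, minflt) for every process we can read."""
    result = {}
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                pid, comm, minflt = parse_minflt(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or line was unreadable
        result[pid] = (comm, minflt)
    return result


def top_faulters(before, after, n=10):
    """Largest minor-fault increases among pids present in both snapshots."""
    deltas = []
    for pid, (comm, minflt) in after.items():
        if pid in before:
            deltas.append((minflt - before[pid][1], pid, comm))
    deltas.sort(reverse=True)
    return deltas[:n]


if __name__ == "__main__":
    first = snapshot()
    time.sleep(10)  # arbitrary sampling window
    for delta, pid, comm in top_faulters(first, snapshot()):
        print(f"{delta:>10}  {pid:>7}  {comm}")
```

If the same process name tops the list on all 21 machines, that would point at a common agent or daemon rather than the individual applications.
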
RRD shows a ruler-straight line from last Thursday to now on most
machines. (It is more jagged on the busier machines.)
The only machine that has not exhibited this characteristic is our
proxy server, which coincidentally is the only machine outside the
network. This would seem to indicate some sort of network issue, but
our network admins can't find anything out of the ordinary. The
network is a flat mesh.
We share the same mesh with about 40 Windows servers, and we haven't
yet established what changes might have occurred in that environment.
But in the meantime, I wanted to ask if anyone else has seen this, and
if they know what caused it.