adventures in debottlenecking: disk / cpu / or ethernet?
I'm looking for comments that might help me rationalize the results of
some file-transfer tests on my Ethernet-bonded servers.
local-disk-to-local-disk using cp:  40 sec
local-disk-to-local-disk using dd:  23 sec
local-disk-to-local-disk using scp: 40 sec
local-disk-to-local-disk using scp to root@localhost explicitly
(presumably enables encryption and talks to the local ssh daemon)

scp server-to-server (mode 4 bonding, 802.3ad):                  90 sec
scp server-to-server (mode 6 bonding, alb = adaptive load bal.): 89 sec
scp server-to-server (single eth link):                          93 sec

[Note: multiple runs show some variability in timing; approx. +/- 5 sec.]
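For anyone who wants to reproduce the local-disk runs, here is a rough sketch. It uses a small generated file in place of the 4 GB image so it's cheap to rerun (scale count= up to match the real test); the paths under /tmp are just placeholders:

```shell
#!/bin/sh
# Stand-in payload for the 4 GB fc9 image (64 MiB here; use
# count=4096 for the real thing). Paths are hypothetical.
SRC=/tmp/xfer-src.img
DST=/tmp/xfer-dst.img
dd if=/dev/zero of="$SRC" bs=1M count=64 2>/dev/null

time cp "$SRC" "$DST"                         # plain copy
rm -f "$DST"
time dd if="$SRC" of="$DST" bs=1M 2>/dev/null # copy with explicit 1 MiB blocks
# dd's explicit block size is one likely reason it beats cp;
# try bs=64k vs bs=4M to see how much block size alone matters.
```

The scp runs are the same pattern with `time scp "$SRC" host:/path` substituted for the copy command.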
In all scp runs I monitored CPU usage on both the source and sink
machines: around 60% of a *single* CPU. Each machine has 8 CPUs. For
the scp transfer to root@localhost, two CPUs showed around 60% usage.
The test file was an fc9 image of around 4 GB.
These results confuse me totally. What is my bottleneck?
(Option a) Not CPU, since none of the tests max out my CPU usage.
(Option b) Not disk, since the cp between local disks is almost twice
as fast as the copies over the network.
This would logically imply the network is my bottleneck. But in that
case, killing one interface in my bond0 should degrade performance,
and it doesn't.
That leaves us with:
(Option c) the network was never fully loaded, so bonding-or-not did
not matter (but then the network is not the bottleneck -- a
contradiction!)
(Option d) my bonding of the eth interfaces is not operational at all!
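Option (d) at least is directly checkable: the Linux bonding driver exposes per-bond state under /proc/net/bonding/. A quick look (bond name bond0 assumed, as above):

```shell
#!/bin/sh
# Check whether the bond exists and which slaves are actually up.
# "bond0" is taken from the setup described above; adjust to taste.
BOND=/proc/net/bonding/bond0
if [ -r "$BOND" ]; then
    # Shows the bonding mode, 802.3ad aggregator info, and a
    # per-slave "MII Status: up/down" line for each member.
    grep -E 'Bonding Mode|Slave Interface|MII Status|Aggregator ID' "$BOND"
else
    echo "no bonding info at $BOND -- bonding driver not loaded?"
fi
```

If the file is missing, or a slave shows "MII Status: down", bonding was never in play and the single-link and bonded runs really were the same test.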
Any comments? Or holes in my analysis? Forget comparing modes 4 and 6
-- I can't even figure out whether bonding works at all! Any other
tests people might suggest would be greatly appreciated!
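One test worth adding: take scp (and its single encrypted TCP stream) out of the picture and measure raw network throughput with iperf. A caveat I'd hedge on but believe applies here: both 802.3ad and alb hash each flow onto one physical link, so a single TCP connection never uses more than one bond member -- you need parallel streams to see bonding help. The hostname below is a placeholder:

```shell
#!/bin/sh
# Raw TCP throughput, bypassing ssh encryption entirely.
# "sinkserver" is a placeholder hostname. iperf version 2 syntax
# (contemporary with fc9); iperf3 takes the same -s/-c/-P flags.
#
#   on the sink:    iperf -s
#   on the source:  iperf -c sinkserver -P 4
#
# -P 4 runs four parallel streams; if four streams beat one, the
# identical single-link vs. bonded scp times are explained by
# per-flow hashing, not by broken bonding.
if command -v iperf >/dev/null 2>&1 || command -v iperf3 >/dev/null 2>&1; then
    echo "iperf found -- run the server/client pair above"
else
    echo "iperf not installed (yum install iperf)"
fi
```

If iperf with parallel streams still tops out at one link's bandwidth, then option (d) is back on the table.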