Erratic network performance: Spin mutexes vs. Interrupts

I was recently investigating the cause of high variance in network performance between Logical Domains on a SunFire T2000. I was running the iperf benchmark from one LDom guest to two other LDom guests. The rig was configured like this:

# ldm ls
NAME         STATE   FLAGS   CONS  VCPU  MEMORY  UTIL  UPTIME
primary      active  -n-cv-  SP    8     4G      2.1%  1d 1h
oaf381-ld-1  active  -n----  5000  8     6G      13%   1m
oaf381-ld-2  active  sn----  5001  8     6G      0.0%  1m
oaf381-ld-3  active  sn----  5002  8     6G      0.0%  2m

Sometimes I would see throughput of up to 1360 Mb/s, but on other runs it would drop to as low as 870 Mb/s. Here's a graph of the benchmark results; as you can see, they are very erratic. (You may need to open it in a separate window if your browser scales it.)

Looking at the mpstat output, there seemed to be some sort of connection between a high spin mutex (smtx) count and performance, but it's hard to take in tens of mpstat outputs at once.
For example, here is mpstat output for a run with a result of 1318 Mb/s:

CPU minf mjf  xcal  intr  ithr   csw icsw  migr  smtx srw syscl usr sys  wt idl
  0    0   0  1490   313     1    66   11     2  4250   0     4   0 100   0   0
  1    0   0  1467   314     0    71    9     4  4678   0     4   0 100   0   0
  2    0   0   486  2207     4  3687    2  1277   187   0    34   0  24   0  76
  3    0   0   192  1048     2  1574    2   526   106   0    21   0  12   0  87
  4    0   0   627  3302     5  6008    9   657   163   0    36   0  30   0  70
  5    0   0   608  3134     6  5597   11   695   159   0    45   0  31   0  68
  6    0   0  3911  6130  4094  4590   31   663   222   0    62   0  44   0  56
  7    0   0  4462  6279  4205  4625   32   666   238   0    50   0  45   0  55

and here is mpstat output for a run with a result of 882 Mb/s:

CPU minf mjf  xcal  intr  ithr   csw icsw  migr  smtx srw syscl usr sys  wt idl
  0    0   0   666  5338     4  9695    8   523   318   0    29   0  34   0  66
  1    0   0   540  4593     6  8272    9   506   277   0    34   0  31   0  69
  2    0   0   405  3382     6  5795    9   448   202   0    43   0  28   0  72
  3    0   0  1644  2037   112  3208    5   338   124   0    48   0  25   0  75
  4    0   0    84   928     4  1283    1   402    82   0    30   0  14   0  86
  5    0   0    36   503     2   496    0   152    31   0    15   0   5   0  95
  6    0   0  6490  6540  6102    87   23     1  2197   0     5   0 100   0   0
  7    0   0  6485  6547  6107    92   23     2  2336   0     5   0 100   0   0

The best way I found to see the pattern was to graph it. For each benchmark run I found which CPU had the highest smtx count, and plotted that smtx value against the iperf result, using a different colour for each CPU. The graph is below and reveals an unusual pattern:
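Finding the highest smtx count per run is easy to script. Here is a minimal sketch, assuming one mpstat capture is saved per benchmark run as mpstat.N.out (the file names are my own convention; a driver loop that produces them is sketched in the Notes below). smtx is field 10 of mpstat's output:

for f in mpstat.*.out; do
    # Skip header lines (field 1 is "CPU"), then track the
    # largest smtx value (field 10) and the CPU it occurred on.
    awk '$1 ~ /^[0-9]+$/ && $10 > max { max = $10; cpu = $1 }
         END { printf "%s cpu=%s smtx=%s\n", FILENAME, cpu, max }' $f
done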
A few notes:
  • There appear to be four groupings of behaviour
  • If the highest smtx count is on CPU 6 or 7, the iperf result is low
  • If the highest smtx count is on CPU 1 or 2, the iperf result is high
  • The highest smtx count is never on CPU 3.
  • There is a range of results with very low smtx values, so there may be another variable in play as well.
Another data point is that in every run, the same two CPUs (6 and 7) handled the interrupts for the vnet device. Here is the intrstat output, which is consistent with the mpstat output above (note the ithr counts on CPUs 6 and 7):

      device |      cpu0 %tim      cpu1 %tim      cpu2 %tim      cpu3 %tim
-------------+------------------------------------------------------------
       vdc#0 |         0  0.0         0  0.0         0  0.0         4  0.0
      vnet#0 |         0  0.0         0  0.0         0  0.0         0  0.0
      vnet#1 |         0  0.0         0  0.0         0  0.0         0  0.0

      device |      cpu4 %tim      cpu5 %tim      cpu6 %tim      cpu7 %tim
-------------+------------------------------------------------------------
       vdc#0 |         0  0.0         0  0.0         0  0.0         0  0.0
      vnet#0 |         0  0.0         0  0.0         0  0.0         0  0.0
      vnet#1 |         0  0.0         0  0.0      3973  3.6      3980  3.6

So the first conclusion I could draw was that if the interrupt handling and whatever generates the spin mutexes land on the same two CPUs, then iperf performance is badly affected.
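One quick experiment this suggests (just a sketch of the idea; I'll cover workarounds properly in the follow-up) is to mark CPUs 6 and 7 as exempt from I/O interrupts with psradm, forcing the vnet interrupts onto other CPUs, and re-run the benchmark:

# Move I/O interrupt handling off CPUs 6 and 7, verify, then restore.
psradm -i 6 7
psrinfo          # CPUs 6 and 7 should now report "no-intr"
# ... re-run the iperf benchmark and compare ...
psradm -n 6 7    # return CPUs 6 and 7 to normal interrupt handling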
I will follow up this blog entry with more analysis and some workarounds.

Notes

I was running with iperf 2.0.4. oaf381-ld-1 drives the test as the iperf client (the traffic sender); oaf381-ld-2 and oaf381-ld-3 run iperf in server mode. On oaf381-ld-1 it is invoked as:

iperf204 -c 192.1.44.2 -f m -t 120 -N -l 1M -P 100 &
iperf204 -c 192.1.44.3 -f m -t 120 -N -l 1M -P 100 &

and on oaf381-ld-2 and oaf381-ld-3 as:

iperf204 -s -N -f m -l 1M
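To gather many runs, a simple driver loop on oaf381-ld-1 works. A minimal sketch, assuming the iperf servers are already running on the other two guests; the run count and the mpstat.N.out / iperf.N.* file names are my own choices for illustration, not part of the original rig:

#!/bin/sh
# Repeat the two-client iperf test, capturing mpstat alongside each run
# so per-run throughput can be matched against per-CPU statistics.
RUNS=50
i=1
while [ $i -le $RUNS ]; do
    mpstat 10 12 > mpstat.$i.out &    # 12 x 10s samples covers the 120s run
    iperf204 -c 192.1.44.2 -f m -t 120 -N -l 1M -P 100 > iperf.$i.a &
    iperf204 -c 192.1.44.3 -f m -t 120 -N -l 1M -P 100 > iperf.$i.b &
    wait                              # wait for both clients and mpstat
    i=`expr $i + 1`
done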
