The new "IBM Power Systems Performance Report POWER7, POWER6 and POWER5 results" holds an interesting piece of information. A recurring question from colleagues and admin friends is how LPARs affect performance. It looks like IBM needs LPARs to get some speed out of its larger systems.

Just a few examples: a 795 with 4.25 GHz and 64 cores, configured as 4 LPARs with 16 cores each, yields a relative performance (rPerf) of 926.28. The same system configured as a single LPAR with all 64 cores yields an rPerf of 777.09. So if you leave the scaling to the OS instead of dividing the machine into 4 smaller systems, you get just 83.89% of the performance. A 795 with 4.25 GHz and 128 cores, configured as 8 LPARs with 16 cores each, yields an rPerf of 1852.56. With 2 LPARs of 64 cores each you get 1554.18. Interestingly, that is 83.89% again.
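The percentages above are simple ratios of the quoted rPerf values. The following minimal sketch reproduces them. The data layout and the helper name `relative_performance` are only for illustration.

```python
# rPerf values quoted above for a Power 795 at 4.25 GHz.
# Each entry: (system size in cores, rPerf with 16-core LPARs, rPerf with 64-core LPARs)
MEASUREMENTS = [
    (64, 926.28, 777.09),     # 4 x 16-core LPARs vs. 1 x 64-core LPAR
    (128, 1852.56, 1554.18),  # 8 x 16-core LPARs vs. 2 x 64-core LPARs
]


def relative_performance(rperf_large_lpars: float, rperf_small_lpars: float) -> float:
    """rPerf of the 64-core-LPAR layout as a percentage of the 16-core-LPAR layout."""
    return 100.0 * rperf_large_lpars / rperf_small_lpars


for cores, rperf_16, rperf_64 in MEASUREMENTS:
    pct = relative_performance(rperf_64, rperf_16)
    print(f"{cores} cores: 64-core LPARs deliver {pct:.2f}% of the 16-core LPAR rPerf")
```

Both systems give the same ratio, because the 128-core totals are exactly twice the 64-core totals.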

At first I thought: "16 cores easily fit on a processor book (with 4 processors each), while a 64-core LPAR has to use two processor books. So when you use a configuration larger than a processor book, you leave 16.11% behind." But the same calculation with some other data showed otherwise: moving from 32-core to 64-core LPARs reduces the performance by just 5.7% and 5.4%, respectively. 32 cores fit on a processor book, too. So if crossing the book boundary were the cause, that loss should be similar to the 16-to-64-core case.
So my interpretation is a little bit different: the scalability of AIX seems to have a sweet spot between 16 and 32 cores. I briefly considered an intra-book bottleneck, but the CPUs on a book are fully meshed (1 hop from each CPU to every other), so I don't think that's the problem.
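The processor-book hypothesis can be checked with a few lines of arithmetic. This is a sketch under an assumption: 32 cores per book, as implied above (16 and 32 cores fit on one book, while 64 cores need two). The real figure depends on the chip configuration of the machine. The helper `books_spanned` is hypothetical.

```python
import math

# Assumption: 32 cores per processor book, as implied in the text.
# Adjust this for the actual chip configuration of your system.
CORES_PER_BOOK = 32


def books_spanned(lpar_cores: int, cores_per_book: int = CORES_PER_BOOK) -> int:
    """Minimum number of processor books an LPAR of this size must touch."""
    return math.ceil(lpar_cores / cores_per_book)


# Observed rPerf losses (percent) when moving to 64-core LPARs, as quoted above.
observed_loss = {16: [16.11], 32: [5.7, 5.4]}

for smaller, losses in observed_loss.items():
    crosses_boundary = books_spanned(smaller) < books_spanned(64)
    print(
        f"{smaller} -> 64 cores: crosses a book boundary: {crosses_boundary}, "
        f"observed loss: {', '.join(f'{x}%' for x in losses)}"
    )
```

Under this assumption, both transitions go from one book to two, yet the losses differ by roughly a factor of three. That is why the book boundary alone doesn't explain the numbers.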

When you look into this chart, you may find some interesting points. The light blue line is a hypothetical, perfectly scaling 4.0 GHz POWER7 in a 795. The data is based on an rPerf of 103.41 for an 8-core system (source: page 20 of the document). Look at the right side of the chart at 256 cores. You end up at 81.9% of the hypothetical performance when you use 64-core LPARs and at 86% when you use 32-core LPARs (interestingly, the gap between the two is pretty much the same as the 32-to-64-core difference computed before). With 64-core LPARs, each LPAR's load is distributed across 8 processors, thus just 2 processor books. Still, there is a serious impact of almost 20%. It will be interesting to dig further into this topic.
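The hypothetical line is plain linear extrapolation from the 8-core value. The following minimal sketch computes the ideal rPerf at 256 cores and what the quoted 86% and 81.9% correspond to. The names `BASE_RPERF` and `linear_rperf` are illustrative only.

```python
# Hypothetical perfectly linear scaling, anchored on the rPerf of 103.41
# for an 8-core 4.0 GHz POWER7 configuration quoted in the text.
BASE_RPERF = 103.41
BASE_CORES = 8


def linear_rperf(cores: int) -> float:
    """rPerf a perfectly scaling system would reach with the given core count."""
    return BASE_RPERF * cores / BASE_CORES


ideal_256 = linear_rperf(256)  # 32 x 103.41

# Fractions of the ideal at 256 cores, as quoted in the text.
for lpar_size, fraction in [(32, 0.86), (64, 0.819)]:
    print(
        f"{lpar_size}-core LPARs: {fraction:.1%} of {ideal_256:.2f} "
        f"= about {fraction * ideal_256:.0f} rPerf"
    )

# Relative gap between the two layouts (compare with the 5.4-5.7% computed earlier).
gap = 1 - 0.819 / 0.86
print(f"64-core vs. 32-core LPARs: {gap:.1%} lower")
```

The relative gap works out to a bit under 5%, in the same range as the 32-to-64-core losses computed earlier.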

However, it's important to know that the LPAR configuration limits each operating system instance to a 64-core SMP, no matter how many cores are in the system. So these numbers don't factor in the scalability challenges of AIX above 64 cores, as the OS did not have to scale beyond this point while these rPerf numbers were generated. The numbers for the configurations with large core counts are not single-OS-image numbers. Any further penalties from OS scaling would come on top. Furthermore, this benchmark is a pure CPU/memory benchmark: as IBM explains in its own description of the benchmark, there is no I/O and no networking involved.
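The 4.25 GHz numbers quoted earlier are consistent with this. Doubling the number of LPARs of a given size exactly doubles the total, so the rPerf per LPAR stays constant. That pattern suggests the totals are sums over independent OS images rather than measurements of one large image. The following sketch simply divides each quoted total by its LPAR count. The dictionary layout is illustrative only.

```python
# Totals quoted earlier for the 4.25 GHz configurations.
# Key: (number of LPARs, cores per LPAR) -> total rPerf
totals = {
    (4, 16): 926.28,
    (8, 16): 1852.56,
    (1, 64): 777.09,
    (2, 64): 1554.18,
}

# If each LPAR is an independent OS image, rPerf per LPAR should not
# depend on how many LPARs of that size the system holds.
for (n_lpars, lpar_cores), total in sorted(totals.items()):
    per_lpar = total / n_lpars
    print(f"{n_lpars} x {lpar_cores}-core LPARs: total {total:.2f}, per LPAR {per_lpar:.2f}")
```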

That said, a number of really interesting data points are missing from the PDF:
  • The rPerf number of a full-blown, unpartitioned system
  • The rPerf number of a full-blown system with just one LPAR the size of the complete system

Somehow I have the impression that IBM is hiding something. When you look into the mentioned PDF, there is a SPEC number for an unpartitioned system, but no rPerf number for it. The existence of the SPEC numbers suggests that they did indeed have an OS able to scale to 256 cores. On the other hand, there are rPerf numbers for the partitioned systems but no SPEC numbers. Just call it a presentiment...
