## Too perfect

Sometimes charts look too perfect to be measured. I had this feeling when I saw the rperf numbers of the 795 and put them into a chart. I'm a very visual person, so I put everything into a chart just to get a feeling for the numbers.

At first I thought I was being paranoid, but then my colleague Jan Brosowski mailed me that he had thought the same, although he approached the problem from the mathematical side. Okay ... that left me with a lot of questions, so I did some quick bull****-testing math on the data points.

### Some math

After reading his mail I wanted to run some tests of my own. So I did a short test with the numbers. First I put the data of the 3.7 GHz P7 into my favorite statistics program, R.
```
> fm <- lm(rperf ~ procs)
> fitted.values(fm)
      1       2       3       4       5       6       7       8 
 273.51  547.02  820.53 1094.04 1367.55 1641.06 1914.57 2188.08 
> residuals(fm)
            1             2             3             4             5 
-7.886829e-14 -1.045378e-14  6.540129e-14  6.414338e-14  3.446377e-14 
            6             7             8 
-2.363755e-14 -2.489546e-14 -2.615336e-14 
> coefficients(fm)
 (Intercept)        procs 
1.607775e-13 1.139625e+01 
> summary(fm)

Call:
lm(formula = rperf ~ procs)

Residuals:
       Min         1Q     Median         3Q        Max 
-7.887e-14 -2.521e-14 -1.705e-14  4.188e-14  6.540e-14 

Coefficients:
             Estimate Std. Error   t value Pr(>|t|)    
(Intercept) 1.608e-13  4.241e-14 3.791e+00  0.00906 ** 
procs       1.140e+01  3.499e-16 3.257e+16  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.442e-14 on 6 degrees of freedom
Multiple R-squared: 1,     Adjusted R-squared: 1 
F-statistic: 1.061e+33 on 1 and 6 DF,  p-value: < 2.2e-16
```
Strange ... the linear model yields coefficients that predict the rperf value per core with minimal residuals. And I learned long ago not to trust data with an R-squared of 1. Okay ... let's check the 4.0 GHz P7 rperf numbers:
```
> fm <- lm(rperf ~ procs)
> residuals(fm)
            1             2             3             4             5 
 1.627514e-13 -1.454981e-13  4.364426e-16 -4.677910e-15 -3.821397e-14 
            6             7             8 
-1.490662e-14 -1.052861e-13  1.453949e-13 
> fitted.values(fm)
      1       2       3       4       5       6       7       8 
 372.27  744.54 1116.81 1489.08 1861.35 2233.62 2605.89 2978.16 
> coefficients(fm)
  (Intercept)         procs 
-3.215549e-13  1.163344e+01 
```
Again ... minimal residuals. Okay ... one last check, for the 4.25 GHz procs.
```
> fm <- lm(rperf ~ procs)
> residuals(fm)
            1             2             3             4             5 
 3.163842e-03 -1.638418e-03 -1.242938e-03 -8.474576e-04 -4.519774e-04 
            6             7             8 
-5.649718e-05  3.389831e-04  7.344633e-04 
> fitted.values(fm)
        1         2         3         4         5         6         7 
 347.3568  463.1416  694.7112  926.2808 1157.8505 1389.4201 1620.9897 
        8 
1852.5593 
> coefficients(fm)
(Intercept)       procs 
0.002429379 14.473100282 
```
Sorry ... that's looking too perfect to me.
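You can redo this sanity check without R, too. A minimal sketch in Python: the rperf values below are the 3.7 GHz fitted values from above (they match the published numbers to within ~1e-14), and the core counts 24, 48, ..., 192 are my assumption, derived from dividing those fitted values by the per-core coefficient 11.39625.

```python
# Ordinary least-squares fit of rperf over core count, replicating the
# lm(rperf ~ procs) check in plain Python.

def ols(xs, ys):
    """Return (intercept, slope, r_squared) of a least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot

# Assumption: 24-core increments; rperf taken from the fitted values above.
procs = [24, 48, 72, 96, 120, 144, 168, 192]
rperf = [273.51, 547.02, 820.53, 1094.04, 1367.55, 1641.06, 1914.57, 2188.08]

intercept, slope, r2 = ols(procs, rperf)
print(intercept, slope, r2)  # intercept ~ 0, slope ~ 11.39625, R-squared ~ 1
```

The residuals here are pure floating-point noise, exactly as in the R output: the published numbers sit perfectly on a straight line.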

When you put the data of the 64-core LPARs into the same system, you will see for 4.0 GHz:
```
> fm <- lm(rperf ~ procs)
> coefficients(fm)
  (Intercept)         procs 
-3.215549e-13  1.098719e+01 
```
And now for the 4.25 GHz P7:
```
> fm <- lm(rperf ~ procs)
> coefficients(fm)
  (Intercept)         procs 
-1.607775e-13  1.214203e+01 
```
Both times the intercept is effectively 0 (I assume the tiny non-zero intercept is due to rounding of the data points by IBM, or to the usual challenges of floating-point arithmetic on computers).

That's totally unreasonable for measured data. If you just assume 99% of the perfectly linear performance for the 256-core data point (still a practically impossible scaling factor), you already get an intercept in the range of 28.13:
```
> fm <- lm(rperf ~ procs)
> coefficients(fm)
(Intercept)       procs 
   28.12720    10.76744 
```
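To see how quickly even a small real deviation from linearity shows up in the intercept, here is a little what-if in Python. The actual core counts behind the list aren't shown above, so the counts 32, 64, ..., 256 are a hypothetical assumption; the exact intercept will therefore differ from the 28.13 above, but the effect is the same.

```python
# What-if: perfectly linear rperf data (per-core value taken from the
# 4.0 GHz 64-core LPAR fit above), except the largest configuration
# reaches only 99% of linear scaling.

def ols(xs, ys):
    """Return (intercept, slope) of a least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

per_core = 10.98719                    # per-core rperf from the 4.0 GHz fit
procs = [32 * i for i in range(1, 9)]  # assumption: 32, 64, ..., 256 cores
rperf = [per_core * p for p in procs]
rperf[-1] *= 0.99                      # 256-core point at only 99% of linear

intercept, slope = ols(procs, rperf)
print(intercept, slope)  # intercept jumps to ~7, slope drops below 10.99
```

A single point falling 1% short of the line already pushes the intercept far away from zero. Real measured scaling data with sub-linear behavior at the top end could never produce an intercept of 1e-13.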
### Conclusion

At the moment I don't believe that IBM really measured all the data it provides in the rperf list. The data fits a linear model too perfectly. The interesting question is: "Which data points were really measured?" All the data provided for these configurations looks computed or guessed, not measured. Even if you want to assume that IBM found its way to the holy grail of linear scalability, an R-squared of 1 and residuals of effectively 0 are just ridiculous. I would really like to know which data points were actually measured.