Comparative data of ORACLE 10g on SPARC & SOLARIS 10
Oracle 10g OLTP performance on SPARC chips
[B]A boring ratio[/B]
<SPAN STYLE="font-weight: normal">Customers would love to have their performance levels tied directly to their hardware. But more often than you might think, they migrate from System X (designed 10 years ago) to System Y (fresh from the oven) and are surprised by the performance improvements. In the past two years, we have completed many successful migrations from F15k/E25k servers to the new M9000 Enterprise Servers. Customers have reported great improvements in throughput and response time. But what can you really expect, and what share of the improvement is actually due to operating system enhancements? Can the recent small frequency increase on our SPARC64 VII chip be interesting at all? The new SPARC64 VII 2.88 GHz available on our M8000 and M9000 flagships offers no architectural change, no additional features, and a modest frequency increase from 2.52 GHz to 2.88 GHz, a ratio of 1.14. We could stop our analysis there and label this change 'marginal' or 'not interesting'. But my initial tests showed a comparative OLTP peak throughput way higher than this frequency-based ratio.
What happened?
[B]A passion for Solaris [/B]
Most long-term Sun employees have a passion for Solaris. Solaris is the uncontested Unix leader and includes such a huge number of features that, once you are a Solaris addict, it is difficult to fall in love with another operating system. And Oracle executives made no mistake: Sun has the best UNIX kernel and performance engineers in the world. Without them, Solaris would not scale today to a 512-hardware-thread system (M9000-64).
But of course, Solaris is a moving target. Every release brings its truckload of features, bug fixes and other performance improvements. Here are the critical fixes made between Solaris 10 Update 4 and the brand new Solaris 10 Update 8 [U]that influence Oracle performance on the M9000[/U]:
[LIST] [*]In Solaris 10 Update 5 (05/08), we optimized interrupt management (cr=5017144) and math operations (cr=6491717). We also streamlined CPU yield (cr=6495392) and the cache hierarchy (cr=6495401).
[*]In Solaris 10 Update 6 (10/08), we optimized libraries and implemented shared context for Jupiter (cr=6655597 & 6642758).
[*]In Solaris 10 Update 7 (05/09), we enhanced MPxIO as well as the PCI framework (cr=6449810 and others) and improved thread scheduling (cr=6647538). We also enhanced mutex operations (cr=6719447).
[*]Finally, in Solaris 10 Update 8, after long customer escalations, we fixed the single-threaded nature of callout processing (cr=6565503-6311743). [[I]This is critical for all calls made to nanosleep & usleep[/I].] We also improved the throughput and latency of the very common e1000g driver (cr=6335837 + 5 more) and optimized the mpt driver (cr=6784459). We cleaned up interrupt management (cr=6799018) and optimized bcopy and kcopy operations (cr=6292199). We also improved some single-threaded operations (cr=6755069).
[/LIST] My initial SPARC64 VII iGenOLTP tests were done with Solaris 10 Update 4. But I could not test the new SPARC64 VII 2.88 GHz with this release because it was not supported! Therefore, I had to compare the new chip's performance to the SPARC64 VII 2.52 GHz on both S10U4 and S10U8. [U]We will see below that most of the improvement comes not from the frequency increase but from Solaris itself.[/U]
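The callout fix listed for Update 8 matters most when many threads issue short timed sleeps at once. One rough way to observe that kind of behavior on any system is to measure how far short sleeps overshoot their requested duration under concurrency. The sketch below is only an illustration: the function name and parameters are hypothetical, and it assumes Python's time.sleep is backed by the platform's timed-sleep facility, which is not necessarily nanosleep itself.

```python
import statistics
import threading
import time


def measure_sleep_overshoot(n_threads=8, n_sleeps=50, interval_s=0.001):
    """Return the overshoot (in seconds) of every short sleep across all threads.

    Hypothetical helper: each thread requests `n_sleeps` sleeps of `interval_s`
    and records how much longer than requested each one actually took.
    """
    overshoots = []
    lock = threading.Lock()

    def worker():
        local = []
        for _ in range(n_sleeps):
            start = time.perf_counter()
            time.sleep(interval_s)
            local.append(time.perf_counter() - start - interval_s)
        # Merge per-thread samples under a lock once the thread is done.
        with lock:
            overshoots.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return overshoots


if __name__ == "__main__":
    samples = measure_sleep_overshoot()
    print(f"samples:        {len(samples)}")
    print(f"mean overshoot: {statistics.mean(samples) * 1e6:.0f} us")
    print(f"max overshoot:  {max(samples) * 1e6:.0f} us")
```

Comparing the mean and maximum overshoot as n_threads grows gives a first hint of whether timer wakeups are being serialized somewhere. It does not replace a proper kernel-level analysis.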
[B]Chips & Chassis[/B]
Please find below the key characteristics of the chips we have tested:
[B]SPARC64 VII (+)[/B]
Die size: 356 sq mm | 421 sq mm | 421 sq mm | 421 sq mm
[I][B]Note on (+):[/B][/I] [I]The new SPARC64 VII is not officially labeled with a plus sign, which reflects the absence of new features.[/I]
Now, here is our hardware list. Note that to avoid the need for a huge client system, we ran this iGenOLTP workload in Console/Server mode: the Java processes sending SQL queries via JDBC run directly on the server under test. While this model was unusual ten years ago in the era of Client/Server, it is increasingly common today in new customer deployments.
[B]Total hardware threads[/B]
Operating system: Solaris 10 Update 4 | Solaris 10 Update 4 | Solaris 10 Update 4 & 8 | Solaris 10 Update 8
[I]64 GB cache[/I]
[I]200 Hitachi HDD[/I]
[I][B]Note on (~):[/B][/I] [I]While the system clock has not changed, the new M9000 CMUs are equipped with an optimized Memory Access Controller labeled MAC+. The MAC+ chip set is critical for system reliability, in particular for the memory mirroring and memory patrolling features. We have not identified performance improvements linked to this new feature.[/I]
[I][B]Note on (*):[/B][/I] [I]Those domains have 128 GB of total memory. For an apples-to-apples comparison, 64 GB of memory is allocated, populated and locked in place with my very own _shmalloc tool.[/I]
The [B]iGenOLTPv4 workload[/B] is a lightweight Java-based OLTP database workload. Simulating a classic order entry system, it is tested in stream mode (i.e., no wait time between transactions). For this particular exercise, we created a very large database of 8 terabytes in total, stored on the SE9990V using Oracle ASM. We query 100 million customer identifiers on this database in order to create an I/O-intensive (but not I/O-bound) workload similar to the largest OLTP installations in the world (for example, the E25ks running the bulk of the load of Oracle internal applications). The exact throughput in transactions per second and the average response times are reported and coalesced for each scalability level. [U]For this test, we used Solaris 10 Update 4 & 8, Java version 1.6 build 16, and Oracle database server 10.2.0.4.[/U]
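To make "stream mode" concrete, here is a minimal sketch of a driver that runs transactions back to back with no wait time and reports throughput and average response time per scalability level. This is not iGenOLTP code: run_stream and fake_transaction are hypothetical names, and the transaction is a 2 ms sleep standing in for a real JDBC call.

```python
import threading
import time


def run_stream(transaction, n_streams, duration_s):
    """Run `transaction` back to back (no think time) on `n_streams` threads.

    Returns (transactions_per_second, average_response_time_seconds).
    """
    counts = [0] * n_streams      # completed transactions per stream
    busy = [0.0] * n_streams      # summed response time per stream
    deadline = time.perf_counter() + duration_s

    def stream(idx):
        while time.perf_counter() < deadline:
            start = time.perf_counter()
            transaction()
            busy[idx] += time.perf_counter() - start
            counts[idx] += 1

    threads = [threading.Thread(target=stream, args=(i,)) for i in range(n_streams)]
    wall_start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall = time.perf_counter() - wall_start

    total = sum(counts)
    tps = total / wall
    avg_rt = sum(busy) / total if total else float("nan")
    return tps, avg_rt


def fake_transaction():
    """Hypothetical stand-in for one order-entry transaction (about 2 ms)."""
    time.sleep(0.002)


if __name__ == "__main__":
    # One measurement per scalability level (number of concurrent streams).
    for level in (1, 2, 4, 8):
        tps, avg_rt = run_stream(fake_transaction, level, duration_s=1.0)
        print(f"{level:2d} streams: {tps:8.1f} tps, {avg_rt * 1000:6.2f} ms avg")
```

Because there is no think time, each stream issues its next transaction as soon as the previous one returns, so throughput rises with the number of streams until the system under test saturates.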
Performance notes:
[LIST] [*]At peak, the new SPARC64 VII 2.88 GHz produces 1.10x the OLTP throughput of the 2.52 GHz chip on S10U8.
[*]But compared to the 2.52 GHz chips on S10U4, the ratio is 1.54x, and compared to the SPARC64 VI it is 2.38x.
[*]For a customer willing to upgrade an E25k equipped with 1.5 GHz chips, the throughput ratio is 4.125! It means that we can easily replace an 8-board E25k with a 2-board M8000 for better throughput and improved response times.
[*]Average transaction response times at peak are [B]126 ms[/B] on the UltraSPARC IV+ domain, [B]87 ms[/B] on the SPARC64 VI, [B]82 ms[/B] on the SPARC64 VII 2.52 GHz (U4), [B]77 ms[/B] on the SPARC64 VII 2.52 GHz (U8) and [B]72 ms[/B] on the latest chip.
[/LIST]
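The share of the gain that comes from Solaris rather than from the clock can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this post; the variable names are illustrative. Dividing the two published ratios isolates the Update 4 to Update 8 effect on the same 2.52 GHz chip.

```python
# Figures quoted in this post; variable names are illustrative only.
FREQ_OLD_GHZ = 2.52
FREQ_NEW_GHZ = 2.88

# Peak throughput of the SPARC64 VII 2.88 GHz on S10U8, relative to each baseline.
GAIN_VS_252_U8 = 1.10
GAIN_VS_252_U4 = 1.54

# Average peak response times in milliseconds.
RESPONSE_MS = {
    "UltraSPARC IV+": 126,
    "SPARC64 VI": 87,
    "SPARC64 VII 2.52 GHz (U4)": 82,
    "SPARC64 VII 2.52 GHz (U8)": 77,
    "SPARC64 VII 2.88 GHz (U8)": 72,
}

freq_ratio = FREQ_NEW_GHZ / FREQ_OLD_GHZ
# Same 2.52 GHz chip, U8 versus U4: divide the two published ratios.
solaris_gain = GAIN_VS_252_U4 / GAIN_VS_252_U8

print(f"frequency ratio:       {freq_ratio:.2f}x")
print(f"measured gain on U8:   {GAIN_VS_252_U8:.2f}x")
print(f"implied U4 -> U8 gain: {solaris_gain:.2f}x")

baseline = RESPONSE_MS["UltraSPARC IV+"]
for name, ms in RESPONSE_MS.items():
    print(f"{name:28s} {ms:4d} ms  ({baseline / ms:.2f}x vs UltraSPARC IV+)")
```

The implied Solaris-only factor works out to about 1.40x on the 2.52 GHz chip, well above both the 1.14 frequency ratio and the 1.10x measured for the new chip on the same release.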
As expected, the Oracle OLTP improvements due to the new SPARC64 VII chip are modest when using the latest Solaris 10. However, all customers already in production on a previous release of Solaris 10 will see throughput improvements of up to 1.54x by moving to the new chip and Solaris 10 Update 8. Most likely, this is enough to motivate a refresh of their systems. And all E25k customers now have a very interesting value proposition with our M8000 and M9000 chassis.
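For E25k customers, the board count behind that value proposition can be estimated quickly. This is a rough sketch only: boards_needed is a hypothetical helper, and it assumes the 4.125 ratio applies per board and that throughput scales linearly with the number of boards, which a real sizing exercise would need to confirm.

```python
import math


def boards_needed(old_boards, per_board_ratio):
    """Boards required on the new system to match the old system's throughput.

    Assumes (hypothetically) linear scaling with board count on both systems.
    """
    return math.ceil(old_boards / per_board_ratio)


if __name__ == "__main__":
    old_boards = 8          # E25k boards, from the example in this post
    ratio = 4.125           # throughput ratio quoted above
    new_boards = boards_needed(old_boards, ratio)
    headroom = new_boards * ratio / old_boards
    print(f"{old_boards} E25k boards -> {new_boards} M8000 boards")
    print(f"relative throughput after consolidation: {headroom:.2f}x")
```

Under those assumptions, two boards slightly exceed the throughput of the eight-board system, which is consistent with the replacement suggested in the performance notes.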
See you next time in the wonderful world of benchmarking...