I have been involved in a customer situation, on and off, for at least six months now. The customer had been seeing performance issues with their application running on Oracle 10g RAC. We looked through the mounds of data and noticed that they were indeed spending quite a bit of time on Global cache buffer waits. This was during periods of fairly heavy load, and we could see that the CPU was also busy servicing interrupts. When we looked at the MTU size for the cluster interconnect, we noticed that it had been left at the default (1500). Thus began the odyssey to implement Jumbo Frames.
The default MTU on Solaris is 1500 bytes, which is not ideal when Oracle is using an 8K block size. Simple math tells us that it takes 6 transfers to send just one block of data across the cluster interconnect: the block is fragmented at the IP layer and has to be reassembled on the receiving node. That adds CPU overhead on the server and latency for every session waiting on a global block. Changing the MTU to a "jumbo frame" large enough to carry a whole 8K block plus headers (9000 bytes is the common choice) is fairly simple from a technical point of view, but it can quickly turn into a political issue.
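To make the "simple math" concrete, here is a minimal Python sketch. It assumes each 8K block travels as the payload of a single UDP datagram over IPv4 with a 20-byte header and no options, and it ignores Oracle's own message headers, so treat the numbers as a lower bound. The function names are hypothetical, chosen only for illustration.

```python
import math

# Assumed header sizes: IPv4 without options, plus a UDP header.
# Oracle's own message headers on top of the block are ignored here.
IPV4_HEADER = 20
UDP_HEADER = 8


def fragments_needed(payload_bytes: int, mtu: int) -> int:
    """IPv4 frames needed to carry one UDP datagram of payload_bytes."""
    datagram = payload_bytes + UDP_HEADER
    if datagram <= mtu - IPV4_HEADER:
        return 1  # fits in a single frame, no fragmentation
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(datagram / per_fragment)


def min_mtu_for_single_frame(payload_bytes: int) -> int:
    """Smallest MTU that carries the payload without IP fragmentation."""
    return payload_bytes + UDP_HEADER + IPV4_HEADER


BLOCK_SIZE = 8192  # 8K Oracle block

for mtu in (1500, 9000):
    print(f"MTU {mtu}: {fragments_needed(BLOCK_SIZE, mtu)} frame(s) per 8K block")
print("Smallest single-frame MTU:", min_mtu_for_single_frame(BLOCK_SIZE))
```

Under these assumptions the sketch reports 6 frames per block at MTU 1500, 1 frame at MTU 9000, and a minimum single-frame MTU of 8220 bytes. That is also why an MTU of exactly 8192 would still split each block in two, and why 9000 leaves comfortable headroom.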
The cluster interconnect is often made the responsibility of the networking group. No problem, right? While it is a network component, the interconnect is really part of the server, no different from a PCI bus or a processor backplane. Networking groups will often apply the tried-and-true methods they use for LANs around the company, but those don't translate to RAC. Modern network switches can easily handle a jumbo frame configuration, but policy often wins: the networking group assures everyone that their switch can handle the traffic at the default MTU, and everyone goes on their merry way.
So, what happened?
After months of looking at "other things", they were finally convinced to try this "Best Practice" of using Jumbo Frames. Immediately, they saw:
  • 50% reduction in CPU overhead
  • 75% reduction in Global cache buffer waits
  • 10x reduction in IP reassemblies
Moral of the story: implement Jumbo Frames for Oracle RAC interconnects. It is a best practice, after all.