It's been a long time since I last had the time to write about something really interesting, and a number of great products and technologies have seen the light over the last few months.
Lately, many customers have asked me to talk about a specific product that hit the spotlight in Portugal last year and is becoming a serious best-seller candidate for the next few months.
Not surprisingly, I'm referring to the Sun Oracle Database Machine, the other name for Oracle's improved Exadata V2 system, built with Oracle software and Sun hardware equipped with Intel processors.

This system has presented itself as the best data warehousing platform, but it is also touted as the best OLTP platform, and that puzzles many people because of the clear distinction between these two complex types of workload.
In fact, many people have spent lots of time tuning platforms for each of these workloads with clear, distinct performance goals. Data warehousing platforms have been designed and configured to maximize throughput and data loading bandwidth, along with immense and ever-increasing capacity, while OLTP platforms are often carefully tuned to withstand an increasing number of concurrent transactions, minimize latencies, improve response times, and increase security and resiliency.
With so many knobs to turn, and because these systems usually depend on external components for networking, storage, security and management, a standard way of dealing with this type of problem is to insulate each platform using separate hardware components, each configured to tackle different bottlenecks.

A data warehouse needs to process data, lots of data... in fact these days we are talking about huge amounts of data, easily growing past 20 Terabytes and ever increasing. Wow, that definitely requires a fair amount of plumbing work on the data pipes. But traditional architectures tend to rely on centralized, monolithic storage behemoths fitted with hundreds of fast, enterprise-grade Fibre Channel disks, and a storage area network. Nice, but how much can we hope to improve in the data pipes?
FC works at 8Gbps nowadays, FC disks are still 4Gbps each, and what about the CPUs controlling the storage unit? Are they state-of-the-art? Usually not, as this type of system tends to be very stable and homogeneous to ensure long lifecycles. That doesn't solve our problems.
Add more FC ports (and switches)? Well, trunking has limitations, and on both ends we have systems with limited scalability. Oh, wait, on the database side, we have already evolved to new grid architectures that spread the work among several commodity, standards-based servers built on the latest hardware. Hmm, then why not apply that same concept on the storage side? Could we use a grid of storage servers, each based on commodity, standards-based, state-of-the-art hardware? Sure, if we could only change the way the database works...

Yes, of course that has been done. Oracle has created a special-purpose product, called the Exadata Storage Server, coupled with Sun's FlashFire-enabled Storage Servers. With carefully architected volumes using grid disk interleaving for faster access to frequently used information, this is the building block of the new concept of massively scalable storage grids, which take the stage at the back end of the Exadata Database Machine. The interconnection between both grids (database grid and storage grid) is the current best low-latency network technology, redundant 40Gbps InfiniBand, supported by Sun-engineered InfiniBand Datacenter Switches. Also, the Oracle 11g R2 product has been improved to further enhance the data flow. Some of the most critical data processing tasks, even data mining scoring, are delegated to each storage server. This product bundles an advanced compression technology, called Hybrid Columnar Compression, that can drastically reduce the disk space required for the data.

So the mystery behind the Exadata data warehousing capabilities comes down to this: huge, parallel data pipes, intelligent disk layout, and delegated execution of SQL operations on the feature-rich storage grid with Hybrid Columnar Compression.

You could say that's a solution that doesn't address the OLTP problems. Remember? Latency, response time, concurrency...
Online transaction processing is a whole different game. We shouldn't even need large data pipes. Instead, we need to process large numbers of small, random I/O operations, and that can be a problem by itself in a traditional architecture, because each disk can usually handle only up to around 300 I/O operations per second. On top of that, in a traditional architecture the database server has to pull lots of data from the disks before it can filter out and discard unwanted data, based on specific query parameters. So what did Oracle do inside the Exadata V2 Database Machine to address that?

First, there's the data volume problem, but if you remember, the Exadata delegates data processing tasks to the storage servers themselves. This includes row filtering (using the Smart Scan technology), but there are other interesting features, like the new concept of Storage Indexes. Each storage server maintains an index of the data present on each disk block, so that it can easily check which blocks are worth reading. Also, the database can read compressed data, so the savings in disk space also translate into bandwidth and response time savings. The Exadata interconnect carries only 10% of the data that usually flows in a traditional, centralized, SAN-based storage architecture.
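To make the Storage Index idea a bit more concrete, here is a toy sketch in Python. It assumes the index keeps only the minimum and maximum value found in each block, which is my simplification for illustration and not a description of Oracle's actual implementation. The names BlockSummary, build_index and blocks_to_read are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockSummary:
    """Hypothetical per-block summary: smallest and largest value stored."""
    block_id: int
    min_value: int
    max_value: int


def build_index(blocks: List[List[int]]) -> List[BlockSummary]:
    """Summarize each non-empty block by the range of values it holds."""
    return [
        BlockSummary(block_id, min(values), max(values))
        for block_id, values in enumerate(blocks)
        if values
    ]


def blocks_to_read(index: List[BlockSummary], low: int, high: int) -> List[int]:
    """Return the blocks whose value range overlaps [low, high].

    Blocks outside the range cannot contain a match, so they are skipped
    without any disk I/O. Blocks inside the range may still hold no match.
    """
    return [
        s.block_id
        for s in index
        if s.max_value >= low and s.min_value <= high
    ]


# Four toy "disk blocks" holding values of a single column.
blocks = [[1, 5, 9], [20, 25, 31], [40, 44, 48], [7, 22, 45]]
index = build_index(blocks)

# A query for values between 21 and 30 only needs blocks 1 and 3.
print(blocks_to_read(index, 21, 30))
```

Note that block 3 is still read even though only one of its values matches: a summary like this can rule blocks out, but the remaining blocks must still be scanned and filtered.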

OK, what about speed then? I still have 300 IOPS per disk, right? This is where another piece of magic happens. Thanks to the leading Flash technology-based devices from Sun, the storage servers are equipped with 4 PCI cards each carrying 96GB of persistent Flash storage, capable of handling tens of thousands (yes, I mean ~75000) of I/O operations per second. These devices are tightly integrated with Oracle's software, which is able to use them for smart, hot-data caching. If we add up the benefits of partition pruning (addressing only a subset of the data), compression (less data to read), Storage Indexes (disregarding non-interesting disk blocks), Smart Scan (filtering the data on the storage nodes), and intelligent caching on Flash devices, then we understand why the Exadata V2 Database Machine has to be the fastest database platform in the world.
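A quick back-of-the-envelope calculation, using only the figures quoted above, shows the size of the gap. This is a rough sketch: it treats 300 IOPS per disk and ~75000 IOPS for Flash as round numbers, and the variable names are mine.

```python
DISK_IOPS = 300            # rough ceiling per traditional disk (from the text)
FLASH_IOPS = 75_000        # approximate Flash figure quoted in the text
FLASH_CARDS = 4            # PCI Flash cards per storage server
FLASH_GB_PER_CARD = 96     # capacity of each card, in GB


def disks_needed(target_iops: int, iops_per_disk: int = DISK_IOPS) -> int:
    """Number of disks needed to reach target_iops (rounded up)."""
    return -(-target_iops // iops_per_disk)


flash_gb_per_server = FLASH_CARDS * FLASH_GB_PER_CARD
equivalent_disks = disks_needed(FLASH_IOPS)

print(f"Flash per storage server: {flash_gb_per_server} GB")
print(f"Disks needed to match {FLASH_IOPS} IOPS: {equivalent_disks}")
```

In other words, it would take around 250 spindles working in parallel on random I/O to match that Flash figure, and each storage server carries 384GB of Flash.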

Hey, wait a minute, even if the technology and architecture are able to address both problems, how can I consolidate a data warehouse that floods the InfiniBand pipes with an OLTP system that wants sub-millisecond responses?

Well, the storage server itself addresses that, through I/O Resource Management, a technology that complements the traditional network and CPU resource management with I/O bandwidth management, enabling the configuration of distinct profiles within the same Exadata box.
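To illustrate the general idea of sharing I/O bandwidth between profiles, here is a minimal Python sketch of weighted, work-conserving allocation. It is only an illustration of the concept: the function name, the share values and the algorithm itself are my assumptions, not Oracle's I/O Resource Management implementation or its configuration syntax.

```python
from typing import Dict


def allocate_bandwidth(
    total: float, shares: Dict[str, float], demands: Dict[str, float]
) -> Dict[str, float]:
    """Split `total` bandwidth among workloads in proportion to `shares`.

    A workload never gets more than it asks for; whatever it leaves unused
    is redistributed among the others, again in proportion to their shares.
    """
    allocation = {name: 0.0 for name in shares}
    active = {name for name in shares if demands.get(name, 0) > 0}
    remaining = float(total)

    while active and remaining > 1e-9:
        weight_sum = sum(shares[name] for name in active)
        satisfied = set()
        granted = 0.0
        for name in active:
            fair = remaining * shares[name] / weight_sum
            give = min(fair, demands[name] - allocation[name])
            allocation[name] += give
            granted += give
            if allocation[name] >= demands[name] - 1e-9:
                satisfied.add(name)
        remaining -= granted
        if not satisfied:
            break  # everyone is capped by its share; nothing left to hand out
        active -= satisfied

    return allocation


# Hypothetical profiles: OLTP gets 70% of the I/O, the data warehouse 30%.
shares = {"oltp": 70, "dw": 30}

# Under contention each workload is held to its share...
busy = allocate_bandwidth(100, shares, {"oltp": 500, "dw": 500})
# ...but when OLTP is quiet, the data warehouse can use the spare bandwidth.
quiet = allocate_bandwidth(100, shares, {"oltp": 20, "dw": 500})
print(busy, quiet)
```

The design choice worth noticing is that the shares only bite when both workloads compete: a scan-heavy data warehouse can use idle bandwidth, yet it cannot starve the OLTP profile once transactions pick up.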

I find myself amazed by the results of the combined hardware/software work performed by Sun and Oracle to come up with this product, and that was before we merged the teams. Now you should expect more and more amazing products to come out in the future, and I will try my best to keep my readers updated more often than I have so far. I also want to take this opportunity to thank Luis Campos from Oracle Portugal, with whom I have worked closely on several occasions, for much of the expert knowledge I have picked up about the software components of this product.