Updated Performance Limit Summary

I was able to squeak out a few more bytes/second in the streaming DRAM test for IPoIB and have achieved a respectable upper bound for RDMA streaming disk reads for this Sun Storage 7410 configuration. The updated summary is below with links to the relevant Analytics screenshots. I'll update this summary as I gather more data.

Test                              RDMA                     IPoIB
NFSv3 Streaming DRAM Read         3.18 GBytes/second **    2.40 GBytes/second *
NFSv3 Streaming Disk Read         2.51 GBytes/second **    1.51 GBytes/second *
NFSv3 Streaming Write             1.00 GBytes/second **    752 MBytes/second *
NFSv3 Max IOPS - 1 byte reads
NFSv3 Max IOPS - 4K reads
NFSv3 Max IOPS - 8K reads

The IPoIB numbers do not represent the maximum limits I expect to ultimately achieve. On the 7410, CPU and disk utilization are well below their limits. In the I/O path, we are nowhere close to saturating the IB transport, and the HyperTransport links and PCIe root complexes have plenty of headroom. The limiting factor is the number of clients. As I build out a larger client fabric, expect these values to change.
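For a rough sense of the link ceilings involved, here is a small sketch that computes nominal per-direction data rates for PCIe Gen1 and InfiniBand DDR, both of which use 8b/10b encoding. The lane counts are assumptions for illustration (x8 PCIe slots, 4X IB ports); they are not taken from the 7410's documentation.

```python
# Nominal per-direction link ceilings, in bytes/second, before protocol
# overhead. Assumption: PCIe Gen1 slots and 4X DDR InfiniBand ports;
# lane counts are parameters, not values read from the 7410.

PCIE_GEN1_GT_PER_LANE = 2_500_000_000  # 2.5 GT/s per lane
IB_DDR_GBAUD_PER_LANE = 5_000_000_000  # 5 Gb/s signaling per lane


def encoded_bytes_per_sec(line_rate_per_lane: int, lanes: int) -> int:
    """Both links use 8b/10b encoding: 8 data bits per 10 line bits."""
    data_bits = line_rate_per_lane * lanes * 8 // 10
    return data_bits // 8


def pcie_gen1_bytes_per_sec(lanes: int = 8) -> int:
    return encoded_bytes_per_sec(PCIE_GEN1_GT_PER_LANE, lanes)


def ib_ddr_bytes_per_sec(width: int = 4) -> int:
    return encoded_bytes_per_sec(IB_DDR_GBAUD_PER_LANE, width)


if __name__ == "__main__":
    print(f"PCIe Gen1 x8 slot: {pcie_gen1_bytes_per_sec(8) / 1e9:.2f} GB/s per direction")
    print(f"IB 4X DDR port:    {ib_ddr_bytes_per_sec(4) / 1e9:.2f} GB/s per direction")
```

Under these assumptions both come out to about 2 GB/s per direction, so a dual-port DDR HCA in an x8 Gen1 slot has more IB bandwidth than the slot can carry. Real payload rates are lower once PCIe and IB packet headers are counted, and shared root-complex limits are not modeled here.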


With NFSv3/RDMA, I am able to hit the maximum limits with the current client configuration (10 clients), with the exception of max IOPS. In the streaming read from DRAM test, I hit the limit imposed by the PCIe generation 1 root complexes and downstream bus. For streaming reads and writes from and to disk, I reached the maximum we can expect from this storage configuration. The throughput numbers are given in GBytes/second as measured on the transport. I gathered them on the subnet manager using a script I wrote (based on OFED utilities) that collects and reports the port statistics. Because they are transport-level counts, these numbers include network header bytes, which the IOPS data in the Analytics screenshots does not.
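The port-statistics approach can be sketched as follows. This is not my actual script: it assumes the extended (64-bit) PortXmitData/PortRcvData counter values have already been collected (for example from OFED's perfquery) at two points in time, and it treats a GByte as 10^9 bytes, which is also an assumption. Per the InfiniBand specification, these data counters count octets divided by 4.

```python
from __future__ import annotations

from dataclasses import dataclass

# PortXmitData and PortRcvData count data octets divided by 4,
# so each counter unit represents 4 bytes.
BYTES_PER_COUNTER_UNIT = 4


@dataclass(frozen=True)
class PortSample:
    """One reading of a port's extended (64-bit) data counters."""
    xmit_data: int
    rcv_data: int


def port_rates(before: PortSample, after: PortSample, interval_s: float) -> tuple[float, float]:
    """Return (transmit, receive) bytes/second between two samples of one port."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if after.xmit_data < before.xmit_data or after.rcv_data < before.rcv_data:
        raise ValueError("counter went backwards (reset between samples?)")
    tx = (after.xmit_data - before.xmit_data) * BYTES_PER_COUNTER_UNIT / interval_s
    rx = (after.rcv_data - before.rcv_data) * BYTES_PER_COUNTER_UNIT / interval_s
    return tx, rx


def aggregate_tx_gbytes(pairs: list[tuple[PortSample, PortSample]], interval_s: float) -> float:
    """Sum transmit throughput over several ports, in GBytes/second (10**9 bytes)."""
    total = sum(port_rates(before, after, interval_s)[0] for before, after in pairs)
    return total / 1e9
```

For a streaming read test, summing the transmit side of the filer's two HCA ports gives the aggregate read throughput on the wire, headers included.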

Fabric Configuration

Filer: Sun Storage 7410, with the following config:

  • 256 Gbytes DRAM
  • 8 JBODs, each with 24 x 1 Tbyte disks, configured with mirroring
  • 4 sockets of six-core AMD Opteron 2600 MHz CPUs (Istanbul)
  • 2 Sun DDR Dual Port Infiniband HCAs
  • 3 HBA cards
  • noatime on shares, and database record size left at 128 Kbytes
Clients: 10 blades, each:
  • 2 sockets of Intel Xeon quad-core 1600 MHz CPUs
  • 3 Gbytes of DRAM
  • 1 Sun DDR Dual Port Infiniband HCA Express Module
  • mount options:
    • read tests: mounted with forcedirectio (to bypass client caching) and rsize matched to the workload
    • write tests: default mount options
Switches: 2 internal Sun DataCenter 3x24 Infiniband switches (A and C)

Subnet manager:

  • CentOS 5.2
  • Sun HPC Software, Linux Edition
  • 2 Sun DDR Dual Port Infiniband HCAs
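
The client mount options above can be captured in a small hypothetical helper. It assumes the clients run Solaris, since forcedirectio is a Solaris mount_nfs option, and it pins vers=3 explicitly, which goes slightly beyond "default mount options" for the write tests. The server name, share path, and mount point in any call are placeholders.

```python
from __future__ import annotations

# Hypothetical helper that builds a Solaris-style NFS mount command line.
# Assumption: Solaris clients, where forcedirectio is a mount_nfs option.


def mount_options(test: str, rsize: int | None = None, proto: str | None = None) -> str:
    """Return the -o option string for a 'read' or 'write' test."""
    opts = ["vers=3"]  # assumption: pin NFSv3 explicitly
    if proto is not None:
        opts.append(f"proto={proto}")  # e.g. "rdma" or "tcp"
    if test == "read":
        if rsize is None:
            raise ValueError("read tests need an rsize matched to the workload")
        # forcedirectio bypasses the client page cache on reads.
        opts += ["forcedirectio", f"rsize={rsize}"]
    elif test != "write":
        raise ValueError(f"unknown test type: {test!r}")
    return ",".join(opts)


def mount_command(server: str, share: str, mountpoint: str, options: str) -> list[str]:
    """Assemble the argv for: mount -F nfs -o <options> server:share mountpoint."""
    return ["mount", "-F", "nfs", "-o", options, f"{server}:{share}", mountpoint]
```

For the 128K streaming disk read test, mount_options("read", rsize=131072) yields "vers=3,forcedirectio,rsize=131072".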

NFSv3 Streaming Disk Reads

I was able to reach the maximum read throughput for NFSv3 streaming reads from disk over RDMA. As in my previous tests, I have a 10-client fabric connected to the same Sun Storage 7410. The clients are split equally between two subnets and connected to two separate HCA ports on the 7410. Each client has a separate share mounted. For the read-from-disk tests, all 10 clients each run 10 threads that issue 1 MB reads (see Brendan's seqread.pl script) against the client's own 2 GB file. The shares are mounted with rsize=128K.
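The per-client workload can be approximated with the Python sketch below. It is a stand-in, not Brendan's seqread.pl: each of the threads opens the file independently and reads it once from start to end in 1 MB requests, whereas a benchmark run would typically loop for a fixed duration. The file path passed in would be the client's own 2 GB file on its mounted share.

```python
import threading

BLOCK_SIZE = 1 << 20  # 1 MB per read request


def seq_read(path: str, threads: int = 10, block_size: int = BLOCK_SIZE) -> int:
    """Have each of `threads` workers read `path` once, start to end.

    Returns the total number of bytes read across all workers.
    """
    totals = [0] * threads  # one slot per worker, so no locking is needed

    def worker(idx: int) -> None:
        # Unbuffered: each read() is a single system call of up to block_size bytes.
        with open(path, "rb", buffering=0) as f:
            while chunk := f.read(block_size):
                totals[idx] += len(chunk)

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sum(totals)
```

Timing a call with time.monotonic() and dividing the returned byte count by the elapsed time gives a client-side throughput figure, which will be slightly below the transport numbers because it excludes headers.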

Update on Maximum IOPS

I'm still waiting to run this set of tests with a larger number of clients. In the interim, I wanted to make sure that adding those clients would indeed push me to the limits of the 7410. To validate my thinking, I ran a step test for the 4K maximum IOPS workload. Here, we can see the stepwise effect of adding two clients at a time, plus one at the end, for a maximum of 9 clients.

We're scaling nicely: each two-client step adds roughly 42,000 IOPS, and the last client adds another 20,000. We're starting to reach a CPU limit, but if I add just 5 more clients, I can match Brendan's maximum of 400K IOPS. I think I can do it! Stay tuned...
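The extrapolation behind that estimate can be written down as a simple linear model. This sketch assumes each added client contributes about 21,000 IOPS (half of a 42,000 IOPS two-client step); the current total is an input to be read off the step-test graph, since it is not restated here. Linear scaling is itself an assumption and becomes less reliable as the CPU limit approaches.

```python
import math


def clients_needed(current_iops: float, per_client_iops: float, target_iops: float) -> int:
    """Extra clients needed to reach target_iops, assuming linear per-client scaling."""
    if per_client_iops <= 0:
        raise ValueError("per_client_iops must be positive")
    shortfall = target_iops - current_iops
    if shortfall <= 0:
        return 0
    return math.ceil(shortfall / per_client_iops)


# From the step test: roughly 42,000 IOPS per two-client step.
PER_CLIENT_IOPS = 42_000 / 2
TARGET_IOPS = 400_000  # Brendan's maximum
```

Calling clients_needed(measured_total, PER_CLIENT_IOPS, TARGET_IOPS) with the 9-client total from the graph gives the linear-model estimate of how many more clients are required.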