I'd like to show the benefit of the Hybrid Storage Pool (HSP) in the simplest possible way, as an addition to my top speed results for the Sun Storage 7410. The HSP uses both read and write optimized flash based solid state disks (SSDs) to improve the performance of file I/O, with the ZFS L2ARC and SLOG (separate ZIL) technologies:

[Figure: old model vs. new model with ZFS]
I've demonstrated both of these HSP technologies previously in detail, and explained how they work.
Here I'll test how these HSP technologies perform with simple workloads, to measure best-case latency.

Disclaimer: Performance isn't simple. Measuring the speed of simple workloads can be easy; inferring from this how a system will behave in any given production environment can be quite hard. The best way to understand performance is to test the system with a workload that closely resembles the production environment.

Latency

The results I'll measure for the L2ARC and SLOG will be latency, not IOPS or throughput. It doesn't make sense to compare IOPS values alone while ignoring the latency of those IOPS ("10,000 IOPS" from flash memory does not equal "10,000 IOPS" from slow rotating disks!) Throughput isn't a good indicator either - while the HSP delivers great throughput, it usually does so by minimizing use of the L2ARC and SLOG and using multiple disk spindles instead.

The latency presented here will be measured from the NFS layer using Analytics on the Sun Storage server. This latency will include the overheads from processing the NFS protocol, the ZFS filesystem, and waiting for the flash based device I/O itself. The reason is to best portray the delivered performance of the whole product, rather than showing component performance.

Testing Notes

A couple of notes about performance testing SSDs.
  • Age the SSDs: Flash memory based SSDs can be considerably faster the first time you fill them. The second time they are populated the I/O latency can become much worse as the data becomes fragmented. (Fragmentation may sound odd in the context of flash memory, but this behavior does exist.) So, the SSDs I test are aged - they've been completely filled with data many times before.
  • Use Gbytes of data: Like rotating disks, SSD devices can have on-disk DRAM to cache I/O. If you are interested in flash memory read latency but test this by rereading a small amount of data (Mbytes), you may hit from the on-disk cache instead. I use large working sets (in terms of Gbytes) to avoid this.
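
To make these notes concrete, here's a minimal sketch of the kind of working-set generator I mean - the mount point, file count, and sizes are made-up values for illustration, not the actual test configuration. It writes a multi-Gbyte working set over NFS and can be rerun so the SSDs are repopulated ("aged") rather than tested fresh:

    #!/usr/bin/env python3
    # Sketch only: populate a multi-Gbyte working set over NFS, repeatedly,
    # so the flash devices are aged rather than freshly filled. The mount
    # point and sizes below are hypothetical.
    import os

    MOUNT = "/net/7410/export/bench"   # hypothetical NFS mount point
    FILES = 8                          # files in the working set
    FILE_SIZE = 4 * 1024**3            # 4 Gbytes each - well beyond on-disk DRAM
    CHUNK = 1024 * 1024                # write in 1 Mbyte chunks

    def populate(pass_num):
        buf = os.urandom(CHUNK)        # random data for each pass
        for i in range(FILES):
            path = os.path.join(MOUNT, "ws%02d.dat" % i)
            with open(path, "wb") as f:
                written = 0
                while written < FILE_SIZE:
                    f.write(buf)
                    written += CHUNK
        print("pass %d complete: %d Gbytes written" %
              (pass_num, FILES * FILE_SIZE // 1024**3))

    if __name__ == "__main__":
        for p in range(1, 4):          # repeat the fill to age the SSDs
            populate(p)
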
L2ARC

See my screenshots for the full introduction to the L2ARC. To characterize it, I'll perform random 512 byte reads over NFSv3 from a single thread on a single client. The target is a 7410 (Barcelona) with 6 L2ARC devices (Readzillas). Some of the workload will hit from DRAM, some from the L2ARC, and some from disk. The amount from each will depend on cache warmth and working set size; what we are interested in here is the latency.

To find the top speed, the ZFS record size is set to 512 bytes to match the I/O size; but before you try this yourself, understand that ZFS record sizes smaller than 4 Kbytes do begin to cost noticeable amounts of DRAM (bigger metadata/data ratio) and significantly reduce streaming I/O performance. Before this sounds too unlikely to be interesting, note that a workload involving thousands of tiny files may behave in a similar way; the ZFS record size only takes effect if the file becomes bigger than that size.
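
For anyone who wants to reproduce this style of test, here's a client-side sketch of the random read workload - the file path is hypothetical, and the real latency numbers in this post come from Analytics on the server, not from client-side timing. It reads 512 bytes from random 512 byte aligned offsets in a large file over NFS:

    #!/usr/bin/env python3
    # Sketch of the random 512 byte read workload (single thread, single
    # client). The file path is hypothetical; the working set should be much
    # larger than client DRAM so reads aren't served from the client's cache.
    import os
    import random
    import time

    PATH = "/net/7410/export/bench/ws00.dat"   # hypothetical file on the share
    IO_SIZE = 512                              # matches the 512 byte recordsize
    READS = 100000

    fd = os.open(PATH, os.O_RDONLY)
    blocks = os.fstat(fd).st_size // IO_SIZE
    latencies = []

    for _ in range(READS):
        offset = random.randrange(blocks) * IO_SIZE
        start = time.perf_counter()
        os.pread(fd, IO_SIZE, offset)
        latencies.append((time.perf_counter() - start) * 1e6)   # microseconds

    os.close(fd)
    latencies.sort()
    print("median %.0f us, 90th %.0f us, max %.0f us" %
          (latencies[len(latencies) // 2],
           latencies[int(len(latencies) * 0.9)],
           latencies[-1]))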

The following screenshots show NFS read latency heat maps at different vertical ranges (click for larger versions); each corresponds to the HSP technology returning that read. I've also listed the latency range which covers most of the I/O:

DRAM latency: 0 - 10 us

This shows hits from the ZFS DRAM cache (the ARC: Adaptive Replacement Cache), which are fast as expected.

L2ARC latency: 128 - 260 us

The L2ARC is returning these 512 byte reads with consistently low latency. The range may sound high when compared to advertised SSD 512 byte read latency, but remember that these are aged SSDs, and this time includes the ZFS and NFS overheads.

Disk latency: 1900 - 9700 us

These 7200 RPM disks are returning the reads with latency from 2 to 10 ms,as expected for random disk I/O (platter rotation time + head seek time.)
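
As a rough sanity check on that range (my arithmetic with an assumed seek time, not part of the original measurements): at 7200 RPM a full revolution takes 60/7200 seconds, so the average rotational delay is about half of that, and a typical seek adds several more milliseconds.

    rpm = 7200
    rotation_ms = 60.0 / rpm * 1000     # 8.33 ms per revolution
    avg_rotation_ms = rotation_ms / 2   # ~4.2 ms average rotational delay
    seek_ms = 4.0                       # assumed typical seek time
    print("~%.1f ms per random disk read" % (avg_rotation_ms + seek_ms))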

SLOG

See my screenshots for the full introduction to the SLOG. To characterize it, I'll perform 512 byte O_DSYNC writes over NFSv3. Since these will all be handled by the SSD SLOG devices (Logzillas), I'll also show what happens without them - the synchronous write latency to disk. As before, I'll show screenshots and latency ranges.
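
As with the read test, here's a client-side sketch of this workload - the file path is hypothetical, and the latency figures in this post come from Analytics rather than this kind of client timing. Opening the file with O_DSYNC makes every write synchronous, so over NFSv3 each one waits on the server's ZIL - the SLOG device when present, otherwise the pool disks:

    #!/usr/bin/env python3
    # Sketch of the 512 byte synchronous (O_DSYNC) write workload. The file
    # path is hypothetical; O_DSYNC is available on most Unix platforms.
    import os
    import time

    PATH = "/net/7410/export/bench/sync.dat"   # hypothetical file on the share
    IO_SIZE = 512
    WRITES = 10000

    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o644)
    buf = os.urandom(IO_SIZE)
    latencies = []

    for _ in range(WRITES):
        start = time.perf_counter()
        os.write(fd, buf)
        latencies.append((time.perf_counter() - start) * 1e6)   # microseconds

    os.close(fd)
    latencies.sort()
    print("median %.0f us, 90th %.0f us" %
          (latencies[len(latencies) // 2], latencies[int(len(latencies) * 0.9)]))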

SLOG latency: 137 - 181 us

Most of the I/O is in a tight band in the 100 to 200 us range (the screenshot doesn't look so tight due to the false color palette; see the range average values on the left.)

Disk latency: 1950 - 9170 us

Without the SSD based SLOG devices, synchronous write I/O is served here from 7200 RPM disks, with a latency between 2 and 9 ms.

Reality Check

While I've been showing top speeds of the L2ARC and SLOG devices, I can help set expectations by describing cases where you will be slower (or faster!) than these top speeds.
  • I/O size. To find the top speeds above I used an application I/O size of 512 bytes. Bigger I/O takes longer, although most of the time for the 512 byte I/O is spent processing the I/O - not transferring data - so this won't scale as badly as it may sound. As an example, see this L2ARC 4 Kbyte I/O screenshot - it's a little slower than the 512 byte I/O, but not 8 times slower.
  • FS record size. For the L2ARC test I set the filesystem record size to 512 bytes before creating the files, so that the L2ARC devices would be accessed in 512 byte blocks. To keep metadata/data ratios in check and to maintain streaming performance, I don't usually set the record size below 4 Kbytes (although a smaller record size may be used inadvertently due to thousands of small files.) The previous screenshot was also with a 4 Kbyte FS record size, which (along with the I/O size) contributed to the slightly higher latency.
  • Threads. For these top speed tests, I used only one client process (and one thread) to perform I/O. If multiple threads access the L2ARC and SLOG devices simultaneously, contention can increase the latency; the more simultaneous threads, the worse it gets. This is the same for disk I/O, but with the HSP we usually have more disk spindles than SSD devices, so the pool of disks can in some cases handle more concurrent I/O than the SSDs can (depending on the number of disks and the pool profile). A sketch of this kind of multi-threaded test follows this list.
  • SSD technology. The latencies shown above are for the current SSD devices we are shipping with the Sun Storage 7000 series. SSD technology has been improving quickly, so I'd expect these latencies to get better over time.
  • Workload. The L2ARC is currently tuned for random reads, and the SSD SLOG devices are used for synchronous writes. If you test streaming reads or asynchronous writes, the HSP will deliver great performance - but it may not use the SSD devices very much. In these cases the latency will often be better - but that isn't showing SSD latency. The HSP uses the best tool for the job - which may or may not be the L2ARC and SLOG devices.
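
Here's the multi-threaded sketch mentioned in the Threads item above - again with a hypothetical file path, and simply a way to see the effect of concurrency rather than a definitive benchmark. It repeats the random 512 byte read loop with an increasing number of threads, so contention shows up as rising per-read latency:

    #!/usr/bin/env python3
    # Sketch: random 512 byte reads from 1, 2, 4, 8 and 16 concurrent threads.
    # CPython releases the GIL during the pread() system call, so the threads
    # do issue I/O concurrently. The file path is hypothetical.
    import os
    import random
    import threading
    import time

    PATH = "/net/7410/export/bench/ws00.dat"   # hypothetical file on the share
    IO_SIZE = 512
    READS_PER_THREAD = 20000

    def reader(results):
        fd = os.open(PATH, os.O_RDONLY)
        blocks = os.fstat(fd).st_size // IO_SIZE
        lat = []
        for _ in range(READS_PER_THREAD):
            offset = random.randrange(blocks) * IO_SIZE
            start = time.perf_counter()
            os.pread(fd, IO_SIZE, offset)
            lat.append((time.perf_counter() - start) * 1e6)   # microseconds
        os.close(fd)
        results.extend(lat)

    for nthreads in (1, 2, 4, 8, 16):
        results = []
        threads = [threading.Thread(target=reader, args=(results,))
                   for _ in range(nthreads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        results.sort()
        print("%2d threads: median %.0f us" % (nthreads, results[len(results) // 2]))
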
Summary

NFS latency ranges for the L2ARC:
Here DRAM latency doesn't quite register a pixel in height on the linear plot. The logarithmic plot shows the role of the SSD based L2ARC very well, in between DRAM and disk.

NFS latency ranges for the SLOG:
Conclusion

When the HSP uses read and write optimized flash memory SSDs, the NFS I/O latency can reach as low as 0.1 ms, compared to 2 to 10 ms without these SSDs (disk + DRAM only). These results aren't really surprising considering the technology.

I've updated my top speeds with these latency results. And as top speeds go, this is as good as it gets - your latency will be higher with larger I/O sizes.


