What is a Terabyte?

While this may be surprising to some people, one discussion I seem to have more frequently than any other is one surrounding usable capacity. Not surprisingly, capacity is a key metric by which customers buy storage, but what may be surprising is how poorly it is understood. I'll attempt to demystify things a little.

Before we get too far I think it's worth explaining what a Terabyte is. I often get the question "Isn't 1TB just one trillion bytes?" That is a perfectly reasonable conclusion to draw - tera is the standard prefix for trillion in base 10. The thing we need to remember that computers don't do decimal (base 10) math like we do, they do binary (base 2) math. As a computer understands it, 1TB is 240, or 1,099,511,627,776 bytes. Similarly, 1GB is actually 230, or 1,073,741,824 bytes.

Surely then when we buy a 1TB drive in our storage array, or from the local electronics store, it must contain 1,099,511,627,776 bytes right? Wrong. If you refer to the specifications of nearly any disk on the market today you'll see the capacity footnoted with text something like this (found on a Seagate specification sheet) "1 One gigabyte, or GB, equals one billion bytes and one terabyte, or TB, equals one trillion bytes when referring to drive capacity". Technically this is correct, since the prefixes 'giga' and 'tera' do describe billions and trillions, but it leaves a little to be desired when we're talking in terms the computer understands.

I can't honestly pinpoint when this footnoting of capacity started happening without going over a lot of old spec sheets, but I would guess it would date back to the time when drives in the gigabyte range started to become available. Previous to that I can recall looking at the spec sheets for drives in which the number of megabytes of capacity was well documented. In fact, the geometry of the disk was described in painstaking detail including the number of spare sectors per cylinder, and the number of spare cylinders. But then, those were the days when you needed that information to use the drive.

In any case, it seems as though at some point marketing decided that bigger, rounder units were convenient and sexy, and as consumers we accepted this as 'close enough'. It may be due to the fact that in those days the difference was smaller (1 billion bytes vs 1GB would work out to about a 70MB difference).

So now let's see how the gap magnifies as the scale gets bigger. The table below shows how the ratio between SI units (standard mega, giga prefixes in base 10) compares with binary units; it is borrowed from a wikipedia article here. What we can see from the table is that the ratio between SI units (standard base 10 mega, giga, tera, peta) and binary units (base 2) grows as the units grow bigger. A 1TB disk that you buy today actually contains a little over 931GB of space. If we could manufacture a 1PB disk, it would contain something like 909TB of space.

Multiples of bytes SI decimal prefixes IEC binary prefixes Name
(Symbol) Standard
SI Binary
usage
Ratio
SI/Binary Name
(Symbol) Value kilobyte (kB) 103 210 0.9766 kibibyte (KiB) 210 megabyte (MB) 106 220 0.9537 mebibyte (MiB) 220 gigabyte (GB) 109 230 0.9313 gibibyte (GiB) 230 terabyte (TB) 1012 240 0.9095 tebibyte (TiB) 240 petabyte (PB) 1015 250 0.8882 pebibyte (PiB) 250 exabyte (EB) 1018 260 0.8674 exbibyte (EiB) 260 zettabyte (ZB) 1021 270 0.8470 zebibyte (ZiB) 270 yottabyte (YB) 1024 280 0.8272 yobibyte (YiB) 280

As you can see above, a new naming convention has been created to describe storage capacity in binary terms. So a terabyte is actually not a terabyte, but rather a tebibyte or TiB. The problem is that I feel like the only guy using the term; that may be why I feel like I'm having this conversation all the time.

To make matters worse, in enterprise storage systems, there is additional overhead consumed by things like storage system meta data. In some cases this additional overhead can consume more than 10% of the purchased capacity in TiB. I doubt there's any changing the disk industry at this point, but you can challenge your enterprise storage system suppliers to tell you about the usable capacity of their system in tebibytes - and when they ask what that is (because they likely will), send them here. I've helped my customers decode the real capacity of my competitors systems while those competitors struggled to accurately describe a terabyte.

If you'd prefer to buy storage from a company that's transparent, look no further than Oracle. The size calculator that I maintain (the latest release as of this writing is here) for our 7000 series tells you *exactly* how much capacity you will be able to use when you power up the system including all overhead, and allows you to see how much capacity you will have later when you expand your system.

EOF