What is a Terabyte?

• 08-27-2010, 09:10 PM
UnixBot
While this may be surprising to some people, one discussion I seem to have more frequently than any other is one surrounding usable capacity. Not surprisingly, capacity is a key metric by which customers buy storage, but what may be surprising is how poorly it is understood. I'll attempt to demystify things a little.

Before we get too far I think it's worth explaining what a Terabyte is. I often get the question "Isn't 1TB just one trillion bytes?" That is a perfectly reasonable conclusion to draw - tera is the standard prefix for trillion in base 10. The thing we need to remember is that computers don't do decimal (base 10) math like we do; they do binary (base 2) math. As a computer understands it, 1TB is 2^40, or 1,099,511,627,776 bytes. Similarly, 1GB is actually 2^30, or 1,073,741,824 bytes.
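The arithmetic above is easy to check for yourself. This is just a quick sketch in Python; the constant names are my own and not part of any standard library.

```python
# Binary interpretation of the prefixes: powers of two.
TB_BINARY = 2 ** 40
GB_BINARY = 2 ** 30

# Decimal (SI) interpretation of the same prefixes: powers of ten.
TB_DECIMAL = 10 ** 12
GB_DECIMAL = 10 ** 9

print(f"binary TB:  {TB_BINARY:,} bytes")   # 1,099,511,627,776
print(f"binary GB:  {GB_BINARY:,} bytes")   # 1,073,741,824
print(f"decimal TB: {TB_DECIMAL:,} bytes")  # 1,000,000,000,000

# How many bytes separate the two readings of "1TB"?
print(f"difference: {TB_BINARY - TB_DECIMAL:,} bytes")  # 99,511,627,776
```

The last line shows that the two readings of "1TB" are almost 100 billion bytes apart.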

Surely then when we buy a 1TB drive in our storage array, or from the local electronics store, it must contain 1,099,511,627,776 bytes right? Wrong. If you refer to the specifications of nearly any disk on the market today you'll see the capacity footnoted with text something like this (found on a Seagate specification sheet) "1 One gigabyte, or GB, equals one billion bytes and one terabyte, or TB, equals one trillion bytes when referring to drive capacity". Technically this is correct, since the prefixes 'giga' and 'tera' do describe billions and trillions, but it leaves a little to be desired when we're talking in terms the computer understands.

I can't honestly pinpoint when this footnoting of capacity started happening without going over a lot of old spec sheets, but I would guess it would date back to the time when drives in the gigabyte range started to become available. Previous to that I can recall looking at the spec sheets for drives in which the number of megabytes of capacity was well documented. In fact, the geometry of the disk was described in painstaking detail including the number of spare sectors per cylinder, and the number of spare cylinders. But then, those were the days when you needed that information to use the drive.

In any case, it seems as though at some point marketing decided that bigger, rounder units were convenient and sexy, and as consumers we accepted this as 'close enough'. It may be due to the fact that in those days the difference was smaller (1 billion bytes vs 1GB would work out to about a 70MB difference).

So now let's see how the gap magnifies as the scale gets bigger. The table below compares SI units (the standard base 10 prefixes mega, giga, tera, peta) with binary units (base 2); it is borrowed from a Wikipedia article [URL="http://en.wikipedia.org/wiki/Tebibyte"]here[/URL]. What we can see from the table is that the SI/binary ratio drops further below 1 as the units grow bigger, which means the gap between the two keeps widening. A 1TB disk that you buy today actually contains a little over 931GB of space in binary terms. If we could manufacture a 1PB disk, it would contain something like 909TB of space.

[FONT=sans-serif]Multiples of [URL="http://en.wikipedia.org/wiki/Byte"]bytes[/URL]: [URL="http://en.wikipedia.org/wiki/SI_prefix"]SI decimal prefixes[/URL] compared with [URL="http://en.wikipedia.org/wiki/IEC_60027"]IEC[/URL] [URL="http://en.wikipedia.org/wiki/Binary_prefix"]binary prefixes[/URL]

SI name (symbol) | Standard SI value | Binary usage | Ratio SI/Binary | IEC name (symbol) | Value
[URL="http://en.wikipedia.org/wiki/Kilobyte"]kilobyte[/URL] (kB) | 10^3 | 2^10 | 0.9766 | [URL="http://en.wikipedia.org/wiki/Kibibyte"]kibibyte[/URL] (KiB) | 2^10
[URL="http://en.wikipedia.org/wiki/Megabyte"]megabyte[/URL] (MB) | 10^6 | 2^20 | 0.9537 | [URL="http://en.wikipedia.org/wiki/Mebibyte"]mebibyte[/URL] (MiB) | 2^20
[URL="http://en.wikipedia.org/wiki/Gigabyte"]gigabyte[/URL] (GB) | 10^9 | 2^30 | 0.9313 | [URL="http://en.wikipedia.org/wiki/Gibibyte"]gibibyte[/URL] (GiB) | 2^30
[URL="http://en.wikipedia.org/wiki/Terabyte"]terabyte[/URL] (TB) | 10^12 | 2^40 | 0.9095 | [B]tebibyte[/B] (TiB) | 2^40
[URL="http://en.wikipedia.org/wiki/Petabyte"]petabyte[/URL] (PB) | 10^15 | 2^50 | 0.8882 | [URL="http://en.wikipedia.org/wiki/Pebibyte"]pebibyte[/URL] (PiB) | 2^50
[URL="http://en.wikipedia.org/wiki/Exabyte"]exabyte[/URL] (EB) | 10^18 | 2^60 | 0.8674 | [URL="http://en.wikipedia.org/wiki/Exbibyte"]exbibyte[/URL] (EiB) | 2^60
[URL="http://en.wikipedia.org/wiki/Zettabyte"]zettabyte[/URL] (ZB) | 10^21 | 2^70 | 0.8470 | [URL="http://en.wikipedia.org/wiki/Zebibyte"]zebibyte[/URL] (ZiB) | 2^70
[URL="http://en.wikipedia.org/wiki/Yottabyte"]yottabyte[/URL] (YB) | 10^24 | 2^80 | 0.8272 | [URL="http://en.wikipedia.org/wiki/Yobibyte"]yobibyte[/URL] (YiB) | 2^80
[/FONT]
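The ratio column and the 931 and 909 figures can be reproduced with a few lines of Python. This is only a sketch; the function names and the idea of numbering the prefixes by "rank" (kilo = 1, mega = 2, and so on) are my own conventions for illustration.

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def si_to_binary_ratio(rank):
    """Ratio of the SI unit 10^(3*rank) to the binary unit 2^(10*rank)."""
    return 10 ** (3 * rank) / 2 ** (10 * rank)

def decimal_to_binary_units(count, rank_from, rank_to):
    """Express `count` SI units of rank_from in binary units of rank_to."""
    return count * 10 ** (3 * rank_from) / 2 ** (10 * rank_to)

# Reproduce the "Ratio SI/Binary" column of the table.
for rank, name in enumerate(PREFIXES, start=1):
    print(f"{name:>5}: {si_to_binary_ratio(rank):.4f}")

# A marketed 1TB disk expressed in GiB, and a 1PB disk in TiB.
print(f"1TB disk = {decimal_to_binary_units(1, 4, 3):.2f} GiB")  # 931.32
print(f"1PB disk = {decimal_to_binary_units(1, 5, 4):.2f} TiB")  # 909.49
```

Each step up in prefix multiplies the ratio by another 1000/1024, which is why the shortfall compounds as the units get larger.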

As you can see above, a new naming convention has been created to describe storage capacity in binary terms. So what the computer treats as a terabyte (2^40 bytes) is properly called a tebibyte, or TiB. The problem is that I feel like the only guy using the term; that may be why I feel like I'm having this conversation all the time.

To make matters worse, in enterprise storage systems there is additional overhead consumed by things like storage system metadata. In some cases this additional overhead can consume more than 10% of the purchased capacity in TiB. I doubt there's any changing the disk industry at this point, but you can challenge your enterprise storage system suppliers to tell you about the usable capacity of their system in tebibytes - and when they ask what that is (because they likely will), send them here. I've helped my customers decode the real capacity of my competitors' systems while those competitors struggled to accurately describe a terabyte.
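To get a feel for how the two effects stack up, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function name, the 24-drive configuration, and the flat 10% overhead are hypothetical, and it ignores RAID, parity and spares entirely. It is not how any vendor's sizing tool works.

```python
def usable_tib(drive_count, drive_tb, overhead_fraction):
    """Rough usable capacity in TiB.

    drive_count:       number of drives (hypothetical configuration)
    drive_tb:          marketed size of each drive in decimal TB (10^12 bytes)
    overhead_fraction: share of capacity lost to system overhead (assumed flat)
    """
    raw_bytes = drive_count * drive_tb * 10 ** 12
    raw_tib = raw_bytes / 2 ** 40
    return raw_tib * (1 - overhead_fraction)

# Hypothetical example: 24 drives marketed as 1TB each, 10% overhead.
raw = usable_tib(24, 1, 0.0)
usable = usable_tib(24, 1, 0.10)
print(f"marketed: 24 TB, raw: {raw:.2f} TiB, usable: {usable:.2f} TiB")
```

Under those assumptions, "24TB" of purchased disk ends up as a bit under 20 TiB you can actually use, before any data protection is configured.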

If you'd prefer to buy storage from a company that's transparent, look no further than Oracle. The size calculator that I maintain (the latest release as of this writing is [URL="http://fixunix.com/rdm/entry/sizing_j4410"]here[/URL]) for our 7000 series tells you *exactly* how much capacity you will be able to use when you power up the system including all overhead, and allows you to see how much capacity you will have later when you expand your system.
