I just read a blog entry that made me grin:


Amazon using FedEx to upload large amounts of data to S3 cloud

I still remember the days of token ring networks, which were so complicated that you often couldn't transfer data between computers even though they were connected. Then there is the hassle of setting up a network share with the right file permissions and the right firewall rules. Even today, that hassle forces people, myself included, to resort to a USB drive to move data.


I thought the story above was pretty funny, though. Even with all this abundant bandwidth, we still find ourselves in situations where it is cheaper and easier to move data by physical means, such as shipping it with FedEx.


Something I feel we will need to contend with very shortly is the sheer volume of data we are storing and managing today. Just one of my machines at home has four 1TB drives, and I am using about 1.5TB of that. Have you looked at how LONG it takes to back up this data, or even just to copy it from one drive to another? A few months back I upgraded to a new Mac, and it took 4.5 hours just to replicate my user directory to the new machine. If it's this time-consuming to do locally, how can we expect to do it remotely?


I think we need to start working on tools that let us manage all this data, so we can figure out what we have, weed out duplicates, and use other such tricks to control how much storage we actually need. I think it is a fallacy that data storage is free. A 1TB disk drive costs next to nothing, but the drive is not where the cost is. The cost is in backing the data up, dealing with data loss and, more importantly, finding a use for it. Even today, when I search just my laptop, it can take quite some time to find the "data"/document I am looking for. Finding the right keyword or search phrase, and trusting that my index is up to date and will surface what I need, is a real pain. Storing data without an easy way to attach metadata/context to it makes all this data retention much less useful than it seems.
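As a rough illustration of the duplicate-finding side of this, here is a minimal Python sketch of my own, not an existing product. It assumes that two files with the same size and the same SHA-256 hash of their contents are duplicates. It groups files by size first, so only files that could possibly match get read and hashed. It skips symlinks and does not handle unreadable files (those would raise a `PermissionError`). The names `file_digest` and `find_duplicates` are just hypothetical illustrations.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files don't have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups (sorted lists) of paths under `root` with identical contents."""
    # Pass 1: group regular files by size; different sizes can't be duplicates.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share their size with at least one other file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[file_digest(path)].append(path)

    # Keep only hashes shared by two or more files.
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling something like `find_duplicates(os.path.expanduser("~/Documents"))` would return a list of groups of identical files. Deciding which copy to keep (and attaching the metadata/context that makes the survivors findable) is still left to you, which is exactly the gap I think tools should fill.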


Oh well, this started out as a chuckle about sneaker nets and turned into a bit of a rant. Nevertheless, I think data storage and retrieval are critical problems that the industry needs to help us with.