Long term archival storage - Storage


Thread: Long term archival storage

  1. Long term archival storage

    I have around 20 TB of data that I want to store for a very long time
    (>50 years) and also have available for search and download.

    The data consists of two types:

    (a) the preservation masters, which are the data we want to keep and
    are in tiff and bwf formats, among others

    (b) the viewing copies, which are in derived formats such as png and
    mp3.

    I am coming down to an HSM type of solution with a large enough
    front-end cache to allow us to keep the viewing copies online at all
    times, but which allows the archival copies to disappear off to tape
    to be cloned, duplicated, etc.

    Anyone else doing this? Anyone got a better idea?

    -dgm

  2. Re: Long term archival storage

    doug.moncur@gmail.com (dgm) writes:
    > I have around 20 Tb of data that I (a) want to store for a very (>50
    > years)long time and also have available for search and download...
    > (a) the preservation masters which is the data we want to keep and is
    > in tiff and bwf formats among others>
    > (b) the viewing copies which are in derived formals such as png and mp3.


    You have a bunch of images and audio recordings. If the images are
    scanned paper docs, the tiff files won't be substantially smaller than
    the png files. If they're photographs, you probably want to view jpg
    rather than png. In that case the mp3 and jpg files will probably be
    less than 1/10th the size of the originals, or about 2 TB total, not
    much at all. A small RAID disk system can hold that much easily, on
    say six 400GB drives.
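
    As a back-of-the-envelope check on that estimate (the 10:1 compression
    ratio and the drive size are assumptions for illustration, not
    measurements):

    # Rough sizing of the on-line viewing copies, assuming ~10:1 compression.
    masters_tb = 20.0              # preservation masters (tiff, bwf)
    derived_tb = masters_tb * 0.1  # jpg/mp3 derivatives at roughly 1/10th the size

    drive_gb = 400                 # example drive size used above
    data_drives = derived_tb * 1000 / drive_gb
    print(f"~{derived_tb:.0f} TB of viewing copies, ~{data_drives:.0f} x {drive_gb} GB data drives")
    # -> ~2 TB of viewing copies, ~5 x 400 GB data drives (six with parity/spare)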

    > I am coming down to an HSM type of solution with a large enough
    > front-end cache to allow us to keep the viewing copies online at all
    > times, but which allows the archival copies to disappear off to tape
    > to be cloned, duplicated, etc.


    Sounds kind of complicated. Where's this data now, how is it stored,
    and how fast are you adding to it and through what kind of system? 20
    TB isn't really big storage these days. You could have a small tape
    library online and move incoming raw data to tape immediately while
    also making the online viewing copies on disk. HSM systems with
    automatic migration and retrieval are probably overkill.
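
    A minimal sketch of that split ingest path (the directory layout and
    the tape-staging step are assumptions about how such a pipeline might
    be organised, not a description of any particular product):

    import shutil
    from pathlib import Path

    INCOMING = Path("/data/incoming")       # drop directory for newly digitised masters
    TAPE_STAGING = Path("/data/tape-out")   # masters queued here for the tape library
    VIEW_ONLINE = Path("/data/view")        # derived viewing copies stay on disk

    def ingest(master: Path) -> None:
        """Stage a new master for tape; viewing copies are derived separately on disk."""
        # Copy the master to the tape staging area. A separate job writes the
        # staged files to tape (ideally two copies, one offsite) and only then
        # removes them from disk.
        shutil.copy2(master, TAPE_STAGING / master.name)
        print(f"staged {master.name} for tape; derive its viewing copy under {VIEW_ONLINE}")

    for f in sorted(INCOMING.iterdir()):
        if f.is_file():
            ingest(f)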

  3. Re: Long term archival storage

    It depends; a lot of companies have regulatory requirements that data
    be kept for a long time. If it needs to be accessible quickly, the ATA
    drive solution is advisable. I think the cost is around $7 per GB for
    something like NetApp NearStore solutions. The particular product that
    would simulate a WORM drive is called SnapLock.


  4. Re: Long term archival storage

    >50 years is a long time by archival standards. Most media/drive
    manufacturers guarantee 30 years under nominal conditions. You'll have
    to store your masters under optimal conditions and check them regularly
    (read them every couple of years, check the error rate, copy if needed,
    etc.) to go beyond that. Which raises the question of where you'll find
    the equipment (spares, service) a couple of decades down the road.
    Example from my perspective: if you have DLTtape III tapes lying around
    from the early nineties, you'd better do something about them now, since
    Quantum EOL'd the DLT8000, which is the last generation that will read
    those tapes. Service will be available for another 5 years. That's 20
    of the 50 years.

    Basically you'll have to copy the data to state-of-the-art media about
    every decade. Don't try to store spare drives for the future; that
    doesn't usually work - electromechanical devices age when they're not
    in use too.

    There have been numerous stories about the problems NASA has retrieving
    old data recordings. Your project will face the same. Fortunately 20 TB
    isn't a big deal any more and will be less so in the future. The front
    end doesn't really matter, but the archive will need a lot of thought
    and care. Think what the state of the art was 50 years ago.
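
    A sketch of the regular read-back check being described, assuming the
    archive keeps a checksum manifest alongside the masters (the manifest
    format and the paths here are illustrative, not from any particular
    product):

    import hashlib
    from pathlib import Path

    ARCHIVE = Path("/archive/masters")        # on-line (or freshly restored) copy of the masters
    MANIFEST = ARCHIVE / "manifest.sha256"    # lines of "<hex digest>  <relative path>"

    def sha256_of(path: Path) -> str:
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    # Re-read every file and compare against the stored digest; anything that
    # fails should be restored from one of the duplicate copies.
    failures = 0
    for line in MANIFEST.read_text().splitlines():
        digest, rel = line.split(maxsplit=1)
        if sha256_of(ARCHIVE / rel) != digest:
            failures += 1
            print(f"CHECK FAILED: {rel}")

    print(f"verified manifest, {failures} failure(s)")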


  5. Re: Long term archival storage

    "RPR" wrote in message
    news:1112123184.224224.230810@g14g2000cwa.googlegr oups.com...
    > Basically you'll have to copy the data to state of the art media about
    > every decade. Don't try to store spare drives for the future, that
    > doesn't usually work - electromechanical devices age when they're not
    > in use too.


    In addition to Ralf-Peter's comment, you'd better think long and hard about
    how you will be accessing that data 50 years from now, from an application
    point of view. 50 years from now, the computing devices will be radically
    different from today's PCs. Unless you have documented every bit about the
    format of the files you stored and the environment you need to recreate the
    information, even migration to state-of-the-art media will not help.

    Consider a WordPerfect 4.2 file from 20 years ago. You'll need some effort
    today to open and read such a file. Because the format is relatively simple,
    you can still read the text using any hex editor. But recreating the page
    formatting may be harder already.

    Now consider your MP3 and picture files, which are heavily encoded and
    compressed, and fast forward to the year 2055. Unless you know exactly how
    they are recreated, all you'll have 50 years from now is a bunch of zeroes
    and ones. This is scary for single files, but things are even worse when
    multiple files form a single context. Think databases with external
    pointers. Think HTML files with web links. How much of that will exist 50
    years from now?

    For permanent long-term records, store the information on a medium that can
    be interpreted by the most universal and long-term computer you have - the
    one between your ears. Microfiche and dead trees aren't obsolete just
    yet...
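
    For illustration, that kind of "document the format and the
    environment" record could live in a sidecar file next to each master;
    the field names below are made up for the example, not a standard
    schema:

    import json
    from pathlib import Path

    # Illustrative sidecar record stored next to a preservation master.
    record = {
        "master": "audio/interview_0042.bwf",
        "format": "Broadcast Wave Format (BWF), PCM 24-bit, 96 kHz, 2 channels",
        "format_spec": "EBU Tech 3285 (publicly documented)",
        "created": "2005-03-28",
        "checksum": "<digest of the master file, e.g. SHA-256>",
        "derived_copies": ["view/interview_0042.mp3"],
        "rendering_notes": "Any BWF/WAV-capable player; reference decoder source archived alongside.",
    }

    Path("interview_0042.bwf.json").write_text(json.dumps(record, indent=2))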

    Rob



  6. Re: Long term archival storage

    On 28 Mar 2005 19:49:21 -0800, doug.moncur@gmail.com (dgm) wrote:

    >I have around 20 TB of data that I want to store for a very long time
    >(>50 years) and also have available for search and download.
    >
    >The data consists of two types:
    >
    >(a) the preservation masters, which are the data we want to keep and
    >are in tiff and bwf formats, among others
    >
    >(b) the viewing copies, which are in derived formats such as png and
    >mp3.
    >
    >I am coming down to an HSM type of solution with a large enough
    >front-end cache to allow us to keep the viewing copies online at all
    >times, but which allows the archival copies to disappear off to tape
    >to be cloned, duplicated, etc.
    >
    >Anyone else doing this? Anyone got a better idea?
    >
    >-dgm



    Other replies have made several good points. Here's what we did at a
    former employer.

    All archived data was stored on NetApp NearStore (any cheap disk will
    do, though). No ifs, ands, or buts. The reason is that whenever the
    next disk upgrade comes in, the data is migrated along with it: no
    issue of recovery or of a media type no longer being available; the
    data set follows the technology.
    Disks were more expensive than tape (and may still be), but the
    guarantee of being able to at least access the data was worth it. As
    someone pointed out, you still have to deal with the application to
    read it, but that can be tested along the way much more easily if it's
    on disk. Heck, you could even package the app with the data; that's
    what we did.

    And as technology progresses, no matter what the main storage media
    type is, there will always be migration techniques. Any vendor wanting
    you to migrate from your 15PB 4-billion-k magnetic drives to their
    solid light storage will provide a migration path, guaranteed.

    The data can be backed up to tape for DR as you see fit. We sent
    copies offsite just for "smoking hole" purposes, but mostly they were
    rotated weekly.

    As a proof of concept we did a forklift upgrade from the R100 to the
    R200. Just roll in the new and roll out the old. We went from 144GB
    drives to 266GB drives, so the existing data set took up about half of
    what it did before. This will always be the case.
    The migration went fine, and data that is now 8 years old is still
    spinning away on new disk with its applications. Now whether or not
    anyone knows how to work the app is another issue...

    ~F

  7. Re: Long term archival storage

    On Tue, 29 Mar 2005 23:36:05 +0200, "Rob Turk"
    <_wipe_me_r.turk@chello.nl> wrote:

    >"RPR" wrote in message
    >news:1112123184.224224.230810@g14g2000cwa.googlegr oups.com...
    >> Basically you'll have to copy the data to state of the art media about
    >> every decade. Don't try to store spare drives for the future, that
    >> doesn't usually work - electromechanical devices age when they're not
    >> in use too.

    >
    >In addition to Ralf-Peter's comment, you better think long and hard about
    >how you will be accessing that data 50 years from now, from an application
    >point of view. 50 years from now, the computing devices will be radically
    >different from today's PC's. Unless you have documented every bit about the
    >format of the files you stored and the environment you need to recreate the
    >information, even migration to state of the art media will not help.
    >
    >Consider a Word Perfect 4.2 file from 20 years ago. You'll need some effort
    >today to open and read such a file. Because the format is relatively simple,
    >you can still read the text using any hex editor. But recreating the page
    >formatting maybe harder already.


    Ok so a lot of converters do an incomplete job, but is this really so
    complicated? Save a copy of the application(s) and maybe the OS that
    ran it with the data. Between backwards compatibility and improving
    emulation technology it might be more doable than you think.

    Also keeping data for 50 years doesn't necessarily imply keeping
    storage devices for 50 years. Periodic upgrades of the storage and
    maybe even the file format of the data might be what needs to happen
    to realistically keep useable information for many decades. A major
    overhaul like this around every 10 years seems to be working for me
    pretty well. Waiting 15 years or more tends to be problematic.

    Your mileage may vary and, well, the past is not always a good
    indicator of the future.

  8. Re: Long term archival storage

    Faeandar writes:
    > All archived data was stored on NetApp NearStore (any cheap disk will
    > do, though). No ifs, ands, or buts. The reason is that whenever the
    > next disk upgrade comes in, the data is migrated along with it: no
    > issue of recovery or of a media type no longer being available; the
    > data set follows the technology.


    You seriously think you'll still be using that Netapp stuff in 2055?

  9. Re: Long term archival storage

    Curious George writes:
    > >Consider a WordPerfect 4.2 file from 20 years ago. You'll need
    > >some effort today to open and read such a file. Because the format
    > >is relatively simple, you can still read the text using any hex
    > >editor. But recreating the page formatting may be harder already.

    >
    > Ok so a lot of converters do an incomplete job, but is this really so
    > complicated? Save a copy of the application(s) and maybe the OS that
    > ran it with the data. Between backwards compatibility and improving
    > emulation technology it might be more doable than you think.


    I would say that most of these conversion problems have stemmed from
    secret, undocumented formats. Formats like jpg and mp3, which are well
    documented and have reference implementations available as free source
    code, should be pretty well immune to the problems.
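
    For example, regenerating image viewing copies from tiff masters takes
    only a few lines with an open-source implementation; a sketch using the
    Python Imaging Library (PIL/Pillow), with the paths being illustrative
    only:

    from pathlib import Path
    from PIL import Image  # open-source reader/writer for TIFF, PNG, JPEG, ...

    masters = Path("/archive/masters/images")
    viewing = Path("/data/view/images")

    # Re-derive the PNG viewing copies from the TIFF preservation masters.
    for tif in masters.glob("*.tif"):
        with Image.open(tif) as img:
            img.save(viewing / tif.with_suffix(".png").name)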

  10. Re: Long term archival storage

    On 29 Mar 2005 20:26:17 -0800, Paul Rubin wrote:

    >Faeandar writes:
    >> All archived data was stored on NetApp NearStore (any cheap disk will
    >> do, though). No ifs, ands, or buts. The reason is that whenever the
    >> next disk upgrade comes in, the data is migrated along with it: no
    >> issue of recovery or of a media type no longer being available; the
    >> data set follows the technology.

    >
    >You seriously think you'll still be using that Netapp stuff in 2055?


    It doesn't matter... he's saying that ANY vendor that wants to sell
    you some storage is going to provide a migration facility to do this.
    For example... EMC will happily migrate data off NetApp devices into
    Clariion or Symmetrix today.

    If they didn't do this, it would make it really hard to convince
    heavily entrenched users to move.

    We all know that disk-based storage will need to be migrated every 5
    years or so. As we know we're going to migrate, we're also sure that
    it's going to be possible to do this, rather than waiting for the
    technology to fall over before doing something about it.

    So in 2055 it could be NetApp, EMC, or ACME Storage Corp... it doesn't
    matter.

    HVB.

  11. Re: Long term archival storage

    Check out this article.
    http://www.infoconomy.com/pages/stor...roup101451.adp

    Basically: keep digital information for present-term access, understanding
    that it will need a migration of the back-end storage platform and a
    translation of the front-end software and data into whatever is the
    current technical lingua franca.

    For long-term storage and DR, use microfilm rated at 250 years.

    There was a law enacted by the legislatures of the UK and US for
    long-term census records.

    So, there is your answer: everyone is right.

    Wanna get really cool? There are ideas floating around for laser
    lithographs on ceramic disks at the microscopic level, storage which
    would last thousands of years. I personally like that, but I'm a geek
    with an interest in history.


  12. Re: Long term archival storage

    In article <1112237441.536966.23640@o13g2000cwo.googlegroups.com>,
    boatgeek wrote:
    >Check out this article.
    >http://www.infoconomy.com/pages/stor...roup101451.adp
    >
    >Basically: keep digital information for present-term access, understanding
    >that it will need a migration of the back-end storage platform and a
    >translation of the front-end software and data into whatever is the
    >current technical lingua franca.
    >
    >For long-term storage and DR, use microfilm rated at 250 years.
    >
    >There was a law enacted by the legislatures of the UK and US for
    >long-term census records.
    >
    >So, there is your answer: everyone is right.
    >
    >Wanna get really cool? There are ideas floating around for laser
    >lithographs on ceramic disks at the microscopic level, storage which
    >would last thousands of years. I personally like that, but I'm a geek
    >with an interest in history.
    >



    microscopic laser pits on nickel sheets.

    But seriously, folks....


    Emulation and virtual machines are going to be the salvation for
    recovering ancient applications and data. As has been pointed out,
    having a database dump does you no good unless you can run the
    application. Now it can be done in emulation. Current computers are
    so much faster than the machines we emulate that performance can be
    decent even if you have to emulate the entire instruction set.

    There will always be some service shop that will read your old media
    (if it's readable) and burn it onto a CD-R (or whatever the media is
    years from now), and as long as you have the OS, application and data
    you'll be good to go.


    Take a look at this for a list of emulators for machines that haven't
    existed outside museums for decades.

    http://simh.trailing-edge.com/


    I've owned (as a corporate manager) several of the machines on this
    list and played with an emulator. A machine that cost close to a
    million bucks in 1978 and sucked about 30kW runs slower than its
    emulator does on my PC, at least for a single user.

    The PC emulator of the IBM 370 (http://www.conmicro.cx/hercules/) was
    used big-time by corporations in 1999 for testing Y2K conversions.




    --
    a d y k e s @ p a n i x . c o m

    Don't blame me. I voted for Gore.

  13. Re: Long term archival storage


    Paul Rubin wrote:
    [...]

    > Sounds kind of complicated. Where's this data now, how is it stored,
    > and how fast are you adding to it and through what kind of system? 20
    > TB isn't really big storage these days. You could have a small tape
    > library online and move incoming raw data to tape immediately while
    > also making the online viewing copies on disk. HSM systems with
    > automatic migration and retrieval are probably overkill.


    It is kind of complicated. Currently we have 6 TB digitised and are
    adding 0.1 TB/week.

    Now this is data that needs to be kept forever - the audio material is
    world-heritage stuff. The driver for using HSM is twofold:

    1) keeping multiple copies securely, including offsite
    2) we know we have a 900kg gorilla called video waiting in the wings
    ....
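
    For scale, a quick projection from the figures above (6 TB now, 0.1
    TB/week, and ignoring the video "gorilla"):

    # Rough growth projection from the numbers quoted in this post.
    current_tb = 6.0
    weekly_tb = 0.1
    target_tb = 20.0

    weeks = (target_tb - current_tb) / weekly_tb
    print(f"~{weeks:.0f} weeks (~{weeks / 52:.1f} years) to reach {target_tb} TB at the current rate")
    # -> ~140 weeks (~2.7 years) before video adds anything at all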


  14. Re: Long term archival storage

    The point about file formats is well made, but we've been through the
    same argument in detail already. We're choosing file formats which are
    publicly described and for which there are multiple (open source)
    clients.

    The idea is to ensure that we have the format description and enough
    example code to be able to recreate viewers in the future. That's why
    we're using tiff and bwf as the archival masters. I don't care about
    the mp3s, as they are *derived* copies - we can just as easily use Ogg
    Vorbis, or whatever we're using in 2055, as long as we can parse the
    original compression-free datastream.
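
    As an illustration of regenerating derived copies from the
    compression-free masters, here is a minimal sketch that shells out to
    ffmpeg to make an Ogg Vorbis viewing copy from a BWF/WAV master (ffmpeg
    and the quality setting are assumed tooling, not part of the original
    workflow):

    import subprocess
    from pathlib import Path

    def make_viewing_copy(master_wav: Path, out_dir: Path) -> Path:
        """Derive a lossy Ogg Vorbis viewing copy from a lossless BWF/WAV master."""
        target = out_dir / master_wav.with_suffix(".ogg").name
        # -c:a libvorbis selects the Vorbis encoder; -q:a 5 is a mid-range quality level.
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(master_wav), "-c:a", "libvorbis", "-q:a", "5", str(target)],
            check=True,
        )
        return target

    # If the preferred viewing format changes in 2055, only this step changes;
    # the masters themselves are never touched.
    for wav in Path("/archive/masters/audio").glob("*.wav"):
        make_viewing_copy(wav, Path("/data/view/audio"))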

