Very Large Filesystems - Storage


Thread: Very Large Filesystems

  1. Very Large Filesystems

    Following some research I've been doing on the matter across
    newsgroups and mailing lists, I'd be glad if people could share
    numbers about real-life large filesystems and their experience with
    them. I'm slowly coming to the realization that, regardless of
    theoretical filesystem capabilities (1TB, 32TB, 256TB or more), more
    or less across the enterprise filesystem arena people recommend
    keeping practical filesystems to 1TB or less, for manageability and
    recoverability.

    What's the maximum filesystem size you've used in production
    environment? How did the experience come out?

    Thanks,
    -Yaniv


  2. Re: Very Large Filesystems

    On 28 Apr 2007 02:21:30 -0700, Aknin wrote:

    >Following some research I've been doing on the matter across
    >newsgroups and mailing lists, I'd be glad if people could share
    >numbers about real life large filesystem and their experience with
    >them. I'm slowly coming to a realization that regardless of
    >theoretical filesystem capabilities (1TB, 32TB, 256TB or more), more
    >or less across the enterprise filesystem arena people are recommending
    >to keep practical filesystems up to 1TB in size, for manageability and
    >recoverability.
    >
    >What's the maximum filesystem size you've used in production
    >environment? How did the experience come out?
    >
    >Thanks,
    > -Yaniv


    The true constraint, as you've pointed out, is recoverability. If you
    need to recover an entire file system in any sane amount of time,
    16TB and bigger is out of the question.

    I think 3-4TB is fine with today's tape drive speeds but there may be
    limitations from your backup software. I recall hearing a limit of
    4TB per NDMP stream for NBU.
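
    To put rough numbers on that (illustrative only; the drive speed,
    efficiency and stream count below are assumptions to replace with
    your own):

      # Back-of-envelope restore time: filesystem size / effective tape
      # throughput. LTO-3 is ~80 MB/s native; assume ~60% of that is
      # actually sustained per restore stream.
      def restore_hours(fs_tb, drive_mb_s=80, efficiency=0.6, streams=1):
          mb = fs_tb * 1024 * 1024
          return mb / (drive_mb_s * efficiency * streams) / 3600

      for size_tb in (1, 4, 16, 36):
          print("%2d TB: %6.1f h (1 stream), %6.1f h (4 streams)"
                % (size_tb, restore_hours(size_tb),
                   restore_hours(size_tb, streams=4)))

    On those assumptions 4TB is roughly a day on a single stream, while
    16TB and up runs to several days unless you can drive many restore
    streams in parallel.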

    You could go higher, I think, if you have a directory structure that
    allows for recovery prioritization. If you have a 36TB file system
    but you know that these 9 directories are the priority, then you
    really only have a recovery limit of those 9 directories. The rest
    can be done as time permits.

    ~F

  3. Re: Very Large Filesystems

    Faeandar wrote:
    > On 28 Apr 2007 02:21:30 -0700, Aknin wrote:
    >
    >> Following some research I've been doing on the matter across
    >> newsgroups and mailing lists, I'd be glad if people could share
    >> numbers about real life large filesystem and their experience with
    >> them. I'm slowly coming to a realization that regardless of
    >> theoretical filesystem capabilities (1TB, 32TB, 256TB or more), more
    >> or less across the enterprise filesystem arena people are recommending
    >> to keep practical filesystems up to 1TB in size, for manageability and
    >> recoverability.
    >>
    >> What's the maximum filesystem size you've used in production
    >> environment? How did the experience come out?
    >>
    >> Thanks,
    >> -Yaniv

    >
    > The true constraint, as you've pointed out, is recoverability. If you
    > need to recover and entire file system in any sane amount of time 16TB
    > and bigger is out of the question.


    That really depends on how you're recovering it, which in turn depends
    on what kind of problem you need to recover it from.

    If you're talking about restoring from backup tapes, fine. If you're
    talking about recovery from backup disks (plus a few recent
    incrementals, whether on disk or on tape, that can be applied directly
    to them to recreate the running system), you can probably go larger.
    If you're talking about recovery using a synchronous replication
    site, no size limit exists at all, though you need (a) snapshot or
    CDP facilities to ensure that common corruption at both sites can be
    quickly backed out, and (b) *real* confidence in the software not to
    have introduced system-level corruption at both sites; the latter can
    in part be addressed by using logical inter-site mirroring with
    different software implementations at the two sites.

    As the required software matures, CDP in combination with inter-site
    synchronous replication (or low-delay asynchronous replication plus
    local logging to cover the gap for anything save complete primary site
    destruction) should help make backups as obsolete as paper tape:
    decreasing hardware costs for such services should make the management
    costs of backups (let alone their effect on recovery-time objectives)
    increasingly untenable.

    - bill

  4. Re: Very Large Filesystems

    On Apr 30, 12:48 am, Bill Todd wrote:
    > Faeandar wrote:
    > > On 28 Apr 2007 02:21:30 -0700, Aknin wrote:

    >
    > >> Following some research I've been doing on the matter across
    > >> newsgroups and mailing lists, I'd be glad if people could share
    > >> numbers about real life large filesystem and their experience with
    > >> them. I'm slowly coming to a realization that regardless of
    > >> theoretical filesystem capabilities (1TB, 32TB, 256TB or more), more
    > >> or less across the enterprise filesystem arena people are recommending
    > >> to keep practical filesystems up to 1TB in size, for manageability and
    > >> recoverability.

    >
    > >> What's the maximum filesystem size you've used in production
    > >> environment? How did the experience come out?

    >
    > >> Thanks,
    > >> -Yaniv

    >
    > > The true constraint, as you've pointed out, is recoverability. If you
    > > need to recover and entire file system in any sane amount of time 16TB
    > > and bigger is out of the question.

    >
    > That really depends on how you're recovering it, which in turn depends
    > on what kind of problem you need to recover it from.
    >
    > If you're talking about restoring from backup tapes, fine. If you're
    > talking about recovery from backup disks (plus a few recent
    > incrementals, whether on disk or on tape, that can be applied directly
    > to them to recreate the running system), you can usually probably go
    > larger. If you're talking about recovery using a synchronous
    > replication site, no size limit exists at all (though you need a)
    > snapshot or CDP facilities to ensure that common corruption at both
    > sites can be quickly backed out and b) *real* confidence in the software
    > not to have introduced system-level corruption at both sites, though the
    > latter can in part be addressed by using logical inter-site mirroring
    > with different software implementations at the two sites).
    >
    > As the required software matures, CDP in combination with inter-site
    > synchronous replication (or low-delay asynchronous replication plus
    > local logging to cover the gap for anything save complete primary site
    > destruction) should help make make backups as obsolete as paper tape:
    > decreasing hardware costs for such services should make the management
    > costs of backups (let alone their effect on recovery-time objectives)
    > increasingly untenable.
    >
    > - bill


    The system in question is made of millions (sometimes more) of small
    files. Corruption in any particular file isn't troublesome, nor even
    in hundreds of files. The block device is mirrored and is stored on
    expensive SAN arrays that are trusted not to choke and die, and
    snapshots can be taken at regular intervals.

    As you can probably understand, the number of files times the
    capacity (tens of TBs and growing...) makes backups quite
    irrelevant, and what we're counting on (perhaps unjustifiably) is the
    mirroring and the snapshotting. We trust the system in the sense that
    it's too stupid to do anything very wrong: it works at the file level
    and is exceedingly unlikely to corrupt more than a file (or two, or a
    hundred - but no more) at a time.

    What /is/ worrying to me is silent filesystem corruption that will at
    some point jump and bite my arse. Filesystem corruption will cause
    prompt snapshot rollback and incremental recovery*, but I'm worried
    about rolling back only to discover the filesystem was already
    corrupted at the time of the snap. I don't have room for much more
    than one or two snaps.
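
    Something along these lines (a sketch only: the paths are
    hypothetical, it checks file payloads rather than filesystem
    metadata, and anything that legitimately changed since the manifest
    was built will show up as bad too) would at least tell you whether
    the data was intact when a snap was taken:

      # Build a checksum manifest of a tree, then re-verify it before a
      # snapshot, so a later rollback target is known to have been clean
      # (at least at the file-content level) when it was taken.
      import hashlib
      import os
      import sys

      def walk_checksums(root):
          for dirpath, _dirs, files in os.walk(root):
              for name in files:
                  path = os.path.join(dirpath, name)
                  h = hashlib.md5()
                  with open(path, 'rb') as f:
                      for chunk in iter(lambda: f.read(1 << 20), b''):
                          h.update(chunk)
                  yield path, h.hexdigest()

      def build(root, manifest):
          with open(manifest, 'w') as out:
              for path, digest in walk_checksums(root):
                  out.write('%s  %s\n' % (digest, path))

      def verify(root, manifest):
          expected = {}
          with open(manifest) as f:
              for line in f:
                  digest, path = line.rstrip('\n').split('  ', 1)
                  expected[path] = digest
          return [p for p, d in walk_checksums(root)
                  if p in expected and d != expected[p]]

      if __name__ == '__main__':
          # usage: scrub.py build|verify <root> <manifest>
          cmd, root, manifest = sys.argv[1:4]
          if cmd == 'build':
              build(root, manifest)
          else:
              bad = verify(root, manifest)
              print('\n'.join(bad) if bad else 'clean')

    Scrubbing tens of TB in full on every snap obviously isn't realistic;
    in practice you'd sample, or scrub only the subset that is supposed
    to be immutable.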

    So you see the most complex part of my scenario is the filesystem,
    rather than the system, and tape backup is totally impractical even
    for sizes much smaller than 4TB.

    Does that change your advice?


  5. Re: Very Large Filesystems

    Aknin wrote:
    > What /is/ worrying to me is silent filesystem corruption that will at
    > some point jump and bite my arse.


    Which leads to the suggestion that your files are best kept across a
    number of independent filesystem 'domains', so as to contain the
    possible effects of any corruption. This would seem a reasonable
    suggestion, with the proviso that the 'domains' are genuinely
    independent and not sitting on the same SAN, fileserver, etc. You
    also need to be confident of detecting the corruption as soon as
    possible, for the reasons that you outline.

    It seems that the only logical solution is automatic checksumming
    coupled with redundancy, in the manner that ZFS does. No doubt this
    feature will be found in other filesystems in the future.
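
    The mechanism in miniature is just: keep a checksum alongside the
    data plus a second copy, and on read let whichever copy matches the
    checksum win and repair the other. A toy sketch of the principle
    (nothing to do with how any real filesystem lays things out):

      # Checksum + redundancy self-healing in miniature: two copies of a
      # block plus an independently stored digest; a read returns the
      # copy that matches and rewrites the one that doesn't.
      import hashlib

      def digest(data):
          return hashlib.sha256(bytes(data)).hexdigest()

      class MirroredBlock(object):
          def __init__(self, data):
              self.copies = [bytearray(data), bytearray(data)]
              self.expected = digest(data)

          def read(self):
              for i, copy in enumerate(self.copies):
                  if digest(copy) == self.expected:
                      other = 1 - i
                      if digest(self.copies[other]) != self.expected:
                          self.copies[other] = bytearray(copy)  # heal
                      return bytes(copy)
              raise IOError('both copies bad: time for the backups')

      blk = MirroredBlock(b'payload')
      blk.copies[0][0] ^= 0xFF                   # simulate silent corruption
      assert blk.read() == b'payload'            # good copy served
      assert bytes(blk.copies[0]) == b'payload'  # bad copy repaired

    The crucial bit is that the checksum is stored and verified
    independently of the data it covers; that is what turns silent
    corruption into detected, and here repairable, corruption.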

    ESB

  6. Re: Very Large Filesystems

    On 2007-04-30, Ernst S Blofeld wrote:
    >
    > It seems that the only logical solution is automatic checksumming
    > coupled with redundancy, in the manner that ZFS does.


    This blind trust in ZFS amazes me... ZFS will have bugs, and will get
    corrupted like any other file system, and then you'll need your
    backups. Also, when the automatic checksumming finds corruption,
    you'll need your backups.

    So the answer is backup (online, nearline, offline, whatever), and
    spread your files over many smallish fs's to reduce the time to
    recover from a fs corruption.


    -jf

  7. Re: Very Large Filesystems

    Jan-Frode Myklebust wrote:
    > On 2007-04-30, Ernst S Blofeld wrote:
    >> It seems that the only logical solution is automatic checksumming
    >> coupled with redundancy, in the manner that ZFS does.

    >
    > This blind trust in ZFS amazes me..


    Perhaps as much as blind ignorance like yours amazes me. The main
    difference between us being that no one talking about ZFS in any way
    suggested trusting it blindly: ESB above suggested a mechanism *like*
    ZFS's (in case you're unaware of the fact, other more mature systems
    provide features of this nature), and I suggested that ZFS, while by no
    means mature, *might* still satisfy the expressed needs.

    > ZFS will have bugs, and get corrupted
    > like any other file system, and then you'll need your backups. Also, when
    > the automatic checksumming finds a corruption, you'll need your backups.


    Well, no: the redundancy is used to correct it. And in the unlikely
    event that the corruption was system-caused and hence loyally replicated
    by lower-level functions, that's what no-overwrite snapshotting is for:
    it would take a particularly pathological bug to subvert both the
    main-line data and the separate snapshots.

    >
    > So the answers is backup (online, nearline, offline, whatever)


    Since the original poster just told us that this is *not* a suitable
    answer, one can only assume that you're listening to him as poorly as
    you've apparently listened to others.

    - bill

  8. Re: Very Large Filesystems

    On 2007-04-30, Bill Todd wrote:
    >
    > Perhaps as much as blind ignorance like yours amazes me. The main
    > difference between us being that no one talking about ZFS in any way
    > suggested trusting it blindly: ESB above suggested a mechanism *like*
    > ZFS's (in case you're unaware of the fact, other more mature systems
    > provide features of this nature), and I suggested that ZFS, while by no
    > means mature, *might* still satisfy the expressed needs..


    To quote the OP:

    "What /is/ worrying to me is silent filesystem corruption that will at
    some point jump and bite my arse. Filesystem corruption will cause
    prompt snapshot rollback and incremental recovery*, but I'm worried
    about rolling back only to discover the filesystem was already
    corrupted at the time of the snap. I don't have room for much more
    than one or two snaps."

    Is there any other solution than backups, if neither the fs nor the
    two snaps can be trusted? I would argue that making your fs's as
    small as possible, to confine the damage, and keeping good backups is
    the best option. Why would tape backup be "totally impractical even
    for sizes much smaller than 4TB"?


    And quoting you from another recent thread:

    "Though (as I already noted) I don't have any direct experience with it,
    my impression is that people are using it in production systems
    successfully "

    "My impression is that *some* customers have workloads that have found
    ZFS to be very stable already, while others push corner cases that are
    still uncovering bugs."

    So you agree it's a fairly new fs where people are still uncovering
    bugs, and you have no direct experience with it yourself; do you
    still think it's the solution to the OP's worry about file system
    corruption?

    >
    >> So the answers is backup (online, nearline, offline, whatever)

    >
    > Since the original poster just told us that this is *not* a suitable
    > answer, one can only assume that you're listening to him as poorly as
    > you've apparently listened to others.


    He doesn't say much about why backups would be "totally impractical",
    so I'm suggesting that the best option (when you have fs corruption
    and the two snaps aren't good enough) is to spread the files over as
    many fs's as possible, to confine the damage and the number of files
    that have to be restored from backup.


    -jf

  9. Re: Very Large Filesystems

    Jan-Frode Myklebust wrote:
    > Is there any other solution than backups, if neither the fs nor the two
    > snaps can be trusted ? I would argue that making your fs's as small as
    > possible, to confine the damage, and keeping good backups is the best
    > option. Why would tape backup be "totally impractical even for sizes
    > much smaller than 4TB." ?


    Who said don't make backups? ZFS is not a backup solution but a
    filesystem with checksumming and redundancy features. I've never
    heard anyone seriously suggest that ZFS obviates the need for
    backups, not in this thread or anywhere else. Rant about non-issues
    elsewhere, please.

    As already pointed out, increasing the number of filesystems does not
    increase the protection, because you still have all the common modes
    of failure (including the software bugs that you are so apparently
    keen on). How much better off are a million files on a single
    filesystem than the same files on a thousand filesystems, if
    everything else remains equal? There is no meaningful difference at
    all.

    Moreover, backups do not address the OP's point - silent corruption.
    If you aren't checking your files, how can you have any confidence in
    your backups? A backup is as problematic in terms of integrity as the
    filesystem it is read from. Backing up a corrupt file doesn't fix it.

    You cannot avoid the need for checksumming to detect errors and
    redundancy to fix them. Putting these features directly in your
    filesystem is a good idea - integrity is maintained and there is fast
    recovery. The fact that there will be teething problems in ZFS or an
    equivalent filesystem is not a sound basis for rejecting these features.

    There will still be backups in the future too.

    ESB

  10. Re: Very Large Filesystems

    On May 1, 3:15 am, Ernst S Blofeld wrote:
    > Jan-Frode Myklebust wrote:
    > > Is there any other solution than backups, if neither the fs nor the two
    > > snaps can be trusted ? I would argue that making your fs's as small as
    > > possible, to confine the damage, and keeping good backups is the best
    > > option. Why would tape backup be "totally impractical even for sizes
    > > much smaller than 4TB." ?

    >
    > Who said don't make backups ? ZFS is not a backup solution but a
    > filesystem with checksumming and redundancy features. I've never heard
    > anyone seriously suggest that ZFS obviated the need for backups, not in
    > this thread or anywhere else. Rant about non-issues elsewhere please.
    >
    > As already pointed out, increasing the number of filesystems does not
    > increase the protection because you still have all the common modes of
    > failure (including the software bugs that you are so apparently keen
    > on). How much better off are a million files on a single filesystem
    > against the same files on a thousand filesystems if everything else
    > remains equal? There is no meaningful difference at all.
    >
    > Moreover backups do not address the OP's point - silent corruption. If
    > you aren't checking your files how can you have any confidence in your
    > backups? A backup is as problematic in terms of integrity as the
    > filesystem it is read from. Backing-up a corrupt file doesn't fix it.
    >
    > You cannot avoid the need for checksumming to detect errors and
    > redundancy to fix them. Putting these features directly in your
    > filesystem is a good idea - integrity is maintained and there is fast
    > recovery. The fact that there will be teething problems in ZFS or an
    > equivalent filesystem is not a sound basis for rejecting these features.
    >
    > There will still be backups in the future too.
    >
    > ESB


    I've cross-posted this question in several places, and practically
    all answers switched immediately to backup/restore issues. It seems
    that no one puts any real trust in filesystems: even if you have an
    expensive mirrored SAN, the system (the software managing the data)
    is too stupid to cause corruption (more about that in my previous
    post), and small amounts of data /may/ be lost without too much pain,
    people here (and on the VxFS ML, and on ZFS-discuss) still recommend
    backing up the filesystem (i.e., copying all its data to something
    with a different data structure than the filesystem itself,
    implicitly because the FS /will/ get corrupt at some point) or
    splitting it into smaller FSs (implicitly because then, if one of
    them gets corrupt, we can contain the damage and restore from
    backup).

    So it seems that 'we' always assume an FS will get corrupt, and that
    no amount of sophistication will prevent it, or at least not prevent
    corruption that amounts to a total loss. Would anyone here trust the
    filesystem (any filesystem, name your pick) enough to make a few (say
    3 or 4) 32TB monsters holding the above-mentioned kind of data,
    backed solely by snaps? If you feel that's not safe, what good are
    those gigantic, interconnected, multi-TB, super-expensive grid SANs
    if you can't mkfs more than a few TB without fear because of
    filesystem limitations?

    Thanks for your replies, they've been very interesting and useful so
    far!

    - Yaniv


  11. Re: Very Large Filesystems

    In article <1178001817.609578.20510@p77g2000hsh.googlegroups.com>,
    Aknin wrote:
    >On May 1, 3:15 am, Ernst S Blofeld wrote:
    >> Jan-Frode Myklebust wrote:
    >> > Is there any other solution than backups, if neither the fs nor the two
    >> > snaps can be trusted ? I would argue that making your fs's as small as
    >> > possible, to confine the damage, and keeping good backups is the best
    >> > option. Why would tape backup be "totally impractical even for sizes
    >> > much smaller than 4TB." ?

    >>
    >> Who said don't make backups ? ZFS is not a backup solution but a
    >> filesystem with checksumming and redundancy features. I've never heard
    >> anyone seriously suggest that ZFS obviated the need for backups, not in
    >> this thread or anywhere else. Rant about non-issues elsewhere please.
    >>
    >> As already pointed out, increasing the number of filesystems does not
    >> increase the protection because you still have all the common modes of
    >> failure (including the software bugs that you are so apparently keen
    >> on). How much better off are a million files on a single filesystem
    >> against the same files on a thousand filesystems if everything else
    >> remains equal? There is no meaningful difference at all.
    >>
    >> Moreover backups do not address the OP's point - silent corruption. If
    >> you aren't checking your files how can you have any confidence in your
    >> backups? A backup is as problematic in terms of integrity as the
    >> filesystem it is read from. Backing-up a corrupt file doesn't fix it.
    >>
    >> You cannot avoid the need for checksumming to detect errors and
    >> redundancy to fix them. Putting these features directly in your
    >> filesystem is a good idea - integrity is maintained and there is fast
    >> recovery. The fact that there will be teething problems in ZFS or an
    >> equivalent filesystem is not a sound basis for rejecting these features.
    >>
    >> There will still be backups in the future too.
    >>
    >> ESB

    >
    >I've cross-posted this question on several places, and practically all
    >answers switched immediately to backup/restore issues. It seems that
    >no-one puts any kind of trust in filesystems, in the sense that even



    Filesystems are not the problem. Hardware is.

    I've worked with many thousands of PC disks starting with the first
    release of NTFS, almost 15 years ago. I have never seen NTFS
    "corrupt" itself. All failures were traced to dying hardware. Sh*t
    happens. I have to admit that my experience with RAID is much less.

    I'd like to hear of documented cases of such NTFS problems.

    In any case, you need a strategy for backup and recovery of your data.
    Even if the filesystem is fine, the building can burn down.




    --
    a d y k e s @ p a n i x . c o m
    Don't blame me. I voted for Gore. A Proud signature since 2001

  12. Re: Very Large Filesystems

    In article <1177752090.144496.248950@n59g2000hsh.googlegroups.com>,
    Aknin wrote:
    >What's the maximum filesystem size you've used in production
    >environment? How did the experience come out?


    In the NCAR Mass Storage Service (MSS), a tape archive that is approaching
    3 PB in size, we currently have a disk cache of 48 TB on 4 FC->SATA RAIDs.
    I have it configured as 24 logical units of just under 2 TB each, each as
    a single Irix XFS file system. When a disk in the RAID fails, the controller
    can rebuild the RAID group in about 4-6 hours. Files written to the
    disk cache (between 112 KB and 1 GB in size) are usually written to tape
    within 24 hours. Residency in the cache varies between 30-60 days. We've
    not had any problems with XFS.

  13. Re: Very Large Filesystems

    One option is to go with segmented filesystems: www.ibrix.com.

    Instead of having one monolithic filesystem, break it up across
    several segments. Ibrix still provides a single namespace. Back up the
    segments separately, recover them separately.


    On Apr 28, 2:21 am, Aknin wrote:
    > Following some research I've been doing on the matter across
    > newsgroups and mailing lists, I'd be glad if people could share
    > numbers about real life large filesystem and their experience with
    > them. I'm slowly coming to a realization that regardless of
    > theoretical filesystem capabilities (1TB, 32TB, 256TB or more), more
    > or less across the enterprise filesystem arena people are recommending
    > to keep practical filesystems up to 1TB in size, for manageability and
    > recoverability.
    >
    > What's the maximum filesystem size you've used in production
    > environment? How did the experience come out?
    >
    > Thanks,
    > -Yaniv




  14. Re: Very Large Filesystems

    > Filesystems are not the problem. Hardware is.
    >
    > I've worked with many thousands of PC disks starting with the first
    > release of NTFS, almost 15 years ago. I have never seen NTFS
    > "corrupt" itself. All failures were traced to dying hardware. Sh*t
    > happens. I have to admit that my experience with RAID is much less.
    >
    > I'd like to hear of documented cases of such NTFS problems.
    >
    > In any case, you need a strategy for backup and recovery of your
    > data. Even if the filesystem is fine, the building can burn down.


    here's a well-known NTFS example:
    http://support.microsoft.com/kb/229607

    I also have another from personal experience. When you get way out of
    the norm, you are much more likely to encounter problems that have
    nothing to do with your hardware. I had a case open with MS in which
    I was told they had internal documentation suggesting limits that,
    while beyond what you'd likely ever see in 'normal' scenarios, are
    not out of the realm of possibility for poorly-designed
    applications... of which I inherited one.

    Put 9 figures' worth of dirs/files on a single NTFS volume in a
    heavily write-intensive environment and tell me all is well. It's
    very scary, and **** starts to break down, write failures, etc. I
    know -- I've been there, and I'm doing it now until such time as
    things get rewritten. I've lost it all more than once. Takes weeks to
    restore. Yes, the app is broken, but it's what I'm stuck with for
    now.

    Also, ask your vendors (any of them) for documented studies of heavy
    IO in that type of environment. None of them have any, because for
    the most part they do not test to those levels. Even MS only tests to
    100M dirs/files for milestone releases (SPs) of Win2K3, and this is
    the first release where they went that high.

    If you want to be safe, you'd better stay below 10M dirs/files on a
    single volume. That's realistically the highest you can go and still
    count on all of your vendors having possibly tested to that level
    (I'm talking file systems, file-based replication software). You
    uncover the bugs their normal stress-testing doesn't. Believe me, in
    trying to deal with my mess, I've done a lot of talking to vendors.
    Their sales reps tell you everything is 'not a problem.' Their
    technical guys get really quiet when you ask for proof or customer
    examples you can speak with.
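
    If you don't know where a volume stands against that kind of number,
    even a dumb walk-and-count is worth running (illustrative only, and
    on trees this size the walk itself takes a long while):

      # Count files and directories under a root to see how close a
      # volume is to the ~10M-object comfort zone mentioned above.
      import os
      import sys

      def count_objects(root):
          files = dirs = 0
          for _path, dirnames, filenames in os.walk(root):
              dirs += len(dirnames)
              files += len(filenames)
          return files, dirs

      if __name__ == '__main__':
          root = sys.argv[1] if len(sys.argv) > 1 else '.'
          nfiles, ndirs = count_objects(root)
          print('%s: %d files, %d dirs, %d objects total'
                % (root, nfiles, ndirs, nfiles + ndirs))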




  15. Re: Very Large Filesystems

    In article ,
    Kraft Fan! wrote:
    >> Filesystems are not the problem. Hardware is.
    >>
    >> I've worked with many thousands of PC disks starting with the first
    >> release of NTFS, almost 15 years ago. I have never seen NTFS
    >> "corrupt" itself. All failures were traced to dying hardware. Sh*t
    >> happens. I have to admit that my experience with RAID is much less.
    >>
    >> I'd like to hear of documented cases of such NTFS problems.
    >>
    >> In any case, you need a strategy for backup and recovery of your
    >> data. Even if the filesystem is fine, the building can burn down.

    >
    >here's a well-known NTFS example:
    >http://support.microsoft.com/kb/229607
    >
    >I also have another from personal experience. When you get way out
    >of the norm, you are much more likely to encounter problems that
    >have nothing to do with your hardware. I had a case open with MS in
    >which I was told they had internal documentation suggesting limits
    >that,


    TNX, I knew there would be something and I figured it would
    be a case of pushing some scale limit.

    --
    a d y k e s @ p a n i x . c o m
    Don't blame me. I voted for Gore. A Proud signature since 2001

  16. Re: Very Large Filesystems

    Kraft Fan! wrote:

    ....

    > here's a well-known NTFS example:
    > http://support.microsoft.com/kb/229607


    Since that particular bug was fixed just over 8 years ago, something a
    bit more recent might be a more convincing argument for not trusting a
    reasonably mature file system.

    - bill

  17. Re: Very Large Filesystems

    Aknin proclaimed:

    > Following some research I've been doing on the matter across
    > newsgroups and mailing lists, I'd be glad if people could share
    > numbers about real life large filesystem and their experience with
    > them. I'm slowly coming to a realization that regardless of
    > theoretical filesystem capabilities (1TB, 32TB, 256TB or more), more
    > or less across the enterprise filesystem arena people are recommending
    > to keep practical filesystems up to 1TB in size, for manageability and
    > recoverability.
    >
    > What's the maximum filesystem size you've used in production
    > environment? How did the experience come out?


    I'd guess the biggest problem with very large file systems would be
    when you need to run a file system check against them and don't have
    a few days to run the check on 100 terabytes or so. Some scale better
    than others, particularly when they are practically full. Backups and
    restores can be helped by delta-style technology.
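
    Rough arithmetic shows why (every number below is an assumption;
    real checks are dominated by seek behaviour and by how the particular
    fsck scales with inode count):

      # Crude fsck duration guess, driven by inode count rather than raw
      # capacity, since a check is mostly metadata reads and seeks.
      # 1000 inodes/second examined is a made-up rate; substitute your own.
      def fsck_hours(inodes, inodes_per_sec=1000):
          return inodes / float(inodes_per_sec) / 3600.0

      for millions in (10, 100, 500):
          print('%4dM inodes: ~%.0f hours' % (millions,
                                              fsck_hours(millions * 1e6)))

    At rates like that, a big, nearly full filesystem with a few hundred
    million inodes lands comfortably in multi-day territory.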

