Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY - FreeBSD

This is a discussion on Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY - FreeBSD ; On Fri, Sep 26, 2008 at 10:35:57PM -0700, Derek Kuli??ski wrote: > Hello Jeremy, > > Friday, September 26, 2008, 10:14:13 PM, you wrote: > > >> Actually what's the advantage of having fsck run in background if it > ...

+ Reply to Thread
Page 1 of 2 1 2 LastLast
Results 1 to 20 of 26

Thread: Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

  1. Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

    On Fri, Sep 26, 2008 at 10:35:57PM -0700, Derek Kuli??ski wrote:
    > Hello Jeremy,
    >
    > Friday, September 26, 2008, 10:14:13 PM, you wrote:
    >
    > >> Actually what's the advantage of having fsck run in background if it
    > >> isn't capable of fixing things?
    > >> Isn't it more dangerous to be it like that? i.e. administrator might
    > >> not notice the problem; also filesystem could break even further...

    >
    > > This question should really be directed at a set of different folks,
    > > e.g. actual developers of said stuff (UFS2 and soft updates in
    > > specific), because it's opening up a can of worms.

    >
    > > I believe it has to do with the fact that there is much faith given to
    > > UFS2 soft updates -- the ability to background fsck allows the user to
    > > boot their system and have it up and working (able to log in, etc.) in a
    > > much shorter amount of time[1]. It makes the assumption that "everything
    > > will work just fine", which is faulty.

    >
    > As far as I know (at least ideally, when write caching is disabled)


    Re: write caching: wheelies and burn-outs in empty parking lots
    detected.

    Let's be realistic. We're talking about ATA and SATA hard disks, hooked
    up to on-board controllers -- these are the majority of users. Those
    with ATA/SATA RAID controllers (not on-board RAID either; most/all of
    those do not let you disable drive write caching) *might* have a RAID
    BIOS menu item for disabling said feature.

    FreeBSD atacontrol does not let you toggle such features (although "cap"
    will show you if feature is available and if it's enabled or not).

    Users using SCSI will most definitely have the ability to disable
    said feature (either via SCSI BIOS or via camcontrol). But the majority
    of users are not using SCSI disks, because the majority of users are not
    going to spend hundreds of dollars on a controller followed by hundreds
    of dollars for a small (~74GB) disk.

    Regardless of all of this, end-users should, in no way shape or form,
    be expected to go to great lengths to disable their disk's write cache.
    They will not, I can assure you. Thus, we must assume: write caching
    on a disk will be enabled, period. If a filesystem is engineered with
    that fact ignored, then the filesystem is either 1) worthless, or 2)
    serves a very niche purpose and should not be the default filesystem.

    Do we agree?

    > the data should always be consistent, and all fsck supposed to be
    > doing is to free unreferenced blocks that were allocated.


    fsck does a heck of a lot more than that, and there's no guarantee
    that's all fsck is going to do on a UFS2+SU filesystem. I'm under the
    impression it does a lot more than just looking for unref'd blocks.

    > Wouldn't be possible for background fsck to do that while the
    > filesystem is mounted, and if there's some unrepairable error, that
    > somehow happen (while in theory it should be impossible) just
    > periodically scream on the emergency log level?


    The system is already up and the filesystems mounted. If the error in
    question is of such severity that it would impact a user's ability to
    reliably use the filesystem, how do you expect constant screaming on
    the console will help? A user won't know what it means; there is
    already evidence of this happening (re: mysterious ATA DMA errors which
    still cannot be figured out[6]).

    IMHO, a dirty filesystem should not be mounted until it's been fully
    analysed/scanned by fsck. So again, people are putting faith into
    UFS2+SU despite actual evidence proving that it doesn't handle all
    scenarios.

    > > It also gives the impression of a journalled filesystem, which UFS2 soft
    > > updates are not. gjournal(8) on the other hand, is, and doesn't require
    > > fsck at all[2].

    >
    > > I also think this further adds fuel to the "so why are we enabling soft
    > > updates by default and using UFS2 as a filesystem again?" fire. I'm
    > > sure someone will respond to this with "So use ZFS and shut up". *sigh*

    >
    > I think the reason for using Soft Updates by default is that it was
    > a pretty hard thing to implement, and (at least in theory it supposed
    > by as reliable as journaling.


    The problem here is that when it was created, it was sort of an
    "experiment". Now, when someone installs FreeBSD, UFS2 is the default
    filesystem used, and SU are enabled on every filesystem except the root
    fs. Thus, we have now put ourselves into a situation where said
    feature ***must*** be reliable in all cases.

    You're also forgetting a huge focus of SU -- snapshots[1]. However, there
    are more than enough facts on the table at this point concluding that
    snapshots are causing more problems[7] than previously expected. And
    there's further evidence filesystem snapshots shouldn't even be used in
    this way[8].

    > Also, if I remember correctly, PJD said that gjournal is performing
    > much better with small files, while softupdates is faster with big
    > ones.


    Okay, so now we want to talk about benchmarks. The benchmarks you're
    talking about are in two places[2][3].

    The benchmarks pjd@ provided were very basic/simple, which I feel is
    good, because the tests were realistic (common tasks people will do).
    The benchmarks mckusick@ provided for UFS2+SU were based on SCSI
    disks, which is... interesting to say the least.

    Bruce Evans responded with some more data[4].

    I particularly enjoy this quote in his benchmark: "I never found the
    exact cause of the slower readback ...", followed by (plausible)
    speculations as to why that is.

    I'm sorry that I sound like such a hard-ass on this matter, but there is
    a glaring fact that people seem to be overlooking intentionally:

    Filesystems have to be reliable; data integrity is focus #1, and cannot
    be sacrificed. Users and administrators *expect* a filesystem to be
    reliable. No one is going to keep using a filesystem if it has
    disadvantages which can result in data loss or "waste of administrative
    time" (which I believe is what's occurring here).

    Users *will* switch to another operating system that has filesystems
    which were not engineered/invented with these features in mind. Or,
    they can switch to another filesystem assuming the OS offers one which
    performs equally as good/well and is guaranteed to be reliable --
    and that's assuming the user wants to spend the time to reformat and
    reinstall just to get that.

    In the case of "bit rot" (e.g. drive cache going bad silently, bad
    cables, or other forms of low-level data corruption), a filesystem is
    likely not to be able to cope with this (but see below).

    A common rebuttal here would be: "so use UFS2 without soft updates".
    Excellent advice! I might consider it myself! But the problem is that
    we cannot expect users to do that. Why? Because the defaults chosen
    during sysinstall are to use SU for all filesystems except root. If SU
    is not reliable (or is "reliable in most cases" -- same thing if you ask
    me), then it should not be enabled by default. I think we (FreeBSD)
    might have been a bit hasty in deciding to choose that as a default.

    Next: a system locking up (or a kernel panic) should result in a dirty
    filesystem. That filesystem should be *fully recoverable* from that
    kind of error, with no risk of data loss (but see below).

    (There is the obvious case where a file is written to the disk, and the
    disk has not completed writing the data from its internal cache to the
    disk itself (re: write caching); if power is lost, the disk may not have
    finished writing the cache to disk. In this case, the file is going to
    be sparse -- there is absolutely nothing that can be done about this
    with any filesystem, including ZFS (to my knowledge). This situation
    is acceptable; nature of the beast.)

    The filesystem should be fully analysed and any errors repaired (either
    with user interaction or automatically -- I'm sure it depends on the
    kind of error) **before** the filesystem is mounted.

    This is where SU gets in the way. The filesystem is mounted and the
    system is brought up + online 60 seconds before the fsck starts. The
    assumption made is that the errors in question will be fully recoverable
    by an automatic fsck, which as this thread proves, is not always the
    case.

    ZFS is the first filesystem, to my knowledge, which provides 1) a
    reliable filesystem, 2) detection of filesystem problems in real-time or
    during scrubbing, 3) repair of problems in real-time (assuming raidz1 or
    raidz2 are used), and 4) does not need fsck. This makes ZFS powerful.

    "So use ZFS!" A good piece of advice -- however, I've already had
    reports from users that they will not consider ZFS for FreeBSD at this
    time. Why? Because ZFS on FreeBSD can panic the system easily due to
    kmem exhaustion. Proper tuning can alleviate this problem, but users do
    not want to to have to "tune" their system to get stability (and I feel
    this is a very legitimate argument).

    Additionally, FreeBSD doesn't offer ZFS as a filesystem during
    installation. PC-BSD does, AFAIK. So on FreeBSD, you have to go
    through a bunch of rigmarole[5] to get it to work (and doing this
    after-the-fact is a real pain in the rear -- believe me, I did it this
    weekend.)

    So until both of these ZFS-oriented issues can be dealt with, some
    users aren't considering it.

    This is the reality of the situation. I don't think what users and
    administrators want is unreasonable; they may be rough demands, but
    that's how things are in this day and age.

    Have I provided enough evidence? :-)

    [1]: http://www.usenix.org/publications/l...tml/index.html
    [2]: http://lists.freebsd.org/pipermail/f...ne/064043.html
    [3]: http://www.usenix.org/publications/l...tml/index.html
    [4]: http://lists.freebsd.org/pipermail/f...ne/064166.html
    [5]: http://wiki.freebsd.org/JeremyChadwi..._on_a_ZFS_pool
    [6]: http://wiki.freebsd.org/JeremyChadwi...roubleshooting
    [7]: http://wiki.freebsd.org/JeremyChadwi...eported_issues
    [8]: http://lists.freebsd.org/pipermail/f...ry/032070.html

    --
    | Jeremy Chadwick jdc at parodius.com |
    | Parodius Networking http://www.parodius.com/ |
    | UNIX Systems Administrator Mountain View, CA, USA |
    | Making life hard for others since 1977. PGP: 4BD6C0CB |

    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  2. Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

    On 2008-Sep-26 23:44:17 -0700, Jeremy Chadwick wrote:
    >On Fri, Sep 26, 2008 at 10:35:57PM -0700, Derek Kuli??ski wrote:
    >> As far as I know (at least ideally, when write caching is disabled)

    ...
    >FreeBSD atacontrol does not let you toggle such features (although "cap"
    >will show you if feature is available and if it's enabled or not).


    True but it can be disabled via the loader tunable hw.ata.wc (at
    least in theory - apparently some drives don't obey the cache disable
    command to make them look better in benchmarks).

    >Users using SCSI will most definitely have the ability to disable
    >said feature (either via SCSI BIOS or via camcontrol).


    Soft-updates plus write caching isn't an issue with tagged queueing
    (which is standard for SCSI) because the critical point for
    soft-updates is knowing when the data is written to non-volatile
    storage - which tagged queuing provides.

    --
    Peter Jeremy
    Please excuse any delays as the result of my ISP's inability to implement
    an MTA that is either RFC2821-compliant or matches their claimed behaviour.

    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v2.0.9 (FreeBSD)

    iEYEARECAAYFAkjd4nAACgkQ/opHv/APuIfu8ACgiVxzBQXk8Nv7v3n9qQ70Ht1k
    9q0AmgMzIKxTvks9+CXUyGENwP7FJTN7
    =y0+F
    -----END PGP SIGNATURE-----


  3. Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

    On Fri, Sep 26, 2008 at 11:44:17PM -0700, Jeremy Chadwick wrote:
    > On Fri, Sep 26, 2008 at 10:35:57PM -0700, Derek Kuli??ski wrote:
    > > Hello Jeremy,
    > >
    > > Friday, September 26, 2008, 10:14:13 PM, you wrote:
    > >
    > > >> Actually what's the advantage of having fsck run in background if it
    > > >> isn't capable of fixing things?
    > > >> Isn't it more dangerous to be it like that? i.e. administrator might
    > > >> not notice the problem; also filesystem could break even further...

    > >
    > > > This question should really be directed at a set of different folks,
    > > > e.g. actual developers of said stuff (UFS2 and soft updates in
    > > > specific), because it's opening up a can of worms.

    > >
    > > > I believe it has to do with the fact that there is much faith given to
    > > > UFS2 soft updates -- the ability to background fsck allows the user to
    > > > boot their system and have it up and working (able to log in, etc.) in a
    > > > much shorter amount of time[1]. It makes the assumption that "everything
    > > > will work just fine", which is faulty.

    > >
    > > As far as I know (at least ideally, when write caching is disabled)

    >
    > Re: write caching: wheelies and burn-outs in empty parking lots
    > detected.
    >
    > Let's be realistic. We're talking about ATA and SATA hard disks, hooked
    > up to on-board controllers -- these are the majority of users. Those
    > with ATA/SATA RAID controllers (not on-board RAID either; most/all of
    > those do not let you disable drive write caching) *might* have a RAID
    > BIOS menu item for disabling said feature.
    >
    > FreeBSD atacontrol does not let you toggle such features (although "cap"
    > will show you if feature is available and if it's enabled or not).


    No, but using 'sysctl hw.ata.wc=0' will quickly and easily let you disable
    write caching on all ATA/SATA devices.
    This was actually the default setting briefly (back in 4.3 IIRC) but was
    reverted due to the performance penalty being considered too severe.


    >
    > Users using SCSI will most definitely have the ability to disable
    > said feature (either via SCSI BIOS or via camcontrol). But the majority
    > of users are not using SCSI disks, because the majority of users are not
    > going to spend hundreds of dollars on a controller followed by hundreds
    > of dollars for a small (~74GB) disk.
    >
    > Regardless of all of this, end-users should, in no way shape or form,
    > be expected to go to great lengths to disable their disk's write cache.
    > They will not, I can assure you. Thus, we must assume: write caching
    > on a disk will be enabled, period. If a filesystem is engineered with
    > that fact ignored, then the filesystem is either 1) worthless, or 2)
    > serves a very niche purpose and should not be the default filesystem.
    >
    > Do we agree?


    Sort of, but soft updates does not technically need write caching to be
    disabled. It does assume that disks will not 'lie' about if data has
    actually been written to the disk or just to the disk's cache. Many (most?)
    ATA/SATA disks are unreliable in this regard which means that the guarantees
    Soft Updates normally give about consistency of the file system can no
    longer be guaranteed.



    Using UFS2+soft updates on standard ATA/SATA disks (with write caching
    enabled) connected to a standard disk controller is not a problem (not any
    more than any other file system anyway.)

    Using background fsck together with the above setup is not recommended
    however. Background fsck will only handle a subset of the errors that a
    standard foreground fsck can handle. In particular it assumes that the soft
    updates guarantees of consistency are in place which would mean that there
    are only a few non-critical problems that could happen. With the above
    setup those guarantees are not in place, which means that background fsck
    can encounter errors it cannot (and will not) fix.






    --

    Erik Trulsson
    ertr1013@student.uu.se
    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  4. Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

    On Sat, Sep 27, 2008 at 12:37:50AM -0700, Derek Kuli??ski wrote:
    > Friday, September 26, 2008, 11:44:17 PM, you wrote:
    >
    > >> As far as I know (at least ideally, when write caching is disabled)

    >
    > > Re: write caching: wheelies and burn-outs in empty parking lots
    > > detected.

    >
    > > Let's be realistic. We're talking about ATA and SATA hard disks, hooked
    > > up to on-board controllers -- these are the majority of users. Those
    > > with ATA/SATA RAID controllers (not on-board RAID either; most/all of
    > > those do not let you disable drive write caching) *might* have a RAID
    > > BIOS menu item for disabling said feature.

    >
    > > FreeBSD atacontrol does not let you toggle such features (although "cap"
    > > will show you if feature is available and if it's enabled or not).

    >
    > > Users using SCSI will most definitely have the ability to disable
    > > said feature (either via SCSI BIOS or via camcontrol). But the majority
    > > of users are not using SCSI disks, because the majority of users are not
    > > going to spend hundreds of dollars on a controller followed by hundreds
    > > of dollars for a small (~74GB) disk.

    >
    > > Regardless of all of this, end-users should, in no way shape or form,
    > > be expected to go to great lengths to disable their disk's write cache.
    > > They will not, I can assure you. Thus, we must assume: write caching
    > > on a disk will be enabled, period. If a filesystem is engineered with
    > > that fact ignored, then the filesystem is either 1) worthless, or 2)
    > > serves a very niche purpose and should not be the default filesystem.

    >
    > > Do we agree?

    >
    > Yes, but...
    >
    > In the link you sent to me, someone mentioned that write cache is
    > always creates problem, and it doesn't matter on OS or filesystem.
    >
    > There's more below.
    >
    > >> the data should always be consistent, and all fsck supposed to be
    > >> doing is to free unreferenced blocks that were allocated.

    > > fsck does a heck of a lot more than that, and there's no guarantee
    > > that's all fsck is going to do on a UFS2+SU filesystem. I'm under the
    > > impression it does a lot more than just looking for unref'd blocks.

    >
    > Yes, fsck does a lot more than that. But the whole point of soft
    > updates is to reduce the work of fsck to deallocate allocated blocks.
    >
    > Anyway, maybe my information are invalid, though funny thing is that
    > Soft Updates was mentioned in one of my lecture on Operating Systems.
    >
    > Apparently the goal of Soft Updates is to always enforce those rules
    > in very efficient manner, by reordering the writes:
    > 1. Never point to a data structure before initializing it
    > 2. Never reuse a structure before nullifying pointers to it
    > 3. Never reset last pointer to live structure before setting a new one
    > 4. Always mark free-block bitmap entries as used before making the
    > directory entry point to it
    >
    > The problem comes with disks which for performance reasons cache the
    > data and then write it in different order back to the disk.
    > I think that's the reason why it's recommended to disable it.
    > If a disk is reordering the writes, it renders the soft updates
    > useless.
    >
    > But if the writing order is preserved, all data remains always
    > consistent, the only thing that might appear are blocks that were
    > marked as being used, but nothing was pointing to them yet.
    >
    > So (in ideal situation, when nothing interferes) all fsck needs to do
    > is just to scan the filesystem and deallocate those blocks.
    >
    > > The system is already up and the filesystems mounted. If the error in
    > > question is of such severity that it would impact a user's ability to
    > > reliably use the filesystem, how do you expect constant screaming on
    > > the console will help? A user won't know what it means; there is
    > > already evidence of this happening (re: mysterious ATA DMA errors which
    > > still cannot be figured out[6]).

    >
    > > IMHO, a dirty filesystem should not be mounted until it's been fully
    > > analysed/scanned by fsck. So again, people are putting faith into
    > > UFS2+SU despite actual evidence proving that it doesn't handle all
    > > scenarios.

    >
    > Yes, I think the background fsck should be disabled by default, with a
    > possibility to enable it if the user is sure that nothing will
    > interfere with soft updates.
    >
    > > The problem here is that when it was created, it was sort of an
    > > "experiment". Now, when someone installs FreeBSD, UFS2 is the default
    > > filesystem used, and SU are enabled on every filesystem except the root
    > > fs. Thus, we have now put ourselves into a situation where said
    > > feature ***must*** be reliable in all cases.

    >
    > I think in worst case it just is as realiable as if it wouldn't be
    > enabled (the only danger is the background fsck)
    >
    > > You're also forgetting a huge focus of SU -- snapshots[1]. However, there
    > > are more than enough facts on the table at this point concluding that
    > > snapshots are causing more problems[7] than previously expected. And
    > > there's further evidence filesystem snapshots shouldn't even be used in
    > > this way[8].

    >
    > there's not much to argue about that.
    >
    > >> Also, if I remember correctly, PJD said that gjournal is performing
    > >> much better with small files, while softupdates is faster with big
    > >> ones.

    >
    > > Okay, so now we want to talk about benchmarks. The benchmarks you're
    > > talking about are in two places[2][3].

    >
    > > The benchmarks pjd@ provided were very basic/simple, which I feel is
    > > good, because the tests were realistic (common tasks people will do).
    > > The benchmarks mckusick@ provided for UFS2+SU were based on SCSI
    > > disks, which is... interesting to say the least.

    >
    > > Bruce Evans responded with some more data[4].

    >
    > > I particularly enjoy this quote in his benchmark: "I never found the
    > > exact cause of the slower readback ...", followed by (plausible)
    > > speculations as to why that is.

    >
    > > I'm sorry that I sound like such a hard-ass on this matter, but there is
    > > a glaring fact that people seem to be overlooking intentionally:

    >
    > > Filesystems have to be reliable; data integrity is focus #1, and cannot
    > > be sacrificed. Users and administrators *expect* a filesystem to be
    > > reliable. No one is going to keep using a filesystem if it has
    > > disadvantages which can result in data loss or "waste of administrative
    > > time" (which I believe is what's occurring here).

    >
    > > Users *will* switch to another operating system that has filesystems
    > > which were not engineered/invented with these features in mind. Or,
    > > they can switch to another filesystem assuming the OS offers one which
    > > performs equally as good/well and is guaranteed to be reliable --
    > > and that's assuming the user wants to spend the time to reformat and
    > > reinstall just to get that.

    >
    > I wasn't trying to argue about that. Perhaps my assumption is wrong,
    > but I belive that the problems that we know about Soft Updates, at
    > worst case make system as reliable as it was without using it.
    >
    > > In the case of "bit rot" (e.g. drive cache going bad silently, bad
    > > cables, or other forms of low-level data corruption), a filesystem is
    > > likely not to be able to cope with this (but see below).

    >
    > > A common rebuttal here would be: "so use UFS2 without soft updates".
    > > Excellent advice! I might consider it myself! But the problem is that
    > > we cannot expect users to do that. Why? Because the defaults chosen
    > > during sysinstall are to use SU for all filesystems except root. If SU
    > > is not reliable (or is "reliable in most cases" -- same thing if you ask
    > > me), then it should not be enabled by default. I think we (FreeBSD)
    > > might have been a bit hasty in deciding to choose that as a default.

    >
    > > Next: a system locking up (or a kernel panic) should result in a dirty
    > > filesystem. That filesystem should be *fully recoverable* from that
    > > kind of error, with no risk of data loss (but see below).

    >
    > > (There is the obvious case where a file is written to the disk, and the
    > > disk has not completed writing the data from its internal cache to the
    > > disk itself (re: write caching); if power is lost, the disk may not have
    > > finished writing the cache to disk. In this case, the file is going to
    > > be sparse -- there is absolutely nothing that can be done about this
    > > with any filesystem, including ZFS (to my knowledge). This situation
    > > is acceptable; nature of the beast.)

    >
    > > The filesystem should be fully analysed and any errors repaired (either
    > > with user interaction or automatically -- I'm sure it depends on the
    > > kind of error) **before** the filesystem is mounted.

    >
    > > This is where SU gets in the way. The filesystem is mounted and the
    > > system is brought up + online 60 seconds before the fsck starts. The
    > > assumption made is that the errors in question will be fully recoverable
    > > by an automatic fsck, which as this thread proves, is not always the
    > > case.

    >
    > That's why I think background fsck should be disabled by default.
    > Though I still don't think that soft updates hurt anything (probably
    > except performance)
    >
    > > ZFS is the first filesystem, to my knowledge, which provides 1) a
    > > reliable filesystem, 2) detection of filesystem problems in real-time or
    > > during scrubbing, 3) repair of problems in real-time (assuming raidz1 or
    > > raidz2 are used), and 4) does not need fsck. This makes ZFS powerful.

    >
    > > "So use ZFS!" A good piece of advice -- however, I've already had
    > > reports from users that they will not consider ZFS for FreeBSD at this
    > > time. Why? Because ZFS on FreeBSD can panic the system easily due to
    > > kmem exhaustion. Proper tuning can alleviate this problem, but users do
    > > not want to to have to "tune" their system to get stability (and I feel
    > > this is a very legitimate argument).

    >
    > > Additionally, FreeBSD doesn't offer ZFS as a filesystem during
    > > installation. PC-BSD does, AFAIK. So on FreeBSD, you have to go
    > > through a bunch of rigmarole[5] to get it to work (and doing this
    > > after-the-fact is a real pain in the rear -- believe me, I did it this
    > > weekend.)

    >
    > > So until both of these ZFS-oriented issues can be dealt with, some
    > > users aren't considering it.

    >
    > > This is the reality of the situation. I don't think what users and
    > > administrators want is unreasonable; they may be rough demands, but
    > > that's how things are in this day and age.

    >
    > > Have I provided enough evidence? :-)

    >
    > Yes, but as far as I understand it's not as bad as you think
    > I could be wrong though.
    >
    > I 100% agree on disabling background fsck, but I don't think soft
    > updates are making the system any less reliable than it would be
    > without it.


    With regards to all you've said:

    Thank you for these insights. Everything you and Erik have said has
    been quite educational, and I greatly appreciate it. Always good to
    learn from people who know more! :-)

    I believe we're in overall agreement with regards to background_fsck
    (should be disabled by default). I'd file a PR for this sort of thing,
    but it almost seems like something that should go to the (private)
    developers list for discussion first.

    --
    | Jeremy Chadwick jdc at parodius.com |
    | Parodius Networking http://www.parodius.com/ |
    | UNIX Systems Administrator Mountain View, CA, USA |
    | Making life hard for others since 1977. PGP: 4BD6C0CB |

    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  5. Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

    On Sun, Sep 28, 2008 at 11:30:01PM -0400, Zaphod Beeblebrox wrote:
    > On Sat, Sep 27, 2008 at 3:37 AM, Derek Kuli?ski wrote:
    >
    > >
    > > > ZFS is the first filesystem, to my knowledge, which provides 1) a
    > > > reliable filesystem, 2) detection of filesystem problems in real-time or
    > > > during scrubbing, 3) repair of problems in real-time (assuming raidz1 or
    > > > raidz2 are used), and 4) does not need fsck. This makes ZFS powerful.

    > >

    >
    > While I am very enthusiastic about ZFS (and use it for certain tasks), there
    > are several things preventing me from recommending it as a general-purpose
    > filesystem (and none of them are specific to FreeBSD's port of it).
    >
    > As a large NAS filestore, ZFS seems very well designed. That is, if the
    > goal is to store a very large amount of files with data integrity and serve
    > them up over the network, ZFS achieves it with aplomb.
    >
    > However, as a core general purpose filesystem, it seems to have flaws, not
    > the least of which is a re-separation of file cache and memory cache. This
    > virtually doesn't matter for a fileserver, but is generally important in a
    > general purpose local filesystem. ZFS also has a transactional nature ---
    > which probably, again, works well in a fileserver, but I find (as a local
    > filesystem) it introduces unpredicable delays as the buffer fills up and
    > then gets flushed en masse.


    I'm curious to know how Solaris deals with these problems, since the
    default filesystem (AFAIK) in OpenSolaris is now ZFS. CC'ing pjd@ who
    might have some insight there.

    > This is not to say that general purpose filesystems couldn't head in the ZFS
    > direction, or that ZFS is anthing but an amazing piece of technology, but
    > UFS and UFS+SU have not outlived their usefulness yet.
    >
    > Maybe support for odd block sizes in UFS would allow geom to manufacture
    > checksums (by subtracting their size from the source block). This would be
    > the last link in the chain to provide gjournal + gmirror + gchecksum
    > (addressing points 1, 2, 3 and 4). Equally, maybe gchecksum could work like
    > gjournal. Dunno --- that would probably be expensive in io ops.


    --
    | Jeremy Chadwick jdc at parodius.com |
    | Parodius Networking http://www.parodius.com/ |
    | UNIX Systems Administrator Mountain View, CA, USA |
    | Making life hard for others since 1977. PGP: 4BD6C0CB |

    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  6. Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

    On Mon, Sep 29, 2008 at 12:00 AM, Jeremy Chadwick wrote:

    > On Sun, Sep 28, 2008 at 11:30:01PM -0400, Zaphod Beeblebrox wrote:
    >



    > > However, as a core general purpose filesystem, it seems to have flaws,

    > not
    > > the least of which is a re-separation of file cache and memory cache.

    > This
    > > virtually doesn't matter for a fileserver, but is generally important in

    > a
    > > general purpose local filesystem. ZFS also has a transactional nature

    > ---
    > > which probably, again, works well in a fileserver, but I find (as a local
    > > filesystem) it introduces unpredicable delays as the buffer fills up and
    > > then gets flushed en masse.

    >
    > I'm curious to know how Solaris deals with these problems, since the
    > default filesystem (AFAIK) in OpenSolaris is now ZFS. CC'ing pjd@ who
    > might have some insight there.



    I certainly am not implying that it won't work as a local filesystem, simply
    that this design choice may not be ideal for completely generalized local
    workloads --- those same workloads that drove UN*X in general to unified
    buffer caches... which appears to be implemented independently by every
    major UN*X vendor... solaris may have even been the first.

    The ARC is separate from the general VM cache in solaris, too, IIRC.
    Solaris' UFS still uses a unified cache.

    Most of the problems where ZFS runs the machine out of kernel memory (or
    fights with other filesystems for memory, etc) are due to the effects of
    it's non-unified cache. Solaris and new patches to FreeBSD seem to make
    this play better, but the fundamental reason for unifying the filesystem and
    memory cache was the payoff that local applications memory and file usage
    would balance out better if the buffering of files and memory was not just
    from the same pool of memory, but in fact the "same thing".

    Historically, you had file cache being a percentage of memory (say 10%).
    The next innovation (I seem to remember my HpUX 9 workstation doing this)
    was to have the division of memory between file and memory caches move
    dynamically. This was better but non-optimal. This is the state of affairs
    now with ZFS too. The unified caches sprung up in UN*X derivatives shortly
    thereafter ... where caching a file and caching memory were one in the
    same. This is where UFS sits.

    Expanding on my post, if the job is to serve network disk, the dynamic
    division or unified cache strategies probably don't make too much
    difference. The "thumper" offering from sun gives you 48 SATA disks, two
    dual core opterons and 16G of memory. The obvious intention is that most of
    that 16G is, in the end, cache for the files (all in 4U and all externally
    accessible --- very cool, BTW).

    But a general purpose machine is executing many of those libraries and
    binaries and mmap()ing many of those files... both operations where the
    unified strategy was designed to win.
    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  7. Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

    > However, as a core general purpose filesystem, it seems to have flaws, not
    > the least of which is a re-separation of file cache and memory cache.


    For me, this doesn't matter because ZFS is so much faster than UFS
    overall. Even if you don't use any of its features, the latest version
    does sequential I/O and heavy random I/O faster than UFS on the same
    hardware for me.

    Cases where UFS is faster are proving to be the exception rather than
    the rule.

    However, I cannot recommend its use until it is stable, which it
    currently still is not, under very heavy load.

    - Andrew


    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  8. Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

    sthaug@nethelp.no wrote:

    >>>IMHO, a dirty filesystem should not be mounted until it's been fully
    >>>analysed/scanned by fsck. So again, people are putting faith into
    >>>UFS2+SU despite actual evidence proving that it doesn't handle all
    >>>scenarios.

    >>
    >>Yes, I think the background fsck should be disabled by default, with a
    >>possibility to enable it if the user is sure that nothing will
    >>interfere with soft updates.

    >
    >
    > Having been bitten by problems in this area more than once, I now always
    > disable background fsck. Having it disabled by default has my vote too.


    Is there any possibility to selectively disable / enable background fsck
    on specified mount points?

    I can imagine system, where root, /usr, /var and /tmp will be checked by
    fsck in foreground, but waiting to foreground fsck on data partitions of
    about 500GB or more (it can take up tens of minutes or "hours") is scary.
    I need server with ssh running up "quickly" after the crash, so I can
    investigate what the problem was and not just sit and wait tens of
    minutes "if" machine gets online again or not... answering phone calls
    of clients in the meantime.

    Miroslav Lachman
    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  9. Re: Recommendations for servers running SATA drives

    Jeremy Chadwick wrote:

    > On Sat, Sep 27, 2008 at 03:16:11PM -0400, Charles Sprickman wrote:
    >
    >>On Fri, 26 Sep 2008, Jeremy Chadwick wrote:

    [...]
    > This also leads me a little off-topic -- when it comes to disk
    > replacements, administrators want to be able to do this without taking
    > the system down. There are problems with this, but it often depends
    > greatly on hardware and BIOS configuration.
    >
    > I've successfully done a hot-swap (hardware: SATA hot-swap backplane,
    > AHCI in use, SATA2 disks), but it required me to issue "atacontrol
    > detach" first (I am very curious to know what would've happened had I
    > just yanked the disk). Upon inserting the new disk, one has to be
    > *very* careful about the order of atacontrol commands given -- there
    > are cases where "attach" will cause the system to panic or SATA bus to
    > lock up, but it seems to depend upon what commands were executed
    > previously (such as "reinit").
    >
    > Sorry if this is off-topic, but I wanted to mention it.


    Hot-swapping is totally upredictable on FreeBSD (from my experiences). I
    tried it many times on Asus 1U servers and on Sun Fire X2100 / X2100 M2
    with FreeBSD 6.2 and 7.0 (both i386). It sometimes panics on atacontrol
    detach, but never panics if disk was marked as failed by gmirror and
    detached by system it-self, then just removed from running machine. It
    sometimes panics immediately after the re-insertion of disk, sometimes
    after atacontrol attach. Sometimes it detects and attach disk without my
    intervention, so I can easily insert the disk in to gmirror.
    Then I stopped playing with hot-swapping and now always do power off
    before disk swapping.

    Miroslav Lachman
    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  10. Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

    On Sat, Sep 27, 2008 at 05:36:17PM +1000, Peter Jeremy wrote:
    > On 2008-Sep-26 23:44:17 -0700, Jeremy Chadwick wrote:
    > >On Fri, Sep 26, 2008 at 10:35:57PM -0700, Derek Kuli??ski wrote:
    > >> As far as I know (at least ideally, when write caching is disabled)

    > ...
    > >FreeBSD atacontrol does not let you toggle such features (although "cap"
    > >will show you if feature is available and if it's enabled or not).

    >
    > True but it can be disabled via the loader tunable hw.ata.wc (at
    > least in theory - apparently some drives don't obey the cache disable
    > command to make them look better in benchmarks).


    Off-topic, but those who use it will be interested:

    hw.ata.wc has always been one of those "why was it done this way?!"
    features which has bothered me. It never made any sense to either
    disable or enable WC on all drives, since there's no guarantee the user
    will want that.

    With that kept in mind, I've submit a PR containing a small kernel
    patch, atacontrol patch, and update to the atacontrol man page that
    allows toggling of WC via "atacontrol wc on/off".

    http://www.freebsd.org/cgi/query-pr.cgi?pr=127717

    Now users will have the ability to do "atacontrol wc on",
    enabling WC on drives of their choice. And yes, you can toggle
    on/off in real-time, regardless of what hw.ata.wc contains (that
    tunable just acts as a default).

    --
    | Jeremy Chadwick jdc at parodius.com |
    | Parodius Networking http://www.parodius.com/ |
    | UNIX Systems Administrator Mountain View, CA, USA |
    | Making life hard for others since 1977. PGP: 4BD6C0CB |

    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  11. Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

    On Mon, Sep 29, 2008 at 3:16 AM, Andrew Snow wrote:

    > However, as a core general purpose filesystem, it seems to have flaws, not
    >> the least of which is a re-separation of file cache and memory cache.
    >>

    >
    > For me, this doesn't matter because ZFS is so much faster than UFS
    > overall. Even if you don't use any of its features, the latest version
    > does sequential I/O and heavy random I/O faster than UFS on the same
    > hardware for me.
    >
    > Cases where UFS is faster are proving to be the exception rather than
    > the rule.
    >
    > However, I cannot recommend its use until it is stable, which it
    > currently still is not, under very heavy load.



    I certainly can't agree with this. I don't think you're measuring the
    performance of the machine --- only measuring the performance of the
    filesystem. When ZFS is active, memory is committed in the kernel to ZFS.
    That memory is then no longer available for paging. Also, there exists data
    within the ARC (I'm always tempted to say the ARC Cache, but that is
    redundant) that is also then in paging memory. You end up having to tune
    the size of the ARC to your workload, which is how UN*X did it upto 1992 or
    so. If you choose the wrong size for the ARC, you end up with very poor
    performance.

    In particular, the ARC fights for space with the nvidia binary driver (which
    really does need _real_ memory). To have the driver work at all, I have to
    keep the ARC from growing too large --- which is at odds with filesystem
    perofrmance.
    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  12. Re: Recommendations for servers running SATA drives [hot-swap]

    On Mon, Sep 29, 2008 at 05:25:32PM +0200, Miroslav Lachman wrote:
    > It was about year ago with Asus and Sun Fire X2100. I don't have Asus
    > servers now (all returned as reclamation). Now I am running one X2100
    > and about ten X2100 M2. I have one spare X2100 M2, so if somebody have
    > exact order of commands used to "hot-swap" the disk, I can test it in
    > few days.


    I believe the correct order of operation is to do a "detach" on the
    channel before physically removing the disk, insert the new disk, then
    do "attach" on the same channel. "list" should be done afterwards to
    ensure the new disk shows up.

    If you want me to verify for certain, I have a test box built in the
    other room which has a SATA hot-swap backplane on it.

    I've also seen cases where the "attach" works, but upon doing "list",
    the old disk ID/string is still shown. In this case I had to do a
    "detach", remove the disk, insert the new disk, "reinit", then an
    "attach" for things to work.

    Finally, I've also seen the kernel panic or hard-lock after running
    "reinit", but this may have had something to do with Intel MatrixRAID.

    --
    | Jeremy Chadwick jdc at parodius.com |
    | Parodius Networking http://www.parodius.com/ |
    | UNIX Systems Administrator Mountain View, CA, USA |
    | Making life hard for others since 1977. PGP: 4BD6C0CB |

    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  13. Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY


    Matthew Dillon wrote:
    > It can take 6 hours to fsck a full 1TB HD. It can
    > take over a day to fsck larger setups. Putting in a few sleeps here
    > and there just makes the run time even longer and perpetuates the pain.


    We have a box with millions of files spread over 2TB, on a 16 disk RAID.
    Foreground fsck takes almost 8 hours, so background fsck, which takes
    almost 24 hours or more, is my only option when I want to bring the box
    back online quickly. And UFS Snapshots are so slow as to be completely
    useless.

    I've now converted the volume to ZFS, and am now enjoying instant boot
    time and higher speed I/O under heavy load, at the expense of memory
    consumption.


    > My recommendation? Default UFS back to a synchronous fsck and stop
    > treating ZFS (your only real alternative) as being so ultra-alpha that
    > it shouldn't be used.


    Completely agree. ZFS is the way of the future for FreeBSD. In my
    latest testing, the memory problems are now under control, there is just
    stability problems with random lockups after days of heavy load unless I
    turn off ZIL. So its nearly there.

    If only ZFS also supported a network distributed mode. Or can we
    convince you to port Hammer to FreeBSD? :-)


    - Andrew

    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  14. Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY


    :Completely agree. ZFS is the way of the future for FreeBSD. In my
    :latest testing, the memory problems are now under control, there is just
    :stability problems with random lockups after days of heavy load unless I
    :turn off ZIL. So its nearly there.
    :
    :If only ZFS also supported a network distributed mode. Or can we
    :convince you to port Hammer to FreeBSD? :-)
    :
    :- Andrew

    Heh. No, you guys would have to port it if you want it, though I would
    be happy to support it once ported. Issues are between minor and
    moderate but would still require a knowledgeable filesystem person
    to do. Biggest issues will be buffer cache and bioops differences,
    and differences in the namespace VOPs.

    --

    But, IMHO, you guys should focus on ZFS since clearly a lot of work has
    gone into its port, it works now in FBSD, and it just needs to be made
    production-ready and a little more programming support from the
    community. It also has a different feature set then HAMMER. P.S.
    FreeBSD should issue a $$ grant or stipend to Pawel for that work,
    he's really saving your asses. UFS has clearly reached its end-of-life.

    Speaking of ZFS, you guys probably went through the same boot issues
    that we are going through with HAMMER. I came up with a solution which
    turned out to be quite non-invasive and very easy to implement.

    * Make a small /boot UFS partition. e.g. 256M ad0s1a.

    * Put HAMMER (or ZFS in your case) on the rest of the disk (ad0s1d).

    * Adjust the loader to search both / and /boot, so /boot can be its own
    partition or a sub-directory on root.

    * Add a simple line to /boot/loader.conf to direct the kernel to the
    proper root, e.g.

    vfs.root.mountfrom="hammer:ad0s1d"

    And poof, you're done. Then when the system boots it boots into a
    HAMMER (ZFS) root, and /boot is mounted as small UFS filesystem under
    it.

    Miscellanious other partitions would then be pseudo-fs's under the
    single HAMMER (or ZFS) root, removing the need to worry about reserving
    particular amounts of space, and providing the needed backup and
    snapshot separation domains.

    Well, you guys might have already solved it. There isn't much to it.

    I recall there was quite a discussion on trying to create redundant
    boot setup on FreeBSD, such as boot-to-RAID setups, and having trouble
    getting the BIOS to recognize it. There's an alternative solution...
    having a separate, small /boot means you can boot from a small solid
    state storage device whos MTBF is going to be the same as the PC
    hardware itself. No real storage redundancy is needed and if your root
    is somewhere else that gives you the option of putting more
    sophisticated code in /boot (it being the full kernel) to properly
    mount the root. I have 0 (ZERO) trust in BIOS-RAID or card-supported
    RAID-enabled (such as with 3Ware) boot support. ZERO.

    -Matt
    Matthew Dillon

    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  15. Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

    Dan Nelson wrote:
    > That'd be handy, but at least on my system the data prefetcher isn't
    > really called often enough to make a difference either way (assuming
    > the counts are accurate). Metadata prefetch is a big win, however.


    arcstats.prefetch_data_hits: 4538242 (13%)
    arcstats.prefetch_data_misses: 29298997
    arcstats.prefetch_metadata_hits: 593799808 (96%)
    arcstats.prefetch_metadata_misses: 21582847


    You are much luckier than me. For obvious reasons, I would like to
    completely disable data prefetch but leave on metadata prefetch.

    I believe it would speed up the filesystem plus save RAM (less wasted
    use of ARC). But especially desktop systems would not need to waste
    space on such aggressive prefetching.

    - Andrew
    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  16. Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

    On Tue, Sep 30, 2008 at 11:40:46AM +1000, Andrew Snow wrote:
    >
    > Matthew Dillon wrote:
    >> It can take 6 hours to fsck a full 1TB HD. It can
    >> take over a day to fsck larger setups. Putting in a few sleeps here
    >> and there just makes the run time even longer and perpetuates the pain.

    >
    > We have a box with millions of files spread over 2TB, on a 16 disk RAID.
    > Foreground fsck takes almost 8 hours, so background fsck, which takes
    > almost 24 hours or more, is my only option when I want to bring the box
    > back online quickly. And UFS Snapshots are so slow as to be completely
    > useless.
    >
    > I've now converted the volume to ZFS, and am now enjoying instant boot
    > time and higher speed I/O under heavy load, at the expense of memory
    > consumption.
    >
    >> My recommendation? Default UFS back to a synchronous fsck and stop
    >> treating ZFS (your only real alternative) as being so ultra-alpha that
    >> it shouldn't be used.

    >
    > Completely agree. ZFS is the way of the future for FreeBSD. In my
    > latest testing, the memory problems are now under control, there is just
    > stability problems with random lockups after days of heavy load unless I
    > turn off ZIL. So its nearly there.


    It just now occurred to me that this entire conversation should've been
    moved to freebsd-fs weeks ago. *laugh* Oh well. :-)

    You're the first person I've encountered who has had to disable the ZIL
    to get stability in ZFS; ouch, that must hurt.

    ZFS stability has been discussed on freebsd-fs numerous times, but the
    answers provided are always penultimate; no one (AFAIK) has examined how
    to solve this from the start (specifically new FreeBSD installations).

    Yes, I know sysinstall/sade doesn't support ZFS (though the PC-BSD folks
    have apparently implemented this), but that's not what I'm talking
    about. I'm talking about the most commonly-encountered problem: kmem
    exhaustion. People want to be able to install FreeBSD then say "Okay!
    Time to give ZFS a try!" on some separate disks, and have it work. They
    don't want to encounter kmem exhaustion half way through the migration
    process; that's just going to dishearten them.

    I'll be starting up a new topic on freebsd-fs later tonight with an idea
    I came up with for solving this out-of-the-box. I have a feeling I'm
    going to get told "so who's going to do all the work?" or downright
    flamed, but I hope it induces a discussion of ideas, specifically with
    regards to new FreeBSD installations.

    --
    | Jeremy Chadwick jdc at parodius.com |
    | Parodius Networking http://www.parodius.com/ |
    | UNIX Systems Administrator Mountain View, CA, USA |
    | Making life hard for others since 1977. PGP: 4BD6C0CB |

    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  17. Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

    Jeremy Chadwick wrote:
    > You're the first person I've encountered who has had to disable the ZIL
    > to get stability in ZFS; ouch, that must hurt.


    Its not so bad: this machine is doing backups with rsync, sometimes
    running 50 simultaneously. This workload doesn't contain any need for
    synchronous operations, and any files which didn't get written after a
    crash can simply be re-rsync. But I hope eventually it will be fixed!

    > I'm talking about the most commonly-encountered problem: kmem
    > exhaustion. People want to be able to install FreeBSD then say "Okay!
    > Time to give ZFS a try!" on some separate disks, and have it work.


    Personally I don't think there's much point worrying about how to boot
    off ZFS at this stage until the code is up to date, stable, and running
    7-STABLE branch.

    Until then I will also prefer to have a UFS root volume and just run ZFS
    for /usr and /home, because I still don't completely trust ZFS and I
    have a high value on being able to boot the system and have my tools
    available in /bin and /sbin.


    - Andrew
    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  18. Re: Recommendations for servers running SATA drives [hot-swap]

    Jeremy Chadwick wrote:
    > On Mon, Sep 29, 2008 at 05:25:32PM +0200, Miroslav Lachman wrote:
    >
    >>It was about year ago with Asus and Sun Fire X2100. I don't have Asus
    >>servers now (all returned as reclamation). Now I am running one X2100
    >>and about ten X2100 M2. I have one spare X2100 M2, so if somebody have
    >>exact order of commands used to "hot-swap" the disk, I can test it in
    >>few days.

    >
    >
    > I believe the correct order of operation is to do a "detach" on the
    > channel before physically removing the disk, insert the new disk, then
    > do "attach" on the same channel. "list" should be done afterwards to
    > ensure the new disk shows up.
    >
    > If you want me to verify for certain, I have a test box built in the
    > other room which has a SATA hot-swap backplane on it.
    >
    > I've also seen cases where the "attach" works, but upon doing "list",
    > the old disk ID/string is still shown. In this case I had to do a
    > "detach", remove the disk, insert the new disk, "reinit", then an
    > "attach" for things to work.
    >
    > Finally, I've also seen the kernel panic or hard-lock after running
    > "reinit", but this may have had something to do with Intel MatrixRAID.


    Today I was replacing disk in one Sun Fire X2100 M2 so I tried
    hot-swapping. It was as you said: atacontrol detach ata3, replace the
    HDD, atacontrol attach ata3 and new disk is in the system. I tried it 3
    times to be sure that it was not coincidence - no panic was produced ;o)
    So in this case, hot-swapping on Sun Fire X2100 M2 with FreeBSD 7.0 i386
    works.

    Miroslav Lachman


    # atacontrol list
    ATA channel 0:
    Master: no device present
    Slave: no device present
    ATA channel 1:
    Master: no device present
    Slave: no device present
    ATA channel 2:
    Master: ad4 Serial ATA II
    Slave: no device present
    ATA channel 3:
    Master: ad6 Serial ATA II
    Slave: no device present

    # atacontrol detach ata3
    subdisk6: detached
    ad6: detached
    GEOM_MIRROR: Device gm0: provider ad6 disconnected

    # atacontrol list
    ATA channel 0:
    Master: no device present
    Slave: no device present
    ATA channel 1:
    Master: no device present
    Slave: no device present
    ATA channel 2:
    Master: ad4 Serial ATA II
    Slave: no device present
    ATA channel 3:
    Master: no device present
    Slave: no device present

    ## [old disk was physically removed]

    ## [new disk was physically inserted]

    # atacontrol attach ata3
    ata3: [ITHREAD]
    ad6: 953869MB at ata3-master SATA300
    Master: ad6 Serial ATA II
    Slave: no device present

    # atacontrol list
    ATA channel 0:
    Master: no device present
    Slave: no device present
    ATA channel 1:
    Master: no device present
    Slave: no device present
    ATA channel 2:
    Master: ad4 Serial ATA II
    Slave: no device present
    ATA channel 3:
    Master: ad6 Serial ATA II
    Slave: no device present

    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  19. Re: Recommendations for servers running SATA drives [hot-swap]

    On Thu, Oct 16, 2008 at 09:30:20PM +0200, Miroslav Lachman wrote:
    > Today I was replacing disk in one Sun Fire X2100 M2 so I tried
    > hot-swapping. It was as you said: atacontrol detach ata3, replace the
    > HDD, atacontrol attach ata3 and new disk is in the system. I tried it 3
    > times to be sure that it was not coincidence - no panic was produced ;o)
    > So in this case, hot-swapping on Sun Fire X2100 M2 with FreeBSD 7.0 i386
    > works.


    That's excellent news. So it seems possibly the problem I was seeing
    was with "reinit" causing some sort of chaos. I'll have to check things
    on my testbox here at home to see how I caused the panic last time.

    Thanks for providing feedback, as usual! :-)

    --
    | Jeremy Chadwick jdc at parodius.com |
    | Parodius Networking http://www.parodius.com/ |
    | UNIX Systems Administrator Mountain View, CA, USA |
    | Making life hard for others since 1977. PGP: 4BD6C0CB |

    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  20. Re: Recommendations for servers running SATA drives [hot-swap]

    Jeremy Chadwick wrote:
    > On Thu, Oct 16, 2008 at 09:30:20PM +0200, Miroslav Lachman wrote:
    >
    >>Today I was replacing disk in one Sun Fire X2100 M2 so I tried
    >>hot-swapping. It was as you said: atacontrol detach ata3, replace the
    >>HDD, atacontrol attach ata3 and new disk is in the system. I tried it 3
    >>times to be sure that it was not coincidence - no panic was produced ;o)
    >>So in this case, hot-swapping on Sun Fire X2100 M2 with FreeBSD 7.0 i386
    >>works.

    >
    >
    > That's excellent news. So it seems possibly the problem I was seeing
    > was with "reinit" causing some sort of chaos. I'll have to check things
    > on my testbox here at home to see how I caused the panic last time.
    >
    > Thanks for providing feedback, as usual! :-)


    Unfortunately there is one problem - I see a lot of interrupts after
    disk swapping (about 193k of atapci1)

    Interrupts
    197k total
    ohci0 21
    ehci0 22
    193k atapci1 23
    2001 cpu0: time
    1 bge1 273
    2001 cpu1: time

    Full output of systat -vm 2 is attached.

    It is shown in top as 50% interrupt (CPU state) and load 1 until I
    rebooted the machine (I can provide MRTG graphs). The system was not in
    production load, but almost idle. (I will put it in production tomorrow).
    After reboot, everything is OK.

    Can somebody test hot-swapping with SATA drives and confirm this
    behavior? (I can't test it now, because machine is in datacenter)

    Miroslav Lachman

    2 users Load 1.00 1.00 0.99 Oct 17 00:25

    Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
    Tot Share Tot Share Free in out in out
    Act 40032 6212 118412 9352 509928 count
    All 70200 7884 4370000 16700 pages
    Proc: Interrupts
    r p d s w Csw Trp Sys Int Sof Flt cow 197k total
    3 45 387k 6 75 193k 187 zfod ohci0 21
    ozfod ehci0 22
    0.7%Sys 45.9%Intr 0.0%User 0.0%Nice 53.4%Idle %ozfod 193k atapci1 23
    | | | | | | | | | | | daefr 2001 cpu0: time
    +++++++++++++++++++++++ prcfr 1 bge1 273
    10 dtbuf totfr 2001 cpu1: time
    Namei Name-cache Dir-cache 68955 desvn react
    Calls hits % hits % 58041 numvn pdwak
    17234 frevn pdpgs
    intrn
    Disks ad4 ad6 191128 wire
    KB/t 0.00 0.00 59664 act
    tps 0 0 242588 inact
    MB/s 0.00 0.00 46108 cache
    %busy 0 0 463820 free
    113488 buf
    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

+ Reply to Thread
Page 1 of 2 1 2 LastLast