Best way to simulate a failure - SUN



Thread: Best way to simulate a failure

  1. Best way to simulate a failure

    I need to test a client's customized HA setup. I want to simulate the
    failure of a single box running Solaris. As I understand it, a
    poweroff issues a sync command which sends a FIN to all the network
    connections. A real failure wouldn't be so accommodating. On the
    other hand, I don't want to cause a real failure by pulling the power
    cord too many times during a test.

    Any suggestions? Does it matter what specific boxes I use? There are
    several setups, each involving different hardware generations. A
    general solution is preferred.

    --Thank you,
    --Mike Jr.

  2. Re: Best way to simulate a failure

    On 2008-01-11, Mike wrote:
    > I need to test a client's customized HA setup. I want to simulate the
    > failure of a single box running Solaris. As I understand it, a
    > poweroff issues a sync command which sends a FIN to all the network
    > connections. A real failure wouldn't be so accommodating. On the
    > other hand, I don't want to cause a real failure by pulling the power
    > cord too many times during a test.
    >
    > Any suggestions? Does it matter what specific boxes I use? There are
    > several setups, each involving different hardware generations. A
    > general solution is preferred.


    I just pull the network cables or "down" the switch interfaces (on the switch --
    if you administratively down the ports on the Sun itself, SNMP reports that, as I
    discovered when I was seeking the answer to a similar question). So far as
    anything monitoring the box is concerned, it's g-o-o-o-ne.
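
    If the switch is reachable over SNMP, the "down the port on the switch" trick
    can also be scripted from a management host. A minimal sketch with the net-snmp
    tools, assuming a v2c write community and that you already know the ifIndex of
    the port (the switch name, community strings and ifIndex below are made up):

        # find the ifIndex of the port (assumes SNMP read access)
        snmpwalk -v2c -c public switch01 IF-MIB::ifDescr

        # administratively down the port (ifAdminStatus 2 = down)
        snmpset -v2c -c private switch01 IF-MIB::ifAdminStatus.17 i 2

        # bring it back up after the test (1 = up)
        snmpset -v2c -c private switch01 IF-MIB::ifAdminStatus.17 i 1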


    --
    "Be thankful that you have a life, and forsake your vain
    and presumptuous desire for a second one."
    [email me at huge {at} huge (dot) org uk]

  3. Re: Best way to simulate a failure

    Mike wrote:
    > I need to test a client's customized HA setup. I want to simulate the
    > failure of a single box running Solaris. As I understand it, a
    > poweroff issues a sync command which sends a FIN to all the network
    > connections. A real failure wouldn't be so accommodating. On the
    > other hand, I don't want to cause a real failure by pulling the power
    > cord too many times during a test.
    >
    > Any suggestions? Does it matter what specific boxes I use? There are
    > several setups, each involving different hardware generations. A
    > general solution is preferred.


    Crashing/corrupting/unplugging the machine is the most accurate.

    You can pull network cables, but that never screws up the
    machine you just removed. Even better, pull drives out. That will generate
    some great errors and the machine will still be on the network.

    A machine falling off the network doesn't really simulate a half-broken
    machine that's still limping along, possibly causing more problems. That's
    a real test right there.
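
    If you do pull a drive, the fun shows up in the error counters and the system
    log. A quick way to watch the carnage, using standard Solaris commands (no
    cluster software assumed):

        # per-device soft/hard/transport error counters
        iostat -En

        # watch the drivers complain in real time
        tail -f /var/adm/messages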


  4. Re: Best way to simulate a failure


    "Cydrome Leader" wrote in message
    news:fmc7a9$vu$4@reader2.panix.com...
    > Mike wrote:
    >> I need to test a client's customized HA setup. I want to simulate the
    >> failure of a single box running Solaris. As I understand it, a
    >> poweroff issues a sync command which sends a FIN to all the network
    >> connections. A real failure wouldn't be so accommodating. On the
    >> other hand, I don't want to cause a real failure by pulling the power
    >> cord too many times during a test.
    >>
    >> Any suggestions? Does it matter what specific boxes I use? There are
    >> several setups, each involving different hardware generations. A
    >> general solution is preferred.

    >
    > crashing/corrupting/unplugging the machine is most accurate.
    >
    > You can pull network cables, but that never screws up the
    > machine you just removed. Even better, pull drives out. That will generate
    > some great errors and the machine will still be on the network.
    >
    > A machine falling off the network doesn't really simulate a half broken
    > machine that's still limping along, possibly causing more problems. That's
    > a real test right there.
    >


    Fully agree -- pop a drive and watch the world unfold (I make a backup set
    and test it first), or pull the whole RAID array, or, with mirrored disks,
    pull the cable.

    Rob



  5. Re: Best way to simulate a failure

    On 11 Jan, 20:43, Mike wrote:
    > I need to test a client's customized HA setup.  I want to simulate the
    > failure of a single box running Solaris.  As I understand it, a
    > poweroff issues a sync command which sends a FIN to all the network
    > connections.  A real failure wouldn't be so accommodating.  On the
    > other hand, I don't want to cause a real failure by pulling the power
    > cord too many times during a test.
    >
    > Any suggestions?  Does it matter what specific boxes I use?  There are
    > several setups, each involving different hardware generations.  A
    > general solution is preferred.
    >


    Nothing so drastic as pulling drives, network cables or anything else.

    Just do this:

    STOP-A (or L1-A) on console

    This will drop you to the OBP prompt and the OS will be suspended. As far
    as the rest of the cluster is concerned, it will look much like pulling
    the power cord.

    Next, it's up to you where you want to go from there.
    I recommend leaving it there to see whether your failover has worked,
    and then

    ok reset-all

    because if, in the meantime, the failover has worked and another node
    has acquired ownership of some shared SAN storage, reviving the
    stopped host with a "go" will make a big mess of your filesystems.
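
    For boxes without a local Sun keyboard, the same effect can usually be had
    over the console, depending on how it is wired (serial console via tip, or
    an ALOM/RSC system controller). A rough sketch -- the port name is a
    placeholder, and this assumes keyboard abort has not been disabled on the host:

        # serial console via tip: "~#" sends a break, dropping the host to ok
        tip hardwire
        ~#

        # ALOM system controller: send a break from the sc> prompt, then reattach
        sc> break
        sc> console

        # once failover has been verified, reboot the suspended node
        ok reset-all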

  6. Re: Best way to simulate a failure

    haydude wrote:
    > On 11 Jan, 20:43, Mike wrote:
    >> I need to test a client's customized HA setup.  I want to simulate the
    >> failure of a single box running Solaris.  As I understand it, a
    >> poweroff issues a sync command which sends a FIN to all the network
    >> connections.  A real failure wouldn't be so accommodating.  On the
    >> other hand, I don't want to cause a real failure by pulling the power
    >> cord too many times during a test.
    >>
    >> Any suggestions?  Does it matter what specific boxes I use?  There are
    >> several setups, each involving different hardware generations.  A
    >> general solution is preferred.
    >>

    >
    > Nothing so drastic as pulling drives, network cables or else.
    >
    > Just do this:
    >
    > STOP-A (or L1-A) on console
    >
    > This will get you to the OBP prompt and the OS will be suspended. It
    > will be the same as pulling the power-cord.


    This is a great test if the ONLY way your machines ever fail is by being
    stopped by accident.



  7. Re: Best way to simulate a failure

    On Jan 17, 11:57 am, haydude wrote:
    > On 11 Jan, 20:43, Mike wrote:
    >
    > > I need to test a client's customized HA setup.  I want to simulate the
    > > failure of a single box running Solaris.  As I understand it, a
    > > poweroff issues a sync command which sends a FIN to all the network
    > > connections.  A real failure wouldn't be so accommodating.  On the
    > > other hand, I don't want to cause a real failure by pulling the power
    > > cord too many times during a test.

    >
    > > Any suggestions?  Does it matter what specific boxes I use?  There are
    > > several setups, each involving different hardware generations.  A
    > > general solution is preferred.

    >
    > Nothing so drastic as pulling drives, network cables or else.
    >
    > Just do this:
    >
    > STOP-A (or L1-A) on console

    I have read that the STOP-A key sequence can cause Solaris OS file
    system corruption. How would you compare the risk of using STOP-A to
    the risk of pulling the power cord? I want a realistic failover test
    but I don't want to destroy the system in the process. The system
    needs to be usable after my test has finished.

    >
    > This will get you to the OBP prompt and the OS will be suspended. It
    > will be the same as pulling the power-cord.
    >
    > Next it is about you where do you want to go from there.
    > I recommend to leave it there to see if your failover has worked and
    > then
    >
    > ok reset-all
    >
    > because if in the meantime, the failover has worked and another node
    > has acquired ownership of some shared SAN storage, reviving the
    > stopped host with a "go" will create a big mess of your filesystems.


    Sage advice.

    Thank you, Mike Jr.


  8. Re: Best way to simulate a failure

    On Jan 13, 12:24 am, Cydrome Leader wrote:
    > Mike wrote:
    > > I need to test a client's customized HA setup.  I want to simulate the
    > > failure of a single box running Solaris.  As I understand it, a
    > > poweroff issues a sync command which sends a FIN to all the network
    > > connections.  A real failure wouldn't be so accommodating.  On the
    > > other hand, I don't want to cause a real failure by pulling the power
    > > cord too many times during a test.

    >
    > > Any suggestions?  Does it matter what specific boxes I use?  There are
    > > several setups, each involving different hardware generations.  A
    > > general solution is preferred.

    >
    > crashing/corrupting/unplugging the machine is most accurate.
    >
    > You can pull network cables, but that never screws up the
    > machine you just removed. Even better, pull drives out. That will generate
    > some great errors and the machine will still be on the network.
    >
    > A machine falling off the network doesn't really simulate a half broken
    > machine that's still limping along, possibly causing more problems. That's
    > a real test right there.


    Sigh. I agree that the best way to test HA is to actually fail the
    hardware. However, I am not allowed to really break the existing
    boxes. Management wants a different solution and is pushing for
    poweroff. However, as I said, poweroff does a sync, which is, to my
    mind, way too polite. I'm looking for the closest thing to pulling the
    power cord that doesn't put the system at a real risk of not powering
    back up.
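
    If a deliberate kernel panic is acceptable to management, Solaris can be made
    to crash itself from a root shell, which is about as impolite as it gets
    without touching the power cord: the box panics and takes a crash dump, with
    no orderly sync of application data. A hedged sketch -- double-check the
    uadmin(1M)/uadmin(2) codes on your particular release before relying on them:

        # uadmin 5 == A_DUMP: panic the system and take a crash dump.
        # The second argument says what to do afterwards:
        #   0 (AD_HALT) stay down, 1 (AD_BOOT) reboot automatically.
        uadmin 5 0

        # If savecore is enabled, the dump is written out on the next boot,
        # so the box should come back cleanly after the test.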

    --Thank you, Mike Jr.


  9. Re: Best way to simulate a failure

    Mike writes:

    >I have read that the STOP-A key sequence can cause Solaris OS file
    >system corruption. How would you compare the risk of using STOP-A to
    >the risk of pulling the power cord? I want a realistic failover test
    >but I don't want to destroy the system in the process. The system
    >needs to be usable after my test has finished.


    Stop-A is no more damaging than pulling the power cord (and not
    as bad for your hardware as cycling power).

    Some filesystem types can be corrupted if the system goes away in the
    middle of an update. ZFS does not have that problem.
    All filesystems can lose data (without corruption) when power goes out.
    (Only data which has not yet been acknowledged by the system
    as being committed to stable storage)

    Casper
    --
    Expressed in this posting are my opinions. They are in no way related
    to opinions held by my employer, Sun Microsystems.
    Statements on Sun products included here are not gospel and may
    be fiction rather than truth.

  10. Re: Best way to simulate a failure

    On Jan 21, 11:30 am, Casper H.S. Dik wrote:
    > Mike writes:
    > >I have read that the STOP-A key sequence can cause Solaris OS file
    > >system corruption.  How would you compare the risk of using STOP-A to
    > >the risk of pulling the power cord?  I want a realistic failover test
    > >but I don't want to destroy the system in the process.  The system
    > >needs to be usable after my test has finished.

    >
    > Stop-A is no more damaging than pulling the power cord (and not
    > as bad for your hardware as cycling power).
    >
    > Some filesystem types can be corrupted if the system goes away in the
    > middle of an update.  ZFS does not have that problem.
    > All filesystems can lose data (without corruption) when power goes out.
    > (Only data which has not yet been acknowledged by the system
    > as being committed to stable storage)


    ZFS was included in the 6/06 update to Solaris 10 in June 2006. We
    haven't moved to Solaris 10 yet so no luck there.

    Hmm, we are using NFS on top of VxFS. Given that VxFS uses intent
    logging (records pending changes to the file system structure in a
    circular intent log) and that the VxFS fsck utility performs an intent
    log replay, I am going to hazard a guess that we would stand a good
    chance of surviving a STOP-A. What do you think?
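
    That matches my understanding of VxFS: after an abrupt stop, the default fsck
    pass is an intent-log replay rather than a full structural check. A hedged
    sketch of what the recovery step tends to look like (the disk group and volume
    names are placeholders for whatever your setup actually uses):

        # default VxFS fsck is an intent log replay -- normally quick
        fsck -F vxfs /dev/vx/rdsk/testdg/datavol

        # if the log replay fails, force a full structural check
        fsck -F vxfs -o full /dev/vx/rdsk/testdg/datavol

        mount -F vxfs /dev/vx/dsk/testdg/datavol /export/data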

    Thanks for your help, Mike Jr.




  11. Re: Best way to simulate a failure

    Mike wrote:
    > On Jan 17, 11:57 am, haydude wrote:
    >> On 11 Jan, 20:43, Mike wrote:
    >>
    >> > I need to test a client's customized HA setup.  I want to simulate the
    >> > failure of a single box running Solaris.  As I understand it, a
    >> > poweroff issues a sync command which sends a FIN to all the network
    >> > connections.  A real failure wouldn't be so accommodating.  On the
    >> > other hand, I don't want to cause a real failure by pulling the power
    >> > cord too many times during a test.

    >>
    >> > Any suggestions?  Does it matter what specific boxes I use?  There are
    >> > several setups, each involving different hardware generations.  A
    >> > general solution is preferred.

    >>
    >> Nothing so drastic as pulling drives, network cables or else.
    >>
    >> Just do this:
    >>
    >> STOP-A (or L1-A) on console

    > I have read that the STOP-A key sequence can cause Solaris OS file
    > system corruption. How would you compare the risk of using STOP-A to
    > the risk of pulling the power cord? I want a realistic failover test
    > but I don't want to destroy the system in the process. The system
    > needs to be usable after my test has finished.


    Would a machine even fail-over if nothing was wrong with it to start with?

  12. Re: Best way to simulate a failure

    On Jan 21, 3:00 pm, Cydrome Leader wrote:
    > Mike wrote:
    > > On Jan 17, 11:57 am, haydude wrote:
    > >> On 11 Jan, 20:43, Mike wrote:

    >
    > >> > I need to test a client's customized HA setup.  I want to simulate the
    > >> > failure of a single box running Solaris.  As I understand it, a
    > >> > poweroff issues a sync command which sends a FIN to all the network
    > >> > connections.  A real failure wouldn't be so accommodating.  On the
    > >> > other hand, I don't want to cause a real failure by pulling the power
    > >> > cord too many times during a test.

    >
    > >> > Any suggestions?  Does it matter what specific boxes I use?  There are
    > >> > several setups, each involving different hardware generations.  A
    > >> > general solution is preferred.

    >
    > >> Nothing so drastic as pulling drives, network cables or anything else.

    >
    > >> Just do this:

    >
    > >> STOP-A (or L1-A) on console

    > > I have read that the STOP-A key sequence can cause Solaris OS file
    > > system corruption.  How would you compare the risk of using STOP-A to
    > > the risk of pulling the power cord?  I want a realistic failover test
    > > but I don't want to destroy the system in the process.  The system
    > > needs to be usable after my test has finished.

    >
    > Would a machine even fail-over if nothing was wrong with it to start with?

    If the HA watcher stops getting a "heartbeat" then yes, it is designed
    to fail over. The boxes are designed to get their state at boot-up
    from their partner. This way, after a failure, the failed box's
    partner will tell the failed box that its status is standby, thus
    preventing the original primary from booting as primary.

    BTW, ping makes for a lousy heartbeat, since the applications can be in
    serious trouble while ping still works.
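
    For what it's worth, an application-level heartbeat doesn't have to be
    elaborate. A hypothetical sketch of the idea -- the healthcheck command, the
    failover hook, the host name and the threshold are all made up, stand-ins for
    whatever the customized HA setup actually provides:

        #!/bin/sh
        # app_heartbeat.sh - succeed only while the application answers real requests
        HOST=app-node1          # hypothetical peer to monitor
        FAILS=0

        while :; do
            # Replace this with a genuine application-level probe, e.g. a trivial
            # query against the service -- not just an ICMP ping of the box.
            if /opt/app/bin/healthcheck -h $HOST >/dev/null 2>&1; then
                FAILS=0
            else
                FAILS=`expr $FAILS + 1`
            fi

            # three consecutive misses -> tell the HA framework to fail over
            if [ $FAILS -ge 3 ]; then
                logger -p daemon.crit "heartbeat to $HOST lost, initiating failover"
                /opt/ha/bin/initiate-failover $HOST   # hypothetical hook
                exit 1
            fi
            sleep 5
        done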



  13. Re: Best way to simulate a failure

    Cydrome Leader writes:
    >Mike wrote:
    >
    >> I have read that the STOP-A key sequence can cause Solaris OS file
    >> system corruption. How would you compare the risk of using STOP-A to
    >> the risk of pulling the power cord? I want a realistic failover test
    >> but I don't want to destroy the system in the process. The system
    >> needs to be usable after my test has finished.

    >
    >Would a machine even fail-over if nothing was wrong with it to start with?
    >


    Of course a failover to the standby machine can occur even though the active
    machine has nothing wrong with it.

    -Greg
    --
    Do NOT reply via e-mail.
    Reply in the newsgroup.

  14. Re: Best way to simulate a failure

    [A complimentary Cc of this posting was sent to Mike], who wrote in
    article <091ea7da-a44d-4616-bd98-9be4a7fd0ec9@e25g2000prg.googlegroups.com>:
    > > Would a machine even fail-over if nothing was wrong with it to start with?


    > If the HA watcher stops getting a "heartbeat" then yes, it is designed
    > to failover. The boxes are designed to get their state at boot up
    > from their partner. This way, after a failure, the failed boxes
    > partner will tell the failed box that its status is standby thus
    > preventing the original primary from booting as primary.
    >
    > BTW, ping makes for a lousy heartbeat since the applications can be in
    > serious trouble but ping still works.


    a) I do not see how Stop-A is going to differ from unplugging the
    network cable...

    b) If Stop-A is fine for the purpose of testing the fail-over, but not
    fine from the point of view of endangering the hardware, what about
    running your system in a computer emulator, and stopping the emulator
    (see the sketch after this list)? If it runs on x86, it should be quite
    easy...

    c) If you can boot from CD+USB, then duplicating the disk to a USB
    drive and unplugging the USB cable should not endanger anything (except
    the data on the USB copy, which can be restored)...
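
    On option (b): with something like QEMU running a Solaris x86 guest, "stopping
    the emulator" is a one-word monitor command, and it freezes the guest far more
    abruptly than any orderly shutdown. A sketch, assuming QEMU is available; the
    disk image name is purely illustrative:

        # start the guest with the monitor on stdio (image name is made up)
        qemu-system-x86_64 -m 1024 -hda solaris10-x86.img -monitor stdio

        # in the monitor:
        (qemu) stop          # freeze the guest instantly: CPU and I/O halt
        (qemu) cont          # resume it later, or
        (qemu) system_reset  # hard-reset it, like hitting the reset button
        (qemu) quit          # or kill it outright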

    Hope this helps,
    Ilya

  15. Re: Best way to simulate a failure

    Greg Andrews wrote:
    > Cydrome Leader writes:
    >>Mike wrote:
    >>
    >>> I have read that the STOP-A key sequence can cause Solaris OS file
    >>> system corruption. How would you compare the risk of using STOP-A to
    >>> the risk of pulling the power cord? I want a realistic failover test
    >>> but I don't want to destroy the system in the process. The system
    >>> needs to be usable after my test has finished.

    >>
    >>Would a machine even fail-over if nothing was wrong with it to start with?
    >>

    >
    > Of course a failover to the standby machine can occur even though the active
    > machine has nothing wrong with it.
    >
    > -Greg


    And what might cause this? Probably more things than somebody hitting
    stop-a by accident.

  16. zfs backups (was: Re: Best way to simulate a failure)

    On 2008-01-21, Casper H.S Dik wrote:
    > Mike writes:
    >
    >>I have read that the STOP-A key sequence can cause Solaris OS file
    >>system corruption. How would you compare the risk of using STOP-A to
    >>the risk of pulling the power cord? I want a realistic failover test
    >>but I don't want to destroy the system in the process. The system
    >>needs to be usable after my test has finished.

    >
    > Stop-A is no more damaging than pulling the power cord (and not
    > as bad for your hardware as cycling power).
    >
    > Some filesystem types can be corrupted if the system goes away in the
    > middle of an update. ZFS does not have that problem.
    > All filesystems can lose data (without corruption) when power goes out.
    > (Only data which has not yet been acknowledged by the system
    > as being committed to stable storage)


    Out of curiosity -- speaking of zfs -- I seem to have a problem
    with backing up zfs filesystems on Exabyte tapes using amanda. Here is
    a part of the daily status e-mail from a recent backup run:


    ========================================================================
    HOSTNAME     DISK        L ORIG-kB  OUT-kB  COMP%  MMM:SS   KB/s MMM:SS   KB/s
    -------------------------- ------------------------------------- -------------

    [ ... ]

    burke        --usr-share 1     230      64   27.8    0:05    7.7   0:02   35.7
    burke        -e-p/home-1 0      10      32  320.0    0:00    4.0   0:02   17.8
    burke        -e-p/home-4 0      10      32  320.0    0:00    4.2   0:02   19.1
    burke        -e-p/home-6 0      10      32  320.0    0:00    4.0   0:02   17.4
    burke        -/workplace 1   22970    2080    9.1    1:33   22.3   0:05  398.0
    burke        /opt        1   30640    3072   10.0    0:52   58.8   0:06  530.8
    burke        /package02  0  430960  175744   40.8    1:45 1668.9   0:19 9147.6
    burke        /photos/CSU 0      10      32  320.0    0:00    4.1   0:02   17.8
    burke        /photos/DoN 0      10      32  320.0    0:00    4.1   0:02   17.8
    burke        -os/Dolores 0      10      32  320.0    0:00    3.9   0:06    5.2
    burke        /src        0 2580440 1576576   61.1   12:20 2130.0   2:23 11053.3

    [ ... ]

    ========================================================================

    The size of the backup (10 kB) is the same whether it is a
    level-0 or a level-1 backup. FWIW, the backups are being done with
    gtar, not ufsdump, and the same setup works fine for non-zfs filesystems.
    (The zfs ones are the "-e-p/home-?" ones and the /photos/* and "-os/Dolores"
    ones. The first three filesystems are on one zfs pool, and the next
    three are on another. Both pools are healthy according to "zpool
    status" and the filesystems show no problems with "zfs list".)

    If it matters, the OS is Solaris 10 U3, the hardware is a Sun
    Fire 280R, and both pools are built from 18GB drives -- SCA drives in a
    D-1000 for one, and 68-pin drives in a Kingston rack-mount JBOD in the
    other -- both with a hot spare drive sitting there.
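
    In case it helps anyone reproduce this: a quick way to rule the filesystems
    themselves out is to run gtar by hand against one of the affected zfs mounts
    and compare the byte count with what amanda reports (the mount point below is
    a stand-in for one of the real ones):

        # how much data does gtar itself see on the zfs filesystem?
        gtar -cf - /export/home-1 2>/dev/null | wc -c

        # and how much does zfs think is there?
        zfs list -o name,used,mountpoint
        df -k /export/home-1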

    Has anyone else experienced a similar problem?

    Thanks,
    DoN.

    --
    Email: | Voice (all times): (703) 938-4564
    (too) near Washington D.C. | http://www.d-and-d.com/dnichols/DoN.html
    --- Black Holes are where God is dividing by zero ---

  17. Re: Best way to simulate a failure

    Cydrome Leader writes:
    >Greg Andrews wrote:
    >> Cydrome Leader writes:
    >>>
    >>>Would a machine even fail-over if nothing was wrong with it to start with?
    >>>

    >>
    >> Of course a failover to the standby machine can occur even though the active
    >> machine has nothing wrong with it.
    >>

    >
    >And what might cause this? Probably more things than somebody hitting
    >stop-a by accident.
    >


    Two words: "Network" and "People"

    Network connectivity to the active server can be interrupted, so the
    other server decides it must initiate failover and become active.

    It isn't all that apropos to this thread, but people can also decide
    to perform a manual failover; for example, to upgrade or patch software
    that isn't malfunctioning. Remember the Daylight Saving Time patches
    we all had to apply a year ago?
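
    The thread is about a customized HA setup, but for comparison, if this were
    Sun Cluster 3.x a planned switchover would look something like the following
    (the resource group and node names are invented for illustration):

        # show where the resource groups currently live
        scstat -g

        # move resource group "app-rg" onto node2 for maintenance on node1
        scswitch -z -g app-rg -h node2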


    -Greg
    --
    Do NOT reply via e-mail.
    Reply in the newsgroup.

  18. Re: zfs backups (was: Re: Best way to simulate a failure)

    On Jan 21, 8:29 pm, "DoN. Nichols" wrote:
    > On 2008-01-21, Casper H.S. Dik wrote:
    >
    > > Mike writes:

    >
    > >>I have read that the STOP-A key sequence can cause Solaris OS file
    > >>system corruption.  How would you compare the risk of using STOP-A to
    > >>the risk of pulling the power cord?  I want a realistic failover test
    > >>but I don't want to destroy the system in the process.  The system
    > >>needs to be usable after my test has finished.

    >
    > > Stop-A is no more damaging than pulling the power cord (and not
    > > as bad for your hardware as cycling power).

    >
    > > Some filesystem types can be corrupted if the system goes away in the
    > > middle of an update.  ZFS does not have that problem.
    > > All filesystems can lose data (without corruption) when power goes out.
    > > (Only data which has not yet been acknowledged by the system
    > > as being committed to stable storage)

    >
    >         Out of curiosity -- speaking of zfs -- I seem to have a problem
    > with backing up zfs filesystems on Exabyte tapes using amanda.  Here is
    > a part of the daily status e-mail from a recent backup run:
    >
    > ========================================================================
    > HOSTNAME     DISK        L ORIG-kB  OUT-kB  COMP%  MMM:SS   KB/s MMM:SS   KB/s
    > -------------------------- ------------------------------------- -------------
    >
    >         [ ... ]
    >
    > burke        --usr-share 1     230      64   27.8    0:05    7.7   0:02   35.7
    > burke        -e-p/home-1 0      10      32  320.0    0:00    4.0   0:02   17.8
    > burke        -e-p/home-4 0      10      32  320.0    0:00    4.2   0:02   19.1
    > burke        -e-p/home-6 0      10      32  320.0    0:00    4.0   0:02   17.4
    > burke        -/workplace 1   22970    2080    9.1    1:33   22.3   0:05  398.0
    > burke        /opt        1   30640    3072   10.0    0:52   58.8   0:06  530.8
    > burke        /package02  0  430960  175744   40.8    1:45 1668.9   0:19 9147.6
    > burke        /photos/CSU 0      10      32  320.0    0:00    4.1   0:02   17.8
    > burke        /photos/DoN 0      10      32  320.0    0:00    4.1   0:02   17.8
    > burke        -os/Dolores 0      10      32  320.0    0:00    3.9   0:06    5.2
    > burke        /src        0 2580440 1576576   61.1   12:20 2130.0   2:23 11053.3
    >
    >         [ ... ]
    >
    > ========================================================================
    >
    >         The size of the backup (10 kB) is the same whether it is a
    > level-0 or a level-1 backup.  FWIW, the backups are being done with
    > gtar, not ufsdump, and the same setup works fine for non-zfs filesystems.  (The
    > zfs ones are the "-e-p/home-?" ones and the /photos/* and "-os/Dolores"
    > ones.  The first three filesystems are on one zfs pool, and the next
    > three are on another.  Both pools are healthy according to "zpool
    > status" and the filesystems show no problems with "zfs list".)
    >
    >         If it matters, the OS is Solaris 10 U3, the hardware is a Sun
    > Fire 280R, and both pools are built from 18GB drives -- SCA drives in a
    > D-1000 for one, and 68-pin drives in a Kingston rack-mount JBOD in the
    > other -- both with a hot spare drive sitting there.


    What version of Amanda are you using?

    >
    >         Has anyone else experienced a similar problem?
    >
    >         Thanks,
    >                 DoN.
    >
    > --
    >  Email:     | Voice (all times): (703) 938-4564
    >         (too) near Washington D.C. | http://www.d-and-d.com/dnichols/DoN.html
    >            --- Black Holes are where God is dividing by zero ---



  19. Re: zfs backups (was: Re: Best way to simulate a failure)

    On 2008-01-22, Mike wrote:
    > On Jan 21, 8:29 pm, "DoN. Nichols" wrote:


    [ ... ]

    >>         Out of curiosity -- speaking of zfs -- I seem to have a problem
    >> with backing up zfs filesystems on Exabyte tapes using amanda.  Here is
    >> a part of the daily status e-mail from a recent backup run:
    >>
    >> ========================================================================
    >> HOSTNAME     DISK        L ORIG-kB  OUT-kB  COMP%  MMM:SS   KB/s MMM:SS   KB/s
    >> -------------------------- ------------------------------------- -------------
    >>
    >>         [ ... ]
    >>
    >> burke        --usr-share 1     230      64   27.8    0:05    7.7   0:02   35.7
    >> burke        -e-p/home-1 0      10      32  320.0    0:00    4.0   0:02   17.8
    >> burke        -e-p/home-4 0      10      32  320.0    0:00    4.2   0:02   19.1
    >> burke        -e-p/home-6 0      10      32  320.0    0:00    4.0   0:02   17.4
    >> burke        -/workplace 1   22970    2080    9.1    1:33   22.3   0:05  398.0
    >> burke        /opt        1   30640    3072   10.0    0:52   58.8   0:06  530.8
    >> burke        /package02  0  430960  175744   40.8    1:45 1668.9   0:19 9147.6
    >> burke        /photos/CSU 0      10      32  320.0    0:00    4.1   0:02   17.8
    >> burke        /photos/DoN 0      10      32  320.0    0:00    4.1   0:02   17.8
    >> burke        -os/Dolores 0      10      32  320.0    0:00    3.9   0:06    5.2
    >> burke        /src        0 2580440 1576576   61.1   12:20 2130.0   2:23 11053.3
    >>
    >>         [ ... ]
    >>
    >> ========================================================================
    >>
    >>         The size of the backup (10 kB) is the same whether it is a
    >> level-0 or a level-1 backup.  FWIW, the backups are being done with
    >> gtar, not ufsdump, and the same setup works fine for non-zfs filesystems.  (The
    >> zfs ones are the "-e-p/home-?" ones and the /photos/* and "-os/Dolores"
    >> ones.  The first three filesystems are on one zfs pool, and the next
    >> three are on another.  Both pools are healthy according to "zpool
    >> status" and the filesystems show no problems with "zfs list".)
    >>
    >>         If it matters, the OS is Solaris 10 U3, the hardware is a Sun
    >> Fire 280R, and both pools are built from 18GB drives -- SCA drives in a
    >> D-1000 for one, and 68-pin drives in a Kingston rack-mount JBOD in the
    >> other -- both with a hot spare drive sitting there.

    >
    > What version of Amanda are you using?


    amanda-2.5.1p3 -- locally compiled to use ssh for inter-system
    communications -- but this problem situation is all on a single system.
    IIRC, this is a couple of steps newer than what came with the OS, but it
    is such a pain to dig out the version number from a compiled Amanda
    suite that I can't be sure. Do you know an easy way to determine the
    version if you don't have the sources handy?
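
    One approach that often works, for the record: amadmin can usually report the
    build version of an installed Amanda without needing the sources (the config
    name below is a stand-in for a real configuration directory):

        # prints the compiled-in version and build settings
        amadmin DailyBackup version | head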

    >>
    >>         Has anyone else experienced a similar problem?


    Thanks,
    DoN.

    --
    Email: | Voice (all times): (703) 938-4564
    (too) near Washington D.C. | http://www.d-and-d.com/dnichols/DoN.html
    --- Black Holes are where God is dividing by zero ---
