IDR on Netfinity NT4.0 / Smartraid 4L and SLR100 fails. (long!) - Veritas


  1. IDR on Netfinity NT4.0 / Smartraid 4L and SLR100 fails. (long!)

    Hello,



    given the following hardware:

    2 x IBM Netfinity 5600 / ipsraid 4L / SLR100 on 7897 Controller

    using NT4.0 SP6a / Notes 5.0.5 and BENT 8.0 on the first machine.


    Before taking the second machine into production we tried to
    drill ourselves in recovering from a disaster situation.

    We tried really hard from 08:00 am to 10:00 pm yesterday
    without any success.

    We had a full backup of the first machine and our 4 disks freshly
    generated by the IDR-Wizard. All steps were repeated using an
    ISO image as well.

    1.) We booted all three disks just to get the message that
    no harddisks are in the machine and that F3-Abort would be
    the only option.

    2.) We then booted into the F6-Mode of NT and specified the
    IO-related hardware manually. Then we finally got into the
    GUI-Mode where we were able to select an IDR file.
    After selecting the IDR file and confirming the
    repartitioning process we got the message "missing
    operating system" after the reboot. We never saw the
    harddisk manager screen described in the documentation,
    nor any screen showing a restore in progress, nor a
    dialog to choose the tape drive.

    3.) We booted into the F6-Mode of NT and dismissed the dialog to
    specify an IDR file. After that we were able to see the documented
    dialog which allows calling the harddisk manager. After we changed
    the small (500 MB) system partition, the reboot stopped with
    "operating system is missing".

    4.) We booted into F6-Mode, specified a 4000 MB system partition, dismissed
    the dialog to choose an IDR file and created a data volume the size
    of the remaining raid volume. Then, for the first time, we were actually
    able to restore something. The above steps (including reading TFM) took
    roughly 8 hours. After that we checked the directory structure and boot.ini
    and were delighted to see that all files were restored. After the reboot
    we ran into an INACCESSIBLE BOOT DEVICE BSOD ...

    5.) Same steps as before, but we upgraded ipsraid.sys from 3.50 to V4.50
    without any success, i.e. INACCESSIBLE BOOT DEVICE again.

    At 10:00 pm we were quite displeased with both the procedure and
    the results.


    Now for the questions:

    Has anyone actually managed to perform a disaster recovery using this
    software on a real-world server with near-identical hardware?

    Did we miss something we should have checked before performing
    backups or during the restore process?

    At the moment we are close to concluding that we should dump this
    software. Which working alternatives exist, assuming that
    BENT 8.0 IDR does not work for us?


    t++

  2. Re: IDR on Netfinity NT4.0 / Smartraid 4L and SLR100 fails. (long!)

    Thomas Antepoth wrote:

    > given following hardware:
    > 2 x IBM Netfinity 5600 / ipsraid 4L / SLR100 on 7897 Controller
    > using NT4.0 SP6a / Notes 5.0.5 and BENT 8.0 on the first machine.


    Some more research today revealed that these machines,
    although sold as identical, are not.

    The master machine has a 3L controller Firmware 3.50.2 (ipsraidn.sys)
    and the slave machine has a 4L (nfr960.sys) firmware 4.50 controller
    built in.
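
    A mismatch like this could be caught before relying on IDR by
    comparing a short hardware inventory of both machines. The
    following is only a minimal sketch: the values are the ones
    reported above, while the field names and the helper
    inventory_diff are made up for illustration.

```python
# Inventory values come from the post above; the field names
# ("controller", "firmware", "driver") are illustrative assumptions.
MASTER = {"controller": "3L", "firmware": "3.50.2", "driver": "ipsraidn.sys"}
SLAVE = {"controller": "4L", "firmware": "4.50", "driver": "nfr960.sys"}


def inventory_diff(a, b):
    """Return {field: (a_value, b_value)} for every field that differs."""
    fields = sorted(set(a) | set(b))
    return {f: (a.get(f), b.get(f)) for f in fields if a.get(f) != b.get(f)}


if __name__ == "__main__":
    # Any output here means the "identical" machines are not identical.
    for field, (master_value, slave_value) in inventory_diff(MASTER, SLAVE).items():
        print(f"{field}: master={master_value} slave={slave_value}")
```

    An empty result would mean the two inventories match on every
    recorded field; it says nothing about fields that were never
    recorded.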

    Although the recovery process places nfr960.sys into the
    real winnt\system32\drivers directory, it fails miserably to
    actually install the SCSI driver into the freshly recovered
    Windows NT system. This behaviour falls just one step short
    of providing a *real* disaster recovery.
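
    For illustration, this is roughly what the missing step amounts
    to: on NT 4.0 a boot-time SCSI miniport driver needs a service
    entry in the registry, not just the .sys file on disk. The sketch
    below only generates the text of a REGEDIT4 file. The service
    name "nfr960" is an assumption derived from the driver file name,
    and the helper scsi_miniport_reg is hypothetical. The vendor's
    own setup files remain the authoritative source for the values,
    and on a restored system that does not boot the entry would have
    to go into the offline SYSTEM hive, which this sketch does not
    attempt.

```python
def scsi_miniport_reg(service, root=r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet"):
    """Build REGEDIT4 text for a boot-start SCSI miniport service entry.

    `service` is the service key name; using the driver file name without
    ".sys" is an assumption, not something the vendor documentation confirms.
    """
    lines = [
        "REGEDIT4",
        "",
        f"[{root}\\Services\\{service}]",
        '"Type"=dword:00000001',          # kernel-mode driver
        '"Start"=dword:00000000',         # loaded at boot time
        '"ErrorControl"=dword:00000001',  # normal error handling
        '"Group"="SCSI miniport"',        # load order group
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # "nfr960" is assumed from the file name nfr960.sys mentioned above.
    print(scsi_miniport_reg("nfr960"))
```

    Whether the driver needs further values beyond these four is not
    known from this thread.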

    Had the master broken down (stolen, burned or whatever), we would
    have had no chance to recover except by reinstalling the whole PDC
    (and its apps and its shares and its trusts and so on) and
    replicating the user database (>200) from the nearest BDC machine.

    Our IT disaster plans will have to be amended to account for this
    IDR behaviour, and we will have to do some scripting to support the
    manual recovery procedure, as the IDR concept has this pitfall, which
    is barely mentioned in the documentation.

    However, from this point of view the whole IDR concept seems
    quite useless to us, as even slight differences in the hardware
    break it and require skilled ($$$) assistance onsite anyway.

    Is there any point we missed while doing the training?

    What would have happened if we had relied on the IDR and these
    pitfalls had opened up only when the disaster occurred?

    Who would have paid the price for the missing precautions
    against a failure of the IDR procedure?

    Who pays the price for the 28 man-hours lost in this training?

    If we had invested one week of work, we would have had a
    site-wide recovery procedure. We will have to do this now
    anyway.



    Best regards.


    t++
