Re: Duplicating AIX disks from a Linux box ? - Aix

Thread: Re: Duplicating AIX disks from a Linux box ?

  1. Re: Duplicating AIX disks from a Linux box ?

    Mark Taylor wrote:
    > I will take a guess at the 888
    >
    > Press the reset button and you will get some more numbers .. i.e. 888
    > 102 700 0c2 .. basically means your system has crashed .. probably due
    > to device drivers .. it will be either
    >
    > 888 102 300 #c# or 888 102 700 #c#
    >
    > What you may want to pay attention to are the LEDs prior to the 888's
    > as this will tell you how far the boot process got before crashing.
    > above 299 and you are passing over to software control from hardware
    > control.
    >
    > As you have no o/s on the system, then you won't get asked to copy the
    > dump to tape or have a copy of the dump to analyse .. so you are
    > stuffed really ..
    >
    > What you may want to do is install every single driver file available
    > onto the system where you run the clone of the disk .. also bos.up and
    > bos.mp filesets .. the system should boot a "up" system with an "mp"
    > kernel, but not vice versa .. so bear that in mind ..
    >
    > Rgds
    > MarkT


    Thanks Mark.

    Until I can retrieve the error codes, I'm confused as to why device
    drivers might be crashing when running on identical hardware to that
    the image was taken from - any idea why this might be ?

    Am I also right in thinking, from the various replies so far, the dd
    approach is still a viable one ?

    Cheers
    Lee


  2. Re: Duplicating AIX disks from a Linux box ?

    >> I'm confused as to why device drivers might be crashing when running on identical hardware

    I would be surprised if your hardware was truly identical .. make a note
    of the LEDs prior to the 888's and then if we get to > 299 then we will
    know.


  3. Re: Duplicating AIX disks from a Linux box ?


    > Until I can retrieve the error codes, I'm confused as to why device
    > drivers might be crashing when running on identical hardware to that
    > the image was taken from - any idea why this might be ?


    test the theory ... dd to a disk, and then use it to boot on the same
    system .. i.e. pull out the actual bootdisk and replace it with the dd'd
    one .. if it boots ok, then it should boot on an identical system .. if
    it doesn't, then they are not identical


  4. Re: Duplicating AIX disks from a Linux box ?

    muddyboots@gmail.com wrote:
    > Am I also right in thinking, from the various replies so far, the dd
    > approach is still a viable one ?


    The dd approach might work if you also clear the NVRAM on the target
    machine, but I'd only give that a 20 percent chance of success. The
    correct answer is to tell the customer that he/she is madder than a
    march hare, use the tape drive to do the restores. As time goes on,
    there will be many hard drive failures as the drives are currently on
    the wrong end of the "bathtub curve", so parts will need to be replaced.
    If the machines in question are 7011-220s (or -22Ws), you can also
    expect to see power supply failures, and eventually planar failures.
    The 7006s, 7009s, 7011s and 7012s were all good, rugged machines, but
    nothing lasts forever. The customer needs to develop a migration plan,
    the eBay supply of the oldest RS/6000s is already drying up because
    there's not enough profit in it for the sellers.

    Rick Ekblaw

  5. Re: Duplicating AIX disks from a Linux box ?


    Rick Ekblaw wrote:
    > muddyboots@gmail.com wrote:
    > > Am I also right in thinking, from the various replies so far, the dd
    > > approach is still a viable one ?

    >
    > The dd approach might work if you also clear the NVRAM on the target
    > machine, but I'd only give that a 20 percent chance of success. The
    > correct answer is to tell the customer that he/she is madder than a
    > march hare, use the tape drive to do the restores. As time goes on,
    > there will be many hard drive failures as the drives are currently on
    > the wrong end of the "bathtub curve", so parts will need to be replaced.
    > If the machines in question are 7011-220s (or -22Ws), you can also
    > expect to see power supply failures, and eventually planar failures.
    > The 7006s, 7009s, 7011s and 7012s were all good, rugged machines, but
    > nothing lasts forever. The customer needs to develop a migration plan,
    > the eBay supply of the oldest RS/6000s is already drying up because
    > there's not enough profit in it for the sellers.
    >
    > Rick Ekblaw


    Hi Rick

    I fully understand where you're coming from about the hardware, we are
    well aware of all the issues, but in this specific case suffice to say
    it just is not possible. Sorry I can't go into any more detail because
    of the nature of the work !

    Cheers
    Lee


  6. Re: Duplicating AIX disks from a Linux box ?


    Mark Taylor wrote:
    > >> I'm confused as to why device drivers might be crashing when running on identical hardware

    >
    > I would be surprised if your hardware was truly identical .. make a note
    > of the LEDs prior to the 888's and then if we get to > 299 then we will
    > know.


    Right, I have some error codes - but first an interesting development.
    Today I used dd to take a disk image, then wrote it back with dd to the
    same machine and it fails to boot. I have done this successfully
    before, but probably using different block size settings in dd.

    Any idea whether I need to specify a particular block size when using
    dd to take the image ?

    Anyway, the code sequences:
    218
    219
    220
    216
    290
    278
    291
    283
    271
    223
    299
    888 -> reset -> 103 -> 207 -> 299 -> 888

    Any pointers to what this means ?
    I presume it means an SRN of 207-299; I have found a copy of
    "Diagnostic Information for Multiple Bus Systems" but the code is not
    listed in there. I'm not sure if this is the right document for our
    machine types though...

    Any help much appreciated !
    Lee


  7. Re: Duplicating AIX disks from a Linux box ?

    You're not based in Hampshire, are you, just north of the M27?


  8. Re: Duplicating AIX disks from a Linux box ?

    > 888 -> reset -> 103 -> 207 -> 299 -> 888
    >
    > Any pointers to what this means ?
    > I presume it means an SRN of 207-299; I have found a copy of
    > "Diagnostic Information for Multiple Bus Systems" but the code is not
    > listed in there. I'm not sure if this is the right document for our
    > machine types though...
    >
    > Any help much appreciated !
    > Lee


    888 --> 103 --> 207 are hardware codes ..

    Here are the 3 digit LEDs for 433 (earliest I could find)

    299 is passing control over to s/w i.e. from the firmware to code on
    the disk .. maybe it cannot find the bootstrap .. hmm .. i will have to
    have a think about this ..

    Ref:
    http://publib16.boulder.ibm.com/pser...tm#A2wM360kevi

    Display Value: 299

    Explanation Progress indicator. IPL ROM passed control to loaded code.
    System Action If control transfer was not successful, the system
    halts.
    User Action If the system halts, record the SRN 101-299 in item 4 on
    the Problem Summary Form. Report the problem to your hardware service
    organization, and then stop. You have completed these procedures.

    So, you dd'd to a disk and used it in the same system it came from and
    it did not work .. ok, so the procedure is wrong at the moment. What
    block size did you use ? just paste the whole command / procedure you
    used.


  9. Re: Duplicating AIX disks from a Linux box ?

    > Here are the 3 digit LEDs for 433 (earliest I could find)
    >
    > 299 is passing control over to s/w i.e. from the firmware to code on
    > the disk .. maybe it cannot find the bootstrap .. hmm .. i will have to
    > have a think about this ..
    >
    > Ref:
    > http://publib16.boulder.ibm.com/pser...tm#A2wM360kevi
    >
    > Display Value: 299
    >
    > Explanation Progress indicator. IPL ROM passed control to loaded code.
    > System Action If control transfer was not successful, the system
    > halts.
    > User Action If the system halts, record the SRN 101-299 in item 4 on
    > the Problem Summary Form. Report the problem to your hardware service
    > organization, and then stop. You have completed these procedures.
    >
    > So, you dd'd to a disk and used it in the same system it came from and
    > it did not work .. ok, so the procedure is wrong at the moment. What
    > block size did you use ? just paste the whole command / procedure you
    > used.


    ....so it sounds like a problem with the way the disk is being copied.

    On another same-spec machine, I took an image off, and wrote it back,
    with the following commands (remember this is run from a Linux box
    connected to the AIX box disk via SCSI...):

    dd if=/dev/sda of=/tmp/image bs=4194304 count=130
    then
    dd if=/tmp/image of=/dev/sda bs=4194304 count=130

    ....where /dev/sda is the device that Linux has mapped the AIX disk to.
    This worked OK and the machine booted fine.

    I chose that block size and count because it matched the info I got
    from running lspv on the AIX box: 4 MB is the PP size and there were 130
    of them, which adds up to 520 MB.
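
    As a rough check of that arithmetic, here is a minimal sketch. The
    helper name uncovered_bytes is made up for illustration; the figures
    are the bs and count values from the dd commands above.

```shell
#!/bin/sh
# Size covered by "bs=4194304 count=130" (4 MB PP size x 130 PPs, per lspv).
pp_bytes=4194304
pp_count=130
image_bytes=$((pp_bytes * pp_count))
echo "bytes copied with count=130: $image_bytes"             # 545259520
echo "in MB (1 MB = 1048576 bytes): $((image_bytes / 1048576))"   # 520

# Hypothetical helper: how many bytes of a disk lie beyond the region
# that the count=130 copy covers.
#   $1 = total size of the disk in bytes
#   $2 = bytes covered by the copy
uncovered_bytes() {
    echo $(( $1 - $2 ))
}
```

    On the Linux side, blockdev --getsize64 /dev/sda prints the size of
    the whole device in bytes, so something like
    uncovered_bytes "$(blockdev --getsize64 /dev/sda)" "$image_bytes"
    would show how much of the disk a count=130 copy leaves untouched.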

    However, if on my Linux box I run fdisk on the AIX disk, it doesn't
    understand what's on there but it DOES report the number of
    sectors/cylinders etc - and this came to slightly more than 520Mb
    (haven't got the exact figures to hand, I'm not near the machine).
    I figured I needed to copy every last sector of the disk, as the LVM
    could be spreading data anywhere.....so:

    ....on a different, but same-spec AIX box, I ran the following:
    dd if=/dev/sda of=/tmp/image2 bs=512
    then
    dd if=/tmp/image2 of=/dev/sda bs=512

    ....IE without a count parameter, to make sure the entire disk was read
    and copied back.
    As far as I understand it, this should be an exact disk duplicate, but
    the machine no longer boots and gives the error codes I described
    previously.
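
    One way to test the "exact duplicate" assumption before rebooting is
    to read the disk back and compare it with the image byte for byte. The
    sketch below is only that, a sketch: verify_copy is a made-up helper
    name, and it assumes head -c is available (it is on GNU and BSD
    systems).

```shell
#!/bin/sh
# verify_copy IMAGE TARGET
# Reads back as many bytes from TARGET as IMAGE contains and compares
# them with IMAGE. Returns 0 if they are identical, non-zero otherwise.
verify_copy() {
    image=$1
    target=$2
    # Arithmetic expansion strips any padding some wc versions print.
    size=$(( $(wc -c < "$image") ))
    # cmp -s is silent; "-" makes it read the first input from the pipe.
    head -c "$size" "$target" | cmp -s - "$image"
}
```

    For the commands above this would be run as
    verify_copy /tmp/image2 /dev/sda && echo identical || echo DIFFERENT.
    Note that on Linux an immediate read-back may be served from the page
    cache rather than from the disk itself, so a match straight after
    writing is weaker evidence than a match after the cache has been
    flushed.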

    Yours, confused [ not in Hampshire ]


  10. Re: Duplicating AIX disks from a Linux box ?

    A further update....

    I decided to readback data from the AIX disk and compare with the data
    written to it.

    In summary, I wrote the first 50 KB (approx) of the disk 10 times, from
    the same source image, and read it back 10 times. I used diff to
    compare the source and each of the 10 readbacks.
    9 of the readbacks were identical to the source, 1 was different.
    So if I'm getting roughly 1 in 10 failures on around 50 KB of data, it
    suggests there's very little hope of writing the ~540 MB disk without it
    being corrupt.
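
    To put a rough number on "very little hope": the sketch below assumes
    each 50 KB chunk fails independently with probability 0.1, which is a
    simplifying assumption and not something measured here. The function
    name clean_write_log10 is made up for illustration.

```shell
#!/bin/sh
# clean_write_log10 FAIL_RATE CHUNK_KB TOTAL_MB
# Prints log10 of the probability that every chunk is written cleanly,
# assuming independent failures at FAIL_RATE per chunk of CHUNK_KB.
clean_write_log10() {
    awk -v p="$1" -v chunk_kb="$2" -v total_mb="$3" 'BEGIN {
        chunks = total_mb * 1024 / chunk_kb          # about 11059 chunks
        # P(all clean) = (1 - p)^chunks; awk log() is the natural log.
        printf "%.0f\n", chunks * log(1 - p) / log(10)
    }'
}

clean_write_log10 0.1 50 540    # prints -506
```

    In other words, under that assumption the chance of a clean 540 MB
    write is around 1 in 10^506, so even a much lower real failure rate
    would still make a corrupt copy the expected outcome.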

    So why is it getting corrupted ? That's the next question to answer.


  11. Re: Duplicating AIX disks from a Linux box ?

    muddyboots@gmail.com wrote:
    > I presume it means an SRN of 207-299; I have found a copy of
    > "Diagnostic Information for Multiple Bus Systems" but the code is not
    > listed in there. I'm not sure if this is the right document for our
    > machine types though...


    Probably not, I'd guess that you want the "Diagnostic Information for
    Microchannel Bus Systems", and in that document on page 25-15 you find
    the description for SRN 207-299, which is a familiar classic to me:

    "Description: Unexpected program exception interrupt when control is
    passed to IPL program. Be sure there is a valid IPL program on the IPL
    device. If there is, exchange the media or the device."

    Your failing function codes are 132 ("The program that just loaded may
    be damaged") and 210 (planar or CPU card, depending on the machine type).
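
    For anyone reading the LEDs later, the way the flashing 888 sequence
    from earlier in the thread maps onto this SRN can be sketched as
    below. The decode_888 helper is made up, and the 102 versus 103 split
    only reflects what was said in this thread (102 for a crash, 103
    followed by the two halves of the SRN); treat it as an assumption
    rather than a full decoding table.

```shell
#!/bin/sh
# decode_888 CODE...
# Takes the LED values shown after the flashing 888, in display order.
decode_888() {
    case $1 in
        102) echo "crash: $2 $3" ;;        # e.g. 888 102 700 0c2
        103) echo "SRN $2-$3" ;;           # the two values form the SRN
        *)   echo "unknown sequence type $1" ;;
    esac
}

decode_888 103 207 299    # prints: SRN 207-299
```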

    Basically, what you're trying to do has always been problematic with
    single-ended SCSI buses -- when using the SCSI bus as a dual-ended
    interface, differential SCSI works much better. With careful
    measurements and a lot of hard work, you might be able to create a
    balanced intersection between your two SCSI controllers -- good luck
    with that, you'll need it.

    Rick Ekblaw
