[C180] HPMC... processor cache damaged? - HP UX


  1. [C180] HPMC... processor cache damaged?

    Hello.

    After four power outages some weeks ago, all in the same day, my HP 9000
    C180 started hanging. I guess that I know the answer, but need to ask.
    Perhaps there is something I can do to recover this machine (e.g.,
    cleaning it, or checking all cables and connectors to see if something
    is loose). But I think that I must fear the worst.

    Now the HPMC dump:


    HP-UX procyon B.11.11 U 9000/780 2000845441

    CPU-ID( Model ) = 0xe
    ----------------- Processor 0 HPMC Information ------------------

    Timestamp = Mon Dec 19 20:03:22 GMT 2005 (20:05:12:19:20:03:22)

    HPMC Chassis Codes = 0xcbf0 0x20b4 0x5008 0x5408 0x5508 0xcbfb


    General Registers 0 - 31
    00-03 0000000000000000 0000000000c15000 0000000000000000 0000000000aafa28
    04-07 0000000000000001 0000000000000000 0000000000000000 0000000000aafa28
    08-11 0000000000ab03d8 0000000000000001 0000015adf016425 0000015adf016425
    12-15 0000000000d3d2b8 0000000000ab2c90 0000000000000000 0000000000000000
    16-19 0000000000000000 0000000000000000 0000000040046000 0000000000000000
    20-23 0000000000000000 00000000000c6135 0000000000000000 0000000000000000
    24-27 0000000000000000 0000000000000001 0000000003887000 0000000000c20000
    28-31 fffffff0ffffffff 0000000003661260 0000000003661290 0000000000c17000

    Control Registers 0 - 31
    00-03 000000007cd43907 0000000000000000 0000000000000000 0000000000000000
    04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    08-11 0000000000003129 0000000000002afe 00000000000000c0 0000000000000032
    12-15 0000000000000000 0000000000000000 000000000002a000 fffffff0ffffffff
    16-19 0000015ae41c62cc 0000000000000000 00000000001805cc 000000002b7dcfff
    20-23 0000000010340002 00000000abeaec00 000000ff0804ff1f 0000000000000000
    24-27 0000000000aafa28 00000000000377f0 0000000000aafa28 8000000100065658
    28-31 0000000000000000 0000015ae40be438 0000000000175b00 0000000003661290

    Space Registers 0 - 7
    00-03 07bec800 09f51000 01fadc00 00000000
    04-07 00000000 ffffffff 04e3a800 00000000

    IIA Space = 0x0000000000000000
    IIA Offset = 0x00000000001805d0
    Check Type = 0x80000000
    CPU State = 0x9e000004
    Cache Check = 0x50000000
    TLB Check = 0x00000000
    Bus Check = 0x00000000
    Assists Check = 0x00000000
    Assist State = 0x00000000
    Path Info = 0x00000000
    System Responder Address = 0x0000000000000000
    System Requestor Address = 0x0000000000000000
    Check Summary = 0x8000020030006080
    Available Memory = 0x0000000000000000
    CPU Diagnose Register 2 = 0x0301000000002206
    CPU Status Register 0 = 0x3420c20000000000
    CPU Status Register 1 = 0x8000020000000000
    SADD LOG = 0x4800000000000000
    Read Short LOG = 0xc17ff0fffffa0010


    Memory Error Log Information:

    Timestamp = Mon Dec 19 20:03:23 GMT 2005 (20:05:12:19:20:03:23)

    No memory errors logged


    I/O Module Error Log Information:

    Timestamp = Mon Dec 19 20:03:23 GMT 2005 (20:05:12:19:20:03:23)

    Bus HPA        Module Type      Path     Slt Md Sev  Estat Requestor  Responder
    --- ---------- ---------------- -------- --- -- ---- ----- ---------- ----------
    0   0xfff88000 I/O Adapter      8        2   0  he   0x0d  0x00000000 0x00000000
    0   0xfff8a000 I/O Adapter      10       2   2  he   0x0d  0x00000000 0x00000000

    Module               Revision
    ------               --------
    Backplane Board      2
    I/O Board            2
    Processor Board      1
    PA 8000-8000 CPU     3.1
    PDC                  6.2
    Memory Controller    9
    I/O Bus Adapter 0    15
    GSC to PCI Bridge    0x6801
    Built in FWSCSI      0
    FWSCSI Chip          3
    Multi I/O Chip       0
    EISA Interface       0
    I/O Bus Adapter 1    15
    GSC 1                40MHz
    GSC 2                40MHz


    Current Diagnose Registers for Processor HPA: 0xfffa0000
    DR2: 0x00000000 03010000
    CPU Status 0/1: 0x00000000 08002204 / 0x00000000 4420c200
    Note: prefetch is disabled.
    PDH Status 1/2: 0x00000000 / 0x00000000
    PDH Control: 0x00000000


    My guess is that this is a very serious HPMC error. In fact, the 0xcbf0
    (HPMC handling initiated), 0x20b4 (cache errors), 0x5008, 0x5408 and 0x5508
    (bus transaction errors), and 0xcbfb (branching to the OS HPMC handler)
    chassis codes do not bode well.
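
    As a rough illustration of the reading above, the chassis-code line from
    the dump can be split and annotated with a few lines of Python. This is
    only a sketch: the names parse_chassis_codes, annotate and CHASSIS_NOTES
    are made up for this example, and the descriptions are the informal ones
    given in this post, not an official HP code table.

```python
import re

# Informal descriptions taken from the post above (assumption: not an
# official HP chassis-code table).
CHASSIS_NOTES = {
    0xCBF0: "HPMC handling initiated",
    0x20B4: "cache errors",
    0x5008: "bus transaction errors",
    0x5408: "bus transaction errors",
    0x5508: "bus transaction errors",
    0xCBFB: "branching to the OS HPMC handler",
}


def parse_chassis_codes(line):
    """Return the hex codes found after '=' on an 'HPMC Chassis Codes' line."""
    _, _, rhs = line.partition("=")
    return [int(tok, 16) for tok in re.findall(r"0x[0-9a-fA-F]+", rhs)]


def annotate(codes):
    """Pair each code with its note, or 'unknown' if it is not in the table."""
    return [(f"0x{c:04x}", CHASSIS_NOTES.get(c, "unknown")) for c in codes]


line = "HPMC Chassis Codes = 0xcbf0 0x20b4 0x5008 0x5408 0x5508 0xcbfb"
for code, note in annotate(parse_chassis_codes(line)):
    print(code, note)
```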

    Is there something I can do to recover (or *try* to recover) the machine?
    It now stays up for only two to four hours at a time.

    Cheers,
    Igor.

  2. Re: [C180] HPMC... processor cache damaged?

    Hello Igor,

    igor@nospam.invalid wrote:
    > After four power outages some weeks ago, all in the same day, my HP 9000
    > C180 started hanging. I guess that I know the answer, but need to ask.
    > Perhaps there is something I can do to recover this machine (e.g.,
    > cleaning it, or checking all cables and connectors to see if something
    > is loose). But I think that I must fear the worst.
    >
    > Now the HPMC dump:

    Snipped.

    > My guess is that this is a very serious HPMC error. In fact, the 0xcbf0
    > (HPMC handling initiated), 0x20b4 (cache errors), 0x5008, 0x5408 and 0x5508
    > (bus transaction errors), and 0xcbfb (branching to the OS HPMC handler)
    > chassis codes do not bode well.
    >
    > Is there something I can do to recover (or *try* to recover) the machine?
    > It now stays up for only two to four hours at a time.


    The output from pimdecode:

    Summary:
    Below is a list of causes for the failure ordered from most likely to
    least likely. Replace assemblies in the order listed.
    CPU board

    Details:
    CBF0 HPMC_INITIATED HPMC handling initiated
    20B4 CPU 0 DCACHE_ODD_DATA_PARITY
    5008 CPU 0 Runway broad error. This IOA received a Broad_Error from
    another module. Look elsewhere for the cause of the Broad_Error.
    5408 U2 chip IOA 0 Runway broad error. This IOA received a Broad_Error
    from another module. Look elsewhere for the cause of the Broad_Error.
    5508 U2 chip IOA 1 Runway broad error. This IOA received a Broad_Error
    from another module. Look elsewhere for the cause of the Broad_Error.
    CBFB Branching to OS HPMC handler. Was in OS when failure occured.

    Check Summary = 0x8000020030006080
    Bit 0: HPMC is detected
    Bit 22: D-Cache HPMC
    Runway Error Type (Check Summary[32:35]) = 0x3
    Slave Detected Broadcast Error
    Bit 49: Parity LPMCs enabled.
    Bit 50: Parity-induced LPMC pending.
    Bit 56: Sticky bit. Set when a parity error is found in the even (bit 0)
    data word of the odd cache port. Only cleared with a move-to-diagnose
    instruction.
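
    The bit positions above can be cross-checked by hand. The sketch below
    assumes that bit 0 is the most significant bit of the 64-bit Check
    Summary word, which is what the decoded output implies; the helper names
    bit, field and set_bits are made up for this example and are not part of
    pimdecode.

```python
def bit(value, n, width=64):
    """Return bit n, counting bit 0 as the most significant bit."""
    return (value >> (width - 1 - n)) & 1


def field(value, first, last, width=64):
    """Return bits first..last (inclusive), counting from the MSB."""
    nbits = last - first + 1
    return (value >> (width - 1 - last)) & ((1 << nbits) - 1)


def set_bits(value, width=64):
    """List the positions of all set bits, counting from the MSB."""
    return [n for n in range(width) if bit(value, n, width)]


CHECK_SUMMARY = 0x8000020030006080

print(set_bits(CHECK_SUMMARY))             # [0, 22, 34, 35, 49, 50, 56]
print(hex(field(CHECK_SUMMARY, 32, 35)))   # 0x3
```

    Bits 34 and 35 are the two set bits of the Runway Error Type field
    (Check Summary[32:35] = 0x3); the remaining positions match the
    individual bits listed above.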


    Regards

    Lars

  3. Re: [C180] HPMC... processor cache damaged?

    Lars Bausch wrote:
    >
    > The output from pimdecode:
    >
    > Summary:
    > Below is a list of causes for the failure ordered from most likely to
    > least likely. Replace assemblies in the order listed.
    > CPU board

    [...]

    Hi Lars,

    Thanks a lot for the detailed description of the HPMC failure.
    I was not aware of the PIM Analysis Tool. It is a nice tool;
    sadly, it is not available for HP-UX or, even better, as source
    code that compiles on any Unix system.

    I will look carefully at the output of pimdecode when trying
    to recover my workstation, but my first thought is that it points
    to a failure in the processor data cache as the most probable
    source of this problem. That is really bad.

    Thanks a lot for the feedback on this matter.

    Best regards,
    Igor.
