Help troubleshooting high interrupt problem - Debian

  1. Help troubleshooting high interrupt problem


    Hi,

    I have 10 machines in a cluster. All have exactly the same hardware
    and run Debian sarge. For 9 of them, the baseline stats are within
    about 5-10% of each other, which is fairly normal. However, one of
    them has a CPU utilization and load average 10 times higher than the
    others. Upon some investigation with vmstat, I discovered that this
    machine has an interrupt rate about 4 times as high as the others.
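A minimal sketch of the per-machine vmstat comparison described above, assuming the procps-style vmstat layout in which the column-header line starts with `r` and interrupts per second appear in a column labelled `in`. The function name `avg_vmstat_in` is hypothetical. It skips the first data row because vmstat reports that row as an average since boot rather than over the sampling interval, so running it on each node yields one comparable number per machine.

```shell
#!/bin/sh
# avg_vmstat_in (hypothetical helper): read vmstat output on stdin and print
# the mean of the "in" (interrupts per second) column.
avg_vmstat_in() {
    awk '
        # Column-header line: find which field is labelled "in" instead of
        # assuming a fixed position.
        $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "in") col = i; next }
        # Data rows start with a number. The first one is the since-boot
        # average, so it is skipped; the rest are per-interval samples.
        col && $1 ~ /^[0-9]+$/ { if (seen++) { sum += $col; n++ } }
        END { if (n) printf "%.1f\n", sum / n }
    '
}
```

For example, `vmstat 5 7 | avg_vmstat_in` averages six 5-second samples; running the same command on a healthy node and on the suspect node gives a direct ratio.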

    My question is: how can I track down the device that is causing this
    problem? I checked all of the parameters with sysctl and nothing is
    too out of the ordinary. The other vmstat figures were also
    reasonably close, apart from CPU utilization and interrupt rate. Even
    when the machine is relatively idle, CPU time spent in the system
    (kernel) still hovers around 35%, while the other machines stay
    below 1%.
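One way to narrow this down, sketched under assumptions: the kernel's `/proc/interrupts` file keeps a running count per IRQ line and per CPU, followed by the controller type and driver name, so two snapshots taken a known number of seconds apart give a per-IRQ rate that points at the driver responsible. The helper name `irq_rates` is hypothetical, and the parsing assumes the usual layout of a `CPUn` header row followed by one `N:` row per interrupt line.

```shell
#!/bin/sh
# irq_rates (hypothetical helper)
#   $1 = first saved copy of /proc/interrupts
#   $2 = second saved copy, taken later
#   $3 = seconds between the two snapshots (must be > 0)
# Prints: IRQ  interrupts-per-second  controller/driver description
irq_rates() {
    awk -v secs="$3" '
        # Header row: one column per CPU, so NF is the CPU count.
        FNR == 1 { ncpu = NF; next }
        {
            irq = $1; sub(/:$/, "", irq)
            # Sum the per-CPU counters (fewer columns on rows like ERR:).
            total = 0
            for (i = 2; i <= ncpu + 1 && i <= NF; i++)
                if ($i ~ /^[0-9]+$/) total += $i
            # First file: remember the starting count only.
            if (NR == FNR) { before[irq] = total; next }
            # Second file: everything after the counters is the description.
            desc = ""
            for (i = ncpu + 2; i <= NF; i++) desc = desc " " $i
            printf "%s %.1f%s\n", irq, (total - before[irq]) / secs, desc
        }
    ' "$1" "$2"
}
```

For example, save `/proc/interrupts` to two files ten seconds apart and run `irq_rates first.txt second.txt 10 | sort -k2,2 -nr` (file names are placeholders) to list the busiest interrupt lines first.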

    This is really driving me crazy and I need to know if it's a hardware
    problem so I can return it before the warranty expires.

    Thanks for any help.

  2. Re: Help troubleshooting high interrupt problem

    On Wed, 07 Dec 2005 02:25:17 -0800, Ian East wrote:



    I have discovered sysstat and have a little more information. I take
    it this machine is toast. Both machines were practically idle when
    these figures were taken.

    This is a normal machine:
    # sar -u -I XALL 30 1
    Linux 2.4.26-1-686-smp (cow25)    12/07/05
    Average:     CPU     %user     %nice   %system   %iowait     %idle
    Average:     all      0.50      0.00      0.50      0.00     99.00

    Average:    INTR     intr/s
    Average:      14       5.20
    Average:      54     593.00
    Average:      55     347.00


    This is the funky machine:
    Average:     CPU     %user     %nice   %system   %iowait     %idle
    Average:     all      1.46      0.00     38.67      0.00     59.87

    Average:    INTR     intr/s
    Average:      14       4.60
    Average:      16   49268.90
    Average:      18   49816.20
    Average:      19   49961.90
    Average:      54  341781.00
    Average:      55  342663.20
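The two `sar -I XALL` tables above can be compared IRQ by IRQ. The sketch below assumes each machine's sar output has been saved to a file; the function name `compare_sar_intr` is hypothetical. It prints each IRQ with the healthy rate, the suspect rate, and their ratio. Fed the averages posted here, it shows IRQs 16, 18 and 19 appearing only on the problem machine, and IRQs 54 and 55 running at roughly 576 and 988 times their normal rates.

```shell
#!/bin/sh
# compare_sar_intr (hypothetical helper)
#   $1 = saved "sar -I XALL" output from a healthy node
#   $2 = saved "sar -I XALL" output from the suspect node
# Prints: IRQ  healthy-rate  suspect-rate  ratio ("new" if absent on healthy)
compare_sar_intr() {
    awk '
        # Per-IRQ rows look like "Average: <irq> <intr/s>"; CPU rows and
        # the INTR header are skipped because $2 is not numeric there.
        $1 == "Average:" && $2 ~ /^[0-9]+$/ && NF == 3 {
            if (NR == FNR) good[$2] = $3 + 0; else bad[$2] = $3 + 0
            irqs[$2] = 1
        }
        END {
            for (irq in irqs) {
                g = (irq in good) ? good[irq] : 0
                b = (irq in bad) ? bad[irq] : 0
                if (g > 0)      ratio = sprintf("%.1fx", b / g)
                else if (b > 0) ratio = "new"
                else            ratio = "-"
                printf "%s %.2f %.2f %s\n", irq, g, b, ratio
            }
        }
    ' "$1" "$2" | sort -n   # awk array order is unspecified, so sort by IRQ
}
```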

    Here are the PCI devices (the machines are identical):
    # lspci -v
    0000:00:00.0 Host bridge: Intel Corp. Server Memory Controller Hub (rev 0c)
        Subsystem: Intel Corp.: Unknown device 1079
        Flags: bus master, fast devsel, latency 0
        Capabilities: [40] #09 [4105]

    0000:00:00.1 ff00: Intel Corp. Memory Controller Hub Error Reporting Register (rev 0c)
        Subsystem: Intel Corp.: Unknown device 1079
        Flags: fast devsel

    0000:00:01.0 System peripheral: Intel Corp. Memory Controller Hub DMA Controller (rev 0c)
        Subsystem: Intel Corp.: Unknown device 1079
        Flags: fast devsel, IRQ 16
        Memory at fcdff000 (32-bit, non-prefetchable) [disabled] [size=4K]
        Capabilities: [b0] Message Signalled Interrupts: 64bit- Queue=0/1 Enable-

    0000:00:02.0 PCI bridge: Intel Corp. Memory Controller Hub PCI Express Port A0 (rev 0c) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=01, subordinate=03, sec-latency=0
        I/O behind bridge: 0000d000-0000dfff
        Memory behind bridge: fce00000-fcffffff
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1 Enable-
        Capabilities: [64] #10 [0041]

    0000:00:1d.0 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #1 (rev 02) (prog-if 00 [UHCI])
        Subsystem: Intel Corp.: Unknown device 1079
        Flags: bus master, medium devsel, latency 0, IRQ 16
        I/O ports at c800 [size=32]

    0000:00:1d.1 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #2 (rev 02) (prog-if 00 [UHCI])
        Subsystem: Intel Corp.: Unknown device 1079
        Flags: bus master, medium devsel, latency 0, IRQ 19
        I/O ports at c880 [size=32]

    0000:00:1d.2 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB UHCI #3 (rev 02) (prog-if 00 [UHCI])
        Subsystem: Intel Corp.: Unknown device 1079
        Flags: bus master, medium devsel, latency 0, IRQ 18
        I/O ports at cc00 [size=32]

    0000:00:1d.7 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02) (prog-if 20 [EHCI])
        Subsystem: Intel Corp.: Unknown device 1079
        Flags: bus master, medium devsel, latency 0, IRQ 23
        Memory at fcdfec00 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [50] Power Management version 2
        Capabilities: [58] #0a [20a0]

    0000:00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev c2) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=04, subordinate=04, sec-latency=32
        I/O behind bridge: 0000e000-0000efff
        Memory behind bridge: fd000000-febfffff

    0000:00:1f.0 ISA bridge: Intel Corp. 82801EB/ER (ICH5/ICH5R) LPC Bridge (rev 02)
        Flags: bus master, medium devsel, latency 0

    0000:00:1f.2 IDE interface: Intel Corp. 82801EB (ICH5) Serial ATA 150 Storage Controller (rev 02) (prog-if 8a [Master SecP PriP])
        Subsystem: Intel Corp.: Unknown device 3437
        Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 18
        I/O ports at
        I/O ports at
        I/O ports at
        I/O ports at
        I/O ports at fc00 [size=16]

    0000:00:1f.3 SMBus: Intel Corp. 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
        Subsystem: Intel Corp.: Unknown device 1079
        Flags: medium devsel, IRQ 17
        I/O ports at 0540 [size=32]

    0000:01:00.0 PCI bridge: Intel Corp. PCI Bridge Hub A (rev 09) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=01, secondary=02, subordinate=02, sec-latency=64
        Capabilities: [44] #10 [0071]
        Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Capabilities: [6c] Power Management version 2
        Capabilities: [d8] PCI-X bridge device.

    0000:01:00.1 PIC: Intel Corp. PCI Bridge Hub I/OxAPIC Interrupt Controller A (rev 09) (prog-if 20 [IO(X)-APIC])
        Subsystem: Intel Corp.: Unknown device 1079
        Flags: bus master, fast devsel, latency 0
        Memory at fcefe000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [44] #10 [0001]
        Capabilities: [6c] Power Management version 2

    0000:01:00.2 PCI bridge: Intel Corp. PCI Bridge Hub B (rev 09) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=01, secondary=03, subordinate=03, sec-latency=64
        I/O behind bridge: 0000d000-0000dfff
        Memory behind bridge: fcf00000-fcffffff
        Capabilities: [44] #10 [0071]
        Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Capabilities: [6c] Power Management version 2
        Capabilities: [d8] PCI-X bridge device.

    0000:01:00.3 PIC: Intel Corp. PCI Bridge Hub I/OxAPIC Interrupt Controller B (rev 09) (prog-if 20 [IO(X)-APIC])
        Subsystem: Intel Corp.: Unknown device 1079
        Flags: bus master, fast devsel, latency 0
        Memory at fceff000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [44] #10 [0001]
        Capabilities: [6c] Power Management version 2

    0000:03:04.0 Ethernet controller: Intel Corp. 82546GB Gigabit Ethernet Controller (rev 03)
        Subsystem: Intel Corp. PRO/1000 MT Dual Port Network Connection
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 54
        Memory at fcfa0000 (64-bit, non-prefetchable) [size=128K]
        I/O ports at d880 [size=64]
        Capabilities: [dc] Power Management version 2
        Capabilities: [e4] PCI-X non-bridge device.
        Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-

    0000:03:04.1 Ethernet controller: Intel Corp. 82546GB Gigabit Ethernet Controller (rev 03)
        Subsystem: Intel Corp. PRO/1000 MT Dual Port Network Connection
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 55
        Memory at fcfe0000 (64-bit, non-prefetchable) [size=128K]
        I/O ports at dc00 [size=64]
        Capabilities: [dc] Power Management version 2
        Capabilities: [e4] PCI-X non-bridge device.
        Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-

    0000:04:0c.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA])
        Subsystem: Intel Corp.: Unknown device 1079
        Flags: bus master, stepping, medium devsel, latency 64, IRQ 17
        Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
        I/O ports at e800 [size=256]
        Memory at febff000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at febc0000 [disabled] [size=128K]
        Capabilities: [5c] Power Management version 2
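To tie the sar IRQ numbers to hardware, the `IRQ n` field on each device's `Flags:` line can be pulled out of `lspci -v`. This is a sketch assuming the standard `lspci -v` layout (device line starting at column 0 with its bus address, properties indented beneath it); the function name `lspci_irqs` is hypothetical. Applied to the listing above, it would show IRQ 16 shared by USB UHCI #1 and the MCH DMA controller, 17 by the SMBus controller and the Rage XL, 18 by USB UHCI #3 and the SATA controller, 19 on USB UHCI #2, 23 on the EHCI controller, and 54 and 55 on the two 82546GB Ethernet ports.

```shell
#!/bin/sh
# lspci_irqs (hypothetical helper): read "lspci -v" output on stdin and print
# "<irq> <device line>" for every device that has an IRQ, sorted by IRQ.
lspci_irqs() {
    awk '
        # Device header lines begin with a bus address such as 00:1d.0
        # or 0000:00:1d.0; remember the most recent one.
        /^[0-9a-f]+:[0-9a-f]+(:[0-9a-f]+)?\.[0-9a-f]/ { dev = $0; next }
        # The "Flags:" line of that device carries ", IRQ <n>" if assigned.
        /Flags:.*IRQ [0-9]+/ && dev != "" {
            match($0, /IRQ [0-9]+/)
            print substr($0, RSTART + 4, RLENGTH - 4), dev
        }
    ' | sort -n
}
```

Running `lspci -v | lspci_irqs` on the problem node makes shared interrupt lines easy to spot, since two devices on one IRQ appear on adjacent lines.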


  3. Re: Help troubleshooting high interrupt problem

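    It seems that the misbehaving machine is using USB?
    Interrupts 16, 18 and 19 belong to the USB UHCI controllers (IRQ 16 is
    shared with the memory controller hub's DMA controller, and IRQ 18
    with the SATA controller).

    Same hardware, but not the same software configuration?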
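To follow up on the software-configuration question, one quick check is to diff the kernel modules loaded on a healthy node against those on the suspect one. A sketch, assuming `lsmod` output from each machine has been saved to a file; the function name `diff_modules` is hypothetical. Modules loaded only on the problem node (for example a USB host-controller driver) are printed with a `>` prefix, and modules loaded only on the healthy node with `<`.

```shell
#!/bin/sh
# diff_modules (hypothetical helper)
#   $1 = saved lsmod output from a healthy node
#   $2 = saved lsmod output from the suspect node
# Prints "< name" for modules only on $1 and "> name" for modules only on $2.
diff_modules() {
    local tmp_good tmp_bad
    tmp_good=$(mktemp) && tmp_bad=$(mktemp) || return 1
    # Skip lsmod's header line and keep just the module-name column.
    awk 'NR > 1 { print $1 }' "$1" | sort > "$tmp_good"
    awk 'NR > 1 { print $1 }' "$2" | sort > "$tmp_bad"
    # comm needs sorted input: -23 keeps lines unique to the first file,
    # -13 keeps lines unique to the second.
    comm -23 "$tmp_good" "$tmp_bad" | sed 's/^/< /'
    comm -13 "$tmp_good" "$tmp_bad" | sed 's/^/> /'
    rm -f "$tmp_good" "$tmp_bad"
}
```

The module names in any test listing are illustrative only; the actual USB driver names depend on the kernel version in use.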


