x64 2K3 R2 BSODs when FC Tape Library is Rebooted - Storage

  1. x64 2K3 R2 BSODs when FC Tape Library is Rebooted

    I have a number of 2003 R2 x64 and x86 SP1 servers attached to a
    fabric-switched IBM Fibre-Channel SAN. The servers are all IBM xSeries
    servers attached using QLogic QLA2340 HBAs. We are using MPIO and M$'s
    Storport driver (the latest version, of course) for multipathing on all
    servers. Furthermore, we're using IBM's StorageManager Agents (again, latest
    version) on all hosts. Also part of that SAN is a Quantum PX502 robotic
    tape library, which is also Fibre-Channel and attached directly to the SAN
    (i.e. not physically attached to a server). We are not using any kind of SAN
    partitioning, so all hosts attached to the SAN see the tape drives and robot.

    Here's what happens. After rebooting the tape library, some or all of my x64
    servers BSOD with a 0x0A stop error and your typical IRQL_NOT_LESS...
    message. x86 servers have yet to be affected. Debugging the resulting memory
    dump shows that storport.sys is the culprit. Additionally, shortly before the
    server BSODs, the system event log has entries from PlugPlayManager saying
    that the tape drives and robot disappeared without being prepared for removal
    (Event ID 12). Obviously, preparing the hardware for removal on all my
    servers is out of the question; besides, the hardware never shows up in the
    list of items to be safely removed.

    I'm very aware that SP2 is out for 2k3, and I intend to install that someday
    (once I recover from all the late-night work I've had to put in dealing with
    this problem); however, I'm not confident that will solve the problem since I
    will still have the same version of the storport driver.

    So, short of calling M$ and paying for a support incident, any other bright
    ideas? I'd appreciate being spared the basic "update firmware" and "update
    driver" suggestions, as those are obvious and already done.

  2. Re: x64 2K3 R2 BSODs when FC Tape Library is Rebooted

    Well, a BSOD is by definition caused by either a bug in a driver or a bug in
    HW/Firmware. So, your request to not suggest updating driver/firmware may
    not get you very far.

    That said, I think you are running into a known bug that was fixed &
    released in Feb (KB Article 932755). You can download the fix directly via
    http://support.microsoft.com - make sure to grab the correct package.

    There are SP1 and SP2 versions of the fix, so you could get relief w/out SP2
    if you absolutely needed to. I would recommend going to SP2 first b/c there
    are a number of updates & perf improvements that are just general goodness.
    Then adding the fix on top should get you where you need to be.

    If the problem persists after that, then I would recommend giving support a
    call. If the issue is a bug, we refund the cost of the incident (or
    re-credit your account if you have a Premier support contract).


    Pat



  3. Re: x64 2K3 R2 BSODs when FC Tape Library is Rebooted

    Right, the BSOD IS caused by a driver--the storport.sys driver in this case.
    The reason I was trying to avoid responses that suggest that I update
    drivers and firmware was because I've already done all that as one of the
    first troubleshooting steps. Also, I am already on the latest storport
    version (5.2.3790.2880 for SP1) listed in the KB article. I mentioned
    all this in my original post.


  4. Re: x64 2K3 R2 BSODs when FC Tape Library is Rebooted

    Just because storport is in the bugcheck doesn't mean it's the cause. It
    may not have handled a buggy miniport correctly, but it's very possible the
    real problem is in a different driver.

    Please paste the !analyze -v output with the Microsoft symbol server set in
    your .sympath



  5. Re: x64 2K3 R2 BSODs when FC Tape Library is Rebooted

    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************

    IRQL_NOT_LESS_OR_EQUAL (a)
    An attempt was made to access a pageable (or completely invalid) address at an
    interrupt request level (IRQL) that is too high. This is usually
    caused by drivers using improper addresses.
    If a kernel debugger is available get the stack backtrace.
    Arguments:
    Arg1: 000000000005defd, memory referenced
    Arg2: 0000000000000002, IRQL
    Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
    Arg4: fffff800013e0579, address which referenced memory

    Debugging Details:
    ------------------

    *** Error in in reading nt!_ETHREAD @ 0000000000000000
    *** Error in in reading nt!_ETHREAD @ 0000000000000000
    *** Error in in reading nt!_ETHREAD @ 0000000000000000

    READ_ADDRESS: 000000000005defd

    CURRENT_IRQL: 2

    FAULTING_IP:
    nt!MiFindContiguousMemoryInPool+b9
    fffff800`013e0579 488b5310 mov rdx,[rbx+0x10]

    DEFAULT_BUCKET_ID: DRIVER_FAULT

    BUGCHECK_STR: 0xA

    LAST_CONTROL_TRANSFER: from fffff8000104e5b4 to fffff8000104e890

    LOCK_ADDRESS: fffff800011deb00 -- (!locks fffff800011deb00)

    Resource @ nt!IopDeviceTreeLock (0xfffff800011deb00) Shared 1 owning threads
    Threads: fffffade708fabf0-01<*>
    1 total locks, 1 locks currently held

    FAULTING_THREAD: fffffade708fabf0

    PNP_TRIAGE:
    Lock address : 0xfffff800011deb00
    Thread Count : 1
    Thread address: 0xfffffade708fabf0
    Thread wait : 0xee6922c

    TRAP_FRAME: fffffade5bca8d60 -- (.trap fffffade5bca8d60)
    NOTE: The trap frame does not contain all registers.
    Some register values may be zeroed.
    rax=0000000000000000 rbx=fffffade6e122000 rcx=0000000000112242
    rdx=00000000001d43ef rsi=0000000000000001 rdi=fffffade5f55d660
    rip=fffff800013e0579 rsp=fffffade5bca8ef0 rbp=0000000000100000
    r8=00000000000ffffe r9=0000000000112242 r10=fffffade6e442000
    r11=00000000000fffff r12=0000000000000000 r13=0000000000000000
    r14=0000000000000000 r15=0000000000000000
    iopl=0 nv up ei ng nz ac po cy
    nt!MiFindContiguousMemoryInPool+0xb9:
    fffff800`013e0579 488b5310 mov rdx,[rbx+0x10] ds:fffffade`6e122010=0000000000000002
    Resetting default scope

    STACK_TEXT:
    fffffade`5bca8bd8 fffff800`0104e5b4 : 00000000`0000000a 00000000`0005defd 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx
    fffffade`5bca8be0 fffff800`0104d587 : 00000006`00000000 fffffa80`033c2090 00000000`00000000 00000000`00000001 : nt!KiBugCheckDispatch+0x74
    fffffade`5bca8d60 fffff800`013e0579 : 00000000`00000002 00000000`6d436d4d 00000000`00000002 00000000`00001000 : nt!KiPageFault+0x207
    fffffade`5bca8ef0 fffff800`013797a2 : 00000000`00000000 00000000`000fffff 00000000`00100000 00000000`4d546100 : nt!MiFindContiguousMemoryInPool+0xb9
    fffffade`5bca8f90 fffff800`010edfee : 00000000`00000004 fffffade`6e27b000 fffffade`6e122000 00000000`00000002 : nt!MiFindContiguousMemory+0x52
    fffffade`5bca8ff0 fffff800`010ee07b : 00000000`00000001 00000000`00000080 fffffade`6d2343f0 00000000`ffffffff : nt!MiAllocateContiguousMemory+0x12e
    fffffade`5bca9070 fffffade`5b0764e2 : 00000000`00000000 00000000`00000000 fffffa80`021b2a30 fffffade`6d2343f0 : nt!MmAllocateContiguousMemorySpecifyCache+0x5b
    fffffade`5bca90b0 fffffade`5b07612d : fffffade`6d2343f0 fffffade`708391b0 fffffade`708391b0 fffffade`5bca91f0 : storport!RaidUnitAllocateResources+0x370
    fffffade`5bca9120 fffffade`5b06c71f : 00000000`00010200 00000000`00010200 fffffade`5bca9260 00000000`00000000 : storport!RaidCreateUnit+0x13d
    fffffade`5bca9180 fffffade`5b06bfbf : 00000000`00000000 00000000`00000000 00000000`00000002 00000000`00010200 : storport!RaidBusEnumeratorGetUnit+0x6f
    fffffade`5bca91f0 fffffade`5b06946f : fffffade`00fe0200 00000000`00000000 00000000`00000001 00000000`00000002 : storport!RaidBusEnumeratorVisitUnit+0x4f
    fffffade`5bca92f0 fffffade`5b06957d : 00000000`00000000 fffffade`572c0d6d fffffade`5b87b180 fffffade`708391b0 : storport!RaidAdapterEnumerateBus+0xbf
    fffffade`5bca9470 fffffade`5b088c8f : fffffade`6d9a62e0 fffffade`6cd14ae0 00000000`a0000003 fffffade`708391b0 : storport!RaidAdapterRescanBus+0x8d
    fffffade`5bca9530 fffffade`5b08890b : 00000000`00000000 fffffade`6cd14ae0 00000000`00000000 fffffade`708391b0 : storport!RaidAdapterQueryDeviceRelationsIrp+0xcf
    fffffade`5bca95d0 fffffade`5b089eef : fffffade`70893650 fffffade`5bca9820 fffffade`70839060 fffffade`6cd14ae0 : storport!RaidAdapterPnpIrp+0x14b
    fffffade`5bca96a0 fffffade`5b85d949 : fffffade`5bca97e0 fffffade`70893650 fffffade`6cd14ae0 fffffade`6fcf6bb0 : storport!RaDriverPnpIrp+0xcf
    fffffade`5bca9730 fffff800`0124d573 : fffffade`6cd14ae0 fffffade`5bca9820 fffffade`70893500 fffffade`6fcf6bb0 : mpspfltr!MPSPQueryDeviceRelations+0xa9
    fffffade`5bca9790 fffff800`010dc4c1 : 00000000`00000000 00000000`00000002 00000000`00000000 fffffade`6fcf6a70 : nt!IopSynchronousCall+0x14a
    fffffade`5bca9800 fffff800`013531ea : fffffade`709019b0 fffff800`01014f00 fffffade`5bca98e0 00000000`00000000 : nt!IopQueryDeviceRelations+0x71
    fffffade`5bca9890 fffff800`01354e95 : 00000000`000000c3 00000000`00000000 00000000`00000002 fffffade`6f42b2f0 : nt!PipProcessDevNodeTree+0x342
    fffffade`5bca9c20 fffff800`010d8598 : fffff800`00000003 00000000`00000000 fffffade`708fabf0 fffff800`011cf900 : nt!PiProcessReenumeration+0x85
    fffffade`5bca9c70 fffff800`0105507c : 00000000`00000000 fffff800`011decc0 fffff800`010d8230 fffffade`708fabf0 : nt!PipDeviceActionWorker+0x368
    fffffade`5bca9d00 fffff800`01299cae : fffffade`708fabf0 00000000`00000080 fffffade`708fabf0 fffffade`5b883680 : nt!ExpWorkerThread+0x13b
    fffffade`5bca9d70 fffff800`0102bbe6 : fffffade`5b87b180 fffffade`708fabf0 fffffade`5b883680 00000000`00000000 : nt!PspSystemThreadStartup+0x3e
    fffffade`5bca9dd0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16


    STACK_COMMAND: .thread fffffade708fabf0 ; kb

    FOLLOWUP_IP:
    storport!RaidUnitAllocateResources+370
    fffffade`5b0764e2 4885c0 test rax,rax

    SYMBOL_STACK_INDEX: 7

    FOLLOWUP_NAME: MachineOwner

    SYMBOL_NAME: storport!RaidUnitAllocateResources+370

    MODULE_NAME: storport

    IMAGE_NAME: storport.sys

    DEBUG_FLR_IMAGE_TIMESTAMP: 448e954c

    FAILURE_BUCKET_ID: X64_0xA_storport!RaidUnitAllocateResources+370

    BUCKET_ID: X64_0xA_storport!RaidUnitAllocateResources+370

    Followup: MachineOwner
    ---------


  6. Re: x64 2K3 R2 BSODs when FC Tape Library is Rebooted

    Unfortunately, all the !analyze output shows is that, while at
    DISPATCH_LEVEL (IRQL 2), the system attempted to read through a bogus
    pointer. Most likely it's simply uninitialized (based on the garbage
    address 5defd) ...
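    The reading above can be reproduced from the four bugcheck arguments alone.
    The following is a minimal sketch, not part of the original post: the
    helper name describe_bugcheck_a is hypothetical, and the kernel-space
    boundary used here (the upper canonical half on x64) is an assumption made
    for illustration.

```python
# Hypothetical helper: interprets the four IRQL_NOT_LESS_OR_EQUAL (0xA)
# arguments as they appear in the !analyze -v output quoted in this thread.

IRQL_NAMES = {0: "PASSIVE_LEVEL", 1: "APC_LEVEL", 2: "DISPATCH_LEVEL"}

# Per the dump text: "value 0 = read operation, 1 = write operation".
OPERATIONS = {0: "read", 1: "write"}

# Assumption: on x64, kernel-mode virtual addresses sit in the upper
# canonical half of the address space.
KERNEL_SPACE_START = 0xFFFF800000000000


def describe_bugcheck_a(arg1, arg2, arg3, arg4):
    """Summarise the 0xA arguments, each given as a hex string."""
    address = int(arg1, 16)
    irql = int(arg2, 16)
    operation = int(arg3, 16)
    faulting_ip = int(arg4, 16)
    return {
        "address": hex(address),
        "irql": IRQL_NAMES.get(irql, "IRQL %d" % irql),
        "operation": OPERATIONS.get(operation, "unknown"),
        # A kernel routine dereferencing a non-kernel address at raised IRQL
        # is consistent with a garbage or uninitialized pointer.
        "address_in_kernel_space": address >= KERNEL_SPACE_START,
        "faulting_ip_in_kernel_space": faulting_ip >= KERNEL_SPACE_START,
    }


if __name__ == "__main__":
    summary = describe_bugcheck_a(
        "000000000005defd",
        "0000000000000002",
        "0000000000000000",
        "fffff800013e0579",
    )
    for key, value in summary.items():
        print(key, "=", value)
```

    With the values from the dump, the referenced address falls outside
    kernel space while the faulting instruction is inside it, which matches
    the "bogus pointer" interpretation. It does not say which driver supplied
    the bad value.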

    If you can reproduce this easily, I suggest you enable driver verifier on
    the mpio, storport, and miniport drivers.

    Also, if you have SANSurfer installed, please uninstall it. There is a
    kernel-mode service which queries the devices on the SAN and can definitely
    cause issues.

    To enable verifier, do this:
    1. Start -> Run -> verifier
    2. Choose "Create standard settings"
    3. Choose "Select driver names"
    4. Check the following: mpdev.sys, mpio.sys, mpspfltr.sys, ql2300.sys,
    storport.sys
    5. Reboot
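    The GUI steps above have a command-line counterpart: verifier.exe accepts
    /standard together with /driver and a list of driver names. Below is a
    rough sketch that only builds and prints that command line so it can be
    reviewed before being run from an administrator prompt; the function name
    build_verifier_command is hypothetical, and the driver list is simply the
    one from step 4. A reboot is still needed afterwards.

```python
# Sketch only: builds the verifier.exe command line equivalent to the GUI
# steps. It does not execute anything.

# Driver list taken from step 4 of the instructions.
DRIVERS = ["mpdev.sys", "mpio.sys", "mpspfltr.sys", "ql2300.sys", "storport.sys"]


def build_verifier_command(drivers):
    """Return the argument list enabling standard settings for the drivers."""
    if not drivers:
        raise ValueError("at least one driver name is required")
    return ["verifier", "/standard", "/driver"] + list(drivers)


if __name__ == "__main__":
    # Print the command for review; run it manually, then reboot the host.
    print(" ".join(build_verifier_command(DRIVERS)))
```

    Once testing is finished, "verifier /reset" followed by another reboot
    clears the settings again.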

    Then reboot your tape library. Many times, when verifier is running, it will
    catch issues earlier than the bugcheck would and should be much more
    accurate. Driver Verifier will still BSOD your host, but the dump will
    contain better info. You also most likely don't need a complete memory
    dump; a kernel dump should be sufficient.
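    Once a new dump exists, the analysis requested earlier in the thread
    (!analyze -v with the Microsoft symbol server in the symbol path) can be
    run non-interactively with kd from the Debugging Tools for Windows. A
    minimal sketch follows, assuming kd is on the PATH; the function name
    build_kd_command, the local symbol cache C:\symbols and the dump location
    are placeholders, not values from this thread.

```python
# Sketch only: builds a kd command line that opens a dump, points the symbol
# path at the Microsoft public symbol server, runs !analyze -v and quits.

SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols"


def build_kd_command(dump_path, symbol_cache=r"C:\symbols"):
    """Return the kd argument list for a one-shot !analyze -v run."""
    # srv*<local cache>*<server> is the symbol-server form of a symbol path.
    symbol_path = "srv*%s*%s" % (symbol_cache, SYMBOL_SERVER)
    return [
        "kd",
        "-z", dump_path,          # dump file to open
        "-y", symbol_path,        # symbol path
        "-c", "!analyze -v; q",   # commands to run, then quit
    ]


if __name__ == "__main__":
    # Placeholder dump path; adjust to wherever the host writes its dump.
    print(build_kd_command(r"C:\WINDOWS\MEMORY.DMP"))
```

    Redirecting the output of the resulting command to a text file gives
    something that can be pasted into a reply as-is.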

    Since it seems like you are an IBM shop, you should be able to report this
    issue to IBM and have them report it to MS.

    Good luck,
    ~kenny

    "Eric" wrote in message
    news:B8DCFE65-7D6F-45BD-9E7F-94001D92E4A7@microsoft.com...
    > ************************************************** *****************************
    > *
    > *
    > * Bugcheck Analysis
    > *
    > *
    > *
    > ************************************************** *****************************
    >
    > IRQL_NOT_LESS_OR_EQUAL (a)
    > An attempt was made to access a pageable (or completely invalid) address
    > at an
    > interrupt request level (IRQL) that is too high. This is usually
    > caused by drivers using improper addresses.
    > If a kernel debugger is available get the stack backtrace.
    > Arguments:
    > Arg1: 000000000005defd, memory referenced
    > Arg2: 0000000000000002, IRQL
    > Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
    > Arg4: fffff800013e0579, address which referenced memory
    >
    > Debugging Details:
    > ------------------
    >
    > *** Error in in reading nt!_ETHREAD @ 0000000000000000
    > *** Error in in reading nt!_ETHREAD @ 0000000000000000
    > *** Error in in reading nt!_ETHREAD @ 0000000000000000
    >
    > READ_ADDRESS: 000000000005defd
    >
    > CURRENT_IRQL: 2
    >
    > FAULTING_IP:
    > nt!MiFindContiguousMemoryInPool+b9
    > fffff800`013e0579 488b5310 mov rdx,[rbx+0x10]
    >
    > DEFAULT_BUCKET_ID: DRIVER_FAULT
    >
    > BUGCHECK_STR: 0xA
    >
    > LAST_CONTROL_TRANSFER: from fffff8000104e5b4 to fffff8000104e890
    >
    > LOCK_ADDRESS: fffff800011deb00 -- (!locks fffff800011deb00)
    >
    > Resource @ nt!IopDeviceTreeLock (0xfffff800011deb00) Shared 1 owning
    > threads
    > Threads: fffffade708fabf0-01<*>
    > 1 total locks, 1 locks currently held
    >
    > FAULTING_THREAD: fffffade708fabf0
    >
    > PNP_TRIAGE:
    > Lock address : 0xfffff800011deb00
    > Thread Count : 1
    > Thread address: 0xfffffade708fabf0
    > Thread wait : 0xee6922c
    >
    > TRAP_FRAME: fffffade5bca8d60 -- (.trap fffffade5bca8d60)
    > NOTE: The trap frame does not contain all registers.
    > Some register values may be zeroed.
    > rax=0000000000000000 rbx=fffffade6e122000 rcx=0000000000112242
    > rdx=00000000001d43ef rsi=0000000000000001 rdi=fffffade5f55d660
    > rip=fffff800013e0579 rsp=fffffade5bca8ef0 rbp=0000000000100000
    > r8=00000000000ffffe r9=0000000000112242 r10=fffffade6e442000
    > r11=00000000000fffff r12=0000000000000000 r13=0000000000000000
    > r14=0000000000000000 r15=0000000000000000
    > iopl=0 nv up ei ng nz ac po cy
    > nt!MiFindContiguousMemoryInPool+0xb9:
    > fffff800`013e0579 488b5310 mov rdx,[rbx+0x10]
    > ds:fffffade`6e122010=0000000000000002
    > Resetting default scope
    >
    > STACK_TEXT:
    > fffffade`5bca8bd8 fffff800`0104e5b4 : 00000000`0000000a 00000000`0005defd
    > 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx
    > fffffade`5bca8be0 fffff800`0104d587 : 00000006`00000000 fffffa80`033c2090
    > 00000000`00000000 00000000`00000001 : nt!KiBugCheckDispatch+0x74
    > fffffade`5bca8d60 fffff800`013e0579 : 00000000`00000002 00000000`6d436d4d
    > 00000000`00000002 00000000`00001000 : nt!KiPageFault+0x207
    > fffffade`5bca8ef0 fffff800`013797a2 : 00000000`00000000 00000000`000fffff
    > 00000000`00100000 00000000`4d546100 : nt!MiFindContiguousMemoryInPool+0xb9
    > fffffade`5bca8f90 fffff800`010edfee : 00000000`00000004 fffffade`6e27b000
    > fffffade`6e122000 00000000`00000002 : nt!MiFindContiguousMemory+0x52
    > fffffade`5bca8ff0 fffff800`010ee07b : 00000000`00000001 00000000`00000080
    > fffffade`6d2343f0 00000000`ffffffff : nt!MiAllocateContiguousMemory+0x12e
    > fffffade`5bca9070 fffffade`5b0764e2 : 00000000`00000000 00000000`00000000
    > fffffa80`021b2a30 fffffade`6d2343f0 :
    > nt!MmAllocateContiguousMemorySpecifyCache+0x5b
    > fffffade`5bca90b0 fffffade`5b07612d : fffffade`6d2343f0 fffffade`708391b0
    > fffffade`708391b0 fffffade`5bca91f0 :
    > storport!RaidUnitAllocateResources+0x370
    > fffffade`5bca9120 fffffade`5b06c71f : 00000000`00010200 00000000`00010200
    > fffffade`5bca9260 00000000`00000000 : storport!RaidCreateUnit+0x13d
    > fffffade`5bca9180 fffffade`5b06bfbf : 00000000`00000000 00000000`00000000
    > 00000000`00000002 00000000`00010200 :
    > storport!RaidBusEnumeratorGetUnit+0x6f
    > fffffade`5bca91f0 fffffade`5b06946f : fffffade`00fe0200 00000000`00000000
    > 00000000`00000001 00000000`00000002 :
    > storport!RaidBusEnumeratorVisitUnit+0x4f
    > fffffade`5bca92f0 fffffade`5b06957d : 00000000`00000000 fffffade`572c0d6d
    > fffffade`5b87b180 fffffade`708391b0 :
    > storport!RaidAdapterEnumerateBus+0xbf
    > fffffade`5bca9470 fffffade`5b088c8f : fffffade`6d9a62e0 fffffade`6cd14ae0
    > 00000000`a0000003 fffffade`708391b0 : storport!RaidAdapterRescanBus+0x8d
    > fffffade`5bca9530 fffffade`5b08890b : 00000000`00000000 fffffade`6cd14ae0
    > 00000000`00000000 fffffade`708391b0 :
    > storport!RaidAdapterQueryDeviceRelationsIrp+0xcf
    > fffffade`5bca95d0 fffffade`5b089eef : fffffade`70893650 fffffade`5bca9820
    > fffffade`70839060 fffffade`6cd14ae0 : storport!RaidAdapterPnpIrp+0x14b
    > fffffade`5bca96a0 fffffade`5b85d949 : fffffade`5bca97e0 fffffade`70893650
    > fffffade`6cd14ae0 fffffade`6fcf6bb0 : storport!RaDriverPnpIrp+0xcf
    > fffffade`5bca9730 fffff800`0124d573 : fffffade`6cd14ae0 fffffade`5bca9820
    > fffffade`70893500 fffffade`6fcf6bb0 :
    > mpspfltr!MPSPQueryDeviceRelations+0xa9
    > fffffade`5bca9790 fffff800`010dc4c1 : 00000000`00000000 00000000`00000002
    > 00000000`00000000 fffffade`6fcf6a70 : nt!IopSynchronousCall+0x14a
    > fffffade`5bca9800 fffff800`013531ea : fffffade`709019b0 fffff800`01014f00
    > fffffade`5bca98e0 00000000`00000000 : nt!IopQueryDeviceRelations+0x71
    > fffffade`5bca9890 fffff800`01354e95 : 00000000`000000c3 00000000`00000000
    > 00000000`00000002 fffffade`6f42b2f0 : nt!PipProcessDevNodeTree+0x342
    > fffffade`5bca9c20 fffff800`010d8598 : fffff800`00000003 00000000`00000000
    > fffffade`708fabf0 fffff800`011cf900 : nt!PiProcessReenumeration+0x85
    > fffffade`5bca9c70 fffff800`0105507c : 00000000`00000000 fffff800`011decc0
    > fffff800`010d8230 fffffade`708fabf0 : nt!PipDeviceActionWorker+0x368
    > fffffade`5bca9d00 fffff800`01299cae : fffffade`708fabf0 00000000`00000080
    > fffffade`708fabf0 fffffade`5b883680 : nt!ExpWorkerThread+0x13b
    > fffffade`5bca9d70 fffff800`0102bbe6 : fffffade`5b87b180 fffffade`708fabf0
    > fffffade`5b883680 00000000`00000000 : nt!PspSystemThreadStartup+0x3e
    > fffffade`5bca9dd0 00000000`00000000 : 00000000`00000000 00000000`00000000
    > 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16
    >
    >
    > STACK_COMMAND: .thread fffffade708fabf0 ; kb
    >
    > FOLLOWUP_IP:
    > storport!RaidUnitAllocateResources+370
    > fffffade`5b0764e2 4885c0 test rax,rax
    >
    > SYMBOL_STACK_INDEX: 7
    >
    > FOLLOWUP_NAME: MachineOwner
    >
    > SYMBOL_NAME: storport!RaidUnitAllocateResources+370
    >
    > MODULE_NAME: storport
    >
    > IMAGE_NAME: storport.sys
    >
    > DEBUG_FLR_IMAGE_TIMESTAMP: 448e954c
    >
    > FAILURE_BUCKET_ID: X64_0xA_storport!RaidUnitAllocateResources+370
    >
    > BUCKET_ID: X64_0xA_storport!RaidUnitAllocateResources+370
    >
    > Followup: MachineOwner
    > ---------
    >


  7. Re: x64 2K3 R2 BSODs when FC Tape Library is Rebooted

    I uninstalled SANSurfer, turned on Driver Verifier for the drivers you
    listed, and rebooted. Then I rebooted the tape library, which in turn BSODed
    my hosts as expected. However, I got a 0xC5 BSOD on one host and a 0xD1 BSOD
    on the other. Below is the bugcheck analysis of the kernel dump from the host
    that got the 0xC5 after Driver Verifier was turned on. I'm thoroughly lost
    at this point. Any suggestions now?

    DRIVER_CORRUPTED_EXPOOL (c5)
    An attempt was made to access a pageable (or completely invalid) address at an
    interrupt request level (IRQL) that is too high. This is
    caused by drivers that have corrupted the system pool. Run the driver
    verifier against any new (or suspect) drivers, and if that doesn't turn up
    the culprit, then use gflags to enable special pool.
    Arguments:
    Arg1: 00000000000b5430, memory referenced
    Arg2: 0000000000000002, IRQL
    Arg3: 0000000000000001, value 0 = read operation, 1 = write operation
    Arg4: fffff800011abd85, address which referenced memory

    Debugging Details:
    ------------------

    *** Error in in reading nt!_ETHREAD @ 0000000000000000
    *** Error in in reading nt!_ETHREAD @ 0000000000000000
    *** Error in in reading nt!_ETHREAD @ 0000000000000000

    OVERLAPPED_MODULE: Address regions for 'Cdfs' and 'imapi.sys' overlap

    BUGCHECK_STR: 0xC5_2

    CURRENT_IRQL: 2

    FAULTING_IP:
    nt!ExAllocatePoolWithTag+c8d
    fffff800`011abd85 48897008 mov [rax+0x8],rsi

    DEFAULT_BUCKET_ID: DRIVER_FAULT

    LAST_CONTROL_TRANSFER: from fffff8000104e5b4 to fffff8000104e890

    LOCK_ADDRESS: fffff800011deb00 -- (!locks fffff800011deb00)

    Resource @ nt!IopDeviceTreeLock (0xfffff800011deb00) Shared 1 owning
    threads
    Threads: fffffade708987a0-01<*>
    1 total locks, 1 locks currently held

    FAULTING_THREAD: fffffade708987a0

    PNP_TRIAGE:
    Lock address : 0xfffff800011deb00
    Thread Count : 1
    Thread address: 0xfffffade708987a0
    Thread wait : 0x2fb7

    TRAP_FRAME: fffffade5be93b70 -- (.trap fffffade5be93b70)
    NOTE: The trap frame does not contain all registers.
    Some register values may be zeroed.
    rax=00000000000b5428 rbx=00000000000000a0 rcx=fffffade6e74bd60
    rdx=0000000000000000 rsi=fffffade5be93ed0 rdi=0000000000000001
    rip=fffff800011abd85 rsp=fffffade5be93d00 rbp=0000000000000000
    r8=0000000000000000 r9=000000000000000b r10=00000000000000b0
    r11=0000000000000001 r12=0000000000000000 r13=0000000000000000
    r14=0000000000000000 r15=0000000000000000
    iopl=0 nv up ei pl nz ac pe nc
    nt!ExAllocatePoolWithTag+0xc8d:
    fffff800`011abd85 48897008 mov [rax+0x8],rsi
    ds:0002:00000000`000b5430=????????????????
    Resetting default scope

    STACK_TEXT:
    fffffade`5be939e8 fffff800`0104e5b4 : 00000000`0000000a 00000000`000b5430
    00000000`00000002 00000000`00000001 : nt!KeBugCheckEx
    fffffade`5be939f0 fffff800`0104d587 : 00000000`00000000 00000000`00000000
    00000000`00000000 00000000`00000000 : nt!KiBugCheckDispatch+0x74
    fffffade`5be93b70 fffff800`011abd85 : 00000000`00000000 00000000`00000000
    00000000`00000000 00000000`00000000 : nt!KiPageFault+0x207
    fffffade`5be93d00 fffff800`013f77a2 : 00000000`00000000 fffff800`008080be
    fffffade`5b236480 fffffaae`3dad8ee0 : nt!ExAllocatePoolWithTag+0xc8d
    fffffade`5be93dd0 fffff800`013ed1d1 : 00000000`0002625a fffffaae`3dad8ee0
    fffffade`5b235e6c fffffade`7085f060 : nt!VfIrpReserveCallStackData+0x52
    fffffade`5be93e00 fffffade`5b235e6c : fffffade`5be93e70 fffffaae`3dad8ee0
    fffffadf`f77fe000 fffffade`7085f060 : nt!IovCallDriver+0x131
    fffffade`5be93e70 fffffade`5b2372a1 : 00000000`00000000 fffffade`5be943b0
    fffffade`5be94190 fffffade`5be94240 :
    storport!RaidBusEnumeratorIssueSynchronousRequest+0x14c
    fffffade`5be93ff0 fffffade`5b236ed3 : fffffade`5be943b8 fffffade`5b241222
    fffffade`00000000 fffffade`5be94190 :
    storport!RaidBusEnumeratorIssueReportLuns+0x131
    fffffade`5be94070 fffffade`5b23779f : fffffade`7085f1b0 00000000`00000000
    00000000`00000000 00000000`00000000 :
    storport!RaidBusEnumeratorGetLunListFromTarget+0x143
    fffffade`5be94150 fffffade`5b2326ac : 00000000`00000000 00000000`00000000
    00000000`00000000 fffffade`708791b0 :
    storport!RaidBusEnumeratorGetLunList+0x6f
    fffffade`5be94210 fffffade`5b2327dd : 00000000`0000000a 00000000`00000000
    00000000`00000006 fffffade`708791b0 : storport!RaidAdapterEnumerateBus+0x9c
    fffffade`5be94390 fffffade`5b284c8f : fffff800`00000006 fffffaae`3df3ae10
    fffffade`708ac480 fffffade`708791b0 : storport!RaidAdapterRescanBus+0x8d
    fffffade`5be94450 fffffade`5b28490b : 00000282`00180000 fffffaae`3df3ae10
    00000000`00000000 fffffade`708791b0 :
    storport!RaidAdapterQueryDeviceRelationsIrp+0xcf
    fffffade`5be944f0 fffffade`5b285f4f : fffffade`6e749810 fffffade`5be94680
    fffffade`70879060 fffffaae`3df3ae10 : storport!RaidAdapterPnpIrp+0x14b
    fffffade`5be945c0 fffff800`013ed255 : fffffaae`3df3ae10 fffffade`6e749810
    fffffaae`3df3ae10 fffffade`70879060 : storport!RaDriverPnpIrp+0xcf
    fffffade`5be94650 fffffade`5ba5d949 : fffffade`6eb723e0 fffffaae`3df3ae10
    00000000`00000000 fffffade`70dd4df0 : nt!IovCallDriver+0x1b5
    fffffade`5be946c0 fffff800`013ed255 : fffffade`6f749610 fffffade`5be94750
    fffffaae`3df3ae10 fffff800`0124d573 : mpspfltr!MPSPQueryDeviceRelations+0xa9
    fffffade`5be94720 fffff800`0124d573 : 00000000`00000004 fffffaae`3df3ae10
    00000000`00000000 fffffaae`3df3ae10 : nt!IovCallDriver+0x1b5
    fffffade`5be94790 fffff800`010dc4c1 : 00000000`00000000 00000000`00000002
    00000000`00000000 fffffade`708abc70 : nt!IopSynchronousCall+0x14a
    fffffade`5be94800 fffff800`013531ea : fffffa80`007793f0 fffffade`708aca00
    fffffade`5be948e0 00000000`00000000 : nt!IopQueryDeviceRelations+0x71
    fffffade`5be94890 fffff800`01354e95 : 00000000`00000576 00000000`00000000
    00000000`00000002 fffffade`6f03b850 : nt!PipProcessDevNodeTree+0x342
    fffffade`5be94c20 fffff800`010d8598 : fffff800`00000003 00000000`00000000
    fffffade`708987a0 fffff800`011cf900 : nt!PiProcessReenumeration+0x85
    fffffade`5be94c70 fffff800`0105507c : 00000000`00000000 fffff800`011decc0
    fffff800`010d8230 fffffade`708987a0 : nt!PipDeviceActionWorker+0x368
    fffffade`5be94d00 fffff800`01299cae : fffffade`708987a0 00000000`00000080
    fffffade`708987a0 fffffade`5baa3680 : nt!ExpWorkerThread+0x13b
    fffffade`5be94d70 fffff800`0102bbe6 : fffffade`5ba9b180 fffffade`708987a0
    fffffade`5baa3680 fffff800`011b6dc0 : nt!PspSystemThreadStartup+0x3e
    fffffade`5be94dd0 00000000`00000000 : 00000000`00000000 00000000`00000000
    00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16


    STACK_COMMAND: .thread fffffade708987a0 ; kb

    FOLLOWUP_IP:
    storport!RaidBusEnumeratorIssueSynchronousRequest+14c
    fffffade`5b235e6c 3d03010000 cmp eax,0x103

    SYMBOL_STACK_INDEX: 6

    FOLLOWUP_NAME: MachineOwner

    SYMBOL_NAME: storport!RaidBusEnumeratorIssueSynchronousRequest+14c

    MODULE_NAME: storport

    IMAGE_NAME: storport.sys

    DEBUG_FLR_IMAGE_TIMESTAMP: 45d0a33d

    FAILURE_BUCKET_ID:
    X64_0xC5_2_storport!RaidBusEnumeratorIssueSynchronousRequest+14c

    BUCKET_ID: X64_0xC5_2_storport!RaidBusEnumeratorIssueSynchronousRequest+14c

    Followup: MachineOwner
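
    For anyone reading the dump above, here is a minimal Python sketch that
    cross-checks the bugcheck arguments against the trap frame. It is only an
    illustration: the excerpt is copied from the output above, and the helper
    names (parse_args, parse_register) are made up for this example. It shows
    that the faulting instruction, mov [rax+0x8],rsi, with rax=b5428 targets
    exactly the address reported in Arg1, and that the access was a write at
    IRQL 2 (DISPATCH_LEVEL).

```python
import re

# Lines copied from the !analyze -v output above (only what the sketch needs).
ANALYZE_EXCERPT = """\
Arg1: 00000000000b5430, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000001, value 0 = read operation, 1 = write operation
Arg4: fffff800011abd85, address which referenced memory
rax=00000000000b5428 rbx=00000000000000a0 rcx=fffffade6e74bd60
fffff800`011abd85 48897008 mov [rax+0x8],rsi
"""


def parse_args(text):
    """Return the ArgN values as a dict mapping N to an integer (hypothetical helper)."""
    pairs = re.findall(r"Arg(\d): ([0-9a-fA-F]+),", text)
    return {int(number): int(value, 16) for number, value in pairs}


def parse_register(text, name):
    """Return the value of one register from the trap frame (hypothetical helper)."""
    match = re.search(rf"\b{name}=([0-9a-fA-F]+)", text)
    if match is None:
        raise ValueError(f"register {name} not found")
    return int(match.group(1), 16)


args = parse_args(ANALYZE_EXCERPT)
rax = parse_register(ANALYZE_EXCERPT, "rax")

# The faulting instruction writes to [rax+0x8].
effective_address = rax + 0x8

# Arg3 distinguishes reads (0) from writes (1), as stated in the dump text.
operation = "write" if args[3] == 1 else "read"

print(f"memory referenced : {args[1]:#x}")
print(f"rax + 0x8         : {effective_address:#x}")
print(f"IRQL              : {args[2]}")
print(f"operation         : {operation}")
print(f"addresses match   : {effective_address == args[1]}")
```

    The write goes through a pointer that is not a kernel address, from inside
    nt!ExAllocatePoolWithTag itself, which is consistent with the "corrupted
    the system pool" wording of the 0xC5 description. It does not by itself say
    which driver did the corrupting.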

    "Kenny Speer" wrote:

    > Unfortunately, all the analyze shows is that while at DISPATCH_LEVEL (irql
    > 2) the system attempted to read a bogus pointer, most likely it's simply
    > uninitialized (based on the garbage address 5defd) ...
    >
    > If you can reproduce this easily, I suggest you enable driver verifier on
    > the mpio, storport, and miniport drivers.
    >
    > Also, if you have SANSurfer installed, please uninstall it. There is a
    > kernel mode service which queries the devices on the SAN and can definitely
    > cause issues.
    >
    > To enable verifier do this:
    > 1. start->run->verifier
    > 2. choose "Create standard settings"
    > 3. choose "Select driver names"
    > 4. check the following: mpdev.sys mpio.sys mpspfltr.sys ql2300.sys
    > storport.sys
    > 5. reboot
    >
    > Then reboot your tape library. Many times, when verifier is running it will
    > catch issues earlier than the bugcheck will and should be much more
    > accurate. Driver Verifier will still BSOD your host, but the dump will
    > contain better info. You also most likely don't need a complete memory
    > dump, kernel dump should be sufficient.
    >
    > Since it seems like you are an IBM shop, you should be able to report this
    > issue to IBM and have them report it to MS.
    >
    > Good luck,
    > ~kenny
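
    The GUI steps quoted above can probably also be done from a command prompt
    using verifier.exe's /standard and /driver switches. The Python sketch
    below only assembles that command line so it can be pasted or scripted
    across several hosts; build_verifier_command is a hypothetical helper, and
    the driver list is the one from the quoted steps. A reboot is still needed
    afterwards, as in step 5.

```python
def build_verifier_command(drivers):
    """Build a verifier.exe command that applies the standard settings
    to the named drivers (hypothetical helper for illustration)."""
    if not drivers:
        raise ValueError("at least one driver name is required")
    return ["verifier", "/standard", "/driver", *drivers]


# Driver names taken from the quoted steps above.
DRIVERS = ["mpdev.sys", "mpio.sys", "mpspfltr.sys", "ql2300.sys", "storport.sys"]

command = build_verifier_command(DRIVERS)
command_line = " ".join(command)
print(command_line)
```

    On a Windows host the list could be passed to subprocess.run(command,
    check=True) from an elevated prompt instead of being printed.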


  8. Re: x64 2K3 R2 BSODs when FC Tape Library is Rebooted

    I was hoping that running verifier would catch any buggy miniport behavior,
    but instead you've only confirmed that this appears to be a storport bug. If
    you don't want to report it to MS PSS, then report it to IBM and have them
    report it. MS can take the .dmp and most likely already has a fix or is at
    least aware of the issue.

    It doesn't appear there is much more you can do. Yell at your vendor and
    make them help you.

    Meanwhile, zone out any hosts which do not need access to the tape libraries
    so you can save them from hitting this issue.

    ~kenny

    "Eric" wrote in message
    news:F7F57AEC-85A3-4AE1-B53E-9F4886440AE4@microsoft.com...
    > I uninstalled SANSurfer, turned on driver verifier for the drivers you
    > listed, and rebooted. Then I rebooted the tape library, which in turn
    > BSODed my hosts as expected. However, I got a 0xC5 BSOD on one host and a
    > 0xD1 BSOD on the other. Below is the bugcheck analysis on the kernel dump
    > of the host that got the 0xC5 after driver verifier was turned on. I'm
    > thoroughly lost at this point. Any suggestions now?
    >
    > DRIVER_CORRUPTED_EXPOOL (c5)
    > An attempt was made to access a pageable (or completely invalid) address
    > at an interrupt request level (IRQL) that is too high. This is
    > caused by drivers that have corrupted the system pool. Run the driver
    > verifier against any new (or suspect) drivers, and if that doesn't turn up
    > the culprit, then use gflags to enable special pool.
    > Arguments:
    > Arg1: 00000000000b5430, memory referenced
    > Arg2: 0000000000000002, IRQL
    > Arg3: 0000000000000001, value 0 = read operation, 1 = write operation
    > Arg4: fffff800011abd85, address which referenced memory
    >
    > Debugging Details:
    > ------------------
    >
    > *** Error in in reading nt!_ETHREAD @ 0000000000000000
    > *** Error in in reading nt!_ETHREAD @ 0000000000000000
    > *** Error in in reading nt!_ETHREAD @ 0000000000000000
    >
    > OVERLAPPED_MODULE: Address regions for 'Cdfs' and 'imapi.sys' overlap
    >
    > BUGCHECK_STR: 0xC5_2
    >
    > CURRENT_IRQL: 2
    >
    > FAULTING_IP:
    > nt!ExAllocatePoolWithTag+c8d
    > fffff800`011abd85 48897008 mov [rax+0x8],rsi
    >
    > DEFAULT_BUCKET_ID: DRIVER_FAULT
    >
    > LAST_CONTROL_TRANSFER: from fffff8000104e5b4 to fffff8000104e890
    >
    > LOCK_ADDRESS: fffff800011deb00 -- (!locks fffff800011deb00)
    >
    > Resource @ nt!IopDeviceTreeLock (0xfffff800011deb00) Shared 1 owning
    > threads
    > Threads: fffffade708987a0-01<*>
    > 1 total locks, 1 locks currently held
    >
    > FAULTING_THREAD: fffffade708987a0
    >
    > PNP_TRIAGE:
    > Lock address : 0xfffff800011deb00
    > Thread Count : 1
    > Thread address: 0xfffffade708987a0
    > Thread wait : 0x2fb7
    >
    > TRAP_FRAME: fffffade5be93b70 -- (.trap fffffade5be93b70)
    > NOTE: The trap frame does not contain all registers.
    > Some register values may be zeroed.
    > rax=00000000000b5428 rbx=00000000000000a0 rcx=fffffade6e74bd60
    > rdx=0000000000000000 rsi=fffffade5be93ed0 rdi=0000000000000001
    > rip=fffff800011abd85 rsp=fffffade5be93d00 rbp=0000000000000000
    > r8=0000000000000000 r9=000000000000000b r10=00000000000000b0
    > r11=0000000000000001 r12=0000000000000000 r13=0000000000000000
    > r14=0000000000000000 r15=0000000000000000
    > iopl=0 nv up ei pl nz ac pe nc
    > nt!ExAllocatePoolWithTag+0xc8d:
    > fffff800`011abd85 48897008 mov [rax+0x8],rsi
    > ds:0002:00000000`000b5430=????????????????
    > Resetting default scope
    >
    > STACK_TEXT:
    > fffffade`5be939e8 fffff800`0104e5b4 : 00000000`0000000a 00000000`000b5430
    > 00000000`00000002 00000000`00000001 : nt!KeBugCheckEx
    > fffffade`5be939f0 fffff800`0104d587 : 00000000`00000000 00000000`00000000
    > 00000000`00000000 00000000`00000000 : nt!KiBugCheckDispatch+0x74
    > fffffade`5be93b70 fffff800`011abd85 : 00000000`00000000 00000000`00000000
    > 00000000`00000000 00000000`00000000 : nt!KiPageFault+0x207
    > fffffade`5be93d00 fffff800`013f77a2 : 00000000`00000000 fffff800`008080be
    > fffffade`5b236480 fffffaae`3dad8ee0 : nt!ExAllocatePoolWithTag+0xc8d
    > fffffade`5be93dd0 fffff800`013ed1d1 : 00000000`0002625a fffffaae`3dad8ee0
    > fffffade`5b235e6c fffffade`7085f060 : nt!VfIrpReserveCallStackData+0x52
    > fffffade`5be93e00 fffffade`5b235e6c : fffffade`5be93e70 fffffaae`3dad8ee0
    > fffffadf`f77fe000 fffffade`7085f060 : nt!IovCallDriver+0x131
    > fffffade`5be93e70 fffffade`5b2372a1 : 00000000`00000000 fffffade`5be943b0
    > fffffade`5be94190 fffffade`5be94240 :
    > storport!RaidBusEnumeratorIssueSynchronousRequest+0x14c
    > fffffade`5be93ff0 fffffade`5b236ed3 : fffffade`5be943b8 fffffade`5b241222
    > fffffade`00000000 fffffade`5be94190 :
    > storport!RaidBusEnumeratorIssueReportLuns+0x131
    > fffffade`5be94070 fffffade`5b23779f : fffffade`7085f1b0 00000000`00000000
    > 00000000`00000000 00000000`00000000 :
    > storport!RaidBusEnumeratorGetLunListFromTarget+0x143
    > fffffade`5be94150 fffffade`5b2326ac : 00000000`00000000 00000000`00000000
    > 00000000`00000000 fffffade`708791b0 :
    > storport!RaidBusEnumeratorGetLunList+0x6f
    > fffffade`5be94210 fffffade`5b2327dd : 00000000`0000000a 00000000`00000000
    > 00000000`00000006 fffffade`708791b0 :
    > storport!RaidAdapterEnumerateBus+0x9c
    > fffffade`5be94390 fffffade`5b284c8f : fffff800`00000006 fffffaae`3df3ae10
    > fffffade`708ac480 fffffade`708791b0 : storport!RaidAdapterRescanBus+0x8d
    > fffffade`5be94450 fffffade`5b28490b : 00000282`00180000 fffffaae`3df3ae10
    > 00000000`00000000 fffffade`708791b0 :
    > storport!RaidAdapterQueryDeviceRelationsIrp+0xcf
    > fffffade`5be944f0 fffffade`5b285f4f : fffffade`6e749810 fffffade`5be94680
    > fffffade`70879060 fffffaae`3df3ae10 : storport!RaidAdapterPnpIrp+0x14b
    > fffffade`5be945c0 fffff800`013ed255 : fffffaae`3df3ae10 fffffade`6e749810
    > fffffaae`3df3ae10 fffffade`70879060 : storport!RaDriverPnpIrp+0xcf
    > fffffade`5be94650 fffffade`5ba5d949 : fffffade`6eb723e0 fffffaae`3df3ae10
    > 00000000`00000000 fffffade`70dd4df0 : nt!IovCallDriver+0x1b5
    > fffffade`5be946c0 fffff800`013ed255 : fffffade`6f749610 fffffade`5be94750
    > fffffaae`3df3ae10 fffff800`0124d573 :
    > mpspfltr!MPSPQueryDeviceRelations+0xa9
    > fffffade`5be94720 fffff800`0124d573 : 00000000`00000004 fffffaae`3df3ae10
    > 00000000`00000000 fffffaae`3df3ae10 : nt!IovCallDriver+0x1b5
    > fffffade`5be94790 fffff800`010dc4c1 : 00000000`00000000 00000000`00000002
    > 00000000`00000000 fffffade`708abc70 : nt!IopSynchronousCall+0x14a
    > fffffade`5be94800 fffff800`013531ea : fffffa80`007793f0 fffffade`708aca00
    > fffffade`5be948e0 00000000`00000000 : nt!IopQueryDeviceRelations+0x71
    > fffffade`5be94890 fffff800`01354e95 : 00000000`00000576 00000000`00000000
    > 00000000`00000002 fffffade`6f03b850 : nt!PipProcessDevNodeTree+0x342
    > fffffade`5be94c20 fffff800`010d8598 : fffff800`00000003 00000000`00000000
    > fffffade`708987a0 fffff800`011cf900 : nt!PiProcessReenumeration+0x85
    > fffffade`5be94c70 fffff800`0105507c : 00000000`00000000 fffff800`011decc0
    > fffff800`010d8230 fffffade`708987a0 : nt!PipDeviceActionWorker+0x368
    > fffffade`5be94d00 fffff800`01299cae : fffffade`708987a0 00000000`00000080
    > fffffade`708987a0 fffffade`5baa3680 : nt!ExpWorkerThread+0x13b
    > fffffade`5be94d70 fffff800`0102bbe6 : fffffade`5ba9b180 fffffade`708987a0
    > fffffade`5baa3680 fffff800`011b6dc0 : nt!PspSystemThreadStartup+0x3e
    > fffffade`5be94dd0 00000000`00000000 : 00000000`00000000 00000000`00000000
    > 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16
    >
    >
    > STACK_COMMAND: .thread fffffade708987a0 ; kb
    >
    > FOLLOWUP_IP:
    > storport!RaidBusEnumeratorIssueSynchronousRequest+14c
    > fffffade`5b235e6c 3d03010000 cmp eax,0x103
    >
    > SYMBOL_STACK_INDEX: 6
    >
    > FOLLOWUP_NAME: MachineOwner
    >
    > SYMBOL_NAME: storport!RaidBusEnumeratorIssueSynchronousRequest+14c
    >
    > MODULE_NAME: storport
    >
    > IMAGE_NAME: storport.sys
    >
    > DEBUG_FLR_IMAGE_TIMESTAMP: 45d0a33d
    >
    > FAILURE_BUCKET_ID:
    > X64_0xC5_2_storport!RaidBusEnumeratorIssueSynchronousRequest+14c
    >
    > BUCKET_ID:
    > X64_0xC5_2_storport!RaidBusEnumeratorIssueSynchronousRequest+14c
    >
    > Followup: MachineOwner
    >
    > "Kenny Speer" wrote:
    >
    >> Unfortunately, all the analyze shows is that while at DISPATCH_LEVEL
    >> (irql
    >> 2) the system attempted to read a bogus pointer, most likely it's simply
    >> uninitialized (based on the garbage address 5defd) ...
    >>
    >> If you can reproduce this easily, I suggest you enable driver verifier on
    >> the mpio, storport, and miniport drivers.
    >>
    >> Also, if you have SANSurfer installed, please uninstall it. There is a
    >> kernel mode service which queries the devices on the SAN and can
    >> definitely cause issues.
    >>
    >> To enable verifier do this:
    >> 1. start->run->verifier
    >> 2. choose "Create standard settings"
    >> 3. choose "Select driver names"
    >> 4. check the following: mpdev.sys mpio.sys mpspfltr.sys ql2300.sys
    >> storport.sys
    >> 5. reboot
    >>
    >> Then reboot your tape library. Many times, when verifier is running it
    >> will
    >> catch issues earlier than the bugcheck will and should be much more
    >> accurate. Driver Verifier will still BSOD your host, but the dump will
    >> contain better info. You also most likely don't need a complete memory
    >> dump, kernel dump should be sufficient.
    >>
    >> Since it seems like you are an IBM shop, you should be able to report
    >> this
    >> issue to IBM and have them report it to MS.
    >>
    >> Good luck,
    >> ~kenny
    >>
    >> "Eric" wrote in message
    >> news:B8DCFE65-7D6F-45BD-9E7F-94001D92E4A7@microsoft.com...
    >> > *******************************************************************************
    >> > *
    >> > *
    >> > * Bugcheck Analysis
    >> > *
    >> > *
    >> > *
    >> > *******************************************************************************
    >> >
    >> > IRQL_NOT_LESS_OR_EQUAL (a)
    >> > An attempt was made to access a pageable (or completely invalid)
    >> > address
    >> > at an
    >> > interrupt request level (IRQL) that is too high. This is usually
    >> > caused by drivers using improper addresses.
    >> > If a kernel debugger is available get the stack backtrace.
    >> > Arguments:
    >> > Arg1: 000000000005defd, memory referenced
    >> > Arg2: 0000000000000002, IRQL
    >> > Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
    >> > Arg4: fffff800013e0579, address which referenced memory
    >> >
    >> > Debugging Details:
    >> > ------------------
    >> >
    >> > *** Error in in reading nt!_ETHREAD @ 0000000000000000
    >> > *** Error in in reading nt!_ETHREAD @ 0000000000000000
    >> > *** Error in in reading nt!_ETHREAD @ 0000000000000000
    >> >
    >> > READ_ADDRESS: 000000000005defd
    >> >
    >> > CURRENT_IRQL: 2
    >> >
    >> > FAULTING_IP:
    >> > nt!MiFindContiguousMemoryInPool+b9
    >> > fffff800`013e0579 488b5310 mov rdx,[rbx+0x10]
    >> >
    >> > DEFAULT_BUCKET_ID: DRIVER_FAULT
    >> >
    >> > BUGCHECK_STR: 0xA
    >> >
    >> > LAST_CONTROL_TRANSFER: from fffff8000104e5b4 to fffff8000104e890
    >> >
    >> > LOCK_ADDRESS: fffff800011deb00 -- (!locks fffff800011deb00)
    >> >
    >> > Resource @ nt!IopDeviceTreeLock (0xfffff800011deb00) Shared 1 owning
    >> > threads
    >> > Threads: fffffade708fabf0-01<*>
    >> > 1 total locks, 1 locks currently held
    >> >
    >> > FAULTING_THREAD: fffffade708fabf0
    >> >
    >> > PNP_TRIAGE:
    >> > Lock address : 0xfffff800011deb00
    >> > Thread Count : 1
    >> > Thread address: 0xfffffade708fabf0
    >> > Thread wait : 0xee6922c
    >> >
    >> > TRAP_FRAME: fffffade5bca8d60 -- (.trap fffffade5bca8d60)
    >> > NOTE: The trap frame does not contain all registers.
    >> > Some register values may be zeroed.
    >> > rax=0000000000000000 rbx=fffffade6e122000 rcx=0000000000112242
    >> > rdx=00000000001d43ef rsi=0000000000000001 rdi=fffffade5f55d660
    >> > rip=fffff800013e0579 rsp=fffffade5bca8ef0 rbp=0000000000100000
    >> > r8=00000000000ffffe r9=0000000000112242 r10=fffffade6e442000
    >> > r11=00000000000fffff r12=0000000000000000 r13=0000000000000000
    >> > r14=0000000000000000 r15=0000000000000000
    >> > iopl=0 nv up ei ng nz ac po cy
    >> > nt!MiFindContiguousMemoryInPool+0xb9:
    >> > fffff800`013e0579 488b5310 mov rdx,[rbx+0x10]
    >> > ds:fffffade`6e122010=0000000000000002
    >> > Resetting default scope
    >> >
    >> > STACK_TEXT:
    >> > fffffade`5bca8bd8 fffff800`0104e5b4 : 00000000`0000000a
    >> > 00000000`0005defd
    >> > 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx
    >> > fffffade`5bca8be0 fffff800`0104d587 : 00000006`00000000
    >> > fffffa80`033c2090
    >> > 00000000`00000000 00000000`00000001 : nt!KiBugCheckDispatch+0x74
    >> > fffffade`5bca8d60 fffff800`013e0579 : 00000000`00000002
    >> > 00000000`6d436d4d
    >> > 00000000`00000002 00000000`00001000 : nt!KiPageFault+0x207
    >> > fffffade`5bca8ef0 fffff800`013797a2 : 00000000`00000000
    >> > 00000000`000fffff
    >> > 00000000`00100000 00000000`4d546100 :
    >> > nt!MiFindContiguousMemoryInPool+0xb9
    >> > fffffade`5bca8f90 fffff800`010edfee : 00000000`00000004
    >> > fffffade`6e27b000
    >> > fffffade`6e122000 00000000`00000002 : nt!MiFindContiguousMemory+0x52
    >> > fffffade`5bca8ff0 fffff800`010ee07b : 00000000`00000001
    >> > 00000000`00000080
    >> > fffffade`6d2343f0 00000000`ffffffff :
    >> > nt!MiAllocateContiguousMemory+0x12e
    >> > fffffade`5bca9070 fffffade`5b0764e2 : 00000000`00000000
    >> > 00000000`00000000
    >> > fffffa80`021b2a30 fffffade`6d2343f0 :
    >> > nt!MmAllocateContiguousMemorySpecifyCache+0x5b
    >> > fffffade`5bca90b0 fffffade`5b07612d : fffffade`6d2343f0
    >> > fffffade`708391b0
    >> > fffffade`708391b0 fffffade`5bca91f0 :
    >> > storport!RaidUnitAllocateResources+0x370
    >> > fffffade`5bca9120 fffffade`5b06c71f : 00000000`00010200
    >> > 00000000`00010200
    >> > fffffade`5bca9260 00000000`00000000 : storport!RaidCreateUnit+0x13d
    >> > fffffade`5bca9180 fffffade`5b06bfbf : 00000000`00000000
    >> > 00000000`00000000
    >> > 00000000`00000002 00000000`00010200 :
    >> > storport!RaidBusEnumeratorGetUnit+0x6f
    >> > fffffade`5bca91f0 fffffade`5b06946f : fffffade`00fe0200
    >> > 00000000`00000000
    >> > 00000000`00000001 00000000`00000002 :
    >> > storport!RaidBusEnumeratorVisitUnit+0x4f
    >> > fffffade`5bca92f0 fffffade`5b06957d : 00000000`00000000
    >> > fffffade`572c0d6d
    >> > fffffade`5b87b180 fffffade`708391b0 :
    >> > storport!RaidAdapterEnumerateBus+0xbf
    >> > fffffade`5bca9470 fffffade`5b088c8f : fffffade`6d9a62e0
    >> > fffffade`6cd14ae0
    >> > 00000000`a0000003 fffffade`708391b0 :
    >> > storport!RaidAdapterRescanBus+0x8d
    >> > fffffade`5bca9530 fffffade`5b08890b : 00000000`00000000
    >> > fffffade`6cd14ae0
    >> > 00000000`00000000 fffffade`708391b0 :
    >> > storport!RaidAdapterQueryDeviceRelationsIrp+0xcf
    >> > fffffade`5bca95d0 fffffade`5b089eef : fffffade`70893650
    >> > fffffade`5bca9820
    >> > fffffade`70839060 fffffade`6cd14ae0 : storport!RaidAdapterPnpIrp+0x14b
    >> > fffffade`5bca96a0 fffffade`5b85d949 : fffffade`5bca97e0
    >> > fffffade`70893650
    >> > fffffade`6cd14ae0 fffffade`6fcf6bb0 : storport!RaDriverPnpIrp+0xcf
    >> > fffffade`5bca9730 fffff800`0124d573 : fffffade`6cd14ae0
    >> > fffffade`5bca9820
    >> > fffffade`70893500 fffffade`6fcf6bb0 :
    >> > mpspfltr!MPSPQueryDeviceRelations+0xa9
    >> > fffffade`5bca9790 fffff800`010dc4c1 : 00000000`00000000
    >> > 00000000`00000002
    >> > 00000000`00000000 fffffade`6fcf6a70 : nt!IopSynchronousCall+0x14a
    >> > fffffade`5bca9800 fffff800`013531ea : fffffade`709019b0
    >> > fffff800`01014f00
    >> > fffffade`5bca98e0 00000000`00000000 : nt!IopQueryDeviceRelations+0x71
    >> > fffffade`5bca9890 fffff800`01354e95 : 00000000`000000c3
    >> > 00000000`00000000
    >> > 00000000`00000002 fffffade`6f42b2f0 : nt!PipProcessDevNodeTree+0x342
    >> > fffffade`5bca9c20 fffff800`010d8598 : fffff800`00000003
    >> > 00000000`00000000
    >> > fffffade`708fabf0 fffff800`011cf900 : nt!PiProcessReenumeration+0x85
    >> > fffffade`5bca9c70 fffff800`0105507c : 00000000`00000000
    >> > fffff800`011decc0
    >> > fffff800`010d8230 fffffade`708fabf0 : nt!PipDeviceActionWorker+0x368
    >> > fffffade`5bca9d00 fffff800`01299cae : fffffade`708fabf0
    >> > 00000000`00000080
    >> > fffffade`708fabf0 fffffade`5b883680 : nt!ExpWorkerThread+0x13b
    >> > fffffade`5bca9d70 fffff800`0102bbe6 : fffffade`5b87b180
    >> > fffffade`708fabf0
    >> > fffffade`5b883680 00000000`00000000 : nt!PspSystemThreadStartup+0x3e
    >> > fffffade`5bca9dd0 00000000`00000000 : 00000000`00000000
    >> > 00000000`00000000
    >> > 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16
    >> >
    >> >
    >> > STACK_COMMAND: .thread fffffade708fabf0 ; kb
    >> >
    >> > FOLLOWUP_IP:
    >> > storport!RaidUnitAllocateResources+370
    >> > fffffade`5b0764e2 4885c0 test rax,rax
    >> >
    >> > SYMBOL_STACK_INDEX: 7
    >> >
    >> > FOLLOWUP_NAME: MachineOwner
    >> >
    >> > SYMBOL_NAME: storport!RaidUnitAllocateResources+370
    >> >
    >> > MODULE_NAME: storport
    >> >
    >> > IMAGE_NAME: storport.sys
    >> >
    >> > DEBUG_FLR_IMAGE_TIMESTAMP: 448e954c
    >> >
    >> > FAILURE_BUCKET_ID: X64_0xA_storport!RaidUnitAllocateResources+370
    >> >
    >> > BUCKET_ID: X64_0xA_storport!RaidUnitAllocateResources+370
    >> >
    >> > Followup: MachineOwner
    >> > ---------
    >> >
    >> > "Kenny Speer" wrote:
    >> >
    >> >> Just because storport is in the bugcheck doesn't mean it's the cause.
    >> >> It
    >> >> may not have handled a buggy miniport correctly, but it's very
    >> >> possible
    >> >> the
    >> >> real problem is in a different driver.
    >> >>
    >> >> Can you paste the !analyze -v output with the Microsoft symbol server
    >> >> set in your .sympath?
    >> >>
    >> >> "Eric" wrote in message
    >> >> news:805E1809-2AEE-47F5-B7AC-4593B8324122@microsoft.com...
    >> >> > Right, the BSOD IS caused by a driver--the storport.sys driver in
    >> >> > this
    >> >> > case.
    >> >> > The reason I was trying to avoid responses that suggest that I
    >> >> > update
    >> >> > drivers and firmware was because I've already done all that as one
    >> >> > of
    >> >> > the
    >> >> > first troubleshooting steps. Also, I am already on that latest
    >> >> > storport
    >> >> > version (5.2.3790.2880 for SP1) you mention in the KB article. I
    >> >> > mentioned
    >> >> > all this in my original post.
    >> >> >
    >> >> > "Pat [MSFT]" wrote:
    >> >> >
    >> >> >> Well, a BSOD is by definition caused by either a bug in a driver or
    >> >> >> a
    >> >> >> bug
    >> >> >> in
    >> >> >> HW/Firmware. So, your request to not suggest updating
    >> >> >> driver/firmware
    >> >> >> may
    >> >> >> not get you very far.
    >> >> >>
    >> >> >> That said, I think you are running into a known bug that was fixed
    >> >> >> &
    >> >> >> released in Feb (KB Article 932755). You can download the fix
    >> >> >> directly
    >> >> >> via
    >> >> >> http://support.microsoft.com - make sure to grab the correct
    >> >> >> package.
    >> >> >>
    >> >> >> There is a SP1 & SP2 version of the fix - so you could get relief
    >> >> >> w/out
    >> >> >> SP2
    >> >> >> if you absolutely needed to. I would recommend going to SP2 first
    >> >> >> b/c
    >> >> >> there
    >> >> >> are a number of updates & perf improvements that are just general
    >> >> >> goodness.
    >> >> >> Then adding the fix on-top should get you where you need to be.
    >> >> >>
    >> >> >> If the problem persists after that, then I would recommend giving
    >> >> >> support
    >> >> >> a
    >> >> >> call. If the issue is a bug, we refund the cost of the incident
    >> >> >> (or
    >> >> >> re-credit your account if you have a Premier support contract).
    >> >> >>
    >> >> >>
    >> >> >> Pat
    >> >> >>
    >> >> >>
    >> >> >>
    >> >> >> "Eric" wrote in message
    >> >> >> news:36137F27-5CBE-46E9-8790-3E92E7CA64D6@microsoft.com...
    >> >> >> >I have a number of 2003 R2 x64 and x86 SP1 servers attached to a
    >> >> >> > fabric-switched IBM Fibre-Channel SAN. The servers are all IBM
    >> >> >> > xSeries
    >> >> >> > servers attached using QLogic QLA2340 HBAs. We are using MPIO and
    >> >> >> > M$s
    >> >> >> > Storport driver (the latest version, of course) for multipathing
    >> >> >> > on
    >> >> >> > all
    >> >> >> > servers. Furthermore, we're using IBMs StorageManager Agents
    >> >> >> > (again,
    >> >> >> > latest
    >> >> >> > version) on all hosts. Also part of that SAN is a Quantum PX502
    >> >> >> > robotic
    >> >> >> > tape-library which is also Fibre-Channel and attached directly to
    >> >> >> > the
    >> >> >> > SAN
    >> >> >> > (i.e. not physically attached to a server). We are not using any
    >> >> >> > kind
    >> >> >> > of
    >> >> >> > SAN
    >> >> >> > partitioning, so all hosts attached to the SAN see the tape
    >> >> >> > drives
    >> >> >> > and
    >> >> >> > robot.
    >> >> >> >
    >> >> >> > Here's what happens. After rebooting the tape library, some or
    >> >> >> > all
    >> >> >> > of
    >> >> >> > my
    >> >> >> > x64
    >> >> >> > servers BSOD with a 0x0A stop error and your typical
    >> >> >> > IRQL_NOT_LESS...
    >> >> >> > message. x86 servers have yet to be affected. Debugging the
    >> >> >> > resulting
    >> >> >> > memory
    >> >> >> > dump shows that storport.sys is the culprit. Additionally, soon
    >> >> >> > before
    >> >> >> > the
    >> >> >> > server BSODs, the system event log has entries from
    >> >> >> > PlugPlayManager
    >> >> >> > saying
    >> >> >> > that the tape drives and robot disappeared without being prepared
    >> >> >> > for
    >> >> >> > removal
    >> >> >> > (Event ID 12). Obviously, preparing the hardware for removal on
    >> >> >> > all
    >> >> >> > my
    >> >> >> > servers is out of the question, besides, the hardware never shows
    >> >> >> > up
    >> >> >> > in
    >> >> >> > the
    >> >> >> > list of items to be safely removed.
    >> >> >> >
    >> >> >> > I'm very aware that SP2 is out for 2k3, and I intend to install
    >> >> >> > that
    >> >> >> > someday
    >> >> >> > (once I recover from all the late-night work I've had to put in
    >> >> >> > dealing
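
For anyone who has to repeat the Driver Verifier setup quoted above on several hosts, the GUI steps appear to have a command-line counterpart. Below is a minimal Python sketch that only builds that command line for the five drivers Kenny listed. It assumes verifier.exe on your build accepts the /standard and /driver switches (check with "verifier /?" first). The function name verifier_command is made up for illustration. A reboot is still needed afterwards, as in step 5.

```python
# Drivers named in the quoted Driver Verifier instructions.
DRIVERS = ["mpdev.sys", "mpio.sys", "mpspfltr.sys", "ql2300.sys", "storport.sys"]


def verifier_command(drivers):
    """Build the argument list that enables standard verifier settings.

    Assumption: verifier.exe supports "/standard /driver <names...>".
    This function only builds the command; it does not run it.
    """
    if not drivers:
        raise ValueError("at least one driver name is required")
    return ["verifier", "/standard", "/driver", *drivers]


if __name__ == "__main__":
    # Print the command so it can be reviewed before being run by hand.
    print(" ".join(verifier_command(DRIVERS)))
```

The script deliberately prints the command instead of executing it, since Driver Verifier will bugcheck the host when it catches a violation and that is better triggered on purpose.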


