x64 2K3 R2 BSODs when FC Tape Library is Rebooted - Storage
-
x64 2K3 R2 BSODs when FC Tape Library is Rebooted
I have a number of 2003 R2 x64 and x86 SP1 servers attached to a
fabric-switched IBM Fibre-Channel SAN. The servers are all IBM xSeries
servers attached using QLogic QLA2340 HBAs. We are using MPIO and M$'s
Storport driver (the latest version, of course) for multipathing on all
servers. Furthermore, we're using IBM's StorageManager Agents (again, latest
version) on all hosts. Also part of that SAN is a Quantum PX502 robotic
tape library, which is also Fibre-Channel and attached directly to the SAN
(i.e. not physically attached to a server). We are not using any kind of SAN
partitioning, so all hosts attached to the SAN see the tape drives and robot.
Here's what happens. After rebooting the tape library, some or all of my x64
servers BSOD with a 0x0A stop error and your typical IRQL_NOT_LESS...
message. x86 servers have yet to be affected. Debugging the resulting memory
dump shows that storport.sys is the culprit. Additionally, shortly before the
server BSODs, the system event log has entries from PlugPlayManager saying
that the tape drives and robot disappeared without being prepared for removal
(Event ID 12). Obviously, preparing the hardware for removal on all my
servers is out of the question; besides, the hardware never shows up in the
list of items to be safely removed.
I'm very aware that SP2 is out for 2k3, and I intend to install that someday
(once I recover from all the late-night work I've had to put in dealing with
this problem); however, I'm not confident that will solve the problem since I
will still have the same version of the storport driver.
So, short of calling M$ and paying for a support incident, any other bright
ideas? I'd appreciate being spared the basic "update firmware" and "update
driver" suggestions, as those are obvious and already done.
-
Re: x64 2K3 R2 BSODs when FC Tape Library is Rebooted
Well, a BSOD is by definition caused by either a bug in a driver or a bug in
HW/Firmware. So, your request to not suggest updating driver/firmware may
not get you very far.
That said, I think you are running into a known bug that was fixed &
released in Feb (KB Article 932755). You can download the fix directly via
http://support.microsoft.com - make sure to grab the correct package.
There are SP1 & SP2 versions of the fix - so you could get relief w/out SP2
if you absolutely needed to. I would recommend going to SP2 first b/c there
are a number of updates & perf improvements that are just general goodness.
Then adding the fix on top should get you where you need to be.
If the problem persists after that, then I would recommend giving support a
call. If the issue is a bug, we refund the cost of the incident (or
re-credit your account if you have a Premier support contract).
Pat
-
Re: x64 2K3 R2 BSODs when FC Tape Library is Rebooted
Right, the BSOD IS caused by a driver--the storport.sys driver in this case.
The reason I was trying to avoid responses that suggest that I update
drivers and firmware was because I've already done all that as one of the
first troubleshooting steps. Also, I am already on the latest storport
version (5.2.3790.2880 for SP1) mentioned in the KB article. I mentioned
all this in my original post.
-
Re: x64 2K3 R2 BSODs when FC Tape Library is Rebooted
Just because storport is in the bugcheck doesn't mean it's the cause. It
may not have handled a buggy miniport correctly, but it's very possible the
real problem is in a different driver.
Can you paste the !analyze -v output, with the Microsoft symbol server set in
your .sympath?
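For anyone following along, here is a minimal Python sketch of one way to produce that output non-interactively. It only builds the command line for cdb.exe (the console debugger that ships with Debugging Tools for Windows) using its -z, -y, and -c options. The local symbol cache directory, the dump path, and the function name build_analyze_command are assumptions made for illustration.

```python
from typing import List

# Microsoft public symbol server. C:\symbols is an assumed local cache directory.
SYMBOL_PATH = r"srv*C:\symbols*https://msdl.microsoft.com/download/symbols"


def build_analyze_command(dump_path: str, symbol_path: str = SYMBOL_PATH) -> List[str]:
    """Return an argv list that opens a crash dump in cdb and runs !analyze -v."""
    return [
        "cdb.exe",
        "-z", dump_path,                   # open a crash dump instead of a live target
        "-y", symbol_path,                 # symbol search path (what .sympath would set)
        "-c", ".reload; !analyze -v; q",   # reload symbols, analyze, then quit
    ]


if __name__ == "__main__":
    # The dump location below is an assumption; use wherever your server wrote it.
    print(" ".join(build_analyze_command(r"C:\WINDOWS\MEMORY.DMP")))
```

The list can be handed to subprocess.run on a machine with the debugging tools on its PATH. Inside an interactive WinDbg session the same thing is done with .sympath, .reload, and !analyze -v.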
-
Re: x64 2K3 R2 BSODs when FC Tape Library is Rebooted
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: 000000000005defd, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffff800013e0579, address which referenced memory
Debugging Details:
------------------
*** Error in in reading nt!_ETHREAD @ 0000000000000000
*** Error in in reading nt!_ETHREAD @ 0000000000000000
*** Error in in reading nt!_ETHREAD @ 0000000000000000
READ_ADDRESS: 000000000005defd
CURRENT_IRQL: 2
FAULTING_IP:
nt!MiFindContiguousMemoryInPool+b9
fffff800`013e0579 488b5310 mov rdx,[rbx+0x10]
DEFAULT_BUCKET_ID: DRIVER_FAULT
BUGCHECK_STR: 0xA
LAST_CONTROL_TRANSFER: from fffff8000104e5b4 to fffff8000104e890
LOCK_ADDRESS: fffff800011deb00 -- (!locks fffff800011deb00)
Resource @ nt!IopDeviceTreeLock (0xfffff800011deb00) Shared 1 owning threads
Threads: fffffade708fabf0-01<*>
1 total locks, 1 locks currently held
FAULTING_THREAD: fffffade708fabf0
PNP_TRIAGE:
Lock address : 0xfffff800011deb00
Thread Count : 1
Thread address: 0xfffffade708fabf0
Thread wait : 0xee6922c
TRAP_FRAME: fffffade5bca8d60 -- (.trap fffffade5bca8d60)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed.
rax=0000000000000000 rbx=fffffade6e122000 rcx=0000000000112242
rdx=00000000001d43ef rsi=0000000000000001 rdi=fffffade5f55d660
rip=fffff800013e0579 rsp=fffffade5bca8ef0 rbp=0000000000100000
r8=00000000000ffffe r9=0000000000112242 r10=fffffade6e442000
r11=00000000000fffff r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei ng nz ac po cy
nt!MiFindContiguousMemoryInPool+0xb9:
fffff800`013e0579 488b5310 mov rdx,[rbx+0x10] ds:fffffade`6e122010=0000000000000002
Resetting default scope
STACK_TEXT:
fffffade`5bca8bd8 fffff800`0104e5b4 : 00000000`0000000a 00000000`0005defd 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx
fffffade`5bca8be0 fffff800`0104d587 : 00000006`00000000 fffffa80`033c2090 00000000`00000000 00000000`00000001 : nt!KiBugCheckDispatch+0x74
fffffade`5bca8d60 fffff800`013e0579 : 00000000`00000002 00000000`6d436d4d 00000000`00000002 00000000`00001000 : nt!KiPageFault+0x207
fffffade`5bca8ef0 fffff800`013797a2 : 00000000`00000000 00000000`000fffff 00000000`00100000 00000000`4d546100 : nt!MiFindContiguousMemoryInPool+0xb9
fffffade`5bca8f90 fffff800`010edfee : 00000000`00000004 fffffade`6e27b000 fffffade`6e122000 00000000`00000002 : nt!MiFindContiguousMemory+0x52
fffffade`5bca8ff0 fffff800`010ee07b : 00000000`00000001 00000000`00000080 fffffade`6d2343f0 00000000`ffffffff : nt!MiAllocateContiguousMemory+0x12e
fffffade`5bca9070 fffffade`5b0764e2 : 00000000`00000000 00000000`00000000 fffffa80`021b2a30 fffffade`6d2343f0 : nt!MmAllocateContiguousMemorySpecifyCache+0x5b
fffffade`5bca90b0 fffffade`5b07612d : fffffade`6d2343f0 fffffade`708391b0 fffffade`708391b0 fffffade`5bca91f0 : storport!RaidUnitAllocateResources+0x370
fffffade`5bca9120 fffffade`5b06c71f : 00000000`00010200 00000000`00010200 fffffade`5bca9260 00000000`00000000 : storport!RaidCreateUnit+0x13d
fffffade`5bca9180 fffffade`5b06bfbf : 00000000`00000000 00000000`00000000 00000000`00000002 00000000`00010200 : storport!RaidBusEnumeratorGetUnit+0x6f
fffffade`5bca91f0 fffffade`5b06946f : fffffade`00fe0200 00000000`00000000 00000000`00000001 00000000`00000002 : storport!RaidBusEnumeratorVisitUnit+0x4f
fffffade`5bca92f0 fffffade`5b06957d : 00000000`00000000 fffffade`572c0d6d fffffade`5b87b180 fffffade`708391b0 : storport!RaidAdapterEnumerateBus+0xbf
fffffade`5bca9470 fffffade`5b088c8f : fffffade`6d9a62e0 fffffade`6cd14ae0 00000000`a0000003 fffffade`708391b0 : storport!RaidAdapterRescanBus+0x8d
fffffade`5bca9530 fffffade`5b08890b : 00000000`00000000 fffffade`6cd14ae0 00000000`00000000 fffffade`708391b0 : storport!RaidAdapterQueryDeviceRelationsIrp+0xcf
fffffade`5bca95d0 fffffade`5b089eef : fffffade`70893650 fffffade`5bca9820 fffffade`70839060 fffffade`6cd14ae0 : storport!RaidAdapterPnpIrp+0x14b
fffffade`5bca96a0 fffffade`5b85d949 : fffffade`5bca97e0 fffffade`70893650 fffffade`6cd14ae0 fffffade`6fcf6bb0 : storport!RaDriverPnpIrp+0xcf
fffffade`5bca9730 fffff800`0124d573 : fffffade`6cd14ae0 fffffade`5bca9820 fffffade`70893500 fffffade`6fcf6bb0 : mpspfltr!MPSPQueryDeviceRelations+0xa9
fffffade`5bca9790 fffff800`010dc4c1 : 00000000`00000000 00000000`00000002 00000000`00000000 fffffade`6fcf6a70 : nt!IopSynchronousCall+0x14a
fffffade`5bca9800 fffff800`013531ea : fffffade`709019b0 fffff800`01014f00 fffffade`5bca98e0 00000000`00000000 : nt!IopQueryDeviceRelations+0x71
fffffade`5bca9890 fffff800`01354e95 : 00000000`000000c3 00000000`00000000 00000000`00000002 fffffade`6f42b2f0 : nt!PipProcessDevNodeTree+0x342
fffffade`5bca9c20 fffff800`010d8598 : fffff800`00000003 00000000`00000000 fffffade`708fabf0 fffff800`011cf900 : nt!PiProcessReenumeration+0x85
fffffade`5bca9c70 fffff800`0105507c : 00000000`00000000 fffff800`011decc0 fffff800`010d8230 fffffade`708fabf0 : nt!PipDeviceActionWorker+0x368
fffffade`5bca9d00 fffff800`01299cae : fffffade`708fabf0 00000000`00000080 fffffade`708fabf0 fffffade`5b883680 : nt!ExpWorkerThread+0x13b
fffffade`5bca9d70 fffff800`0102bbe6 : fffffade`5b87b180 fffffade`708fabf0 fffffade`5b883680 00000000`00000000 : nt!PspSystemThreadStartup+0x3e
fffffade`5bca9dd0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16
STACK_COMMAND: .thread fffffade708fabf0 ; kb
FOLLOWUP_IP:
storport!RaidUnitAllocateResources+370
fffffade`5b0764e2 4885c0 test rax,rax
SYMBOL_STACK_INDEX: 7
FOLLOWUP_NAME: MachineOwner
SYMBOL_NAME: storport!RaidUnitAllocateResources+370
MODULE_NAME: storport
IMAGE_NAME: storport.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 448e954c
FAILURE_BUCKET_ID: X64_0xA_storport!RaidUnitAllocateResources+370
BUCKET_ID: X64_0xA_storport!RaidUnitAllocateResources+370
Followup: MachineOwner
---------
-
Re: x64 2K3 R2 BSODs when FC Tape Library is Rebooted
Unfortunately, all the analysis shows is that while at DISPATCH_LEVEL (IRQL
2) the system attempted to read a bogus pointer; most likely it's simply
uninitialized (based on the garbage address 5defd) ...
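To make that reading of the dump concrete, here is a small Python sketch that pulls the four bugcheck 0xA arguments out of pasted !analyze -v text and labels them the way the debugger's own legend does. The function name parse_bugcheck_a and the lookup tables are illustrative assumptions, not part of any debugger API; the sample text is copied from the dump posted in this thread.

```python
import re
from typing import Dict

# IRQL values commonly seen in 0xA bugchecks.
IRQL_NAMES = {0: "PASSIVE_LEVEL", 1: "APC_LEVEL", 2: "DISPATCH_LEVEL"}

# Per the legend printed with the dump: 0 = read operation, 1 = write operation.
OPERATIONS = {0: "read", 1: "write"}

# Matches lines such as "Arg1: 000000000005defd, memory referenced".
ARG_RE = re.compile(r"^Arg([1-4]):\s*([0-9a-fA-F`]+)", re.MULTILINE)

SAMPLE = """\
Arg1: 000000000005defd, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffff800013e0579, address which referenced memory
"""


def parse_bugcheck_a(analyze_text: str) -> Dict[str, object]:
    """Extract and label the four IRQL_NOT_LESS_OR_EQUAL arguments."""
    args = {int(n): int(v.replace("`", ""), 16) for n, v in ARG_RE.findall(analyze_text)}
    if set(args) != {1, 2, 3, 4}:
        raise ValueError("expected Arg1 through Arg4 in the analyze output")
    return {
        "address": args[1],                                   # memory referenced
        "irql": IRQL_NAMES.get(args[2], str(args[2])),        # IRQL at the time
        "operation": OPERATIONS.get(args[3], hex(args[3])),   # read or write
        "faulting_ip": args[4],                               # instruction address
    }


if __name__ == "__main__":
    for key, value in parse_bugcheck_a(SAMPLE).items():
        print(key, hex(value) if isinstance(value, int) else value)
```

The low address in Arg1 combined with DISPATCH_LEVEL is what points at a bad pointer rather than a paging problem; the sketch only decodes the fields and does not diagnose anything by itself.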
If you can reproduce this easily, I suggest you enable driver verifier on
the mpio, storport, and miniport drivers.
Also, if you have SANSurfer installed, please uninstall it. There is a
kernel-mode service which queries the devices on the SAN and can definitely
cause issues.
To enable verifier, do this:
1. Start -> Run -> verifier
2. Choose "Create standard settings"
3. Choose "Select driver names"
4. Check the following: mpdev.sys mpio.sys mpspfltr.sys ql2300.sys storport.sys
5. Reboot
Then reboot your tape library. Many times, when verifier is running it will
catch issues earlier than the bugcheck will and should be much more
accurate. Driver Verifier will still BSOD your host, but the dump will
contain better info. You also most likely don't need a complete memory
dump; a kernel dump should be sufficient.
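If you have several x64 hosts to set up, the GUI steps above also have a command-line form: verifier.exe accepts /standard with /driver followed by the driver names, and /reset to clear the settings afterwards. Below is a minimal Python sketch that only builds those command lines so they can be reviewed before being run. The helper names are my own, and ql2300.sys is assumed to be the miniport on every host.

```python
from typing import Iterable, List

# Driver list from step 4 above; adjust if a host uses a different miniport.
DRIVERS = ["mpdev.sys", "mpio.sys", "mpspfltr.sys", "ql2300.sys", "storport.sys"]


def build_verifier_command(drivers: Iterable[str] = DRIVERS) -> List[str]:
    """Return the argv for enabling standard verifier settings on the given drivers."""
    names = list(drivers)
    if not names:
        raise ValueError("at least one driver name is required")
    return ["verifier.exe", "/standard", "/driver", *names]


def build_reset_command() -> List[str]:
    """Return the argv for clearing all verifier settings once testing is done."""
    return ["verifier.exe", "/reset"]


if __name__ == "__main__":
    print(" ".join(build_verifier_command()))
    print(" ".join(build_reset_command()))
```

Either command still needs a reboot of the host before it takes effect, same as the GUI route.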
Since it seems like you are an IBM shop, you should be able to report this
issue to IBM and have them report it to MS.
Good luck,
~kenny
"Eric" wrote in message
news:B8DCFE65-7D6F-45BD-9E7F-94001D92E4A7@microsoft.com...
> ************************************************** *****************************
> *
> *
> * Bugcheck Analysis
> *
> *
> *
> ************************************************** *****************************
>
> IRQL_NOT_LESS_OR_EQUAL (a)
> An attempt was made to access a pageable (or completely invalid) address
> at an
> interrupt request level (IRQL) that is too high. This is usually
> caused by drivers using improper addresses.
> If a kernel debugger is available get the stack backtrace.
> Arguments:
> Arg1: 000000000005defd, memory referenced
> Arg2: 0000000000000002, IRQL
> Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
> Arg4: fffff800013e0579, address which referenced memory
>
> Debugging Details:
> ------------------
>
> *** Error in in reading nt!_ETHREAD @ 0000000000000000
> *** Error in in reading nt!_ETHREAD @ 0000000000000000
> *** Error in in reading nt!_ETHREAD @ 0000000000000000
>
> READ_ADDRESS: 000000000005defd
>
> CURRENT_IRQL: 2
>
> FAULTING_IP:
> nt!MiFindContiguousMemoryInPool+b9
> fffff800`013e0579 488b5310 mov rdx,[rbx+0x10]
>
> DEFAULT_BUCKET_ID: DRIVER_FAULT
>
> BUGCHECK_STR: 0xA
>
> LAST_CONTROL_TRANSFER: from fffff8000104e5b4 to fffff8000104e890
>
> LOCK_ADDRESS: fffff800011deb00 -- (!locks fffff800011deb00)
>
> Resource @ nt!IopDeviceTreeLock (0xfffff800011deb00) Shared 1 owning
> threads
> Threads: fffffade708fabf0-01<*>
> 1 total locks, 1 locks currently held
>
> FAULTING_THREAD: fffffade708fabf0
>
> PNP_TRIAGE:
> Lock address : 0xfffff800011deb00
> Thread Count : 1
> Thread address: 0xfffffade708fabf0
> Thread wait : 0xee6922c
>
> TRAP_FRAME: fffffade5bca8d60 -- (.trap fffffade5bca8d60)
> NOTE: The trap frame does not contain all registers.
> Some register values may be zeroed.
> rax=0000000000000000 rbx=fffffade6e122000 rcx=0000000000112242
> rdx=00000000001d43ef rsi=0000000000000001 rdi=fffffade5f55d660
> rip=fffff800013e0579 rsp=fffffade5bca8ef0 rbp=0000000000100000
> r8=00000000000ffffe r9=0000000000112242 r10=fffffade6e442000
> r11=00000000000fffff r12=0000000000000000 r13=0000000000000000
> r14=0000000000000000 r15=0000000000000000
> iopl=0 nv up ei ng nz ac po cy
> nt!MiFindContiguousMemoryInPool+0xb9:
> fffff800`013e0579 488b5310 mov rdx,[rbx+0x10]
> ds:fffffade`6e122010=0000000000000002
> Resetting default scope
>
> STACK_TEXT:
> fffffade`5bca8bd8 fffff800`0104e5b4 : 00000000`0000000a 00000000`0005defd
> 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx
> fffffade`5bca8be0 fffff800`0104d587 : 00000006`00000000 fffffa80`033c2090
> 00000000`00000000 00000000`00000001 : nt!KiBugCheckDispatch+0x74
> fffffade`5bca8d60 fffff800`013e0579 : 00000000`00000002 00000000`6d436d4d
> 00000000`00000002 00000000`00001000 : nt!KiPageFault+0x207
> fffffade`5bca8ef0 fffff800`013797a2 : 00000000`00000000 00000000`000fffff
> 00000000`00100000 00000000`4d546100 : nt!MiFindContiguousMemoryInPool+0xb9
> fffffade`5bca8f90 fffff800`010edfee : 00000000`00000004 fffffade`6e27b000
> fffffade`6e122000 00000000`00000002 : nt!MiFindContiguousMemory+0x52
> fffffade`5bca8ff0 fffff800`010ee07b : 00000000`00000001 00000000`00000080
> fffffade`6d2343f0 00000000`ffffffff : nt!MiAllocateContiguousMemory+0x12e
> fffffade`5bca9070 fffffade`5b0764e2 : 00000000`00000000 00000000`00000000
> fffffa80`021b2a30 fffffade`6d2343f0 :
> nt!MmAllocateContiguousMemorySpecifyCache+0x5b
> fffffade`5bca90b0 fffffade`5b07612d : fffffade`6d2343f0 fffffade`708391b0
> fffffade`708391b0 fffffade`5bca91f0 :
> storport!RaidUnitAllocateResources+0x370
> fffffade`5bca9120 fffffade`5b06c71f : 00000000`00010200 00000000`00010200
> fffffade`5bca9260 00000000`00000000 : storport!RaidCreateUnit+0x13d
> fffffade`5bca9180 fffffade`5b06bfbf : 00000000`00000000 00000000`00000000
> 00000000`00000002 00000000`00010200 :
> storport!RaidBusEnumeratorGetUnit+0x6f
> fffffade`5bca91f0 fffffade`5b06946f : fffffade`00fe0200 00000000`00000000
> 00000000`00000001 00000000`00000002 :
> storport!RaidBusEnumeratorVisitUnit+0x4f
> fffffade`5bca92f0 fffffade`5b06957d : 00000000`00000000 fffffade`572c0d6d
> fffffade`5b87b180 fffffade`708391b0 :
> storport!RaidAdapterEnumerateBus+0xbf
> fffffade`5bca9470 fffffade`5b088c8f : fffffade`6d9a62e0 fffffade`6cd14ae0
> 00000000`a0000003 fffffade`708391b0 : storport!RaidAdapterRescanBus+0x8d
> fffffade`5bca9530 fffffade`5b08890b : 00000000`00000000 fffffade`6cd14ae0
> 00000000`00000000 fffffade`708391b0 :
> storport!RaidAdapterQueryDeviceRelationsIrp+0xcf
> fffffade`5bca95d0 fffffade`5b089eef : fffffade`70893650 fffffade`5bca9820
> fffffade`70839060 fffffade`6cd14ae0 : storport!RaidAdapterPnpIrp+0x14b
> fffffade`5bca96a0 fffffade`5b85d949 : fffffade`5bca97e0 fffffade`70893650
> fffffade`6cd14ae0 fffffade`6fcf6bb0 : storport!RaDriverPnpIrp+0xcf
> fffffade`5bca9730 fffff800`0124d573 : fffffade`6cd14ae0 fffffade`5bca9820
> fffffade`70893500 fffffade`6fcf6bb0 :
> mpspfltr!MPSPQueryDeviceRelations+0xa9
> fffffade`5bca9790 fffff800`010dc4c1 : 00000000`00000000 00000000`00000002
> 00000000`00000000 fffffade`6fcf6a70 : nt!IopSynchronousCall+0x14a
> fffffade`5bca9800 fffff800`013531ea : fffffade`709019b0 fffff800`01014f00
> fffffade`5bca98e0 00000000`00000000 : nt!IopQueryDeviceRelations+0x71
> fffffade`5bca9890 fffff800`01354e95 : 00000000`000000c3 00000000`00000000
> 00000000`00000002 fffffade`6f42b2f0 : nt!PipProcessDevNodeTree+0x342
> fffffade`5bca9c20 fffff800`010d8598 : fffff800`00000003 00000000`00000000
> fffffade`708fabf0 fffff800`011cf900 : nt!PiProcessReenumeration+0x85
> fffffade`5bca9c70 fffff800`0105507c : 00000000`00000000 fffff800`011decc0
> fffff800`010d8230 fffffade`708fabf0 : nt!PipDeviceActionWorker+0x368
> fffffade`5bca9d00 fffff800`01299cae : fffffade`708fabf0 00000000`00000080
> fffffade`708fabf0 fffffade`5b883680 : nt!ExpWorkerThread+0x13b
> fffffade`5bca9d70 fffff800`0102bbe6 : fffffade`5b87b180 fffffade`708fabf0
> fffffade`5b883680 00000000`00000000 : nt!PspSystemThreadStartup+0x3e
> fffffade`5bca9dd0 00000000`00000000 : 00000000`00000000 00000000`00000000
> 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16
>
>
> STACK_COMMAND: .thread fffffade708fabf0 ; kb
>
> FOLLOWUP_IP:
> storport!RaidUnitAllocateResources+370
> fffffade`5b0764e2 4885c0 test rax,rax
>
> SYMBOL_STACK_INDEX: 7
>
> FOLLOWUP_NAME: MachineOwner
>
> SYMBOL_NAME: storport!RaidUnitAllocateResources+370
>
> MODULE_NAME: storport
>
> IMAGE_NAME: storport.sys
>
> DEBUG_FLR_IMAGE_TIMESTAMP: 448e954c
>
> FAILURE_BUCKET_ID: X64_0xA_storport!RaidUnitAllocateResources+370
>
> BUCKET_ID: X64_0xA_storport!RaidUnitAllocateResources+370
>
> Followup: MachineOwner
> ---------
-
Re: x64 2K3 R2 BSODs when FC Tape Library is Rebooted
I uninstalled SANSurfer, turned on Driver Verifier for the drivers you
listed, and rebooted. Then I rebooted the tape library, which BSOD'd my
hosts as expected. This time, however, I got a 0xC5 BSOD on one host and a
0xD1 BSOD on the other. Below is the bugcheck analysis of the kernel dump
from the host that got the 0xC5 with Driver Verifier enabled. I'm thoroughly
lost at this point. Any suggestions now?
DRIVER_CORRUPTED_EXPOOL (c5)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is
caused by drivers that have corrupted the system pool. Run the driver
verifier against any new (or suspect) drivers, and if that doesn't turn up
the culprit, then use gflags to enable special pool.
Arguments:
Arg1: 00000000000b5430, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000001, value 0 = read operation, 1 = write operation
Arg4: fffff800011abd85, address which referenced memory
Debugging Details:
------------------
*** Error in in reading nt!_ETHREAD @ 0000000000000000
*** Error in in reading nt!_ETHREAD @ 0000000000000000
*** Error in in reading nt!_ETHREAD @ 0000000000000000
OVERLAPPED_MODULE: Address regions for 'Cdfs' and 'imapi.sys' overlap
BUGCHECK_STR: 0xC5_2
CURRENT_IRQL: 2
FAULTING_IP:
nt!ExAllocatePoolWithTag+c8d
fffff800`011abd85 48897008 mov [rax+0x8],rsi
DEFAULT_BUCKET_ID: DRIVER_FAULT
LAST_CONTROL_TRANSFER: from fffff8000104e5b4 to fffff8000104e890
LOCK_ADDRESS: fffff800011deb00 -- (!locks fffff800011deb00)
Resource @ nt!IopDeviceTreeLock (0xfffff800011deb00) Shared 1 owning
threads
Threads: fffffade708987a0-01<*>
1 total locks, 1 locks currently held
FAULTING_THREAD: fffffade708987a0
PNP_TRIAGE:
Lock address : 0xfffff800011deb00
Thread Count : 1
Thread address: 0xfffffade708987a0
Thread wait : 0x2fb7
TRAP_FRAME: fffffade5be93b70 -- (.trap fffffade5be93b70)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed.
rax=00000000000b5428 rbx=00000000000000a0 rcx=fffffade6e74bd60
rdx=0000000000000000 rsi=fffffade5be93ed0 rdi=0000000000000001
rip=fffff800011abd85 rsp=fffffade5be93d00 rbp=0000000000000000
r8=0000000000000000 r9=000000000000000b r10=00000000000000b0
r11=0000000000000001 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl nz ac pe nc
nt!ExAllocatePoolWithTag+0xc8d:
fffff800`011abd85 48897008 mov [rax+0x8],rsi
ds:0002:00000000`000b5430=????????????????
Resetting default scope
STACK_TEXT:
fffffade`5be939e8 fffff800`0104e5b4 : 00000000`0000000a 00000000`000b5430
00000000`00000002 00000000`00000001 : nt!KeBugCheckEx
fffffade`5be939f0 fffff800`0104d587 : 00000000`00000000 00000000`00000000
00000000`00000000 00000000`00000000 : nt!KiBugCheckDispatch+0x74
fffffade`5be93b70 fffff800`011abd85 : 00000000`00000000 00000000`00000000
00000000`00000000 00000000`00000000 : nt!KiPageFault+0x207
fffffade`5be93d00 fffff800`013f77a2 : 00000000`00000000 fffff800`008080be
fffffade`5b236480 fffffaae`3dad8ee0 : nt!ExAllocatePoolWithTag+0xc8d
fffffade`5be93dd0 fffff800`013ed1d1 : 00000000`0002625a fffffaae`3dad8ee0
fffffade`5b235e6c fffffade`7085f060 : nt!VfIrpReserveCallStackData+0x52
fffffade`5be93e00 fffffade`5b235e6c : fffffade`5be93e70 fffffaae`3dad8ee0
fffffadf`f77fe000 fffffade`7085f060 : nt!IovCallDriver+0x131
fffffade`5be93e70 fffffade`5b2372a1 : 00000000`00000000 fffffade`5be943b0
fffffade`5be94190 fffffade`5be94240 :
storport!RaidBusEnumeratorIssueSynchronousRequest+0x14c
fffffade`5be93ff0 fffffade`5b236ed3 : fffffade`5be943b8 fffffade`5b241222
fffffade`00000000 fffffade`5be94190 :
storport!RaidBusEnumeratorIssueReportLuns+0x131
fffffade`5be94070 fffffade`5b23779f : fffffade`7085f1b0 00000000`00000000
00000000`00000000 00000000`00000000 :
storport!RaidBusEnumeratorGetLunListFromTarget+0x143
fffffade`5be94150 fffffade`5b2326ac : 00000000`00000000 00000000`00000000
00000000`00000000 fffffade`708791b0 :
storport!RaidBusEnumeratorGetLunList+0x6f
fffffade`5be94210 fffffade`5b2327dd : 00000000`0000000a 00000000`00000000
00000000`00000006 fffffade`708791b0 : storport!RaidAdapterEnumerateBus+0x9c
fffffade`5be94390 fffffade`5b284c8f : fffff800`00000006 fffffaae`3df3ae10
fffffade`708ac480 fffffade`708791b0 : storport!RaidAdapterRescanBus+0x8d
fffffade`5be94450 fffffade`5b28490b : 00000282`00180000 fffffaae`3df3ae10
00000000`00000000 fffffade`708791b0 :
storport!RaidAdapterQueryDeviceRelationsIrp+0xcf
fffffade`5be944f0 fffffade`5b285f4f : fffffade`6e749810 fffffade`5be94680
fffffade`70879060 fffffaae`3df3ae10 : storport!RaidAdapterPnpIrp+0x14b
fffffade`5be945c0 fffff800`013ed255 : fffffaae`3df3ae10 fffffade`6e749810
fffffaae`3df3ae10 fffffade`70879060 : storport!RaDriverPnpIrp+0xcf
fffffade`5be94650 fffffade`5ba5d949 : fffffade`6eb723e0 fffffaae`3df3ae10
00000000`00000000 fffffade`70dd4df0 : nt!IovCallDriver+0x1b5
fffffade`5be946c0 fffff800`013ed255 : fffffade`6f749610 fffffade`5be94750
fffffaae`3df3ae10 fffff800`0124d573 : mpspfltr!MPSPQueryDeviceRelations+0xa9
fffffade`5be94720 fffff800`0124d573 : 00000000`00000004 fffffaae`3df3ae10
00000000`00000000 fffffaae`3df3ae10 : nt!IovCallDriver+0x1b5
fffffade`5be94790 fffff800`010dc4c1 : 00000000`00000000 00000000`00000002
00000000`00000000 fffffade`708abc70 : nt!IopSynchronousCall+0x14a
fffffade`5be94800 fffff800`013531ea : fffffa80`007793f0 fffffade`708aca00
fffffade`5be948e0 00000000`00000000 : nt!IopQueryDeviceRelations+0x71
fffffade`5be94890 fffff800`01354e95 : 00000000`00000576 00000000`00000000
00000000`00000002 fffffade`6f03b850 : nt!PipProcessDevNodeTree+0x342
fffffade`5be94c20 fffff800`010d8598 : fffff800`00000003 00000000`00000000
fffffade`708987a0 fffff800`011cf900 : nt!PiProcessReenumeration+0x85
fffffade`5be94c70 fffff800`0105507c : 00000000`00000000 fffff800`011decc0
fffff800`010d8230 fffffade`708987a0 : nt!PipDeviceActionWorker+0x368
fffffade`5be94d00 fffff800`01299cae : fffffade`708987a0 00000000`00000080
fffffade`708987a0 fffffade`5baa3680 : nt!ExpWorkerThread+0x13b
fffffade`5be94d70 fffff800`0102bbe6 : fffffade`5ba9b180 fffffade`708987a0
fffffade`5baa3680 fffff800`011b6dc0 : nt!PspSystemThreadStartup+0x3e
fffffade`5be94dd0 00000000`00000000 : 00000000`00000000 00000000`00000000
00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16
STACK_COMMAND: .thread fffffade708987a0 ; kb
FOLLOWUP_IP:
storport!RaidBusEnumeratorIssueSynchronousRequest+14c
fffffade`5b235e6c 3d03010000 cmp eax,0x103
SYMBOL_STACK_INDEX: 6
FOLLOWUP_NAME: MachineOwner
SYMBOL_NAME: storport!RaidBusEnumeratorIssueSynchronousRequest+14c
MODULE_NAME: storport
IMAGE_NAME: storport.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 45d0a33d
FAILURE_BUCKET_ID: X64_0xC5_2_storport!RaidBusEnumeratorIssueSynchronousRequest+14c
BUCKET_ID: X64_0xC5_2_storport!RaidBusEnumeratorIssueSynchronousRequest+14c
Followup: MachineOwner
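A quick way to sanity-check the trap frame above is to recompute the address the faulting instruction wrote to and compare it with Arg1. The sketch below is in Python and uses only values copied from this analysis. The helper names (parse_args, parse_registers, effective_address) are made up for illustration and are not part of any debugger tooling.

```python
import re

# Excerpts copied from the 0xC5 analysis above.
ANALYSIS = """\
Arg1: 00000000000b5430, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000001, value 0 = read operation, 1 = write operation
rax=00000000000b5428 rbx=00000000000000a0 rcx=fffffade6e74bd60
fffff800`011abd85 48897008 mov [rax+0x8],rsi
"""


def parse_args(text):
    """Return {1: int, 2: int, ...} for the 'ArgN: <hex>' lines."""
    return {int(n): int(v, 16)
            for n, v in re.findall(r"^Arg(\d): ([0-9a-fA-F]+)", text, re.M)}


def parse_registers(text):
    """Return {'rax': int, ...} for every 'name=<hex>' pair."""
    return {name: int(value, 16)
            for name, value in re.findall(r"\b(r[a-z0-9]+)=([0-9a-fA-F]+)", text)}


def effective_address(text, regs):
    """Compute base + displacement for the first '[reg+0xNN]' operand."""
    m = re.search(r"\[(r[a-z0-9]+)\+0x([0-9a-fA-F]+)\]", text)
    return regs[m.group(1)] + int(m.group(2), 16)


args = parse_args(ANALYSIS)
regs = parse_registers(ANALYSIS)
target = effective_address(ANALYSIS, regs)

# Names for the low IRQL values on Windows.
IRQL_NAMES = {0: "PASSIVE_LEVEL", 1: "APC_LEVEL", 2: "DISPATCH_LEVEL"}

print(f"faulting access target: {target:#x}")
print(f"matches Arg1: {target == args[1]}")
print(f"IRQL: {IRQL_NAMES.get(args[2], args[2])}")
print(f"operation: {'write' if args[3] == 1 else 'read'}")
```

The target works out to rax + 0x8 = 0xb5430, which matches Arg1. rax itself (0xb5428) is a small value rather than a kernel pool address, which would be consistent with the pool-corruption description in the bugcheck text, though it does not say which driver did the corrupting.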
"Kenny Speer" wrote:
> Unfortunately, all the analyze shows is that while at DISPATCH_LEVEL (irql
> 2) the system attempted to read a bogus pointer, most likely it's simply
> uninitialized (based on the garbage address 5defd) ...
>
> If you can reproduce this easily, I suggest you enable driver verifier on
> the mpio, storport, and miniport drivers.
>
> Also, if you have SANSurfer installed, please uninstall it. There is a
> kernel mode service which queries the devices on the SAN and can definitely
> cause issues.
>
> To enable verifier do this:
> 1. start->run->verifier
> 2. choose "Create standard settings"
> 3. choose "Select driver names"
> 4. check the following: mpdev.sys mpio.sys mpspfltr.sys ql2300.sys
> storport.sys
> 5. reboot
>
> Then reboot your tape library. Many times, when verifier is running it will
> catch issues earlier than the bugcheck will and should be much more
> accurate. Driver Verifier will still BSOD your host, but the dump will
> contain better info. You also most likely don't need a complete memory
> dump, kernel dump should be sufficient.
>
> Since it seems like you are an IBM shop, you should be able to report this
> issue to IBM and have them report it to MS.
>
> Good luck,
> ~kenny
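The GUI steps quoted above can also be expressed on the command line. The Python sketch below only builds and prints the command. It assumes the verifier.exe on these hosts accepts the /standard and /driver switches as documented for Driver Verifier, and the function names are hypothetical. A reboot is still needed afterwards, as in step 5.

```python
import subprocess
import sys

# Driver names taken from step 4 of the quoted instructions.
DRIVERS = ["mpdev.sys", "mpio.sys", "mpspfltr.sys", "ql2300.sys", "storport.sys"]


def verifier_command(drivers):
    """Build the argv for 'verifier /standard /driver <names...>'."""
    if not drivers:
        raise ValueError("at least one driver name is required")
    return ["verifier", "/standard", "/driver", *drivers]


def main(apply=False):
    """Print the command; run it only when apply=True on a Windows host."""
    cmd = verifier_command(DRIVERS)
    print(" ".join(cmd))
    if apply and sys.platform == "win32":
        # Needs an elevated prompt; the settings take effect after a reboot.
        subprocess.run(cmd, check=True)
    return cmd


cmd = main(apply=False)
```

Once the dump has been captured, running verifier /reset and rebooting should clear the settings again.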
-
Re: x64 2K3 R2 BSODs when FC Tape Library is Rebooted
I was hoping that running verifier would catch any buggy miniport
behavior, but instead you've only confirmed that this appears to be a
storport bug. If you don't want to report it to MS PSS, then report it to
IBM and have them report it. MS can take the .dmp and most likely already
has a fix, or is at least aware of the issue.
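One way to see why both dumps point the same direction is to compare the driver frames from the 0xA and 0xC5 stacks posted in this thread. The sketch below is in Python, uses frames copied from those two STACK_TEXT listings (nt! frames left out), and the helper names are hypothetical. It only shows which functions the two crashes share; it does not prove where the corruption originated.

```python
import re

# Driver frames copied from the two STACK_TEXT listings in this thread,
# innermost frame first. The nt! frames are omitted for brevity.
STACK_0XA = [
    "storport!RaidUnitAllocateResources+0x370",
    "storport!RaidCreateUnit+0x13d",
    "storport!RaidBusEnumeratorGetUnit+0x6f",
    "storport!RaidBusEnumeratorVisitUnit+0x4f",
    "storport!RaidAdapterEnumerateBus+0xbf",
    "storport!RaidAdapterRescanBus+0x8d",
    "storport!RaidAdapterQueryDeviceRelationsIrp+0xcf",
    "storport!RaidAdapterPnpIrp+0x14b",
    "storport!RaDriverPnpIrp+0xcf",
    "mpspfltr!MPSPQueryDeviceRelations+0xa9",
]
STACK_0XC5 = [
    "storport!RaidBusEnumeratorIssueSynchronousRequest+0x14c",
    "storport!RaidBusEnumeratorIssueReportLuns+0x131",
    "storport!RaidBusEnumeratorGetLunListFromTarget+0x143",
    "storport!RaidBusEnumeratorGetLunList+0x6f",
    "storport!RaidAdapterEnumerateBus+0x9c",
    "storport!RaidAdapterRescanBus+0x8d",
    "storport!RaidAdapterQueryDeviceRelationsIrp+0xcf",
    "storport!RaidAdapterPnpIrp+0x14b",
    "storport!RaDriverPnpIrp+0xcf",
    "mpspfltr!MPSPQueryDeviceRelations+0xa9",
]


def function_name(frame):
    """Drop the '+0x...' offset, keeping 'module!Function'."""
    return re.sub(r"\+0x[0-9a-fA-F]+$", "", frame)


def common_functions(a, b):
    """Functions present in both stacks, in the order they appear in a."""
    in_b = {function_name(f) for f in b}
    return [function_name(f) for f in a if function_name(f) in in_b]


shared = common_functions(STACK_0XA, STACK_0XC5)
for name in shared:
    print(name)
```

Both crashes happen under the same PnP-driven bus rescan (MPSPQueryDeviceRelations down to RaidAdapterEnumerateBus) and differ only in which allocation trips over the bad memory.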
It doesn't appear there is much more you can do. Yell at your vendor and
make them help you.
Meanwhile, zone out any hosts that do not need access to the tape library
so they are spared from hitting this issue.
~kenny
>> > nt!MiFindContiguousMemoryInPool+0xb9:
>> > fffff800`013e0579 488b5310 mov rdx,[rbx+0x10]
>> > ds:fffffade`6e122010=0000000000000002
>> > Resetting default scope
>> >
>> > STACK_TEXT:
>> > fffffade`5bca8bd8 fffff800`0104e5b4 : 00000000`0000000a
>> > 00000000`0005defd
>> > 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx
>> > fffffade`5bca8be0 fffff800`0104d587 : 00000006`00000000
>> > fffffa80`033c2090
>> > 00000000`00000000 00000000`00000001 : nt!KiBugCheckDispatch+0x74
>> > fffffade`5bca8d60 fffff800`013e0579 : 00000000`00000002
>> > 00000000`6d436d4d
>> > 00000000`00000002 00000000`00001000 : nt!KiPageFault+0x207
>> > fffffade`5bca8ef0 fffff800`013797a2 : 00000000`00000000
>> > 00000000`000fffff
>> > 00000000`00100000 00000000`4d546100 :
>> > nt!MiFindContiguousMemoryInPool+0xb9
>> > fffffade`5bca8f90 fffff800`010edfee : 00000000`00000004
>> > fffffade`6e27b000
>> > fffffade`6e122000 00000000`00000002 : nt!MiFindContiguousMemory+0x52
>> > fffffade`5bca8ff0 fffff800`010ee07b : 00000000`00000001
>> > 00000000`00000080
>> > fffffade`6d2343f0 00000000`ffffffff :
>> > nt!MiAllocateContiguousMemory+0x12e
>> > fffffade`5bca9070 fffffade`5b0764e2 : 00000000`00000000
>> > 00000000`00000000
>> > fffffa80`021b2a30 fffffade`6d2343f0 :
>> > nt!MmAllocateContiguousMemorySpecifyCache+0x5b
>> > fffffade`5bca90b0 fffffade`5b07612d : fffffade`6d2343f0
>> > fffffade`708391b0
>> > fffffade`708391b0 fffffade`5bca91f0 :
>> > storport!RaidUnitAllocateResources+0x370
>> > fffffade`5bca9120 fffffade`5b06c71f : 00000000`00010200
>> > 00000000`00010200
>> > fffffade`5bca9260 00000000`00000000 : storport!RaidCreateUnit+0x13d
>> > fffffade`5bca9180 fffffade`5b06bfbf : 00000000`00000000
>> > 00000000`00000000
>> > 00000000`00000002 00000000`00010200 :
>> > storport!RaidBusEnumeratorGetUnit+0x6f
>> > fffffade`5bca91f0 fffffade`5b06946f : fffffade`00fe0200
>> > 00000000`00000000
>> > 00000000`00000001 00000000`00000002 :
>> > storport!RaidBusEnumeratorVisitUnit+0x4f
>> > fffffade`5bca92f0 fffffade`5b06957d : 00000000`00000000
>> > fffffade`572c0d6d
>> > fffffade`5b87b180 fffffade`708391b0 :
>> > storport!RaidAdapterEnumerateBus+0xbf
>> > fffffade`5bca9470 fffffade`5b088c8f : fffffade`6d9a62e0
>> > fffffade`6cd14ae0
>> > 00000000`a0000003 fffffade`708391b0 :
>> > storport!RaidAdapterRescanBus+0x8d
>> > fffffade`5bca9530 fffffade`5b08890b : 00000000`00000000
>> > fffffade`6cd14ae0
>> > 00000000`00000000 fffffade`708391b0 :
>> > storport!RaidAdapterQueryDeviceRelationsIrp+0xcf
>> > fffffade`5bca95d0 fffffade`5b089eef : fffffade`70893650
>> > fffffade`5bca9820
>> > fffffade`70839060 fffffade`6cd14ae0 : storport!RaidAdapterPnpIrp+0x14b
>> > fffffade`5bca96a0 fffffade`5b85d949 : fffffade`5bca97e0
>> > fffffade`70893650
>> > fffffade`6cd14ae0 fffffade`6fcf6bb0 : storport!RaDriverPnpIrp+0xcf
>> > fffffade`5bca9730 fffff800`0124d573 : fffffade`6cd14ae0
>> > fffffade`5bca9820
>> > fffffade`70893500 fffffade`6fcf6bb0 :
>> > mpspfltr!MPSPQueryDeviceRelations+0xa9
>> > fffffade`5bca9790 fffff800`010dc4c1 : 00000000`00000000
>> > 00000000`00000002
>> > 00000000`00000000 fffffade`6fcf6a70 : nt!IopSynchronousCall+0x14a
>> > fffffade`5bca9800 fffff800`013531ea : fffffade`709019b0
>> > fffff800`01014f00
>> > fffffade`5bca98e0 00000000`00000000 : nt!IopQueryDeviceRelations+0x71
>> > fffffade`5bca9890 fffff800`01354e95 : 00000000`000000c3
>> > 00000000`00000000
>> > 00000000`00000002 fffffade`6f42b2f0 : nt!PipProcessDevNodeTree+0x342
>> > fffffade`5bca9c20 fffff800`010d8598 : fffff800`00000003
>> > 00000000`00000000
>> > fffffade`708fabf0 fffff800`011cf900 : nt!PiProcessReenumeration+0x85
>> > fffffade`5bca9c70 fffff800`0105507c : 00000000`00000000
>> > fffff800`011decc0
>> > fffff800`010d8230 fffffade`708fabf0 : nt!PipDeviceActionWorker+0x368
>> > fffffade`5bca9d00 fffff800`01299cae : fffffade`708fabf0
>> > 00000000`00000080
>> > fffffade`708fabf0 fffffade`5b883680 : nt!ExpWorkerThread+0x13b
>> > fffffade`5bca9d70 fffff800`0102bbe6 : fffffade`5b87b180
>> > fffffade`708fabf0
>> > fffffade`5b883680 00000000`00000000 : nt!PspSystemThreadStartup+0x3e
>> > fffffade`5bca9dd0 00000000`00000000 : 00000000`00000000
>> > 00000000`00000000
>> > 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16
>> >
>> >
>> > STACK_COMMAND: .thread fffffade708fabf0 ; kb
>> >
>> > FOLLOWUP_IP:
>> > storport!RaidUnitAllocateResources+370
>> > fffffade`5b0764e2 4885c0 test rax,rax
>> >
>> > SYMBOL_STACK_INDEX: 7
>> >
>> > FOLLOWUP_NAME: MachineOwner
>> >
>> > SYMBOL_NAME: storport!RaidUnitAllocateResources+370
>> >
>> > MODULE_NAME: storport
>> >
>> > IMAGE_NAME: storport.sys
>> >
>> > DEBUG_FLR_IMAGE_TIMESTAMP: 448e954c
>> >
>> > FAILURE_BUCKET_ID: X64_0xA_storport!RaidUnitAllocateResources+370
>> >
>> > BUCKET_ID: X64_0xA_storport!RaidUnitAllocateResources+370
>> >
>> > Followup: MachineOwner
>> > ---------
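
The two dumps quoted so far report different DEBUG_FLR_IMAGE_TIMESTAMP values for storport.sys (448e954c in the 0xA dump, 45d0a33d in the 0xC5 dump). That field is the link timestamp from the image header, stored as seconds since 1970-01-01 UTC, so it can be decoded to see roughly when each build was produced. A small sketch (the function name is hypothetical):

```python
from datetime import datetime, timezone


def image_timestamp_to_utc(hex_stamp):
    """Convert a hex image timestamp (seconds since the Unix epoch) to a UTC datetime."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)


# Values copied from the two !analyze -v outputs quoted in this thread.
for label, stamp in (("0xA dump", "448e954c"), ("0xC5 dump", "45d0a33d")):
    print(label, stamp, image_timestamp_to_utc(stamp).isoformat())
```

These decode to June 2006 and February 2007 respectively, which suggests the two crashes were captured with different storport.sys builds.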
>> >
>> > "Kenny Speer" wrote:
>> >
>> >> Just because storport is in the bugcheck doesn't mean it's the cause.
>> >> It
>> >> may not have handled a buggy miniport correctly, but it's very
>> >> possible
>> >> the
>> >> real problem is in a different driver.
>> >>
>> >> Can you paste the !analyze -v output with the Microsoft symbol server
>> >> set in your .sympath?
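
One way to produce that output non-interactively is to run the console debugger against the dump with the symbol path supplied on the command line. This is a rough sketch only: the dump location, the local symbol cache directory, and the helper name are assumptions, and it presumes `kd.exe` from the Debugging Tools for Windows is on the PATH.

```python
# Microsoft public symbol server, cached locally under C:\symbols (cache path is an assumption).
SYMBOL_PATH = r"srv*C:\symbols*https://msdl.microsoft.com/download/symbols"


def build_kd_command(dump_path, symbol_path=SYMBOL_PATH):
    """Return a kd argument list that opens a dump, runs !analyze -v, then quits."""
    return [
        "kd",
        "-z", dump_path,          # crash dump file to open
        "-y", symbol_path,        # symbol search path
        "-c", "!analyze -v; q",   # commands to run once the dump is loaded
    ]


# The dump path below is only an example location.
print(build_kd_command(r"C:\WINDOWS\MEMORY.DMP"))
```

The list can be handed to `subprocess.run` as is, which avoids quoting problems with the `-c` argument.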
>> >>
>> >> "Eric" wrote in message
>> >> news:805E1809-2AEE-47F5-B7AC-4593B8324122@microsoft.com...
>> >> > Right, the BSOD IS caused by a driver--the storport.sys driver in
>> >> > this
>> >> > case.
>> >> > The reason I was trying to avoid responses that suggest that I
>> >> > update
>> >> > drivers and firmware was because I've already done all that as one
>> >> > of
>> >> > the
>> >> > first troubleshooting steps. Also, I am already on that latest
>> >> > storport
>> >> > version (5.2.3790.2880 for SP1) you mention in the KB article. I
>> >> > mentioned
>> >> > all this in my original post.
>> >> >
>> >> > "Pat [MSFT]" wrote:
>> >> >
>> >> >> Well, a BSOD is by definition caused by either a bug in a driver or
>> >> >> a
>> >> >> bug
>> >> >> in
>> >> >> HW/Firmware. So, your request to not suggest updating
>> >> >> driver/firmware
>> >> >> may
>> >> >> not get you very far.
>> >> >>
>> >> >> That said, I think you are running into a known bug that was fixed
>> >> >> &
>> >> >> released in Feb (KB Article 932755). You can download the fix
>> >> >> directly
>> >> >> via
>> >> >> http://support.microsoft.com - make sure to grab the correct
>> >> >> package.
>> >> >>
>> >> >> There is a SP1 & SP2 version of the fix - so you could get relief
>> >> >> w/out
>> >> >> SP2
>> >> >> if you absolutely needed to. I would recommend going to SP2 first
>> >> >> b/c
>> >> >> there
>> >> >> are a number of updates & perf improvements that are just general
>> >> >> goodness.
>> >> >> Then adding the fix on-top should get you where you need to be.
>> >> >>
>> >> >> If the problem persists after that, then I would recommend giving
>> >> >> support
>> >> >> a
>> >> >> call. If the issue is a bug, we refund the cost of the incident
>> >> >> (or
>> >> >> re-credit your account if you have a Premier support contract).
>> >> >>
>> >> >>
>> >> >> Pat
>> >> >>
>> >> >>
>> >> >>
>> >> >> "Eric" wrote in message
>> >> >> news:36137F27-5CBE-46E9-8790-3E92E7CA64D6@microsoft.com...
>> >> >> >I have a number of 2003 R2 x64 and x86 SP1 servers attached to a
>> >> >> > fabric-switched IBM Fibre-Channel SAN. The servers are all IBM
>> >> >> > xSeries
>> >> >> > servers attached using QLogic QLA2340 HBAs. We are using MPIO and
>> >> >> > M$s
>> >> >> > Storport driver (the latest version, of course) for multipathing
>> >> >> > on
>> >> >> > all
>> >> >> > servers. Furthermore, we're using IBM's StorageManager Agents
>> >> >> > (again,
>> >> >> > latest
>> >> >> > version) on all hosts. Also part of that SAN is a Quantum PX502
>> >> >> > robotic
>> >> >> > tape-library which is also Fibre-Channel and attached directly to
>> >> >> > the
>> >> >> > SAN
>> >> >> > (i.e. not physically attached to a server). We are not using any
>> >> >> > kind
>> >> >> > of
>> >> >> > SAN
>> >> >> > partitioning, so all hosts attached to the SAN see the tape
>> >> >> > drives
>> >> >> > and
>> >> >> > robot.
>> >> >> >
>> >> >> > Here's what happens. After rebooting the tape library, some or
>> >> >> > all
>> >> >> > of
>> >> >> > my
>> >> >> > x64
>> >> >> > servers BSOD with a 0x0A stop error and your typical
>> >> >> > IRQL_NOT_LESS...
>> >> >> > message. x86 servers have yet to be affected. Debugging the
>> >> >> > resulting
>> >> >> > memory
>> >> >> > dump shows that storport.sys is the culprit. Additionally, soon
>> >> >> > before
>> >> >> > the
>> >> >> > server BSODs, the system event log has entries from
>> >> >> > PlugPlayManager
>> >> >> > saying
>> >> >> > that the tape drives and robot disappeared without being prepared
>> >> >> > for
>> >> >> > removal
>> >> >> > (Event ID 12). Obviously, preparing the hardware for removal on
>> >> >> > all
>> >> >> > my
>> >> >> > servers is out of the question; besides, the hardware never shows
>> >> >> > up
>> >> >> > in
>> >> >> > the
>> >> >> > list of items to be safely removed.
>> >> >> >
>> >> >> > I'm very aware that SP2 is out for 2k3, and I intend to install
>> >> >> > that
>> >> >> > someday
>> >> >> > (once I recover from all the late-night work I've had to put in
>> >> >> > dealing