Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY - FreeBSD

This is a discussion on Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY - FreeBSD ; On Fri, Oct 17, 2008 at 01:50:38PM +0200, Miroslav Lachman wrote: > Jeremy Chadwick wrote: >> On Thu, Oct 16, 2008 at 09:30:20PM +0200, Miroslav Lachman wrote: >> >>> Today I was replacing disk in one Sun Fire X2100 M2 ...

+ Reply to Thread
Page 2 of 2 FirstFirst 1 2
Results 21 to 26 of 26

Thread: Re: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY

  1. Re: Recommendations for servers running SATA drives [hot-swap]

    On Fri, Oct 17, 2008 at 01:50:38PM +0200, Miroslav Lachman wrote:
    > Jeremy Chadwick wrote:
    >> On Thu, Oct 16, 2008 at 09:30:20PM +0200, Miroslav Lachman wrote:
    >>
    >>> Today I was replacing disk in one Sun Fire X2100 M2 so I tried
    >>> hot-swapping. It was as you said: atacontrol detach ata3, replace the
    >>> HDD, atacontrol attach ata3 and new disk is in the system. I tried
    >>> it 3 times to be sure that it was not coincidence - no panic was
    >>> produced ;o)
    >>> So in this case, hot-swapping on Sun Fire X2100 M2 with FreeBSD 7.0
    >>> i386 works.

    >>
    >>
    >> That's excellent news. So it seems possibly the problem I was seeing
    >> was with "reinit" causing some sort of chaos. I'll have to check things
    >> on my testbox here at home to see how I caused the panic last time.
    >>
    >> Thanks for providing feedback, as usual! :-)

    >
    > Unfortunately there is one problem - I see a lot of interrupts after
    > disk swapping (about 193k of atapci1)
    >
    > Interrupts
    > 197k total
    > ohci0 21
    > ehci0 22
    > 193k atapci1 23
    > 2001 cpu0: time
    > 1 bge1 273
    > 2001 cpu1: time


    Okay, so it looks like the interrupt rate on atapci1 after swapping is
    going crazy. What you're showing there looks like heavily modified
    vmstat -i output.

    > Full output of systat -vm 2 is attached.
    >
    > It is shown in top as 50% interrupt (CPU state) and load 1 until I
    > rebooted the machine (I can provide MRTG graphs). The system was not in
    > production load, but almost idle. (I will put it in production tomorrow).
    > After reboot, everything is OK.


    And this box is running the ATA patch Andrey provided, yes?

    > Can somebody test hot-swapping with SATA drives and confirm this
    > behavior? (I can't test it now, because machine is in datacenter)


    I can test it on my P4SCE box.

    I'll check the interrupt rates after each step of the hot-swap to see
    if/when the problem starts.

    --
    | Jeremy Chadwick jdc at parodius.com |
    | Parodius Networking http://www.parodius.com/ |
    | UNIX Systems Administrator Mountain View, CA, USA |
    | Making life hard for others since 1977. PGP: 4BD6C0CB |

    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  2. Re: Recommendations for servers running SATA drives [hot-swap]

    Jeremy Chadwick wrote:
    > On Fri, Oct 17, 2008 at 01:50:38PM +0200, Miroslav Lachman wrote:
    >
    >>Jeremy Chadwick wrote:
    >>
    >>>On Thu, Oct 16, 2008 at 09:30:20PM +0200, Miroslav Lachman wrote:
    >>>
    >>>
    >>>>Today I was replacing disk in one Sun Fire X2100 M2 so I tried
    >>>>hot-swapping. It was as you said: atacontrol detach ata3, replace the
    >>>> HDD, atacontrol attach ata3 and new disk is in the system. I tried
    >>>>it 3 times to be sure that it was not coincidence - no panic was
    >>>>produced ;o)
    >>>>So in this case, hot-swapping on Sun Fire X2100 M2 with FreeBSD 7.0
    >>>>i386 works.
    >>>
    >>>
    >>>That's excellent news. So it seems possibly the problem I was seeing
    >>>was with "reinit" causing some sort of chaos. I'll have to check things
    >>>on my testbox here at home to see how I caused the panic last time.
    >>>
    >>>Thanks for providing feedback, as usual! :-)

    >>
    >>Unfortunately there is one problem - I see a lot of interrupts after
    >>disk swapping (about 193k of atapci1)
    >>
    >>Interrupts
    >>197k total
    >> ohci0 21
    >> ehci0 22
    >>193k atapci1 23
    >>2001 cpu0: time
    >> 1 bge1 273
    >>2001 cpu1: time

    >
    >
    > Okay, so it looks like the interrupt rate on atapci1 after swapping is
    > going crazy. What you're showing there looks like heavily modified
    > vmstat -i output.


    The shown is manually cropped from systat -vm, I'll try vmstat -i next
    time.

    >>Full output of systat -vm 2 is attached.
    >>
    >>It is shown in top as 50% interrupt (CPU state) and load 1 until I
    >>rebooted the machine (I can provide MRTG graphs). The system was not in
    >>production load, but almost idle. (I will put it in production tomorrow).
    >>After reboot, everything is OK.

    >
    >
    > And this box is running the ATA patch Andrey provided, yes?


    It is clean install of FreeBSD 7.0-RELEASE-p5 amd64 without patches.

    >>Can somebody test hot-swapping with SATA drives and confirm this
    >>behavior? (I can't test it now, because machine is in datacenter)

    >
    >
    > I can test it on my P4SCE box.
    >
    > I'll check the interrupt rates after each step of the hot-swap to see
    > if/when the problem starts.


    I'll check the interrupts next time too and will post results to this
    thread.

    Miroslav Lachman
    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  3. Re: Recommendations for servers running SATA drives [hot-swap]

    On Fri, Oct 17, 2008 at 04:09:17PM +0200, Miroslav Lachman wrote:
    > Jeremy Chadwick wrote:
    >> On Fri, Oct 17, 2008 at 01:50:38PM +0200, Miroslav Lachman wrote:
    >>> Jeremy Chadwick wrote:
    >>>> On Thu, Oct 16, 2008 at 09:30:20PM +0200, Miroslav Lachman wrote:
    >>>>
    >>>>
    >>>>> Today I was replacing disk in one Sun Fire X2100 M2 so I tried
    >>>>> hot-swapping. It was as you said: atacontrol detach ata3, replace
    >>>>> the HDD, atacontrol attach ata3 and new disk is in the system. I
    >>>>> tried it 3 times to be sure that it was not coincidence - no
    >>>>> panic was produced ;o)
    >>>>> So in this case, hot-swapping on Sun Fire X2100 M2 with FreeBSD
    >>>>> 7.0 i386 works.
    >>>>
    >>>>
    >>>> That's excellent news. So it seems possibly the problem I was seeing
    >>>> was with "reinit" causing some sort of chaos. I'll have to check things
    >>>> on my testbox here at home to see how I caused the panic last time.
    >>>>
    >>>> Thanks for providing feedback, as usual! :-)
    >>>
    >>> Unfortunately there is one problem - I see a lot of interrupts after
    >>> disk swapping (about 193k of atapci1)
    >>>
    >>> Interrupts
    >>> 197k total
    >>> ohci0 21
    >>> ehci0 22
    >>> 193k atapci1 23
    >>> 2001 cpu0: time
    >>> 1 bge1 273
    >>> 2001 cpu1: time

    >>
    >>
    >> Okay, so it looks like the interrupt rate on atapci1 after swapping is
    >> going crazy. What you're showing there looks like heavily modified
    >> vmstat -i output.

    >
    > The shown is manually cropped from systat -vm, I'll try vmstat -i next
    > time.
    >
    >>> Full output of systat -vm 2 is attached.
    >>>
    >>> It is shown in top as 50% interrupt (CPU state) and load 1 until I
    >>> rebooted the machine (I can provide MRTG graphs). The system was not
    >>> in production load, but almost idle. (I will put it in production
    >>> tomorrow).
    >>> After reboot, everything is OK.

    >>
    >>
    >> And this box is running the ATA patch Andrey provided, yes?

    >
    > It is clean install of FreeBSD 7.0-RELEASE-p5 amd64 without patches.
    >
    >>> Can somebody test hot-swapping with SATA drives and confirm this
    >>> behavior? (I can't test it now, because machine is in datacenter)

    >>
    >>
    >> I can test it on my P4SCE box.
    >>
    >> I'll check the interrupt rates after each step of the hot-swap to see
    >> if/when the problem starts.

    >
    > I'll check the interrupts next time too and will post results to this
    > thread.


    As promised, here are notes from my testing:


    First thing to note is that the BIOS on my P4SCE had the ICH5 SATA mode
    set to "Auto", which was causing PATA emulation to happen on the SATA
    controller, e.g. disk #0 == ata0-master, disk #1 == ata0-slave.

    I changed the BIOS option from Auto to "SATA Enhanced", and now the
    disks show up on their own channels, e.g. disk #0 == ata2-master, disk
    #1 == ata3-master.

    Here's the applicable data. Note that this kernel ***DOES*** include
    Andrey's ATA patch:

    FreeBSD testbox.home.lan 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Thu Oct 16 10:56:42 PDT 2008 root@testbox.home.lan:/usr/obj/usr/src/sys/TESTBOX i386

    atapci1: port 0xc000-0xc007,0xc400-0xc403,0xc800-0xc807,0xcc00-0xcc03,0xd000-0xd00f irq 18 at device 31.2 on pci0
    atapci1: [ITHREAD]
    ata2: on atapci1
    ata2: [ITHREAD]
    ata3: on atapci1
    ata3: [ITHREAD]

    SATA controller is on IRQ 18.

    ad4: 114473MB at ata2-master SATA150
    ad6: 238474MB at ata3-master SATA150

    ATA channel 2:
    Master: ad4 Serial ATA v1.0
    Slave: no device present
    ATA channel 3:
    Master: ad6 Serial ATA II
    Slave: no device present

    testbox# df -k
    Filesystem 1024-blocks Used Avail Capacity Mounted on
    /dev/ad4s1a 507630 230182 236838 49% /
    devfs 1 1 0 100% /dev
    /dev/ad4s1e 507630 12 467008 0% /tmp
    /dev/ad4s1f 108498334 2944826 96873642 3% /usr
    /dev/ad4s1d 2008622 32360 1815574 2% /var
    /dev/ad6s1d 236511738 4 217590796 0% /hotswap

    testbox# vmstat -i
    interrupt total rate
    irq4: sio0 1398 34
    irq6: fdc0 10 0
    irq15: ata1 58 1
    irq18: atapci1 945 23
    irq23: em1 8 0
    cpu0: timer 80033 1952
    cpu1: timer 79808 1946
    Total 162260 3957

    testbox# umount /hotswap
    testbox# atacontrol detach ata3
    subdisk6: detached
    ad6: detached
    testbox# vmstat -i | grep atapci1
    irq18: atapci1 2671 11

    At this point I wanted to see what happened if I just reattached without
    any physical changes to the SATA bus.

    testbox# atacontrol attach ata3
    ata3: [ITHREAD]
    ad6: 238474MB at ata3-master SATA150
    Master: ad6 Serial ATA II
    Slave: no device present

    testbox# vmstat -i | grep atapci1
    irq18: atapci1 2764 9
    testbox# mount /dev/ad6s1d /hotswap
    testbox# vmstat -i | grep atapci1
    irq18: atapci1 2779 8

    Now we're going to try detaching *without* umounting the filesystem,
    then reattaching to see what happens. Based on what I've seen and
    others have reported in the past, this should panic the kernel.
    Supposedly this problem is fixed on CURRENT.

    testbox# atacontrol detach ata3
    subdisk6: detached
    ad6: detached

    testbox# atacontrol attach ata3
    ata3: [ITHREAD]
    ad6: 238474MB at ata3-master SATA150
    Master: ad6 Serial ATA II
    Slave: no device present

    testbox# df -k
    Filesystem 1024-blocks Used Avail Capacity Mounted on
    /dev/ad4s1a 507630 230182 236838 49% /
    devfs 1 1 0 100% /dev
    /dev/ad4s1e 507630 12 467008 0% /tmp
    /dev/ad4s1f 108498334 2944826 96873642 3% /usr
    /dev/ad4s1d 2008622 32360 1815574 2% /var
    /dev/ad6s1d 236511738 4 217590796 0% /hotswap

    testbox# ls -l /hotswap


    Fatal trap 12: page fault while in kernel mode
    cpuid = 0; apic id = 00
    fault virtual address = 0xc0
    fault code = supervisor read, page not present
    instruction pointer = 0x20:0xc0503ca7
    stack pointer = 0x28:0xe6310a5c
    frame pointer = 0x28:0xe6310a5c
    code segment = base 0x0, limit 0xfffff, type 0x1b
    = DPL 0, pres 1, def32 1, gran 1
    processor eflags = interrupt enabled, resume, IOPL = 0
    current process = 795 (ls)
    [thread pid 795 tid 100043 ]
    Stopped at dev2udev+0x11: movl 0xc0(%eax),%eax

    db> bt
    Tracing pid 795 tid 100043 td 0xc3dcc460
    dev2udev(3287166208,3861973668,3228755039,38619738 72,3286025312,...) at dev2udev+17
    ufs_getattr(3861973664,3861973800,3227504003,32297 00640,3861973664,...) at ufs_getattr+222
    VOP_GETATTR_APV(3229700640,3861973664,3229768320,3 288955040,3861973684,...) at VOP_GETATTR_APV+68
    vn_stat(3288955040,3861973908,3286230784,0,3286025 312,...) at vn_stat+73
    kern_lstat(3286025312,135344488,0,3861974040,38619 74064,...) at kern_lstat+147
    lstat(3286025312,3861974268,8,3861974328,386197431 6,...) at lstat+43
    syscall(3861974328) at syscall+814
    Xint0x80_syscall() at Xint0x80_syscall+32
    --- syscall (190, FreeBSD ELF32, lstat), eip = 1746463051, esp = 3217024524, ebp = 3217024664 ---

    Yup, there's the panic. :-)

    I rebooted the box from db, brought the system up in single-user, fsck'd
    all the disks/filesystems (no anomalies were found), and rebooted the
    box once more.

    Now we're going to do everything properly: unmount /hotswap, detach,
    yank the disk and insert a new Maxtor hard disk, attach, and see what
    happens.

    testbox# umount /hotswap
    testbox# atacontrol detach ata3
    subdisk6: detached
    ad6: detached

    testbox# vmstat -i | grep atapci1
    irq18: atapci1 1174 6

    I've now removed the disk physically from the machine. Let's check
    interrupts again.

    testbox# vmstat -i | grep atapci1
    irq18: atapci1 1185 4

    Now the new Maxtor disk has been inserted. LEDs for the SATA hot-swap
    backplane lit up for about 5-6 seconds, then went off. Let's check
    interrupts at this point:

    testbox# vmstat -i | grep atapci1
    irq18: atapci1 1193 3

    Now let's attach. Note that there is no filesystem on this disk (it's
    completely blank), so there's nothing to mount.

    testbox# atacontrol attach ata3
    ata3: [ITHREAD]
    ad6: 286188MB at ata3-master SATA150
    Master: ad6 Serial ATA v1.0
    Slave: no device present

    And now we check interrupts:

    testbox# vmstat -i | grep atapci1
    irq18: atapci1 1258 2

    Looks fine to me.

    --
    | Jeremy Chadwick jdc at parodius.com |
    | Parodius Networking http://www.parodius.com/ |
    | UNIX Systems Administrator Mountain View, CA, USA |
    | Making life hard for others since 1977. PGP: 4BD6C0CB |

    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  4. Re: Recommendations for servers running SATA drives [hot-swap]

    Jeremy Chadwick wrote:
    > On Fri, Oct 17, 2008 at 04:09:17PM +0200, Miroslav Lachman wrote:
    >
    >>Jeremy Chadwick wrote:
    >>
    >>>On Fri, Oct 17, 2008 at 01:50:38PM +0200, Miroslav Lachman wrote:
    >>>
    >>>>Jeremy Chadwick wrote:
    >>>>
    >>>>>On Thu, Oct 16, 2008 at 09:30:20PM +0200, Miroslav Lachman wrote:
    >>>>>
    >>>>>
    >>>>>
    >>>>>>Today I was replacing disk in one Sun Fire X2100 M2 so I tried
    >>>>>>hot-swapping. It was as you said: atacontrol detach ata3, replace
    >>>>>>the HDD, atacontrol attach ata3 and new disk is in the system. I
    >>>>>>tried it 3 times to be sure that it was not coincidence - no
    >>>>>>panic was produced ;o)
    >>>>>>So in this case, hot-swapping on Sun Fire X2100 M2 with FreeBSD
    >>>>>>7.0 i386 works.
    >>>>>
    >>>>>
    >>>>>That's excellent news. So it seems possibly the problem I was seeing
    >>>>>was with "reinit" causing some sort of chaos. I'll have to check things
    >>>>>on my testbox here at home to see how I caused the panic last time.
    >>>>>
    >>>>>Thanks for providing feedback, as usual! :-)
    >>>>
    >>>>Unfortunately there is one problem - I see a lot of interrupts after
    >>>>disk swapping (about 193k of atapci1)
    >>>>
    >>>>Interrupts
    >>>>197k total
    >>>> ohci0 21
    >>>> ehci0 22
    >>>>193k atapci1 23
    >>>>2001 cpu0: time
    >>>> 1 bge1 273
    >>>>2001 cpu1: time
    >>>
    >>>
    >>>Okay, so it looks like the interrupt rate on atapci1 after swapping is
    >>>going crazy. What you're showing there looks like heavily modified
    >>>vmstat -i output.

    >>
    >>The shown is manually cropped from systat -vm, I'll try vmstat -i next
    >>time.
    >>
    >>
    >>>>Full output of systat -vm 2 is attached.
    >>>>
    >>>>It is shown in top as 50% interrupt (CPU state) and load 1 until I
    >>>>rebooted the machine (I can provide MRTG graphs). The system was not
    >>>>in production load, but almost idle. (I will put it in production
    >>>>tomorrow).
    >>>>After reboot, everything is OK.
    >>>
    >>>
    >>>And this box is running the ATA patch Andrey provided, yes?

    >>
    >>It is clean install of FreeBSD 7.0-RELEASE-p5 amd64 without patches.
    >>
    >>
    >>>>Can somebody test hot-swapping with SATA drives and confirm this
    >>>>behavior? (I can't test it now, because machine is in datacenter)
    >>>
    >>>
    >>>I can test it on my P4SCE box.
    >>>
    >>>I'll check the interrupt rates after each step of the hot-swap to see
    >>>if/when the problem starts.

    >>
    >>I'll check the interrupts next time too and will post results to this
    >>thread.

    >
    >
    > As promised, here are notes from my testing:
    >
    >
    > First thing to note is that the BIOS on my P4SCE had the ICH5 SATA mode
    > set to "Auto", which was causing PATA emulation to happen on the SATA
    > controller, e.g. disk #0 == ata0-master, disk #1 == ata0-slave.
    >
    > I changed the BIOS option from Auto to "SATA Enhanced", and now the
    > disks show up on their own channels, e.g. disk #0 == ata2-master, disk
    > #1 == ata3-master.
    >
    > Here's the applicable data. Note that this kernel ***DOES*** include
    > Andrey's ATA patch:
    >
    > FreeBSD testbox.home.lan 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Thu Oct 16 10:56:42 PDT 2008 root@testbox.home.lan:/usr/obj/usr/src/sys/TESTBOX i386
    >
    > atapci1: port 0xc000-0xc007,0xc400-0xc403,0xc800-0xc807,0xcc00-0xcc03,0xd000-0xd00f irq 18 at device 31.2 on pci0
    > atapci1: [ITHREAD]
    > ata2: on atapci1
    > ata2: [ITHREAD]
    > ata3: on atapci1
    > ata3: [ITHREAD]
    >
    > SATA controller is on IRQ 18.
    >
    > ad4: 114473MB at ata2-master SATA150
    > ad6: 238474MB at ata3-master SATA150
    >
    > ATA channel 2:
    > Master: ad4 Serial ATA v1.0
    > Slave: no device present
    > ATA channel 3:
    > Master: ad6 Serial ATA II
    > Slave: no device present
    >
    > testbox# df -k
    > Filesystem 1024-blocks Used Avail Capacity Mounted on
    > /dev/ad4s1a 507630 230182 236838 49% /
    > devfs 1 1 0 100% /dev
    > /dev/ad4s1e 507630 12 467008 0% /tmp
    > /dev/ad4s1f 108498334 2944826 96873642 3% /usr
    > /dev/ad4s1d 2008622 32360 1815574 2% /var
    > /dev/ad6s1d 236511738 4 217590796 0% /hotswap
    >
    > testbox# vmstat -i
    > interrupt total rate
    > irq4: sio0 1398 34
    > irq6: fdc0 10 0
    > irq15: ata1 58 1
    > irq18: atapci1 945 23
    > irq23: em1 8 0
    > cpu0: timer 80033 1952
    > cpu1: timer 79808 1946
    > Total 162260 3957
    >
    > testbox# umount /hotswap
    > testbox# atacontrol detach ata3
    > subdisk6: detached
    > ad6: detached
    > testbox# vmstat -i | grep atapci1
    > irq18: atapci1 2671 11
    >
    > At this point I wanted to see what happened if I just reattached without
    > any physical changes to the SATA bus.
    >
    > testbox# atacontrol attach ata3
    > ata3: [ITHREAD]
    > ad6: 238474MB at ata3-master SATA150
    > Master: ad6 Serial ATA II
    > Slave: no device present
    >
    > testbox# vmstat -i | grep atapci1
    > irq18: atapci1 2764 9
    > testbox# mount /dev/ad6s1d /hotswap
    > testbox# vmstat -i | grep atapci1
    > irq18: atapci1 2779 8
    >
    > Now we're going to try detaching *without* umounting the filesystem,
    > then reattaching to see what happens. Based on what I've seen and
    > others have reported in the past, this should panic the kernel.
    > Supposedly this problem is fixed on CURRENT.
    >
    > testbox# atacontrol detach ata3
    > subdisk6: detached
    > ad6: detached
    >
    > testbox# atacontrol attach ata3
    > ata3: [ITHREAD]
    > ad6: 238474MB at ata3-master SATA150
    > Master: ad6 Serial ATA II
    > Slave: no device present
    >
    > testbox# df -k
    > Filesystem 1024-blocks Used Avail Capacity Mounted on
    > /dev/ad4s1a 507630 230182 236838 49% /
    > devfs 1 1 0 100% /dev
    > /dev/ad4s1e 507630 12 467008 0% /tmp
    > /dev/ad4s1f 108498334 2944826 96873642 3% /usr
    > /dev/ad4s1d 2008622 32360 1815574 2% /var
    > /dev/ad6s1d 236511738 4 217590796 0% /hotswap
    >
    > testbox# ls -l /hotswap
    >
    >
    > Fatal trap 12: page fault while in kernel mode
    > cpuid = 0; apic id = 00
    > fault virtual address = 0xc0
    > fault code = supervisor read, page not present
    > instruction pointer = 0x20:0xc0503ca7
    > stack pointer = 0x28:0xe6310a5c
    > frame pointer = 0x28:0xe6310a5c
    > code segment = base 0x0, limit 0xfffff, type 0x1b
    > = DPL 0, pres 1, def32 1, gran 1
    > processor eflags = interrupt enabled, resume, IOPL = 0
    > current process = 795 (ls)
    > [thread pid 795 tid 100043 ]
    > Stopped at dev2udev+0x11: movl 0xc0(%eax),%eax
    >
    > db> bt
    > Tracing pid 795 tid 100043 td 0xc3dcc460
    > dev2udev(3287166208,3861973668,3228755039,38619738 72,3286025312,...) at dev2udev+17
    > ufs_getattr(3861973664,3861973800,3227504003,32297 00640,3861973664,...) at ufs_getattr+222
    > VOP_GETATTR_APV(3229700640,3861973664,3229768320,3 288955040,3861973684,...) at VOP_GETATTR_APV+68
    > vn_stat(3288955040,3861973908,3286230784,0,3286025 312,...) at vn_stat+73
    > kern_lstat(3286025312,135344488,0,3861974040,38619 74064,...) at kern_lstat+147
    > lstat(3286025312,3861974268,8,3861974328,386197431 6,...) at lstat+43
    > syscall(3861974328) at syscall+814
    > Xint0x80_syscall() at Xint0x80_syscall+32
    > --- syscall (190, FreeBSD ELF32, lstat), eip = 1746463051, esp = 3217024524, ebp = 3217024664 ---
    >
    > Yup, there's the panic. :-)
    >
    > I rebooted the box from db, brought the system up in single-user, fsck'd
    > all the disks/filesystems (no anomalies were found), and rebooted the
    > box once more.
    >
    > Now we're going to do everything properly: unmount /hotswap, detach,
    > yank the disk and insert a new Maxtor hard disk, attach, and see what
    > happens.
    >
    > testbox# umount /hotswap
    > testbox# atacontrol detach ata3
    > subdisk6: detached
    > ad6: detached
    >
    > testbox# vmstat -i | grep atapci1
    > irq18: atapci1 1174 6
    >
    > I've now removed the disk physically from the machine. Let's check
    > interrupts again.
    >
    > testbox# vmstat -i | grep atapci1
    > irq18: atapci1 1185 4
    >
    > Now the new Maxtor disk has been inserted. LEDs for the SATA hot-swap
    > backplane lit up for about 5-6 seconds, then went off. Let's check
    > interrupts at this point:
    >
    > testbox# vmstat -i | grep atapci1
    > irq18: atapci1 1193 3
    >
    > Now let's attach. Note that there is no filesystem on this disk (it's
    > completely blank), so there's nothing to mount.
    >
    > testbox# atacontrol attach ata3
    > ata3: [ITHREAD]
    > ad6: 286188MB at ata3-master SATA150
    > Master: ad6 Serial ATA v1.0
    > Slave: no device present
    >
    > And now we check interrupts:
    >
    > testbox# vmstat -i | grep atapci1
    > irq18: atapci1 1258 2
    >
    > Looks fine to me.


    Thank you for your time, testing and reporting detailed results!
    I will investigate my case somewhen in the future (if time permits)

    Miroslav Lachman
    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  5. Re: Recommendations for servers running SATA drives [hot-swap]

    Jeremy Chadwick wrote:
    > On Fri, Oct 17, 2008 at 04:09:17PM +0200, Miroslav Lachman wrote:
    >
    >>Jeremy Chadwick wrote:
    >>
    >>>On Fri, Oct 17, 2008 at 01:50:38PM +0200, Miroslav Lachman wrote:
    >>>
    >>>>Jeremy Chadwick wrote:
    >>>>
    >>>>>On Thu, Oct 16, 2008 at 09:30:20PM +0200, Miroslav Lachman wrote:
    >>>>>
    >>>>>>Today I was replacing disk in one Sun Fire X2100 M2 so I tried
    >>>>>>hot-swapping. It was as you said: atacontrol detach ata3, replace
    >>>>>>the HDD, atacontrol attach ata3 and new disk is in the system. I
    >>>>>>tried it 3 times to be sure that it was not coincidence - no
    >>>>>>panic was produced ;o)
    >>>>>>So in this case, hot-swapping on Sun Fire X2100 M2 with FreeBSD
    >>>>>>7.0 i386 works.
    >>>>>
    >>>>>
    >>>>>That's excellent news. So it seems possibly the problem I was seeing
    >>>>>was with "reinit" causing some sort of chaos. I'll have to check things
    >>>>>on my testbox here at home to see how I caused the panic last time.
    >>>>>
    >>>>>Thanks for providing feedback, as usual! :-)
    >>>>
    >>>>Unfortunately there is one problem - I see a lot of interrupts after
    >>>>disk swapping (about 193k of atapci1)
    >>>>
    >>>>Interrupts
    >>>>197k total
    >>>> ohci0 21
    >>>> ehci0 22
    >>>>193k atapci1 23
    >>>>2001 cpu0: time
    >>>> 1 bge1 273
    >>>>2001 cpu1: time
    >>>
    >>>
    >>>Okay, so it looks like the interrupt rate on atapci1 after swapping is
    >>>going crazy. What you're showing there looks like heavily modified
    >>>vmstat -i output.

    >>
    >>The shown is manually cropped from systat -vm, I'll try vmstat -i next
    >>time.
    >>
    >>
    >>>>Full output of systat -vm 2 is attached.
    >>>>
    >>>>It is shown in top as 50% interrupt (CPU state) and load 1 until I
    >>>>rebooted the machine (I can provide MRTG graphs). The system was not
    >>>>in production load, but almost idle. (I will put it in production
    >>>>tomorrow).
    >>>>After reboot, everything is OK.
    >>>
    >>>
    >>>And this box is running the ATA patch Andrey provided, yes?

    >>
    >>It is clean install of FreeBSD 7.0-RELEASE-p5 amd64 without patches.
    >>
    >>
    >>>>Can somebody test hot-swapping with SATA drives and confirm this
    >>>>behavior? (I can't test it now, because machine is in datacenter)
    >>>
    >>>
    >>>I can test it on my P4SCE box.
    >>>
    >>>I'll check the interrupt rates after each step of the hot-swap to see
    >>>if/when the problem starts.

    >>
    >>I'll check the interrupts next time too and will post results to this
    >>thread.

    >
    >
    > As promised, here are notes from my testing:
    >
    >
    > First thing to note is that the BIOS on my P4SCE had the ICH5 SATA mode
    > set to "Auto", which was causing PATA emulation to happen on the SATA
    > controller, e.g. disk #0 == ata0-master, disk #1 == ata0-slave.
    >
    > I changed the BIOS option from Auto to "SATA Enhanced", and now the
    > disks show up on their own channels, e.g. disk #0 == ata2-master, disk
    > #1 == ata3-master.
    >
    > Here's the applicable data. Note that this kernel ***DOES*** include
    > Andrey's ATA patch:
    >
    > FreeBSD testbox.home.lan 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Thu Oct 16 10:56:42 PDT 2008 root@testbox.home.lan:/usr/obj/usr/src/sys/TESTBOX i386
    >
    > atapci1: port 0xc000-0xc007,0xc400-0xc403,0xc800-0xc807,0xcc00-0xcc03,0xd000-0xd00f irq 18 at device 31.2 on pci0
    > atapci1: [ITHREAD]
    > ata2: on atapci1
    > ata2: [ITHREAD]
    > ata3: on atapci1
    > ata3: [ITHREAD]
    >
    > SATA controller is on IRQ 18.
    >
    > ad4: 114473MB at ata2-master SATA150
    > ad6: 238474MB at ata3-master SATA150
    >
    > ATA channel 2:
    > Master: ad4 Serial ATA v1.0
    > Slave: no device present
    > ATA channel 3:
    > Master: ad6 Serial ATA II
    > Slave: no device present
    >
    > testbox# df -k
    > Filesystem 1024-blocks Used Avail Capacity Mounted on
    > /dev/ad4s1a 507630 230182 236838 49% /
    > devfs 1 1 0 100% /dev
    > /dev/ad4s1e 507630 12 467008 0% /tmp
    > /dev/ad4s1f 108498334 2944826 96873642 3% /usr
    > /dev/ad4s1d 2008622 32360 1815574 2% /var
    > /dev/ad6s1d 236511738 4 217590796 0% /hotswap
    >
    > testbox# vmstat -i
    > interrupt total rate
    > irq4: sio0 1398 34
    > irq6: fdc0 10 0
    > irq15: ata1 58 1
    > irq18: atapci1 945 23
    > irq23: em1 8 0
    > cpu0: timer 80033 1952
    > cpu1: timer 79808 1946
    > Total 162260 3957
    >
    > testbox# umount /hotswap
    > testbox# atacontrol detach ata3
    > subdisk6: detached
    > ad6: detached
    > testbox# vmstat -i | grep atapci1
    > irq18: atapci1 2671 11
    >
    > At this point I wanted to see what happened if I just reattached without
    > any physical changes to the SATA bus.
    >
    > testbox# atacontrol attach ata3
    > ata3: [ITHREAD]
    > ad6: 238474MB at ata3-master SATA150
    > Master: ad6 Serial ATA II
    > Slave: no device present
    >
    > testbox# vmstat -i | grep atapci1
    > irq18: atapci1 2764 9
    > testbox# mount /dev/ad6s1d /hotswap
    > testbox# vmstat -i | grep atapci1
    > irq18: atapci1 2779 8
    >
    > Now we're going to try detaching *without* umounting the filesystem,
    > then reattaching to see what happens. Based on what I've seen and
    > others have reported in the past, this should panic the kernel.
    > Supposedly this problem is fixed on CURRENT.
    >
    > testbox# atacontrol detach ata3
    > subdisk6: detached
    > ad6: detached
    >
    > testbox# atacontrol attach ata3
    > ata3: [ITHREAD]
    > ad6: 238474MB at ata3-master SATA150
    > Master: ad6 Serial ATA II
    > Slave: no device present
    >
    > testbox# df -k
    > Filesystem 1024-blocks Used Avail Capacity Mounted on
    > /dev/ad4s1a 507630 230182 236838 49% /
    > devfs 1 1 0 100% /dev
    > /dev/ad4s1e 507630 12 467008 0% /tmp
    > /dev/ad4s1f 108498334 2944826 96873642 3% /usr
    > /dev/ad4s1d 2008622 32360 1815574 2% /var
    > /dev/ad6s1d 236511738 4 217590796 0% /hotswap
    >
    > testbox# ls -l /hotswap
    >
    >
    > Fatal trap 12: page fault while in kernel mode
    > cpuid = 0; apic id = 00
    > fault virtual address = 0xc0
    > fault code = supervisor read, page not present
    > instruction pointer = 0x20:0xc0503ca7
    > stack pointer = 0x28:0xe6310a5c
    > frame pointer = 0x28:0xe6310a5c
    > code segment = base 0x0, limit 0xfffff, type 0x1b
    > = DPL 0, pres 1, def32 1, gran 1
    > processor eflags = interrupt enabled, resume, IOPL = 0
    > current process = 795 (ls)
    > [thread pid 795 tid 100043 ]
    > Stopped at dev2udev+0x11: movl 0xc0(%eax),%eax
    >
    > db> bt
    > Tracing pid 795 tid 100043 td 0xc3dcc460
    > dev2udev(3287166208,3861973668,3228755039,38619738 72,3286025312,...) at dev2udev+17
    > ufs_getattr(3861973664,3861973800,3227504003,32297 00640,3861973664,...) at ufs_getattr+222
    > VOP_GETATTR_APV(3229700640,3861973664,3229768320,3 288955040,3861973684,...) at VOP_GETATTR_APV+68
    > vn_stat(3288955040,3861973908,3286230784,0,3286025 312,...) at vn_stat+73
    > kern_lstat(3286025312,135344488,0,3861974040,38619 74064,...) at kern_lstat+147
    > lstat(3286025312,3861974268,8,3861974328,386197431 6,...) at lstat+43
    > syscall(3861974328) at syscall+814
    > Xint0x80_syscall() at Xint0x80_syscall+32
    > --- syscall (190, FreeBSD ELF32, lstat), eip = 1746463051, esp = 3217024524, ebp = 3217024664 ---
    >
    > Yup, there's the panic. :-)
    >
    > I rebooted the box from db, brought the system up in single-user, fsck'd
    > all the disks/filesystems (no anomalies were found), and rebooted the
    > box once more.
    >
    > Now we're going to do everything properly: unmount /hotswap, detach,
    > yank the disk and insert a new Maxtor hard disk, attach, and see what
    > happens.
    >
    > testbox# umount /hotswap
    > testbox# atacontrol detach ata3
    > subdisk6: detached
    > ad6: detached
    >
    > testbox# vmstat -i | grep atapci1
    > irq18: atapci1 1174 6
    >
    > I've now removed the disk physically from the machine. Let's check
    > interrupts again.
    >
    > testbox# vmstat -i | grep atapci1
    > irq18: atapci1 1185 4
    >
    > Now the new Maxtor disk has been inserted. LEDs for the SATA hot-swap
    > backplane lit up for about 5-6 seconds, then went off. Let's check
    > interrupts at this point:
    >
    > testbox# vmstat -i | grep atapci1
    > irq18: atapci1 1193 3
    >
    > Now let's attach. Note that there is no filesystem on this disk (it's
    > completely blank), so there's nothing to mount.
    >
    > testbox# atacontrol attach ata3
    > ata3: [ITHREAD]
    > ad6: 286188MB at ata3-master SATA150
    > Master: ad6 Serial ATA v1.0
    > Slave: no device present
    >
    > And now we check interrupts:
    >
    > testbox# vmstat -i | grep atapci1
    > irq18: atapci1 1258 2



    I played again with hot-swapping disks in Sun Fire X2100 M2 on FreeBSD
    7.0-RELEASE-p5 i386 without ATA patches.
    Both disks (ad4 + ad6) are in gmirror. There were high interrupts load
    again! I tracked it to the point of pulling out the disk.
    Interrupt was OK after 'atacontrol detach', but rise up after disk was
    removed. When the disk is inserted back (same disk), interrupts are
    going to normal rate without need to reboot.
    I tried it three times and behavior was always the same.
    It can be related to the use of gmirror.

    Side note: If the disk was detached by 'atacontrol detach ata2' without
    removing from gmirror (without gmirror remove or gmirror deactivate) and
    then pulled out + inserted back, it was automagically attached without
    need of 'atacontrol attach ata2' and gmirror synchronization was
    autostarted.

    As I am planing my vacation, I will not have time to test newer versions
    of FreeBSD (or patches), I will test it later in December.

    Miroslav Lachman
    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  6. Re: Recommendations for servers running SATA drives [hot-swap]

    On Sun, Oct 26, 2008 at 01:41:58PM +0100, Miroslav Lachman wrote:
    > Jeremy Chadwick wrote:
    >> On Fri, Oct 17, 2008 at 04:09:17PM +0200, Miroslav Lachman wrote:
    >>
    >>> Jeremy Chadwick wrote:
    >>>
    >>>> On Fri, Oct 17, 2008 at 01:50:38PM +0200, Miroslav Lachman wrote:
    >>>>
    >>>>> Jeremy Chadwick wrote:
    >>>>>
    >>>>>> On Thu, Oct 16, 2008 at 09:30:20PM +0200, Miroslav Lachman wrote:
    >>>>>>
    >>>>>>> Today I was replacing disk in one Sun Fire X2100 M2 so I
    >>>>>>> tried hot-swapping. It was as you said: atacontrol detach
    >>>>>>> ata3, replace the HDD, atacontrol attach ata3 and new disk is
    >>>>>>> in the system. I tried it 3 times to be sure that it was not
    >>>>>>> coincidence - no panic was produced ;o)
    >>>>>>> So in this case, hot-swapping on Sun Fire X2100 M2 with
    >>>>>>> FreeBSD 7.0 i386 works.
    >>>>>>
    >>>>>>
    >>>>>> That's excellent news. So it seems possibly the problem I was seeing
    >>>>>> was with "reinit" causing some sort of chaos. I'll have to check things
    >>>>>> on my testbox here at home to see how I caused the panic last time.
    >>>>>>
    >>>>>> Thanks for providing feedback, as usual! :-)
    >>>>>
    >>>>> Unfortunately there is one problem - I see a lot of interrupts
    >>>>> after disk swapping (about 193k of atapci1)
    >>>>>
    >>>>> Interrupts
    >>>>> 197k total
    >>>>> ohci0 21
    >>>>> ehci0 22
    >>>>> 193k atapci1 23
    >>>>> 2001 cpu0: time
    >>>>> 1 bge1 273
    >>>>> 2001 cpu1: time
    >>>>
    >>>>
    >>>> Okay, so it looks like the interrupt rate on atapci1 after swapping is
    >>>> going crazy. What you're showing there looks like heavily modified
    >>>> vmstat -i output.
    >>>
    >>> The shown is manually cropped from systat -vm, I'll try vmstat -i
    >>> next time.
    >>>
    >>>
    >>>>> Full output of systat -vm 2 is attached.
    >>>>>
    >>>>> It is shown in top as 50% interrupt (CPU state) and load 1 until
    >>>>> I rebooted the machine (I can provide MRTG graphs). The system
    >>>>> was not in production load, but almost idle. (I will put it in
    >>>>> production tomorrow).
    >>>>> After reboot, everything is OK.
    >>>>
    >>>>
    >>>> And this box is running the ATA patch Andrey provided, yes?
    >>>
    >>> It is clean install of FreeBSD 7.0-RELEASE-p5 amd64 without patches.
    >>>
    >>>
    >>>>> Can somebody test hot-swapping with SATA drives and confirm this
    >>>>> behavior? (I can't test it now, because machine is in
    >>>>> datacenter)
    >>>>
    >>>>
    >>>> I can test it on my P4SCE box.
    >>>>
    >>>> I'll check the interrupt rates after each step of the hot-swap to see
    >>>> if/when the problem starts.
    >>>
    >>> I'll check the interrupts next time too and will post results to this
    >>> thread.

    >>
    >>
    >> As promised, here are notes from my testing:
    >>
    >>
    >> First thing to note is that the BIOS on my P4SCE had the ICH5 SATA mode
    >> set to "Auto", which was causing PATA emulation to happen on the SATA
    >> controller, e.g. disk #0 == ata0-master, disk #1 == ata0-slave.
    >>
    >> I changed the BIOS option from Auto to "SATA Enhanced", and now the
    >> disks show up on their own channels, e.g. disk #0 == ata2-master, disk
    >> #1 == ata3-master.
    >>
    >> Here's the applicable data. Note that this kernel ***DOES*** include
    >> Andrey's ATA patch:
    >>
    >> FreeBSD testbox.home.lan 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #0: Thu Oct 16 10:56:42 PDT 2008 root@testbox.home.lan:/usr/obj/usr/src/sys/TESTBOX i386
    >>
    >> atapci1: port 0xc000-0xc007,0xc400-0xc403,0xc800-0xc807,0xcc00-0xcc03,0xd000-0xd00f irq 18 at device 31.2 on pci0
    >> atapci1: [ITHREAD]
    >> ata2: on atapci1
    >> ata2: [ITHREAD]
    >> ata3: on atapci1
    >> ata3: [ITHREAD]
    >>
    >> SATA controller is on IRQ 18.
    >>
    >> ad4: 114473MB at ata2-master SATA150
    >> ad6: 238474MB at ata3-master SATA150
    >>
    >> ATA channel 2:
    >> Master: ad4 Serial ATA v1.0
    >> Slave: no device present
    >> ATA channel 3:
    >> Master: ad6 Serial ATA II
    >> Slave: no device present
    >>
    >> testbox# df -k
    >> Filesystem 1024-blocks Used Avail Capacity Mounted on
    >> /dev/ad4s1a 507630 230182 236838 49% /
    >> devfs 1 1 0 100% /dev
    >> /dev/ad4s1e 507630 12 467008 0% /tmp
    >> /dev/ad4s1f 108498334 2944826 96873642 3% /usr
    >> /dev/ad4s1d 2008622 32360 1815574 2% /var
    >> /dev/ad6s1d 236511738 4 217590796 0% /hotswap
    >>
    >> testbox# vmstat -i
    >> interrupt total rate
    >> irq4: sio0 1398 34
    >> irq6: fdc0 10 0
    >> irq15: ata1 58 1
    >> irq18: atapci1 945 23
    >> irq23: em1 8 0
    >> cpu0: timer 80033 1952
    >> cpu1: timer 79808 1946
    >> Total 162260 3957
    >>
    >> testbox# umount /hotswap
    >> testbox# atacontrol detach ata3
    >> subdisk6: detached
    >> ad6: detached
    >> testbox# vmstat -i | grep atapci1
    >> irq18: atapci1 2671 11
    >>
    >> At this point I wanted to see what happened if I just reattached without
    >> any physical changes to the SATA bus.
    >>
    >> testbox# atacontrol attach ata3
    >> ata3: [ITHREAD]
    >> ad6: 238474MB at ata3-master SATA150
    >> Master: ad6 Serial ATA II
    >> Slave: no device present
    >>
    >> testbox# vmstat -i | grep atapci1
    >> irq18: atapci1 2764 9
    >> testbox# mount /dev/ad6s1d /hotswap
    >> testbox# vmstat -i | grep atapci1
    >> irq18: atapci1 2779 8
    >>
    >> Now we're going to try detaching *without* umounting the filesystem,
    >> then reattaching to see what happens. Based on what I've seen and
    >> others have reported in the past, this should panic the kernel.
    >> Supposedly this problem is fixed on CURRENT.
    >>
    >> testbox# atacontrol detach ata3
    >> subdisk6: detached
    >> ad6: detached
    >>
    >> testbox# atacontrol attach ata3
    >> ata3: [ITHREAD]
    >> ad6: 238474MB at ata3-master SATA150
    >> Master: ad6 Serial ATA II
    >> Slave: no device present
    >>
    >> testbox# df -k
    >> Filesystem 1024-blocks Used Avail Capacity Mounted on
    >> /dev/ad4s1a 507630 230182 236838 49% /
    >> devfs 1 1 0 100% /dev
    >> /dev/ad4s1e 507630 12 467008 0% /tmp
    >> /dev/ad4s1f 108498334 2944826 96873642 3% /usr
    >> /dev/ad4s1d 2008622 32360 1815574 2% /var
    >> /dev/ad6s1d 236511738 4 217590796 0% /hotswap
    >>
    >> testbox# ls -l /hotswap
    >>
    >>
    >> Fatal trap 12: page fault while in kernel mode
    >> cpuid = 0; apic id = 00
    >> fault virtual address = 0xc0
    >> fault code = supervisor read, page not present
    >> instruction pointer = 0x20:0xc0503ca7
    >> stack pointer = 0x28:0xe6310a5c
    >> frame pointer = 0x28:0xe6310a5c
    >> code segment = base 0x0, limit 0xfffff, type 0x1b
    >> = DPL 0, pres 1, def32 1, gran 1
    >> processor eflags = interrupt enabled, resume, IOPL = 0
    >> current process = 795 (ls)
    >> [thread pid 795 tid 100043 ]
    >> Stopped at dev2udev+0x11: movl 0xc0(%eax),%eax
    >>
    >> db> bt
    >> Tracing pid 795 tid 100043 td 0xc3dcc460
    >> dev2udev(3287166208,3861973668,3228755039,38619738 72,3286025312,...) at dev2udev+17
    >> ufs_getattr(3861973664,3861973800,3227504003,32297 00640,3861973664,...) at ufs_getattr+222
    >> VOP_GETATTR_APV(3229700640,3861973664,3229768320,3 288955040,3861973684,...) at VOP_GETATTR_APV+68
    >> vn_stat(3288955040,3861973908,3286230784,0,3286025 312,...) at vn_stat+73
    >> kern_lstat(3286025312,135344488,0,3861974040,38619 74064,...) at kern_lstat+147
    >> lstat(3286025312,3861974268,8,3861974328,386197431 6,...) at lstat+43
    >> syscall(3861974328) at syscall+814
    >> Xint0x80_syscall() at Xint0x80_syscall+32
    >> --- syscall (190, FreeBSD ELF32, lstat), eip = 1746463051, esp = 3217024524, ebp = 3217024664 ---
    >>
    >> Yup, there's the panic. :-)
    >>
    >> I rebooted the box from db, brought the system up in single-user, fsck'd
    >> all the disks/filesystems (no anomalies were found), and rebooted the
    >> box once more.
    >>
    >> Now we're going to do everything properly: unmount /hotswap, detach,
    >> yank the disk and insert a new Maxtor hard disk, attach, and see what
    >> happens.
    >>
    >> testbox# umount /hotswap
    >> testbox# atacontrol detach ata3
    >> subdisk6: detached
    >> ad6: detached
    >>
    >> testbox# vmstat -i | grep atapci1
    >> irq18: atapci1 1174 6
    >>
    >> I've now removed the disk physically from the machine. Let's check
    >> interrupts again.
    >>
    >> testbox# vmstat -i | grep atapci1
    >> irq18: atapci1 1185 4
    >>
    >> Now the new Maxtor disk has been inserted. LEDs for the SATA hot-swap
    >> backplane lit up for about 5-6 seconds, then went off. Let's check
    >> interrupts at this point:
    >>
    >> testbox# vmstat -i | grep atapci1
    >> irq18: atapci1 1193 3
    >>
    >> Now let's attach. Note that there is no filesystem on this disk (it's
    >> completely blank), so there's nothing to mount.
    >>
    >> testbox# atacontrol attach ata3
    >> ata3: [ITHREAD]
    >> ad6: 286188MB at ata3-master SATA150
    >> Master: ad6 Serial ATA v1.0
    >> Slave: no device present
    >>
    >> And now we check interrupts:
    >>
    >> testbox# vmstat -i | grep atapci1
    >> irq18: atapci1 1258 2

    >
    >
    > I played again with hot-swapping disks in Sun Fire X2100 M2 on FreeBSD
    > 7.0-RELEASE-p5 i386 without ATA patches.
    > Both disks (ad4 + ad6) are in gmirror. There were high interrupts load
    > again! I tracked it to the point of pulling out the disk.
    > Interrupt was OK after 'atacontrol detach', but rise up after disk was
    > removed. When the disk is inserted back (same disk), interrupts are
    > going to normal rate without need to reboot.
    > I tried it three times and behavior was always the same.
    > It can be related to the use of gmirror.


    Interesting -- yeah, that's the one difference (besides hardware)
    between your tests and mine: you're using gmirror, I'm not.

    I'll take some time this weekend (or the upcoming weekend) to set up
    gmirror on the above test box, and see if I can reproduce what you're
    seeing. This will be my first experience with gmirror.

    > Side note: If the disk was detached by 'atacontrol detach ata2' without
    > removing from gmirror (without gmirror remove or gmirror deactivate) and
    > then pulled out + inserted back, it was automagically attached without
    > need of 'atacontrol attach ata2' and gmirror synchronization was
    > autostarted.


    This could mean that gmirror is constantly "polling" the underlying
    device layer for certain things, which might explain the high interrupt
    rate you're seeing. We should probably involve pjd@ in this discussion.

    --
    | Jeremy Chadwick jdc at parodius.com |
    | Parodius Networking http://www.parodius.com/ |
    | UNIX Systems Administrator Mountain View, CA, USA |
    | Making life hard for others since 1977. PGP: 4BD6C0CB |

    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


+ Reply to Thread
Page 2 of 2 FirstFirst 1 2