fsck system freeze - Help


Thread: fsck system freeze

  1. fsck system freeze

    Hi,
    I've got Red Hat 9.0 installed on a system with a couple of smaller
    disks and a 200 GB disk. We had a power failure and the system stopped
    booting. I checked, and it was forcing a check on the 200 GB drive,
    which froze the computer. I cancelled out of it and set the last
    column for the drive in fstab (the order in which to run fsck) to 0 to
    prevent fsck from running, and the system now boots fine. I then
    tried running fsck from an ssh session after unmounting the drive, and
    the system froze completely again. I did the same from a console
    with -C and -V (the partition is ext3), and during 'Pass 1' it died at
    around 26.1%. Again the entire system died, including the GUI and ssh
    sessions; I couldn't even ping the machine. Any idea what's going
    on and how I can fix it? Thanks a bunch.

    Muhammed
    PS: I can read data from the drive fine. I haven't read everything, but
    I transferred 10-20 GB off the disk without any problems.
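
    For reference, the fstab change described above amounts to setting the
    sixth field (fs_passno) of the drive's entry to 0. A minimal sketch,
    assuming the entry is keyed by device name (entries written as LABEL=...
    would not match); the function name disable_boot_fsck and the output
    path in the usage comment are made up for illustration:

```shell
# Set the sixth fstab field (fs_passno) to 0 for one device so that the
# boot-time fsck skips it. Reads fstab text on stdin, writes to stdout.
# Note: awk rebuilds the matching line with single spaces between fields.
disable_boot_fsck() {
    awk -v dev="$1" '$1 == dev { $6 = 0 } { print }'
}

# Possible use, writing to a copy rather than the live file:
#   disable_boot_fsck /dev/hdg1 < /etc/fstab > /tmp/fstab.new
```

    Reviewing the copy before moving it into place keeps a typo from
    breaking the next boot.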

  2. Re: fsck system freeze

    On Thu, 29 Jul 2004 16:24:37 -0700, Muhammed wrote:

    > [...]


    This looks like a driver issue with the controller that surfaces
    under sustained, heavy I/O. Some more information would help:
    - Any messages in /var/log/* relevant to the problem.
    - Which disk is which (hd?), whether the 200 GB disk and the others are
    on the same controller chip, and the model of the controller that drives
    the 200 GB disk.
    - Whether the system hangs when you force an fsck on another disk (but
    don't try a forced fsck on the root disk, just in case. It would be safest
    if you can get hold of an unused disk temporarily and use it in your tests.)
    - Whether it is feasible for you to switch to another distro (is this a
    server or a desktop?). At the least, a kernel upgrade would help if it
    turns out to be a driver issue.
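
    To pull out the log messages for one drive, a filter along these lines
    may be enough. It is only a sketch: ide_messages is a hypothetical
    helper name, and it assumes the kernel prefixes its IDE messages with
    the device name and a colon (e.g. "hdg: ..."):

```shell
# Print only the log lines that mention a given IDE device (e.g. hdg).
# Reads log text on stdin, so it works with a log file or with dmesg.
ide_messages() {
    grep -E "(^|[^[:alnum:]_])$1: "
}

# Possible use (log path may differ on your system):
#   ide_messages hdg < /var/log/messages
#   dmesg | ide_messages hdg
```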

    I would also suggest that you run the tests in a virtual console (as
    opposed to X), because the kernel might post messages to the console
    just before the freeze, and you won't get a chance to see them if you
    are working in X.

    A quick test might also be helpful: try the "noapic", "pci=noacpi", and
    "acpi=off" kernel boot parameters in turn, and see if any of them solves
    the problem. E.g.:

    boot: linux noapic
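
    After booting, it may be worth confirming that the parameter really
    reached the kernel. A minimal sketch, assuming /proc/cmdline is
    available; has_boot_param is a hypothetical helper name:

```shell
# Succeed if the kernel command line contains the given parameter as a
# whole word. The command line is taken from the second argument if
# given, otherwise from /proc/cmdline.
has_boot_param() {
    param=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Possible use:
#   has_boot_param noapic && echo "noapic is in effect"
```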

    HTH
    --
    Abdullah | aramazan@ |
    Ramazanoglu | myrealbox |
    ________________| D-O-T cm |


  3. Re: fsck system freeze

    Abdullah Ramazanoglu wrote in message news:<4109951b$0$35880$14726298@news.sunsite.dk>...
    > [...]


    Hi,
    I tried all three (linux single noapic, acpi=off, etc.). No luck. I didn't
    find anything mentioning that drive in the logs except a DMA timeout
    error:

    spurious 8259A interrupt: IRQ7
    hdg: dma_timer_expiry dma status == 0x21
    hdg: timeout waiting for DMA
    hdg: (__ide_dma_test_irq) called while not waiting

    Anyway, I checked the DMA status with hdparm -i, and it says DMA is
    active on the drive. I also tried running fsck on another drive (on
    the same controller), and it simply gave a "volume clean" message. I
    couldn't find any switch to force it to run through the drive.

    The disk config is as follows:
    2 x 200 GB on a U133 controller
    80 GB on the built-in U100 controller

    hde1 and hda1 = 30 GB mirrored RAID
    hda2 = 50 GB
    hdg1 = 200 GB
    hde2 = 170 GB

    The disk is working fine. It's just that not being able to run fsck is
    creeping me out; it suggests something is wrong that I don't know about.
    Thanks again.

    Muhammed

  4. Re: fsck system freeze

    On Sat, 31 Jul 2004 16:25:05 -0700, Muhammed wrote:
    > [...]


    Hello Muhammed,

    "hdparm -d0 /dev/hdg" would disable DMA on hdg and probably alleviate the
    problem, at the cost of a severe performance penalty (particularly if
    this drive usually carries a high I/O load). So I would suggest
    disabling DMA only as a last resort.
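
    To see the current setting before and after such a change, something
    like the following might help. It is a small sketch: dma_flag is a
    hypothetical helper name, and it assumes "hdparm -d" prints a line of
    the form "using_dma = 1 (on)":

```shell
# Extract the DMA flag from `hdparm -d` output read on stdin.
# Prints 1 (on) or 0 (off); prints nothing if the line is missing.
dma_flag() {
    awk '$1 == "using_dma" { print $3 }'
}

# Possible use (as root):
#   hdparm -d /dev/hdg | dma_flag    # query the current state
#   hdparm -d0 /dev/hdg              # disable DMA
#   hdparm -d1 /dev/hdg              # re-enable DMA afterwards
```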

    To force a check on hde2, unmount it and run "e2fsck -f -p -C 0
    /dev/hde2" (the -C 0 is optional), though I guess that would freeze the
    system too. If it does, the problem is attributable to the controller
    (i.e. its driver). BTW, you have not stated your controller models. The
    "lspci" command shows them.
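
    Since e2fsck should not be run on a mounted file system, a cautious
    wrapper could look like this. It is a sketch only: is_mounted and
    forced_check are hypothetical names, and it assumes the device shows up
    in /proc/mounts under the same name you pass in:

```shell
# Succeed if the device appears in the first column of a mounts table.
# The table defaults to /proc/mounts; a file can be given for testing.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Run the forced check only if the device is not mounted.
forced_check() {
    if is_mounted "$1"; then
        echo "$1 is mounted; unmount it first" >&2
        return 1
    fi
    e2fsck -f -p -C 0 "$1"
}

# Possible use (as root):
#   forced_check /dev/hde2
#   lspci | grep -i ide     # list the IDE controllers
```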

    But before trying a forced check on hde, I would suggest rewiring hdg as
    hdc. The second channel on the built-in controller seems to be free and
    offers a viable alternative. When you rewire hdg as hdc, also change all
    references to hdg in /etc/fstab to hdc. After the change, try
    "e2fsck -p /dev/hdc1" and see if it makes a difference. If it runs
    without problems, then the kernel driver for your off-board IDE controller
    must be the culprit: it falters under sustained heavy I/O. In that
    case I would suggest either upgrading the kernel to the latest version
    offered by Red Hat (for Red Hat 9) or upgrading to a newer distro.
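
    The fstab change can be done by hand, or roughly like this. A minimal
    sketch, assuming the entries use /dev/hdX device names rather than
    LABEL= entries; rename_disk is a hypothetical helper name and the
    output path in the usage comment is arbitrary:

```shell
# Rewrite references to one IDE disk as another in fstab-style text
# (stdin to stdout), e.g. /dev/hdg1 becomes /dev/hdc1. Only device names
# followed by an optional partition number and whitespace are changed.
rename_disk() {
    sed "s#/dev/$1\([0-9]*[[:space:]]\)#/dev/$2\1#"
}

# Possible use, writing to a copy for review first:
#   rename_disk hdg hdc < /etc/fstab > /tmp/fstab.new
```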

    If e2fsck on hdc hangs, then there must be something wrong with either the
    particular disk at hand, or the system in general. To see which is at
    fault, try "e2fsck -fp /dev/hde2". If it works, then the disk (hdc) is at
    fault. If it doesn't, then the system is at fault, so I would suggest a
    kernel (or distro) upgrade again.
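
    The reasoning in the last two paragraphs can be summed up as a small
    decision table. This is just a sketch of the logic; the function name
    diagnose and the "ok"/"hang" labels are my own shorthand:

```shell
# Map the outcome of the two tests to a suspect.
#   $1: result of e2fsck on the 200 GB disk rewired as hdc ("ok" or "hang")
#   $2: result of the forced e2fsck on hde2 ("ok" or "hang"); only
#       needed when the first test hangs
diagnose() {
    case "$1:$2" in
        ok:*)      echo "off-board controller driver" ;;
        hang:ok)   echo "the 200 GB disk itself" ;;
        hang:hang) echo "the system in general" ;;
        *)         echo "unknown result" ; return 2 ;;
    esac
}

# Possible use:
#   diagnose hang ok
```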

    To confirm your findings, you can attach the disk to another Linux machine
    (with a different IDE controller, and preferably a distro other than
    Red Hat 9) and try an e2fsck there.

    If it turns out that a kernel upgrade is needed, please don't put it off.
    It would mean that your controller (or the whole system) cannot handle
    prolonged heavy I/O, and the system will freeze again the next time such
    a load occurs. Also try to avoid prolonged use of a file system that is
    in an error state (your current hdg1), because the errors can spread and
    may eventually leave the file system beyond repair by e2fsck.

    BTW, before trying anything at all, please double-check the IDE cable
    connections and make sure that the off-board IDE controller is firmly
    seated in its PCI slot, just in case.

    Because I was thinking and writing at the same time, my suggestions have
    come out in a somewhat unusual order: the first thing to try comes last,
    and the last thing to try comes first.

    P.S.: If you decide to upgrade the distro (as opposed to the kernel) and
    you are used to RPM-based systems and want to stay with them, I would
    suggest Mandrake 10.0, an RPM-based Red Hat derivative. If you wouldn't
    mind switching to another package format, I would suggest Debian.

    Regards

    --
    Abdullah | aramazan@ |
    Ramazanoglu | myrealbox |
    ________________| D-O-T cm |

