How can Linux damage a motherboard? - Hardware


  1. How can Linux damage a motherboard?

    Hi,
    I have recently purchased a few AMD Opteron based servers. I installed
    Linux with the 2.6 kernel and started configuring. I ran into some
    configuration problems, so after a few hours of changing settings and
    installing components I abandoned the installed image and decided to
    plug in a different drive with an image that was installed on a different
    (Intel based) server. I started up the server and it booted as normal,
    except that the server, which had previously run quietly during boot,
    started to get noisy. The fans started to work harder halfway through the
    boot process, after a while they ran flat out for a few seconds, and then
    the system initiated a shutdown and powered off without asking.

    I restarted the server, and this time it ran for only about 10-15
    seconds with increasing fan speed before shutting down again.

    On the third try the server ran for only a couple of seconds: the fans
    had just enough time to speed up before the system shut down. I tried
    again and again, but with no change.

    I checked the CPU and all components for heat. None was overheating; all
    seemed fine.

    I then unpacked the second server and started using it. After plugging in
    the "second" image, the one created on a different system, the server
    started the same behaviour and became unusable. At this stage I had no
    idea what was causing the problem. I left the two servers to rest, and
    after a few hours I did a few tests.

    After "cooling off", the servers will turn on and operate for a few
    minutes, but once they do the first shutdown as before, they will only
    operate for another 10-15 seconds, and any subsequent attempt at firing
    them up lasts a second or two. Again, leaving them for a few hours lets
    me get them going for a few minutes.

    I called the manufacturer and explained the situation. At this stage I
    was not aware of the effect of the second image. Going through the
    symptoms with the tech guy on the other end of the line, we came to the
    conclusion that this is a motherboard problem, and they are dispatching
    one to me.

    In the evening I decided to unpack the third server and, using the "first
    image", continued the abandoned configuration. Another 2 hours went past
    using the 3rd server without any problem, but I could not get the system
    configured the way I needed, so I plugged in the "second image" and
    booted up the 3rd server again. During the boot, while the server was
    going through the different run levels, I heard the fans speeding up and
    fluctuating somewhat. I suddenly realised that the problem had only
    started on all 3 servers when I booted them with this image. I halted the
    boot process, but it was too late. After unplugging the drive from the
    3rd machine and trying to reboot, the server ran for a few minutes and
    shut down as the other 2 had done previously.

    Here it became clear that Linux has made some changes, altered something
    in the BIOS or who knows where, and I have to reset whatever has changed.

    I tried to reset the BIOS with the DIP switch, set the BIOS back to
    defaults, and removed the power cable from the motherboard together with
    the BIOS battery. I left it to rest for a little while and started up
    again, but no change.

    I even managed to update the BIOS hoping to overwrite any changes, but
    to no effect. The servers still behave the same way. I plugged in a
    different power supply, which still made no difference.

    My conclusion at this stage is that some sensor or other mechanism has
    been adjusted or damaged by the OS, and none of the above attempts to
    reset it has worked!

    What could it be? How can an operating system cause such an irreversible
    change?

    TIA,

    Tom



  2. Re: How can Linux damage a motherboard?

    On Sat, 21 Apr 2007 05:23:27 +1000, Tom Szabo wrote:

    > Here it became clear that Linux has made some changes, altered something
    > in the BIOS or who knows where, and I have to reset whatever has changed.

    Linux never touches the BIOS at all. Not even for the clock. Period.

    It is clear you've received a series of lemons.
    The motherboard, an overheating power supply, or the RAM is defective.
    Who knows.





  3. Re: How can Linux damage a motherboard?

    In message <462912fc$1$25450$5a62ac22@per-qv1-newsreader-01.iinet.net.au>,
    Tom Szabo wrote:

    > Hi,
    > I have recently purchased a few AMD Opteron based servers. I installed
    > Linux with the 2.6 kernel and started configuring. I ran into some
    > configuration problems, so after a few hours of changing settings and
    > installing components I abandoned the installed image and decided to
    > plug in a different drive with an image that was installed on a different
    > (Intel based) server. I started up the server and it booted as normal,
    > except that the server, which had previously run quietly during boot,
    > started to get noisy. The fans started to work harder halfway through the
    > boot process, after a while they ran flat out for a few seconds, and then
    > the system initiated a shutdown and powered off without asking.
    >
    > I restarted the server, and this time it ran for only about 10-15
    > seconds with increasing fan speed before shutting down again.
    >
    > On the third try the server ran for only a couple of seconds: the fans
    > had just enough time to speed up before the system shut down. I tried
    > again and again, but with no change.

    If the fans do that, something thinks you're overheating. I know that on a
    system here, it was generally fine when not working hard, but due to poor
    contact between the heatsink and the processor, any serious computing would
    heat up the processor and cause the BIOS alarms to trip. It never actually
    got hot enough for a shutdown, but then it was using a Core 2 Duo so I
    suspect it just wound the clock speed down which I don't think AMD
    processors do (certainly not the older ones, anyway).

    During boot it's working hard, especially if it's trying to reconfigure the
    boot sequence for a new motherboard, so I'd check the seating of the
    heatsink on the processor even if it doesn't seem to be too hot by the time
    you take it apart.

    --
    Dave
    mail da ve@llondel.org (without the space)
    http://www.llondel.org
    So many gadgets, so little time

  4. Re: How can Linux damage a motherboard?



    So you're trying to run the Opteron box off a hard disk Linux installation
    that was set up on a different CPU architecture and motherboard? It's no
    surprise that it didn't work too well. Intel and AMD use different power
    management systems: Opteron uses "Cool'n'Quiet" and Intel uses SpeedStep.
    Since the Intel and AMD 32-bit and 64-bit architectures are pretty
    compatible (EM64T<->AMD64, or i386 on either), the kernel itself was OK;
    it does some hardware detection at boot time, so it should work on
    either. But when it loaded cpuspeed, the power driver is hard-coded in
    the /etc/cpuspeed.conf file on most Linux distributions, so it was likely
    trying to load an inappropriate power module. Also, /etc/modprobe.conf
    might be suggesting modules to the kernel for loading that are really
    for Intel.

    Try installing Linux on the motherboard from scratch, or only use an
    image on a hard disk from another server that has exactly the same
    hardware. Another easy test would be to try a live Linux CD like Knoppix
    and see if it reproduces the weird fan problem. I bet it won't.

    Mark
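
The mismatch described in this post could be checked before booting a transplanted image. The following is a minimal sketch, not a definitive tool: it assumes the cpuspeed configuration uses a `DRIVER=` line, and the vendor-to-driver table is an illustrative guess for 2.6-era kernels. The function names are hypothetical.

```python
import re

# Assumed vendor-to-driver mapping for 2.6-era kernels; adjust for your system.
EXPECTED_DRIVERS = {
    "AuthenticAMD": {"powernow-k8", "acpi-cpufreq"},
    "GenuineIntel": {"speedstep-centrino", "acpi-cpufreq", "p4-clockmod"},
}


def cpu_vendor(cpuinfo_text):
    """Return the first vendor_id found in /proc/cpuinfo-style text, or None."""
    match = re.search(r"^vendor_id\s*:\s*(\S+)", cpuinfo_text, re.MULTILINE)
    return match.group(1) if match else None


def configured_driver(conf_text):
    """Return the value of an uncommented DRIVER= line, or None if unset."""
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("#") or not line.startswith("DRIVER="):
            continue
        value = line.split("=", 1)[1].strip().strip('"\'')
        return value or None
    return None


def driver_mismatch(cpuinfo_text, conf_text):
    """True when a driver is configured that is not expected for this vendor."""
    vendor = cpu_vendor(cpuinfo_text)
    driver = configured_driver(conf_text)
    if vendor is None or driver is None:
        return False
    # Unknown vendors are treated as "no opinion" rather than a mismatch.
    return driver not in EXPECTED_DRIVERS.get(vendor, {driver})
```

In practice the two arguments would be the contents of /proc/cpuinfo on the target machine and of the cpuspeed configuration file on the transplanted disk.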

  5. Re: How can Linux damage a motherboard?

    Hi,

    I am not sure who you are, but I am mainly interested in answers from
    people who read the post. I am not on a crusade against Linux; I like it
    a lot more than I do Windows.

    At the same time, considering the sequence of events, something has
    happened that was triggered by Linux. Not by any Linux, just one
    installation (one that possibly went slightly wrong). As I mentioned, it
    was created on a completely different type of machine, then shifted over.

    Regarding your theory of a faulty batch, you need to take into
    consideration that the servers are actually not from the same batch.
    While I purchased the first two together, their serial numbers are miles
    apart, and the third one is a different subversion based on the same
    motherboard.

    Considering that the manufacturer has not yet heard of this problem, and
    we are talking about a well-known manufacturer, one of the largest in the
    world, I find it hard to believe that all 3 servers, apparently not from
    the same batch, end up in my office, work happily until I stick in the
    drive with the particular image, and then all three decide to act faulty
    in the same way.

    If you believe that I am telling the truth and my observations are
    correct, you can calculate the odds of your "lemon theory"; as far as I
    am concerned the probability converges to ZERO.
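
The odds argument can be made concrete with a small calculation. This is only a sketch: the 2% per-server defect rate is a made-up figure, not one from the thread, and the result holds only if the three failures are independent.

```python
def all_fail_probability(p_single, units):
    """Probability that every unit fails, assuming independent failures."""
    return p_single ** units


# Hypothetical 2% defect rate per server; not a figure from the thread.
print(all_fail_probability(0.02, 3))
```

With that assumed rate the chance of three independent lemons is about 8 in a million, which is why a shared cause is the more plausible explanation.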

    While I have been working with computers for about 14 years now,
    troubleshooting etc., my knowledge of hardware is not very deep, but I
    have good analytical skills and am quite efficient at isolating problem
    areas.

    Here I know how to reproduce the problem. I know it is somewhere in the
    motherboard. I know that it is not the result of just any Linux
    installation, but I have a disk with one particular installation which,
    when put into a certain type of machine, causes some change that
    influences the power management, some safety mechanism, or something
    along those lines, and that is not reset or reversed by a BIOS reset or
    BIOS flash.

    What I am asking here is: what can this be? I am interested in any
    logical reasoning, but there is no use wasting my time by saying it can't
    happen just because you haven't seen it.

    To be fair, if someone told me this story, I would doubt it too, I must
    admit. Anyway, I am open to any constructive approach, questions or
    suggestions.

    TIA,

    Tom

    "Burzum" wrote in message
    news:pan.2007.04.20.19.58.36.247548@127.0.0.1...
    > On Sat, 21 Apr 2007 05:23:27 +1000, Tom Szabo wrote:
    >
    > > Here it became clear that Linux has made some changes, altered something
    > > in the BIOS or who knows where, and I have to reset whatever has changed.
    >
    > Linux never touches the BIOS at all. Not even for the clock. Period.
    >
    > It is clear you've received a series of lemons.
    > The motherboard, an overheating power supply, or the RAM is defective.
    > Who knows.

  6. Re: How can Linux damage a motherboard?



    You haven't said anything about which distro or kernel you are using. If
    it's an old kernel then it might not support your motherboards. RHEL 4
    uses a very old kernel, 2.6.9, which won't support most modern machines.
    I don't know what the official means of upgrading a RHEL 4 kernel is;
    I've only used the clones, but I had to put a 2.6.18 kernel on my A64
    laptop before I could get Scientific Linux 4.4 to work reliably. FC6 uses
    the most recent kernel, 2.6.20.x, so it will work on your systems. Also,
    RHEL 5 uses a 2.6.18 kernel, which will probably work.

  7. Re: How can Linux damage a motherboard?

    Hi Mark,

    > So you're trying to run the Opteron box off a hard disk Linux installation
    > that was set up on a different CPU architecture and motherboard? It's no
    > surprise that it didn't work too well. Intel and AMD use different power
    > management systems: Opteron uses "Cool'n'Quiet" and Intel uses SpeedStep.
    > Since the Intel and AMD 32-bit and 64-bit architectures are pretty
    > compatible (EM64T<->AMD64, or i386 on either), the kernel itself was OK;
    > it does some hardware detection at boot time, so it should work on
    > either. But when it loaded cpuspeed, the power driver is hard-coded in
    > the /etc/cpuspeed.conf file on most Linux distributions, so it was likely
    > trying to load an inappropriate power module. Also, /etc/modprobe.conf
    > might be suggesting modules to the kernel for loading that are really
    > for Intel.

    What you are saying regarding the power management is in line with my
    thoughts; I suspected something along these lines.

    > Try installing Linux on the motherboard from scratch, or only use an
    > image on a hard disk from another server that has exactly the same
    > hardware. Another easy test would be to try a live Linux CD like Knoppix
    > and see if it reproduces the weird fan problem. I bet it won't.

    On the other hand, your suggestion overlooks the fact that the server now
    doesn't run for more than five minutes on the first go after a couple of
    hours of resting, and the second and subsequent startups last only a few
    seconds.

    Here is my real dilemma:
    The OS loaded some driver, module, etc. and changed some behaviour. That
    is understandable, but that should only affect the server when the OS is
    loading.
    Here, on the other hand, once the wrong image has been booted the first
    time, the problem becomes permanent and no longer depends on the OS, so
    the changes are written somewhere into/onto the motherboard. That is
    fine, I can live with that too.

    But where do they get written to, so that neither the BIOS reset nor the
    BIOS patch can reverse it?

    Considering all the symptoms, it seems like some sensory process gets
    adjusted to just below the normal operating temperature.
    For example, say the CPU normally operates at 50 degrees and the max
    operating temperature is normally 80 degrees.
    Using this example, my stuffed image somehow managed to adjust the max
    operating temperature to 49.8 degrees. When I turn on the machine the
    first time, it takes a little while to get up to 50 degrees, so I have
    say 5 minutes, but the next round only 10 seconds, and after that it is
    down to 1 or 2 seconds.

    Any more clues?

    TIA,

    Tom








  8. Re: How can Linux damage a motherboard?

    Hi Dave,

    > If the fans do that, something thinks you're overheating. I know that on a
    > system here, it was generally fine when not working hard, but due to poor
    > contact between the heatsink and the processor, any serious computing
    > would heat up the processor and cause the BIOS alarms to trip. It never
    > actually got hot enough for a shutdown, but then it was using a Core 2 Duo
    > so I suspect it just wound the clock speed down which I don't think AMD
    > processors do (certainly not the older ones, anyway).

    Thanks for your thoughts, but if I think about the odds of having 3 of
    them fail in exactly the same way, there is little chance, too little.
    Besides, I have checked one.

    > During boot it's working hard, especially if it's trying to reconfigure
    > the boot sequence for a new motherboard, so I'd check the seating of the
    > heatsink on the processor even if it doesn't seem to be too hot by the
    > time you take it apart.

    They are seated well, the screws are tight, all seems fine. Besides, it is
    unlikely that they were fine for hours and then suddenly all 3 changed. It
    must be something else.

    Thanks anyway,

    Tom



  9. Re: How can Linux damage a motherboard?

    Sorry,

    Debian Etch (stable), kernel 2.6.18 I believe.

    Regards, Tom



  10. Re: How can Linux damage a motherboard?



    On Sat, 21 Apr 2007, Tom Szabo wrote:

    > Hi,
    >
    > I am not sure who you are, but I am mainly interested in answers from
    > people who read the post. I am not on a crusade against Linux; I like it
    > a lot more than I do Windows.
    >
    > At the same time, considering the sequence of events, something has
    > happened that was triggered by Linux. Not by any Linux, just one
    > installation (one that possibly went slightly wrong). As I mentioned, it
    > was created on a completely different type of machine, then shifted over.
    >
    > Regarding your theory of a faulty batch, you need to take into
    > consideration that the servers are actually not from the same batch.
    > While I purchased the first two together, their serial numbers are miles
    > apart, and the third one is a different subversion based on the same
    > motherboard.


    Could it be related to opening the case to install the other disk rather
    than what was on the disk? Or, could it be a faulty disk that damaged the
    interface on the MB?


    > Considering that the manufacturer has not yet heard of this problem, and
    > we are talking about a well-known manufacturer, one of the largest in the
    > world, I find it hard to believe that all 3 servers, apparently not from
    > the same batch, end up in my office, work happily until I stick in the
    > drive with the particular image, and then all three decide to act faulty
    > in the same way.


    As suggested above, there are 3 common factors: the disk image, the
    physical disk and the process of opening the case. You need to investigate
    these latter 2 issues.


  11. Re: How can Linux damage a motherboard?


    > Could it be related to opening the case to install the other disk rather
    > than what was on the disk? Or, could it be a faulty disk that damaged the
    > interface on the MB?


    The 2 drives are connected to a Perc4/SC. The same card was used in 2
    machines and a second one in the third, makes no difference.


    > As suggested above, there are 3 common factors: the disk image, the
    > physical disk and the process of opening the case. You need to investigate
    > these latter 2 issues.


    These made and make no difference. I have checked all of these; the case
    has no intrusion sensor/alarm. After the first one started to play up, I
    thought I might have cooked it by not having the top on (this is a 1U
    unit), so with the next two I made sure that they did not operate
    without the lid.

    I have also checked: it is not critical for these servers to have the
    lid on.



  12. Re: How can Linux damage a motherboard?

    On Sat, 21 Apr 2007 06:45:07 +1000, Tom Szabo wrote:

    > Hi,
    >
    > I am not sure who you are, but I am mainly interested in answers from
    > people who read the post. I am not on a crusade against Linux; I like it
    > a lot more than I do Windows.

    First of all, top posting is evil.
    Second point, your troubles are clearly independent of the OS.
    I wouldn't believe 3 such casualties because of XP, let alone Linux.

    The fact that the manufacturer hasn't yet heard of this problem is worth
    zilch. I could produce a rather long list of troubles from the largest
    manufacturer that are yet to be addressed (you know, the one with 4
    letters...).
    No Chinese clone ever had troubles with a wrong "disk image", and if 3
    expensive servers could be damaged by an installation CD I would look
    elsewhere. Let alone the fact that this rubbish needed to "cool off":
    ridiculous.

    It is clearly a hardware problem, unless you have ferrite memories from
    the 60s inside. End of story.

    Now which piece is at fault I don't know and don't care.
    It's the manufacturer's problem to fix things up. You paid big money for
    those boxes, you have the right to have everything fine and dandy.
    And not stupid excuses like the BIOS is gone because you don't use XP Pro.



  13. Re: How can Linux damage a motherboard?

    "Tom Szabo" wrote:
    ....
    >> work on either. But when it loaded cpuspeed, the power driver
    >> is hard-coded in the /etc/cpuspeed.conf file on most Linux
    >> distributions, so it was likely trying to load an inappropriate
    >> power module. Also, /etc/modprobe.conf might be suggesting
    >> modules to the kernel for loading that are really for Intel.
    >
    >What you are saying regarding the power management is in line with my
    >thoughts; I suspected something along these lines.

    That is almost certainly the general idea of what is happening.

    >> Try installing Linux on the motherboard from scratch, or only use an
    >> image on a hard disk from another server that has exactly the same
    >> hardware. Another easy test would be to try a live Linux CD like Knoppix
    >> and see if it reproduces the weird fan problem. I bet it won't.
    >
    >On the other hand, your suggestion overlooks the fact that the server now
    >doesn't run for more than five minutes on the first go after a couple of
    >hours of resting, and the second and subsequent startups last only a few
    >seconds.

    He is correct. Although your attempts at running a system
    configured for another motherboard might prevent you from easily
    doing that now.

    >Here is my real dilemma:
    >The OS loaded some driver, module, etc. and changed some behaviour.

    Almost certainly it has to do with controlling the fans, which
    makes it almost guaranteed to be lm_sensors related. The
    potential difficulty is that you have now written a
    configuration to the chip which monitors temperatures, and if so
    that has to be cleared in order to prevent the faulty
    temperature shutdown.
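
If a machine stays up long enough, the limits the monitoring chip currently reports could be inspected from a rescue system. The sketch below assumes the kernel exposes the chip through the hwmon sysfs interface (`temp*_input` and `temp*_max` files holding millidegrees Celsius); on older 2.6 kernels the files may sit in a `device/` subdirectory instead, and the 40 degree floor is an arbitrary illustrative threshold. It only reads values and changes nothing on the chip.

```python
import glob
import os


def read_millidegrees(path):
    """Read a hwmon value (millidegrees Celsius) and return degrees, or None."""
    try:
        with open(path) as handle:
            return int(handle.read().strip()) / 1000.0
    except (OSError, ValueError):
        return None


def suspicious_limits(hwmon_root="/sys/class/hwmon", floor_c=40.0):
    """List sensors whose max limit is below floor_c or below the current reading."""
    findings = []
    pattern = os.path.join(hwmon_root, "hwmon*", "temp*_input")
    for input_path in sorted(glob.glob(pattern)):
        max_path = input_path[: -len("_input")] + "_max"
        current = read_millidegrees(input_path)
        limit = read_millidegrees(max_path)
        if current is None or limit is None:
            continue  # sensor has no readable limit; nothing to compare
        if limit < floor_c or limit <= current:
            findings.append((input_path, current, limit))
    return findings
```

A non-empty result would support the theory that an alarm point has been set implausibly low; an empty one does not rule it out, since not every chip exposes its limits this way.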

    The exact nature of the problem appears to be that you have
    either selected the wrong type of temperature transducer, or
    given it a very low temperature as the alarm point, and the
    monitor chip thinks it is overheating when in fact it is not even
    at normal temperature. When it cools for a couple of hours, it
    actually gets down to room temp and takes a significant amount
    of time to heat up. In the process the fans go through each
    stage of control from barely on to hitting it full blast. Then
    it shuts down. Of course if it is immediately restarted it
    takes much less time to hit the alarm temp.
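
The warm-up behaviour described above can be illustrated with a simple first-order heating model. This is a sketch under stated assumptions: the model itself, the 60 second time constant, and the 22 degree room temperature are all hypothetical; only the 50 and 49.8 degree figures echo the example given earlier in the thread.

```python
import math


def seconds_to_trip(start_c, trip_c, steady_c, tau_s):
    """Time for a first-order warm-up from start_c to reach trip_c.

    Returns 0.0 if already at or above the trip point, and math.inf if the
    steady-state temperature never reaches it.
    """
    if start_c >= trip_c:
        return 0.0
    if steady_c <= trip_c:
        return math.inf
    return tau_s * math.log((steady_c - start_c) / (steady_c - trip_c))


# Hypothetical numbers: 50 C steady state, alarm mis-set to 49.8 C, tau = 60 s.
cold_start = seconds_to_trip(22.0, 49.8, 50.0, 60.0)   # after hours of rest
warm_start = seconds_to_trip(49.0, 49.8, 50.0, 60.0)   # immediate restart
print(cold_start, warm_start)
```

With these made-up values a cold start survives for roughly five minutes, a warm restart for much less, and a start at or above the alarm point trips immediately, which is the same pattern as the shrinking run times reported in the thread.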

    Have you tried to boot to single user? I'm not sure how that is
    done on your particular machines, but with the LILO boot loader
    you would give it a boot name, such as "linux", and add the word
    "single" after it.

    If the configuration is not being written to the monitor chip
    and if it is properly programmed into the boot scripts, it will
    not be done when booting to single user mode (just because an
    error in the script would prevent booting).

    If you try that and it still does not stay up, try booting from
    any other kernel, such as a live CD (for any system), and see if
    that will continue to run.

    If you get it to run without shutting down, use whatever system
    you've boot as a "rescue system" and edit the boot scripts for
    your improper configuration to remove anything than initializes
    sensors.
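
    To find the lines worth editing from a rescue system, something along
    these lines could help. It is only a sketch: the search patterns are a
    hypothetical starting list, and the directory to scan (for example the
    bad image's /etc mounted somewhere under /mnt) is an assumption about
    your setup. The demo at the end runs against a throwaway directory:

```python
import os
import re
import tempfile

# Hypothetical hints; adjust to whatever your distribution's scripts use.
SENSOR_HINTS = re.compile(r"sensors\s+-s|fancontrol|lm[-_]sensors", re.IGNORECASE)


def find_sensor_lines(etc_root):
    """Return (path, line number, text) for every line that looks sensor-related."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(etc_root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, FIFOs and dangling symlinks
            try:
                with open(path, "r", errors="replace") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        if SENSOR_HINTS.search(line):
                            hits.append((path, lineno, line.strip()))
            except OSError:
                continue  # unreadable file
    return hits


# Demo on a temporary tree standing in for the mounted image's /etc.
with tempfile.TemporaryDirectory() as demo_root:
    init_dir = os.path.join(demo_root, "init.d")
    os.mkdir(init_dir)
    with open(os.path.join(init_dir, "lm-sensors"), "w") as script:
        script.write("#!/bin/sh\n/usr/bin/sensors -s\n")
    with open(os.path.join(init_dir, "networking"), "w") as script:
        script.write("#!/bin/sh\nifup -a\n")
    demo_hits = find_sensor_lines(demo_root)
    for path, lineno, text in demo_hits:
        print(f"{os.path.relpath(path, demo_root)}:{lineno}: {text}")
```

    It only reports candidates; deciding what to comment out is still a
    manual job.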

    If no matter what kernel you boot it shuts down, you've got
    yourself a *major* problem! You will have to figure out the
    right configuration for your server, set it up on another
    system, and then boot it with a cold box that will run long
    enough for that part of the boot process to be executed and
    reconfigure the sensors. It won't be an easy thing to figure
    out.

    >That is
    >understandable but that should only affect the server when the OS is
    >loading.
    >Here on the other hand, once the wrong image is booted up the first time the
    >problem becomes permanent and not dependent on the OS any more, so the
    >changes are written somewhere into/onto the motherboard.....that is fine, I
    >can live with that too.
    >
    >But where does it get written to so that neither the BIOS reset nor the BIOS
    >patch can reverse it?


    It does appear to be the configuration of the sensor monitoring
    chip.

    Another trick you might try is a reset, and instead of letting
    it boot the OS, go to the BIOS setup. In the BIOS setup do
    whatever is available for monitoring (temperatures, voltages,
    etc.). As an example, on some older Tyan dual processor boards
    for AMD processors the sensor monitoring chip would be about
    half initialized in a normal boot, but would be fully
    initialized only when the BIOS setup option to show voltages and
    temperatures was entered. If the reset button was pressed after
    that, then lm_sensors could monitor all of the temps and
    voltages. But if the box was powered down it would reset the
    chip completely, and only half of it was initialized when powered
    up. That meant half the temperatures and voltages could not be
    read by lm_sensors. (I wrote a C program to fully initialize
    the monitor chip, and ran that as part of the boot process to
    correct the problem.)

    That monitor chip did not retain configuration when powered
    down, as it seems yours does.

    >Considering all symptoms, it seems like some sensory process gets adjusted
    >to just below the normal operating temperature.
    >For example the CPU operates normally @ 50 degrees and normally the max
    >operating temperature is 80 degrees.
    >Using this example, my stuffed image somehow managed to adjust the max
    >operating temperature to 49.8 degrees. When I turn on the machine the first
    >time, it takes a little while to get up to the 50 degrees, so I have say 5
    >minutes...but next round only 10 seconds, and after that it is down to 1 or
    >2 seconds.


    Exactly.

    >Any more clues?


    I'm glad I'm not you... ;-)

    --
    Floyd L. Davidson
    Ukpeagvik (Barrow, Alaska) floyd@apaflo.com

  14. Re: How can Linux demage a motherboard?

    > I have restarted the server and this time the server only run for about
    > 10-15 seconds with increasing fan speed and shut down again.


    Just to recap. Let me know if I'm missing something.

    - These machines ran without issues for hours using the first Linux image.
    - Because of configuration issues with the first image, a second Linux image
    was installed. The PCs shut down quickly after this image was installed and
    started.
    - Now the PCs will shut down after only a few seconds, regardless of what
    image is installed. The PCs will shut down automatically even if no
    drive is attached, or you are in the BIOS setup.

    If you can get into the BIOS setup, try to disable the "shutdown at this
    temp" settings and "fan detection" settings. Use the PC Health page to
    monitor the temps and fan speeds that the PC is seeing.





  15. Re: How can Linux demage a motherboard?

    On Sat, 21 Apr 2007 07:18:35 +1000, Tom Szabo wrote:

    > Sorry,
    >
    > Debian Etch (stable) - 2.6.18 I believe,
    >
    > Regards, Tom



    2.6.18 should be fine. Try running sys_basher.

    http://www.polybus.com/sys_basher_web/

    Sys_basher puts your system under maximum load while testing memory and
    CPU reliability. If you have lm_sensors and the lm_sensors development
    package installed, sys_basher will log the CPU temps while it runs, so
    you'll have a record of what temperature the system was running at if it
    crashes.
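
    If you only want a quick look at what the kernel sensor drivers report,
    independent of sys_basher, a sketch like the following reads the hwmon
    sysfs files directly. It assumes a hwmon driver for the board's monitor
    chip is loaded; values in temp*_input, temp*_max and temp*_crit are in
    millidegrees Celsius, and depending on kernel version the files sit
    either directly under hwmonN or under hwmonN/device. The demo uses a
    fake tree with made-up values:

```python
import glob
import os
import tempfile


def read_celsius(path):
    """Return a sysfs millidegree value as degrees Celsius, or None if absent."""
    try:
        with open(path) as handle:
            return int(handle.read().strip()) / 1000.0
    except (OSError, ValueError):
        return None


def read_temperatures(hwmon_root="/sys/class/hwmon"):
    """Collect the current reading and limits for every temp*_input file found."""
    paths = []
    for layout in ("hwmon*/temp*_input", "hwmon*/device/temp*_input"):
        paths.extend(glob.glob(os.path.join(hwmon_root, layout)))
    readings = []
    for input_path in sorted(paths):
        base = input_path[: -len("_input")]
        readings.append({
            "sensor": os.path.relpath(base, hwmon_root),
            "current_c": read_celsius(input_path),
            "max_c": read_celsius(base + "_max"),
            "crit_c": read_celsius(base + "_crit"),
        })
    return readings


def limit_below_reading(reading):
    """True when a configured limit is at or below the current temperature."""
    current = reading["current_c"]
    if current is None:
        return False
    limits = (reading["max_c"], reading["crit_c"])
    return any(limit is not None and limit <= current for limit in limits)


# Demo against a fake sysfs tree; the values are invented for illustration.
with tempfile.TemporaryDirectory() as fake_root:
    chip = os.path.join(fake_root, "hwmon0")
    os.mkdir(chip)
    fake_files = {"temp1_input": "38000", "temp1_max": "35000",
                  "temp2_input": "41000", "temp2_max": "80000",
                  "temp2_crit": "95000"}
    for name, value in fake_files.items():
        with open(os.path.join(chip, name), "w") as handle:
            handle.write(value + "\n")
    demo_readings = read_temperatures(fake_root)
    demo_flags = [limit_below_reading(r) for r in demo_readings]
    for reading, flagged in zip(demo_readings, demo_flags):
        print(reading, "<-- limit below reading" if flagged else "")
```

    A limit sitting below the current reading is the kind of setting that
    would explain shutdowns at normal temperature.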



  16. Re: How can Linux demage a motherboard?

    Hi Floyd,

    Thanks for your reply. I am glad to see that I am not alone :-). Almost
    all of your thoughts make sense.

    The only problem I still have is that I have another install of Linux that
    was done on these machines, and it worked fine. This is the first "image" I
    used in the 3rd machine.
    It worked fine in the machines for hours, until I put the second
    image in, and then the problem started. As far as I remember, I have tried
    to use the first image and it booted, but I could be a little mixed up, so
    I will try again and let you know.

    As far as I can see, some code in Image #2 has changed something in the
    machine. If it was part of a normal adjustment, it should be reversed by a
    correct image. I hope you are right and I remember wrong, and in fact I never
    got to boot properly with the first image after the second image ran on the
    same machine....that is my hope...

    Otherwise it is a little trouble, as then it means that there was some kind
    of weird code that went a little further than it should have and changed
    something that a normal install isn't meant to...

    Anyway, I will do the test and see,

    Thanks for your thoughts,

    Regards,

    Tom






  17. Re: How can Linux demage a motherboard?

    Hi,

    What you are saying is almost correct, with just one small but possibly
    significant difference.

    The first image was installed on this Opteron-based machine; the second was
    installed on a dual Xeon-based machine and brought over to the Opteron...

    I also appreciate your suggestion, but I have already tried everything along
    these lines. Twice I was in the BIOS looking at the temperatures when the
    system suddenly wound up and shut down a few seconds later.

    All temperatures were stable and in the middle between high and low.
    Unfortunately I haven't found a way to shut down or disable the temperature
    monitoring....

    I will do a few more tests and get back,

    Thanks anyway,

    Tom





  18. Re: How can Linux demage a motherboard?

    Hi Floyd,

    I just tried putting "Image #1" back in one of the machines and started it
    up. It booted up fine and got into KDE; I looked around a little and it
    suddenly started shutting down. It simply initiates a shutdown process as
    if I had issued a HALT or REBOOT.

    So I am still stuck....it means the good image doesn't reverse the effect of
    the old one...but how can it be????

    TIA,

    Tom



  19. Re: How can Linux demage a motherboard?



    One more item, from way out in left field, is UPS monitoring...

    I'm not aware of any mainboards that handle UPS monitoring, so it's not
    likely the issue.

    I have had systems with similar symptoms after a COM port was disabled in
    the BIOS settings, or a cable moved to allow a serial connection to
    something else. This was all at OS level though, so wouldn't happen until
    the OS was booted.



  20. Re: How can Linux demage a motherboard?

    Check your motherboard manual to locate the CMOS reset jumper, and reset
    the CMOS. If the temperature/fan/fail sensors were set to something
    incompatible with standard operation, the power can be cut. The reset on
    the motherboard will restore the BIOS to its default settings and give
    you a shot at using the machine again.

