
Thread: GCC crash, hardware related?

  1. GCC crash, hardware related?

    When compiling some (rather large) software packages on one of my
    Linux boxes, the C compiler sometimes (seldom) crashes. When this
    happens, it never does happen at the same point in the compilation
    sequence - or on the same package, at that. This has happened on the two
    versions of gcc that I have tested (3.3.6 and 4.1.1).

    Since I understand that gcc is memory hungry, especially when
    optimizing, I am wondering if this is a bad memory issue? The same
    software packages and the same compilers do not have any problems on
    other Linux boxes.

    What other hardware problems could cause this anyway? It does not
    seem to be a CPU overheating problem - the failures I report do not seem
    to be correlated with the system or CPU temperature, as far as I can tell.
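
    In case it helps, what I am doing boils down to re-running the same
    build and watching where it dies; roughly the following, with the
    package and log names as placeholders:

        # Repeat the identical build and keep the tail of each run's output;
        # the failing compilation unit is different nearly every time.
        for i in 1 2 3 4 5; do
            make clean > /dev/null
            make 2>&1 | tail -n 20 > "attempt-$i.log"
        done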


  2. Re: GCC crash, hardware related?

    Harold Weissman wrote:

    > When compiling some (rather large) software packages on one of my
    > Linux boxes, the C compiler sometimes (seldom) crashes. When this
    > happens, it never does happen at the same point in the compilation
    > sequence - or on the same package, at that. This has happened on the two
    > versions of gcc that I have tested (3.3.6 and 4.1.1).
    >
    > Since I understand that gcc is memory hungry, especially when
    > optimizing, I am wondering if this is a bad memory issue? The same
    > software packages and the same compilers do not have any problems on
    > other Linux boxes.


    Quite likely - bad RAM (non-ECC) is more common than you'd like to believe.

    I would run memtest for at least 24 hours, preferably longer.
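
    memtest86/memtest86+ runs from boot media, so it needs the machine to
    itself. For a quick userspace sanity check without rebooting, the
    separate memtester utility does roughly this (the size and pass count
    below are just illustrative):

        # Lock and hammer 1 GB of RAM for 5 passes; needs root to be able
        # to mlock that much. No substitute for an overnight memtest86 run.
        sudo memtester 1024M 5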

    > What other hardware problems could cause this anyway? It does not
    > seem to be a CPU overheating problem - the failures I report do not seem
    > to be correlated with the system or CPU temperature, as far as I can tell.


    I presume the crash is a Segmentation Fault? It's almost certainly bad RAM
    IME. Other problems (e.g. bad PSU, bad mobo) tend to lock up or reboot the
    entire system, but nothing is totally impossible.

    Cheers

    Tim

  3. Re: GCC crash, hardware related?

    On 18 Apr 2007, in the Usenet newsgroup comp.os.linux.hardware,
    Harold Weissman wrote:

    > When compiling some (rather large) software packages on one of my
    >Linux boxes, the C compiler sometimes (seldom) crashes. When this
    >happens, it never does happen at the same point in the compilation
    >sequence - or on the same package, at that. This has happened on the two
    >versions of gcc that I have tested (3.3.6 and 4.1.1).


    Sounds like a classic memory failure due to heating. When you flog the
    box in a compile, it gets warmer, and some parts may get very hot. The
    RAM dissipates more heat when the bits are flipping than when they are
    remaining in state.

    > Since I understand that gcc is memory hungry, especially when
    >optimizing, I am wondering if this is a bad memory issue? The same
    >software packages and the same compilers do not have any problems on
    >other Linux boxes.


    http://www.bitwizard.nl/sig11/

    That should be the most current version of a GCC-SIG11-FAQ - you can
    find the older document at ftp://ibiblio.org/pub/linux/docs/faqs/

    -rw-rw-r-- 1 gferg ldp 37134 Dec 2 1998 GCC-SIG11-FAQ

    > What other hardware problems could cause this anyway? It does
    >not seem to be a CPU overheating problem - the failures I report do not
    >seem to be correlated with the system or CPU temperature, as far as I
    >can tell.


    See the FAQ (either one). The key would seem to be the fact that the
    failure occurs after you've been flogging the snot out of the system
    (the standard test used to be compiling a Linux kernel repeatedly in
    a loop in a shell script), AND that the failure occurs at random
    places. When was the last time you had the case off the computer?
    Any dust bunnies resident? (I'm near Phoenix Arizona, about 360
    miles/600 KM East of Los Angeles and it gets quite warm in the summer
    despite 25 KW of central air conditioning - and have pets. Have to
    clean the computers yearly, despite extra air filters and fans.)
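
    For reference, that loop test is nothing fancy. A minimal sketch, with
    the source path and make target being whatever your tree uses:

        # Rebuild a kernel tree until the compiler falls over; a flaky
        # machine usually dies in a different file each time through.
        cd /usr/src/linux || exit 1
        pass=0
        while make clean > /dev/null && make -j2 bzImage > /dev/null; do
            pass=$((pass + 1))
            echo "pass $pass OK at $(date)"
        done
        echo "build died after $pass good passes"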

    Old guy

  4. Re: GCC crash, hardware related?

    On Apr 19, 1:06 pm, ibupro...@painkiller.example.tld (Moe Trin) wrote:
    > On 18 Apr 2007, in the Usenet newsgroup comp.os.linux.hardware,
    > Harold Weissman wrote:
    > > When compiling some (rather large) software packages on one of my
    > >Linux boxes, the C compiler sometimes (seldom) crashes. When this
    > >happens, it never does happen at the same point in the compilation
    > >sequence - or on the same package, at that. This has happened on the two
    > >versions of gcc that I have tested (3.3.6 and 4.1.1).

    >
    > Sounds like a classic memory failure due to heating. When you flog the
    > box in a compile, it gets warmer, and some parts may get very hot. The
    > RAM dissipates more heat when the bits are flipping than when they are
    > remaining in state.


    Most RAM modules don't come with heatsinks; I don't think the RAM
    heats up enough unless it's overclocked. The G965 northbridge, on the
    other hand, is an egg cooker. On a Supermicro board, though, the
    system just shuts itself down when the temperature reaches 50C under
    Linux. Under Windows you keep getting a prompt: "50C reached. Is the
    system functioning properly?" Regardless of whether you press yes or
    no, nothing happens. Reminds me of some old Space Quest :-]

    But yes, it looks like there is still a lot of crappy RAM on the
    market.


  5. Re: GCC crash, hardware related?

    On 23 Apr 2007, in the Usenet newsgroup comp.os.linux.hardware, in article
    <1177357912.146383.113830@o5g2000hsb.googlegroups.com>, sndive@gmail.com wrote:

    >(Moe Trin) wrote:


    >> Sounds like a classic memory failure due to heating. When you flog the
    >> box in a compile, it gets warmer, and some parts may get very hot. The
    >> RAM dissipates more heat when the bits are flipping than when they are
    >> remaining in state.

    >
    >Most RAM modules don't come with heatsinks; I don't think the RAM
    >heats up enough unless it's overclocked.


    I take it you haven't seen a Sun UltraSparc. The RAM is mounted in a
    nominal 5 inch square tunnel with the RAM modules sticking out from the
    walls like heat fins. At one end of the "tunnel" is a 4.7 inch fan
    delivering over 80 CFM of cooling air. First time I saw one (an Ultra
    140), I wondered what kept it from blowing off the side of the table
    (there are also fans cooling the hard drive bays). Then I tried to
    pick it up, and had the answer.

    Power dissipation as a function of speed is a well documented fact. In
    most logic families, both output stages (plus rail to output, and
    ground to output) are "ON" for a tiny fraction of a second (typically
    in the sub-nanosecond range), and this is the reason larger integrated
    circuits have multiple power and ground pins. This is also the reason
    hardware designers include lots of "bypass" capacitors as close as
    possible to the power/ground pins of an IC. For 'metal oxide
    semiconductors' (the
    logic family used in CPUs, support chips and some RAM), the power
    consumed is a function of the clock frequency. For most RAM, the power
    consumed is a function of access (read, write, or refresh), and whether
    the bit is changing (data is stored on a tiny capacitor that must be
    charged or discharged to change states from "high" to "low" or
    vice-versa). Bottom line - like everything else, heat is a function
    of how hard you are beating on the system.
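
    To put a rough number on it, the usual first-order formula for CMOS
    dynamic power is

        P_dyn ~= a * C * V^2 * f

    where a is the activity factor (the fraction of bits switching), C the
    switched capacitance, V the supply voltage and f the clock. With some
    made-up but plausible figures - a = 0.1, C = 1 nF, V = 2.5 V and
    f = 400 MHz - that's 0.1 * 1e-9 * 6.25 * 4e8, or about 0.25 W. Double
    the clock or the switching activity and you double the heat.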

    >The G965 northbridge, on the other hand, is an egg cooker. On a
    >Supermicro board, though, the system just shuts itself down when the
    >temperature reaches 50C under Linux. Under Windows you keep getting a
    >prompt: "50C reached. Is the system functioning properly?" Regardless
    >of whether you press yes or no, nothing happens. Reminds me of some
    >old Space Quest :-]


    The Northbridge should be running hot, as it's running at the full
    CPU speed. Everything else is running slower - sometimes substantially
    so. 50C? The outside ambient temperature here gets that high several
    times a summer. +30C (86F) is not at all unusual for an inside ambient.
    That's why my home systems have extra air filters (pet hair) and fans
    blowing in "cool" air.

    >but yes, it looks like there is still a lot of crappy ram on the
    >market


    I think it's also poor mechanical design of the cases. If you can't
    keep cool air flowing over the hot stuff, expect temperature problems.
    I've also seen a lot of designs where the "cool" air flows in over the
    hard drives before cooling the CPU and other warm things.

    Old guy

  6. Re: GCC crash, hardware related?

    On Apr 24, 12:13 pm, ibupro...@painkiller.example.tld (Moe Trin)
    wrote:
    > On 23 Apr 2007, in the Usenet newsgroup comp.os.linux.hardware, in article
    > <1177357912.146383.113...@o5g2000hsb.googlegroups.com>, snd...@gmail.com wrote:
    > >(Moe Trin) wrote:
    > >> Sounds like a classic memory failure due to heating. When you flog the
    > >> box in a compile, it gets warmer, and some parts may get very hot. The
    > >> RAM dissipates more heat when the bits are flipping than when they are
    > >> remaining in state.

    >
    > >Most RAM modules don't come with heatsinks; I don't think the RAM
    > >heats up enough unless it's overclocked.

    >
    > I take it you haven't seen a Sun UltraSparc. The RAM is mounted in a
    > nominal 5 inch square tunnel with the RAM modules sticking out from the
    > walls like heat fins. At one end of the "tunnel" is a 4.7 inch fan
    > delivering over 80 CF/M of cooling air. First time I saw one (an Ultra
    > 140), I wondered what kept it from blowing off the side of the table
    > (there are also fans cooling the hard drive bays). Then I tried to
    > pick it up, and had the answer.
    >
    > Power dissipation as a function of speed is a well documented fact. In
    > most logic families, both output stages (plus rail to output, and
    > ground to output) are "ON" for a tiny fraction of a second (typically
    > in the sub-nanosecond range), and this is the reason larger integrated
    > circuits have multiple power and ground pins. This is also the reason
    > hardware designers include lots of "bypass" capacitors as close as
    > possible to the power/ground pins of an IC. For 'metal oxide
    > semiconductors' (the
    > logic family used in CPUs, support chips and some RAM), the power
    > consumed is a function of the clock frequency. For most RAM, the power
    > consumed is a function of access (read, write, or refresh), and whether
    > the bit is changing (data is stored on a tiny capacitor that must be
    > charged or discharged to change states from "high" to "low" or
    > vice-versa). Bottom line - like everything else, heat is a function
    > of how hard you are beating on the system.


    In theory, yes. Despite all that, I do not have problems with the
    passively cooled 667 ECC memory modules that receive very little,
    if any, airflow.

    > >The G965 northbridge, on the other hand, is an egg cooker. On a
    > >Supermicro board, though, the system just shuts itself down when the
    > >temperature reaches 50C under Linux. Under Windows you keep getting
    > >a prompt: "50C reached. Is the system functioning properly?"
    > >Regardless of whether you press yes or no, nothing happens. Reminds
    > >me of some old Space Quest :-]

    >
    > The Northbridge should be running hot, as it's running at the full
    > CPU speed. Everything else is running slower - sometimes substantially
    > so. 50C? The outside ambient temperature here gets that high several
    > times a summer. +30C (86F) is not at all unusual for an inside ambient.


    My apartment is air conditioned to 68-73F year round.

    > That's why my home systems have extra air filters (pet hair) and fans
    > blowing in "cool" air.


    In my case a 12" intake fan was not sufficient, and I had to tie an
    8 cm fan directly to a larger aftermarket northbridge cooler
    (Thermalright NI-05). The stock heatsink was a joke.

    > >but yes, it looks like there is still a lot of crappy ram on the
    > >market

    >
    > I think it's also poor mechanical design of the cases. If you can't
    > keep cool air flowing over the hot stuff, expect temperature problems.
    > I've also seen a lot of designs where the "cool" air flows in over the
    > hard drives before cooling the CPU and other warm things.


    Yeah, the wife insisted on replacing the "old" full-size tower with
    the new "modern looking" Centurion 534 mid-tower. The airflow sucks,
    not to mention that the airport-beacon light makes it impossible to
    sleep in the same room when the system is in standby. The Dell docking
    station has the same toothache-throbbing power key: when the laptop
    is docked but off, the light throbs. What a p.o.s. I want to kill
    the designer.


  7. Re: GCC crash, hardware related?

    On Wed, 18 Apr 2007 18:44:00 +0000, Harold Weissman wrote:

    > When compiling some (rather large) software packages on one of my Linux
    > boxes, the C compiler sometimes (seldom) crashes. When this happens, it
    > never does happen at the same point in the compilation sequence - or on
    > the same package, at that. This has happened on the two versions of gcc
    > that I have tested (3.3.6 and 4.1.1).
    >
    > Since I understand that gcc is memory hungry, especially when
    > optimizing, I am wondering if this is a bad memory issue? The same
    > software packages and the same compilers do not have any problems on
    > other Linux boxes.
    >
    > What other hardware problems could cause this anyway? It does not
    > seem to be a CPU overheating problem - the failures I report do not seem
    > to be correlated with the system or CPU temperature, as far as I can
    > tell.


    Run sys_basher on your system; if you have a memory problem, it will
    find it.

    http://www.polybus.com/sys_basher_web/
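
    While that runs, it is also worth seeing whether the kernel has already
    logged anything. A quick check (the EDAC sysfs path below only exists
    on boards and kernels with EDAC support, so treat it as optional):

        # Any machine-check or ECC events already on record?
        dmesg | grep -iE 'mce|machine check|edac'
        # Corrected-error count on EDAC-capable systems:
        cat /sys/devices/system/edac/mc/mc0/ce_count 2>/dev/null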

  8. Re: GCC crash, hardware related?

    On 30 Apr 2007, in the Usenet newsgroup comp.os.linux.hardware, in article
    <1177963326.675762.235010@o5g2000hsb.googlegroups.com>, sndive@gmail.com wrote:

    >ibupro...@painkiller.example.tld (Moe Trin) wrote:


    >> For most RAM, the power consumed is a function of access (read,
    >> write, or refresh), and whether the bit is changing (data is stored
    >> on a tiny capacitor that must be charged or discharged to change
    >> states from "high" to "low" or vice-versa). Bottom line - like
    >> everything else, heat is a function of how hard you are beating on
    >> the system.

    >
    >In theory, yes. Despite all that, I do not have problems with the
    >passively cooled 667 ECC memory modules that receive very little,
    >if any, airflow.


    Part of that is that manufacturers continue to advance the techniques
    of fabrication, and the geometry - the size of the individual component
    such as a transistor or capacitor - is getting smaller and smaller, and
    this smaller size translates into less power dissipated. To compensate
    for this smaller size, we automatically increase the number of such
    components - remember that the original IBM PC came with 64 Kilobytes
    (not 640 - that was a software limit) of RAM, and a full motherboard
    gave you 128 KB.

    >> The outside ambient temperature here gets that high several times
    >> a summer. +30C (86F) is not at all unusual for an inside ambient.

    >
    >my apartment is air conditioned to 68-73F year round.


    If you're in Chandler, either SRP or APS must love you ;-)

    >> That's why my home systems have extra air filters (pet hair) and fans
    >> blowing in "cool" air.

    >
    >In my case a 12" intake fan was not sufficient, and I had to tie an
    >8 cm fan directly to a larger aftermarket northbridge cooler
    >(Thermalright NI-05). The stock heatsink was a joke.


    Something is _seriously_ wrong with the 12 inch intake fan. I'm using
    a "box" that fits over the front of the PC (mid-tower) fabricated from
    furnace filter material, feeding a 7 inch fan that rams air in near the
    base and in through the hard drives. The exhaust is out the back. A bit
    noisy, but turn the radio up slightly, and it's fine.

    >Yeah, the wife insisted on replacing the "old" full-size tower with
    >the new "modern looking" Centurion 534 mid-tower. The airflow sucks,
    >not to mention that the airport-beacon light makes it impossible to
    >sleep in the same room when the system is in standby.


    Computers here are in a separate room. Hmmm, not an option to upgrade
    the wife for a more tolerant model?

    >The Dell docking station has the same toothache-throbbing power key:
    >when the laptop is docked but off, the light throbs.


    The only lap-doggie I have is the gutted remains of a 386SX which is
    used as a perimeter firewall/router. Work doesn't allow personal
    systems, and the company boxes don't do walkies.

    >What a p.o.s. I want to kill the designer.


    I'm constantly amazed at some of the crap that is offered. What in the
    heck was the designer/marketing group thinking about when they came up
    with this/that product? I know, heresy attributing the capability
    of thinking to marketing, but...

    Old guy

  9. Re: GCC crash, hardware related?

    On May 1, 1:00 pm, ibupro...@painkiller.example.tld (Moe Trin) wrote:
    > On 30 Apr 2007, in the Usenet newsgroup comp.os.linux.hardware, in article
    > <1177963326.675762.235...@o5g2000hsb.googlegroups.com>, snd...@gmail.com wrote:
    > >ibupro...@painkiller.example.tld (Moe Trin) wrote:
    > >> For most RAM, the power consumed is a function of access (read,
    > >> write, or refresh), and whether the bit is changing (data is stored
    > >> on a tiny capacitor that must be charged or discharged to change
    > >> states from "high" to "low" or vice-versa). Bottom line - like
    > >> everything else, heat is a function of how hard you are beating on
    > >> the system.

    >
    > >In theory, yes. Despite all that, I do not have problems with the
    > >passively cooled 667 ECC memory modules that receive very little,
    > >if any, airflow.

    >
    > Part of that is that manufacturers continue to advance the techniques
    > of fabrication, and the geometry - the size of the individual component
    > such as a transistor or capacitor - is getting smaller and smaller, and
    > this smaller size translates into less power dissipated. To compensate
    > for this smaller size, we automatically increase the number of such
    > components - remember that the original IBM PC came with 64 Kilobytes
    > (not 640 - that was a software limit) of RAM, and a full motherboard
    > gave you 128 KB.
    >
    > >> The outside ambient temperature here gets that high several times
    > >> a summer. +30C (86F) is not at all unusual for an inside ambient.

    >
    > >my apartment is air conditioned to 68-73F year round.

    >
    > If you're in Chandler, either SRP or APS must love you ;-)


    Arizona? I would not live in a place where coke bottles left
    unattended explode on their own and the alimony is deducted
    straight out of one's paycheck :-]

    > >> That's why my home systems have extra air filters (pet hair) and fans
    > >> blowing in "cool" air.

    >
    > >In my case a 12" intake fan was not sufficient, and I had to tie an
    > >8 cm fan directly to a larger aftermarket northbridge cooler
    > >(Thermalright NI-05). The stock heatsink was a joke.

    >
    > Something is _seriously_ wrong with the 12 inch intake fan. I'm using
    > a "box" that fits over the front of the PC (mid-tower) fabricated from
    > furnace filter material, feeding a 7 inch fan that rams air in near the
    > base and in through the hard drives. The exhaust is out the back. A bit
    > noisy, but turn the radio up slightly, and it's fine.
    >

    The fan is run off a 3-pin socket on the mobo. According to the
    Supermicro mobo reporting tool it's spinning somewhere around 1600 RPM.
    Mind you, I have three 5400-7200 RPM HDDs sitting in a rail that the
    intake fan blows over.
    How often do you have to take the filters out to clean them with pets
    around?

    > >yeah, the wife was insistent on replacing the "old" fullsize tower for
    > >the new "modern looking" centurion 534 midtower. The airflow sucks
    > >not to mention that airport beacon light makes it impossible to sleep
    > >in the same room when the system is in standby.

    >
    > Computers here are in a separate room. Hmmm, not an option to upgrade
    > the wife for a more tolerant model?
    >

    I was contemplating that, but I'm in Silicon Valley, where models are
    few and far between and even the ugly girls of bangable age are
    usually unavailable.
    Chandler, then? Or is it pretty much the same hardware?

    > >The Dell docking station has the same toothache-throbbing power key:
    > >when the laptop is docked but off, the light throbs.

    >
    > The only lap-doggie I have is the gutted remains of a 386SX which is
    > used as a perimeter firewall/router. Work doesn't allow personal
    > systems, and the company boxes don't do walkies.


    That last part needs translating.

    > >What a p.o.s. I want to kill the designer.

    >
    > I'm constantly amazed at some of the crap that is offered. What in the
    > heck was the designer/marketing group thinking about when they came up
    > with this/that product. I know, heresy attributing the capability
    > of thinking to marketing, but...


    But they are the ones who are typically driving the product
    requirements.


  10. Re: GCC crash, hardware related?

    On 1 May 2007, in the Usenet newsgroup comp.os.linux.hardware, in article
    <1178050992.415540.126990@e65g2000hsc.googlegroups.com>, sndive@gmail.com
    wrote:

    >ibupro...@painkiller.example.tld (Moe Trin) wrote:


    >> If you're in Chandler, either SRP or APS must love you ;-)

    >
    >Arizona? I would not live in a place where coke bottles left
    >unattended explode on their own


    While we have had our first 100+ day this year, it's already cooled
    back down into the mid-90s.

    >and the alimony is deducted straight out of one's paycheck :-]


    There's LOTS of places where that happens.

    >> Something is _seriously_ wrong with the 12 inch intake fan. I'm using
    >> a "box" that fits over the front of the PC (mid-tower) fabricated from
    >> furnace filter material, feeding a 7 inch fan that rams air in near the
    >> base and in through the hard drives. The exhaust is out the back. A bit
    >> noisy, but turn the radio up slightly, and it's fine.
    >>

    >The fan is run off a 3-pin socket on the mobo. According to the
    >Supermicro mobo reporting tool it's spinning somewhere around 1600 RPM.
    >Mind you, I have three 5400-7200 RPM HDDs sitting in a rail that the
    >intake fan blows over.


    My file servers with the extra disks have ducting to exhaust air as
    soon as it passes the hard drives. Amazing how warm they can run on
    their own - especially stacked without extra cooling.

    >How often do you have to take the filters out to clean them with pets
    >around?


    I schedule it every six months - usually even keep close to that
    schedule too. Part of having the filter right down in front is that
    it's in plain sight, and I'll often take a pass over the external
    side with a vacuum cleaner hose to suck off the obvious crap.

    >> Hmmm, not an option to upgrade the wife for a more tolerant model?

    >
    >I was contemplating that, but I'm in Silicon Valley, where models are
    >few and far between and even the ugly girls of bangable age are
    >usually unavailable.


    Well, it's better than SacraTomato, but that's not saying much.

    >Chandler, then? Or is it pretty much the same hardware?


    UofA is right up the freeway, but reality says "No, Thank You."
    Then again, I _am_ 0x40+, so what would I know? ;-)

    >> Work doesn't allow personal systems, and the company boxes don't
    >> do walkies.

    >
    >That last part needs translating.


    The company (actually the division) does not allow company property
    to move about. I have a clunker here that belongs to the company,
    and they have a separate DSL line for it, but it's Paranoia City,
    otherwise. Only reason I have a company box is I'm a network admin,
    and this is my oversized leash. Last I looked at the ethers files,
    I'm guessing we've got less than 25 portable/laptops - compared to
    a hundred times that many desktops.

    >> I know, heresy attributing the capability of thinking to
    >> marketing, but...

    >
    >But they are the ones who are typically driving the product
    >requirements.


    Regrettably, all too true.

    Old guy
