thermal events (or lack thereof) - Debian

  1. thermal events (or lack thereof)

    Hi,

    I've been working steadily over the past few weeks to get my new HP nx6125
    working under Debian (amd64 port) and have made significant progress.
    However, there is one considerable problem: thermal events don't seem to be
    recognised or processed by the kernel (until I do a
    cat /proc/acpi/thermal_zone/TZ?/temperature). As soon as I do anything CPU
    intensive I really run the risk of frying my laptop :-(

    To be more specific, I am running a vanilla 2.6.14.3 kernel (from
    www.kernel.org) with the double-timer patch applied (see
    http://bugzilla.kernel.org/attachmen...61&action=view). When I boot
    up, my thermal trip points get set nicely: the first is at 58 °C, then
    65 °C, 75 °C and 80 °C (S5 = 95 °C). While testing, I was frequently
    running

    cat /proc/acpi/thermal_zone/TZ?/temperature

    and I would watch the temperature rise to 58 °C. Then the fan would kick
    in, the first trip point would (automatically) reset to 50 °C, and the CPU
    would cool by 8 °C before the fan turned off. (Nice, I thought, and very
    clever, this resetting of trip points; sorry, I'm very new to ACPI.) When
    the fan turned off, the trip point would reset to 58 °C again. So I thought
    all was working well. However, in later tests where I ran glxgears without
    executing the above cat command, the CPU temperature rose past several trip
    points without the fans kicking in! Only when I ran the cat command did the
    fans start!?
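    The trip-point behaviour described above amounts to hysteresis: two
    thresholds, selected by the current fan state. The Python sketch below is
    a hypothetical model of that observed behaviour only (`active_trip` and
    `fan_step` are invented names); on real hardware the new trip value
    presumably comes from the laptop's ACPI tables, not from code like this.
    It also shows why missing events matter: the fan state only changes when
    the zone is actually evaluated.

```python
# Hypothetical model of the trip-point behaviour described for TZ1.
# The two values come from the post; the switching rule is an assumption.
FAN_ON_TRIP_C = 58   # active trip while the fan is off
FAN_OFF_TRIP_C = 50  # active trip while the fan is running


def active_trip(fan_on: bool) -> int:
    """Return the active trip point (°C) in effect for the given fan state."""
    return FAN_OFF_TRIP_C if fan_on else FAN_ON_TRIP_C


def fan_step(temp_c: int, fan_on: bool) -> bool:
    """Evaluate the zone once and return the new fan state.

    Reaching the current trip point counts as crossing it, so the fan
    stays on until the zone drops below 50 °C and switches on again
    only at 58 °C.
    """
    return temp_c >= active_trip(fan_on)


if __name__ == "__main__":
    fan = False
    # Each reading stands for one evaluation of the zone; if nothing
    # triggers an evaluation, the fan state is never updated.
    for temp in [55, 58, 56, 52, 50, 49, 57, 58]:
        fan = fan_step(temp, fan)
        print(f"{temp} °C -> fan {'on' if fan else 'off'} "
              f"(next trip {active_trip(fan)} °C)")
```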

    So, I stopped acpid and did a

    cat /proc/acpi/event

    while running glxgears. I waited a while and then ran
    cat /proc/acpi/thermal_zone/TZ?/temperature and saw that the temperature of
    TZ1 had indeed exceeded 58 °C, and /proc/acpi/event immediately received a
    thermal event. (Note: the temperature had already exceeded 58 °C, my first
    trip point, but the thermal event only occurred when I did the 'cat'.) So,
    for thermal events to get through and be processed, I need to keep running
    cat /proc/acpi/thermal_zone/TZ?/temperature!
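    The test above can be made more precise by logging when each thermal
    event arrives, so it can be compared with the moment the trip point was
    actually crossed. A minimal Python sketch, assuming the four-field line
    layout (`device_class bus_id type data`, last two fields in hex) that
    2.6-era kernels appear to use; check it against your own
    `cat /proc/acpi/event` output. `parse_event` and `watch` are hypothetical
    helper names.

```python
import time
from typing import NamedTuple, Optional


class AcpiEvent(NamedTuple):
    device_class: str  # e.g. "thermal_zone" or "button/power"
    bus_id: str        # e.g. "TZ1"
    event_type: int    # ACPI notify value (written in hex)
    data: int          # event-specific data (written in hex)


def parse_event(line: str) -> Optional[AcpiEvent]:
    """Parse one line of /proc/acpi/event.

    Assumes the four-field layout 'class bus_id type data' with the last
    two fields in hex; returns None for anything else.
    """
    parts = line.split()
    if len(parts) != 4:
        return None
    try:
        return AcpiEvent(parts[0], parts[1], int(parts[2], 16), int(parts[3], 16))
    except ValueError:
        return None


def watch(path: str = "/proc/acpi/event") -> None:
    """Print a timestamp for every thermal-zone event read from `path`.

    Stop acpid first (as in the post), since it normally holds this file open.
    """
    with open(path) as events:
        for line in events:
            ev = parse_event(line)
            if ev is not None and ev.device_class == "thermal_zone":
                # ACPI notify 0x80 = zone status change, 0x81 = trip points changed
                print(f"{time.strftime('%H:%M:%S')} {ev.bus_id} "
                      f"type=0x{ev.event_type:02x} data=0x{ev.data:x}", flush=True)


if __name__ == "__main__":
    watch()
```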

    Can anybody shed some light on this behaviour? I don't know much about
    ACPI, but it seems (?) that the Linux kernel is not processing thermal
    events properly. Incidentally, I am also seeing spurious syslog errors
    reading "APIC error on CPU0: 40(40)", which presumably means that some
    interrupts are not being correctly identified by the interrupt controller
    (thermal ones? could the two be related?).

    Other info: the HP nx6125 is a Turion 64-based laptop with an ATI chipset
    (yes, I know). I am running acpid and have just installed powernowd (which
    doesn't fix it). I have also observed the above behaviour with the standard
    Debian 2.6.12-1-amd64 kernel (booting with no_timer_check to avoid double
    timer interrupts). Any help or suggestions would be greatly appreciated. If
    anyone knows why catting /proc/acpi/thermal_zone/TZ?/temperature makes
    things work, I'd be very happy to hear an explanation, too.

    Richard



  2. Re: thermal events (or lack thereof)

    Richard Mace wrote:

    > I've been working steadily over the past few weeks to get my new HP nx6125
    > working under Debian (amd64 port) and have made significant progress.
    > However, there is one considerable problem: thermal events don't seem to be
    > recognised or processed by the kernel (until I do a
    > cat /proc/acpi/thermal_zone/TZ?/temperature). As soon as I do anything CPU
    > intensive I really run the risk of frying my laptop :-(


    For what it's worth, I used to have that problem, but it got fixed
    somewhere around 2.6.11. This is on a regular AMD laptop, not amd64.

    > To be more specific, I am running kernel 2.6.14.3


    Strange. Could it be a regression? A good workaround for me was to
    echo a 1 into /proc/acpi/thermal_zone/THRM/polling_frequency from a
    boot script. This had the same side effect as periodically reading the
    temperature, and it didn't consume much CPU.
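    The boot-script workaround, applied to every thermal zone rather than a
    hard-coded THRM, could look like the Python sketch below. It assumes (as
    the 2.6-era procfs interface appears to) that polling_frequency takes a
    value in seconds, and it falls back to re-reading the temperature file,
    i.e. the manual 'cat' from the first post, for any zone that rejects the
    setting. All function names are hypothetical; it needs root, and later
    kernels dropped /proc/acpi/thermal_zone in favour of /sys/class/thermal.

```python
import glob
import os
import re
import time

# Zone names differ between machines (TZ1 on the nx6125, THRM in the reply).
ZONE_GLOB = "/proc/acpi/thermal_zone/*"


def parse_temperature(text: str) -> int:
    """Return degrees Celsius from text such as 'temperature:   58 C'."""
    match = re.search(r"(-?\d+)\s*C\b", text)
    if match is None:
        raise ValueError(f"unrecognised temperature text: {text!r}")
    return int(match.group(1))


def read_zone_temp(zone_dir: str) -> int:
    """Read a zone's temperature file (per the thread, the read is what got events through)."""
    with open(os.path.join(zone_dir, "temperature")) as f:
        return parse_temperature(f.read())


def set_polling(zone_dir: str, seconds: int) -> None:
    """Same as `echo <seconds> > <zone>/polling_frequency`."""
    with open(os.path.join(zone_dir, "polling_frequency"), "w") as f:
        f.write(f"{seconds}\n")


def main(seconds: int = 1) -> None:
    zones = sorted(glob.glob(ZONE_GLOB))
    if not zones:
        print("no ACPI thermal zones found under /proc/acpi/thermal_zone")
    failed = []
    for zone in zones:
        try:
            set_polling(zone, seconds)
        except OSError as exc:
            print(f"{zone}: cannot set polling_frequency ({exc})")
            failed.append(zone)
    # Fallback: emulate the periodic 'cat' for zones that refused the setting.
    while failed:
        for zone in failed:
            print(f"{os.path.basename(zone)}: {read_zone_temp(zone)} °C", flush=True)
        time.sleep(seconds)


if __name__ == "__main__":
    main()
```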

    The problem had to do with delayed ACPI events in general, not just
    thermal events. It was easiest to diagnose with a non-preemptible kernel:
    event processing lagged by 8 to 10 events. For example, it would take
    several keypresses to change the screen brightness, and the brightness
    would overshoot for the first few keypresses when I tried to bring it
    back.
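    The lag, and why the brightness overshoots, can be illustrated with a toy
    model in which each event is delivered only once a fixed number of newer
    events have queued up behind it. This is a hypothetical Python
    illustration of the symptom only, with an invented `LaggingQueue` class
    and a lag of 3 instead of 8 to 10 to keep the trace short; it says nothing
    about where in the kernel the backlog actually sat.

```python
from collections import deque


class LaggingQueue:
    """Toy model: an event is delivered only after `lag` newer events queue behind it."""

    def __init__(self, lag: int) -> None:
        self.lag = lag
        self.pending = deque()

    def push(self, event):
        """Queue one event and return the (possibly empty) list of events delivered."""
        self.pending.append(event)
        delivered = []
        while len(self.pending) > self.lag:
            delivered.append(self.pending.popleft())
        return delivered


def simulate_brightness(presses, lag, start=5):
    """Feed +1/-1 brightness keypresses through the queue.

    Returns the brightness level visible after each keypress.
    """
    queue = LaggingQueue(lag)
    level = start
    history = []
    for press in presses:
        for delta in queue.push(press):
            level += delta
        history.append(level)
    return history


if __name__ == "__main__":
    print(simulate_brightness([+1] * 5 + [-1] * 5, lag=3))
```

    With lag=3, five "up" presses followed by five "down" presses, starting
    from level 5, give [5, 5, 5, 6, 7, 8, 9, 10, 9, 8]: the first "down"
    presses still raise the brightness, which is the overshoot described
    above.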

    With a preemptible kernel the pattern was less clear: the delay queue
    would tend to empty itself without intervention, but not reliably, which
    just made the problem harder to observe. The polling_frequency hack still
    worked, so that's what I did.


    --
    pa at panix dot com
