Re: Recent Problems with RELENG_7 i386 - FreeBSD


Thread: Re: Recent Problems with RELENG_7 i386

  1. Re: Recent Problems with RELENG_7 i386

    On Wed, Oct 08, 2008 at 10:00:32PM -0700, bf wrote:
    >
    >
    >
    > --- On Wed, 10/8/08, Jeremy Chadwick wrote:
    >
    > > From: Jeremy Chadwick
    > > Subject: Re: Recent Problems with RELENG_7 i386
    > > To: "bf"
    > > Cc: freebsd-stable@freebsd.org
    > > Date: Wednesday, October 8, 2008, 2:36 PM
    > > On Wed, Oct 08, 2008 at 10:19:47AM -0700, bf wrote:
    > > > After updating to RELENG_7 i386 of this weekend, I have been having
    > > > problems with my machine. When booting normally, the system slows or
    > > > hangs at the login prompt. If I am able to continue past the prompt,
    > > > I sometimes experience erratic mouse behavior, and a subsequent hang,
    > > > after varying lengths of time, even under light workloads. The same
    > > > problem does not seem to occur in single-user mode, and did not occur
    > > > with the RELENG_7 i386 of just over a week ago. I have been unable to
    > > > obtain crashdumps so far, and the only log messages I can find that
    > > > weren't present before are notices like those recorded below:
    > > >
    > > > Oct 8 11:00:40 myhost kernel: t_delta 15.fd80bdcb75b60200 too short
    > >
    > > This comes from src/sys/kern/kern_tc.c, around line 908. I'm not
    > > familiar with the kernel, but three ideas come to mind:
    > >
    > > 1) If you have Intel SpeedStep (EIST) or AMD Cool'n'Quiet enabled in
    > > your BIOS, try disabling it,
    > >
    > > 2) If you're using powerd, disable it (I don't see it enabled),
    > >
    > > 3) Try keeping HZ at 1000 (the default).
    > >

    >
    > Thanks, Jeremy, for taking the time to consider my question and reply.
    >
    > My CPU is pre-Cool'n'Quiet, and as far as I can tell I had disabled
    > all forms of power management that may affect the clock speeds. I have
    > found that by raising kern.hz to 250, or by using the default, I no
    > longer receive the "t_delta too short" messages, and the other problems
    > are no longer apparent. My question is: why did this occur now?


    I don't know. We can't rewind time and find out system parameters and
    kernel details from 6 months ago. :-)

    I'm thinking it might have something to do with the timecounter selected
    by the kernel, but as I said, we can't rewind time to find out what
    things were in the past.

    The variables I'm talking about are the sysctls under kern.timecounter.
    Running "sysctl kern.timecounter" could help shed some light here, maybe.
    It would at least allow us to see what timecounters are available on your
    system, and whether a bad/unreliable one is being selected automatically.

    > I have been using a similar configuration for months now without any
    > apparent problems. My original goal in using a lower kern.hz was to
    > avoid burdening my machine with excessive context switching.


    This is over my head, technically. I would need to pull John Baldwin
    into this, since he knows a bit about both (timecounters and context
    switching). I'm just a simple caveman..... :-)

    > I saw the relevant section of kern_tc.c before I wrote my first
    > message, but when skimming through the changes in RELENG_7 over the
    > past week or two, I couldn't see any commit that may have directly
    > affected kernel timekeeping. Has some new workload been imposed on
    > the system by recent changes, that may have made a kern.hz of 100
    > insufficient? Is this tuneable setting properly implemented, so that
    > all parts of the base system are using its current value rather than
    > the default? Could some of my hardware, such as my RTC, be
    > malfunctioning?


    Well, I believe HZ was increased from 100 to 1000 long ago (RELENG_6?)
    as a default. I'm really not sure of the implications of decreasing it,
    besides having less granularity for some things (the only things I know
    of would be something pertaining to firewalls, I just can't remember
    what. My brain is full. :-) )

    --
    | Jeremy Chadwick jdc at parodius.com |
    | Parodius Networking http://www.parodius.com/ |
    | UNIX Systems Administrator Mountain View, CA, USA |
    | Making life hard for others since 1977. PGP: 4BD6C0CB |

    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  2. Re: Recent Problems with RELENG_7 i386

    On Thu, 9 Oct 2008, Jeremy Chadwick wrote:
    > On Fri, Oct 10, 2008 at 03:51:02AM +1100, Ian Smith wrote:

    [..]
    > > | CPU: AMD Athlon(tm) Processor (906.35-MHz 686-class CPU)
    > > | Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
    > > | inittimecounter(0)... Timecounters tick every 10.000 msec
    > >
    > > ie HZ=100, as mentioned, and using ACPI-safe as later confirmed. So
    > > it's either a different kernel or bf updated kern.hz from loader.conf?

    >
    > Yep -- his original mail had loader.conf shown, with this in it (near the
    > bottom):
    >
    > kern.hz=100


    Missed that, thanks.
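
    For anyone wanting to check the same thing on their own box, the two
    boot lines quoted above carry both facts: which timecounter was chosen
    and what HZ is in effect (tick interval in msec, so HZ = 1000 / interval).
    A minimal Python sketch, assuming the boot messages look like the ones
    quoted in this thread; the function names and regular expressions are
    illustrative only, not part of any FreeBSD tool:

```python
import re

# The two boot lines quoted earlier in this thread.
BOOT_LINES = '''Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
inittimecounter(0)... Timecounters tick every 10.000 msec'''


def parse_timecounter(text):
    """Return (name, frequency_hz, quality) from a 'Timecounter "..."' line.

    Assumes the line format shown in the quoted dmesg output.
    """
    m = re.search(r'Timecounter "([^"]+)" frequency (\d+) Hz quality (-?\d+)', text)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


def parse_hz(text):
    """Derive HZ from the 'Timecounters tick every N msec' line."""
    m = re.search(r'Timecounters tick every ([\d.]+) msec', text)
    if m is None:
        return None
    tick_msec = float(m.group(1))
    return round(1000.0 / tick_msec)


if __name__ == "__main__":
    print(parse_timecounter(BOOT_LINES))
    print("HZ =", parse_hz(BOOT_LINES))
```

    A 10.000 msec tick works out to HZ=100, which matches the kern.hz=100
    line in bf's loader.conf.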

    > > > Well, I believe HZ was increased from 100 to 1000 long ago (RELENG_6?)
    > > > as a default. I'm really not sure of the implications of decreasing it,
    > > > besides having less granularity for some things (the only things I know
    > > > of would be something pertaining to firewalls, I just can't remember
    > > > what. My brain is full. :-) )

    > >
    > > You need a day off. But yes, RELENG_5 still had HZ=100 default, long
    > > after the 'average' CPU clock frequency was 10 or more times faster than
    > > the 166MHz Pentiums and such (mostly then on only 100Mbps ethernet) that
    > > were comfortable at 100Hz slicing. 1000Hz was a big shift to catch up.
    > >
    > > In a day or so playing around with it years ago, I found 200-250Hz good
    > > for 300MHz, 500Hz a bit much, 1000Hz way too busy, and find my 1133MHz
    > > P3-M happy enough at 1000Hz, though I've done no specific tests on it.


    bf: to answer your later question; this was entirely empirical, a wet
    finger in the wind. The machine in question, idle with X+KDE, 5 or 6
    konsoles, 5 or 6 idle kwrite sessions, xmms, mozilla, several servers
    (httpd, mysqld, sendmail, named ..) with all of its 160MB RAM and about
    as much swap in use (but statically, not paging in or out), shows about
    9-10% CPU doing nothing, with HZ=200. From my notes then, with HZ=500,
    about an _extra_ 7% CPU shown in top. With HZ=1000, an _extra_ 20% odd.
    Such things are kinda hard to notice with a fast CPU and enough RAM.

    OTOH, little difference between HZ=200 and 100. So, 200 is a sweet spot,
    on a 300MHz laptop that only halves its CPU frequency when on battery.

    Charles: your general impression of context switching is about right.
    Ignoring interrupt context for the moment, at every tick the scheduler
    has to figure out the highest priority ready-to-run process to switch
    to, update the CPU context (registers, memory selectors and such), then
    run that process for (the remainder of) one tick time.

    If it takes X CPU cycles to switch context for a tick interval of Y ms,
    then the switching overhead is proportional to X/Y. If we now drop the
    CPU frequency from say 1GHz to say 100MHz (eg w/powerd) then those X
    cycles take ~10 times longer to execute, but the tick interval Y remains
    constant, so the switching overhead becomes proportional to 10X/Y, ie a
    greater proportion of the tick's time is taken up by context switching.
    And then of course the selected process can only execute something less
    than a tenth as many instructions before being preempted again.
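
    To put rough numbers on the X/Y argument above, here is a small Python
    sketch. The 10,000-cycle switch cost is a made-up figure purely for
    illustration (I have not measured it), and the model deliberately
    ignores interrupts and assumes exactly one switch per tick:

```python
def switch_overhead(switch_cycles, cpu_hz, tick_hz):
    """Fraction of each tick spent on a single context switch.

    switch_cycles: CPU cycles per context switch (hypothetical value)
    cpu_hz:        CPU clock frequency in Hz
    tick_hz:       scheduler tick rate (kern.hz)
    """
    switch_seconds = switch_cycles / cpu_hz   # time taken by X cycles
    tick_seconds = 1.0 / tick_hz              # tick interval Y
    return switch_seconds / tick_seconds


SWITCH_CYCLES = 10_000  # assumption for illustration only

if __name__ == "__main__":
    for cpu_hz in (1_000_000_000, 100_000_000):   # 1GHz, then throttled to 100MHz
        for tick_hz in (100, 1000):
            share = switch_overhead(SWITCH_CYCLES, cpu_hz, tick_hz)
            print(f"CPU {cpu_hz // 1_000_000:5d} MHz, HZ={tick_hz:4d}: "
                  f"{share * 100:.2f}% of each tick")
```

    Whatever the real cycle count is, the ratios hold: dropping the clock
    by a factor of ten multiplies the overhead share by ten at a fixed HZ,
    and dropping HZ by a factor of ten divides it by ten.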

    > > Some people had perhaps similar clock issues when their fast processors
    > > were throttling/stepping down to very low speeds (100, even 75MHz) while
    > > still slicing at 1000Hz, which I didn't find too surprising. Limiting
    > > minimum CPU freq to 300MHz or more seemed to solve many such issues, but
    > > I haven't your perseverance for digging up the relevant threads ..
    > >
    > > Even in 5.5-S (/sys/conf/NOTES and /sys/i386/conf/NOTES) HZ=1000 or 2000
    > > was suggested for DEVICE_POLLING (which bf included in config, though
    > > maybe it's not enabled?) and HZ=1000 or more was recommended when using
    > > DUMMYNET with ipfw - to provide smoother queue dispatching, I gather.
    > >
    > > Bottom line, IMHO, bf should probably run the default 1000Hz, 500 at
    > > least, on an Athlon 900. With powerd, maybe set min. freq >= 150MHz?

    >
    > Wow, this is fantastic information. You've just educated me a great bit
    > about the history and use of HZ. I've always had a "general" idea of
    > its importance and key role, but I was never fully aware of the history.


    Well hardly fantastic, nor authoritative, neither am I aware of any more
    than I've gleaned by flying by the seat of my pants with several small
    machines, occasionally trying to figure out how the ACPI / cpufreq code
    works; I haven't yet been game to delve into the scheduler(s) code.

    The last time I actually wrote anything run by a scheduler was in '72 in
    IBM S/370 DOS assembler to charge CPU time used to various departments,
    unless you count some '90s MSDOS/DesqView 'TSR' gadgets in Turbo Pascal,
    so I'm a very long way off the pace; still the principles are the same.

    Now I hope someone who groks FreeBSD scheduling will be irritated enough
    by my simplistic generalisations to say something more meaningful.

    > P.S. -- I need more like 6 months off. I've never taken an official
    > (read: real) vacation my entire life. Maybe some day I'll get to travel
    > to Seoul and visit Pyun Yong-Hyeon and drink lots of soju. :-)


    No soju here, but I'm sure we could find you some well-respected local
    brew or other, so sing out if you happen to be passing northern NSW ..

    cheers, Ian