
Thread: Re: [linux-pm] Higher latency with dynamic tick (need for an io-ondemand governor?)

  1. Re: [linux-pm] Higher latency with dynamic tick (need for an io-ondemand governor?)

    On Friday 18 April 2008, Woodruff, Richard wrote:
    > When capturing some traces with dynamic tick we were noticing the
    > interrupt latency seems to go up a good amount. If you look at the trace
    > the gpio IRQ is now offset a good amount. Good news I guess is it's
    > pretty predictable.


    That is, about 24 usec on this CPU ... an ARM v7, which I'm guessing
    is an OMAP34xx running fairly fast (order of 4x faster than most ARMs).

    Similar issues were noted, also using ETM trace, on an ARM920 core [1]
    from Atmel. There, the overhead of NO_HZ was observed to be more like
    150 usec of per-IRQ overhead, which is enough to make NO_HZ non-viable
    in some configurations.


    > I was wondering what thoughts on optimizing this might be.


    Cutting down the math implied by jiffies updates might help.
    The 64 bit math for ktime structs isn't cheap; purely by eyeball,
    that was almost 1/3 the cost of that 24 usec (mostly __do_div64).
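
    For context, most 32-bit ARM cores of this generation have no hardware
    divide instruction at all, so a 64-by-32 division lands in the
    out-of-line software helper __do_div64, reached through the do_div()
    macro. A minimal sketch of the pattern (helper name hypothetical,
    for illustration only):

    #include <linux/types.h>
    #include <asm/div64.h>

    /*
     * Hypothetical helper: convert a 64-bit nanosecond delta into
     * whole ticks.  On 32-bit ARM the do_div() below expands to a
     * call into the software divide routine __do_div64.
     */
    static inline unsigned long ns_to_ticks(u64 delta_ns, u32 tick_ns)
    {
            do_div(delta_ns, tick_ns);      /* delta_ns /= tick_ns */
            return (unsigned long)delta_ns;
    }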

    - Dave

    [1] http://marc.info/?l=linux-kernel&m=120471594714499&w=2


  2. Re: [linux-pm] Higher latency with dynamic tick (need for an io-ondemand governor?)

    On Fri, 18 Apr 2008, David Brownell wrote:
    > On Friday 18 April 2008, Woodruff, Richard wrote:
    > > When capturing some traces with dynamic tick we were noticing the
    > > interrupt latency seems to go up a good amount. If you look at the trace
    > > the gpio IRQ is now offset a good amount. Good news I guess is it's
    > > pretty predictable.

    >
    > That is, about 24 usec on this CPU ... an ARM v7, which I'm guessing
    > is an OMAP34xx running fairly fast (order of 4x faster than most ARMs).
    >
    > Similar issues were noted, also using ETM trace, on an ARM920 core [1]
    > from Atmel. There, the overhead of NO_HZ was observed to be more like
    > 150 usec of per-IRQ overhead, which is enough to make NO_HZ non-viable
    > in some configurations.
    >
    >
    > > I was wondering what thoughts on optimizing this might be.

    >
    > Cutting down the math implied by jiffies updates might help.
    > The 64 bit math for ktime structs isn't cheap; purely by eyeball,
    > that was almost 1/3 the cost of that 24 usec (mostly __do_div64).


    Hmm, I have no really good idea how to avoid the div64 in the case of a
    long idle sleep. Any brilliant patches are welcome.

    Thanks,
    tglx

  3. Re: [linux-pm] Higher latency with dynamic tick (need for an io-ondemand governor?)

    On Sat, 19 Apr 2008, Thomas Gleixner wrote:

    > On Fri, 18 Apr 2008, David Brownell wrote:
    >> On Friday 18 April 2008, Woodruff, Richard wrote:
    >>> When capturing some traces with dynamic tick we were noticing the
    >>> interrupt latency seems to go up a good amount. If you look at the trace
    >>> the gpio IRQ is now offset a good amount. Good news I guess is it's
    >>> pretty predictable.

    >>
    >> That is, about 24 usec on this CPU ... an ARM v7, which I'm guessing
    >> is an OMAP34xx running fairly fast (order of 4x faster than most ARMs).
    >>
    >> Similar issues were noted, also using ETM trace, on an ARM920 core [1]
    >> from Atmel. There, the overhead of NO_HZ was observed to be more like
    >> 150 usec of per-IRQ overhead, which is enough to make NO_HZ non-viable
    >> in some configurations.
    >>
    >>
    >>> I was wondering what thoughts on optimizing this might be.

    >>
    >> Cutting down the math implied by jiffies updates might help.
    >> The 64 bit math for ktime structs isn't cheap; purely by eyeball,
    >> that was almost 1/3 the cost of that 24 usec (mostly __do_div64).

    >
    > Hmm, I have no really good idea how to avoid the div64 in the case of a
    > long idle sleep. Any brilliant patches are welcome.


    How long is a 'long idle sleep', and how common are such sleeps? Is it
    possibly worth the cost of a test in the hotpath, to see whether you need
    to do the 64-bit math or can get away with 32-bit math (at least on some
    platforms)?
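
    One minimal sketch of such a test (function name hypothetical), assuming
    the tick period itself always fits in 32 bits of nanoseconds: any delta
    below 2^32 ns, i.e. a sleep shorter than about 4.3 seconds, could take a
    plain 32-bit divide, and only longer sleeps would pay for ktime_divns():

    /*
     * Hypothetical: how many whole ticks fit in 'delta'?  A delta
     * that fits in 32 bits of ns needs only a 32-bit divide, which
     * is far cheaper on ARM than going through __do_div64.
     */
    static unsigned long delta_to_ticks(ktime_t delta, ktime_t tick_period)
    {
            u64 delta_ns = ktime_to_ns(delta);
            u32 incr = (u32)ktime_to_ns(tick_period); /* <= 10^9/HZ, fits */

            if (likely((delta_ns >> 32) == 0))
                    return (u32)delta_ns / incr;    /* 32-bit divide */

            return ktime_divns(delta, incr);        /* rare long sleep */
    }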

    David Lang

  4. Re: [linux-pm] Higher latency with dynamic tick (need for an io-ondemand governor?)

    On Saturday 19 April 2008, david@lang.hm wrote:
    > On Sat, 19 Apr 2008, Thomas Gleixner wrote:
    >
    > > On Fri, 18 Apr 2008, David Brownell wrote:
    > >> On Friday 18 April 2008, Woodruff, Richard wrote:
    > >>> When capturing some traces with dynamic tick we were noticing the
    > >>> interrupt latency seems to go up a good amount.
    > >>
    > >>> I was wondering what thoughts on optimizing this might be.
    > >>
    > >> Cutting down the math implied by jiffies updates might help.


    And update_wall_time() costs, too.


    > >> The 64 bit math for ktime structs isn't cheap; purely by eyeball,
    > >> that was almost 1/3 the cost of that 24 usec (mostly __do_div64).

    > >
    > > Hmm, I have no really good idea how to avoid the div64 in the case of
    > > a long idle sleep. Any brilliant patches are welcome.


    That is, in tick_do_update_jiffies64()?

    delta = ktime_sub(delta, tick_period);
    last_jiffies_update = ktime_add(last_jiffies_update, tick_period);

    /* Slow path for long timeouts */
    if (unlikely(delta.tv64 >= tick_period.tv64)) {
            s64 incr = ktime_to_ns(tick_period);

            ticks = ktime_divns(delta, incr);

            last_jiffies_update = ktime_add_ns(last_jiffies_update,
                                               incr * ticks);
    }
    do_timer(++ticks);

    Some math not shown here is converting clocksource values
    to ktimes ... cyc2ns() has a comment about needing some
    optimization; I wonder if that's an issue here.
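
    For reference, that conversion is a per-clocksource multiply-and-shift;
    roughly, from clocksource.h of this era:

    static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles)
    {
            u64 ret = (u64)cycles;

            ret = (ret * cs->mult) >> cs->shift;
            return ret;
    }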

    Maybe turning tick_period into an *actual* constant (it's
    a function of HZ) would help a bit; "incr" too.
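
    As a sketch (macro name hypothetical; the kernel's existing TICK_NSEC
    is already along these lines), the period in nanoseconds could become
    a build-time constant, since HZ is fixed at compile time:

    #define TICK_PERIOD_NS  ((s64)(NSEC_PER_SEC / HZ))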

    Re the "ticks = ktime_divns(...)": since "incr" is constant,
    the first thing that comes to mind is a binary search over a
    precomputed table.

    For HZ=100 (common for ARM) a table of size 128 would exceed
    the normal range of NO_HZ tick rates ... down to below 1 HZ.
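
    A hypothetical sketch of that idea (all names invented): fill a table
    of i * tick-period-ns once at boot, then find the tick count with
    compares only ... seven of them for 128 entries, and no division:

    #define TICK_TABLE_SIZE 128     /* 1.28 s worth of ticks at HZ=100 */

    static u64 tick_ns_table[TICK_TABLE_SIZE];

    static void tick_table_init(s64 incr)
    {
            int i;

            for (i = 0; i < TICK_TABLE_SIZE; i++)
                    tick_ns_table[i] = (u64)i * incr;
    }

    /* Largest i such that tick_ns_table[i] <= delta_ns. */
    static unsigned long tick_table_lookup(u64 delta_ns)
    {
            int lo = 0, hi = TICK_TABLE_SIZE - 1;

            while (lo < hi) {
                    int mid = (lo + hi + 1) / 2;

                    if (tick_ns_table[mid] <= delta_ns)
                            lo = mid;
                    else
                            hi = mid - 1;
            }
            return lo;
    }

    Sleeps running past the end of the table would still fall back to
    ktime_divns(), but that case is rare by construction.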


    > How long is a 'long idle sleep', and how common are such sleeps?


    The above code says "unlikely()" but that presumes very busy
    systems. I would have assumed taking more than one tick was
    the most common case, since most systems spend more time idle
    than working. I certainly observe it to be the common case,
    and it's a power management optimization goal.


    > Is it
    > possibly worth the cost of a test in the hotpath, to see whether you need
    > to do the 64-bit math or can get away with 32-bit math (at least on some
    > platforms)?


    Possibly opening a can of worms, I'll observe that when the
    concern is just to update jiffies, converting to ktime values
    seems all but needless. Deltas at the level of a clocksource
    can be mapped to jiffies as easily as deltas at the nsec level,
    saving some work...

    Those delta tables could use just 32 bit values in the most
    common cases: clocksource ticking at less than 4 GHz, and
    the IRQs firing more often than once a second.
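
    Continuing the earlier hypothetical sketch, the table could be kept in
    raw clocksource cycles, skipping the cycles->ns conversion entirely.
    Sized to one second, every entry ... and so every compare in the binary
    search ... fits in 32 bits for any clocksource under 4 GHz:

    #define TICK_CYC_TABLE_SIZE     (HZ + 1)    /* covers one second */

    static u32 tick_cyc_table[TICK_CYC_TABLE_SIZE];

    static void tick_cyc_table_init(u32 cycles_per_tick)
    {
            int i;

            for (i = 0; i < TICK_CYC_TABLE_SIZE; i++)
                    tick_cyc_table[i] = i * cycles_per_tick;
    }

    The lookup would be the same binary search as before, just over 32-bit
    values; only sleeps longer than a second would touch a 64-bit path.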

    - Dave
