[REV2: PATCH 0/2] workqueue: Two API's to update delayed works quickly



Thread: [REV2: PATCH 0/2] workqueue: Two API's to update delayed works quickly

  1. [REV2: PATCH 0/2] workqueue: Two API's to update delayed works quickly

    From: Krishna Kumar

    Implement two API's for quickly updating delayed works:
    int schedule_update_delayed_work(struct delayed_work *dwork,
                                     unsigned long delay);
    int queue_update_delayed_work(struct workqueue_struct *wq,
                                  struct delayed_work *dwork,
                                  unsigned long delay);

    I tested the following combinations, after Oleg suggested some improvements:
    A. - Original submission.
    1 CPU -> Org: 474213 New: 55365 Time saved: 88.3%
    4 CPU -> Org: 3650631 New: 225160 Time saved: 93.8%
    B1. - Oleg's suggestion to avoid costly __cancel_work_timer.
    1 CPU -> Org: 474213 New: 233714 Time saved: 50.7%
    4 CPU -> Org: 3650631 New: 959455 Time saved: 73.7%
    B2. - B1 + check for same expiry.
    1 CPU -> Org: 474213 New: 72276 Time saved: 84.8%
    4 CPU -> Org: 3650631 New: 301510 Time saved: 91.7%
    C. - Merge Oleg's idea with code A.
    1 CPU -> Org: 474213 New: 55160 Time saved: 88.4%
    4 CPU -> Org: 3650631 New: 215369 Time saved: 94.1%
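
    (The "Time saved" column is simply 100 * (1 - New/Org). As a sanity
    check, this small snippet - ordinary userspace C, not kernel code -
    recomputes the variant C figures from the table above:)

    ```c
    #include <assert.h>
    #include <math.h>
    #include <stdio.h>

    /* Time saved, as a percentage: 100 * (1 - new_cost / org_cost). */
    static double saved_pct(double org_cost, double new_cost)
    {
    	return 100.0 * (1.0 - new_cost / org_cost);
    }

    int main(void)
    {
    	/* Variant C (merged code) figures from the table above. */
    	printf("1 CPU: %.1f%%\n", saved_pct(474213.0, 55160.0));   /* 88.4 */
    	printf("4 CPU: %.1f%%\n", saved_pct(3650631.0, 215369.0)); /* 94.1 */

    	assert(fabs(saved_pct(474213.0, 55160.0) - 88.4) < 0.1);
    	assert(fabs(saved_pct(3650631.0, 215369.0) - 94.1) < 0.1);
    	return 0;
    }
    ```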

    Merged code seems to perform the best, though the difference with the original
    submission is only marginal. Code snippets below with comments removed:

    A: Original submission:
    -----------------------
    void queue_update_delayed_work(struct workqueue_struct *wq,
                                   struct delayed_work *dwork, unsigned long delay)
    {
            struct work_struct *work = &dwork->work;

            if (likely(test_and_set_bit(WORK_STRUCT_PENDING,
                                        work_data_bits(work)))) {
                    struct timer_list *timer = &dwork->timer;

                    if (jiffies + delay == timer->expires)
                            return;

                    __cancel_work_timer_internal(work, timer);
            }

            __queue_delayed_work(-1, dwork, work, wq, delay);
    }

    B1: Oleg's suggestion:
    ----------------------
    (I optimized a bit, as parallel updates are not important; only one needs
    to succeed)

    int queue_update_delayed_work(struct workqueue_struct *wq,
                                  struct delayed_work *dwork, unsigned long delay)
    {
            unsigned long when = jiffies + delay;

            if (queue_delayed_work(wq, dwork, delay))
                    return 1;

            if (update_timer_expiry(&dwork->timer, when))
                    return 0;

            cancel_work_sync(&dwork->work);
            queue_delayed_work(wq, dwork, delay);
            return 0;
    }

    B2: B1 + extra check:
    ---------------------
    int queue_update_delayed_work(struct workqueue_struct *wq,
                                  struct delayed_work *dwork, unsigned long delay)
    {
            unsigned long when = jiffies + delay;

            if (queue_delayed_work(wq, dwork, delay))
                    return 1;

            if (when == dwork->timer.expires ||
                update_timer_expiry(&dwork->timer, when))
                    return 0;

            cancel_work_sync(&dwork->work);
            queue_delayed_work(wq, dwork, delay);
            return 0;
    }

    C: Combined code:
    -----------------
    int queue_update_delayed_work(struct workqueue_struct *wq,
                                  struct delayed_work *dwork, unsigned long delay)
    {
            struct work_struct *work = &dwork->work;

            if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work))) {
                    __queue_delayed_work(-1, dwork, work, wq, delay);
                    return 1;
            } else {
                    struct timer_list *timer = &dwork->timer;
                    unsigned long when = jiffies + delay;
                    int ret;

                    if (timer->expires == when)
                            return 0;

                    do {
                            if (likely(update_timer_expiry(timer, when)))
                                    return 0;

                            ret = try_to_grab_pending(work);
                            if (ret < 0)
                                    wait_on_work(work);
                    } while (ret < 0);

                    __queue_delayed_work(-1, dwork, work, wq, delay);
                    return 0;
            }
    }

    I am submitting the merged code. Please review and provide feedback.

    Thanks

    - KK

    [PATCH 1/2]: Implement the kernel API's
    [PATCH 2/2]: Modify some drivers to use the new APIs instead of the old ones

    Signed-off-by: Krishna Kumar
    ---

    (Description of the patches from first submission:
    These API's are useful for updating an existing work entry more efficiently
    (but can be used to queue a new work entry too) when the operation is done
    very frequently. The rationale is to save the cost of first cancelling the
    work/timer and then re-adding it when the same work is queued many times in
    quick succession.

    queue_update_delayed_work_on() and schedule_update_delayed_work_on() are not
    implemented as there are no users which can take advantage of these variations.

    Example possible users (FS's, wireless drivers):
    ------------------------------------------------
    lbs_scan_networks, nfs4_renew_state, ocfs2_schedule_truncate_log_flush,
    nfs4_schedule_state_renewal, afs_flush_callback_breaks, afs_reap_server,
    afs_purge_servers, afs_vlocation_reaper, afs_vlocation_purge,
    o2hb_arm_write_timeout, isr_indicate_rf_kill,
    lbs_postpone_association_work, isr_scan_complete, ipw_radio_kill_sw,
    __ipw_led_activity_on, ds2760_battery_resume, etc.

    Performance (numbers in jiffies):
    ---------------------------------
    These API's are useful in the following cases:
    a. add followed by cancel+add with a long period between the adds. Time
       saved is the time to clear and re-set the WORK_STRUCT_PENDING bit,
       plus the reduction of one API call.
    b. add followed by cancel+add with a very short time between the adds.
       Time saved is all of the above, plus avoiding the cancel and re-add.
    )
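
    (The saving in case a. is easiest to see in terms of the
    WORK_STRUCT_PENDING bit: the update API leaves the bit set instead of
    clearing and re-setting it. Below is a minimal userspace model of that
    idea, using C11 atomics; the toy_* names are made up for illustration
    and are not kernel API:)

    ```c
    #include <assert.h>
    #include <stdatomic.h>

    #define WORK_PENDING 1u

    /* Toy model of a delayed work: a pending bit plus a timer expiry. */
    struct toy_dwork {
    	atomic_uint flags;
    	unsigned long expires;
    };

    /*
     * Toy model of queue_update_delayed_work(): if the work was idle,
     * "queue" it (set the pending bit, arm the timer) and return 1; if it
     * was already pending, rewrite the expiry in place and return 0.  The
     * point is that the update path never clears and re-sets the pending
     * bit, which is where the cancel+queue sequence spends its time.
     */
    static int toy_queue_update(struct toy_dwork *dw, unsigned long when)
    {
    	unsigned int old = atomic_fetch_or(&dw->flags, WORK_PENDING);

    	dw->expires = when;
    	return (old & WORK_PENDING) ? 0 : 1;
    }

    int main(void)
    {
    	struct toy_dwork dw = { 0 };

    	assert(toy_queue_update(&dw, 100) == 1);	/* first add: queued */
    	assert(toy_queue_update(&dw, 200) == 0);	/* re-add: updated only */
    	assert(dw.expires == 200);
    	return 0;
    }
    ```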
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. [REV2: PATCH 2/2]: workqueue: Modify some users to use the new API

    From: Krishna Kumar

    Modify some users to use the new API.

    Signed-off-by: Krishna Kumar
    ---
    drivers/net/wireless/ipw2100.c | 12 ++++++------
    drivers/net/wireless/libertas/scan.c | 5 ++---
    drivers/net/wireless/libertas/wext.c | 7 +++----
    fs/afs/vlocation.c | 11 +++--------
    fs/ocfs2/cluster/heartbeat.c | 5 ++---
    5 files changed, 16 insertions(+), 24 deletions(-)

    diff -ruNp 2.6.27-rc7-org/drivers/net/wireless/ipw2100.c 2.6.27-rc7-new/drivers/net/wireless/ipw2100.c
    --- 2.6.27-rc7-org/drivers/net/wireless/ipw2100.c 2008-09-22 03:59:55.000000000 +0530
    +++ 2.6.27-rc7-new/drivers/net/wireless/ipw2100.c 2008-09-29 12:06:55.000000000 +0530
    @@ -2085,9 +2085,8 @@ static void isr_indicate_rf_kill(struct

    /* Make sure the RF Kill check timer is running */
    priv->stop_rf_kill = 0;
    - cancel_delayed_work(&priv->rf_kill);
    - queue_delayed_work(priv->workqueue, &priv->rf_kill,
    - round_jiffies_relative(HZ));
    + queue_update_delayed_work(priv->workqueue, &priv->rf_kill,
    + round_jiffies_relative(HZ));
    }

    static void send_scan_event(void *data)
    @@ -4241,9 +4240,10 @@ static int ipw_radio_kill_sw(struct ipw2
    "disabled by HW switch\n");
    /* Make sure the RF_KILL check timer is running */
    priv->stop_rf_kill = 0;
    - cancel_delayed_work(&priv->rf_kill);
    - queue_delayed_work(priv->workqueue, &priv->rf_kill,
    - round_jiffies_relative(HZ));
    +
    + queue_update_delayed_work(priv->workqueue,
    + &priv->rf_kill,
    + round_jiffies_relative(HZ));
    } else
    schedule_reset(priv);
    }
    diff -ruNp 2.6.27-rc7-org/drivers/net/wireless/libertas/scan.c 2.6.27-rc7-new/drivers/net/wireless/libertas/scan.c
    --- 2.6.27-rc7-org/drivers/net/wireless/libertas/scan.c 2008-09-22 03:59:55.000000000 +0530
    +++ 2.6.27-rc7-new/drivers/net/wireless/libertas/scan.c 2008-09-29 12:06:55.000000000 +0530
    @@ -435,9 +435,8 @@ int lbs_scan_networks(struct lbs_private
    priv->scan_channel = to_scan;
    else
    priv->scan_channel += to_scan;
    - cancel_delayed_work(&priv->scan_work);
    - queue_delayed_work(priv->work_thread, &priv->scan_work,
    - msecs_to_jiffies(300));
    + queue_update_delayed_work(priv->work_thread,
    + &priv->scan_work, msecs_to_jiffies(300));
    /* skip over GIWSCAN event */
    goto out;
    }
    diff -ruNp 2.6.27-rc7-org/drivers/net/wireless/libertas/wext.c 2.6.27-rc7-new/drivers/net/wireless/libertas/wext.c
    --- 2.6.27-rc7-org/drivers/net/wireless/libertas/wext.c 2008-09-22 03:59:55.000000000 +0530
    +++ 2.6.27-rc7-new/drivers/net/wireless/libertas/wext.c 2008-09-29 12:06:55.000000000 +0530
    @@ -24,10 +24,9 @@

    static inline void lbs_postpone_association_work(struct lbs_private *priv)
    {
    - if (priv->surpriseremoved)
    - return;
    - cancel_delayed_work(&priv->assoc_work);
    - queue_delayed_work(priv->work_thread, &priv->assoc_work, HZ / 2);
    + if (!priv->surpriseremoved)
    + queue_update_delayed_work(priv->work_thread, &priv->assoc_work,
    + HZ / 2);
    }

    static inline void lbs_cancel_association_work(struct lbs_private *priv)
    diff -ruNp 2.6.27-rc7-org/fs/afs/vlocation.c 2.6.27-rc7-new/fs/afs/vlocation.c
    --- 2.6.27-rc7-org/fs/afs/vlocation.c 2008-09-22 03:59:55.000000000 +0530
    +++ 2.6.27-rc7-new/fs/afs/vlocation.c 2008-09-29 12:06:55.000000000 +0530
    @@ -557,12 +557,8 @@ static void afs_vlocation_reaper(struct
    if (expiry > now) {
    delay = (expiry - now) * HZ;
    _debug("delay %lu", delay);
    - if (!schedule_delayed_work(&afs_vlocation_reap,
    - delay)) {
    - cancel_delayed_work(&afs_vlocation_reap);
    - schedule_delayed_work(&afs_vlocation_reap,
    - delay);
    - }
    + schedule_update_delayed_work(&afs_vlocation_reap,
    + delay);
    break;
    }

    @@ -615,8 +611,7 @@ void afs_vlocation_purge(void)
    &afs_vlocation_update, 0);
    destroy_workqueue(afs_vlocation_update_worker);

    - cancel_delayed_work(&afs_vlocation_reap);
    - schedule_delayed_work(&afs_vlocation_reap, 0);
    + schedule_update_delayed_work(&afs_vlocation_reap, 0);
    }

    /*
    diff -ruNp 2.6.27-rc7-org/fs/ocfs2/cluster/heartbeat.c 2.6.27-rc7-new/fs/ocfs2/cluster/heartbeat.c
    --- 2.6.27-rc7-org/fs/ocfs2/cluster/heartbeat.c 2008-09-22 03:59:55.000000000 +0530
    +++ 2.6.27-rc7-new/fs/ocfs2/cluster/heartbeat.c 2008-09-29 12:06:55.000000000 +0530
    @@ -172,10 +172,9 @@ static void o2hb_arm_write_timeout(struc
    {
    mlog(0, "Queue write timeout for %u ms\n", O2HB_MAX_WRITE_TIMEOUT_MS);

    - cancel_delayed_work(&reg->hr_write_timeout_work);
    reg->hr_last_timeout_start = jiffies;
    - schedule_delayed_work(&reg->hr_write_timeout_work,
    - msecs_to_jiffies(O2HB_MAX_WRITE_TIMEOUT_MS));
    + schedule_update_delayed_work(&reg->hr_write_timeout_work,
    + msecs_to_jiffies(O2HB_MAX_WRITE_TIMEOUT_MS));
    }

    static void o2hb_disarm_write_timeout(struct o2hb_region *reg)

  3. [REV2: PATCH 1/2]: workqueue: Implement the kernel API

    From: Krishna Kumar

    Implement two API's for quickly updating delayed works:
    int schedule_update_delayed_work(struct delayed_work *dwork,
                                     unsigned long delay);
    int queue_update_delayed_work(struct workqueue_struct *wq,
                                  struct delayed_work *dwork,
                                  unsigned long delay);

    These API's are useful for updating an existing work entry more efficiently
    (but can also be used to queue a new work entry) when the operation is done
    very frequently. The rationale is to save the cost of first cancelling the
    work/timer and then re-adding it when the same work is queued many times in
    quick succession.

    Signed-off-by: Krishna Kumar
    ---
    include/linux/timer.h | 1
    include/linux/workqueue.h | 4 +
    kernel/timer.c | 22 ++++++
    kernel/workqueue.c | 124 +++++++++++++++++++++++++++++++-----
    4 files changed, 135 insertions(+), 16 deletions(-)

    diff -ruNp 2.6.27-rc7-org/include/linux/timer.h 2.6.27-rc7-new/include/linux/timer.h
    --- 2.6.27-rc7-org/include/linux/timer.h 2008-09-22 03:59:55.000000000 +0530
    +++ 2.6.27-rc7-new/include/linux/timer.h 2008-09-29 12:06:17.000000000 +0530
    @@ -88,6 +88,7 @@ extern void add_timer_on(struct timer_li
    extern int del_timer(struct timer_list * timer);
    extern int __mod_timer(struct timer_list *timer, unsigned long expires);
    extern int mod_timer(struct timer_list *timer, unsigned long expires);
    +extern int update_timer_expiry(struct timer_list *timer, unsigned long expires);

    /*
    * The jiffies value which is added to now, when there is no timer
    diff -ruNp 2.6.27-rc7-org/include/linux/workqueue.h 2.6.27-rc7-new/include/linux/workqueue.h
    --- 2.6.27-rc7-org/include/linux/workqueue.h 2008-09-22 03:59:55.000000000 +0530
    +++ 2.6.27-rc7-new/include/linux/workqueue.h 2008-09-29 12:06:17.000000000 +0530
    @@ -185,6 +185,8 @@ extern int queue_delayed_work(struct wor
    struct delayed_work *work, unsigned long delay);
    extern int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
    struct delayed_work *work, unsigned long delay);
    +extern int queue_update_delayed_work(struct workqueue_struct *wq,
    + struct delayed_work *dwork, unsigned long delay);

    extern void flush_workqueue(struct workqueue_struct *wq);
    extern void flush_scheduled_work(void);
    @@ -194,6 +196,8 @@ extern int schedule_work_on(int cpu, str
    extern int schedule_delayed_work(struct delayed_work *work, unsigned long delay);
    extern int schedule_delayed_work_on(int cpu, struct delayed_work *work,
    unsigned long delay);
    +extern int schedule_update_delayed_work(struct delayed_work *work,
    + unsigned long delay);
    extern int schedule_on_each_cpu(work_func_t func);
    extern int current_is_keventd(void);
    extern int keventd_up(void);
    diff -ruNp 2.6.27-rc7-org/kernel/timer.c 2.6.27-rc7-new/kernel/timer.c
    --- 2.6.27-rc7-org/kernel/timer.c 2008-09-22 03:59:55.000000000 +0530
    +++ 2.6.27-rc7-new/kernel/timer.c 2008-09-29 12:06:29.000000000 +0530
    @@ -520,6 +520,28 @@ static struct tvec_base *lock_timer_base
    }
    }

    +/*
    + * update_timer_expiry: Quickly update the timer expiry if the timer is pending.
    + *
    + * Return 1 if the timer was successfully updated; otherwise return 0.
    + */
    +int update_timer_expiry(struct timer_list *timer, unsigned long expires)
    +{
    +	struct tvec_base *base;
    +	unsigned long flags;
    +	int ret = 0;
    +
    +	base = lock_timer_base(timer, &flags);
    +	if (likely(timer_pending(timer))) {
    +		detach_timer(timer, 0);
    +		timer->expires = expires;
    +		internal_add_timer(base, timer);
    +		ret = 1;
    +	}
    +	spin_unlock_irqrestore(&base->lock, flags);
    +	return ret;
    +}
    +
    int __mod_timer(struct timer_list *timer, unsigned long expires)
    {
    struct tvec_base *base, *new_base;
    diff -ruNp 2.6.27-rc7-org/kernel/workqueue.c 2.6.27-rc7-new/kernel/workqueue.c
    --- 2.6.27-rc7-org/kernel/workqueue.c 2008-09-22 03:59:55.000000000 +0530
    +++ 2.6.27-rc7-new/kernel/workqueue.c 2008-09-29 12:20:27.000000000 +0530
    @@ -202,6 +202,30 @@ static void delayed_work_timer_fn(unsign
    __queue_work(wq_per_cpu(wq, smp_processor_id()), &dwork->work);
    }

    +static inline void __queue_delayed_work(int cpu, struct delayed_work *dwork,
    +					struct work_struct *work,
    +					struct workqueue_struct *wq,
    +					unsigned long delay)
    +{
    +	struct timer_list *timer = &dwork->timer;
    +
    +	BUG_ON(timer_pending(timer));
    +	BUG_ON(!list_empty(&work->entry));
    +
    +	timer_stats_timer_set_start_info(timer);
    +
    +	/* This stores cwq for the moment, for the timer_fn */
    +	set_wq_data(work, wq_per_cpu(wq, raw_smp_processor_id()));
    +	timer->expires = jiffies + delay;
    +	timer->data = (unsigned long)dwork;
    +	timer->function = delayed_work_timer_fn;
    +
    +	if (unlikely(cpu >= 0))
    +		add_timer_on(timer, cpu);
    +	else
    +		add_timer(timer);
    +}
    +
    /**
    * queue_delayed_work - queue work on a workqueue after delay
    * @wq: workqueue to use
    @@ -233,31 +257,75 @@ int queue_delayed_work_on(int cpu, struc
    struct delayed_work *dwork, unsigned long delay)
    {
    int ret = 0;
    - struct timer_list *timer = &dwork->timer;
    struct work_struct *work = &dwork->work;

    if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work))) {
    - BUG_ON(timer_pending(timer));
    - BUG_ON(!list_empty(&work->entry));
    -
    - timer_stats_timer_set_start_info(&dwork->timer);
    -
    - /* This stores cwq for the moment, for the timer_fn */
    - set_wq_data(work, wq_per_cpu(wq, raw_smp_processor_id()));
    - timer->expires = jiffies + delay;
    - timer->data = (unsigned long)dwork;
    - timer->function = delayed_work_timer_fn;
    -
    - if (unlikely(cpu >= 0))
    - add_timer_on(timer, cpu);
    - else
    - add_timer(timer);
    + __queue_delayed_work(cpu, dwork, work, wq, delay);
    ret = 1;
    }
    return ret;
    }
    EXPORT_SYMBOL_GPL(queue_delayed_work_on);

    +static int try_to_grab_pending(struct work_struct *work);
    +static void wait_on_work(struct work_struct *work);
    +
    +/**
    + * queue_update_delayed_work - queue or update a work entry on a workqueue
    + *	after delay
    + * @wq: workqueue to use
    + * @dwork: delayable work to queue
    + * @delay: number of jiffies to wait before queueing work
    + *
    + * This function is useful to update an existing delayed work without having
    + * to first cancel it. E.g. code snippets that can use this API:
    + *	if (delayed_work_pending(&work))
    + *		cancel_delayed_work(&work);
    + *	queue_delayed_work(wq, &work, delay);
    + *
    + * Returns 0 if @work was already on a queue, non-zero otherwise.
    + */
    +int queue_update_delayed_work(struct workqueue_struct *wq,
    +			struct delayed_work *dwork, unsigned long delay)
    +{
    +	struct work_struct *work = &dwork->work;
    +
    +	if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work))) {
    +		__queue_delayed_work(-1, dwork, work, wq, delay);
    +		return 1;
    +	} else {
    +		struct timer_list *timer = &dwork->timer;
    +		unsigned long when = jiffies + delay;
    +		int ret;
    +
    +		/*
    +		 * Work is already scheduled. There is nothing to be done if
    +		 * the work is being modified to run at the same time.
    +		 */
    +		if (timer->expires == when)
    +			return 0;
    +
    +		do {
    +			/*
    +			 * If the timer has been added and it has not fired,
    +			 * update_timer_expiry will update it with the correct
    +			 * expiry. Otherwise timer has either not yet been
    +			 * added or it has already fired - we need to try again.
    +			 */
    +			if (likely(update_timer_expiry(timer, when)))
    +				return 0;
    +
    +			ret = try_to_grab_pending(work);
    +			if (ret < 0)
    +				wait_on_work(work);
    +		} while (ret < 0);
    +
    +		__queue_delayed_work(-1, dwork, work, wq, delay);
    +		return 0;
    +	}
    +}
    +EXPORT_SYMBOL_GPL(queue_update_delayed_work);
    +
    static void run_workqueue(struct cpu_workqueue_struct *cwq)
    {
    spin_lock_irq(&cwq->lock);
    @@ -667,6 +735,30 @@ int schedule_delayed_work_on(int cpu,
    EXPORT_SYMBOL(schedule_delayed_work_on);

    /**
    + * schedule_update_delayed_work - queue or update a work entry in global
    + * workqueue after a delay
    + * @dwork: job to be done
    + * @delay: number of jiffies to wait before queueing work
    + *
    + * This function is useful to update an existing delayed work without having
    + * to first cancel it. E.g. code snippets that can use this API:
    + * if (delayed_work_pending(&work))
    + * cancel_delayed_work(&work);
    + * schedule_delayed_work(&work, delay);
    + *
    + * After waiting for a given time this puts a job in the kernel global
    + * workqueue.
    + *
    + * Returns 0 if @work was already on global workqueue, non-zero otherwise.
    + */
    +int schedule_update_delayed_work(struct delayed_work *dwork,
    + unsigned long delay)
    +{
    + return queue_update_delayed_work(keventd_wq, dwork, delay);
    +}
    +EXPORT_SYMBOL(schedule_update_delayed_work);
    +
    +/**
    * schedule_on_each_cpu - call a function on each online CPU from keventd
    * @func: the function to call
    *

  4. Re: [REV2: PATCH 1/2]: workqueue: Implement the kernel API

    On 09/29, Krishna Kumar wrote:
    >
    > +static inline void __queue_delayed_work(int cpu, struct delayed_work *dwork,
    > +					struct work_struct *work,
    > +					struct workqueue_struct *wq,
    > +					unsigned long delay)


    A bit fat for inline, imho.

    > +int queue_update_delayed_work(struct workqueue_struct *wq,
    > +			struct delayed_work *dwork, unsigned long delay)
    > +{
    > +	struct work_struct *work = &dwork->work;
    > +
    > +	if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work))) {
    > +		__queue_delayed_work(-1, dwork, work, wq, delay);
    > +		return 1;


    Please note that the above is the open-coded queue_delayed_work().
    I'd suggest to just start with

    	if (queue_delayed_work(...))
    		return 1;

    	... slow path ...

    see also below.

    > +	} else {
    > +		struct timer_list *timer = &dwork->timer;
    > +		unsigned long when = jiffies + delay;
    > +		int ret;
    > +
    > +		/*
    > +		 * Work is already scheduled. There is nothing to be done if
    > +		 * the work is being modified to run at the same time.
    > +		 */
    > +		if (timer->expires == when)
    > +			return 0;


    I can't understand why you want to optimize this very unlikely case.
    Afaics, it can only improve the benchmark when we are doing
    queue_update_delayed_work() in a loop with the same timeout, no?

    But more importantly, this is not right. We cannot trust timer->expires.
    For example, let's suppose we are doing

    	queue_delayed_work(dwork, delay);
    	cancel_delayed_work_sync(dwork);	// does not clear ->expires
    	queue_work(&dwork->work);		// the same

    Now, queue_update_delayed_work(new_delay) sees the pending dwork, and
    it is possible that timer->expires == jiffies + new_delay.

    Note also that INIT_DELAYED_WORK() does not initialize ->expires. Now,
    again, if we do queue_work(&dwork->work) and then update(), we may have
    problems.

    Otherwise I think the patch is correct, but please see below.

    > +
    > +		do {
    > +			/*
    > +			 * If the timer has been added and it has not fired,
    > +			 * update_timer_expiry will update it with the correct
    > +			 * expiry. Otherwise timer has either not yet been
    > +			 * added or it has already fired - we need to try again.
    > +			 */
    > +			if (likely(update_timer_expiry(timer, when)))
    > +				return 0;
    > +
    > +			ret = try_to_grab_pending(work);
    > +			if (ret < 0)
    > +				wait_on_work(work);
    > +		} while (ret < 0);


    It is a bit silly that we are checking "ret < 0" twice; I'd suggest just
    killing "int ret" and doing

    	for (;;) {
    		...
    		if (try_to_grab_pending(work) >= 0)
    			break;
    		wait_on_work(work);
    	}

    But this needs a comment: why is wait_on_work() needed? "ret < 0" means that
    the queueing is in progress, and it is not necessary to "flush" this work.

    Note!!! queue_update_delayed_work() is _not_ equal to cancel() + queue()
    anyway; the fastpath doesn't cancel the work if it is active (I mean,
    it is ->current_work and running).

    But yes, we do need wait_on_work(). Let's suppose that some rt thread
    does queue_update_delayed_work() and preempts work->func() which tries
    to re-queue itself, right after it sets WORK_STRUCT_PENDING. This is
    livelockable.

    But: this also means that two concurrent queue_update_delayed_work()s can
    livelock in the same manner; perhaps this deserves a note.


    I am not really sure it is worth playing with WORK_STRUCT_PENDING;
    the simple

    	int requeue_delayed_work(...)
    	{
    		int ret = 1;

    		while (!queue_delayed_work(...)) {
    			ret = 0;
    			if (update_timer_expiry(...))
    				break;
    			cancel_delayed_work_sync(...);
    		}

    		return ret;
    	}

    does not need any modifications in workqueue.c, but its slow-path is a bit
    slower. Should we really care about the slow-path?
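
    (A userspace sanity check of this sketch's contract - the toy_* helpers
    below are made-up stand-ins for queue_delayed_work(),
    update_timer_expiry() and cancel_delayed_work_sync(), not the real
    kernel functions; note the loop runs while queue_delayed_work() fails,
    i.e. while the work is still pending:)

    ```c
    #include <assert.h>

    /* Toy state for one delayed work: pending bit, timer-armed flag, expiry. */
    struct toy_work {
    	int pending;
    	int timer_armed;
    	unsigned long expires;
    };

    /* Stand-in for queue_delayed_work(): fails if the work is already pending. */
    static int toy_queue_delayed(struct toy_work *w, unsigned long when)
    {
    	if (w->pending)
    		return 0;
    	w->pending = 1;
    	w->timer_armed = 1;
    	w->expires = when;
    	return 1;
    }

    /* Stand-in for update_timer_expiry(): only works while the timer is armed. */
    static int toy_update_expiry(struct toy_work *w, unsigned long when)
    {
    	if (!w->timer_armed)
    		return 0;
    	w->expires = when;
    	return 1;
    }

    /* Stand-in for cancel_delayed_work_sync(). */
    static void toy_cancel_sync(struct toy_work *w)
    {
    	w->pending = 0;
    	w->timer_armed = 0;
    }

    /* The requeue_delayed_work() loop, transcribed onto the toy model. */
    static int toy_requeue(struct toy_work *w, unsigned long when)
    {
    	int ret = 1;

    	while (!toy_queue_delayed(w, when)) {
    		ret = 0;
    		if (toy_update_expiry(w, when))
    			break;
    		toy_cancel_sync(w);
    	}
    	return ret;
    }

    int main(void)
    {
    	struct toy_work w = { 0 };

    	assert(toy_requeue(&w, 100) == 1);	/* idle work: freshly queued */
    	assert(toy_requeue(&w, 200) == 0);	/* pending: expiry updated in place */
    	assert(w.expires == 200);

    	w.timer_armed = 0;			/* pretend the timer already fired */
    	assert(toy_requeue(&w, 300) == 0);	/* slow path: cancel + requeue */
    	assert(w.pending && w.timer_armed && w.expires == 300);
    	return 0;
    }
    ```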

    I won't insist, but could you please comment this?


    Final note. What about the special "delay == 0" case? Actually, I don't
    understand why queue_delayed_work() has this special case, and I have
    no opinion on what update_() should do.

    But, if we should care, then the code above can be fixed trivially:

    	-	if (update_timer_expiry(...))
    	+	if (delay && update_timer_expiry(...))

    but with your patch further complications would be needed.

    Oleg.


  5. Re: [REV2: PATCH 2/2]: workqueue: Modify some users to use the new API

    On 09/29, Krishna Kumar wrote:
    >
    > Modify some users to use the new API.


    Looks good, but I don't understand this code.

    Krishna, I'd suggest sending each change in a separate patch with
    the maintainer cc'ed.

    And please don't forget to mention that update() != cancel() + queue();
    otherwise the maintainer can miss this fact.

    Oleg.


  6. Re: [REV2: PATCH 1/2]: workqueue: Implement the kernel API

    Oleg Nesterov wrote on 09/29/2008 07:57:34 PM:

    Thanks once again for your feedback.

    > > +static inline void __queue_delayed_work(int cpu, struct delayed_work *dwork,
    > > +					struct work_struct *work,
    > > +					struct workqueue_struct *wq,
    > > +					unsigned long delay)
    >
    > A bit fat for inline, imho.


    Yes. I will change that.

    > > +int queue_update_delayed_work(struct workqueue_struct *wq,
    > > +			struct delayed_work *dwork, unsigned long delay)
    > > +{
    > > +	struct work_struct *work = &dwork->work;
    > > +
    > > +	if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work))) {
    > > +		__queue_delayed_work(-1, dwork, work, wq, delay);
    > > +		return 1;
    >
    > Please note that the above is the open-coded queue_delayed_work().
    > I'd suggest to just start with
    >
    > 	if (queue_delayed_work(...))
    > 		return 1;
    >
    > 	... slow path ...


    The reason I had coded it this way was to avoid an unnecessary call -
    unnecessary because the update function is usually called to update an
    existing work, so the work is almost always pending. But I will make this
    change, as it is more readable (maintainable) without losing much.

    > > +	} else {
    > > +		struct timer_list *timer = &dwork->timer;
    > > +		unsigned long when = jiffies + delay;
    > > +		int ret;
    > > +
    > > +		/*
    > > +		 * Work is already scheduled. There is nothing to be done if
    > > +		 * the work is being modified to run at the same time.
    > > +		 */
    > > +		if (timer->expires == when)
    > > +			return 0;

    >
    > I can't understand why you want to optimize this very unlikely case.
    > Afaics, it can only improve the benchmark when we are doing
    > queue_update_delayed_work() in a loop with the same timeout, no?
    >
    > But more importantly, this is not right. We cannot trust timer->expires.
    > For example, let's suppose we are doing
    >
    > 	queue_delayed_work(dwork, delay);
    > 	cancel_delayed_work_sync(dwork);	// does not clear ->expires
    > 	queue_work(&dwork->work);		// the same
    >
    > Now, queue_update_delayed_work(new_delay) sees the pending dwork, and
    > it is possible that timer->expires == jiffies + new_delay.
    >
    > Note also that INIT_DELAYED_WORK() does not initialize ->expires. Now,
    > again, if we do queue_work(&dwork->work) and then update(), we may have
    > problems.


    Good point! I will remove this check.

    > Otherwise I think the patch is correct, but please see below.
    >
    > > +
    > > +		do {
    > > +			/*
    > > +			 * If the timer has been added and it has not fired,
    > > +			 * update_timer_expiry will update it with the correct
    > > +			 * expiry. Otherwise timer has either not yet been
    > > +			 * added or it has already fired - we need to try again.
    > > +			 */
    > > +			if (likely(update_timer_expiry(timer, when)))
    > > +				return 0;
    > > +
    > > +			ret = try_to_grab_pending(work);
    > > +			if (ret < 0)
    > > +				wait_on_work(work);
    > > +		} while (ret < 0);

    >
    > It is a bit silly that we are checking "ret < 0" twice; I'd suggest just
    > killing "int ret" and doing
    >
    > 	for (;;) {
    > 		...
    > 		if (try_to_grab_pending(work) >= 0)
    > 			break;
    > 		wait_on_work(work);
    > 	}


    Thanks for pointing this out, now it definitely looks better.

    > But this needs a comment: why is wait_on_work() needed? "ret < 0" means that
    > the queueing is in progress, and it is not necessary to "flush" this work.

    I had tried to capture this information in the comment above; maybe it needs
    to be clearer.

    > Note!!! queue_update_delayed_work() is _not_ equal to cancel() + queue()
    > anyway, the fastpath doesn't cancel the work if it is active (I mean,
    > it is ->current_work and running).


    Correct. But the behavior is the same as the existing (say, driver) code,
    which calls cancel (which also doesn't cancel, as the work is already
    running) and then calls queue. In both cases the work is running, and we
    wait for the work to complete and then do a fresh queue.

    > But: this also means that 2 concurrent queue_update_delayed_work()s can
    > livelock in the same manner, perhaps this deserves a note.


    Assuming the work is already pending (which is why both calls are in the
    middle of this new API):

    1. Timer is pending (work is not executing) - update_timer will break out
       for both calls, in any order.
    2. Timer is not pending because it has not yet been added - we loop until
       either update_timer or try_to_grab_pending() breaks out, which happens
       as soon as the timer is added or fired (if delay=0) and is almost
       instantaneous.
    3. Timer is not pending because the timer handler has already run -
       try_to_grab_pending returns 0 when run_workqueue clears the PENDING bit.
       This also breaks out of the loop before the work finishes executing
       (since the bit is cleared before work->func() is called).

    > I am not really sure it is worth to play with WORK_STRUCT_PENDING,
    > the simple
    > (snip code) ...
    > does not need any modifications in workqueue.c, but its slow-path is a bit
    > slower. Should we really care about the slow-path?


    I am not insisting either way, since both approaches achieve a big
    improvement in time saved. I am open to either way, please suggest. Here is
    the updated function; if you feel it is too complicated, I will go with the
    above approach.
    int queue_update_delayed_work(struct workqueue_struct *wq,
                                  struct delayed_work *dwork,
                                  unsigned long delay)
    {
        int ret = 1;

        if (!queue_delayed_work(wq, dwork, delay)) {
            struct work_struct *work = &dwork->work;
            struct timer_list *timer = &dwork->timer;

            ret = 0;
            for (;;) {
                unsigned long when = jiffies + delay;

                if (update_timer_expiry(timer, when))
                    break;

                if (try_to_grab_pending(work) >= 0) {
                    __queue_delayed_work(-1, dwork, work, wq, delay);
                    break;
                }
                wait_on_work(work);
            }
        }
        return ret;
    }

    > Final note. What about the special "delay == 0" case? Actually, I don't
    > understand why queue_delayed_work() has this special case, and I have
    > no opinion on what update_() should do.
    >
    > But, if we should care, then the code above can be fixed trivially:
    >
    > - if (update_timer_expiry(...))
    > + if (delay && update_timer_expiry(...))


    Without that change, update_timer should still return success after
    updating the timer to run at 'jiffies' (internal_add_timer handles
    negative expiry). If the timer is not yet added or has already fired,
    we loop till either the timer is updated or grab_pending returns >= 0.
    I think both versions handle the delay==0 case, or did I misunderstand
    your point?

    Thanks,

    - KK

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [REV2: PATCH 1/2]: workqueue: Implement the kernel API

    On 09/30, Krishna Kumar2 wrote:
    >
    > Oleg Nesterov wrote on 09/29/2008 07:57:34 PM:
    >
    > > > + if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work))) {
    > > > + __queue_delayed_work(-1, dwork, work, wq, delay);
    > > > + return 1;

    > >
    > > Please note that the above is the open-coded queue_delayed_work().
    > > I'd suggest to just start with
    > >
    > > if (queue_delayed_work(...))
    > > return 1;
    > >
    > > ... slow path ...

    >
    > The reason I had coded this way was to avoid an unnecessary call -
    > unnecessary


    Yes I see. But actually I meant "see below, I don't think we need
    __queue_delayed_work()".

    > since the update function should be usually called to update a work
    > and hence the work is almost always pending.


    Exactly! And in that case update_timer_expiry() will do its work.

    Yes, the games with __queue_delayed_work() can save a couple of
    clear/set bit operations, but this is the slow and unlikely path; that
    was my point.

    And, if we are talking about the function call overhead. Please note
    that __queue_delayed_work() adds the "unnecessary" call to the "normal"
    queue_delayed_work(), and this path is more common.

    But see below.

    > > Note!!! queue_update_delayed_work() is _not_ equal to cancel() + queue()
    > > anyway, the fastpath doesn't cancel the work if it is active (I mean,
    > > it is ->current_work and running).

    >
    > Correct. But the behavior is the same as the existing (say driver)
    > code which calls cancel (which also doesn't cancel as the work is
    > already running)


    Hmm. How so? cancel() always cancels the work.

    OK, let's suppose we have

    cancel_delayed_work(dwork);
    queue_delayed_work(dwork, delay);

    and now (the next patch you sent) we turn this code into

    queue_update_delayed_work(dwork, delay);

    Now we have a subtle difference which _may_ be important for some users.

    Suppose that this dwork was queued before, the timer has expired, and
    now cwq->thread runs dwork->work->func(). Note that this dwork is not
    pending (WORK_STRUCT_PENDING is cleared).

    Before the change above, cancel_delayed_work() waits until work->func()
    completes, then we re-queue the work.

    After the change, the fast path just queues the same dwork again.
    If the workqueue is not singlethreaded, it is possible that both
    old and new "instances" will run on the different CPUs at the same
    time.

    (that is why I think these changes need the review from maintainer).

    > > But: this also means that 2 concurrent queue_update_delayed_work()s can
    > > livelock in the same manner, perhaps this deserves a note.

    >
    > Assuming the work is already pending (which is why both calls are in
    > the middle of this new API):
    >
    > [...]


    No.

    dwork is not pending, L is a lower priority thread, H - real time.

    L calls queue_update_delayed_work(), and sets WORK_STRUCT_PENDING
    successfully. It is going to do __queue_delayed_work(), but it
    is preempted by H.

    H does queue_update_delayed_work(), WORK_STRUCT_PENDING is already
    set and the timer is not pending. Now it will do try_to_grab_pending()
    in a loop "forever".

    > > I am not really sure it is worth to play with WORK_STRUCT_PENDING,
    > > the simple
    > > (snip code) ...
    > > does not need any modifications in workqueue.c, but its slow-path
    > > is a bit slower. Should we really care about the slow-path?

    >
    > I am not insisting either way since both approaches achieve a big
    > improvement in time saved. I am open to which way to go, please
    > suggest. Here is the updated function. If you feel it is too
    > complicated,


    Yes. I must admit, I prefer the simple, non-intrusive code I suggested
    much more.

    Once again, the slow path is (at least, supposed to be) unlikely, and
    the difference is not that large. (I mean the slow path is when both
    queue() and update_timer() fail).

    Should we complicate the code to add this minor optimization (and
    btw pessimize the "normal" queue_delayed_work) ?

    And, once we have the correct and simple code, we can optimize it
    later.

    > I will go with the above
    > approach.


    No. Please do what _you_ think is right.

    But, if possible, please explain why you think this optimization
    is worthwhile.

    > > Final note. What about the special "delay == 0" case? Actually, I don't
    > > understand why queue_delayed_work() has this special case, and I have
    > > no opinion on what update_() should do.
    > >
    > > But, if we should care, then the code above can be fixed trivially:
    > >
    > > - if (update_timer_expiry(...))
    > > + if (delay && update_timer_expiry(...))

    >
    > Without that change, update_timer should still return success after
    > updating the timer to run at 'jiffies' (internal_add_timer handles
    > negative expiry). If the timer is not yet added or already fired, we
    > loop till either timer is updated or grab_pending returns >= 0. I
    > think both versions handle the delay==0 case, or did I misunderstand
    > your point?


    Yes. Please note that queue_delayed_work(work, 0) does not use the timer
    at all. IOW, queue_delayed_work(wq, work, 0) == queue_work(wq, &dwork->work).
    Perhaps (I don't know) queue_update_delayed_work() should do the same.

    From the next patch:

    - cancel_delayed_work(&afs_vlocation_reap);
    - schedule_delayed_work(&afs_vlocation_reap, 0);
    + schedule_update_delayed_work(&afs_vlocation_reap, 0);

    Again, I don't claim this is wrong, but please note that delay == 0.

    Oleg.

