Thread: [PATCH 0/9] RT: RT-Overload/Sched enhancements v4

  1. [PATCH 0/9] RT: RT-Overload/Sched enhancements v4

    Applies to 23-rt1 + Steve's latest push_rt patch

    Changes since v3:

    1) Rebased to Steve's latest
    2) Added a "highest_prio" feature to eliminate a race w.r.t. activating a task
    and the time it takes to actually reschedule the RQ.
    3) Dropped the PI patch, because the highest_prio patch obsoletes it.
    4) Few small tweaks
    5) Few small fixes

    Regards,
    -Greg

  2. [PATCH 6/9] RT: Clean up some of the push-rt logic

    Get rid of the superfluous dst_cpu (the chosen CPU is already available as
    lowest_rq->cpu), and move the cpu_mask computation inside the search
    function so that callers no longer have to build it.
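
    At the call site, the cleanup reduces to the following (a summary of the
    hunks below, not additional code):

    /* Before: the caller computed the mask and passed it in */
    cpus_and(cpu_mask, cpu_online_map, next_task->cpus_allowed);
    lowest_rq = find_lock_lowest_rq(&cpu_mask, next_task, this_rq);

    /* After: the search function derives the mask from the task itself */
    lowest_rq = find_lock_lowest_rq(next_task, this_rq);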

    Signed-off-by: Gregory Haskins
    ---

    kernel/sched.c | 18 +++++++-----------
    1 files changed, 7 insertions(+), 11 deletions(-)

    diff --git a/kernel/sched.c b/kernel/sched.c
    index 67034aa..d604484 100644
    --- a/kernel/sched.c
    +++ b/kernel/sched.c
    @@ -1519,20 +1519,21 @@ static int double_lock_balance(struct rq *this_rq, struct rq *busiest);
     #define RT_PUSH_MAX_TRIES 3

     /* Will lock the rq it finds */
    -static struct rq *find_lock_lowest_rq(cpumask_t *cpu_mask,
    -                                      struct task_struct *task,
    +static struct rq *find_lock_lowest_rq(struct task_struct *task,
                                           struct rq *this_rq)
     {
             struct rq *lowest_rq = NULL;
    -        int dst_cpu = -1;
             int cpu;
             int tries;
    +        cpumask_t cpu_mask;
    +
    +        cpus_and(cpu_mask, cpu_online_map, task->cpus_allowed);

             for (tries = 0; tries < RT_PUSH_MAX_TRIES; tries++) {
                     /*
                      * Scan each rq for the lowest prio.
                      */
    -                for_each_cpu_mask(cpu, *cpu_mask) {
    +                for_each_cpu_mask(cpu, cpu_mask) {
                             struct rq *rq = &per_cpu(runqueues, cpu);

                             if (cpu == smp_processor_id())
    @@ -1541,7 +1542,6 @@ static struct rq *find_lock_lowest_rq(cpumask_t *cpu_mask,
                             /* We look for lowest RT prio or non-rt CPU */
                             if (rq->highest_prio >= MAX_RT_PRIO) {
                                     lowest_rq = rq;
    -                                dst_cpu = cpu;
                                     break;
                             }

    @@ -1549,7 +1549,6 @@ static struct rq *find_lock_lowest_rq(cpumask_t *cpu_mask,
                             if (rq->highest_prio > task->prio &&
                                 (!lowest_rq || rq->highest_prio < lowest_rq->highest_prio)) {
                                     lowest_rq = rq;
    -                                dst_cpu = cpu;
                             }
                     }

    @@ -1564,7 +1563,7 @@ static struct rq *find_lock_lowest_rq(cpumask_t *cpu_mask,
                      * migrated already or had its affinity changed.
                      */
                     if (unlikely(task_rq(task) != this_rq ||
    -                             !cpu_isset(dst_cpu, task->cpus_allowed))) {
    +                             !cpu_isset(lowest_rq->cpu, task->cpus_allowed))) {
                             spin_unlock(&lowest_rq->lock);
                             lowest_rq = NULL;
                             break;
    @@ -1595,7 +1594,6 @@ static int push_rt_task(struct rq *this_rq)
             struct rq *lowest_rq;
             int dst_cpu;
             int ret = 0;
    -        cpumask_t cpu_mask;

             assert_spin_locked(&this_rq->lock);

    @@ -1603,13 +1601,11 @@ static int push_rt_task(struct rq *this_rq)
             if (!next_task)
                     return 0;

    -        cpus_and(cpu_mask, cpu_online_map, next_task->cpus_allowed);
    -
             /* We might release this_rq lock */
             get_task_struct(next_task);

             /* find_lock_lowest_rq locks the rq if found */
    -        lowest_rq = find_lock_lowest_rq(&cpu_mask, next_task, this_rq);
    +        lowest_rq = find_lock_lowest_rq(next_task, this_rq);
             if (!lowest_rq)
                     goto out;


  3. [PATCH 2/9] RT: Add a per-cpu rt_overload indication

    The system currently evaluates all online CPUs whenever one or more enters
    an rt_overload condition. This suffers from scalability limitations as
    the number of online CPUs increases, so we introduce a cpumask to track
    exactly which CPUs need RT balancing.
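
    Note on ordering: the balancer checks the rt_overload counter first and
    only then walks rto_cpus, so the mask bit must be visible before the
    counter increment (hence the smp_wmb() in the hunk below). A minimal
    sketch of the pairing, with rto_mark_overloaded()/rto_scan() as
    hypothetical names for illustration (the real consumer is
    balance_rt_tasks()):

    /* publish side: mark the cpu, then raise the counter */
    static void rto_mark_overloaded(struct rq *rq)
    {
            cpu_set(rq->cpu, rto_cpus);     /* 1: set the mask bit       */
            smp_wmb();                      /* 2: order bit before count */
            atomic_inc(&rt_overload);       /* 3: announce the overload  */
    }

    /* consume side: a reader that sees the count also sees the bit */
    /* (the smp_rmb() pairing is implied; it is not part of this hunk) */
    static void rto_scan(int this_cpu)
    {
            int cpu;

            if (!atomic_read(&rt_overload))
                    return;
            smp_rmb();
            for_each_cpu_mask(cpu, rto_cpus) {
                    if (cpu == this_cpu)
                            continue;
                    /* ... consider pulling from cpu_rq(cpu) ... */
            }
    }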

    Signed-off-by: Gregory Haskins
    CC: Peter W. Morreale
    ---

    kernel/sched.c | 12 +++++++++---
    1 files changed, 9 insertions(+), 3 deletions(-)

    diff --git a/kernel/sched.c b/kernel/sched.c
    index 0dabf89..0da8c30 100644
    --- a/kernel/sched.c
    +++ b/kernel/sched.c
    @@ -632,6 +632,7 @@ static inline struct rq *this_rq_lock(void)

     #if defined(CONFIG_PREEMPT_RT) && defined(CONFIG_SMP)
     static __cacheline_aligned_in_smp atomic_t rt_overload;
    +static cpumask_t rto_cpus;
     #endif

     static inline void inc_rt_tasks(struct task_struct *p, struct rq *rq)
    @@ -640,8 +641,11 @@ static inline void inc_rt_tasks(struct task_struct *p, struct rq *rq)
             if (rt_task(p)) {
                     rq->rt_nr_running++;
     # ifdef CONFIG_SMP
    -                if (rq->rt_nr_running == 2)
    +                if (rq->rt_nr_running == 2) {
    +                        cpu_set(rq->cpu, rto_cpus);
    +                        smp_wmb();
                             atomic_inc(&rt_overload);
    +                }
     # endif
             }
     #endif
    @@ -654,8 +658,10 @@ static inline void dec_rt_tasks(struct task_struct *p, struct rq *rq)
             WARN_ON(!rq->rt_nr_running);
             rq->rt_nr_running--;
     # ifdef CONFIG_SMP
    -        if (rq->rt_nr_running == 1)
    +        if (rq->rt_nr_running == 1) {
                     atomic_dec(&rt_overload);
    +                cpu_clear(rq->cpu, rto_cpus);
    +        }
     # endif
             }
     #endif
    @@ -1621,7 +1627,7 @@ static void balance_rt_tasks(struct rq *this_rq, int this_cpu)
              */
             next = pick_next_task(this_rq, this_rq->curr);

    -        for_each_online_cpu(cpu) {
    +        for_each_cpu_mask(cpu, rto_cpus) {
                     if (cpu == this_cpu)
                             continue;
                     src_rq = cpu_rq(cpu);


  4. [PATCH 5/9] RT: Maintain the highest RQ priority

    This is an implementation of Steve's idea that the RQ's notion of priority
    should reflect its highest queued task, even if that task is not (yet)
    running. This prevents us from pushing multiple tasks to the same RQ
    before it gets a chance to reschedule.
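
    To picture the race this closes, here is a hypothetical interleaving
    (illustration only, not from the patch), followed by the predicate the
    pusher effectively relies on; rq_can_take() is a made-up name:

    /*
     * Without this patch (rq priority tracks only the *running* task):
     *
     *   CPU0: push_rt_task() -> picks CPU1 (CPU1's curr_prio still low)
     *   CPU0: push_rt_task() -> picks CPU1 again; it has not rescheduled
     *
     * With highest_prio refreshed at enqueue/dequeue time, the second
     * search already sees the first pushed task's priority on CPU1.
     */
    static inline int rq_can_take(struct rq *rq, struct task_struct *task)
    {
            /* lower numeric value == higher priority;        */
            /* >= MAX_RT_PRIO means no RT task is queued here */
            return rq->highest_prio > task->prio;
    }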

    Signed-off-by: Gregory Haskins
    ---

    kernel/sched.c | 37 ++++++++++++++++++++++++++++---------
    1 files changed, 28 insertions(+), 9 deletions(-)

    diff --git a/kernel/sched.c b/kernel/sched.c
    index d68f600..67034aa 100644
    --- a/kernel/sched.c
    +++ b/kernel/sched.c
    @@ -304,7 +304,7 @@ struct rq {
     #ifdef CONFIG_PREEMPT_RT
             unsigned long rt_nr_running;
             unsigned long rt_nr_uninterruptible;
    -        int curr_prio;
    +        int highest_prio;
     #endif

             unsigned long switch_timestamp;
    @@ -368,11 +368,23 @@ static DEFINE_MUTEX(sched_hotcpu_mutex);
     #if defined(CONFIG_PREEMPT_RT) && defined(CONFIG_SMP)
     static inline void set_rq_prio(struct rq *rq, int prio)
     {
    -        rq->curr_prio = prio;
    +        rq->highest_prio = prio;
    +}
    +
    +static inline void update_rq_prio(struct rq *rq)
    +{
    +        struct rt_prio_array *array = &rq->rt.active;
    +        int prio = MAX_PRIO;
    +
    +        if (rq->nr_running)
    +                prio = sched_find_first_bit(array->bitmap);
    +
    +        set_rq_prio(rq, prio);
     }

     #else
     #define set_rq_prio(rq, prio) do { } while(0)
    +#define update_rq_prio(rq) do { } while(0)
     #endif

     static inline void check_preempt_curr(struct rq *rq, struct task_struct *p)
    @@ -1023,12 +1035,14 @@ static void enqueue_task(struct rq *rq, struct task_struct *p, int wakeup)
             sched_info_queued(p);
             p->sched_class->enqueue_task(rq, p, wakeup);
             p->se.on_rq = 1;
    +        update_rq_prio(rq);
     }

     static void dequeue_task(struct rq *rq, struct task_struct *p, int sleep)
     {
             p->sched_class->dequeue_task(rq, p, sleep);
             p->se.on_rq = 0;
    +        update_rq_prio(rq);
     }

     /*
    @@ -1525,15 +1539,15 @@ static struct rq *find_lock_lowest_rq(cpumask_t *cpu_mask,
                                     continue;

                             /* We look for lowest RT prio or non-rt CPU */
    -                        if (rq->curr_prio >= MAX_RT_PRIO) {
    +                        if (rq->highest_prio >= MAX_RT_PRIO) {
                                     lowest_rq = rq;
                                     dst_cpu = cpu;
                                     break;
                             }

                             /* no locking for now */
    -                        if (rq->curr_prio > task->prio &&
    -                            (!lowest_rq || rq->curr_prio < lowest_rq->curr_prio)) {
    +                        if (rq->highest_prio > task->prio &&
    +                            (!lowest_rq || rq->highest_prio < lowest_rq->highest_prio)) {
                                     lowest_rq = rq;
                                     dst_cpu = cpu;
                             }
    @@ -1559,7 +1573,7 @@ static struct rq *find_lock_lowest_rq(cpumask_t *cpu_mask,
                     }

                     /* If this rq is still suitable use it. */
    -                if (lowest_rq->curr_prio > task->prio)
    +                if (lowest_rq->highest_prio > task->prio)
                             break;

                     /* try again */
    @@ -2338,10 +2352,8 @@ static inline void finish_task_switch(struct rq *rq, struct task_struct *prev)
              */
             prev_state = prev->state;
             _finish_arch_switch(prev);
    -
    -        set_rq_prio(rq, current->prio);
    -
             finish_lock_switch(rq, prev);
    +
     #if defined(CONFIG_PREEMPT_RT) && defined(CONFIG_SMP)
             /*
              * If we pushed an RT task off the runqueue,
    @@ -4646,6 +4658,9 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
             prev_resched = _need_resched();

             if (on_rq) {
    +                /*
    +                 * Note: RQ priority gets updated in the enqueue/dequeue logic
    +                 */
                     enqueue_task(rq, p, 0);
                     /*
                      * Reschedule if we are currently running on this runqueue and
    @@ -4712,6 +4727,10 @@ void set_user_nice(struct task_struct *p, long nice)
              */
             if (delta < 0 || (delta > 0 && task_running(rq, p)))
                     resched_task(rq->curr);
    +
    +                /*
    +                 * Note: RQ priority gets updated in the enqueue/dequeue logic
    +                 */
             }
     out_unlock:
             task_rq_unlock(rq, &flags);


  5. [PATCH 8/9] RT: Fixes for push-rt patch

    From: Steven Rostedt

    Steve found two errors in the original patch: both helpers operate on an
    rq that is passed in as a parameter and is not necessarily the current
    CPU's, so they must compare against this_rq->cpu rather than
    smp_processor_id(), and against rq->curr rather than current.

    Signed-off-by: Gregory Haskins
    ---

    kernel/sched.c | 2 +-
    kernel/sched_rt.c | 2 +-
    2 files changed, 2 insertions(+), 2 deletions(-)

    diff --git a/kernel/sched.c b/kernel/sched.c
    index 0ee1e21..8c916de 100644
    --- a/kernel/sched.c
    +++ b/kernel/sched.c
    @@ -1536,7 +1536,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task,
                     for_each_cpu_mask(cpu, cpu_mask) {
                             struct rq *rq = &per_cpu(runqueues, cpu);

    -                        if (cpu == smp_processor_id())
    +                        if (cpu == this_rq->cpu)
                                     continue;

                             /* We look for lowest RT prio or non-rt CPU */
    diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
    index 8d59e62..04959fe 100644
    --- a/kernel/sched_rt.c
    +++ b/kernel/sched_rt.c
    @@ -115,7 +115,7 @@ static struct task_struct *rt_next_highest_task(struct rq *rq)

             queue = array->queue + idx;
             next = list_entry(queue->next, struct task_struct, run_list);
    -        if (unlikely(next != current))
    +        if (unlikely(next != rq->curr))
                     return next;

             if (queue->next->next != queue) {


  6. [PATCH 4/9] RT: Initialize the priority value

    Initialize the base value of the RQ priority to MAX_PRIO ("idle") at boot,
    so that a runqueue with no runnable tasks reads as the lowest-priority
    push target.

    Signed-off-by: Gregory Haskins
    ---

    kernel/sched.c | 2 ++
    1 files changed, 2 insertions(+), 0 deletions(-)

    diff --git a/kernel/sched.c b/kernel/sched.c
    index 131f618..d68f600 100644
    --- a/kernel/sched.c
    +++ b/kernel/sched.c
    @@ -7385,6 +7385,8 @@ void __init sched_init(void)
                     highest_cpu = i;
                     /* delimiter for bitsearch: */
                     __set_bit(MAX_RT_PRIO, array->bitmap);
    +
    +                set_rq_prio(rq, MAX_PRIO);
             }

             set_load_weight(&init_task);


  7. [PATCH 9/9] RT: Only dirty a cacheline if the priority is actually changing

    We can avoid dirtying a rq related cacheline with a simple check, so why not.

    Signed-off-by: Gregory Haskins
    ---

    0 files changed, 0 insertions(+), 0 deletions(-)



  8. Re: [PATCH 9/9] RT: Only dirty a cacheline if the priority is actually changing

    Gregory Haskins wrote:
    > We can avoid dirtying a rq related cacheline with a simple check, so why not.
    >
    > Signed-off-by: Gregory Haskins
    > ---
    >
    > 0 files changed, 0 insertions(+), 0 deletions(-)


    I think you wanted a patch here?

  9. Re: [PATCH 9/9] RT: Only dirty a cacheline if the priority is actually changing


    On Sat, 20 Oct 2007, Roel Kluin wrote:

    > Gregory Haskins wrote:
    > > We can avoid dirtying a rq related cacheline with a simple check, so why not.
    > >
    > > Signed-off-by: Gregory Haskins
    > > ---
    > >
    > > 0 files changed, 0 insertions(+), 0 deletions(-)

    >
    > I think you wanted a patch here?
    >


    But it is here. Gregory is a Zen master, and this patch does exactly what
    he wanted it to do.

    -- Steve


  10. Re: [PATCH 9/9] RT: Only dirty a cacheline if the priority is actually changing

    On Sat, 2007-10-20 at 04:48 +0200, Roel Kluin wrote:
    > Gregory Haskins wrote:
    > > We can avoid dirtying a rq related cacheline with a simple check, so why not.
    > >
    > > Signed-off-by: Gregory Haskins
    > > ---
    > >
    > > 0 files changed, 0 insertions(+), 0 deletions(-)

    >
    > I think you wanted a patch here?


    Hi Roel,
    I had forgotten to refresh before mailing the patches, but I sent an
    immediate followup (which unfortunately was not linked to the original
    posting). For your reference, here is the reposting:

    http://article.gmane.org/gmane.linux.rt.user/1626
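
    The gist of the reposted change: skip the store when the value is
    unchanged, so the cacheline is only dirtied on a real transition. A
    rough sketch (hypothetical; the actual patch lives at the link above):

    static inline void set_rq_prio(struct rq *rq, int prio)
    {
            /*
             * Only write (and thus dirty the rq cacheline) when the
             * priority is actually changing.
             */
            if (rq->highest_prio != prio)
                    rq->highest_prio = prio;
    }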

    Sorry for the confusion!

    Regards,
    -Greg




