[PATCH] account_group_exec_runtime: fix the racy usage of ->signal - Kernel

This is a discussion on [PATCH] account_group_exec_runtime: fix the racy usage of ->signal - Kernel ; Compile tested. Unlike other similar routines, account_group_exec_runtime() could be called "implicitly" after exit_notify(). This means we can race with the parent doing release_task(), we can't just check ->signal != NULL. Take ->siglock to make sure ->signal can't go away. This ...

+ Reply to Thread
Results 1 to 8 of 8

Thread: [PATCH] account_group_exec_runtime: fix the racy usage of ->signal

  1. [PATCH] account_group_exec_runtime: fix the racy usage of ->signal

    Compile tested.

    Unlike other similar routines, account_group_exec_runtime() could be
    called "implicitly" after exit_notify(). This means we can race with
    the parent doing release_task(), we can't just check ->signal != NULL.

    Take ->siglock to make sure ->signal can't go away.

    This is the minimal fix, with this patch we don't need need get/put cpu,
    and I think we should uninline this function.

    Signed-off-by: Oleg Nesterov

    --- K-28/kernel/sched_stats.h~A_G_E_R_FIX 2008-11-07 17:32:02.000000000 +0100
    +++ K-28/kernel/sched_stats.h 2008-11-07 17:44:39.000000000 +0100
    @@ -351,10 +351,12 @@ static inline void account_group_exec_ru
    unsigned long long ns)
    {
    struct signal_struct *sig;
    + unsigned long flags;

    - sig = tsk->signal;
    - if (unlikely(!sig))
    + if (unlikely(!lock_task_sighand(tsk, &flags)))
    return;
    +
    + sig = tsk->signal;
    if (sig->cputime.totals) {
    struct task_cputime *times;

    @@ -362,4 +364,6 @@ static inline void account_group_exec_ru
    times->sum_exec_runtime += ns;
    put_cpu_no_resched();
    }
    +
    + unlock_task_sighand(tsk, &flags);
    }

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH] account_group_exec_runtime: fix the racy usage of ->signal


    * Oleg Nesterov wrote:

    > Compile tested.
    >
    > Unlike other similar routines, account_group_exec_runtime() could be
    > called "implicitly" after exit_notify(). This means we can race with
    > the parent doing release_task(), we can't just check ->signal != NULL.
    >
    > Take ->siglock to make sure ->signal can't go away.
    >
    > This is the minimal fix, with this patch we don't need need get/put cpu,
    > and I think we should uninline this function.
    >
    > Signed-off-by: Oleg Nesterov


    >
    > --- K-28/kernel/sched_stats.h~A_G_E_R_FIX 2008-11-07 17:32:02.000000000 +0100
    > +++ K-28/kernel/sched_stats.h 2008-11-07 17:44:39.000000000 +0100
    > @@ -351,10 +351,12 @@ static inline void account_group_exec_ru
    > unsigned long long ns)
    > {
    > struct signal_struct *sig;
    > + unsigned long flags;
    >
    > - sig = tsk->signal;
    > - if (unlikely(!sig))
    > + if (unlikely(!lock_task_sighand(tsk, &flags)))
    > return;


    i think this will lock up: the signal lock must not nest inside the rq
    lock, and these accounting functions are called from within the
    scheduler.

    Ingo
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [PATCH] account_group_exec_runtime: fix the racy usage of ->signal

    On 11/07, Ingo Molnar wrote:
    >
    > * Oleg Nesterov wrote:
    >
    > > --- K-28/kernel/sched_stats.h~A_G_E_R_FIX 2008-11-07 17:32:02.000000000 +0100
    > > +++ K-28/kernel/sched_stats.h 2008-11-07 17:44:39.000000000 +0100
    > > @@ -351,10 +351,12 @@ static inline void account_group_exec_ru
    > > unsigned long long ns)
    > > {
    > > struct signal_struct *sig;
    > > + unsigned long flags;
    > >
    > > - sig = tsk->signal;
    > > - if (unlikely(!sig))
    > > + if (unlikely(!lock_task_sighand(tsk, &flags)))
    > > return;

    >
    > i think this will lock up:


    Ah. I worried about this, but convinced myself this is OK...

    > the signal lock must not nest inside the rq
    > lock, and these accounting functions are called from within the
    > scheduler.


    Why? we seem to never do task_rq_lock() under ->siglock ?

    Oleg.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [PATCH] account_group_exec_runtime: fix the racy usage of ->signal


    On Fri, 2008-11-07 at 17:21 +0100, Ingo Molnar wrote:
    > * Oleg Nesterov wrote:
    >
    > > Compile tested.
    > >
    > > Unlike other similar routines, account_group_exec_runtime() could be
    > > called "implicitly" after exit_notify(). This means we can race with
    > > the parent doing release_task(), we can't just check ->signal != NULL.
    > >
    > > Take ->siglock to make sure ->signal can't go away.
    > >
    > > This is the minimal fix, with this patch we don't need need get/put cpu,
    > > and I think we should uninline this function.
    > >
    > > Signed-off-by: Oleg Nesterov

    >
    > >
    > > --- K-28/kernel/sched_stats.h~A_G_E_R_FIX 2008-11-07 17:32:02.000000000 +0100
    > > +++ K-28/kernel/sched_stats.h 2008-11-07 17:44:39.000000000 +0100
    > > @@ -351,10 +351,12 @@ static inline void account_group_exec_ru
    > > unsigned long long ns)
    > > {
    > > struct signal_struct *sig;
    > > + unsigned long flags;
    > >
    > > - sig = tsk->signal;
    > > - if (unlikely(!sig))
    > > + if (unlikely(!lock_task_sighand(tsk, &flags)))
    > > return;

    >
    > i think this will lock up: the signal lock must not nest inside the rq
    > lock, and these accounting functions are called from within the
    > scheduler.
    >
    > Ingo


    I can confirm that this does hang on bootup.

    - Doug


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [PATCH] account_group_exec_runtime: fix the racy usage of ->signal

    On 11/07, Doug Chapman wrote:
    >
    > On Fri, 2008-11-07 at 17:21 +0100, Ingo Molnar wrote:
    > > * Oleg Nesterov wrote:
    > >
    > > > @@ -351,10 +351,12 @@ static inline void account_group_exec_ru
    > > > unsigned long long ns)
    > > > {
    > > > struct signal_struct *sig;
    > > > + unsigned long flags;
    > > >
    > > > - sig = tsk->signal;
    > > > - if (unlikely(!sig))
    > > > + if (unlikely(!lock_task_sighand(tsk, &flags)))
    > > > return;

    > >
    > > i think this will lock up: the signal lock must not nest inside the rq
    > > lock, and these accounting functions are called from within the
    > > scheduler.

    >
    > I can confirm that this does hang on bootup.


    Thanks a lot Doug.

    If only I could understand what happens. I am running the 2.6.27 kernel
    with the patch below just fine.

    Ingo, could you please explain?

    OK, perhaps we can check ->exit_state... I'll return on Monday.


    --- linux-2.6.27/kernel/sched_fair.c~DBG 2008-10-10 00:13:53.000000000 +0200
    +++ linux-2.6.27/kernel/sched_fair.c 2008-11-07 19:15:28.000000000 +0100
    @@ -484,6 +484,16 @@ __update_curr(struct cfs_rq *cfs_rq, str
    curr->vruntime += delta_exec_weighted;
    }

    +static void ttt(struct task_struct *tsk)
    +{
    + unsigned long flags;
    +
    + if (unlikely(!lock_task_sighand(tsk, &flags)))
    + return;
    +
    + unlock_task_sighand(tsk, &flags);
    +}
    +
    static void update_curr(struct cfs_rq *cfs_rq)
    {
    struct sched_entity *curr = cfs_rq->curr;
    @@ -507,6 +517,7 @@ static void update_curr(struct cfs_rq *c
    struct task_struct *curtask = task_of(curr);

    cpuacct_charge(curtask, delta_exec);
    + ttt(curtask);
    }
    }


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [PATCH] account_group_exec_runtime: fix the racy usage of ->signal


    * Oleg Nesterov wrote:

    > On 11/07, Ingo Molnar wrote:
    > >
    > > * Oleg Nesterov wrote:
    > >
    > > > --- K-28/kernel/sched_stats.h~A_G_E_R_FIX 2008-11-07 17:32:02.000000000 +0100
    > > > +++ K-28/kernel/sched_stats.h 2008-11-07 17:44:39.000000000 +0100
    > > > @@ -351,10 +351,12 @@ static inline void account_group_exec_ru
    > > > unsigned long long ns)
    > > > {
    > > > struct signal_struct *sig;
    > > > + unsigned long flags;
    > > >
    > > > - sig = tsk->signal;
    > > > - if (unlikely(!sig))
    > > > + if (unlikely(!lock_task_sighand(tsk, &flags)))
    > > > return;

    > >
    > > i think this will lock up:

    >
    > Ah. I worried about this, but convinced myself this is OK...
    >
    > > the signal lock must not nest inside the rq
    > > lock, and these accounting functions are called from within the
    > > scheduler.

    >
    > Why? we seem to never do task_rq_lock() under ->siglock ?


    signal_wake_up() ?

    Ingo
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [PATCH] account_group_exec_runtime: fix the racy usage of ->signal

    On 11/08, Ingo Molnar wrote:
    >
    > * Oleg Nesterov wrote:
    >
    > > On 11/07, Ingo Molnar wrote:
    > > >
    > > > the signal lock must not nest inside the rq
    > > > lock, and these accounting functions are called from within the
    > > > scheduler.

    > >
    > > Why? we seem to never do task_rq_lock() under ->siglock ?

    >
    > signal_wake_up() ?


    I'd wish very much I could say I have already realized this, but I didn't.
    Thanks Ingo!

    I don't see the good solution for this problem. I'll send the new patch in
    a minute, but it is ugly. Basically it is

    --- a/kernel/exit.c
    +++ b/kernel/exit.c
    @@ -141,6 +141,8 @@ static void __exit_signal(struct task_st
    if (sig) {
    flush_sigqueue(&sig->shared_pending);
    taskstats_tgid_free(sig);
    + smp_mb();
    + spin_unlock_wait(&task_rq(tsk)->lock);
    __cleanup_signal(sig);
    }
    }

    except this needs a helper in sched.c. You can nack it right now
    Of course we can protect ->signal with rcu, but this is even worse
    imho.

    Anybody sees a bettter fix?


    Perhaps we can change sched.c to do update_curr() only when the
    task is not running (except ->task_tick), iow perhaps we can check
    sleep/wakeup == T before calling update_cur(). But this is not easy
    even if really possible.

    Oleg.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [PATCH] account_group_exec_runtime: fix the racy usage of ->signal

    On Mon, 2008-11-10 at 14:04 +0100, Oleg Nesterov wrote:
    > On 11/08, Ingo Molnar wrote:
    > >
    > > * Oleg Nesterov wrote:
    > >
    > > > On 11/07, Ingo Molnar wrote:
    > > > >
    > > > > the signal lock must not nest inside the rq
    > > > > lock, and these accounting functions are called from within the
    > > > > scheduler.
    > > >
    > > > Why? we seem to never do task_rq_lock() under ->siglock ?

    > >
    > > signal_wake_up() ?

    >
    > I'd wish very much I could say I have already realized this, but I didn't.
    > Thanks Ingo!
    >
    > I don't see the good solution for this problem. I'll send the new patch in
    > a minute, but it is ugly. Basically it is
    >
    > --- a/kernel/exit.c
    > +++ b/kernel/exit.c
    > @@ -141,6 +141,8 @@ static void __exit_signal(struct task_st
    > if (sig) {
    > flush_sigqueue(&sig->shared_pending);
    > taskstats_tgid_free(sig);
    > + smp_mb();
    > + spin_unlock_wait(&task_rq(tsk)->lock);
    > __cleanup_signal(sig);
    > }
    > }
    >
    > except this needs a helper in sched.c. You can nack it right now
    > Of course we can protect ->signal with rcu, but this is even worse
    > imho.
    >
    > Anybody sees a bettter fix?
    >
    >
    > Perhaps we can change sched.c to do update_curr() only when the
    > task is not running (except ->task_tick), iow perhaps we can check
    > sleep/wakeup == T before calling update_cur(). But this is not easy
    > even if really possible.


    and butt ugly to boot..
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread