[PATCH] posix-timers: Do not modify an already queued timer signal - Kernel

This is a discussion on [PATCH] posix-timers: Do not modify an already queued timer signal - Kernel ; When a timer fires, posix_timer_event() zeroes out its pre-allocated siginfo structure, initialises it and then queues up the signal with send_sigqueue(). However, we may have previously queued up this signal, in which case we only want to increment si_overrun and ...

+ Reply to Thread
Results 1 to 13 of 13

Thread: [PATCH] posix-timers: Do not modify an already queued timer signal

  1. [PATCH] posix-timers: Do not modify an already queued timer signal

    When a timer fires, posix_timer_event() zeroes out its
    pre-allocated siginfo structure, initialises it and then
    queues up the signal with send_sigqueue().

    However, we may have previously queued up this signal, in
    which case we only want to increment si_overrun and
    re-initialising the siginfo structure is incorrect.

    Also, since we are modifying an already queued signal
    without the protection of the sighand spinlock, we may also
    race with e.g. collect_signal() causing it to fail to find
    a signal on the pending list because it happens to look at
    the siginfo struct after it was zeroed and before it was
    re-initialised.

    The race was observed with a modified kvm-userspace when
    running a guest under heavy network load. When it occurs,
    KVM never sees another SIGALRM signal because although
    the signal is queued up the appropriate bit is never set
    in the pending mask. Manually sending the process a SIGALRM
    kicks it out of this state.

    The fix is simple - only modify the pre-allocated sigqueue
    once we're sure that it hasn't already been queued.

    Signed-off-by: Mark McLoughlin
    Cc: Oleg Nesterov
    Cc: Roland McGrath

    ---
    include/linux/sched.h | 2 +-
    kernel/posix-timers.c | 20 +++++++++++---------
    kernel/signal.c | 5 +++--
    3 files changed, 15 insertions(+), 12 deletions(-)

    diff --git a/include/linux/sched.h b/include/linux/sched.h
    index 2134917..718f7ec 100644
    --- a/include/linux/sched.h
    +++ b/include/linux/sched.h
    @@ -1791,7 +1791,7 @@ extern void zap_other_threads(struct task_struct *p);
    extern int kill_proc(pid_t, int, int);
    extern struct sigqueue *sigqueue_alloc(void);
    extern void sigqueue_free(struct sigqueue *);
    -extern int send_sigqueue(struct sigqueue *, struct task_struct *, int group);
    +extern int send_sigqueue(struct sigqueue *, siginfo_t *, struct task_struct *, int group);
    extern int do_sigaction(int, struct k_sigaction *, struct k_sigaction *);
    extern int do_sigaltstack(const stack_t __user *, stack_t __user *, unsigned long);

    diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
    index dbd8398..b42c964 100644
    --- a/kernel/posix-timers.c
    +++ b/kernel/posix-timers.c
    @@ -298,19 +298,21 @@ void do_schedule_next_timer(struct siginfo *info)

    int posix_timer_event(struct k_itimer *timr,int si_private)
    {
    - memset(&timr->sigq->info, 0, sizeof(siginfo_t));
    - timr->sigq->info.si_sys_private = si_private;
    + siginfo_t info;
    +
    + memset(&info, 0, sizeof(siginfo_t));
    + info.si_sys_private = si_private;
    /* Send signal to the process that owns this timer.*/

    - timr->sigq->info.si_signo = timr->it_sigev_signo;
    - timr->sigq->info.si_errno = 0;
    - timr->sigq->info.si_code = SI_TIMER;
    - timr->sigq->info.si_tid = timr->it_id;
    - timr->sigq->info.si_value = timr->it_sigev_value;
    + info.si_signo = timr->it_sigev_signo;
    + info.si_errno = 0;
    + info.si_code = SI_TIMER;
    + info.si_tid = timr->it_id;
    + info.si_value = timr->it_sigev_value;

    if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
    struct task_struct *leader;
    - int ret = send_sigqueue(timr->sigq, timr->it_process, 0);
    + int ret = send_sigqueue(timr->sigq, &info, timr->it_process, 0);

    if (likely(ret >= 0))
    return ret;
    @@ -321,7 +323,7 @@ int posix_timer_event(struct k_itimer *timr,int si_private)
    timr->it_process = leader;
    }

    - return send_sigqueue(timr->sigq, timr->it_process, 1);
    + return send_sigqueue(timr->sigq, &info, timr->it_process, 1);
    }
    EXPORT_SYMBOL_GPL(posix_timer_event);

    diff --git a/kernel/signal.c b/kernel/signal.c
    index 6c0958e..50e0b13 100644
    --- a/kernel/signal.c
    +++ b/kernel/signal.c
    @@ -1292,9 +1292,9 @@ void sigqueue_free(struct sigqueue *q)
    __sigqueue_free(q);
    }

    -int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
    +int send_sigqueue(struct sigqueue *q, siginfo_t *info, struct task_struct *t, int group)
    {
    - int sig = q->info.si_signo;
    + int sig = info->si_signo;
    struct sigpending *pending;
    unsigned long flags;
    int ret;
    @@ -1322,6 +1322,7 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)

    signalfd_notify(t, sig);
    pending = group ? &t->signal->shared_pending : &t->pending;
    + copy_siginfo(&q->info, info);
    list_add_tail(&q->list, &pending->list);
    sigaddset(&pending->signal, sig);
    complete_signal(sig, t, group);
    --
    1.5.5.1

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH] posix-timers: Do not modify an already queued timer signal

    On 07/16, Mark McLoughlin wrote:
    >
    > When a timer fires, posix_timer_event() zeroes out its
    > pre-allocated siginfo structure, initialises it and then
    > queues up the signal with send_sigqueue().
    >
    > However, we may have previously queued up this signal, in
    > which case we only want to increment si_overrun and
    > re-initialising the siginfo structure is incorrect.


    Quoting Roland McGrath:
    >
    > I'm not clear on how the already-queued case could ever happen. Do we
    > really need that check at all? It shouldn't be possible for the timer to
    > be firing when it's already queued, because it won't have been reloaded.
    > It only reloads via do_schedule_next_timer after it's dequeued, or because
    > a 1 return value said it never was queued.


    > Also, since we are modifying an already queued signal
    > without the protection of the sighand spinlock, we may also
    > race with e.g. collect_signal() causing it to fail to find
    > a signal on the pending list because it happens to look at
    > the siginfo struct after it was zeroed and before it was
    > re-initialised.
    >
    > The race was observed with a modified kvm-userspace when
    > running a guest under heavy network load. When it occurs,
    > KVM never sees another SIGALRM signal because although
    > the signal is queued up the appropriate bit is never set
    > in the pending mask.


    Hmm. Yes, if collect_signal() races with posix_timer_event()->memset(),
    we dequeue SIGALRM but leave the siginfo on list. I can't see how
    this is possible though...

    Could you verify that we don't have another bug? Say, add
    WARN_ON(!list_empty()) to posix_timer_event().


    If we need this fix, perhaps it is better to modify posix_timer_event()
    to check !list_empty()?

    Add Thomas.

    Oleg.

    > Manually sending the process a SIGALRM
    > kicks it out of this state.
    >
    > The fix is simple - only modify the pre-allocated sigqueue
    > once we're sure that it hasn't already been queued.
    >
    > Signed-off-by: Mark McLoughlin
    > Cc: Oleg Nesterov
    > Cc: Roland McGrath
    >
    > ---
    > include/linux/sched.h | 2 +-
    > kernel/posix-timers.c | 20 +++++++++++---------
    > kernel/signal.c | 5 +++--
    > 3 files changed, 15 insertions(+), 12 deletions(-)
    >
    > diff --git a/include/linux/sched.h b/include/linux/sched.h
    > index 2134917..718f7ec 100644
    > --- a/include/linux/sched.h
    > +++ b/include/linux/sched.h
    > @@ -1791,7 +1791,7 @@ extern void zap_other_threads(struct task_struct *p);
    > extern int kill_proc(pid_t, int, int);
    > extern struct sigqueue *sigqueue_alloc(void);
    > extern void sigqueue_free(struct sigqueue *);
    > -extern int send_sigqueue(struct sigqueue *, struct task_struct *, int group);
    > +extern int send_sigqueue(struct sigqueue *, siginfo_t *, struct task_struct *, int group);
    > extern int do_sigaction(int, struct k_sigaction *, struct k_sigaction *);
    > extern int do_sigaltstack(const stack_t __user *, stack_t __user *, unsigned long);
    >
    > diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
    > index dbd8398..b42c964 100644
    > --- a/kernel/posix-timers.c
    > +++ b/kernel/posix-timers.c
    > @@ -298,19 +298,21 @@ void do_schedule_next_timer(struct siginfo *info)
    >
    > int posix_timer_event(struct k_itimer *timr,int si_private)
    > {
    > - memset(&timr->sigq->info, 0, sizeof(siginfo_t));
    > - timr->sigq->info.si_sys_private = si_private;
    > + siginfo_t info;
    > +
    > + memset(&info, 0, sizeof(siginfo_t));
    > + info.si_sys_private = si_private;
    > /* Send signal to the process that owns this timer.*/
    >
    > - timr->sigq->info.si_signo = timr->it_sigev_signo;
    > - timr->sigq->info.si_errno = 0;
    > - timr->sigq->info.si_code = SI_TIMER;
    > - timr->sigq->info.si_tid = timr->it_id;
    > - timr->sigq->info.si_value = timr->it_sigev_value;
    > + info.si_signo = timr->it_sigev_signo;
    > + info.si_errno = 0;
    > + info.si_code = SI_TIMER;
    > + info.si_tid = timr->it_id;
    > + info.si_value = timr->it_sigev_value;
    >
    > if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
    > struct task_struct *leader;
    > - int ret = send_sigqueue(timr->sigq, timr->it_process, 0);
    > + int ret = send_sigqueue(timr->sigq, &info, timr->it_process, 0);
    >
    > if (likely(ret >= 0))
    > return ret;
    > @@ -321,7 +323,7 @@ int posix_timer_event(struct k_itimer *timr,int si_private)
    > timr->it_process = leader;
    > }
    >
    > - return send_sigqueue(timr->sigq, timr->it_process, 1);
    > + return send_sigqueue(timr->sigq, &info, timr->it_process, 1);
    > }
    > EXPORT_SYMBOL_GPL(posix_timer_event);
    >
    > diff --git a/kernel/signal.c b/kernel/signal.c
    > index 6c0958e..50e0b13 100644
    > --- a/kernel/signal.c
    > +++ b/kernel/signal.c
    > @@ -1292,9 +1292,9 @@ void sigqueue_free(struct sigqueue *q)
    > __sigqueue_free(q);
    > }
    >
    > -int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
    > +int send_sigqueue(struct sigqueue *q, siginfo_t *info, struct task_struct *t, int group)
    > {
    > - int sig = q->info.si_signo;
    > + int sig = info->si_signo;
    > struct sigpending *pending;
    > unsigned long flags;
    > int ret;
    > @@ -1322,6 +1322,7 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
    >
    > signalfd_notify(t, sig);
    > pending = group ? &t->signal->shared_pending : &t->pending;
    > + copy_siginfo(&q->info, info);
    > list_add_tail(&q->list, &pending->list);
    > sigaddset(&pending->signal, sig);
    > complete_signal(sig, t, group);
    > --
    > 1.5.5.1
    >


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [PATCH] posix-timers: Do not modify an already queued timer signal

    Hi Oleg,

    On Wed, 2008-07-16 at 20:21 +0400, Oleg Nesterov wrote:
    > On 07/16, Mark McLoughlin wrote:
    > >
    > > When a timer fires, posix_timer_event() zeroes out its
    > > pre-allocated siginfo structure, initialises it and then
    > > queues up the signal with send_sigqueue().
    > >
    > > However, we may have previously queued up this signal, in
    > > which case we only want to increment si_overrun and
    > > re-initialising the siginfo structure is incorrect.

    >
    > Quoting Roland McGrath:
    > >
    > > I'm not clear on how the already-queued case could ever happen. Do we
    > > really need that check at all? It shouldn't be possible for the timer to
    > > be firing when it's already queued, because it won't have been reloaded.
    > > It only reloads via do_schedule_next_timer after it's dequeued, or because
    > > a 1 return value said it never was queued.


    The app can reload the timer itself before the signal has been dequeued
    via signalfd ...

    > > Also, since we are modifying an already queued signal
    > > without the protection of the sighand spinlock, we may also
    > > race with e.g. collect_signal() causing it to fail to find
    > > a signal on the pending list because it happens to look at
    > > the siginfo struct after it was zeroed and before it was
    > > re-initialised.
    > >
    > > The race was observed with a modified kvm-userspace when
    > > running a guest under heavy network load. When it occurs,
    > > KVM never sees another SIGALRM signal because although
    > > the signal is queued up the appropriate bit is never set
    > > in the pending mask.

    >
    > Hmm. Yes, if collect_signal() races with posix_timer_event()->memset(),
    > we dequeue SIGALRM but leave the siginfo on list. I can't see how
    > this is possible though...
    >
    > Could you verify that we don't have another bug? Say, add
    > WARN_ON(!list_empty()) to posix_timer_event().


    Yes, this case definitely occurs.

    Now that I know what's going on, I've finally managed to extract a
    standalone test case. Try this:

    http://markmc.fedorapeople.org/test-posix-timer-race.c

    On my quad-core machine it triggers the race fairly readily.

    > If we need this fix, perhaps it is better to modify posix_timer_event()
    > to check !list_empty()?


    Yeah, I had considered that, but it's a tad more invasive. See below.

    I mainly don't like this patch because we may lock the sighand from one
    thread's task_struct and then unlock it via the group leader's sighand.
    That probably is safe, but is pretty nasty.

    Cheers,
    Mark.

    Subject: [PATCH] posix-timers: Do not modify an already queued timer signal

    When a timer fires, posix_timer_event() zeroes out its
    pre-allocated siginfo structure, initialises it and then
    queues up the signal with send_sigqueue().

    However, we may have previously queued up this signal, in
    which case we only want to increment si_overrun and
    re-initialising the siginfo structure is incorrect.

    Also, since we are modifying an already queued signal
    without the protection of the sighand spinlock, we may also
    race with e.g. collect_signal() causing it to fail to find
    a signal on the pending list because it happens to look at
    the siginfo struct after it was zeroed and before it was
    re-initialised.

    The race was observed with a modified kvm-userspace when
    running a guest under heavy network load. When it occurs,
    KVM never sees another SIGALRM signal because although
    the signal is queued up the appropriate bit is never set
    in the pending mask. Manually sending the process a SIGALRM
    kicks it out of this state.

    The fix is simple - only modify the pre-allocated sigqueue
    once we're sure that it hasn't already been queued.

    Signed-off-by: Mark McLoughlin
    Cc: Oleg Nesterov
    Cc: Roland McGrath
    ---
    kernel/posix-timers.c | 23 ++++++++++++++++++++---
    kernel/signal.c | 27 ++++-----------------------
    2 files changed, 24 insertions(+), 26 deletions(-)

    diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
    index dbd8398..65ce122 100644
    --- a/kernel/posix-timers.c
    +++ b/kernel/posix-timers.c
    @@ -298,6 +298,19 @@ void do_schedule_next_timer(struct siginfo *info)

    int posix_timer_event(struct k_itimer *timr,int si_private)
    {
    + unsigned long flags;
    + int ret = -1;
    +
    + if (!likely(lock_task_sighand(timr->it_process, &flags)))
    + goto ret;
    +
    + ret = 0;
    + if (unlikely(!list_empty(&timr->sigq->list))) {
    + /* Already queued; just increment the overrun count */
    + timr->sigq->info.si_overrun++;
    + goto out;
    + }
    +
    memset(&timr->sigq->info, 0, sizeof(siginfo_t));
    timr->sigq->info.si_sys_private = si_private;
    /* Send signal to the process that owns this timer.*/
    @@ -310,10 +323,10 @@ int posix_timer_event(struct k_itimer *timr,int si_private)

    if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
    struct task_struct *leader;
    - int ret = send_sigqueue(timr->sigq, timr->it_process, 0);
    + ret = send_sigqueue(timr->sigq, timr->it_process, 0);

    if (likely(ret >= 0))
    - return ret;
    + goto out;

    timr->it_sigev_notify = SIGEV_SIGNAL;
    leader = timr->it_process->group_leader;
    @@ -321,7 +334,11 @@ int posix_timer_event(struct k_itimer *timr,int si_private)
    timr->it_process = leader;
    }

    - return send_sigqueue(timr->sigq, timr->it_process, 1);
    + ret = send_sigqueue(timr->sigq, timr->it_process, 1);
    +out:
    + unlock_task_sighand(timr->it_process, &flags);
    +ret:
    + return ret;
    }
    EXPORT_SYMBOL_GPL(posix_timer_event);

    diff --git a/kernel/signal.c b/kernel/signal.c
    index 6c0958e..62eb972 100644
    --- a/kernel/signal.c
    +++ b/kernel/signal.c
    @@ -1296,39 +1296,20 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
    {
    int sig = q->info.si_signo;
    struct sigpending *pending;
    - unsigned long flags;
    - int ret;

    BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
    + BUG_ON(!list_empty(&q->list));

    - ret = -1;
    - if (!likely(lock_task_sighand(t, &flags)))
    - goto ret;
    -
    - ret = 1; /* the signal is ignored */
    if (!prepare_signal(sig, t))
    - goto out;
    -
    - ret = 0;
    - if (unlikely(!list_empty(&q->list))) {
    - /*
    - * If an SI_TIMER entry is already queue just increment
    - * the overrun count.
    - */
    - BUG_ON(q->info.si_code != SI_TIMER);
    - q->info.si_overrun++;
    - goto out;
    - }
    + return 1; /* the signal is ignored */

    signalfd_notify(t, sig);
    pending = group ? &t->signal->shared_pending : &t->pending;
    list_add_tail(&q->list, &pending->list);
    sigaddset(&pending->signal, sig);
    complete_signal(sig, t, group);
    -out:
    - unlock_task_sighand(t, &flags);
    -ret:
    - return ret;
    +
    + return 0;
    }

    /*
    --
    1.5.4.1



    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [PATCH] posix-timers: Do not modify an already queued timer signal

    On 07/17, Mark McLoughlin wrote:
    >
    > On Wed, 2008-07-16 at 20:21 +0400, Oleg Nesterov wrote:
    > > On 07/16, Mark McLoughlin wrote:
    > > >
    > > > When a timer fires, posix_timer_event() zeroes out its
    > > > pre-allocated siginfo structure, initialises it and then
    > > > queues up the signal with send_sigqueue().
    > > >
    > > > However, we may have previously queued up this signal, in
    > > > which case we only want to increment si_overrun and
    > > > re-initialising the siginfo structure is incorrect.

    > >
    > > Quoting Roland McGrath:
    > > >
    > > > I'm not clear on how the already-queued case could ever happen. Do we
    > > > really need that check at all? It shouldn't be possible for the timer to
    > > > be firing when it's already queued, because it won't have been reloaded.
    > > > It only reloads via do_schedule_next_timer after it's dequeued, or because
    > > > a 1 return value said it never was queued.

    >
    > The app can reload the timer itself before the signal has been dequeued
    > via signalfd ...


    Indeed! Thanks Mark.

    Thomas, Roland, could you take a look?


    > > If we need this fix, perhaps it is better to modify posix_timer_event()
    > > to check !list_empty()?

    >
    > Yeah, I had considered that, but it's a tad more invasive. See below.
    >
    > I mainly don't like this patch


    Agreed, this one looks worse.


    I forgot (if ever knew this code completely, but can't we make a simpler
    fix? posix_timer_event() can check list_empty() lockless,

    posix_timer_event()
    {
    if (!list_emtpy(sigq->list))
    return 0;

    ... fill and send ->sigq...
    }

    When the signal will be dequeued, schedule_next_timer() should afaics
    set ->it_overrun_last which is copied to ->si_overrun then. Or we can
    increment timr->it_overrun before return if I misread hrtimer_forward().

    What do you think?


    Another possible fix... we can change sys_timer_settime() to do
    sigqueue_free() and re-alloc ->sigq when it is pending. Not that
    I like this very much though.

    Oleg.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [PATCH] posix-timers: Do not modify an already queued timer signal

    On Thu, 2008-07-17 at 17:55 +0400, Oleg Nesterov wrote:

    > I forgot (if ever knew this code completely, but can't we make a simpler
    > fix? posix_timer_event() can check list_empty() lockless,
    >
    > posix_timer_event()
    > {
    > if (!list_emtpy(sigq->list))
    > return 0;
    >
    > ... fill and send ->sigq...
    > }


    Well, one issue with this is that we need to set the si_private supplied
    to posix_timer_event() on the queued siginfo. See updated version of the
    original patch below.

    So, for that reason, we can't currently do it lockless.

    Now, I've spent a while looking at the it_requeue_pending code and I
    can't fully satisfy myself that we need it to be a modification counter
    that we match up via si_sys_private. Do you know why this is needed? It
    seems to me that it could be seriously simplified.

    The explanation in the pre-2.6 code I could find was:

    * An interesting problem can occure if, while a signal, and thus a call
    * back is pending, the timer is rearmed, i.e. stopped and restarted.
    * We then need to sort out the call back and do the right thing. What
    * we do is to put a counter in the info block and match it with the
    * timers copy on the call back.

    > When the signal will be dequeued, schedule_next_timer() should afaics
    > set ->it_overrun_last which is copied to ->si_overrun then. Or we can
    > increment timr->it_overrun before return if I misread hrtimer_forward().


    Yep; that sounds reasonable to me - i.e. do all the overrun accounting
    in a callback just before the signal is to be delivered.

    However, since the race condition I identified basically breaks the
    timer permanently for the app (i.e. it will never fire again unless the
    signal is delivered to the process for another reason) I do think the
    fix below makes sense in advance of simplifying all this long standing
    code and correcting overrun accounting.

    Cheers,
    Mark.

    Subject: [PATCH] posix-timers: Do not modify an already queued timer signal

    When a timer fires, posix_timer_event() zeroes out its
    pre-allocated siginfo structure, initialises it and then
    queues up the signal with send_sigqueue().

    However, we may have previously queued up this signal, in
    which case we only want to increment si_overrun and
    re-initialising the siginfo structure is incorrect.

    Also, since we are modifying an already queued signal
    without the protection of the sighand spinlock, we may also
    race with e.g. collect_signal() causing it to fail to find
    a signal on the pending list because it happens to look at
    the siginfo struct after it was zeroed and before it was
    re-initialised.

    The race was observed with a modified kvm-userspace when
    running a guest under heavy network load. When it occurs,
    KVM never sees another SIGALRM signal because although
    the signal is queued up the appropriate bit is never set
    in the pending mask. Manually sending the process a SIGALRM
    kicks it out of this state.

    The fix is simple - only modify the pre-allocated sigqueue
    once we're sure that it hasn't already been queued.

    Signed-off-by: Mark McLoughlin
    Cc: Oleg Nesterov
    Cc: Roland McGrath
    Cc: Thomas Gleixner
    ---
    include/linux/sched.h | 2 +-
    kernel/posix-timers.c | 23 ++++++++++++-----------
    kernel/signal.c | 6 ++++--
    3 files changed, 17 insertions(+), 14 deletions(-)

    diff --git a/include/linux/sched.h b/include/linux/sched.h
    index c5d3f84..f260585 100644
    --- a/include/linux/sched.h
    +++ b/include/linux/sched.h
    @@ -1786,7 +1786,7 @@ extern void zap_other_threads(struct task_struct *p);
    extern int kill_proc(pid_t, int, int);
    extern struct sigqueue *sigqueue_alloc(void);
    extern void sigqueue_free(struct sigqueue *);
    -extern int send_sigqueue(struct sigqueue *, struct task_struct *, int group);
    +extern int send_sigqueue(struct sigqueue *, siginfo_t *, struct task_struct *, int group);
    extern int do_sigaction(int, struct k_sigaction *, struct k_sigaction *);
    extern int do_sigaltstack(const stack_t __user *, stack_t __user *, unsigned long);

    diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
    index dbd8398..d3a8561 100644
    --- a/kernel/posix-timers.c
    +++ b/kernel/posix-timers.c
    @@ -296,21 +296,22 @@ void do_schedule_next_timer(struct siginfo *info)
    unlock_timer(timr, flags);
    }

    -int posix_timer_event(struct k_itimer *timr,int si_private)
    +int posix_timer_event(struct k_itimer *timr, int si_private)
    {
    - memset(&timr->sigq->info, 0, sizeof(siginfo_t));
    - timr->sigq->info.si_sys_private = si_private;
    - /* Send signal to the process that owns this timer.*/
    + siginfo_t info;

    - timr->sigq->info.si_signo = timr->it_sigev_signo;
    - timr->sigq->info.si_errno = 0;
    - timr->sigq->info.si_code = SI_TIMER;
    - timr->sigq->info.si_tid = timr->it_id;
    - timr->sigq->info.si_value = timr->it_sigev_value;
    + memset(&info, 0, sizeof(siginfo_t));
    +
    + info.si_sys_private = si_private;
    + info.si_signo = timr->it_sigev_signo;
    + info.si_errno = 0;
    + info.si_code = SI_TIMER;
    + info.si_tid = timr->it_id;
    + info.si_value = timr->it_sigev_value;

    if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
    struct task_struct *leader;
    - int ret = send_sigqueue(timr->sigq, timr->it_process, 0);
    + int ret = send_sigqueue(timr->sigq, &info, timr->it_process, 0);

    if (likely(ret >= 0))
    return ret;
    @@ -321,7 +322,7 @@ int posix_timer_event(struct k_itimer *timr,int si_private)
    timr->it_process = leader;
    }

    - return send_sigqueue(timr->sigq, timr->it_process, 1);
    + return send_sigqueue(timr->sigq, &info, timr->it_process, 1);
    }
    EXPORT_SYMBOL_GPL(posix_timer_event);

    diff --git a/kernel/signal.c b/kernel/signal.c
    index 6c0958e..95a34ff 100644
    --- a/kernel/signal.c
    +++ b/kernel/signal.c
    @@ -1292,9 +1292,9 @@ void sigqueue_free(struct sigqueue *q)
    __sigqueue_free(q);
    }

    -int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
    +int send_sigqueue(struct sigqueue *q, siginfo_t *info, struct task_struct *t, int group)
    {
    - int sig = q->info.si_signo;
    + int sig = info->si_signo;
    struct sigpending *pending;
    unsigned long flags;
    int ret;
    @@ -1317,12 +1317,13 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
    */
    BUG_ON(q->info.si_code != SI_TIMER);
    q->info.si_overrun++;
    + q->info.si_sys_private = info->si_sys_private;
    goto out;
    }

    signalfd_notify(t, sig);
    pending = group ? &t->signal->shared_pending : &t->pending;
    + copy_siginfo(&q->info, info);
    list_add_tail(&q->list, &pending->list);
    sigaddset(&pending->signal, sig);
    complete_signal(sig, t, group);
    --
    1.5.5.1



    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [PATCH] posix-timers: Do not modify an already queued timer signal

    On 07/18, Mark McLoughlin wrote:
    >
    > On Thu, 2008-07-17 at 17:55 +0400, Oleg Nesterov wrote:
    >
    > > I forgot (if ever knew this code completely, but can't we make a simpler
    > > fix? posix_timer_event() can check list_empty() lockless,
    > >
    > > posix_timer_event()
    > > {
    > > if (!list_emtpy(sigq->list))
    > > return 0;
    > >
    > > ... fill and send ->sigq...
    > > }

    >
    > Well, one issue with this is that we need to set the si_private supplied
    > to posix_timer_event() on the queued siginfo. See updated version of the
    > original patch below.
    >
    > So, for that reason, we can't currently do it lockless.
    >
    > Now, I've spent a while looking at the it_requeue_pending code and I
    > can't fully satisfy myself that we need it to be a modification counter
    > that we match up via si_sys_private. Do you know why this is needed? It
    > seems to me that it could be seriously simplified.


    No, I don't understand what does si_sys_private mean. In fact I don't even
    understand what should we do with info.si_overrun in this corner case.

    We have the active timer, the app does sys_timer_settime() which changes the
    timer. This looks like creating the new timer which "inherits" ->it_id and
    ->it_sigev_value. But the queued siginfo is connected to the "old" timer...
    OK, I just don't understand this all.

    > Subject: [PATCH] posix-timers: Do not modify an already queued timer signal
    >
    > When a timer fires, posix_timer_event() zeroes out its
    > pre-allocated siginfo structure, initialises it and then
    > queues up the signal with send_sigqueue().
    >
    > However, we may have previously queued up this signal, in
    > which case we only want to increment si_overrun and
    > re-initialising the siginfo structure is incorrect.
    >
    > Also, since we are modifying an already queued signal
    > without the protection of the sighand spinlock, we may also
    > race with e.g. collect_signal() causing it to fail to find
    > a signal on the pending list because it happens to look at
    > the siginfo struct after it was zeroed and before it was
    > re-initialised.
    >
    > The race was observed with a modified kvm-userspace when
    > running a guest under heavy network load. When it occurs,
    > KVM never sees another SIGALRM signal because although
    > the signal is queued up the appropriate bit is never set
    > in the pending mask. Manually sending the process a SIGALRM
    > kicks it out of this state.


    Please update the changelog to explain how it is possible to hit the
    already queued siginfo.

    > -int posix_timer_event(struct k_itimer *timr,int si_private)
    > +int posix_timer_event(struct k_itimer *timr, int si_private)
    > {
    > - memset(&timr->sigq->info, 0, sizeof(siginfo_t));
    > - timr->sigq->info.si_sys_private = si_private;
    > - /* Send signal to the process that owns this timer.*/
    > + siginfo_t info;
    >
    > - timr->sigq->info.si_signo = timr->it_sigev_signo;
    > - timr->sigq->info.si_errno = 0;
    > - timr->sigq->info.si_code = SI_TIMER;
    > - timr->sigq->info.si_tid = timr->it_id;
    > - timr->sigq->info.si_value = timr->it_sigev_value;
    > + memset(&info, 0, sizeof(siginfo_t));
    > +
    > + info.si_sys_private = si_private;
    > + info.si_signo = timr->it_sigev_signo;
    > + info.si_errno = 0;
    > + info.si_code = SI_TIMER;
    > + info.si_tid = timr->it_id;
    > + info.si_value = timr->it_sigev_value;
    >
    > if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
    > struct task_struct *leader;
    > - int ret = send_sigqueue(timr->sigq, timr->it_process, 0);
    > + int ret = send_sigqueue(timr->sigq, &info, timr->it_process, 0);


    I think this is a bit overkill. Note that (unless I missed something)
    posix_timer_event() populates timr->sigq->info with the same numbers
    every time, so afaics we can do

    --- kernel/posix-timers.c
    +++ kernel/posix-timers.c
    @@ -298,19 +298,14 @@ void do_schedule_next_timer(struct sigin

    int posix_timer_event(struct k_itimer *timr,int si_private)
    {
    - memset(&timr->sigq->info, 0, sizeof(siginfo_t));
    - timr->sigq->info.si_sys_private = si_private;
    - /* Send signal to the process that owns this timer.*/
    -
    timr->sigq->info.si_signo = timr->it_sigev_signo;
    - timr->sigq->info.si_errno = 0;
    timr->sigq->info.si_code = SI_TIMER;
    timr->sigq->info.si_tid = timr->it_id;
    timr->sigq->info.si_value = timr->it_sigev_value;

    if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
    struct task_struct *leader;
    - int ret = send_sigqueue(timr->sigq, timr->it_process, 0);
    + int ret = send_sigqueue(timr->sigq, si_private, timr->it_process, 0);

    if (likely(ret >= 0))
    return ret;
    @@ -321,7 +316,7 @@ int posix_timer_event(struct k_itimer *t
    timr->it_process = leader;
    }

    - return send_sigqueue(timr->sigq, timr->it_process, 1);
    + return send_sigqueue(timr->sigq, si_private, timr->it_process, 1);
    }
    EXPORT_SYMBOL_GPL(posix_timer_event);

    @@ -435,6 +430,7 @@ static struct k_itimer * alloc_posix_tim
    kmem_cache_free(posix_timers_cache, tmr);
    tmr = NULL;
    }
    + memset(&timr->sigq->info, 0, sizeof(siginfo_t));
    return tmr;
    }

    --- kernel/signal.c
    +++ kernel/signal.c
    @@ -1283,7 +1283,7 @@ void sigqueue_free(struct sigqueue *q)
    __sigqueue_free(q);
    }

    -int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
    +int send_sigqueue(struct sigqueue *q, int si_private, struct task_struct *t, int group)
    {
    int sig = q->info.si_signo;
    struct sigpending *pending;
    @@ -1300,6 +1300,8 @@ int send_sigqueue(struct sigqueue *q, st
    if (!prepare_signal(sig, t))
    goto out;

    + q->info.si_sys_private = info->si_sys_private;
    +
    ret = 0;
    if (unlikely(!list_empty(&q->list))) {
    /*

    But can't we do a simpler change?

    --- kernel/posix-timers.c
    +++ kernel/posix-timers.c
    @@ -298,7 +298,6 @@ void do_schedule_next_timer(struct sigin

    int posix_timer_event(struct k_itimer *timr,int si_private)
    {
    - memset(&timr->sigq->info, 0, sizeof(siginfo_t));
    timr->sigq->info.si_sys_private = si_private;
    /* Send signal to the process that owns this timer.*/

    @@ -435,6 +434,7 @@ static struct k_itimer * alloc_posix_tim
    kmem_cache_free(posix_timers_cache, tmr);
    tmr = NULL;
    }
    + memset(&timr->sigq->info, 0, sizeof(siginfo_t));
    return tmr;
    }

    Yes, if sigq->info is queued, it can be dequeued right after
    ".si_sys_private = si_private" and before we send the signal. As I said,
    I don't know what si_sys_private means for the user-level, is this bad?

    Note that the we can't race with do_schedule_next_timer(), the timer is
    locked.

    Thoughts?

    Oleg.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [PATCH] posix-timers: Do not modify an already queued timer signal

    I think the solution we want is to make sure that a timer whose
    expiration signal is still pending never gets a second notification.

    POSIX says, "The effect of disarming or resetting a timer with pending
    expiration notifications is unspecified." This gives us lots of
    latitude. I think the best behavior is the one that comports with the
    general specification, "Only a single signal shall be queued to the
    process for a given timer at any point in time."

    The it_requeue_pending/si_sys_private logic is indeed intended to
    address this case. It predates my involvement of the code, and I
    agree it has always looked oddly hairy. Its plan is to make sure that
    dequeue_signal's call to do_schedule_next_timer does not re-arm the
    timer when an intervening timer_settime call has already done so. As
    Mark found, this is buggy because it's not really safe for the timer
    to go off (call posix_timer_event again) before the dequeue.

    In the signals code, si_sys_private is used purely as a flag that
    dequeue_signal needs to bother with dropping the siglock to call
    do_schedule_next_timer. Its use as a serialization counter between
    timer_settime and do_schedule_next_timer is entirely local to the
    posix-timers code.

    What userland should observe is that when the new timer expiration
    time arrives while the previously-fired timer signal is still pending,
    there is no new signal, and the timer accrues an overrun.

    An obvious idea is simply not to re-arm in timer_settime when the
    timer already has a pending notification. When the signal gets
    dequeued, do_schedule_next_timer will arm it then. However, the way
    we dispatch to different clocks' timer_set functions makes it awkward
    to validate and store a new setting and fetch the old value without
    actually arming the timer. Also, the logic for one-shot timers has
    more wrinkles.

    So I think the thing to do is arm the timer in timer_settime even when
    it's already pending (as we do now), but have posix_timer_event detect
    a firing when already queued and just bump the overrun count.

    We can do away with it_requeue_pending, and make si_sys_private purely
    a simple 0/1 flag for whether to call do_schedule_next_timer. To
    optimize the extra checks, we use different bookkeeping fields in
    struct k_itimer.

    Have a flag that means "might be queued". do_schedule_next_timer
    replaces:

    if (timr && timr->it_requeue_pending == info->si_sys_private) {
    ...
    }

    with:

    if (timr && timr->fired) {
    timr->fired = 0;
    ...
    }

    That avoids races with timer_settime, which does (after timer_set
    call, timer still locked):

    if (timr->fired) {
    spin_lock(&current->sighand->siglock);
    if (list_empty(&timr->sigq->list))
    timr->fired = 0;
    spin_unlock(&current->sighand->siglock);
    }

    In posix_timer_event (timer is locked), check:

    if (timr->fired) {
    spin_lock(&current->sighand->siglock);
    if (list_empty(&timr->sigq->list))
    timr->fired = 0;
    spin_unlock(&current->sighand->siglock);
    if (timr->fired) {
    timr->it_overrun++;
    return 0;
    }
    }
    timr->fired = 1;

    The siglock'd double-check is in case of a one-shot firing that didn't
    ever call do_schedule_next_timer to reset the flag.

    This bookkeeping plan may need more thought. I think the idea is
    sound: "firing", meaning notification and reload, only occurs when
    both the expiration time has passed and the previous expiration
    notification signal has been dequeued. When the expiration time
    passes but the other criterion is unmet, there's an overrun and no new
    notification or reload.


    Thanks,
    Roland
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [PATCH] posix-timers: Do not modify an already queued timer signal

    On 07/19, Roland McGrath wrote:
    >
    > I think the solution we want is to make sure that a timer whose
    > expiration signal is still pending never gets a second notification.
    >
    > POSIX says, "The effect of disarming or resetting a timer with pending
    > expiration notifications is unspecified." This gives us lots of
    > latitude. I think the best behavior is the one that comports with the
    > general specification, "Only a single signal shall be queued to the
    > process for a given timer at any point in time."
    >
    > The it_requeue_pending/si_sys_private logic is indeed intended to
    > address this case. It predates my involvement of the code, and I
    > agree it has always looked oddly hairy. Its plan is to make sure that
    > dequeue_signal's call to do_schedule_next_timer does not re-arm the
    > timer when an intervening timer_settime call has already done so. As
    > Mark found, this is buggy because it's not really safe for the timer
    > to go off (call posix_timer_event again) before the dequeue.
    >
    > In the signals code, si_sys_private is used purely as a flag that
    > dequeue_signal needs to bother with dropping the siglock to call
    > do_schedule_next_timer. Its use as a serialization counter between
    > timer_settime and do_schedule_next_timer is entirely local to the
    > posix-timers code.


    Yes, thanks, I see. But does it have any meaning for the user-space?

    Let me repeat. Can't we make a simple fix for now for this nasty and
    ancient bug, before we do the more clever changes,

    --- kernel/posix-timers.c
    +++ kernel/posix-timers.c
    @@ -298,12 +298,10 @@ void do_schedule_next_timer(struct sigin

    int posix_timer_event(struct k_itimer *timr,int si_private)
    {
    - memset(&timr->sigq->info, 0, sizeof(siginfo_t));
    timr->sigq->info.si_sys_private = si_private;
    /* Send signal to the process that owns this timer.*/

    timr->sigq->info.si_signo = timr->it_sigev_signo;
    - timr->sigq->info.si_errno = 0;
    timr->sigq->info.si_code = SI_TIMER;
    timr->sigq->info.si_tid = timr->it_id;
    timr->sigq->info.si_value = timr->it_sigev_value;
    @@ -435,6 +433,7 @@ static struct k_itimer * alloc_posix_tim
    kmem_cache_free(posix_timers_cache, tmr);
    tmr = NULL;
    }
    + memset(&timr->sigq->info, 0, sizeof(siginfo_t));
    return tmr;
    }


    ?

    Suppose ->sigq is queued. In that case "si_sys_private = si_private"
    can race with dequeue_signal/do_schedule_next_timer, but everything
    looks OK because the timer is locked.

    If dequeue_signal() sees the old value of si_sys_private, it won't
    call do_schedule_next_timer(). If it sees the new value, do_schedule_next_timer()
    will wait for ->it_lock.

    Yes, in both cases we can queue ->sigq again instead of incrementing
    info.si_overrun. But this is not worse than we have now, and afaics
    this is also possible with timr->fired, please see below.

    > What userland should observe is that when the new timer expiration
    > time arrives while the previously-fired timer signal is still pending,
    > there is no new signal, and the timer accrues an overrun.
    >
    > An obvious idea is simply not to re-arm in timer_settime when the
    > timer already has a pending notification. When the signal gets
    > dequeued, do_schedule_next_timer will arm it then. However, the way
    > we dispatch to different clocks' timer_set functions makes it awkward
    > to validate and store a new setting and fetch the old value without
    > actually arming the timer. Also, the logic for one-shot timers has
    > more wrinkles.
    >
    > So I think the thing to do is arm the timer in timer_settime even when
    > it's already pending (as we do now), but have posix_timer_event detect
    > a firing when already queued and just bump the overrun count.
    >
    > We can do away with it_requeue_pending, and make si_sys_private purely
    > a simple 0/1 flag for whether to call do_schedule_next_timer. To
    > optimize the extra checks, we use different bookkeeping fields in
    > struct k_itimer.
    >
    > Have a flag that means "might be queued". do_schedule_next_timer
    > replaces:
    >
    > if (timr && timr->it_requeue_pending == info->si_sys_private) {
    > ...
    > }
    >
    > with:
    >
    > if (timr && timr->fired) {
    > timr->fired = 0;
    > ...
    > }
    >
    > That avoids races with timer_settime, which does (after timer_set
    > call, timer still locked):
    >
    > if (timr->fired) {
    > spin_lock(&current->sighand->siglock);
    > if (list_empty(&timr->sigq->list))
    > timr->fired = 0;
    > spin_unlock(&current->sighand->siglock);
    > }
    >
    > In posix_timer_event (timer is locked), check:
    >
    > if (timr->fired) {
    > spin_lock(&current->sighand->siglock);
    > if (list_empty(&timr->sigq->list))
    > timr->fired = 0;
    > spin_unlock(&current->sighand->siglock);


    (we can't use current->sighand->siglock, this is irq context, we should
    use timr->it_process->group_leader...)

    Suppose that the signal was already dequeued but sigq->info wasn't copied
    to the user-space. The thread which does dequeue_signal() unlocked siglock
    and waits for ->it_lock.

    posix_timer_event() sees list_empty(), proceeds and calls send_sigqueue().
    Now we requeue this ->sigq instead of incrementing si_overrun.

    The thread which does dequeue_signal() continues and re-schedules the
    timer while ->sigq is queued. Then it copies sigq->info to the user space.

    The same problem as with the simple patch above. Not very bad though,
    afaics. The user space just recieves 2 signals instead of one with
    the right value of si_overrun.



    While we are here, off-topics question. posix_timer_event() does

    if (timr->it_sigev_notify & SIGEV_THREAD_ID) {
    struct task_struct *leader;
    int ret = send_sigqueue(timr->sigq, timr->it_process, 0);

    if (likely(ret >= 0))
    return ret;

    timr->it_sigev_notify = SIGEV_SIGNAL;
    leader = timr->it_process->group_leader;
    put_task_struct(timr->it_process);
    timr->it_process = leader;
    }

    return send_sigqueue(timr->sigq, timr->it_process, 1);

    Is it really useful to convert the SIGEV_THREAD_ID timer to the group-wide
    timer when the thread dies?

    This uglifies de_thread(), but more importantly this doesn't work reliably.
    Suppose that timr->it_process dies with the pending ->sigq. In that case
    the timer stops anyway.

    Oleg.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [PATCH] posix-timers: Do not modify an already queued timer signal

    On 07/20, Oleg Nesterov wrote:
    >
    > Suppose that the signal was already dequeued but sigq->info wasn't copied
    > to the user-space.


    To avoid a possible confusion: this is of course fine by itself, the content
    of sigq->info was already copied to the local siginfo.

    Oleg.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: [PATCH] posix-timers: Do not modify an already queued timer signal

    > Yes, thanks, I see. But does it have any meaning for the user-space?
    [si_sys_private]

    No, it's not part of the user ABI. It's not even copied out (see
    copy_siginfo_to_user). (A spare siginfo_t field was just used as a handy
    bit of storage in the signal queue element. It's a kludge.)

    > Let me repeat. Can't we make a simple fix for now for this nasty and
    > ancient bug, before we do the more clever changes,


    Probably. I don't find it easy to be sure there aren't more bad
    problems caused by trying to re-send the same sigqueue entry.

    You do need to clear si_overrun there to be correct in the usual case
    (not already queued). The fields set explicitly in posix_timer_event,
    plus si_overrun, are the only meaningful fields for SI_TIMER (again see
    copy_siginfo_to_user). The memset is already superfluous.

    It would be a perfectly fine and worthwhile optimization/cleanup on its
    own just to move all the initialization of sigq->info (everything but
    si_sys_private) to alloc_posix_timer. That would also happen to mask
    the symptom of this double-queuing problem that's been observed.

    Even if it's a fine stopgap, I'm not comfortable calling this a real "fix".
    It seems likely this is the good choice for the stable branch.

    > Suppose that the signal was already dequeued but sigq->info wasn't copied
    > to the user-space. The thread which does dequeue_signal() unlocked siglock
    > and waits for ->it_lock.


    This is fine. sigq->info is copied into a local (kernel stack) copy in
    collect_signal(), i.e. inside dequeue_signal() long before the unlock.
    In short, once __dequeue_signal() returns, the signal goes from "queued"
    to "delivered". The delivery is on its way to happening, and from the
    instant that thread released the siglock it's no different from if it
    had completed handler setup, returned to user, and had a long scheduling
    delay before actually executing the first user instruction. At this
    point, any other event is not "racing", it's just "after".

    > posix_timer_event() sees list_empty(), proceeds and calls send_sigqueue().
    > Now we requeue this ->sigq instead of incrementing si_overrun.


    It's not "requeue". It's "queue" after it was no longer queued, with
    semantics no different from the first time you queue it.

    > The thread which does dequeue_signal() continues and re-schedules the
    > timer while ->sigq is queued. Then it copies sigq->info to the user space.


    The thread that dequeued the first timer signal had ceased all reference
    to sigq by the time it unlocked siglock. When its do_schedule_next_timer
    call gets it_lock, it can do bookkeeping in struct k_itimer to figure out
    what posix_timer_event or timer_settime has done lately. (Like I said,
    the ->fired flag alone might not really cover all the angles. I was
    expecting you to figure out the exactly details for me. ;-)

    > While we are here, off-topics question.


    You know I hate those. We aren't here. We're in chairs reading email.
    We'll be in the same places if you send a separate message for a separate
    topic.

    > Is it really useful to convert the SIGEV_THREAD_ID timer to the group-wide
    > timer when the thread dies?


    Not really. I think the main intent there is just to be sure we never
    hold on to an old task_struct pointer (which might have been the bug in
    the past).


    Thanks,
    Roland
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: [PATCH] posix-timers: Do not modify an already queued timer signal

    On 07/20, Roland McGrath wrote:
    >
    > > Yes, thanks, I see. But does it have any meaning for the user-space?

    > [si_sys_private]
    >
    > No, it's not part of the user ABI. It's not even copied out (see
    > copy_siginfo_to_user).


    Heh, I didn't know, thanks.

    > > Let me repeat. Can't we make a simple fix for now for this nasty and
    > > ancient bug, before we do the more clever changes,

    >
    > You do need to clear si_overrun there to be correct in the usual case
    > (not already queued).


    Indeed, I missed that. Can't we do this in send_sigqueue() ?

    > It would be a perfectly fine and worthwhile optimization/cleanup on its
    > own just to move all the initialization of sigq->info (everything but
    > si_sys_private) to alloc_posix_timer.


    Yes, we can do this in sys_timer_create(). But this is not very trivial to
    do without uglifying the code futher, note this "if (timer_event_spec)".
    And we can't do this after "->it_process = process", the timer is already
    visible to sys_timer_settime().

    > Even if it's a fine stopgap, I'm not comfortable calling this a real "fix".
    > ...
    > I don't find it easy to be sure there aren't more bad
    > problems caused by trying to re-send the same sigqueue entry.


    Yes, yes, I agree. I propose this change as a first step only.

    > It seems likely this is the good choice for the stable branch.


    So, what do you and Mark think about the patch below?

    > > The thread which does dequeue_signal() continues and re-schedules the
    > > timer while ->sigq is queued. Then it copies sigq->info to the user space.

    >
    > The thread that dequeued the first timer signal had ceased all reference
    > to sigq by the time it unlocked siglock. When its do_schedule_next_timer
    > call gets it_lock, it can do bookkeeping in struct k_itimer to figure out
    > what posix_timer_event or timer_settime has done lately.


    Yes, this should work.

    Oleg.

    --- a/kernel/posix-timers.c
    +++ b/kernel/posix-timers.c
    @@ -298,12 +298,10 @@ void do_schedule_next_timer(struct sigin

    int posix_timer_event(struct k_itimer *timr,int si_private)
    {
    - memset(&timr->sigq->info, 0, sizeof(siginfo_t));
    timr->sigq->info.si_sys_private = si_private;
    /* Send signal to the process that owns this timer.*/

    timr->sigq->info.si_signo = timr->it_sigev_signo;
    - timr->sigq->info.si_errno = 0;
    timr->sigq->info.si_code = SI_TIMER;
    timr->sigq->info.si_tid = timr->it_id;
    timr->sigq->info.si_value = timr->it_sigev_value;
    @@ -435,6 +433,7 @@ static struct k_itimer * alloc_posix_tim
    kmem_cache_free(posix_timers_cache, tmr);
    tmr = NULL;
    }
    + memset(&tmr->sigq->info, 0, sizeof(siginfo_t));
    return tmr;
    }

    --- a/kernel/signal.c
    +++ b/kernel/signal.c
    @@ -1310,6 +1310,7 @@ int send_sigqueue(struct sigqueue *q, st
    q->info.si_overrun++;
    goto out;
    }
    + q->info.si_overrun = 0;

    signalfd_notify(t, sig);
    pending = group ? &t->signal->shared_pending : &t->pending;

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. do_schedule_next_timer && si_overrun (Was: [PATCH] posix-timers: Do not modify an already queued timer signal)

    On 07/21, Oleg Nesterov wrote:
    >
    > On 07/20, Roland McGrath wrote:
    > >
    > > You do need to clear si_overrun there to be correct in the usual case
    > > (not already queued).

    >
    > Indeed, I missed that. Can't we do this in send_sigqueue() ?


    The more I look at the code, the more I'm getting confused...

    Suppose that posix_timer_event()->send_sigqueue() increments info.si_overrun.
    But (in general) it is not reported to the user-space, do_schedule_next_timer()
    does:

    info->si_overrun = timr->it_overrun_last;

    shouldn't it do

    info->si_overrun += timr->it_overrun_last;

    instead?

    Oleg.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  13. Re: [PATCH] posix-timers: Do not modify an already queued timer signal

    [add stable]

    On 7/21/08, Oleg Nesterov wrote:
    > On 07/20, Roland McGrath wrote:
    >>
    >> > Yes, thanks, I see. But does it have any meaning for the user-space?

    >> [si_sys_private]
    >>
    >> No, it's not part of the user ABI. It's not even copied out (see
    >> copy_siginfo_to_user).

    >
    > Heh, I didn't know, thanks.
    >
    >> > Let me repeat. Can't we make a simple fix for now for this nasty and
    >> > ancient bug, before we do the more clever changes,

    >>
    >> You do need to clear si_overrun there to be correct in the usual case
    >> (not already queued).

    >
    > Indeed, I missed that. Can't we do this in send_sigqueue() ?
    >
    >> It would be a perfectly fine and worthwhile optimization/cleanup on its
    >> own just to move all the initialization of sigq->info (everything but
    >> si_sys_private) to alloc_posix_timer.

    >
    > Yes, we can do this in sys_timer_create(). But this is not very trivial to
    > do without uglifying the code futher, note this "if (timer_event_spec)".
    > And we can't do this after "->it_process = process", the timer is already
    > visible to sys_timer_settime().
    >
    >> Even if it's a fine stopgap, I'm not comfortable calling this a real
    >> "fix".
    >> ...
    >> I don't find it easy to be sure there aren't more bad
    >> problems caused by trying to re-send the same sigqueue entry.

    >
    > Yes, yes, I agree. I propose this change as a first step only.
    >
    >> It seems likely this is the good choice for the stable branch.

    >
    > So, what do you and Mark think about the patch below?
    >
    >> > The thread which does dequeue_signal() continues and re-schedules the
    >> > timer while ->sigq is queued. Then it copies sigq->info to the user
    >> > space.

    >>
    >> The thread that dequeued the first timer signal had ceased all reference
    >> to sigq by the time it unlocked siglock. When its do_schedule_next_timer
    >> call gets it_lock, it can do bookkeeping in struct k_itimer to figure out
    >> what posix_timer_event or timer_settime has done lately.

    >
    > Yes, this should work.
    >
    > Oleg.
    >
    > --- a/kernel/posix-timers.c
    > +++ b/kernel/posix-timers.c
    > @@ -298,12 +298,10 @@ void do_schedule_next_timer(struct sigin
    >
    > int posix_timer_event(struct k_itimer *timr,int si_private)
    > {
    > - memset(&timr->sigq->info, 0, sizeof(siginfo_t));
    > timr->sigq->info.si_sys_private = si_private;
    > /* Send signal to the process that owns this timer.*/
    >
    > timr->sigq->info.si_signo = timr->it_sigev_signo;
    > - timr->sigq->info.si_errno = 0;
    > timr->sigq->info.si_code = SI_TIMER;
    > timr->sigq->info.si_tid = timr->it_id;
    > timr->sigq->info.si_value = timr->it_sigev_value;
    > @@ -435,6 +433,7 @@ static struct k_itimer * alloc_posix_tim
    > kmem_cache_free(posix_timers_cache, tmr);
    > tmr = NULL;
    > }
    > + memset(&tmr->sigq->info, 0, sizeof(siginfo_t));
    > return tmr;
    > }
    >
    > --- a/kernel/signal.c
    > +++ b/kernel/signal.c
    > @@ -1310,6 +1310,7 @@ int send_sigqueue(struct sigqueue *q, st
    > q->info.si_overrun++;
    > goto out;
    > }
    > + q->info.si_overrun = 0;
    >
    > signalfd_notify(t, sig);
    > pending = group ? &t->signal->shared_pending : &t->pending;
    >
    > --
    > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    > the body of a message to majordomo@vger.kernel.org
    > More majordomo info at http://vger.kernel.org/majordomo-info.html
    > Please read the FAQ at http://www.tux.org/lkml/
    >



    --
    Thanks,
    Oliver
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread