[PATCH] x86_64: fix delayed signals - Kernel

This is a discussion on [PATCH] x86_64: fix delayed signals - Kernel ; On three of the several paths in entry_64.S that call do_notify_resume() on the way back to user mode, we fail to properly check again for newly-arrived work that requires another call to do_notify_resume() before going to user mode. These paths ...

+ Reply to Thread
Page 1 of 3 1 2 3 LastLast
Results 1 to 20 of 45

Thread: [PATCH] x86_64: fix delayed signals

  1. [PATCH] x86_64: fix delayed signals

    On three of the several paths in entry_64.S that call
    do_notify_resume() on the way back to user mode, we fail to properly
    check again for newly-arrived work that requires another call to
    do_notify_resume() before going to user mode. These paths set the
    mask to check only _TIF_NEED_RESCHED, but this is wrong. The other
    paths that lead to do_notify_resume() do this correctly already, and
    entry_32.S does it correctly in all cases.

    All paths back to user mode have to check all the _TIF_WORK_MASK
    flags at the last possible stage, with interrupts disabled.
    Otherwise, we miss any flags (TIF_SIGPENDING for example) that were
    set any time after we entered do_notify_resume(). More work flags
    can be set (or left set) synchronously inside do_notify_resume(), as
    TIF_SIGPENDING can be, or asynchronously by interrupts or other CPUs
    (which then send an asynchronous interrupt).

    There are many different scenarios that could hit this bug, most of
    them races. The simplest one to demonstrate does not require any
    race: when one signal has done handler setup at the check before
    returning from a syscall, and there is another signal pending that
    should be handled. The second signal's handler should interrupt the
    first signal handler before it actually starts (so the interrupted PC
    is still at the handler's entry point). Instead, it runs away until
    the next kernel entry (next syscall, tick, etc).

    This test behaves correctly on 32-bit kernels, and fails on 64-bit
    (either 32-bit or 64-bit test binary). With this fix, it works.

    #define _GNU_SOURCE
    #include
    #include
    #include
    #include

    #ifndef REG_RIP
    #define REG_RIP REG_EIP
    #endif

    static sig_atomic_t hit1, hit2;

    static void
    handler (int sig, siginfo_t *info, void *ctx)
    {
    ucontext_t *uc = ctx;

    if ((void *) uc->uc_mcontext.gregs[REG_RIP] == &handler)
    {
    if (sig == SIGUSR1)
    hit1 = 1;
    else
    hit2 = 1;
    }

    printf ("%s at %#lx\n", strsignal (sig),
    uc->uc_mcontext.gregs[REG_RIP]);
    }

    int
    main (void)
    {
    struct sigaction sa;
    sigset_t set;

    sigemptyset (&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = &handler;

    if (sigaction (SIGUSR1, &sa, NULL)
    || sigaction (SIGUSR2, &sa, NULL))
    return 2;

    sigemptyset (&set);
    sigaddset (&set, SIGUSR1);
    sigaddset (&set, SIGUSR2);
    if (sigprocmask (SIG_BLOCK, &set, NULL))
    return 3;

    printf ("main at %p, handler at %p\n", &main, &handler);

    raise (SIGUSR1);
    raise (SIGUSR2);

    if (sigprocmask (SIG_UNBLOCK, &set, NULL))
    return 4;

    if (hit1 + hit2 == 1)
    {
    puts ("PASS");
    return 0;
    }

    puts ("FAIL");
    return 1;
    }

    Signed-off-by: Roland McGrath
    ---
    arch/x86/kernel/entry_64.S | 7 +++----
    1 files changed, 3 insertions(+), 4 deletions(-)

    diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
    index 556a8df..968cf54 100644
    --- a/arch/x86/kernel/entry_64.S
    +++ b/arch/x86/kernel/entry_64.S
    @@ -305,7 +305,7 @@ sysret_signal:
    leaq -ARGOFFSET(%rsp),%rdi # &pt_regs -> arg1
    xorl %esi,%esi # oldset -> arg2
    call ptregscall_common
    -1: movl $_TIF_NEED_RESCHED,%edi
    +1: movl $_TIF_WORK_MASK,%edi
    /* Use IRET because user could have changed frame. This
    works because ptregscall_common has called FIXUP_TOP_OF_STACK. */
    DISABLE_INTERRUPTS(CLBR_NONE)
    @@ -393,7 +393,7 @@ int_signal:
    movq %rsp,%rdi # &ptregs -> arg1
    xorl %esi,%esi # oldset -> arg2
    call do_notify_resume
    -1: movl $_TIF_NEED_RESCHED,%edi
    +1: movl $_TIF_WORK_MASK,%edi
    int_restore_rest:
    RESTORE_REST
    DISABLE_INTERRUPTS(CLBR_NONE)
    @@ -647,9 +647,8 @@ retint_signal:
    RESTORE_REST
    DISABLE_INTERRUPTS(CLBR_NONE)
    TRACE_IRQS_OFF
    - movl $_TIF_NEED_RESCHED,%edi
    GET_THREAD_INFO(%rcx)
    - jmp retint_check
    + jmp retint_with_reschedule

    #ifdef CONFIG_PREEMPT
    /* Returning to kernel space. Check if we need preemption */
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH] x86_64: fix delayed signals



    On Thu, 10 Jul 2008, Roland McGrath wrote:
    >
    > There are many different scenarios that could hit this bug, most of
    > them races. The simplest one to demonstrate does not require any
    > race: when one signal has done handler setup at the check before
    > returning from a syscall, and there is another signal pending that
    > should be handled. The second signal's handler should interrupt the
    > first signal handler before it actually starts (so the interrupted PC
    > is still at the handler's entry point). Instead, it runs away until
    > the next kernel entry (next syscall, tick, etc).


    I have this dim memory of at least _some_ of this being on purpose.

    If you look at old kernels (_really_ old ones - I think it's way before
    even the historical git archive, but I didn't take a look), we used to set
    up several stack frames at once, so that we'd nest the stack frames
    completely.

    In other words, the code in do_signal() used to literally be a loop,
    something like

    while ((signr = get_signal_to_deliver(&info, &ka, regs, NULL)) > 0) {
    .. setup signal frame ..

    (No, I don't think that's at all accurate of the actual code we used to
    have - I just took the current do_signal() code as an example)

    And that explicit loop was removed in order for us to have just a single
    outstanding signal at a time. I forget the exact details why.

    But if you really want that behaviour, then re-introducing the loop would
    likely be the better approach (or should be combined), since I think you
    effectively just re-introduced it (at a much bigger granularity).

    Hmm.

    Linus
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [PATCH] x86_64: fix delayed signals

    > But if you really want that behaviour, then re-introducing the loop would
    > likely be the better approach (or should be combined), since I think you
    > effectively just re-introduced it (at a much bigger granularity).


    I don't think so. Firstly, TIF_SIGPENDING is not the only flag in
    question. There are other reasons to re-enter do_notify_resume().
    If those are set during signal processing et al, they should take
    effect before going back to user mode.

    Second, there is always a race. Anywhere after the last time the
    siglock was held inside do_signal(), there can be an interrupt that
    sets TIF_SIGPENDING (or other _TIF_DO_NOTIFY_MASK flags). If you go
    on to return to user mode, then it can be a long time before the new
    signal is actually delivered (til the next tick).

    It really is necessary to check all the _TIF_WORK_MASK flags with
    interrupts disabled, last thing. I just don't see how any short
    cuts here can be robust. It's simple, it's right, and it's what all
    the other paths (and all other arch code I've ever noticed) do.

    Since it's necessary to have robust checks in the final part of the
    assembly code path anyway, and stacked signals are rare, there is
    just no special reason to have a loop in do_signal(). In the common
    case it every time retakes the siglock again when unnecessary, with
    bad SMP performance effects; optimizing that with a signal_pending()
    check just shows why it's simpler not to have a loop. Frankly, I'm
    glad we don't have one because it would fix only the scenario that
    has a test case that's real easy to write, and leave lying all the
    much more hairy ones that will cause someone to spend days and days
    some day later tearing his hair out.


    Thanks,
    Roland
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [PATCH] x86_64: fix delayed signals



    On Thu, 10 Jul 2008, Roland McGrath wrote:
    >
    > > But if you really want that behaviour, then re-introducing the loop would
    > > likely be the better approach (or should be combined), since I think you
    > > effectively just re-introduced it (at a much bigger granularity).

    >
    > I don't think so. Firstly, TIF_SIGPENDING is not the only flag in
    > question. There are other reasons to re-enter do_notify_resume().
    > If those are set during signal processing et al, they should take
    > effect before going back to user mode.


    You're ignoring the background question - we expressly _stopped_ doing
    this long ago. So the real issue was the ".. if you really .." part.

    Do we really? What's the actual downside here?

    Linus
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [PATCH] x86_64: fix delayed signals



    On Thu, 10 Jul 2008, Linus Torvalds wrote:
    >
    > You're ignoring the background question - we expressly _stopped_ doing
    > this long ago. So the real issue was the ".. if you really .." part.


    Side note: the fact that 32-bit does it seems to imply it's ok, but it
    bothers me that I know we changed the signal code at some point. Is the
    64-bit code doing the exact _same_ things as 32-bit now?

    Linus
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [PATCH] x86_64: fix delayed signals

    > You're ignoring the background question - we expressly _stopped_ doing
    > this long ago. So the real issue was the ".. if you really .." part.
    >
    > Do we really? What's the actual downside here?


    I'm not convinced it was real "express". It was never expressed in a
    comment or log entry. The change came in (pre-git) with:
    [PATCH] x86-64 architecture specific sync for 2.5.8
    and commit 10ffdbb8d605be88b148f127ec86452f1364d4f0 "cleaned up slightly"
    making the other paths match, with no explanation on the subject.

    i386 has never behaved this way, and still doesn't. I would doubt any
    other arch ever has. (My fix makes x86_64 and i386 treatment of
    _TIF_WORK_MASK and any related signal race issues identical.)

    The behavior of the test case I posted is just demonstrably wrong. I
    know you're never swayed by the fact that it has always been specified
    and documented clearly to behave this way (in the case of multiple
    pending signals like the test case). Since it always did on i386, it's
    easy to expect that there may be all manner of applications lurking
    around that have depended on the correct semantics in subtle (and
    probably intermittent) ways their poor users and maintainers may never
    figure out.

    What really irks me about the thought of leaving this wrong is that we
    have spent so much effort lately on establishing a simple rule that when
    you set TIF_SIGPENDING it will be acted on. We did this after a lot of
    painful time from a lot of people went into tracking down subtle weird
    problems and races. So, KISS. Make a rule we can rely on, and then be
    damn careful that we don't break the rule. That's been serving us well,
    which is to say preventing it going from two people who can keep track
    of what's going with signals on any given day, to zero. Now that rule
    that kept life barely comprehensible is amended with, unless it's
    already inside signals code or some nearby arch code, or it's a race,
    or, yeah, I think that's all the cases, but check with--well, noone
    really knows, so I don't know who you check with, sorry. You just can't
    reason about the code if you don't maintain the invariants.

    The "actual" downsides include numerous unknowns, and I always forget
    not to be surprised when you aren't scared that we have no idea what-all
    the code might actually do. The easy scenarios to think of off hand
    have downsides like loss of timely signal delivery, where something can
    chew 15ms of CPU after you killed it. If I try all day I can come up
    with more specific cases and maybe even some with instantly terrible
    outcomes. But I won't think of them all. The worst ones will come up
    much later (or are already dogging someone unwitting now), when someone
    else sinks lots of time and effort trying to figure out strange
    misbehaviors in their systems.


    Thanks,
    Roland
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [PATCH] x86_64: fix delayed signals



    On Thu, 10 Jul 2008, Roland McGrath wrote:
    >
    > The "actual" downsides include numerous unknowns, and I always forget
    > not to be surprised when you aren't scared that we have no idea what-all
    > the code might actually do.


    You're just grand-standing here.

    The fact is, this has been "broken" for a long long time. A lot of
    applications (realistically, probably _all_ current 64-bit apps) have been
    tested with the old behaviour.

    The thing is, I'm simply not the _least_ nervous about behavior that we've
    obviously had for a long time. And you shouldn't be either - nor should
    you try to make up some bogus argument to the contrary.

    You should be scared about the case that IS NOT tested - which is by
    definition the new behaviour (admittedly only for 64-bit apps). Claiming
    anything else is just stupid and dishonest.

    So stop making up _bad_ arguments. You do have a good one: the fact that
    x86-32 apparently works the way it works. But then you just irritate me
    with your idiotic other ones about how things are "supposed" to work.

    So just be honest: you didn't have any _actual_ downsides that you knew
    about. Which was what I wondered about and asked about.

    Those things make a big difference when I'm just about ready to cut a new
    release. This is _clearly_ not a regression (unless you talk about things
    from years and years ago), and apparently you don't have any actual code
    that is broken - just your expectations.

    In short, it doesn't smell like something I should apply under -rc9 rules.

    [ And dammit, I wish more developers would ask themselves that question:
    "Does this patch need to go in when we're very late in the -rc
    sequence?". Or at least not then try to grand-stand and avoid the
    question I asked. ]

    Linus
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [PATCH] x86_64: fix delayed signals



    On Thu, 10 Jul 2008, Roland McGrath wrote:
    >
    > i386 has never behaved this way, and still doesn't.


    Btw, I told you differently, you didn't believe me. So I went back just
    because I have an ego easily the size of Manhattan, and when you doubt me,
    my ego is hurt.

    i386 _has_ very much behaved this way. The system call code literally used
    to do

    ..
    cmpl $0,sigpending(%ebx)
    jne signal_return
    ..

    and then 'signal_return' case would handle one signal and then return to
    user space:

    ..
    call SYMBOL_NAME(do_signal)
    #ifdef CONFIG_SMP
    GET_CURRENT(%ebx)
    #endif
    cli
    CHECK_SOFTIRQ
    je restore_all
    ..

    ie it would call "restore_all" without re-checking signals. I wasn't
    hallucinating.

    So that's really old code.

    The _really_ ancient code (like a decade ago or more) used to actually
    loop in do_signal() generating multiple signal stacks in one go. Then, a
    long long time ago, that was actually removed, and we expressly only tried
    to queue one signal at a time. I do not remember why, but it was before
    2.4.0. Probably _long_ before, but it's so long ago my memory is
    unreliable.

    The "queue multiple signals" code that you claim has always been there is
    in fact pretty recent. For x86-32, it was a thing introduced in commit
    c3ff8ec31c1249d268cd11390649768a12bec1b9: by _you_, just three years ago,
    in 2.6.14, in regular git times.

    So the x86-64 behaviour actually matches what the x86-32 behavior was at
    the time before things split.

    And I'd also like to point out another commit, namely "[PATCH] fix broken
    vm86 interrupt/signal handling" (4031ff388138b58e5cd472dccce38828bcb8c706)
    that fixed a bug with an endless loop you had introduced in that original
    x86-32 code when you fixed this exact same issue back when.

    Heh. That's the kind of thing that worries me.

    Linus
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [PATCH] x86_64: fix delayed signals



    On Thu, 10 Jul 2008, Linus Torvalds wrote:
    >
    > So the x86-64 behaviour actually matches what the x86-32 behavior was at
    > the time before things split.
    >
    > And I'd also like to point out another commit, namely "[PATCH] fix broken
    > vm86 interrupt/signal handling" (4031ff388138b58e5cd472dccce38828bcb8c706)
    > that fixed a bug with an endless loop you had introduced in that original
    > x86-32 code when you fixed this exact same issue back when.
    >
    > Heh. That's the kind of thing that worries me.


    That said, seeing the full history of this same thing on the x86-32 side
    actually makes me _much_ happier about the patch. Because now I can tell
    when we did it, and what problems it seems to have caused (answer:
    "apparently none that are possibly relevant on x86-64").

    IOW, just seeing the fact that we used to behave the same, and then
    changed it on the 32-bit side (with no apparent big issues) and did it
    without updating the 64-bit side, makes me just much happier about your
    patch. Just because I know know the background to why this was x86-64
    only.

    So now I'm considering just putting it in before the 2.6.26 release after
    all

    Linus
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: [PATCH] x86_64: fix delayed signals



    On Thu, 10 Jul 2008, Linus Torvalds wrote:
    >
    > So now I'm considering just putting it in before the 2.6.26 release after
    > all


    ... and having looked at the code, and thought about it some more, I'm
    definitely off the patch again.

    The reason is actually exactly the same bug that showed up when you did
    this for x86-32 three years ago, and that may in fact still be lurking.

    The endless loop of "call do_notify_resume until all the work flags are
    zero" is very fragile: it will immediately cause a hard lockup if there is
    some circumstance where do_notify_resume will not clear the flag.

    And when it comes to signals, there are several cases that can cause
    TIF_SIGPENDING to not be cleared:

    - confusion about user/kernel mode, where "do_signal()" will return
    without doing anything at all if we're in user mode.

    This was the bug we hit back in 2005 with a out-of-tree kernel-based
    vm86 model (which hopefully has since died a painful death).

    - get_signal_to_deliver() returning and not handling the signal.
    dequeue_signal() will do this for that collect_signal() case and for
    the whole DRI notifier thing. The DRI notifier() case actually clears
    TIF_SIGPENDING, but then we do "recalc_sigpending()" in the caller, so
    it might get set again.

    I do hate that code (I know you do too), and the code _should_ block
    the signal that gets ignored (so recalc_sigpending() should keep it
    cleared), but it's not entirely obvious. Maybe it gets into an endless
    loop of calling the notifier if this case ever triggers?

    - recalc_sigpending() expressly does not clear the TIF_SIGPENDING flag if
    we hit the "freezing(current)" case. So TIF_SIGPENDING stays set for
    freezing() processes. I think (and *hope*) they all get caught by other
    means anyway in that whole do_notify_resume() loop, but this is another
    of those "the freezer code is insane, I'm not going to try to think it
    through" cases.

    In short, I think your patch is fine now, but I'm also nervous enough
    about it that I'm not going to apply it. Any bugs it could expose look
    very unlikely, and if they exist they are probably bugs on 32-bit as we
    speak, but call me a worry-wart.

    Linus
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: [PATCH] x86_64: fix delayed signals



    On Thu, 10 Jul 2008, Linus Torvalds wrote:
    >
    > - confusion about user/kernel mode, where "do_signal()" will return
    > without doing anything at all if we're in user mode.


    s/in/not returning to/

    Duh.

    Linus
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. Re: [PATCH] x86_64: fix delayed signals


    * Roland McGrath wrote:

    > On three of the several paths in entry_64.S that call
    > do_notify_resume() on the way back to user mode, we fail to properly
    > check again for newly-arrived work that requires another call to
    > do_notify_resume() before going to user mode. These paths set the
    > mask to check only _TIF_NEED_RESCHED, but this is wrong. The other
    > paths that lead to do_notify_resume() do this correctly already, and
    > entry_32.S does it correctly in all cases.


    nice find! Roland, is this related to the thread started by Elias
    Oltmanns yesterday:

    http://lkml.org/lkml/2008/7/10/57

    which is also related to the thread started by Edwin Török:

    http://lkml.org/lkml/2008/6/28/50

    ? The weight of complains seems to be on the 64-bit side, in those
    threads 32-bit systems seem to be implicated as well by latencytop.
    Perhaps the 64-bit delays are related to the bug you fix and we could
    chalk up the 32-bit delays to IO delays?

    more people Cc:-ed: if your delays happen on 64-bit x86 systems - does
    Roland's patch (also repeated below) fix those delay issues?

    Ingo

    ------------------------->
    Subject: x86: fix delayed signals
    From: Roland McGrath

    On three of the several paths in entry_64.S that call
    do_notify_resume() on the way back to user mode, we fail to properly
    check again for newly-arrived work that requires another call to
    do_notify_resume() before going to user mode. These paths set the
    mask to check only _TIF_NEED_RESCHED, but this is wrong. The other
    paths that lead to do_notify_resume() do this correctly already, and
    entry_32.S does it correctly in all cases.

    All paths back to user mode have to check all the _TIF_WORK_MASK
    flags at the last possible stage, with interrupts disabled.
    Otherwise, we miss any flags (TIF_SIGPENDING for example) that were
    set any time after we entered do_notify_resume(). More work flags
    can be set (or left set) synchronously inside do_notify_resume(), as
    TIF_SIGPENDING can be, or asynchronously by interrupts or other CPUs
    (which then send an asynchronous interrupt).

    There are many different scenarios that could hit this bug, most of
    them races. The simplest one to demonstrate does not require any
    race: when one signal has done handler setup at the check before
    returning from a syscall, and there is another signal pending that
    should be handled. The second signal's handler should interrupt the
    first signal handler before it actually starts (so the interrupted PC
    is still at the handler's entry point). Instead, it runs away until
    the next kernel entry (next syscall, tick, etc).

    This test behaves correctly on 32-bit kernels, and fails on 64-bit
    (either 32-bit or 64-bit test binary). With this fix, it works.

    #define _GNU_SOURCE
    #include
    #include
    #include
    #include

    #ifndef REG_RIP
    #define REG_RIP REG_EIP
    #endif

    static sig_atomic_t hit1, hit2;

    static void
    handler (int sig, siginfo_t *info, void *ctx)
    {
    ucontext_t *uc = ctx;

    if ((void *) uc->uc_mcontext.gregs[REG_RIP] == &handler)
    {
    if (sig == SIGUSR1)
    hit1 = 1;
    else
    hit2 = 1;
    }

    printf ("%s at %#lx\n", strsignal (sig),
    uc->uc_mcontext.gregs[REG_RIP]);
    }

    int
    main (void)
    {
    struct sigaction sa;
    sigset_t set;

    sigemptyset (&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = &handler;

    if (sigaction (SIGUSR1, &sa, NULL)
    || sigaction (SIGUSR2, &sa, NULL))
    return 2;

    sigemptyset (&set);
    sigaddset (&set, SIGUSR1);
    sigaddset (&set, SIGUSR2);
    if (sigprocmask (SIG_BLOCK, &set, NULL))
    return 3;

    printf ("main at %p, handler at %p\n", &main, &handler);

    raise (SIGUSR1);
    raise (SIGUSR2);

    if (sigprocmask (SIG_UNBLOCK, &set, NULL))
    return 4;

    if (hit1 + hit2 == 1)
    {
    puts ("PASS");
    return 0;
    }

    puts ("FAIL");
    return 1;
    }

    Signed-off-by: Roland McGrath
    ---
    arch/x86/kernel/entry_64.S | 7 +++----
    1 files changed, 3 insertions(+), 4 deletions(-)

    diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
    index 556a8df..968cf54 100644
    --- a/arch/x86/kernel/entry_64.S
    +++ b/arch/x86/kernel/entry_64.S
    @@ -305,7 +305,7 @@ sysret_signal:
    leaq -ARGOFFSET(%rsp),%rdi # &pt_regs -> arg1
    xorl %esi,%esi # oldset -> arg2
    call ptregscall_common
    -1: movl $_TIF_NEED_RESCHED,%edi
    +1: movl $_TIF_WORK_MASK,%edi
    /* Use IRET because user could have changed frame. This
    works because ptregscall_common has called FIXUP_TOP_OF_STACK. */
    DISABLE_INTERRUPTS(CLBR_NONE)
    @@ -393,7 +393,7 @@ int_signal:
    movq %rsp,%rdi # &ptregs -> arg1
    xorl %esi,%esi # oldset -> arg2
    call do_notify_resume
    -1: movl $_TIF_NEED_RESCHED,%edi
    +1: movl $_TIF_WORK_MASK,%edi
    int_restore_rest:
    RESTORE_REST
    DISABLE_INTERRUPTS(CLBR_NONE)
    @@ -647,9 +647,8 @@ retint_signal:
    RESTORE_REST
    DISABLE_INTERRUPTS(CLBR_NONE)
    TRACE_IRQS_OFF
    - movl $_TIF_NEED_RESCHED,%edi
    GET_THREAD_INFO(%rcx)
    - jmp retint_check
    + jmp retint_with_reschedule

    #ifdef CONFIG_PREEMPT
    /* Returning to kernel space. Check if we need preemption */
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  13. Re: [PATCH] x86_64: fix delayed signals

    On 2008-07-11 08:46, Ingo Molnar wrote:
    > * Roland McGrath wrote:
    >
    >
    >> On three of the several paths in entry_64.S that call
    >> do_notify_resume() on the way back to user mode, we fail to properly
    >> check again for newly-arrived work that requires another call to
    >> do_notify_resume() before going to user mode. These paths set the
    >> mask to check only _TIF_NEED_RESCHED, but this is wrong. The other
    >> paths that lead to do_notify_resume() do this correctly already, and
    >> entry_32.S does it correctly in all cases.
    >>

    >
    > nice find! Roland, is this related to the thread started by Elias
    > Oltmanns yesterday:
    >
    > http://lkml.org/lkml/2008/7/10/57
    >


    I will try to get a latency trace using the debug features in
    linux-next, but first
    I have to bisect a boot failure in linux-next.

    > which is also related to the thread started by Edwin Török:
    >
    > http://lkml.org/lkml/2008/6/28/50
    >
    > ? The weight of complains seems to be on the 64-bit side, in those
    > threads 32-bit systems seem to be implicated as well by latencytop.
    > Perhaps the 64-bit delays are related to the bug you fix and we could
    > chalk up the 32-bit delays to IO delays?
    >
    > more people Cc:-ed: if your delays happen on 64-bit x86 systems - does
    > Roland's patch (also repeated below) fix those delay issues?
    >
    >


    Thanks for CC-ing me.
    I will give the patch a try.

    Best regards,
    --Edwin

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  14. Re: [PATCH] x86_64: fix delayed signals

    Ingo Molnar wrote:
    > * Roland McGrath wrote:
    >
    >> On three of the several paths in entry_64.S that call
    >> do_notify_resume() on the way back to user mode, we fail to properly
    >> check again for newly-arrived work that requires another call to
    >> do_notify_resume() before going to user mode. These paths set the
    >> mask to check only _TIF_NEED_RESCHED, but this is wrong. The other
    >> paths that lead to do_notify_resume() do this correctly already, and
    >> entry_32.S does it correctly in all cases.

    >
    > nice find! Roland, is this related to the thread started by Elias
    > Oltmanns yesterday:
    >
    > http://lkml.org/lkml/2008/7/10/57
    >
    > which is also related to the thread started by Edwin Török:
    >
    > http://lkml.org/lkml/2008/6/28/50
    >
    > ? The weight of complains seems to be on the 64-bit side, in those
    > threads 32-bit systems seem to be implicated as well by latencytop.
    > Perhaps the 64-bit delays are related to the bug you fix and we could
    > chalk up the 32-bit delays to IO delays?


    Since I'm using a 32-bit machine, this patch isn't going to solve my
    particular problem. Still, I'm glad to hear that someone is looking at
    the assembly bits related to signal handling right now because that's
    precisely the part where I gave up to figure out for myself how signal
    handling works in the kernel.

    As for delayed signal handling on 32-bit platforms due to heavy disk
    I/O, I'm still reluctant to accept that signals should be delayed that
    much even though everything else that is not related to disk I/O works
    perfectly well. Now that I've seen some documentation for ftrace, I'll
    give it a go and try produce some helpful data. However, I'm rather busy
    right now and may not be able to do so before next week.

    Regards,

    Elias
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  15. Re: [PATCH] x86_64: fix delayed signals



    On Fri, 11 Jul 2008, Ingo Molnar wrote:
    >
    > nice find! Roland, is this related to the thread started by Elias
    > Oltmanns yesterday:
    >
    > http://lkml.org/lkml/2008/7/10/57


    No.

    First off, the delayed sending of signals isn't actually delayed that
    much. It's the next kernel entry at the most - certainly *not* across
    scheduling points, which the fact that ^Z shows up in the xterm says
    happen.

    Secondly, it seems to happen on x86-32 too, and it is claimed to be a
    regression. Neither of which would be true for this case.

    Thirdly, there seems to be no other signals involved (ie it's a single
    signal).

    I think the IO interactivity report is simply because of IO scheduling
    and/or paging out the shell too aggressively. No signals will be delivered
    while the process in in 'D' state (the new 'killable' thing being an
    exception in _some_ cases, but that's literally just for killing signals,
    not backgrounding).

    So to look at the original report:

    "By sprinkling some printk()s all over the place, I've managed to
    establish the following sequence of events taking place in the event of
    delayed signal handling as described above:
    The first Ctrl+Z event enqueues a SIGTSTP signal which eventually
    results in a call to kick_process(). For some reason though, the signal
    isn't handled straight away but remains on the queue for some time."

    the thing is, kick_process() only helps if the other thread is _running_
    on the other CPU. If it's actively waiting for disk, it's a no-op: the
    signal handling code expects that the scheduler will either wake it up in
    "wake_up_state()" (which won't happen if it is in 'D' state, of course!)
    or that the process will just handle it in its own time when it wakes up
    when IO completes.

    So if things have gotten worse latency, it's most likely simply because
    our IO has worse latency - probably because of having more requests
    outstanding, or because of unlucky/bad IO scheduler behaviour. The first
    thing to do (because it's fairly easy) is to start off testing different
    IO schedulers, but also see if there are some cases where we should try to
    return early.

    In the particular case of Edwin Török, his latency problem seems to b with
    "find". That's interesting, because it's one of the cases where we *could*
    easily improve on latency, by making "readdir()" return early both for
    killable signals, but also for regular signals when we've filled the
    buffer partially (because unlike a "read()" system call, we don't promise
    to fill the whole buffer _anyway_).

    It may be, for example, that the increased latency isn't actually because
    the *kernel* has increased latencies at all, but perhaps 'find' uses a
    much bigger buffer for it's readdir() code? Anyway, _if_ it's readdir()
    that has high latency, the appended patch might make a difference. I think
    it's probably a good idea regardless..

    But it would be really good to have the thing bisected regardless of what
    it is.

    Linus

    ---

    This makes 'readdir()' return early if it has a signal pending and it has
    filled at least one entry (the -EINTR will never be passed on to user
    space: the readdir() system call will return the length of the filled-in
    buffer)

    fs/readdir.c | 6 ++++++
    1 files changed, 6 insertions(+), 0 deletions(-)

    diff --git a/fs/readdir.c b/fs/readdir.c
    index 4e026e5..ec8c192 100644
    --- a/fs/readdir.c
    +++ b/fs/readdir.c
    @@ -159,6 +159,9 @@ static int filldir(void * __buf, const char * name, int namlen, loff_t offset,
    return -EOVERFLOW;
    dirent = buf->previous;
    if (dirent) {
    + /* Only check signals if we have filled at least one entry! */
    + if (signal_pending(current))
    + return -EINTR;
    if (__put_user(offset, &dirent->d_off))
    goto efault;
    }
    @@ -241,6 +244,9 @@ static int filldir64(void * __buf, const char * name, int namlen, loff_t offset,
    return -EINVAL;
    dirent = buf->previous;
    if (dirent) {
    + /* Only check signals if we have filled at least one entry! */
    + if (signal_pending(current))
    + return -EINTR;
    if (__put_user(offset, &dirent->d_off))
    goto efault;
    }
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  16. Re: [PATCH] x86_64: fix delayed signals

    > + /* Only check signals if we have filled at least one entry! */
    > + if (signal_pending(current))
    > + return -EINTR;


    Shouldn't it return the short count?
    (And if not, then probably should be -ERESTARTSYS.)


    Thanks,
    Roland
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  17. Re: [PATCH] x86_64: fix delayed signals



    On Fri, 11 Jul 2008, Linus Torvalds wrote:
    >
    > But it would be really good to have the thing bisected regardless of what
    > it is.


    Btw, did any of the impacted people test -rc9? Edwin's report is about
    -rc2 and -rc8, and one of the things we fixed since -rc8 is that incorrect
    and unintentional nr_zones zeroing that effectively disabled kswapd - and
    made everybody do synchronous memory freeing when they wanted to allocate
    more memory.. That can play havoc with any interactive stuff.

    Linus
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  18. Re: [PATCH] x86_64: fix delayed signals



    On Fri, 11 Jul 2008, Roland McGrath wrote:
    >
    > > + /* Only check signals if we have filled at least one entry! */
    > > + if (signal_pending(current))
    > > + return -EINTR;

    >
    > Shouldn't it return the short count?
    > (And if not, then probably should be -ERESTARTSYS.)


    readdir (getdents) is a complex system call, and the path is

    sys_getdents64() ->
    vfs_readdir (with 'filldir64' as a callback) ->
    low-level filesystem 'readdir' function ->
    filldir64 callback

    and what happens is that when that 'filldir64()' callback returns an
    error, the low-level filesystem just stops and returns with a success
    (because everything worked for _it_ - it was the callback that errored
    out).

    The 'sys_getdents64()' then does:

    ..
    lastdirent = buf.previous;
    if (lastdirent) {
    typeof(lastdirent->d_off) d_off = file->f_pos;
    error = -EFAULT;
    if (__put_user(d_off, &lastdirent->d_off))
    goto out_putf;
    error = count - buf.count;
    }
    ..

    ie we'll do that "return partial buffer size" at that level, not deep in
    the callback.

    So making the callback just return an error is the right thing to do -
    because the callers will then fix up the file offset in the last directory
    entry and the final return value.

    Linus
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  19. Re: [PATCH] x86_64: fix delayed signals



    On Fri, 11 Jul 2008, Linus Torvalds wrote:
    >
    > So making the callback just return an error is the right thing to do -
    > because the callers will then fix up the file offset in the last directory
    > entry and the final return value.


    That said, I didn't actually _test_ my patch. That's what users are for!

    Linus
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  20. Re: [PATCH] x86_64: fix delayed signals



    On Fri, 11 Jul 2008, Linus Torvalds wrote:
    >
    > Btw, did any of the impacted people test -rc9? Edwin's report is about
    > -rc2 and -rc8, and one of the things we fixed since -rc8 is that incorrect
    > and unintentional nr_zones zeroing that effectively disabled kswapd - and
    > made everybody do synchronous memory freeing when they wanted to allocate
    > more memory.. That can play havoc with any interactive stuff.


    Hmm. Edwin's latencytop output includes this (ignoring the _very_ top
    entries that are all either CD-ROM media change tests or are interruptible
    pipe/select things) at the top:

    21 10264428 915514 get_request_wait __make_request generic_make_request
    submit_bio xfs_submit_ioend_bio xfs_submit_ioend
    xfs_page_state_convert xfs_vm_writepage __writepage
    write_cache_pages generic_writepages xfs_vm_writepages

    26 3369263 2260529 down xfs_buf_iowait xfs_buf_iostart xfs_buf_read_flags
    xfs_trans_read_buf xfs_imap_to_bp xfs_itobp xfs_iread
    xfs_iget_core xfs_iget xfs_lookup xfs_vn_lookup 1 17888 17888 down
    xfs_buf_iowait xfs_buf_iostart xfs_buf_read_flags
    xfs_trans_read_buf xfs_da_do_buf xfs_da_read_buf
    xfs_dir2_block_getdents xfs_readdir xfs_file_readdir vfs_readdir
    sys_getdents64
    ..

    which says that (a) yes, readdir() is part of the problematic paths, so my
    patch may make a difference but also (b) we also have so many writeback
    IO's in flight that the write request queue is totally full, and the
    writing side is simply waiting for the queue to empty.

    I guess (b) isn't a surprise (considering the load), but it does explain
    why any IO read will be very much delayed. If the IO scheduler (or the
    disk itself - tagged commands etc) doesn't prioritize reads and
    effectively always put them ahead of the queue, you can get very very long
    latencies just because you have to wait for lots of writes to complete
    first.

    Linus
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread
Page 1 of 3 1 2 3 LastLast