Strange stop-signal behavior in multithreaded program with defunct main - Kernel


  1. Strange stop-signal behavior in multithreaded program with defunct main

    Bert Wesarg described a scenario that I quickly replicated on
    2.6.28-rc2 (and 2.6.25 -- it's not a regression in 2.6.28-rc)
    using the program below: if we have a multithreaded process
    with a defunct main thread running on a tty, and that
    process is sent a stop signal (either ^Z (SIGTSTP) or a stop
    signal sent from another terminal using kill(1)), then:

    a) the terminal is locked up; and

    b) the program is unresponsive to any other signal, except SIGKILL
    or SIGCONT.

    An example run:
    $ ./pthreads_zombie_main 1 # Creates one thread besides main
    0: 0
    0: 1
    0: 2
    ^Z

    At this point, no shell prompt appears, and typing ^C (or ^\) has no
    effect. The process can be killed (and the terminal restored) by sending
    SIGKILL from another terminal. (If one instead types ^C at the terminal,
    and then sends SIGCONT from another terminal, then the terminal is restored
    and the program can be seen (via $?) to have terminated because of
    SIGINT.)

    I'm (wildly) guessing that there is some problem in the terminal driver's
    understanding of the state and identity of the foreground job, but am not
    sure how to analyze this further. (I couldn't find a bug report or LKML
    thread that seemed to describe exactly this problem.) Ideas?

    Cheers,

    Michael

    /* pthreads_zombie_main.c */

    #include <errno.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define errExitEN(en, msg) { errno = en; perror(msg); \
                                 exit(EXIT_FAILURE); }

    /* Each thread prints "<thread number>: <counter>" every 3 seconds */
    static void *
    thread_start(void *arg)
    {
        int tnum = (int) (intptr_t) arg;
        int j;

        for (j = 0; ; j++) {
            sleep(3);
            printf("%d: %d\n", tnum, j);
        }
    }

    int
    main(int argc, char *argv[])
    {
        int s, tnum;
        pthread_t thr;

        if (argc != 2) {
            fprintf(stderr, "Usage: %s <num-threads>\n", argv[0]);
            exit(EXIT_SUCCESS);
        }

        for (tnum = 0; tnum < atoi(argv[1]); tnum++) {
            s = pthread_create(&thr, NULL, &thread_start,
                               (void *) (intptr_t) tnum);
            if (s != 0)
                errExitEN(s, "pthread_create");
        }

        /* Terminate only the main thread: the thread group leader
           becomes defunct while the other threads keep running */
        pthread_exit(NULL);
    }


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: Strange stop-signal behavior in multithreaded program with defunct main

    On 10/28, Michael Kerrisk wrote:
    >
    > Bert Wesarg described a scenario that I quickly replicated on
    > 2.6.28-rc2 (and 2.6.25 -- it's not a regression in 2.6.28-rc)
    > using the program below: if we have a multithreaded process
    > with a defunct main thread running on a tty, and that
    > process is sent a stop signal (either ^Z (SIGTSTP) or a stop
    > signal sent from another terminal using kill(1)), then:
    >
    > a) the terminal is locked up; and
    >
    > b) the program is unresponsive to any other signal, except SIGKILL
    > or SIGCONT.


    Yes, known problem. Please look at

    [RFC,PATCH 3/3] do_wait: fix waiting for stopped group with dead leader
    http://marc.info/?t=119713920000003

    Oleg.


  3. Re: Strange stop-signal behavior in multithreaded program with defunct main

    Hi Oleg,

    On Thu, Oct 30, 2008 at 6:00 AM, Oleg Nesterov wrote:
    > On 10/28, Michael Kerrisk wrote:
    >>
    >> Bert Wesarg described a scenario that I quickly replicated on
    >> 2.6.28-rc2 (and 2.6.25 -- it's not a regression in 2.6.28-rc)
    >> using the program below: if we have a multithreaded process
    >> with a defunct main thread running on a tty, and that
    >> process is sent a stop signal (either ^Z (SIGTSTP) or a stop
    >> signal sent from another terminal using kill(1)), then:
    >>
    >> a) the terminal is locked up; and
    >>
    >> b) the program is unresponsive to any other signal, except SIGKILL
    >> or SIGCONT.

    >
    > Yes, known problem. Please look at
    >
    > [RFC,PATCH 3/3] do_wait: fix waiting for stopped group with dead leader
    > http://marc.info/?t=119713920000003


    Okay -- thanks for the info. I've added some text to man-pages to
    cover this bug.

    Cheers,

    Michael

    --- a/man3/pthread_exit.3
    +++ b/man3/pthread_exit.3
    @@ -21,7 +21,7 @@
    .\" Formatted or processed versions of this manual, if unaccompanied by
    .\" the source, must acknowledge the copyright and authors of this work.
    .\"
    -.TH PTHREAD_EXIT 3 2008-10-24 "Linux" "Linux Programmer's Manual"
    +.TH PTHREAD_EXIT 3 2008-10-30 "Linux" "Linux Programmer's Manual"
    .SH NAME
    pthread_exit \- terminate calling thread
    .SH SYNOPSIS
    @@ -87,6 +87,18 @@ The value pointed to by
    .IR retval
    should not be located on the calling thread's stack,
    since the contents of that stack are undefined after the thread terminates.
    +.SH BUGS
    +Currently,
    +.\" Linux 2.6.27
    +there are limitations in the kernel implementation logic for
    +.BR wait (2)ing
    +on a stopped thread group with a dead thread group leader.
    +This can manifest in problems such as a locked terminal if a stop signal is
    +sent to a foreground process whose thread group leader has already called
    +.BR pthread_exit (3).
    +.\" FIXME . review a later kernel to see if this gets fixed
    +.\" http://thread.gmane.org/gmane.linux.kernel/611611
    +.\" http://marc.info/?l=linux-kernel&m=122525468300823&w=2
    .SH SEE ALSO
    .BR pthread_create (3),
    .BR pthread_join (3),

  4. Re: Strange stop-signal behavior in multithreaded program with defunct main

    On 10/30, Michael Kerrisk wrote:
    >
    > On Thu, Oct 30, 2008 at 6:00 AM, Oleg Nesterov wrote:
    > > On 10/28, Michael Kerrisk wrote:
    > >>
    > >> Bert Wesarg described a scenario that I quickly replicated on
    > >> 2.6.28-rc2 (and 2.6.25 -- it's not a regression in 2.6.28-rc)
    > >> using the program below: if we have a multithreaded process
    > >> with a defunct main thread running on a tty, and that
    > >> process is sent a stop signal (either ^Z (SIGTSTP) or a stop
    > >> signal sent from another terminal using kill(1)), then:
    > >>
    > >> a) the terminal is locked up; and
    > >>
    > >> b) the program is unresponsive to any other signal, except SIGKILL
    > >> or SIGCONT.

    > >
    > > Yes, known problem. Please look at
    > >
    > > [RFC,PATCH 3/3] do_wait: fix waiting for stopped group with dead leader
    > > http://marc.info/?t=119713920000003

    >
    > Okay -- thanks for the info. I've added some text to man-pages to
    > cover this bug.


    Well, we should fix this bug, of course.

    I'll try to redo my old patch, but fyi I am very busy right now, and
    most probably I will be completely offline during the next week.

    Oleg.
