SIGCHILD, wait(), and select() - Unix

  1. SIGCHILD, wait(), and select()

    I'm trying to work out how to both talk with *and* wait for descendant
    processes but can't seem to find a way to do so safely. Here's a little
    background:

    The top-level process forks. On the child side a new program
    (the "controller") is exec-ed. The controller can and will fork many
    descendants. Meanwhile, the parent becomes the so-called "monitor"; it
    goes into a select loop because each program started by the controller
    will need to open a socket and report some data to it just before
    exiting. This monitor loop must run until all processes started by the
    controller have reported back, at which point the monitor can exit.

    It's safe to assume that the controller will wait for all its
    descendants, i.e. there will be no 'background' processes.

    Thus, as soon as the controller process has reported back _its own_
    results, the monitor can break out of the select loop and issue a wait.
    It knows that all the descendant processes must have completed, and also
    that the controller is just about to exit so that a wait() will not
    block for long. The main point here is that wait() is not used except at
    the very end when we already know it'll have something to wait for.

    This works great *except* when the controller process dumps core or
    otherwise aborts. In that case the monitor basically hangs waiting for
    that final report (there is a select timeout but that's still quite
    painful). So I tried putting in a handler for SIGCHLD to take care of
    this case but that throws out a lot of "interrupted system call" errors,
    presumably because the signal arrives during select().

    Is there a well-known technique for handling this situation?

    Thanks,
    Arch Stanton

  2. Re: SIGCHILD, wait(), and select()

    In article <9OednXJO8YtaQUbVnZ2dnUVZ_hGdnZ2d@comcast.com>,
    Arch Stanton wrote:

    > I'm trying to work out how to both talk with *and* wait for descendant
    > processes but can't seem to find a way to do so safely. Here's a little
    > background:
    >
    > The top-level process forks. On the child side a new program
    > (the "controller") is exec-ed. The controller can and will fork many
    > descendants. Meanwhile, the parent becomes the so-called "monitor"; it
    > goes into a select loop because each program started by the controller
    > will need to open a socket and report some data to it just before
    > exiting. This monitor loop must run until all processes started by the
    > controller have reported back, at which point the monitor can exit.
    >
    > It's safe to assume that the controller will wait for all its
    > descendants, i.e. there will be no 'background' processes.
    >
    > Thus, as soon as the controller process has reported back _its own_
    > results, the monitor can break out of the select loop and issue a wait.
    > It knows that all the descendant processes must have completed, and also
    > that the controller is just about to exit so that a wait() will not
    > block for long. The main point here is that wait() is not used except at
    > the very end when we already know it'll have something to wait for.
    >
    > This works great *except* when the controller process dumps core or
    > otherwise aborts. In that case the monitor basically hangs waiting for
    > that final report (there is a select timeout but that's still quite
    > painful). So I tried putting in a handler for SIGCHLD to take care of
    > this case but that throws out a lot of "interrupted system call" errors,
    > presumably because the signal arrives during select().
    >
    > Is there a well-known technique for handling this situation?


    The SIGCHLD handler should set a variable that indicates that the
    controller has exited. When select() returns due to the interrupted
    system call, check this variable and get out of the select() loop.

    --
    Barry Margolin, barmar@alum.mit.edu
    Arlington, MA
    *** PLEASE post questions in newsgroups, not directly to me ***
    *** PLEASE don't copy me on replies, I'll read them in the group ***

  3. Re: SIGCHILD, wait(), and select()

    On Thu, 25 Sep 2008 15:33:50 -0400, Barry Margolin wrote:

    > In article <9OednXJO8YtaQUbVnZ2dnUVZ_hGdnZ2d@comcast.com>,
    > Arch Stanton wrote:
    >
    >> I'm trying to work out how to both talk with *and* wait for descendant
    >> processes but can't seem to find a way to do so safely. Here's a little
    >> background:
    >>
    >> The top-level process forks. On the child side a new program (the
    >> "controller") is exec-ed. The controller can and will fork many
    >> descendants. Meanwhile, the parent becomes the so-called "monitor"; it
    >> goes into a select loop because each program started by the controller
    >> will need to open a socket and report some data to it just before
    >> exiting. This monitor loop must run until all processes started by the
    >> controller have reported back, at which point the monitor can exit.
    >>
    >> It's safe to assume that the controller will wait for all its
    >> descendants, i.e. there will be no 'background' processes.
    >>
    >> Thus, as soon as the controller process has reported back _its own_
    >> results, the monitor can break out of the select loop and issue a wait.
    >> It knows that all the descendant processes must have completed, and
    >> also that the controller is just about to exit so that a wait() will
    >> not block for long. The main point here is that wait() is not used
    >> except at the very end when we already know it'll have something to
    >> wait for.
    >>
    >> This works great *except* when the controller process dumps core or
    >> otherwise aborts. In that case the monitor basically hangs waiting for
    >> that final report (there is a select timeout but that's still quite
    >> painful). So I tried putting in a handler for SIGCHLD to take care of
    >> this case but that throws out a lot of "interrupted system call"
    >> errors, presumably because the signal arrives during select().
    >>
    >> Is there a well-known technique for handling this situation?
    >
    > The SIGCHLD handler should set a variable that indicates that the
    > controller has exited. When select() returns due to the interrupted
    > system call, check this variable and get out of the select() loop.


    Another way would be the self-pipe trick:
    - The supervisor (the monitor) keeps both ends of a pipe open.
    - The SIGCHLD handler does the wait(), collects the pid, and writes it
      to the write end of the pipe.
    - The read end of the pipe is in the read fd_set passed to select().
    - The main loop reads the pid from the read end of the pipe.
    - EINTR from select() can be ignored: just call select() again.

    HTH,
    AvK

  4. Re: SIGCHILD, wait(), and select()

    Arch Stanton writes:
    > The top-level process forks. On the child side a new program
    > (the "controller") is exec-ed. The controller can and will fork many
    > descendants. Meanwhile, the parent becomes the so-called "monitor"; it
    > goes into a select loop because each program started by the controller
    > will need to open a socket and report some data to it just before
    > exiting.


    [...]

    > Thus, as soon as the controller process has reported back _its own_
    > results, the monitor can break out of the select loop and issue a
    > wait.


    [...]

    > This works great *except* when the controller process dumps core or
    > otherwise aborts. In that case the monitor basically hangs waiting for
    > that final report (there is a select timeout but that's still quite
    > painful).


    If you are using anything with 'virtual circuit' semantics, e.g.
    connected stream sockets or pipes, the file descriptor for the
    connection the controller was supposed to use for reporting should
    become readable once the controller dies, and trying to read from it
    should return an EOF indicator (read() returns 0).
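
    A small sketch of that behaviour, assuming a socketpair() stands in
    for the controller's reporting connection (the original setup has the
    controller open a socket itself; eof_demo is an illustrative name).
    The child exits without writing anything, which for this purpose is
    the same as a crash, since the kernel closes its descriptors either
    way:

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo. Returns 1 if the controller's channel hit EOF with
 * no report, 0 if report data arrived, -1 on error. */
int eof_demo(void)
{
    int sv[2];
    char buf[256];
    ssize_t r;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    assert(sv[0] < FD_SETSIZE);         /* select() cannot watch larger fds */

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(sv[0]);
        _exit(1);                       /* dies without sending a report */
    }
    /* The monitor must close its own copy of the controller's end,
     * otherwise the connection never reaches EOF. */
    close(sv[1]);

    for (;;) {
        fd_set rfds;
        int n;

        FD_ZERO(&rfds);
        FD_SET(sv[0], &rfds);
        n = select(sv[0] + 1, &rfds, NULL, NULL, NULL);
        if (n > 0)
            break;                      /* readable: data or EOF */
        if (n < 0 && errno != EINTR)
            return -1;
    }
    do {
        r = read(sv[0], buf, sizeof buf);
    } while (r < 0 && errno == EINTR);
    close(sv[0]);

    while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
        ;                               /* reap the dead controller */
    if (r < 0)
        return -1;
    return r == 0;                      /* 0 bytes means EOF */
}
```

    One caveat: EOF only shows up once every copy of the writing end is
    closed, so any descendant that inherits the controller's descriptor
    and keeps it open will delay it.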
