signal race - Unix



Thread: signal race

  1. signal race

    Hi,

    I'm trying to figure out how I can prevent a certain kind of race condition.

    There are two things to do:
    1) open a file
    2) wait for a child process

    Both can block infinitely, because the open might be a fifo which has no
    input yet, and either event should be processed as soon as possible.

    If I set up a signal handler to catch SIGCHLD to interrupt the open, I get
    the following critical race:

    sigaction()
    signal occurs right before call to open
    open() -> blocks infinitely, although a signal occurred

    Is this correct?

    What can I do about it?

    TIA,
    Thomas

  2. Re: signal race

    Thomas Maier-Komor writes:
    > There are two things to do:
    > 1) open a file
    > 2) wait for a child process
    >
    > Both can block infinitely, because the open might be a fifo which has
    > no input yet, and either event should be processed as soon as possible.
    >
    > If I set up a signal handler to catch SIGCHLD to interrupt the open, I
    > get the following critical race:
    >
    > sigaction()
    > signal occurs right before call to open
    > open() -> blocks infinitely, although a signal occurred
    >
    > Is this correct?


    Yes.

    > What can I do about it?


    One possibility would be to use (sig)longjmp from the signal handler
    instead of relying on EINTR.
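
    A minimal sketch of that idea, under stated assumptions: the handler
    jumps back to a point saved with sigsetjmp() rather than hoping that the
    interrupted call fails with EINTR. The names run_interruptible,
    jump_target, armed and the two demo steps are hypothetical, and SIGUSR1
    stands in for SIGCHLD so that the effect can be triggered with raise().
    It illustrates the control flow only; it does not settle whether a
    result of the interrupted call can be lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf jump_target;            /* hypothetical name */
static volatile sig_atomic_t armed = 0;   /* jump only while the target is valid */

static void on_signal(int sig)
{
    (void)sig;
    if (armed)
        siglongjmp(jump_target, 1);       /* abandon whatever was interrupted */
}

/* Runs blocking_step() and reports whether signo cut it short.
 * Returns 1 if interrupted, 0 if the step completed, -1 on setup failure. */
int run_interruptible(int signo, void (*blocking_step)(void))
{
    struct sigaction sa;

    assert(blocking_step != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) == -1)
        return -1;

    /* Second argument 1: save the signal mask so siglongjmp() restores it. */
    if (sigsetjmp(jump_target, 1) != 0) {
        armed = 0;
        return 1;                         /* arrived here via siglongjmp() */
    }
    armed = 1;
    blocking_step();                      /* stands in for open() on the fifo */
    armed = 0;
    return 0;
}

/* Demo steps (hypothetical): one sends itself SIGUSR1, the other does not. */
void step_interrupted(void) { raise(SIGUSR1); }
void step_quiet(void) { }
```

    Calling run_interruptible(SIGUSR1, step_interrupted) takes the
    siglongjmp() path, while run_interruptible(SIGUSR1, step_quiet) runs the
    step to completion.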

  3. Re: signal race

    On Aug 8, 7:46 am, Thomas Maier-Komor wrote:
    > Hi,
    >
    > I'm trying to figure out how I can prevent a certain kind of race condition.
    >
    > There are two things to do:
    > 1) open a file
    > 2) wait for a child process
    >
    > Both can block infinitely, because the open might be a fifo which has no
    > input yet, and either event should be processed as soon as possible.
    >
    > If I set up a signal handler to catch SIGCHLD to interrupt the open, I get
    > the following critical race:
    >
    > sigaction()
    > signal occurs right before call to open
    > open() -> blocks infinitely, although a signal occurred
    >


    What is the context, i.e. how are the processes related? Is it a
    parent sharing a fifo with a child?
    The important thing is to know who is supposed to write to it and who
    reads it.

    If the parent reads the fifo, you can open it read-only with the
    O_NONBLOCK (or O_NDELAY) flag.
    That changes your workflow, but I do not see any other way to get rid
    of the race: there is no way to coordinate the open() system call
    with signals.

    More info on how the fifo is used would get you more interesting
    results than what I've written ;-)

    cheers,
    -- paulo
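
    A small sketch of the non-blocking open described above, assuming a
    POSIX system with a writable /tmp. The helper names
    open_fifo_nonblocking and demo_open_without_writer are hypothetical; the
    demo creates a scratch fifo with no writer attached and checks that
    open() still returns a descriptor right away.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open a fifo for reading without waiting for a writer to show up.
 * Returns the descriptor, or -1 with errno set. */
int open_fifo_nonblocking(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_NONBLOCK);
}

/* Demo helper (hypothetical): returns 1 if the open succeeded with no
 * writer present, 0 if it failed, -1 if the scratch fifo could not be made. */
int demo_open_without_writer(void)
{
    char dir[] = "/tmp/fifo-demo-XXXXXX";
    char path[64];
    int fd, ok;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/fifo", dir);
    if (mkfifo(path, 0600) == -1) {
        rmdir(dir);
        return -1;
    }

    fd = open_fifo_nonblocking(path);     /* does not wait for a writer */
    ok = (fd >= 0);
    if (fd >= 0)
        close(fd);
    unlink(path);
    rmdir(dir);
    return ok;
}
```

    The descriptor stays in non-blocking mode, so a later read() with no
    data available fails with EAGAIN instead of waiting.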

  4. Re: signal race

    Rainer Weikusat schrieb:
    > Thomas Maier-Komor writes:
    >> There are two things to do:
    >> 1) open a file
    >> 2) wait for a child process
    >>
    >> Both can block infinitely, because the open might be a fifo which has
    >> no input yet, and either event should be processed as soon as possible.
    >>
    >> If I set up a signal handler to catch SIGCHLD to interrupt the open, I
    >> get the following critical race:
    >>
    >> sigaction()
    >> signal occurs right before call to open
    >> open() -> blocks infinitely, although a signal occurred
    >>
    >> Is this correct?

    >
    > Yes.
    >
    >> What can I do about it?

    >
    > One possibility would be to use (sig)longjmp from the signal handler
    > instead of relying on EINTR.


    Yes, siglongjmp might get me around creating a thread for cleaning up
    the child processes. Neither choice is really appealing to me...

    Thanks!

  5. Re: signal race

    ppi schrieb:
    > On Aug 8, 7:46 am, Thomas Maier-Komor wrote:
    >> Hi,
    >>
    >> I'm trying to figure out how I can prevent a certain kind of race condition.
    >>
    >> There are two things to do:
    >> 1) open a file
    >> 2) wait for a child process
    >>
    >> Both can block infinitely, because the open might be a fifo which has no
    >> input yet, and either event should be processed as soon as possible.
    >>
    >> If I set up a signal handler to catch SIGCHLD to interrupt the open, I get
    >> the following critical race:
    >>
    >> sigaction()
    >> signal occurs right before call to open
    >> open() -> blocks infinitely, although a signal occurred
    >>

    >
    > what is the context i.e. related processes, like a parent sharing a
    > fifo with a child ?
    > The important thing is to know who is supposed to write and read it.
    >
    > If the parent reads the fifo, you can open it using O_NONBLOCK/
    > O_NDELAY flag and in read-only mode.
    > That changes your workflow, but I do not see any other way to get rid
    > of the race: there is no way to coordinate open() sys call and
    > signals.
    >
    > More info on how the fifo is used would get you more interesting
    > results than what I've written ;-)
    >
    > cheers,
    > -- paulo


    the parent is reading the fifo, and some independent 3rd party will
    eventually write on it.

    The problem with opening the fifo non-blocking is that the control flow
    gets stuck in a loop between open/read and wait.

    The more I think over it, the more I get the impression, I'll have to
    create a thread for cleaning up the child processes.

    - Thomas

  6. Re: signal race

    On Aug 8, 6:52 am, Thomas Maier-Komor wrote:
    > ppi schrieb:
    >
    >
    >
    > > On Aug 8, 7:46 am, Thomas Maier-Komor wrote:
    > >> Hi,

    >
    > >> I'm trying to figure out how I can prevent a certain kind of race condition.

    >
    > >> There are two things to do:
    > >> 1) open a file
    > >> 2) wait for a child process

    >
    > >> Both can block infinitely, because the open might be a fifo which has no
    > >> input yet, and either event should be processed as soon as possible.

    >
    > >> If I set up a signal handler to catch SIGCHLD to interrupt the open, I get
    > >> the following critical race:

    >
    > >> sigaction()
    > >> signal occurs right before call to open
    > >> open() -> blocks infinitely, although a signal occurred

    >
    > > what is the context i.e. related processes, like a parent sharing a
    > > fifo with a child ?
    > > The important thing is to know who is supposed to write and read it.

    >
    > > If the parent reads the fifo, you can open it using O_NONBLOCK/
    > > O_NDELAY flag and in read-only mode.
    > > That changes your workflow, but I do not see any other way to get rid
    > > of the race: there is no way to coordinate open() sys call and
    > > signals.

    >
    > > More info on how the fifo is used would get you more interesting
    > > results than what I've written ;-)

    >
    > > cheers,
    > > -- paulo

    >
    > the parent is reading the fifo, and some independent 3rd party will
    > eventually write on it.
    >
    > The problem with opening the fifo non-blocking is that the control flow
    > gets stuck in a loop between open/read and wait.
    >
    > The more I think over it, the more I get the impression, I'll have to
    > create a thread for cleaning up the child processes.


    Depending on what cleanup is involved, you might be able to do it
    inside the signal handler. If you just need to call wait(), you can
    just do it, since it is async-safe. If you need to touch some global
    data structure (call it child_list), you have to worry about what
    happens if the signal arrives while something else is using
    child_list. One possibility would be to have the other code that uses
    child_list block SIGCHLD while it is running.
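
    A sketch of that blocking step, assuming the shared structure is only
    touched through a callback. with_sigchld_blocked and record_blocked are
    hypothetical names; child_list itself is not modelled here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Run critical(arg) with SIGCHLD blocked, so a handler that touches the
 * same data cannot run in the middle of it. Returns 0 on success. */
int with_sigchld_blocked(void (*critical)(void *), void *arg)
{
    sigset_t block, previous;

    assert(critical != NULL);
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &previous) == -1)
        return -1;

    critical(arg);                 /* e.g. insert into or remove from child_list */

    /* Restoring the old mask lets a SIGCHLD that arrived meanwhile be
     * delivered now. */
    return sigprocmask(SIG_SETMASK, &previous, NULL);
}

/* Demo helper (hypothetical): stores 1 in *out if SIGCHLD is currently
 * blocked, 0 otherwise. */
void record_blocked(void *out)
{
    sigset_t current;

    sigprocmask(SIG_SETMASK, NULL, &current);   /* query only, no change */
    *(int *)out = sigismember(&current, SIGCHLD);
}
```

    In a multithreaded program pthread_sigmask() would be the call to use
    instead of sigprocmask().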

    Regarding (sig)longjmp, I am having trouble seeing how that would
    work, because it seems like there is a race at the other end. Suppose
    we write

    jmp_buf jb;

    void handler(int sig) {
        longjmp(jb, 1);
    }

    int status;

    void open_and_wait(void) {
        int fd = -1, exited = 0;
        if (setjmp(jb) == 1) {
            /* Reached via longjmp() from the SIGCHLD handler. */
            waitpid(child, &status, 0);
            exited = 1;
        }
        signal(SIGCHLD, handler);
        if (fd == -1)
            fd = open(fifo, O_RDONLY);
        signal(SIGCHLD, SIG_DFL);
        if (!exited)
            waitpid(child, &status, 0);
    }

    But it could happen that the signal arrives just *after* the open has
    completed, but before its return value is stored into fd. In this
    case we would never know that the open was accomplished, so that the
    fifo would be opened twice, which is not good.

    Is there a way to avoid this that I am missing?

  7. Re: signal race

    On Aug 8, 6:52 am, Thomas Maier-Komor wrote:
    > ppi schrieb:
    >
    >
    >
    > > On Aug 8, 7:46 am, Thomas Maier-Komor wrote:
    > >> Hi,

    >
    > >> I'm trying to figure out how I can prevent a certain kind of race condition.

    >
    > >> There are two things to do:
    > >> 1) open a file
    > >> 2) wait for a child process

    >
    > >> Both can block infinitely, because the open might be a fifo which has no
    > >> input yet, and either event should be processed as soon as possible.

    >
    > >> If I set up a signal handler to catch SIGCHLD to interrupt the open, I get
    > >> the following critical race:

    >
    > >> sigaction()
    > >> signal occurs right before call to open
    > >> open() -> blocks infinitely, although a signal occurred

    >
    > > what is the context i.e. related processes, like a parent sharing a
    > > fifo with a child ?
    > > The important thing is to know who is supposed to write and read it.

    >
    > > If the parent reads the fifo, you can open it using O_NONBLOCK/
    > > O_NDELAY flag and in read-only mode.
    > > That changes your workflow, but I do not see any other way to get rid
    > > of the race: there is no way to coordinate open() sys call and
    > > signals.

    >
    > > More info on how the fifo is used would get you more interesting
    > > results than what I've written ;-)

    >
    > > cheers,
    > > -- paulo

    >
    > the parent is reading the fifo, and some independent 3rd party will
    > eventually write on it.
    >
    > The problem with opening the fifo non-blocking is that the control flow
    > gets stuck in a loop between open/read and wait.


    Another thought is to use the "pipe hack". Open your fifo in non-
    blocking mode. Then create another pipe via pipe(p). Set up a signal
    handler for SIGCHLD which writes one byte to p[1] and then returns.
    This is async-safe and can't block, because PIPE_BUF is bigger than
    1. The main program then calls select() or poll() with both the
    original fifo and p[0], and when p[0] becomes ready, it knows the
    child has exited and cleans it up.
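
    A sketch of the pipe hack under stated assumptions. The names wake_pipe,
    wake_pipe_init, wait_fifo_or_child and demo_child_exit are hypothetical.
    As an extra precaution that the post does not mention, both pipe ends
    are made non-blocking so that the handler cannot stall even if the pipe
    were full. The demo passes -1 in place of the fifo descriptor, which
    poll() ignores, so only the child exit is exercised.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static int wake_pipe[2] = { -1, -1 };       /* plays the role of p[] */

static void on_sigchld(int sig)
{
    int saved_errno = errno;                /* write() may change errno */
    char byte = 0;

    (void)sig;
    if (write(wake_pipe[1], &byte, 1) == -1) {
        /* Pipe full: a wake-up byte is already queued, nothing to do. */
    }
    errno = saved_errno;
}

/* Create the pipe and install the SIGCHLD handler. Returns 0 on success. */
int wake_pipe_init(void)
{
    struct sigaction sa;

    if (pipe(wake_pipe) == -1)
        return -1;
    fcntl(wake_pipe[0], F_SETFL, O_NONBLOCK);
    fcntl(wake_pipe[1], F_SETFL, O_NONBLOCK);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Wait until the fifo is readable or a child has exited.
 * Returns 1 for "child exited" (*status filled in), 0 for "fifo ready",
 * -1 on error. fifo_fd may be -1, in which case poll() skips that entry. */
int wait_fifo_or_child(int fifo_fd, int *status)
{
    struct pollfd fds[2] = {
        { .fd = fifo_fd,      .events = POLLIN },
        { .fd = wake_pipe[0], .events = POLLIN },
    };

    assert(status != NULL);
    for (;;) {
        if (poll(fds, 2, -1) == -1) {
            if (errno == EINTR)
                continue;                   /* handler ran; poll again */
            return -1;
        }
        if (fds[1].revents & POLLIN) {
            char buf[16];
            while (read(wake_pipe[0], buf, sizeof buf) > 0) { }   /* drain */
            if (waitpid(-1, status, WNOHANG) > 0)
                return 1;
        }
        if (fds[0].revents & POLLIN)
            return 0;
    }
}

/* Demo helper (hypothetical): fork a child that exits with `code` and
 * return the exit status observed through the pipe. */
int demo_child_exit(int code)
{
    int status = 0;
    pid_t pid;

    if (wake_pipe_init() == -1)
        return -1;
    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);
    if (wait_fifo_or_child(-1, &status) != 1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    Because the byte stays queued in the pipe, a SIGCHLD that arrives before
    poll() is entered is not lost: poll() returns immediately.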



  8. Re: signal race

    fjblurt@yahoo.com writes:
    > On Aug 8, 6:52 am, Thomas Maier-Komor wrote:
    >> ppi schrieb:
    >> > On Aug 8, 7:46 am, Thomas Maier-Komor wrote:
    >> >> Hi,

    >>
    >> >> I'm trying to figure out how I can prevent a certain kind of race condition.

    >>
    >> >> There are two things to do:
    >> >> 1) open a file
    >> >> 2) wait for a child process

    >>
    >> >> Both can block infinitely, because the open might be a fifo which has no
    >> >> input yet, and either event should be processed as soon as possible.

    >>
    >> >> If I set up a signal handler to catch SIGCHLD to interrupt the open, I get
    >> >> the following critical race:

    >>
    >> >> sigaction()
    >> >> signal occurs right before call to open
    >> >> open() -> blocks infinitely, although a signal occurred

    >>
    >> > what is the context i.e. related processes, like a parent sharing a
    >> > fifo with a child ?
    >> > The important thing is to know who is supposed to write and read it.

    >>
    >> > If the parent reads the fifo, you can open it using O_NONBLOCK/
    >> > O_NDELAY flag and in read-only mode.
    >> > That changes your workflow, but I do not see any other way to get rid
    >> > of the race: there is no way to coordinate open() sys call and
    >> > signals.

    >>
    >> > More info on how the fifo is used would get you more interesting
    >> > results than what I've written ;-)

    >>
    >> > cheers,
    >> > -- paulo

    >>
    >> the parent is reading the fifo, and some independent 3rd party will
    >> eventually write on it.
    >>
    >> The problem with opening the fifo non-blocking is that the control flow
    >> gets stuck in a loop between open/read and wait.
    >>
    >> The more I think over it, the more I get the impression, I'll have to
    >> create a thread for cleaning up the child processes.

    >
    > Depending on what cleanup is involved, you might be able to do it
    > inside the signal handler. If you just need to call wait(), you can
    > just do it, since it is async-safe. If you need to touch some global
    > data structure (call it child_list), you have to worry about what
    > happens if the signal arrives while something else is using
    > child_list. One possibility would be to have the other code that uses
    > child_list block SIGCHLD while it is running.
    >
    > Regarding (sig)longjmp, I am having trouble seeing how that would
    > work, because it seems like there is a race at the other end. Suppose
    > we write
    >
    > jmp_buf jb;
    > void handler(int sig) {
    > longjmp(jb, 1);
    > }
    >
    > int status;
    >
    > void open_and_wait(void) {
    > int fd = -1, exited = 0;
    > if (setjmp(jb) == 1) {
    > waitpid(child, &status, 0);
    > exited = 1;
    > }
    > signal(SIGCHLD, handler);
    > if (fd == -1)
    > fd = open(fifo, O_RDONLY);
    > signal(SIGCHLD, SIG_DFL);
    > if (!exited)
    > waitpid(child, &status, 0);
    > }
    >
    > But it could happen that the signal arrives just *after* the open has
    > completed, but before its return value is stored into fd. In this
    > case we would never know that the open was accomplished, so that the
    > fifo would be opened twice, which is not good.
    >
    > Is there a way to avoid this that I am missing?


    Declare fd as volatile. A signal handler can only be executed when the
    kernel arranges for it to be executed. Invoking a signal handler on
    exit from a successful system call that allocated a resource which
    must later be freed, instead of first returning the resource
    identifier to userspace, would be a kernel bug.

  9. Re: signal race

    On Aug 8, 11:32 am, Rainer Weikusat wrote:
    > fjbl...@yahoo.com writes:


    > > Regarding (sig)longjmp, I am having trouble seeing how that would
    > > work, because it seems like there is a race at the other end. Suppose
    > > we write

    >
    > > jmp_buf jb;
    > > void handler(int sig) {
    > > longjmp(jb, 1);
    > > }

    >
    > > int status;

    >
    > > void open_and_wait(void) {
    > > int fd = -1, exited = 0;
    > > if (setjmp(jb) == 1) {
    > > waitpid(child, &status, 0);
    > > exited = 1;
    > > }
    > > signal(SIGCHLD, handler);
    > > if (fd == -1)
    > > fd = open(fifo, O_RDONLY);
    > > signal(SIGCHLD, SIG_DFL);
    > > if (!exited)
    > > waitpid(child, &status, 0);
    > > }

    >
    > > But it could happen that the signal arrives just *after* the open has
    > > completed, but before its return value is stored into fd. In this
    > > case we would never know that the open was accomplished, so that the
    > > fifo would be opened twice, which is not good.

    >
    > > Is there a way to avoid this that I am missing?

    >
    > Declare fd as volatile. A signal handler can only be executed when the
    > kernel arranges for it to be executed and invoking a signal handler on
    > exiting from a successful system call which allocated some resource
    > in need of being freed again instead of first returning the resource
    > identifier to userspace would be a kernel bug.


    I certainly should have declared 'fd' as volatile, so that it can't be
    clobbered by the longjmp(). And the file descriptor certainly gets
    returned to user space. But there is a window between the return into
    user space (inside libc's open() function) and the spot where the
    variable 'fd' is stored.

    For example, on x86, the 'open' system call is invoked by executing an
    interrupt instruction; upon return to user space the return value is
    in a certain register. Now libc's open() function must look at this
    register, set errno if appropriate, put its own return value into the
    appropriate register, and return, whereupon my open_and_wait() must
    store the value from this register into the stack location of 'fd'.
    This all takes several instructions, and if the signal should happen
    to arrive in the meantime, the file descriptor will never make it out
    of a register and into 'fd', and we will lose it when we longjmp().
    The kernel won't prevent this from happening, because as far as it is
    concerned this is all just random userspace code running. It's the
    fault of userspace that it threw away the file descriptor when it
    handled the signal, but I do not see how to avoid it.

    In principle we could do something really nonportable where we issue
    the interrupt instruction manually, note the register where the return
    value resides, and have the signal handler examine the saved registers
    to retrieve the value if need be. But of course that's not an
    acceptable solution.

  10. Re: signal race

    Thomas Maier-Komor wrote:

    > the parent is reading the fifo, and some independent 3rd party will
    > eventually write on it.
    >
    > The problem with opening the fifo non-blocking is that the control flow
    > gets stuck in a loop between open/read and wait.


    This comment leads me to think that you're merely trying to reap zombies. In
    that case, either call waitpid() from the signal handler, or set the handler
    to SIG_IGN, and the kernel won't create zombies at all.
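
    A sketch of reaping inside the handler while still keeping the exit
    code, under stated assumptions. reap_children, install_reaper, demo_reap
    and the two sig_atomic_t variables are hypothetical names. The demo
    blocks SIGCHLD around fork() and waits with sigsuspend(), so the flag
    check and the wait cannot be separated by the signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped_count = 0;
static volatile sig_atomic_t last_exit_code = -1;

static void reap_children(int sig)
{
    int saved_errno = errno;
    int status;

    (void)sig;
    /* Several children may exit while one SIGCHLD is pending, so loop. */
    while (waitpid(-1, &status, WNOHANG) > 0) {
        if (WIFEXITED(status))
            last_exit_code = WEXITSTATUS(status);
        reaped_count = reaped_count + 1;
    }
    errno = saved_errno;
}

int install_reaper(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Demo helper (hypothetical): fork a child that exits with `code` and
 * return the exit code recorded by the handler. */
int demo_reap(int code)
{
    sigset_t block, old, wait_mask;
    pid_t pid;

    assert(code >= 0 && code <= 255);
    if (install_reaper() == -1)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);   /* hold SIGCHLD back for now */
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);         /* mask used while waiting */

    reaped_count = 0;
    pid = fork();
    if (pid == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(code);

    while (reaped_count == 0)
        sigsuspend(&wait_mask);             /* atomically unblock and wait */

    sigprocmask(SIG_SETMASK, &old, NULL);
    return last_exit_code;
}
```

    Only the exit code is recorded in the handler; any heavier cleanup would
    still have to happen in the main control flow.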

    > The more I think over it, the more I get the impression, I'll have to
    > create a thread for cleaning up the child processes.


    pselect() was devised specifically to resolve this race. You both block and
    set a handler for the signals you're interested in. You then call pselect(),
    passing a sigset_t argument, which atomically enters the kernel and swaps
    signal masks. Any pending signals will get delivered, and pselect() will
    return with EINTR. This allows you to set a flag in the handler, then take
    action on the signal in the main control flow of your program.

    In this case, of course, it requires you to open the fifo non-blocking, but
    you won't have to spin, alternately calling read and wait. pselect() will
    effectively block until the fifo is ready for reading, or a signal was
    delivered. The signal could have been raised outside of pselect(); it will
    just be deferred until you enter pselect().
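
    A sketch of the pselect() pattern described above, under stated
    assumptions. setup_sigchld, wait_fd_or_child, demo_pselect and the
    child_exited flag are hypothetical names. SIGCHLD stays blocked
    everywhere except inside pselect(), so the flag can only change while
    the call is waiting. The demo passes -1 instead of a fifo descriptor and
    therefore waits for the signal alone.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t child_exited = 0;

static void note_sigchld(int sig)
{
    (void)sig;
    child_exited = 1;                       /* acted on in the main flow */
}

/* Install the handler and block SIGCHLD. On return *unblocked holds the
 * mask to hand to pselect(): the previous mask with SIGCHLD removed. */
int setup_sigchld(sigset_t *unblocked)
{
    struct sigaction sa;
    sigset_t block;

    assert(unblocked != NULL);
    child_exited = 0;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, unblocked) == -1)
        return -1;
    sigdelset(unblocked, SIGCHLD);
    return 0;
}

/* Returns 1 if a child exited, 0 if fd became readable, -1 on error.
 * fd may be -1 to wait for the signal only. */
int wait_fd_or_child(int fd, const sigset_t *unblocked)
{
    for (;;) {
        fd_set readable;
        int n;

        /* Safe to test here: SIGCHLD is blocked outside pselect(). */
        if (child_exited) {
            child_exited = 0;
            return 1;
        }
        FD_ZERO(&readable);
        if (fd >= 0)
            FD_SET(fd, &readable);
        n = pselect(fd >= 0 ? fd + 1 : 0, &readable, NULL, NULL, NULL,
                    unblocked);
        if (n > 0)
            return 0;
        if (n == -1 && errno != EINTR)
            return -1;
    }
}

/* Demo helper (hypothetical): fork a child that exits with `code`, wait
 * for the SIGCHLD through pselect(), then collect the exit status. */
int demo_pselect(int code)
{
    sigset_t unblocked;
    int status = 0;
    pid_t pid;

    if (setup_sigchld(&unblocked) == -1)
        return -1;
    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);
    if (wait_fd_or_child(-1, &unblocked) != 1)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    The waitpid() call runs in the main flow, which is where non-trivial
    cleanup could go as well.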

    Unfortunately, pselect() is relatively new. Not too long ago Linux didn't
    support this syscall in the kernel, and glibc merely provided a race-prone
    stub. But, I _believe_ recent Linux distributions will support this
    properly. I'd wager Solaris does, too. Some systems have ppoll() as an
    analogue to pselect(), though I think only the latter is standardized.

    If you want to go off the portability reservation, most (all?) modern unices
    have specialized solutions: kqueue on BSDs, signalfd in Linux, and Event
    Completion Ports for Solaris, to name a few. These all allow you to inline
    signal delivery notification in their own peculiar fashion.


  11. Re: signal race

    Thomas Maier-Komor wrote:

    > Hi,
    >
    > I'm trying to figure out how I can prevent a certain kind of race condition.
    >
    > There are two things to do:
    > 1) open a file
    > 2) wait for a child process
    >
    > Both can block infinitely, because the open might be a fifo which has no
    > input yet, and either event should be processed as soon as possible.
    >
    > If I set up a signal handler to catch SIGCHLD to interrupt the open, I get
    > the following critical race:
    >
    > sigaction()
    > signal occurs right before call to open
    > open() -> blocks infinitely, although a signal occurred
    >
    > Is this correct?
    >
    > What can I do about it?


    Why don't you process the signal in the signal handler?

    --
    Barry Margolin, barmar@alum.mit.edu
    Arlington, MA
    *** PLEASE post questions in newsgroups, not directly to me ***
    *** PLEASE don't copy me on replies, I'll read them in the group ***

  12. Re: signal race

    fjblurt@yahoo.com wrote:
    >>
    >> The problem with opening the fifo non-blocking is that the control flow
    >> gets stuck in a loop between open/read and wait.

    >
    > Another thought is to use the "pipe hack". Open your fifo in non-
    > blocking mode. Then create another pipe via pipe(p). Set up a signal
    > handler for SIGCHLD which writes one byte to p[1] and then returns.
    > This is async-safe and can't block, because PIPE_BUF is bigger than
    > 1. The main program then calls select() or poll() with both the
    > original fifo and p[0], and when p[0] becomes ready, it knows the
    > child has exited and cleans it up.
    >
    >


    I'm unable to understand how this would help, as I'm having a problem
    with open blocking. If open succeeds and SIGCHLD gets delivered after
    open, but before the returned fd is written into the variable, another
    call to open would return a different file descriptor that refers to a
    different connection.

    The solution to use siglongjmp in the signal handler has only a very
    small race-condition - but it looks to me like a real issue. So, as far
    as I currently understand I really need to spawn a thread to open up the
    pipe to avoid the race.

  13. Re: signal race

    William Ahern wrote:
    > Thomas Maier-Komor wrote:
    >
    >> the parent is reading the fifo, and some independent 3rd party will
    >> eventually write on it.
    >>
    >> The problem with opening the fifo non-blocking is that the control flow
    >> gets stuck in a loop between open/read and wait.

    >
    > This comment leads me to think that you're merely trying to reap zombies. In
    > that case, either call waitpid() from the signal handler, or set the handler
    > to SIG_IGN, and the kernel won't create zombies at all.
    >


    I need to handle the exit codes of the children properly. So, detaching
    the child processes isn't an option.

    >> The more I think over it, the more I get the impression, I'll have to
    >> create a thread for cleaning up the child processes.

    >
    > pselect() was devised specifically to resolve this race. You both block and
    > set a handler for the signals you're interested in. You then call pselect(),
    > passing a sigset_t argument, which atomically enters the kernel and swaps
    > signal masks. Any pending signals will get delivered, and pselect() will
    > return w/ EINTR. This allows you to set a flag in the handler, then take
    > action on the signal in the main control flow of your program.
    >
    > In this case, of course, it requires you to open the fifo non-blocking, but
    > you won't have to spin, alternatively calling read and wait. pselect() will
    > effectively block until the fifo is ready for reading, or a signal was
    > delivered. The signal could have been raised outside of pselect(); it will
    > just be defered until you enter pselect().
    >
    > Unfortunately, pselect() is relatively new. Not too long ago Linux didn't
    > support this syscall in the kernel, and glibc merely provided a race-prone
    > stub. But, I _believe_ recent Linux distributions will support this
    > properly. I'd wager Solaris does, too. Some systems have ppoll() as an
    > analogue to pselect(), though I think only the latter is standardized.
    >
    > If you want to go off the portability reservation, most (all?) modern unices
    > have specialized solutions: kqueue on BSDs, signalfd in Linux, and Event
    > Completion Ports for Solaris, to name a few. These all allow you to inline
    > signal delivery notification in their own peculiar fashion.
    >


    Unfortunately, pselect is unable to wait for a fifo to not block on
    open, because for this to be possible, we need a valid file descriptor,
    and this will only be available, once the named pipe can be opened
    non-blocking.

    Or did I miss something?

  14. Re: signal race

    Barry Margolin wrote:
    >
    > Why don't you process the signal in the signal handler?
    >


    Because, I have to do some non-trivial cleanup sequence that cannot be
    performed in the signal handler.

  15. Re: signal race

    Thomas Maier-Komor wrote:
    > fjblurt@yahoo.com wrote:
    >>> The problem with opening the fifo non-blocking is that the control flow
    >>> gets stuck in a loop between open/read and wait.

    >> Another thought is to use the "pipe hack". Open your fifo in non-
    >> blocking mode. Then create another pipe via pipe(p). Set up a signal
    >> handler for SIGCHLD which writes one byte to p[1] and then returns.
    >> This is async-safe and can't block, because PIPE_BUF is bigger than
    >> 1. The main program then calls select() or poll() with both the
    >> original fifo and p[0], and when p[0] becomes ready, it knows the
    >> child has exited and cleans it up.
    >>
    >>

    >
    > I'm unable to understand how this would help, as I'm having a problem
    > with open blocking. If open succeeds and SIGCHLD gets delivered after
    > open, but before the returned fd is written into the variable, another
    > call to open would return a different file descriptor that refers to a
    > different connection.
    >
    > The solution to use siglongjmp in the signal handler has only a very
    > small race-condition - but it looks to me like a real issue. So, as far
    > as I currently understand I really need to spawn a thread to open up the
    > pipe to avoid the race.


    Sorry, I have to revise my statement:
    Another call to open(2) on the pipe will reopen the same connection. So
    in the case of the race condition, all that happens is that a file
    descriptor leaks.

    I think, I can live with that.

  16. Re: signal race

    Thomas Maier-Komor writes:
    > Thomas Maier-Komor wrote:


    [...]

    >> The solution to use siglongjmp in the signal handler has only a very
    >> small race-condition - but it looks to me like a real issue. So, as far
    >> as I currently understand I really need to spawn a thread to open up the
    >> pipe to avoid the race.

    >
    > Sorry, I have to revise my statement:
    > Another call to open(2) on the pipe will reopen the same connection. So
    > in the case of the race condition, all that happens is that a file
    > descriptor leaks.
    >
    > I think, I can live with that.


    It is not entirely clear whether the problem exists in practice at
    all. A signal is not really an interrupt, and a signal handler can
    only run when the kernel chooses to run it. That happens on the
    'exit from the kernel' code path (the reason being 'it is implemented
    this way'). Running a signal handler instead of just returning from
    the system call, when some side effect inside the kernel that the
    process needs to be informed of (e.g. creation of a new open file
    descriptor) has already happened, would definitely be a kernel bug
    (the leak would happen because the kernel 'actively chooses' to throw
    this information away).

    Theoretically (meaning, I haven't looked at the actual code), the
    next opportunity to run the signal handler would be when the process
    is exiting from the kernel after the scheduler has chosen to run it
    the next time, i.e. after the kernel has switched to a different
    process at least once and is now busy switching back to the process
    that executed the open. The question would then be: can the process
    returning from the open call be preempted before it has had a chance
    to store the returned descriptor somewhere where it is still
    accessible after the signal handler has executed and after it has
    returned from the open?

    If so, this would affect any system call which causes a side effect
    the process needs to be informed of, e.g. a read from a TCP connection
    could become lost because the number of bytes read never makes it into
    userspace. While this seems plausible when assuming that a hardware
    interrupt could cause it to happen at any time (a higher-priority
    process becomes runnable), it would mean that doing a longjmp from a
    signal handler generally works only by chance. But this is documented
    as usable in various places, the glibc documentation being an actual
    example I remember. Until I have found the time to examine this in
    more detail, I assume that I could be missing something here.

  17. Re: signal race

    Thomas Maier-Komor wrote:

    > > This comment leads me to think that you're merely trying to reap zombies. In
    > > that case, either call waitpid() from the signal handler, or set the handler
    > > to SIG_IGN, and the kernel won't create zombies at all.
    > >

    >
    > I need to handle the exit codes of the childs properly. So, detaching
    > the child processes isn't an option.


    Handle how? You can call waitpid from the signal handler. waitpid is
    async-signal-safe. The only remaining question is what you want to do with
    the exit status.

    > > In this case, of course, it requires you to open the fifo non-blocking, but
    > > you won't have to spin, alternatively calling read and wait. pselect() will
    > > effectively block until the fifo is ready for reading, or a signal was
    > > delivered. The signal could have been raised outside of pselect(); it will
    > > just be defered until you enter pselect().


    > Unfortunately, pselect is unable to wait for a fifo to not block on
    > open, because for this to be possible, we need a valid file descriptor,
    > and this will only be available, once the named pipe can be opened
    > non-blocking.


    > Or did I miss something?


    Opening a fifo non-blocking opens it immediately (i.e. it doesn't block the
    open call). You immediately get a valid descriptor. Select wouldn't then
    signal that the fifo was opened for writing, but that something wrote to it
    and there's data ready. Which is functionally equivalent, unless the writer
    will never write anything before some other condition occurs--to be
    triggered by the act of opening the fifo for writing.
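
    A sketch of that behaviour, assuming a POSIX system with a writable
    /tmp. fifo_has_data and demo_fifo_readiness are hypothetical names, and
    the write end opened inside the demo stands in for the independent
    third-party writer. poll() is used for brevity; select() or pselect()
    would report the same readiness.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if fd has data to read within timeout_ms, 0 if not, -1 on error. */
int fifo_has_data(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int n;

    assert(fd >= 0);
    n = poll(&p, 1, timeout_ms);
    if (n == -1)
        return -1;
    return (n > 0 && (p.revents & POLLIN)) ? 1 : 0;
}

/* Demo helper (hypothetical): returns 1 if the fifo reports "no data"
 * before the write and "data ready" after it, 0 if not, -1 on setup failure. */
int demo_fifo_readiness(void)
{
    char dir[] = "/tmp/fifo-poll-XXXXXX";
    char path[64];
    int result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/fifo", dir);
    if (mkfifo(path, 0600) == 0) {
        int rd = open(path, O_RDONLY | O_NONBLOCK);   /* returns at once */
        int wr = open(path, O_WRONLY | O_NONBLOCK);   /* the "third party" */

        if (rd >= 0 && wr >= 0) {
            int before = fifo_has_data(rd, 0);        /* nothing written yet */
            int wrote = (write(wr, "x", 1) == 1);
            int after = fifo_has_data(rd, 1000);      /* data is now queued */
            result = (before == 0 && wrote && after == 1);
        }
        if (rd >= 0)
            close(rd);
        if (wr >= 0)
            close(wr);
        unlink(path);
    }
    rmdir(dir);
    return result;
}
```

    Readiness here means that data has been written, not merely that a
    writer has opened the fifo, which matches the distinction drawn above.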


  18. Re: signal race

    Rainer Weikusat wrote:
    > Thomas Maier-Komor writes:
    >> Thomas Maier-Komor wrote:

    >
    > [...]
    >
    >>> The solution to use siglongjmp in the signal handler has only a very
    >>> small race-condition - but it looks to me like a real issue. So, as far
    >>> as I currently understand I really need to spawn a thread to open up the
    >>> pipe to avoid the race.

    >> Sorry, I have to revise my statement:
    >> Another call to open(2) on the pipe will reopen the same connection. So
    >> in the case of the race condition, all that happens is that a file
    >> descriptor leaks.
    >>
    >> I think, I can live with that.

    >
    > It is not entirely clear if the problem exists practically at all. A
    > signal is not really an interrupt and a signal handler can only run
    > when the kernel choses to run it. If so, this will happen on the
    > 'exit from the kernel'-codepath (the reason being 'it is implemented
    > this way'). Running a signal handler instead of just returning from
    > the system call when some side effect inside the kernel the process
    > needs to be informed of (eg creation of a new open file descriptor)
    > has already happened would definitely be a kernel bug (the leak would
    > happen because the kernel 'actively chooses' to throw this information
    > away). Theoretically (meaning, I haven't looked at the actual code),
    > the next option to run the signal handler would be when the process is
    > exiting from the kernel after the scheduler has chosen to run it for
    > the next time, ie after the kernel has switched to a different process
    > at least once and is now busy with switching back to the process
    > having executed the open. The question would then be 'can the process
    > returning from the open call be preempted before it has had a chance
    > to store the returned descriptor somewhere where it is still
    > accessible after the signal handler has executed and after it has
    > returned from the open'. If so, this would affect any system call
    > which causes a side effect the process needs to be informed of, eg a
    > read from a TCP connection could become lost because the number of
    > bytes read never makes it into userspace. While this seems plausible
    > when assuming that a hardware interrupt could just cause it to
    > happen at any time (higher priority process becomes runnable), it
    > would mean that doing a longjmp from a signal handler generally works
    > only by chance. But this is documented as usable in various places,
    > the glibc documentation being an actual example I remember. Until I
    > have found the time to examine this in more detail, I would
    > therefore not rule out that I am missing something here.


    Rainer,

    I am also not completely sure whether the point fjblurt@yahoo.com was making
    is correct (please correct me if the following analysis is not what you
    were talking about). When I try to break this issue down, it looks like
    the following:

    The line of code in question is:
    FifoFD = open("/path/to/some/fifo", O_RDONLY);

    The C language translates this to the following:
    - prepare arguments
    - call to open
    - assign return value from open to FifoFD

    So the question is not if SIGCHLD can occur somewhere in open(2) and
    cause it to return an illegal value and thereby lose the valid file
    descriptor.

    The question is: can SIGCHLD force the execution of a signal handler
    before the 'assign return value from open to FifoFD' operation? The problem I
    see here is that open itself cannot ensure that this assignment after
    the call is protected against the execution of the signal handler.

    In contrast, mutex_lock takes a pointer to its lock, and therefore it
    can ensure that the necessary operations on the lock can be performed
    under mutual exclusion. The assignment after the open isn't protected
    by any means against interruption by a signal handler.
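
    To make the window concrete, here is a minimal sketch of the siglongjmp
    approach discussed earlier in the thread. The names child_jmp,
    on_sigchld, open_unless_child_exits and demo_open_race are made up for
    illustration, and the sketch assumes the caller keeps SIGCHLD blocked
    except around the open. The comment in the middle marks the place where
    a descriptor could leak.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

static sigjmp_buf child_jmp;                 /* hypothetical name */

static void on_sigchld(int signo)
{
    (void)signo;
    siglongjmp(child_jmp, 1);                /* abandon whatever was interrupted */
}

/* Precondition (assumed): SIGCHLD is blocked and on_sigchld is installed.
 * Returns the descriptor, or -1 if SIGCHLD arrived first or open failed. */
static int open_unless_child_exits(const char *path)
{
    sigset_t chld;
    int fd;

    assert(path != NULL);
    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);

    if (sigsetjmp(child_jmp, 1) != 0)
        return -1;                           /* saved mask (SIGCHLD blocked) is restored */

    sigprocmask(SIG_UNBLOCK, &chld, NULL);   /* a pending SIGCHLD jumps right here */
    fd = open(path, O_RDONLY);
    /* Window: if the handler runs after open() has returned but before the
     * next line blocks the signal again, the jump discards fd and the
     * descriptor leaks. */
    sigprocmask(SIG_BLOCK, &chld, NULL);
    return fd;
}

/* Demo: a FIFO nobody writes to, and a child that exits at once. */
static int demo_open_race(void)
{
    char dir[] = "/tmp/sigraceXXXXXX";
    char path[64];
    struct sigaction sa;
    sigset_t chld, old;
    pid_t pid;
    int fd = -2;

    if (mkdtemp(dir) == NULL)
        return -2;
    snprintf(path, sizeof path, "%s/fifo", dir);
    if (mkfifo(path, 0600) != 0) {
        rmdir(dir);
        return -2;
    }

    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &chld, &old);

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGCHLD, &sa, NULL);

    pid = fork();
    if (pid == 0)
        _exit(0);
    if (pid > 0) {
        fd = open_unless_child_exits(path);  /* only SIGCHLD can end the open */
        waitpid(pid, NULL, 0);
    }

    sa.sa_handler = SIG_DFL;                 /* undo the demo's signal setup */
    sigaction(SIGCHLD, &sa, NULL);
    sigprocmask(SIG_SETMASK, &old, NULL);
    if (fd >= 0)
        close(fd);
    unlink(path);
    rmdir(dir);
    return fd;
}
```

    Calling demo_open_race() from main should return -1 here, because the
    only event that can end the open is the child's exit. The sketch does
    not remove the window; it only shows how small it is.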

    What do you think about this?


  19. Re: signal race

    William Ahern wrote:
    > Thomas Maier-Komor wrote:
    >
    >>> This comment leads me to think that you're merely trying to reap zombies. In
    >>> that case, either call waitpid() from the signal handler, or set the handler
    >>> to SIG_IGN, and the kernel won't create zombies at all.
    >>>

    >> I need to handle the exit codes of the children properly. So, detaching
    >> the child processes isn't an option.

    >
    > Handle how? You can call waitpid from the signal handler. waitpid is
    > async-signal-safe. The only remaining question is what do you want to do w/
    > the exit status?
    >


    I have to clean up some dynamic memory. This requires calls to free and
    work on some global variables, which cannot be done in an
    async-signal-safe way.
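
    One common way around this, sketched below under assumptions, is to let
    the handler do nothing but set a flag and to run waitpid and the cleanup
    from normal context. The names child_exited, note_sigchld, reap_children
    and demo_deferred_cleanup are hypothetical, and the single `resource`
    pointer merely stands in for the real per-child bookkeeping.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t child_exited;   /* hypothetical flag name */

static void note_sigchld(int signo)
{
    (void)signo;
    child_exited = 1;          /* the handler does only async-signal-safe work */
}

/* Runs in normal context: reaps every finished child and releases the
 * memory that `resource` stands in for.
 * Returns the exit status of the last child reaped, or -1 if none. */
static int reap_children(void **resource)
{
    int status, last = -1;

    assert(resource != NULL);
    child_exited = 0;          /* clear first, so a later signal is not lost */
    while (waitpid(-1, &status, WNOHANG) > 0) {
        if (WIFEXITED(status))
            last = WEXITSTATUS(status);
        free(*resource);       /* fine here; it would not be inside the handler */
        *resource = NULL;
    }
    return last;
}

/* Demo: one child exiting with status 7. */
static int demo_deferred_cleanup(void)
{
    struct sigaction sa;
    sigset_t chld, old, wait_mask;
    void *resource = malloc(32);
    pid_t pid;
    int status = -2;

    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &chld, &old);     /* SIGCHLD arrives only in sigsuspend */
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);

    sa.sa_handler = note_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGCHLD, &sa, NULL);

    pid = fork();
    if (pid == 0)
        _exit(7);
    if (pid > 0) {
        while (!child_exited)
            sigsuspend(&wait_mask);          /* atomically unblock and wait */
        status = reap_children(&resource);
    }

    sa.sa_handler = SIG_DFL;                 /* undo the demo's signal setup */
    sigaction(SIGCHLD, &sa, NULL);
    sigprocmask(SIG_SETMASK, &old, NULL);
    free(resource);                          /* no-op if already released */
    return status;
}
```

    In a real program the main loop would check child_exited after every
    interrupted call and then invoke something like reap_children. The exit
    status is available there, so detaching the children is not needed.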

    >>> In this case, of course, it requires you to open the fifo non-blocking, but
    >>> you won't have to spin, alternately calling read and wait. pselect() will
    >>> effectively block until the fifo is ready for reading, or a signal was
    >>> delivered. The signal could have been raised outside of pselect(); it will
    >>> just be deferred until you enter pselect().

    >
    >> Unfortunately, pselect is unable to wait for a fifo to not block on
    >> open, because for this to be possible, we need a valid file descriptor,
    >> and this will only be available, once the named pipe can be opened
    >> non-blocking.

    >
    >> Or did I miss something?

    >
    > Opening a fifo non-blocking opens it immediately (i.e. it doesn't block the
    > open call). You immediately get a valid descriptor. Select wouldn't then
    > signal that the fifo was opened for writing, but that something wrote to it
    > and there's data ready. Which is functionally equivalent, unless the writer
    > will never write anything before some other condition occurs--to be
    > triggered by the act of opening the fifo for writing.
    >


    Correct. Opening a fifo non-blocking returns a valid file descriptor.
    But this file descriptor refers to an empty fifo, if no writer has
    already opened the fifo for writing. The next call to read will return 0.

    Only after closing the file descriptor and reopening the fifo once some
    writer has also opened it do you get a file descriptor on which read
    will return >0.

    So, using O_NONBLOCK in open, I'd get a loop.
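
    The behaviour described above can be checked with a small sketch like
    the following. The function names and the temporary path are made up for
    illustration; the only assumption is that no other process opens the
    freshly created fifo for writing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Opens the fifo non-blocking and reads once.
 * Returns what read() returned, or -2 if the open itself failed. */
static long read_writerless_fifo(const char *path)
{
    char buf[16];
    ssize_t n;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY | O_NONBLOCK);  /* returns at once, writer or not */
    if (fd < 0)
        return -2;
    n = read(fd, buf, sizeof buf);           /* no writer has it open: end-of-file */
    close(fd);
    return (long)n;
}

/* Demo on a fresh fifo in a temporary directory (hypothetical path). */
static long demo_writerless_read(void)
{
    char dir[] = "/tmp/fifodemoXXXXXX";
    char path[64];
    long n;

    if (mkdtemp(dir) == NULL)
        return -3;
    snprintf(path, sizeof path, "%s/fifo", dir);
    if (mkfifo(path, 0600) != 0) {
        rmdir(dir);
        return -3;
    }
    n = read_writerless_fifo(path);
    unlink(path);
    rmdir(dir);
    return n;
}
```

    With no writer, the read reports end-of-file (0) rather than blocking or
    failing with EAGAIN, which is why polling with read alone would spin.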

  20. Re: signal race

    Thomas Maier-Komor wrote:

    > Correct. Opening a fifo non-blocking returns a valid file descriptor.
    > But this file descriptor refers to an empty fifo, if no writer has
    > already opened the fifo for writing. The next call to read will return 0.
    >
    > Only after closing the file descriptor and reopening the fifo once some
    > writer has also opened it do you get a file descriptor on which read
    > will return >0.


    Possibly (I haven't tried). But that doesn't mean select() would also
    immediately return. And in fact, it doesn't. It won't return readiness
    _until_ there's something to read, no matter whether the other end is open
    for writing or not. Indeed, I've just confirmed this behavior on OS X and
    Linux.

    > So, using O_NONBLOCK in open, I'd get a loop.


    Not if you use pselect(). Don't call read until pselect() indicates readiness.
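
    A minimal sketch of that pattern follows, assuming SIGCHLD is kept
    blocked everywhere except inside pselect(). The names
    wait_readable_or_signal, on_sigchld_noop and demo_pselect are
    hypothetical, and the demo relies on the readiness behaviour reported
    above for Linux and OS X.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

/* Exists only so that SIGCHLD interrupts pselect() with EINTR. */
static void on_sigchld_noop(int signo) { (void)signo; }

/* Precondition (assumed): SIGCHLD is blocked and has a handler installed.
 * Returns 1 if fd is readable, 0 if a signal ended the wait, -1 on error. */
static int wait_readable_or_signal(int fd)
{
    fd_set rfds;
    sigset_t during;
    int rc;

    assert(fd >= 0 && fd < FD_SETSIZE);
    sigprocmask(SIG_BLOCK, NULL, &during);   /* fetch the current mask */
    sigdelset(&during, SIGCHLD);             /* deliverable only inside pselect */

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    rc = pselect(fd + 1, &rfds, NULL, NULL, NULL, &during);
    if (rc > 0)
        return 1;
    if (rc < 0 && errno == EINTR)
        return 0;
    return -1;
}

/* Demo: fifo without a writer, plus a child that exits at once. */
static int demo_pselect(void)
{
    char dir[] = "/tmp/pselectXXXXXX";
    char path[64];
    struct sigaction sa;
    sigset_t chld, old;
    pid_t pid;
    int fd, result = -2;

    if (mkdtemp(dir) == NULL)
        return -2;
    snprintf(path, sizeof path, "%s/fifo", dir);
    if (mkfifo(path, 0600) != 0) {
        rmdir(dir);
        return -2;
    }

    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &chld, &old);     /* block before fork: no lost signal */

    sa.sa_handler = on_sigchld_noop;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGCHLD, &sa, NULL);

    fd = open(path, O_RDONLY | O_NONBLOCK);  /* returns immediately */
    pid = (fd >= 0) ? fork() : -1;
    if (pid == 0)
        _exit(0);
    if (pid > 0) {
        result = wait_readable_or_signal(fd);
        waitpid(pid, NULL, 0);
    }

    sa.sa_handler = SIG_DFL;                 /* undo the demo's signal setup */
    sigaction(SIGCHLD, &sa, NULL);
    sigprocmask(SIG_SETMASK, &old, NULL);
    if (fd >= 0)
        close(fd);
    unlink(path);
    rmdir(dir);
    return result;
}
```

    Because the signal stays blocked until pselect() swaps in the mask, a
    SIGCHLD that was raised before the call is still reported as EINTR
    rather than being missed. In the demo the fifo never becomes readable,
    so the call should return 0.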
