behavior of accept(2) in multithreaded application - Unix



  1. behavior of accept(2) in multithreaded application

    I'm trying to build a multi-threaded server.
    When I close a listening socket in one thread,
    while another thread is blocked on this socket within accept(2)
    this gives me different results depending on platform.

    On Mach and Windows NT the accept(2) nicely deblocks,
    while on Linux it stays blocked forever (no timeout...).

    To me it looks much more logical to let the OS deblock
    accept(2) when the listening socket is closed.
    Does anybody know why Linux behaves that way?
    Or is this just a bug?

  2. Re: behavior of accept(2) in multithreaded application

    On Apr 3, 7:36 pm, Frank Mertens wrote:

    > I'm trying to build a multi-threaded server.
    > When I close a listening socket in one thread,
    > while another thread is blocked on this socket within accept(2)
    > this gives me different results depending on platform.


    This is impossible to do reliably. There is no way you can know for
    sure that the other thread is blocked within 'accept'. It may have
    just returned from 'accept' or be about to call 'accept'.

    > On Mach and Windows NT the accept(2) nicely deblocks,
    > while on Linux it stays blocked forever (no timeout...).
    >
    > To me it looks much more logical to let the OS deblock
    > accept(2) when the listening socket is closed.
    > Does anybody know why Linux behaves that way?
    > Or is this just a bug?


    It's a bug, but the bug is in your code. One thread cannot safely
    release a resource while another thread is or might be using it. It is
    quite literally impossible to make this work.

    DS

  3. Re: behavior of accept(2) in multithreaded application

    David Schwartz wrote:
    > On Apr 3, 7:36 pm, Frank Mertens wrote:
    >
    >> I'm trying to build a multi-threaded server.
    >> When I close a listening socket in one thread,
    >> while another thread is blocked on this socket within accept(2)
    >> this gives me different results depending on platform.

    >
    > This is impossible to do reliably. There is no way you can know for
    > sure that the other thread is blocked within 'accept'. It may have
    > just returned from 'accept' or be about to call 'accept'.


    Why should 'accept(fd, ...)' become blocked or stay blocked if
    'close(fd)' was called?

    >
    >> On Mach and Windows NT the accept(2) nicely deblocks,
    >> while on Linux it stays blocked forever (no timeout...).
    >>
    >> To me it looks much more logical to let the OS deblock
    >> accept(2) when the listening socket is closed.
    >> Does anybody know why Linux behaves that way?
    >> Or is this just a bug?

    >
    > It's a bug, but the bug is in your code. One thread cannot safely
    > release a resource while another thread is or might be using it. It is
    > quite literally impossible to make this work.
    >
    > DS


  4. Re: behavior of accept(2) in multithreaded application

    On Apr 5, 5:24 pm, Frank Mertens wrote:

    > Why should 'accept(fd, ...)' become blocked or stay blocked if
    > 'close(fd)' was called?


    Consider:

    1) Your thread is about to call 'accept'. It has pushed the arguments
    on the stack and is about to make the system call, but before it can,
    another thread is scheduled.

    2) Your other thread runs and calls 'close(fd)'.

    3) A system thread runs, and opens a listening socket. It gets the
    same file descriptor you closed in step 2.

    4) That first thread runs again, enters 'accept' and is now waiting on
    the socket the system thread opened that will never receive any data.

    That's why.

    Until you explain how you can, within the POSIX standard, 100% ensure
    this can't happen, it *can* happen. So what you are doing cannot be
    guaranteed safe.

    DS
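
    Step 3 of the scenario above depends on descriptor numbers being
    recycled. A minimal sketch of that reuse, assuming a single-threaded
    process in which nothing else opens a descriptor in between, and a
    system that hands out the lowest free descriptor number (as Linux
    does). The function name is hypothetical.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
 * Opens a socket, closes it, then opens a second one.
 * Returns 1 if the second socket received the same descriptor number,
 * 0 if it did not, and -1 if a socket could not be created. */
int descriptor_is_reused(void)
{
    int first, second, same;

    first = socket(AF_INET, SOCK_STREAM, 0);
    if (first < 0)
        return -1;
    close(first);                 /* the number 'first' is free again */

    second = socket(AF_INET, SOCK_STREAM, 0);
    if (second < 0)
        return -1;

    /* A thread still holding the old number would now refer to the
     * new, unrelated socket without noticing. */
    same = (second == first);
    close(second);
    return same;
}
```

    In a multi-threaded process the second socket() call may come from
    any thread, including one created by a library, which is the case
    step 3 describes.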

  5. Re: behavior of accept(2) in multithreaded application

    David Schwartz wrote:

    > On Apr 5, 5:24 pm, Frank Mertens wrote:
    >
    > > Why should 'accept(fd, ...)' become blocked or stay blocked if
    > > 'close(fd)' was called?

    >
    > Consider:
    >
    > 1) Your thread is about to call 'accept'. It has pushed the arguments
    > on the stack and is about to make the system call, but before it can,
    > another thread is scheduled.
    >
    > 2) Your other thread runs and calls 'close(fd)'.
    >
    > 3) A system thread runs, and opens a listening socket. It gets the
    > same file descriptor you closed in step 2.
    >
    > 4) That first thread runs again, enters 'accept' and is now waiting on
    > the socket the system thread opened that will never receive any data.
    >
    > That's why.
    >
    > Until you explain how you can, within the POSIX standard, 100% ensure
    > this can't happen, it *can* happen. So what you are doing cannot be
    > guaranteed safe.


    Are these operations not required to be thread-safe? I tried to check
    the SUS, but www.unix.org seems to be down.

    If they're thread-safe, they presumably need to perform mutual exclusion
    to ensure that file descriptors can't change in the middle of an
    operation.

    --
    Barry Margolin, barmar@alum.mit.edu
    Arlington, MA
    *** PLEASE don't copy me on replies, I'll read them in the group ***

  6. Re: behavior of accept(2) in multithreaded application

    Barry Margolin wrote:
    > David Schwartz wrote:
    >> On Apr 5, 5:24 pm, Frank Mertens wrote:
    >>
    >>> Why should 'accept(fd, ...)' become blocked or stay blocked if
    >>> 'close(fd)' was called?

    >> Consider:
    >>
    >> 1) Your thread is about to call 'accept'. It has pushed the arguments
    >> on the stack and is about to make the system call, but before it can,
    >> another thread is scheduled.
    >>
    >> 2) Your other thread runs and calls 'close(fd)'.
    >>
    >> 3) A system thread runs, and opens a listening socket. It gets the
    >> same file descriptor you closed in step 2.
    >>
    >> 4) That first thread runs again, enters 'accept' and is now waiting on
    >> the socket the system thread opened that will never receive any data.
    >>
    >> That's why.
    >>
    >> Until you explain how you can, within the POSIX standard, 100% ensure
    >> this can't happen, it *can* happen. So what you are doing cannot be
    >> guaranteed safe.

    >
    > Are these operations not required to be thread-safe? I tried to check
    > the SUS, but www.unix.org seems to be down.
    >
    > If they're thread-safe, they presumably need to perform mutual exclusion
    > to ensure that file descriptors can't change in the middle of an
    > operation.


    It doesn't matter either way - in the situation David described the
    thread calling accept() is not "in the middle of an operation".

    If I understand correctly, there are two crucial points:

    1. You cannot guarantee that one thread is blocked in accept() when
    another calls close(). That is, you cannot avoid a possible interval
    between the call to close() and the call to accept(). (You can easily
    get about as close as David describes by holding a mutex except during
    the accept() call, and acquiring the mutex before calling close().)

    2. You cannot guarantee that during that interval, no other thread will
    run and cause the descriptor to be reused (because that thread may have
    been created "behind your back" - I think this is what David meant by "a
    system thread").

    Alex

  7. Re: behavior of accept(2) in multithreaded application

    Alex Fraser writes:
    > Barry Margolin wrote:
    >> David Schwartz wrote:


    [...]

    >>> 1) Your thread is about to call 'accept'. It has pushed the arguments
    >>> on the stack and is about to make the system call, but before it can,
    >>> another thread is scheduled.
    >>>
    >>> 2) Your other thread runs and calls 'close(fd)'.
    >>>
    >>> 3) A system thread runs, and opens a listening socket. It gets the
    >>> same file descriptor you closed in step 2.
    >>>
    >>> 4) That first thread runs again, enters 'accept' and is now waiting on
    >>> the socket the system thread opened that will never receive any data.


    [...]

    >>> Until you explain how you can, within the POSIX standard, 100% ensure
    >>> this can't happen, it *can* happen. So what you are doing cannot be
    >>> guaranteed safe.


    [...]

    > 2. You cannot guarantee that during that interval, no other thread
    > will run and cause the descriptor to be reused (because that thread
    > may have been created "behind your back" - I think this is what David
    > meant by "a system thread").


    That a particular behaviour would not contradict some standard text
    does not constitute a proof that this particular behaviour "can
    happen". It just means that an implementation where it could happen
    would not be non-conformant because of this. Coming from the
    other direction, 'not conforming to some standard' is different from
    'not existing': Assuming the standard would prohibit a particular
    behaviour, this would not constitute a proof that no implementation of
    'most parts of this standard' exists (e.g. Linux) which exhibits this
    particular behaviour nevertheless. Practically, this boils down to the
    questions 'which implementation(s), if any, behave in this particular
    way' and 'is this relevant for the software in question'? A lot of
    things could exist. The point of 'science' is to determine if they
    do.

  8. Re: behavior of accept(2) in multithreaded application

    Rainer Weikusat wrote:
    > Alex Fraser writes:
    >> Barry Margolin wrote:
    >>> David Schwartz wrote:

    >
    > [...]
    >
    >>>> 1) Your thread is about to call 'accept'. It has pushed the arguments
    >>>> on the stack and is about to make the system call, but before it can,
    >>>> another thread is scheduled.
    >>>>
    >>>> 2) Your other thread runs and calls 'close(fd)'.
    >>>>
    >>>> 3) A system thread runs, and opens a listening socket. It gets the
    >>>> same file descriptor you closed in step 2.
    >>>>
    >>>> 4) That first thread runs again, enters 'accept' and is now waiting on
    >>>> the socket the system thread opened that will never receive any data.

    >
    > [...]
    >
    >>>> Until you explain how you can, within the POSIX standard, 100% ensure
    >>>> this can't happen, it *can* happen. So what you are doing cannot be
    >>>> guaranteed safe.

    >
    > [...]
    >
    >> 2. You cannot guarantee that during that interval, no other thread
    >> will run and cause the descriptor to be reused (because that thread
    >> may have been created "behind your back" - I think this is what David
    >> meant by "a system thread").

    >
    > That a particular behaviour would not contradict some standard text
    > does not constitute a proof that this particular behaviour "can
    > happen". It just means that an implementation where it could happen
    > would not be non-conformant because of this. Coming from the
    > other direction, 'not conforming to some standard' is different from
    > 'not existing': Assuming the standard would prohibit a particular
    > behaviour, this would not constitute a proof that no implementation of
    > 'most parts of this standard' exists (e.g. Linux) which exhibits this
    > particular behaviour nevertheless. Practically, this boils down to the
    > questions 'which implementation(s), if any, behave in this particular
    > way' and 'is this relevant for the software in question'? A lot of
    > things could exist. The point of 'science' is to determine if they
    > do.


    Thanks for all the replies. I couldn't grasp what David was explaining
    at first, but the replies made me think a bit deeper about the problem.
    I wrote two test programs, one showing the behavior when using pthreads
    and one using fork(2).

    The results are mixed. The multi-threaded test case can't reproduce
    my initial problem! (Nicely hinting at a hidden race condition ;-)
    On the other hand, the test program using fork(2) always blocks,
    both on Mach and Linux.
    Of course it must do so, because closing a file descriptor in one
    process does not mean it is closed in another process.
    (netstat shows the socket still open and in LISTEN state.)
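
    The fork(2) result can also be checked without blocking in accept(2).
    A minimal sketch, assuming fcntl(fd, F_GETFD) is used only as a probe
    for "is this descriptor still open"; the function name is hypothetical.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
 * Returns 1 if the parent's descriptor is still open after a forked
 * child has closed its own copy, 0 if it is not, -1 on setup failure. */
int parent_fd_survives_child_close(void)
{
    int fd, status, still_open;
    pid_t pid;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: this closes only the child's copy of the descriptor. */
        close(fd);
        _exit(0);
    }

    /* Parent: wait until the child has finished closing its copy. */
    if (waitpid(pid, &status, 0) < 0) {
        close(fd);
        return -1;
    }

    /* F_GETFD fails with EBADF only if fd is not open in this process. */
    still_open = (fcntl(fd, F_GETFD) != -1);
    close(fd);
    return still_open;
}
```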

    If I were writing the OS, I would delegate close(2) to a cleanup
    thread/process and return immediately to the caller, which would open
    a wealth of race conditions for anyone assuming close() cleans up
    synchronously. (I have seen Linux delegate I/O cleanup on files to one
    or more bdflush processes.)

    Ensuring that accept(2) becomes deblocked on server shutdown was my
    initial problem. I know signals can do this, but they are a portability
    nightmare. Luckily even Windows can do select(2) with a timeout,
    and I will probably end up using it to wait for connections...
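
    The select(2) approach could look roughly like the sketch below. It is
    not taken from the test programs: the helper names, the
    ACCEPT_TIMED_OUT code and the stop_requested flag are all made up for
    illustration, and it assumes a C11 compiler for <stdatomic.h>. The
    thread that owns the listening socket polls the flag between waits and
    closes the socket itself, so no other thread ever calls close() on it.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

#define ACCEPT_TIMED_OUT (-2)   /* hypothetical code, not part of any API */

/* Set to true by whichever thread wants the server to shut down. */
static atomic_bool stop_requested = false;

/* Hypothetical helper: listening TCP socket on the loopback address.
 * Port 0 lets the OS pick a free port. Returns -1 on failure. */
int make_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int lsd = socket(AF_INET, SOCK_STREAM, 0);

    if (lsd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(lsd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lsd, 16) < 0) {
        close(lsd);
        return -1;
    }
    return lsd;
}

/* Waits up to timeout_ms for a pending connection on lsd.
 * Returns the accepted descriptor, ACCEPT_TIMED_OUT, or -1 on error. */
int accept_with_timeout(int lsd, int timeout_ms)
{
    fd_set readable;
    struct timeval tv;
    int n;

    assert(lsd >= 0 && lsd < FD_SETSIZE);   /* select() limit */
    FD_ZERO(&readable);
    FD_SET(lsd, &readable);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(lsd + 1, &readable, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    if (n == 0)
        return ACCEPT_TIMED_OUT;
    /* A connection can disappear between select() and accept(); a real
     * server would additionally put lsd into non-blocking mode. */
    return accept(lsd, NULL, NULL);
}

/* Accept loop for the single thread that owns lsd. Checks the flag
 * every poll_ms milliseconds, then closes lsd itself.
 * Returns the number of connections accepted. */
int serve_until_stopped(int lsd, int poll_ms)
{
    int handled = 0;

    while (!atomic_load(&stop_requested)) {
        int asd = accept_with_timeout(lsd, poll_ms);
        if (asd >= 0) {
            close(asd);          /* placeholder for per-connection work */
            handled++;
        } else if (asd != ACCEPT_TIMED_OUT) {
            break;               /* unexpected error: stop serving */
        }
    }
    close(lsd);
    return handled;
}
```

    A shutdown request is then just atomic_store(&stop_requested, true)
    from any thread; the accept loop notices it within poll_ms
    milliseconds. The price is a short shutdown delay and one wakeup per
    interval while the server is idle.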

    ----------------- test_accept_fork.c

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <netinet/in.h>

    int main(void)
    {
        int lsd, ret, status;
        struct sockaddr_in addr;
        socklen_t len = sizeof(addr);  /* accept() reads this as the buffer size */
        pid_t pid;

        lsd = socket(AF_INET, SOCK_STREAM, 0);
        printf("socket, lsd = %d\n", lsd);

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(8001);
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        ret = bind(lsd, (struct sockaddr*)&addr, sizeof(addr));
        printf("bind, ret = %d\n", ret);

        ret = listen(lsd, 16);
        printf("listen, ret = %d\n", ret);

        /* flush now so buffered output is not printed twice after fork() */
        fflush(stdout);

        pid = fork();
        if (pid == 0) {
            /* child: closes only its own copy of the descriptor */
            ret = close(lsd);
            printf("close, ret = %d\n", ret);
            return 0;
        }

        ret = waitpid(pid, &status, 0);
        printf("waitpid, ret = %d, status = %d\n", ret, WEXITSTATUS(status));

        /* parent: lsd is still open here, so this blocks until a client connects */
        ret = accept(lsd, (struct sockaddr*)&addr, &len);
        printf("accept, ret = %d\n", ret);

        return 0;
    }

    ----------------- test_accept_pthread.c

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <pthread.h>

    /* thread body: closes the listening socket shared with main() */
    void* closeInThread(void* p)
    {
        int lsd = *(int*)p;
        int ret = close(lsd);
        printf("close(%d), ret = %d\n", lsd, ret);
        return 0;
    }

    int main(void)
    {
        int lsd, ret;
        struct sockaddr_in addr;
        socklen_t len = sizeof(addr);  /* accept() reads this as the buffer size */
        pthread_t thread;

        lsd = socket(AF_INET, SOCK_STREAM, 0);
        printf("socket, lsd = %d\n", lsd);

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(8001);
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        ret = bind(lsd, (struct sockaddr*)&addr, sizeof(addr));
        printf("bind, ret = %d\n", ret);

        ret = listen(lsd, 16);
        printf("listen, ret = %d\n", ret);

        pthread_create(&thread, 0, &closeInThread, &lsd);
        pthread_join(thread, 0);

        /* lsd was closed before this call, so accept() is expected to
           fail with EBADF instead of blocking */
        ret = accept(lsd, (struct sockaddr*)&addr, &len);
        printf("accept, ret = %d\n", ret);
        if (ret < 0)
            perror("accept");

        return 0;
    }
