epoll design problems with common fork/exec patterns - Kernel


  1. epoll design problems with common fork/exec patterns

    Hi!

    I ran into what I see as unsolvable problems that make epoll useless as a
    generic event mechanism.

    I recently switched to libevent as my event loop, and found that my programs
    work fine when it is using select or poll, but work erratically or halt
    when using epoll.

    The reason, as I found out, is the peculiar behaviour of epoll over fork.
    It doesn't work as documented, and even if it did, it would make the use of
    third-party libraries using fork usually impossible.

    Here are two scenarios where it screws up:

    - some library forks, explicitly closes all fds it doesn't need, and execs
    another program (which is common behaviour).

    In this case, the parent process works fine until the child closes fds,
    after which the fds become unarmed in the parent too. This works as
    documented, but since libraries expect this to work without affecting the
    parent, this puts a new and incompatible strain on what libraries can do,
    which in turn makes epoll unsuitable in cases where you don't control all
    your code.

    - I have a library that emulates asynchronous I/O with a thread pool, and
    uses a pipe for event notification. That library registers a fork handler
    that closes the pipe in the child and recreates it, so the child could
    continue doing AIO (as could the parent).

    This, too, screws up notifications for the parent.

    Now, the epoll manpage says that closing a fd will remove it from all
    fd sets. This would explain the behaviour above. Unfortunately (or
    fortunately?) this is not what happens: when the fds are being closed by
    exec or exit, the fds do not get removed from the epoll set.

    This behaviour strikes me as extremely illogical. On the one hand, one
    cannot share the epoll fd between processes normally, but on fork,
    you can, even though it makes no sense (the child has a different fd
    "namespace" than the parent) and actually works on (by then) unrelated fds in
    the other process.

    It also strikes me as weird that the order of closing fds should make so much
    of a difference: if the epoll fd is closed first in the child, the other
    fds will survive in the parent; if it's closed last, they don't. That makes no
    sense to me.

    Now, the problem I see is not that it makes no sense to me - that's clearly
    my problem. The problem I see is that there is no way to avoid the
    associated problems except by patching all code that would ever use fork,
    even if it never has heard anything about epoll yet. This is extremely
    nonlocal action at a distance, as this affects a lot of code not even the
    author might be aware of (fork is rather common).

    To illustrate, here are some workarounds I thought about:

    - rearming all fds after fork: doesn't work, as the fds get removed
    asynchronously so I would have to wait for the child to do it.
    - closing the epoll fd after fork: doesn't work unless I control
    the fork. I can install a handler to be called using pthreads, but
    that won't help as other handlers might be called first (as in the case of
    the aio library above), screwing me.
    - closing and recreating the epoll fd before the fork: isn't supported even
    remotely by libevent or similar event loops, and would not help either
    as I cannot control the calls to fork.

    Is epoll really designed to be so incompatible with the most common fork
    patterns? Shouldn't epoll do refcounting, as is commonly done under
    Unix? As the fd space is not shared between processes, why does epoll
    try? Shouldn't the epoll information be copied just like the fd table
    itself, memory, and other resources?

    As it looks now, epoll looks useless except in the most controlled
    environments, as it doesn't duplicate state on fork as is done with the
    other fd-related resources (as opposed to the underlying files, which are
    properly shared).

    --
    The choice of a
    -----==- _GNU_ Deliantra, the free in data+content MORPG
    ----==-- _ generation
    ---==---(_)__ __ ____ __ http://www.deliantra.net/
    --==---/ / _ \/ // /\ \/ /
    -=====/_/_//_/\_,_/ /_/\_\
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: epoll design problems with common fork/exec patterns

    Marc Lehmann wrote:
    > Hi!
    >
    > I ran into what I see as unsolvable problems that make epoll useless as a
    > generic event mechanism.
    >
    > I recently switched to libevent as my event loop, and found that my programs
    > work fine when it is using select or poll, but work erratically or halt
    > when using epoll.
    >
    > The reason, as I found out, is the peculiar behaviour of epoll over fork.
    > It doesn't work as documented, and even if it did, it would make the use of
    > third-party libraries using fork usually impossible.
    >
    > Here are two scenarios where it screws up:
    >
    > - some library forks, explicitly closes all fds it doesn't need, and execs
    > another program (which is common behaviour).
    >
    > In this case, the parent process works fine until the child closes fds,
    > after which the fds become unarmed in the parent too. This works as


    I have no idea what exact problem you have. But if the child closes some file
    descriptors that were 'cloned' at fork() time, this only decrements a refcount,
    and definitely should not close it for the 'parent'. epoll in this regard uses
    a generic kernel service (file descriptor sharing between tasks).

    I have some apps that are happily using epoll() and fork()/exec() and have no
    problem at all. I usually use O_CLOEXEC so that all close() are done at exec()
    time without having to do it in a loop. epoll continues to work as expected in
    the parent process.

    > documented, but since libraries expect this to work without affecting the
    > parent, this puts a new and incompatible strain on what libraries can do,
    > which in turn makes epoll unsuitable in cases where you don't control all
    > your code.
    >
    > - I have a library that emulates asynchronous I/O with a thread pool, and
    > uses a pipe for event notification. That library registers a fork handler
    > that closes the pipe in the child and recreates it, so the child could
    > continue doing AIO (as could the parent).
    >
    > This, too, screws up notifications for the parent.
    >
    > Now, the epoll manpage says that closing a fd will remove it from all
    > fd sets. This would explain the behaviour above. Unfortunately (or
    > fortunately?) this is not what happens: when the fds are being closed by
    > exec or exit, the fds do not get removed from the epoll set.


    At exec() time (provided close-on-exec is set) or exit() time, only the refcount of
    each file is decremented. Only when its refcount drops to zero is a file
    removed from the epoll set.

    >
    > This behaviour strikes me as extremely illogical. On the one hand, one
    > cannot share the epoll fd between processes normally, but on fork,
    > you can, even though it makes no sense (the child has a different fd
    > "namespace" than the parent) and actually works on (by then) unrelated fds in
    > the other process.
    >
    > It also strikes me as weird that the order of closing fds should make so much
    > of a difference: if the epoll fd is closed first in the child, the other
    > fds will survive in the parent; if it's closed last, they don't. That makes no
    > sense to me.
    >
    > Now, the problem I see is not that it makes no sense to me - that's clearly
    > my problem. The problem I see is that there is no way to avoid the
    > associated problems except by patching all code that would ever use fork,
    > even if it never has heard anything about epoll yet. This is extremely
    > nonlocal action at a distance, as this affects a lot of code not even the
    > author might be aware of (fork is rather common).
    >
    > To illustrate, here are some workarounds I thought about:
    >
    > - rearming all fds after fork: doesn't work, as the fds get removed
    > asynchronously so I would have to wait for the child to do it.
    > - closing the epoll fd after fork: doesn't work unless I control
    > the fork. I can install a handler to be called using pthreads, but
    > that won't help as other handlers might be called first (as in the case of
    > the aio library above), screwing me.
    > - closing and recreating the epoll fd before the fork: isn't supported even
    > remotely by libevent or similar event loops, and would not help either
    > as I cannot control the calls to fork.
    >
    > Is epoll really designed to be so incompatible with the most common fork
    > patterns? Shouldn't epoll do refcounting, as is commonly done under
    > Unix? As the fd space is not shared between processes, why does epoll
    > try? Shouldn't the epoll information be copied just like the fd table
    > itself, memory, and other resources?


    Too many questions here, showing lack of understanding.

    >
    > As it looks now, epoll looks useless except in the most controlled
    > environments, as it doesn't duplicate state on fork as is done with the
    > other fd-related resources (as opposed to the underlying files, which are
    > properly shared).
    >


    epoll definitely is not useless. It is used in major and critical apps.
    You certainly missed something.
    Please provide some code to illustrate one exact problem you have.


  3. Re: epoll design problems with common fork/exec patterns

    On Sat, Oct 27, 2007 at 10:23:17AM +0200, Eric Dumazet wrote:
    > > In this case, the parent process works fine until the child closes fds,
    > > after which the fds become unarmed in the parent too. This works as

    >
    > I have no idea what exact problem you have.


    Well, I explained it rather succinctly, I think. If you tell me what's unclear
    I can explain...

    > But if the child closes some
    > file descriptors that were 'cloned' at fork() time, this only decrements a
    > refcount, and definitely should not close it for the 'parent'.


    It doesn't. It removes it from the epoll set, though, so the parent will not
    receive events for that fd anymore.

    > I have some apps that are happily using epoll() and fork()/exec() and have


    The problem I described is fork/close/exec. close being the explicit
    syscall.

    > no problem at all. I usually use O_CLOEXEC so that all close() are done at
    > exec() time without having to do it in a loop. epoll continues to work as
    > expected in the parent process.


    This is because epoll doesn't behave as documented: it removes the fd
    from the parent's epoll set only on an explicit close() syscall, not on an
    implicit close from exec.

    > >fd sets. This would explain the behaviour above. Unfortunately (or
    > >fortunately?) this is not what happens: when the fds are being closed by
    > >exec or exit, the fds do not get removed from the epoll set.

    >
    > At exec() time (provided close-on-exec is set) or exit() time, only the refcount
    > of each file is decremented. Only when its refcount drops to zero is a file
    > removed from the epoll set.


    Yes. But that's obviously not the only way to close fds.

    > >Is epoll really designed to be so incompatible with the most common fork
    > >patterns? Shouldn't epoll do refcounting, as is commonly done under
    > >Unix? As the fd space is not shared between processes, why does epoll
    > >try? Shouldn't the epoll information be copied just like the fd table
    > >itself, memory, and other resources?

    >
    > Too many questions here, showing lack of understanding.


    You already said you don't understand the problem. No need to get insulting.

    > epoll definitely is not useless. It is used in major and critical apps.
    > You certainly missed something.


    Well, it behaves as documented, which is the problem. You admit you
    don't understand the problem or the documentation, so again, no need to
    insult me.

    > Please provide some code to illustrate one exact problem you have.


    // assume there is an open epoll set that listens for events on fd 5
    if (fork () == 0)
    {
        close (5);
        // fd 5 is now removed from the epoll set of the parent.
        _exit (0);
    }

    --
    The choice of a
    -----==- _GNU_
    ----==-- _ generation Marc Lehmann
    ---==---(_)__ __ ____ __ pcg@goof.com
    --==---/ / _ \/ // /\ \/ / http://schmorp.de/
    -=====/_/_//_/\_,_/ /_/\_\ XX11-RIPE

  4. Re: epoll design problems with common fork/exec patterns

    Marc Lehmann wrote:
    > On Sat, Oct 27, 2007 at 10:23:17AM +0200, Eric Dumazet wrote:
    >>> In this case, the parent process works fine until the child closes fds,
    >>> after which the fds become unarmed in the parent too. This works as

    >> I have no idea what exact problem you have.

    >
    > Well, I explained it rather succinctly, I think. If you tell me what's unclear
    > I can explain...
    >
    >> But if the child closes some
    >> file descriptors that were 'cloned' at fork() time, this only decrements a
    >> refcount, and definitely should not close it for the 'parent'.

    >
    > It doesn't. It removes it from the epoll set, though, so the parent will not
    > receive events for that fd anymore.
    >
    >> I have some apps that are happily using epoll() and fork()/exec() and have

    >
    > The problem I described is fork/close/exec. close being the explicit
    > syscall.
    >
    >> no problem at all. I usually use O_CLOEXEC so that all close() are done at
    >> exec() time without having to do it in a loop. epoll continues to work as
    >> expected in the parent process.

    >
    > This is because epoll doesn't behave as documented: it removes the fd
    > from the parent's epoll set only on an explicit close() syscall, not on an
    > implicit close from exec.
    >
    >>> fd sets. This would explain the behaviour above. Unfortunately (or
    >>> fortunately?) this is not what happens: when the fds are being closed by
    >>> exec or exit, the fds do not get removed from the epoll set.

    >> At exec() time (provided close-on-exec is set) or exit() time, only the refcount
    >> of each file is decremented. Only when its refcount drops to zero is a file
    >> removed from the epoll set.

    >
    > Yes. But that's obviously not the only way to close fds.
    >
    >>> Is epoll really designed to be so incompatible with the most common fork
    >>> patterns? Shouldn't epoll do refcounting, as is commonly done under
    >>> Unix? As the fd space is not shared between processes, why does epoll
    >>> try? Shouldn't the epoll information be copied just like the fd table
    >>> itself, memory, and other resources?

    >> Too many questions here, showing lack of understanding.

    >
    > You already said you don't understand the problem. No need to get insulting.
    >
    >> epoll definitely is not useless. It is used in major and critical apps.
    >> You certainly missed something.

    >
    > Well, it behaves as documented, which is the problem. You admit you
    > don't understand the problem or the documentation, so again, no need to
    > insult me.


    Hum... I will update my English vocabulary and mark "missed" as an insult.

    I have no problem with epoll nor its documentation.

    >
    >> Please provide some code to illustrate one exact problem you have.

    >
    > // assume there is an open epoll set that listens for events on fd 5
    > if (fork () == 0)
    > {
    > close (5);
    > // fd 5 is now removed from the epoll set of the parent.
    > _exit (0);
    > }
    >


    It doesn't on any kernel I have played with. And I have played with a *lot* of
    kernels, you know.

    If such a bug exists on your kernel, please file a complete bug report, giving
    details.

    Thank you


  5. Re: epoll design problems with common fork/exec patterns

    On Sat, Oct 27, 2007 at 11:22:25AM +0200, Eric Dumazet wrote:
    > >Well, it behaves as documented, which is the problem. You admit you
    > >don't understand the problem or the documentation, so again, no need to
    > >insult me.

    >
    > Hum... I will update my English vocabulary and mark "missed" as an insult.


    Well, ignoring my arguments by claiming I lack understanding is an insult,
    as you didn't take my arguments at face value but dismissed them by
    attacking my person.

    > I have no problem with epoll nor its documentation.


    That's fine for you. But I do, at least with epoll, as the documented
    and observed behaviour makes epoll unusable as a general event loop
    replacement.

    > It doesn't on any kernel I have played with. And I have played with a *lot* of
    > kernels, you know.


    No, I don't know that. And so far you only said you used fork+exec, not
    close in between, so maybe the playing you did was not related to this
    problem?

    I also played with a lot of kernels, but for epoll specifically, I played
    with 2.6.21-2-amd64 and 2.6.22-1-amd64, both from debian unstable with no
    customisations.

    > If such a bug exists on your kernel, please file a complete bug report,
    > giving details.


    As this behaviour is clearly documented in the epoll manpage, why do you
    think it is a bug? I think it's fairly bad, but at least it's documented as
    the behaviour it should be:

    Q6 Will the close of an fd cause it to be removed from all epoll sets automatically?
    A6 Yes.

    As such, filing a bug report for behaviour which isn't in fact a bug would
    be counterproductive. My goal in my mail was to find out if there are
    workarounds for this peculiar behaviour (or to inspire discussion on this
    behaviour).

    Of course, one can create big programs using epoll to their advantage. I
    never claimed otherwise. But as a general event loop replacement (i.e.
    outside of controlled environments), epoll does not currently qualify,
    as I would have to control an awful lot of code (think of a Perl module
    interfacing to epoll: you would now have to control all third-party
    modules that might interfere with fork+close+exec. This is very common in
    scripting languages).


  6. Re: epoll design problems with common fork/exec patterns

    Marc Lehmann wrote:
    > On Sat, Oct 27, 2007 at 11:22:25AM +0200, Eric Dumazet wrote:
    >
    >> If such a bug exists on your kernel, please file a complete bug report,
    >> giving details.

    >
    > As this behaviour is clearly documented in the epoll manpage, why do you
    > think it is a bug? I think it's fairly bad, but at least it's documented as
    > the behaviour it should be:
    >
    > Q6 Will the close of an fd cause it to be removed from all epoll sets automatically?
    > A6 Yes.


    Answer: epoll documentation cannot explain the full semantics of file
    descriptors, or the difference between the user side (file descriptors) and
    the kernel side (files and fds).
    Or maybe it should, since you had problems. But then, if the epoll documentation
    had to document the full Unix/Linux file semantics, nobody would read it.

    The 'close' of a file is not close(fd),
    but the last close(), the one that drops the underlying file's refcount to 0.

    example 1)

    fd = open("somefile", ...);
    fd1 = dup(fd);
    epoll_add_in_my_set(fd1); /* set up epoll work on fd1 */
    {do_something;}
    close(fd1); /* this is not the last close and will NOT close 'somefile' */
    /* It won't be removed from epoll sets NOW */

    close(fd); /* oh yes, this one is the real 'file close': now we perform epoll cleanups */

    epoll has to deal with files, but the documentation is user-side documentation,
    so it has to talk about 'file descriptors'. So everything that plays with the
    file descriptor table (fork()/dup()/close()/exit()/exec()...) can make the
    thing complex to understand and document.

    example 2)

    int pfd[2];
    pid_t pid;
    pipe(pfd);
    epoll_add_in_my_set(pfd[0]); /* set up epoll work on pfd[0], for example */
    pid = fork();
    if (pid == 0) {
        close(pfd[0]); /* this is not the last close and will NOT close the pipe */
        /* epoll has NO WAY to perform any cleanup at this stage */

        close(pfd[1]); /* this is not the last close and will NOT close the pipe */
        _exit(0);
    }
    close(pfd[1]);
    wait(NULL);
    {do_something_epoll_related;}
    close(pfd[0]); /* finally we close the pipe, and epoll can do its cleanup */

    fork() acts as a sort of dup(), as it increments all file refcounts.

    Your problems are with close()/dup()/fork()/... file descriptor semantics,
    which are handled by a layer independent of epoll.



  7. Re: epoll design problems with common fork/exec patterns

    On Sat, Oct 27, 2007 at 12:23:52PM +0200, Eric Dumazet wrote:
    > > Q6 Will the close of an fd cause it to be removed from all epoll
    > > sets automatically?
    > > A6 Yes.

    >
    > Answer: epoll documentation cannot explain the full semantics of file


    The epoll documentation easily can; there is nothing keeping it from doing so.
    Don't make silly arguments like that.

    > Or should, since you had problems


    You are again implying I lack understanding. That is, however, not true.
    I don't see the point in being insulted by you, so I won't continue
    talking to you.

    > The 'close' of a file is not close(fd)


    Good that you understand that.

    That is one of my problems, as the manpage talks about closing of the fd,
    but there are multiple ways to do that, and some are not handled the same
    way.

    > epoll has to deal with files, but documentation is a User side
    > documentation, so has to use 'file descriptors'.


    There is obviously no need for documentation to do that, contrary to your
    claim. The manpages for e.g. dup, or the official SUS manpages, manage to
    document it (mostly) correctly, so your claim that documentation must use
    file descriptors when the underlying file structure is meant is disproven.

    > fork() acts as a sort of dup(), as it increments all file refcounts.
    >
    > Your problems are with close()/dup()/fork()/... file descriptor semantics,
    > which are handled by a layer independent of epoll.


    No, I have no problem with dup at all.

    I have a problem with the fact that explicitly closing file descriptors in the
    child stops events for those files from being reported in the parent.

    I am sorry, but I have explained this very clearly a number of times, and for
    some reason, apart from accusing me of not understanding files and file
    descriptors or (clear enough) documentation, you ignore that and instead
    hammer on other problems.

    To me, it seems you are not the one who understands.


  8. Re: epoll design problems with common fork/exec patterns

    On Sat, 27 Oct 2007, Marc Lehmann wrote:

    > > Please provide some code to illustrate one exact problem you have.

    >
    > // assume there is an open epoll set that listens for events on fd 5
    > if (fork () == 0)
    > {
    > close (5);
    > // fd 5 is now removed from the epoll set of the parent.
    > _exit (0);
    > }


    Hmmm ... what? I assume you know that:

    1) A file descriptor is a userspace view/handle of a kernel object

    2) The kernel object has a use-count for as many file descriptors that
    have been handed out to userspace

    3) A close() decreases the internal counter by one

    4) The kernel object gets effectively closed when the internal counter
    goes to zero

    5) A fork() acts as a dup() on the file descriptors, hence bumping up
    its internal counter

    6) Epoll removes the file from the set, when the *kernel* object gets
    closed (internal use-count goes to zero)

    With that in mind, how can the code snippet above trigger a removal from
    the epoll set?



    - Davide



  9. Re: epoll design problems with common fork/exec patterns

    On Sat, Oct 27, 2007 at 09:59:07AM -0700, Davide Libenzi wrote:
    > On Sat, 27 Oct 2007, Marc Lehmann wrote:
    >
    > > > Please provide some code to illustrate one exact problem you have.

    > >
    > > // assume there is an open epoll set that listens for events on fd 5
    > > if (fork () == 0)
    > > {
    > > close (5);
    > > // fd 5 is now removed from the epoll set of the parent.
    > > _exit (0);
    > > }

    >
    > Hmmm ... what? I assume you know that:
    >
    > 1) A file descriptor is a userspace view/handle of a kernel object
    >
    > 2) The kernel object has a use-count for as many file descriptors that
    > have been handed out to userspace
    >
    > 3) A close() decreases the internal counter by one
    >
    > 4) The kernel object gets effectively closed when the internal counter
    > goes to zero
    >
    > 5) A fork() acts as a dup() on the file descriptors, hence bumping up
    > its internal counter
    >
    > 6) Epoll removes the file from the set, when the *kernel* object gets
    > closed (internal use-count goes to zero)
    >
    > With that in mind, how can the code snippet above trigger a removal from
    > the epoll set?


    Davide,

    from what I understand, Marc is not asking for the code above to remove
    the fd from the epoll set, but he's in fact complaining that he *observed*
    that the fd was removed from the epoll set in the *parent* process when
    the child closes it, which is of course not expected at all. As strange
    as it looks, this might need investigation. It is possible that there
    is some strange bug somewhere in some kernel versions.

    Marc, I think that if you indicate the last kernel version on which you
    observed this and provide a very short and easy reproducer, it would
    help everyone investigating this. Basically something which reports "OK"
    or "KO".

    Regards,
    Willy


  10. Re: epoll design problems with common fork/exec patterns

    On Sat, 27 Oct 2007, Willy Tarreau wrote:

    > On Sat, Oct 27, 2007 at 09:59:07AM -0700, Davide Libenzi wrote:
    > > On Sat, 27 Oct 2007, Marc Lehmann wrote:
    > >
    > > > > Please provide some code to illustrate one exact problem you have.
    > > >
    > > > // assume there is an open epoll set that listens for events on fd 5
    > > > if (fork () == 0)
    > > > {
    > > > close (5);
    > > > // fd 5 is now removed from the epoll set of the parent.
    > > > _exit (0);
    > > > }

    > >
    > > Hmmm ... what? I assume you know that:
    > >
    > > 1) A file descriptor is a userspace view/handle of a kernel object
    > >
    > > 2) The kernel object has a use-count for as many file descriptors that
    > > have been handed out to userspace
    > >
    > > 3) A close() decreases the internal counter by one
    > >
    > > 4) The kernel object gets effectively closed when the internal counter
    > > goes to zero
    > >
    > > 5) A fork() acts as a dup() on the file descriptors, hence bumping up
    > > its internal counter
    > >
    > > 6) Epoll removes the file from the set, when the *kernel* object gets
    > > closed (internal use-count goes to zero)
    > >
    > > With that in mind, how can the code snippet above trigger a removal from
    > > the epoll set?

    >
    > Davide,
    >
    > from what I understand, Marc is not asking for the code above to remove
    > the fd from the epoll set, but he's in fact complaining that he *observed*
    > that the fd was removed from the epoll set in the *parent* process when
    > the child closes it, which is of course not expected at all. As strange
    > as it looks, this might need investigation. It is possible that there
    > is some strange bug somewhere in some kernel versions.


    That would be *really* strange, since epoll hooks in __fput() in order to
    perform proper cleanup. This means that, in the case above, the file will
    be really closed in the parent too. That, I think, would trigger way more
    serious problems in userspace.



    > Marc, I think that if you indicate the last kernel version on which you
    > observed this and provide a very short and easy reproducer, it would
    > help everyone investigating this. Basically something which reports "OK"
    > or "KO".


    Of course. That'd be great.



    - Davide



  11. RE: epoll design problems with common fork/exec patterns


    > 6) Epoll removes the file from the set, when the *kernel* object gets
    > closed (internal use-count goes to zero)
    >
    > With that in mind, how can the code snippet above trigger a removal from
    > the epoll set?


    I don't see how that can be. Suppose I add fd 8 to an epoll set. Suppose fd
    5 is a dup of fd 8. Now, I close fd 8. How can fd 8 remain in my epoll set,
    since there no longer is an fd 8? Events on files registered for epoll
    notification are reported by descriptor, so the set membership has to be
    associated (as reflected into userspace) with the descriptor, not the file.

    For example, consider:

    1) Process creates an epoll set, the set gets fd 4.

    2) Process creates a socket, it gets fd 5.

    3) The process adds fd 5 to set 4.

    4) The process forks.

    5) The child inherits the epoll set but not the socket.

    Here the kernel cannot quite do the right thing. Ideally, the parent would
    still have fd 5 in its version of the epoll set. After all, it has not
    closed fd 5. However, the child *cannot* see fd 5 in its version of the
    epoll set since it has no fd 5. An event reported for fd 5 would be
    nonsense.

    So it seems the kernel either has to break one of these "would/cannot"
    requirements, or it has to split the epoll set in two. However, splitting
    the set into two sets is clearly wrong since the processes should share it.

    Q6 Will the close of an fd cause it to be removed from all
    epoll sets automatically?

    A6 Yes.

    Note that this talks of the close of an "fd", not a file. The 'close'
    function in fact closes an fd, as that fd is then reusable. So it sounds
    like the problem above is solved by removing the fd from the set, but in
    practice this doesn't happen. I have programs that call 'close' between
    'fork' and 'exec' and do not see the socket removed from the poll set.

    DS



  12. Re: epoll design problems with common fork/exec patterns

    David Schwartz wrote:
    >> 6) Epoll removes the file from the set, when the *kernel* object gets
    >> closed (internal use-count goes to zero)
    >>
    >> With that in mind, how can the code snippet above trigger a removal from
    >> the epoll set?

    >
    > I don't see how that can be. Suppose I add fd 8 to an epoll set. Suppose fd
    > 5 is a dup of fd 8. Now, I close fd 8. How can fd 8 remain in my epoll set,
    > since there no longer is an fd 8? Events on files registered for epoll
    > notification are reported by descriptor, so the set membership has to be
    > associated (as reflected into userspace) with the descriptor, not the file.


    Events are not necessarily reported "by descriptors". epoll uses an opaque
    field provided by the user.

    It's up to the user to properly choose a tag that will make sense if the
    user app is playing dup()/close() games, for example.

    typedef union epoll_data
    {
        void *ptr;
        int fd;
        uint32_t u32;
        uint64_t u64;
    } epoll_data_t;


    It's true that some applications use the 'fd' field of epoll_data_t, but in
    that case they should not play dup()/close() games that could change the
    meaning of their 'epoll tags'. They would do better to use 'ptr' or 'u64',
    for example, to map the event to an application object. In this object they
    can find the correct handle (fd) to use with the kernel for a given 'file'.
    This handle could then be remapped to another handle using
    dup()/fcntl()/close()...


    >
    > For example, consider:
    >
    > 1) Process creates an epoll set, the set gets fd 4.
    >
    > 2) Process creates a socket, it gets fd 5.
    >
    > 3) The process adds fd 5 to set 4.
    >
    > 4) The process forks.
    >
    > 5) The child inherits the epoll set but not the socket.
    >
    > Here the kernel cannot quite do the right thing. Ideally, the parent would
    > still have fd 5 in its version of the epoll set. After all, it has not
    > closed fd 5. However, the child *cannot* see fd 5 in its version of the
    > epoll set since it has no fd 5. An event reported for fd 5 would be
    > nonsense.


    Yes, it would be nonsense for the child to keep trying to get events from
    the epoll set when it cannot possibly use the socket. If you use the 'ptr'
    field to retrieve an object, this object would probably have no meaning in
    the child anyway, especially after an exec() syscall.

    That kind of user error can also happen with select()/poll(), if you do,
    for example:

    FD_ZERO(&fdset);
    FD_SET(fd, &fdset);
    select(fd+1, &fdset, NULL, NULL, NULL);
    newfd = dup(fd);
    close(fd);
    for (i = 0 ; i < maxfd ; i++)
        if (FD_ISSET(i, &fdset))
            read(i, ...)



  13. RE: epoll design problems with common fork/exec patterns

    On Sat, 27 Oct 2007, David Schwartz wrote:

    > I don't see how that can be. Suppose I add fd 8 to an epoll set. Suppose fd
    > 5 is a dup of fd 8. Now, I close fd 8. How can fd 8 remain in my epoll set,
    > since there no longer is an fd 8? Events on files registered for epoll
    > notification are reported by descriptor, so the set membership has to be
    > associated (as reflected into userspace) with the descriptor, not the file.


    Eric already answered your question (epoll deals with internal kernel
    objects, aka file*).
    I just want to answer this one for another reason. WTF is wrong with all
    of you Cc-list-trimmers?
    Could you *please* stop trimming Cc-lists?



    - Davide



  14. RE: epoll design problems with common fork/exec patterns


    Eric Dumazet wrote:

    > Events are not necessarily reported "by descriptors". epoll uses an opaque
    > field provided by the user.
    >
    > It's up to the user to properly choose a tag that will make sense if the
    > user app is playing dup()/close() games, for example.


    Great. So the only issue then is that the documentation is confusing. It
    frequently uses the term "fd" where it means file. For example, it says:

    Q1 What happens if you add the same fd to an epoll_set twice?

    A1 You will probably get EEXIST. However, it is possible that two
    threads may add the same fd twice. This is a harmless condition.

    This gives no reason to think there's anything wrong with adding the same
    file twice so long as you do so through different descriptors. (One can
    imagine an application that does this to segregate read and write operations
    to avoid a race where the descriptor is closed from under a writer due to
    handling a fatal read error.) Obviously, that won't work.

    And this part:

    Q6 Will the close of an fd cause it to be removed from all
    epoll sets automatically?

    A6 Yes.

    This is incorrect. Closing an fd will not cause it to be removed from all
    epoll sets automatically. Only closing a file will. This is what caused the
    OP's confusion, and it is at best imprecise and, at worst, flat out wrong.

    DS

    PS: It is customary to trim individuals off of CC lists when replying to a
    list when the subject matter of the post is squarely inside the subject of
    the list. If the person CC'd was interested in the list's subject, he or she
    would presumably subscribe to the list. Not everyone wants two copies of
    every post. Not everyone wants a personal copy of every sub-thread that
    results from a post they make. In the past few years, I've received
    approximately an equal number of complaints about trimming CC's on posts to
    LKML and not trimming CC's on such posts.



  15. RE: epoll design problems with common fork/exec patterns

    On Sun, 28 Oct 2007, David Schwartz wrote:

    >
    > Eric Dumazet wrote:
    >
    > > Events are not necessarily reported "by descriptors". epoll uses an opaque
    > > field provided by the user.
    > >
    > > It's up to the user to properly choose a tag that will make sense if the
    > > user app is playing dup()/close() games, for example.

    >
    > Great. So the only issue then is that the documentation is confusing. It
    > frequently uses the term "fd" where it means file. For example, it says:
    >
    > Q1 What happens if you add the same fd to an epoll_set twice?
    >
    > A1 You will probably get EEXIST. However, it is possible that two
    > threads may add the same fd twice. This is a harmless condition.
    >
    > This gives no reason to think there's anything wrong with adding the same
    > file twice so long as you do so through different descriptors. (One can
    > imagine an application that does this to segregate read and write operations
    > to avoid a race where the descriptor is closed from under a writer due to
    > handling a fatal read error.) Obviously, that won't work.


    I agree, that is confusing. However, you can safely add two different file
    descriptors pointing to the same file*, with different event masks, and
    that will work as expected.




    > And this part:
    >
    > Q6 Will the close of an fd cause it to be removed from all
    > epoll sets automatically?
    >
    > A6 Yes.
    >
    > This is incorrect. Closing an fd will not cause it to be removed from all
    > epoll sets automatically. Only closing a file will. This is what caused the
    > OP's confusion, and it is at best imprecise and, at worst, flat out wrong.


    OTOH you cannot list *every* possible scenario in a man page, otherwise
    you end up writing a book instead of a man page. I will try to find some
    time with Michael to refine the man page.



    > PS: It is customary to trim individuals off of CC lists when replying to a
    > list when the subject matter of the post is squarely inside the subject of
    > the list. If the person CC'd was interested in the list's subject, he or she
    > would presumably subscribe to the list. Not everyone wants two copies of
    > every post. Not everyone wants a personal copy of every sub-thread that
    > results from a post they make. In the past few years, I've received
    > approximately an equal number of complaints about trimming CC's on posts to
    > LKML and not trimming CC's on such posts.


    Does anyone who, in 2007, still hasn't managed to find a way to keep dups
    from hitting his mailbox deserve any consideration at all?
    OTOH many ppl, like myself, use the To and Cc headers to direct email to
    proper folders, where it is treated with different levels of attention.
    And your strip-all-headers mania screws that up badly.



    - Davide



  16. Re: epoll design problems with common fork/exec patterns

    Willy Tarreau wrote:
    >> On Sat, 27 Oct 2007, Marc Lehmann wrote:
    >>
    >>>> Please provide some code to illustrate one exact problem you have.
    >>> // assume there is an open epoll set that listens for events on fd 5
    >>> if (fork () == 0)
    >>> {
    >>> close (5);
    >>> // fd 5 is now removed from the epoll set of the parent.
    >>> _exit (0);
    >>> }

    ...
    > from what I understand, Marc is not asking for the code above to remove
    > the fd from the epoll set, but he's in fact complaining that he *observed*
    > that the fd was removed from the epoll set in the *parent* process when
    > the child closes it, which is of course not expected at all. Strange
    > as it looks, this might need investigation. It is possible that there
    > is some strange bug somewhere in some kernel versions.
    >
    > Marc, I think that if you indicate the last kernel version on which you
    > observed this and provide a very short and easy reproducer, it would
    > help everyone investigating this. Basically something which reports "OK"
    > or "KO".


    That's how I read it, too.
    So basically, a program like this, perhaps.
    Except that, here running 2.6.23.1, it works just fine (no removal bug).

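    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>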

    static int del_from_epoll_set (int efd, int fd, const char *msg)
    {
        struct epoll_event e;

        memset(&e, 0, sizeof(e));
        e.data.fd = fd;
        if (epoll_ctl(efd, EPOLL_CTL_DEL, fd, &e) == -1) {
            int err = errno;
            fprintf(stderr, "epoll_ctl(DEL) failed (%s): %s\n", msg, strerror(err));
            return -1;
        }
        return 0;
    }

    static int add_to_epoll_set (int efd, int fd, uint32_t events, const char *msg)
    {
        struct epoll_event e;

        memset(&e, 0, sizeof(e));
        e.events = events;
        e.data.fd = fd;
        if (epoll_ctl(efd, EPOLL_CTL_ADD, fd, &e) == -1) {
            int err = errno;
            fprintf(stderr, "epoll_ctl(ADD) failed (%s): %s\n", msg, strerror(err));
            return -1;
        }
        return 0;
    }

    int main (int argc, char **argv)
    {
        int efd, sd, fds[2];
        pid_t cpid;

        if (pipe(fds) == -1) {
            perror("pipe()");
            exit(1);
        }
        sd = socket(PF_INET, SOCK_STREAM, 0);
        if (sd == -1) {
            perror("socket");
            exit(1);
        }

        efd = epoll_create(5);
        if (efd == -1) {
            perror("epoll_create");
            exit(1);
        }

        if (add_to_epoll_set(efd, fileno(stdin), EPOLLIN, "stdin"))
            exit(1);
        if (add_to_epoll_set(efd, fileno(stdout), EPOLLOUT, "stdout"))
            exit(1);
        if (add_to_epoll_set(efd, fds[0], EPOLLIN, "pipe_read"))
            exit(1);
        if (add_to_epoll_set(efd, fds[1], EPOLLOUT, "pipe_write"))
            exit(1);
        if (add_to_epoll_set(efd, sd, EPOLLIN|EPOLLOUT, "socket"))
            exit(1);

        // The child closes its copies of every registered descriptor and
        // exits; this should leave the parent's epoll set unchanged.
        cpid = fork();
        if (cpid == 0) {
            close(fileno(stdin));
            close(fileno(stdout));
            close(fds[0]);
            close(fds[1]);
            close(sd);
            exit(0);
        }
        waitpid(cpid, NULL, 0);

        // Now test whether the fds are still in the epoll set. An ADD that
        // fails with EEXIST ("File exists") means the fd is still registered,
        // which is the expected result.
        add_to_epoll_set(efd, sd, EPOLLIN|EPOLLOUT, "sd");
        add_to_epoll_set(efd, fds[0], EPOLLIN, "fds[0]");
        add_to_epoll_set(efd, fds[1], EPOLLOUT, "fds[1]");
        add_to_epoll_set(efd, fileno(stdin), EPOLLIN, "fileno(stdin)");
        add_to_epoll_set(efd, fileno(stdout), EPOLLOUT, "fileno(stdout)");

        del_from_epoll_set(efd, sd, "sd");
        del_from_epoll_set(efd, fds[0], "fds[0]");
        del_from_epoll_set(efd, fds[1], "fds[1]");
        del_from_epoll_set(efd, fileno(stdin), "fileno(stdin)");
        del_from_epoll_set(efd, fileno(stdout), "fileno(stdout)");

        printf("Done.\n");
        exit(0);
    }


  17. Re: epoll design problems with common fork/exec patterns



    Davide Libenzi wrote:
    > On Sun, 28 Oct 2007, David Schwartz wrote:
    >
    >> Eric Dumazet wrote:
    >>
    >>> Events are not necessarily reported "by descriptors". epoll uses an opaque
    >>> field provided by the user.
    >>>
    >>> It's up to the user to properly choose a tag that will make sense if the
    >>> user app is playing dup()/close() games, for example.

    >> Great. So the only issue then is that the documentation is confusing. It
    >> frequently uses the term "fd" where it means file. For example, it says:
    >>
    >> Q1 What happens if you add the same fd to an epoll_set twice?
    >>
    >> A1 You will probably get EEXIST. However, it is possible that two
    >> threads may add the same fd twice. This is a harmless condition.
    >>
    >> This gives no reason to think there's anything wrong with adding the same
    >> file twice so long as you do so through different descriptors. (One can
    >> imagine an application that does this to segregate read and write operations
    >> to avoid a race where the descriptor is closed from under a writer due to
    >> handling a fatal read error.) Obviously, that won't work.

    >
    > I agree, that is confusing. However, you can safely add two different file
    > descriptors pointing to the same file*, with different event masks, and
    > that will work as expected.


    So can I summarize what I understand:

    a) Adding the same file descriptor twice to an epoll set will cause an
    error (EEXIST).

    b) In a separate message to linux-man, Chris Heath says that two threads
    *can't* add the same fd twice to an epoll set, despite what the existing
    man page text says. I haven't tested that, but it sounds to me as though
    it is likely to be true. Can you comment please Davide?

    c) It is possible to add duplicated file descriptors referring to the same
    underlying open file description ("file *"). As you note, this can be a
    useful filtering technique, if the two file descriptors specify different
    masks.

    Assuming that is all correct, for man-pages-2.79, I've reworked the text
    for Q1/A1 as follows:

    Q1 What happens if you add the same file descriptor
    to an epoll set twice?

    A1 You will probably get EEXIST. However, it is possible
    to add a duplicate (dup(2), dup2(2), fcntl(2) F_DUPFD,
    fork(2)) descriptor to the same epoll set. This can be
    a useful technique for filtering events, if the
    duplicate file descriptors are registered with
    different event masks.

    Seem okay Davide?

    Cheers,

    Michael

    PS I've trimmed the part of this thread about Q6/A6, since I dealt with
    that in another thread ("epoll and shared fd's").

    --
    Michael Kerrisk
    Maintainer of the Linux man-pages project
    http://www.kernel.org/doc/man-pages/
    Want to report a man-pages bug? Look here:
    http://www.kernel.org/doc/man-pages/reporting_bugs.html


  18. Re: epoll design problems with common fork/exec patterns

    On Tue, 26 Feb 2008, Michael Kerrisk wrote:

    > Davide Libenzi wrote:
    > > On Sun, 28 Oct 2007, David Schwartz wrote:
    > >
    > >> Eric Dumazet wrote:
    > >>
    > >>> Events are not necessarily reported "by descriptors". epoll uses an opaque
    > >>> field provided by the user.
    > >>>
    > >>> It's up to the user to properly choose a tag that will make sense if the
    > >>> user app is playing dup()/close() games, for example.
    > >> Great. So the only issue then is that the documentation is confusing. It
    > >> frequently uses the term "fd" where it means file. For example, it says:
    > >>
    > >> Q1 What happens if you add the same fd to an epoll_set twice?
    > >>
    > >> A1 You will probably get EEXIST. However, it is possible that two
    > >> threads may add the same fd twice. This is a harmless condition.
    > >>
    > >> This gives no reason to think there's anything wrong with adding the same
    > >> file twice so long as you do so through different descriptors. (One can
    > >> imagine an application that does this to segregate read and write operations
    > >> to avoid a race where the descriptor is closed from under a writer due to
    > >> handling a fatal read error.) Obviously, that won't work.

    > >
    > > I agree, that is confusing. However, you can safely add two different file
    > > descriptors pointing to the same file*, with different event masks, and
    > > that will work as expected.

    >
    > So can I summarize what I understand:
    >
    > a) Adding the same file descriptor twice to an epoll set will cause an
    > error (EEXIST).


    Yes.



    > b) In a separate message to linux-man, Chris Heath says that two threads
    > *can't* add the same fd twice to an epoll set, despite what the existing
    > man page text says. I haven't tested that, but it sounds to me as though
    > it is likely to be true. Can you comment please Davide?


    Yes, you can't add the same fd twice. Think about a DB where "file*,fd" is
    the key.



    > c) It is possible to add duplicated file descriptors referring to the same
    > underlying open file description ("file *"). As you note, this can be a
    > useful filtering technique, if the two file descriptors specify different
    > masks.
    >
    > Assuming that is all correct, for man-pages-2.79, I've reworked the text
    > for Q1/A1 as follows:
    >
    > Q1 What happens if you add the same file descriptor
    > to an epoll set twice?
    >
    > A1 You will probably get EEXIST. However, it is possible
    > to add a duplicate (dup(2), dup2(2), fcntl(2) F_DUPFD,
    > fork(2)) descriptor to the same epoll set. This can be
    > a useful technique for filtering events, if the
    > duplicate file descriptors are registered with
    > different event masks.
    >
    > Seem okay Davide?


    Looks sane to me.



    - Davide



  19. Re: epoll design problems with common fork/exec patterns

    On Tue, 2008-02-26 at 10:51 -0800, Davide Libenzi wrote:
    > On Tue, 26 Feb 2008, Michael Kerrisk wrote:
    >
    > > Davide Libenzi wrote:
    > > > On Sun, 28 Oct 2007, David Schwartz wrote:
    > > >
    > > >> Eric Dumazet wrote:
    > > >>
    > > >>> Events are not necessarily reported "by descriptors". epoll uses an opaque
    > > >>> field provided by the user.
    > > >>>
    > > >>> It's up to the user to properly choose a tag that will make sense if the
    > > >>> user app is playing dup()/close() games, for example.
    > > >> Great. So the only issue then is that the documentation is confusing. It
    > > >> frequently uses the term "fd" where it means file. For example, it says:
    > > >>
    > > >> Q1 What happens if you add the same fd to an epoll_set twice?
    > > >>
    > > >> A1 You will probably get EEXIST. However, it is possible that two
    > > >> threads may add the same fd twice. This is a harmless condition.
    > > >>
    > > >> This gives no reason to think there's anything wrong with adding the same
    > > >> file twice so long as you do so through different descriptors. (One can
    > > >> imagine an application that does this to segregate read and write operations
    > > >> to avoid a race where the descriptor is closed from under a writer due to
    > > >> handling a fatal read error.) Obviously, that won't work.
    > > >
    > > > I agree, that is confusing. However, you can safely add two different file
    > > > descriptors pointing to the same file*, with different event masks, and
    > > > that will work as expected.

    > >
    > > So can I summarize what I understand:
    > >
    > > a) Adding the same file descriptor twice to an epoll set will cause an
    > > error (EEXIST).

    >
    > Yes.
    >
    >
    >
    > > b) In a separate message to linux-man, Chris Heath says that two threads
    > > *can't* add the same fd twice to an epoll set, despite what the existing
    > > man page text says. I haven't tested that, but it sounds to me as though
    > > it is likely to be true. Can you comment please Davide?

    >
    > Yes, you can't add the same fd twice. Think about a DB where "file*,fd" is
    > the key.


    To clarify, the key appears to be file* plus the user-space integer that
    represents the fd.


    > > c) It is possible to add duplicated file descriptors referring to the same
    > > underlying open file description ("file *"). As you note, this can be a
    > > useful filtering technique, if the two file descriptors specify different
    > > masks.
    > >
    > > Assuming that is all correct, for man-pages-2.79, I've reworked the text
    > > for Q1/A1 as follows:
    > >
    > > Q1 What happens if you add the same file descriptor
    > > to an epoll set twice?
    > >
    > > A1 You will probably get EEXIST. However, it is possible
    > > to add a duplicate (dup(2), dup2(2), fcntl(2) F_DUPFD,
    > > fork(2)) descriptor to the same epoll set. This can be
    > > a useful technique for filtering events, if the
    > > duplicate file descriptors are registered with
    > > different event masks.
    > >
    > > Seem okay Davide?

    >
    > Looks sane to me.


    I think fork(2) should not be in the above list. fork(2) duplicates the
    kernel's fd, but the user-space integer that represents the fd remains
    the same, so you will get EEXIST if you try to add the fd that was
    duplicated by fork.

    Chris


  20. Re: epoll design problems with common fork/exec patterns

    On Tue, 26 Feb 2008, Chris "ク" Heath wrote:

    > On Tue, 2008-02-26 at 10:51 -0800, Davide Libenzi wrote:
    > >
    > > Yes, you can't add the same fd twice. Think about a DB where "file*,fd" is
    > > the key.

    >
    > To clarify, the key appears to be file* plus the user-space integer that
    > represents the fd.


    Yes, that's what I said.



    > > > c) It is possible to add duplicated file descriptors referring to the same
    > > > underlying open file description ("file *"). As you note, this can be a
    > > > useful filtering technique, if the two file descriptors specify different
    > > > masks.
    > > >
    > > > Assuming that is all correct, for man-pages-2.79, I've reworked the text
    > > > for Q1/A1 as follows:
    > > >
    > > > Q1 What happens if you add the same file descriptor
    > > > to an epoll set twice?
    > > >
    > > > A1 You will probably get EEXIST. However, it is possible
    > > > to add a duplicate (dup(2), dup2(2), fcntl(2) F_DUPFD,
    > > > fork(2)) descriptor to the same epoll set. This can be
    > > > a useful technique for filtering events, if the
    > > > duplicate file descriptors are registered with
    > > > different event masks.
    > > >
    > > > Seem okay Davide?

    > >
    > > Looks sane to me.

    >
    > I think fork(2) should not be in the above list. fork(2) duplicates the
    > kernel's fd, but the user-space integer that represents the fd remains
    > the same, so you will get EEXIST if you try to add the fd that was
    > duplicated by fork.


    Good catch, fork(2) should not be there.



    - Davide


