epoll behaviour after running out of descriptors - Kernel

  1. epoll behaviour after running out of descriptors

    Hi,

    I noticed some strange behaviour of epoll after running out of descriptors.
    I've registered a listen socket to epoll with edge triggering. On the
    client-side I use an app that simply keeps opening connections.
    When accept returns EMFILE, I call epoll_wait and accept and it
    returns with another EMFILE.
    This happens 10 times or so, after that epoll_wait no longer returns
    with the listen socket ready.
    I then close all file descriptors, but epoll_wait will still not return.
    So my question is, why does it 'only' happen 10 times and what is the
    expected behaviour?
    And how should an app handle this?

    The example in the epoll man page doesn't seem to handle this.

    An idea I had was for epoll_wait to only return with accept / EMFILE
    once. Then after a descriptor becomes available, epoll_wait would
    return again.

    See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=502901

    Hi,

    I've written a web app that should be able to handle a lot of new
    connections per second (1000+). On multiple servers I've hit a bug.
    After running out of descriptors, then closing descriptors, epoll_wait
    doesn't return anymore for the listen socket.
    I've attached code to reproduce the issue. And an strace log. Even
    before closing the descriptors you see epoll_wait already stops returning.

    On the other side, I used a self-written app that just opens tons of
    connections. Is there a standard utility to do that?

    #include <arpa/inet.h>
    #include <cassert>
    #include <cerrno>
    #include <climits>
    #include <ctime>
    #include <netinet/in.h>
    #include <sys/epoll.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <vector>

    using namespace std;

    int main()
    {
        // Non-blocking listening socket on port 2710.
        int l = socket(AF_INET, SOCK_STREAM, 0);
        int p = 1; // FIONBIO takes a pointer to int
        ioctl(l, FIONBIO, &p);
        sockaddr_in a = {0};
        a.sin_family = AF_INET;
        a.sin_addr.s_addr = INADDR_ANY;
        a.sin_port = htons(2710);
        bind(l, reinterpret_cast<sockaddr*>(&a), sizeof(sockaddr_in));
        listen(l, SOMAXCONN);
        // Register the listener with epoll, edge-triggered.
        int fd = epoll_create(1 << 10);
        epoll_event e;
        e.data.fd = l;
        e.events = EPOLLIN | EPOLLOUT | EPOLLPRI | EPOLLERR | EPOLLHUP | EPOLLET;
        epoll_ctl(fd, EPOLL_CTL_ADD, l, &e);
        const int c_events = 64;
        epoll_event events[c_events];
        typedef vector<int> sockets_t;
        sockets_t sockets;
        time_t t = time(NULL);
        while (1)
        {
            int r = epoll_wait(fd, events, c_events, 5000);
            if (r == -1)
                continue;
            // On the first idle timeout more than 30 s after start,
            // close every accepted socket (this happens only once).
            if (!r && time(NULL) - t > 30)
            {
                for (size_t i = 0; i < sockets.size(); i++)
                    close(sockets[i]);
                sockets.clear();
                t = INT_MAX;
            }
            for (int i = 0; i < r; i++)
            {
                if (events[i].data.fd == l)
                {
                    // Accept until accept() fails.
                    while (1)
                    {
                        int s = accept(l, NULL, NULL);
                        if (s == -1)
                        {
                            if (errno == EAGAIN)
                                break;
                            // Any other error, including EMFILE, also
                            // goes back to epoll_wait.
                            break; // continue;
                        }
                        sockets.push_back(s);
                    }
                }
                else
                    assert(false);
            }
        }
        return 0;
    }

    socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
    ioctl(3, FIONBIO, [1]) = 0
    bind(3, {sa_family=AF_INET, sin_port=htons(2710), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
    listen(3, 128) = 0
    epoll_create(1024) = 4
    epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLERR|EPOLLHUP|EPOLLET, {u32=3, u64=13806959039201935363}}) = 0
    time(NULL) = 1224527442
    epoll_wait(4, {}, 64, 5000) = 0
    time(NULL) = 1224527447
    epoll_wait(4, {{EPOLLIN, {u32=3, u64=13806959039201935363}}}, 64, 5000) = 1
    accept(3, 0, NULL) = 5
    brk(0) = 0x804c000
    brk(0x806d000) = 0x806d000
    accept(3, 0, NULL) = 6
    accept(3, 0, NULL) = 7
    accept(3, 0, NULL) = 8
    accept(3, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
    epoll_wait(4, {{EPOLLIN, {u32=3, u64=13806959039201935363}}}, 64, 5000) = 1
    accept(3, 0, NULL) = 9
    ....
    accept(3, 0, NULL) = 85
    accept(3, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
    epoll_wait(4, {{EPOLLIN, {u32=3, u64=13806959039201935363}}}, 64, 5000) = 1
    accept(3, 0, NULL) = 86
    ....
    accept(3, 0, NULL) = 1023
    accept(3, 0, NULL) = -1 EMFILE (Too many open files)
    epoll_wait(4, {{EPOLLIN, {u32=3, u64=13806959039201935363}}}, 64, 5000) = 1
    accept(3, 0, NULL) = -1 EMFILE (Too many open files)
    epoll_wait(4, {{EPOLLIN, {u32=3, u64=13806959039201935363}}}, 64, 5000) = 1
    ....
    epoll_wait(4, {{EPOLLIN, {u32=3, u64=13806959039201935363}}}, 64, 5000) = 1
    accept(3, 0, NULL) = -1 EMFILE (Too many open files)
    epoll_wait(4, {}, 64, 5000) = 0
    time(NULL) = 1224527454
    epoll_wait(4, {}, 64, 5000) = 0
    time(NULL) = 1224527459
    epoll_wait(4, {}, 64, 5000) = 0
    time(NULL) = 1224527464
    epoll_wait(4, {}, 64, 5000) = 0
    time(NULL) = 1224527469
    epoll_wait(4, {}, 64, 5000) = 0
    time(NULL) = 1224527474
    close(5) = 0
    ....
    close(1023) = 0
    epoll_wait(4, {}, 64, 5000) = 0
    time(NULL) = 1224527479
    epoll_wait(4, {}, 64, 5000) = 0
    time(NULL) = 1224527484
    epoll_wait(4, {}, 64, 5000) = 0
    time(NULL) = 1224527489
    epoll_wait(4, {}, 64, 5000) = 0
    time(NULL) = 1224527494
    epoll_wait(4, {}, 64, 5000) = 0
    time(NULL) = 1224527499
    epoll_wait(4, {}, 64, 5000) = 0
    time(NULL) = 1224527504

    -- Package-specific info:
    ** Version:
    Linux version 2.6.24-etchnhalf.1-686 (Debian 2.6.24-6~etchnhalf.5)
    (dannf@debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian
    4.1.1-21)) #1 SMP Mon Sep 8 06:19:11 UTC 2008

    ** Command line:
    root=/dev/sda1 ro

    ** Not tainted
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: epoll behaviour after running out of descriptors

    On Sat, 1 Nov 2008, Olaf van der Spek wrote:

    > Hi,
    >
    > I noticed some strange behaviour of epoll after running out of descriptors.
    > I've registered a listen socket to epoll with edge triggering. On the
    > client-side I use an app that simply keeps opening connections.
    > When accept returns EMFILE, I call epoll_wait and accept and it
    > returns with another EMFILE.
    > This happens 10 times or so, after that epoll_wait no longer returns
    > with the listen socket ready.
    > I then close all file descriptors, but epoll_wait will still not return.
    > So my question is, why does it 'only' happen 10 times and what is the
    > expected behaviour?
    > And how should an app handle this?
    >
    > The example in the epoll man page doesn't seem to handle this.
    >
    > An idea I had was for epoll_wait to only return with accept / EMFILE
    > once. Then after a descriptor becomes available, epoll_wait would
    > return again.
    >
    > See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=502901
    >
    > Hi,
    >
    > I've written a web app that should be able to handle a lot of new
    > connections per second (1000+). On multiple servers I've hit a bug.
    > After running out of descriptors, then closing descriptors, epoll_wait
    > doesn't return anymore for the listen socket.
    > I've attached code to reproduce the issue. And an strace log. Even
    > before closing the descriptors you see epoll_wait already stops returning.


    A bug? For starters, epoll_wait does NOT create new files, so no EMFILE
    can come out from there.
    You are saturating the port space, and your whole code logic is rather (at
    least) buggy. Try a `netstat -n -t | grep TIME_WAIT | wc -l`
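    The netstat pipeline above counts sockets in TIME_WAIT. The same check can be sketched from inside a program. This assumes a Linux system that exposes IPv4 sockets in /proc/net/tcp with its usual column layout (state in the fourth column, as a hex number). The function names count_tcp_state and count_time_wait are made up for this illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Count the rows of /proc/net/tcp-formatted text that are in one TCP state.
// Each data row starts with: sl local_address rem_address st ...
// where "st" is the socket state written as a hexadecimal number.
int count_tcp_state(std::istream& in, unsigned long wanted_state)
{
    assert(wanted_state != 0); // the kernel's state numbering starts at 1
    std::string line;
    std::getline(in, line); // skip the header row
    int count = 0;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string sl, local, remote, st;
        if (!(fields >> sl >> local >> remote >> st))
            continue; // empty or malformed row
        char* end = nullptr;
        unsigned long state = std::strtoul(st.c_str(), &end, 16);
        if (end != st.c_str() && *end == '\0' && state == wanted_state)
            ++count;
    }
    return count;
}

// TIME_WAIT is state 6 in the kernel's TCP state numbering.
// Returns -1 if /proc/net/tcp cannot be opened.
int count_time_wait()
{
    std::ifstream tcp("/proc/net/tcp");
    return tcp ? count_tcp_state(tcp, 6) : -1;
}
```

    Calling count_time_wait() on the server while the client is hammering it gives a number to compare against the size of the ephemeral port range. IPv6 sockets live in /proc/net/tcp6 and are not counted by this sketch.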



    - Davide



  3. Re: epoll behaviour after running out of descriptors

    On Sun, Nov 2, 2008 at 7:25 PM, Davide Libenzi wrote:
    > A bug? For starters, epoll_wait does NOT create new files, so no EMFILE
    > can come out from there.


    It's accept that returns EMFILE.

    > You are saturating the port space, and your whole code logic is rather (at
    > least) buggy. Try a `netstat -n -t | grep TIME_WAIT | wc -l`


    What makes you think I'm saturating the port space?
    That space is way bigger than 1 k AFAIK.

    EMFILE The per-process limit of open file descriptors has been reached.

    And what part of my code logic is buggy?

    Olaf

  4. Re: epoll behaviour after running out of descriptors

    On Sun, Nov 2, 2008 at 7:48 PM, Davide Libenzi wrote:
    > Why don't you grep for TIME_WAIT?


    Because I don't have access to the test environment at the moment.

  5. Re: epoll behaviour after running out of descriptors

    On Sun, 2 Nov 2008, Olaf van der Spek wrote:

    > On Sun, Nov 2, 2008 at 7:25 PM, Davide Libenzi wrote:
    > > A bug? For starters, epoll_wait does NOT create new files, so no EMFILE
    > > can come out from there.

    >
    > It's accept that returns EMFILE.
    >
    > > You are saturating the port space, and your whole code logic is rather (at
    > > least) buggy. Try a `netstat -n -t | grep TIME_WAIT | wc -l`

    >
    > What makes you think I'm saturating the port space?
    > That space is way bigger than 1 k AFAIK.


    Why don't you grep for TIME_WAIT?



    - Davide



  6. Re: epoll behaviour after running out of descriptors

    On Sun, 2 Nov 2008, Olaf van der Spek wrote:

    > On Sun, Nov 2, 2008 at 7:48 PM, Davide Libenzi wrote:
    > > Why don't you grep for TIME_WAIT?

    >
    > Because I don't have access to the test environment at the moment.


    Here:

    http://tinyurl.com/5ay86v



    - Davide



  7. Re: epoll behaviour after running out of descriptors

    Olaf van der Spek wrote:
    > On Sun, Nov 2, 2008 at 7:48 PM, Davide Libenzi wrote:
    >> Why don't you grep for TIME_WAIT?

    >
    > Because I don't have access to the test environment at the moment.


    Hello Olaf,

    If your application calls accept() and accept() returns EMFILE, it's a no-op.

    The connection stays on the listen queue, and the socket is still ready for an accept().


    Since you use edge-triggered epoll, you'll only receive new notifications.

    You probably had a listen(sock, 10) in your app, so after 10 notifications
    your listen queue is full and the TCP stack refuses to handle new connections.

    To cope with this kind of thing, the trick I personally use is to always keep
    a *free* fd around, that is:

    At the start of the program, reserve an "emergency fd":
    free_fd = open("/dev/null", O_RDONLY);

    Then later:

    newfd = accept(...);
    if (newfd == -1 && errno == EMFILE) {
        /* emergency action: clean the listen queue */
        close(free_fd);
        newfd = accept(...);
        if (newfd != -1)
            close(newfd); /* forget this incoming connection, we don't have enough fds */
        free_fd = open("/dev/null", O_RDONLY);
    }

    Of course, if your application is multi-threaded, you might have to adapt this
    (and possibly reserve one emergency fd per thread).
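
    A slightly fuller sketch of the emergency-fd trick, written in C++ to match the test program earlier in the thread. It is an illustration under assumptions, not a drop-in implementation: the names open_spare_fd, shed_one_connection and accept_or_shed are invented here, the listener is assumed to be non-blocking, and the code is single-threaded (another thread could grab the freed descriptor between close() and accept()).

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Open the placeholder descriptor that is held in reserve.
int open_spare_fd()
{
    return open("/dev/null", O_RDONLY | O_CLOEXEC);
}

// Give up the spare descriptor, accept one pending connection, close it
// at once, then take the spare back. Returns true if a connection was
// removed from the listen queue.
bool shed_one_connection(int listen_fd, int& spare_fd)
{
    assert(listen_fd >= 0);
    if (spare_fd >= 0)
        close(spare_fd);
    int victim = accept(listen_fd, nullptr, nullptr);
    if (victim >= 0)
        close(victim); // the peer sees its connection being closed
    spare_fd = open_spare_fd();
    return victim >= 0;
}

// accept() wrapper: on EMFILE, shed one queued connection so the queue
// keeps moving, then report the original failure to the caller.
int accept_or_shed(int listen_fd, int& spare_fd)
{
    int fd = accept(listen_fd, nullptr, nullptr);
    if (fd == -1 && errno == EMFILE) {
        shed_one_connection(listen_fd, spare_fd);
        errno = EMFILE; // restore: the helper's calls may have changed it
    }
    return fd;
}
```

    A caller would create the spare once at startup with open_spare_fd() and use accept_or_shed() wherever it used accept(). Each EMFILE then costs one dropped client instead of a connection that sits in the queue with no further edge to announce it.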



  8. Re: epoll behaviour after running out of descriptors

    On Sun, Nov 2, 2008 at 8:17 PM, Davide Libenzi wrote:
    > On Sun, 2 Nov 2008, Olaf van der Spek wrote:
    >
    >> On Sun, Nov 2, 2008 at 7:48 PM, Davide Libenzi wrote:
    >> > Why don't you grep for TIME_WAIT?

    >>
    >> Because I don't have access to the test environment at the moment.

    >
    > Here:
    >
    > http://tinyurl.com/5ay86v


    I know what TIME_WAIT is. I just think it's not applicable to this situation.

  9. Re: epoll behaviour after running out of descriptors

    On Sun, 2 Nov 2008, Olaf van der Spek wrote:

    > On Sun, Nov 2, 2008 at 8:17 PM, Davide Libenzi wrote:
    > > On Sun, 2 Nov 2008, Olaf van der Spek wrote:
    > >
    > >> On Sun, Nov 2, 2008 at 7:48 PM, Davide Libenzi wrote:
    > >> > Why don't you grep for TIME_WAIT?
    > >>
    > >> Because I don't have access to the test environment at the moment.

    > >
    > > Here:
    > >
    > > http://tinyurl.com/5ay86v

    >
    > I know what TIME_WAIT is. I just think it's not applicable to this situation.


    It is. You are saturating the port space, so no new POLLIN/accept events
    are sent (until some TIME_WAIT clears), so epoll_wait() returns nothing
    (or does not return, if the timeout is infinite).
    Keeping only 1K (if this is what you meant with your *only* 1K)
    connections *alive* does not mean that the trail left behind by cycling
    through 1K connections is free.
    If you ever played with things like httperf, you should know what I'm
    talking about.



    - Davide



  10. Re: epoll behaviour after running out of descriptors

    On Sun, Nov 2, 2008 at 8:10 PM, Eric Dumazet wrote:
    > On listen queue, socket is still ready for an accept().


    True, but not handy.

    > Since you use edge-triggered epoll, you'll only receive new notifications.


    The strace shows I receive 10+.
    If a return with EMFILE is indeed a no-op, I should receive only one.

    > You probably had a listen(sock, 10) in your app, so after 10 notifications
    > your listen queue is full and the TCP stack refuses to handle new connections.


    I've got listen(l, SOMAXCONN);
    IIRC SOMAXCONN is 128.

    > close(newfd); /* forget this incoming connection, we don't have enough fds */


    Why not keep them in the queue until you do have enough descriptors?

    > Of course, if your application is multi-threaded, you might have to adapt this
    > (and possibly reserve one emergency fd per thread).


    Sounds like a great recipe for race conditions.

  11. Re: epoll behaviour after running out of descriptors

    On Sun, Nov 2, 2008 at 8:27 PM, Davide Libenzi wrote:
    >> I know what TIME_WAIT is. I just think it's not applicable to this situation.

    >
    > It is. You are saturating the port space, so no new POLLIN/accept events
    > are sent (until some TIME_WAIT clears), so epoll_wait() returns nothing
    > (or does not return, if the timeout is infinite).
    > Keeping only 1K (if this is what you meant with your *only* 1K)
    > connections *alive* does not mean that the trail left behind by cycling
    > through 1K connections is free.
    > If you ever played with things like httperf, you should know what I'm
    > talking about.


    Wouldn't the port space require about 20+ k connects? This issue
    happens after 1 k.

  12. Re: epoll behaviour after running out of descriptors

    On Sun, 2 Nov 2008, Olaf van der Spek wrote:

    > On Sun, Nov 2, 2008 at 8:27 PM, Davide Libenzi wrote:
    > >> I know what TIME_WAIT is. I just think it's not applicable to this situation.

    > >
    > > It is. You are saturating the port space, so no new POLLIN/accept events
    > > are sent (until some TIME_WAIT clears), so epoll_wait() returns nothing
    > > (or does not return, if the timeout is infinite).
    > > Keeping only 1K (if this is what you meant with your *only* 1K)
    > > connections *alive* does not mean that the trail left behind by cycling
    > > through 1K connections is free.
    > > If you ever played with things like httperf, you should know what I'm
    > > talking about.

    >
    > Wouldn't the port space require about 20+ k connects? This issue
    > happens after 1 k.


    The reason for "When accept returns EMFILE, I call epoll_wait and accept
    and it returns with another EMFILE." is because your sockets-close logic
    is broken. You get an event for the listening fd, you go call accept(2)
    and in one or two passes you fill up the avail fd space, then you go back
    calling epoll_wait(), and yet back to accept(2). This w/out triggering the
    file-close-relief code (yes, you fill up 1K fds *before* 30 seconds). Of
    course you get another EMFILE. When after a little while the close-loop
    triggers, likely the client quit trying, or the kernel accept backlog is
    full and no new events (remember, you chose ET) are triggered.
    EMFILE is not EAGAIN, and it means that the fd can still have something
    for you. Going back to sleep with (EMFILE && ET) is bad mojo.
    This is more food for linux-userspace than linux-kernel though.
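
    One way to act on the "EMFILE is not EAGAIN" point is to have the accept loop tell its caller why it stopped. The sketch below is only an illustration: AcceptStatus and drain_accept_queue are invented names, and the listener is assumed to be non-blocking. Treating ENFILE like EMFILE, and retrying on EINTR and ECONNABORTED, are choices made for this sketch rather than anything stated in the thread.

```cpp
#include <cassert>
#include <cerrno>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <vector>

enum class AcceptStatus {
    Drained,  // accept() reported EAGAIN: safe to wait for the next edge
    OutOfFds, // EMFILE or ENFILE: connections may still be queued
    Failed    // any other error; errno describes it
};

// Accept until the queue is empty or accept() fails, appending each new
// descriptor to `accepted`, and say which of the two stopped the loop.
AcceptStatus drain_accept_queue(int listen_fd, std::vector<int>& accepted)
{
    assert(listen_fd >= 0);
    for (;;) {
        int s = accept(listen_fd, nullptr, nullptr);
        if (s >= 0) {
            accepted.push_back(s);
            continue;
        }
        if (errno == EINTR || errno == ECONNABORTED)
            continue; // transient: try the next queued connection
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return AcceptStatus::Drained;
        if (errno == EMFILE || errno == ENFILE)
            return AcceptStatus::OutOfFds;
        return AcceptStatus::Failed;
    }
}
```

    With edge triggering, only Drained means it is safe to block in epoll_wait() and rely on a later event. On OutOfFds the caller would remember that the listener is still pending and call drain_accept_queue() again as soon as it has closed a descriptor, since no event announces that descriptors became available.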



    - Davide



  13. Re: epoll behaviour after running out of descriptors

    On Sun, Nov 2, 2008 at 10:17 PM, Davide Libenzi wrote:
    >> Wouldn't the port space require about 20+ k connects? This issue
    >> happens after 1 k.

    >
    > The reason for "When accept returns EMFILE, I call epoll_wait and accept
    > and it returns with another EMFILE." is because your sockets-close logic
    > is broken.


    It's not broken, it's designed that way. It's designed to hit the
    descriptor limit and then close all sockets some time after.

    > You get an event for the listening fd, you go call accept(2)
    > and in one or two passes you fill up the avail fd space, then you go back
    > calling epoll_wait(), and yet back to accept(2). This w/out triggering the
    > file-close-relief code (yes, you fill up 1K fds *before* 30 seconds). Of
    > course you get another EMFILE.


    The second EMFILE doesn't make sense, epoll_wait shouldn't signal the
    socket as ready again, right?

  14. Re: epoll behaviour after running out of descriptors

    On Sun, 2 Nov 2008, Olaf van der Spek wrote:

    > On Sun, Nov 2, 2008 at 10:17 PM, Davide Libenzi wrote:
    > >> Wouldn't the port space require about 20+ k connects? This issue
    > >> happens after 1 k.

    > >
    > > The reason for "When accept returns EMFILE, I call epoll_wait and accept
    > > and it returns with another EMFILE." is because your sockets-close logic
    > > is broken.

    >
    > It's not broken, it's designed that way. It's designed to hit the
    > descriptor limit and then close all sockets some time after.
    >
    > > You get an event for the listening fd, you go call accept(2)
    > > and in one or two passes you fill up the avail fd space, then you go back
    > > calling epoll_wait(), and yet back to accept(2). This w/out triggering the
    > > file-close-relief code (yes, you fill up 1K fds *before* 30 seconds). Of
    > > course you get another EMFILE.

    >
    > The second EMFILE doesn't make sense, epoll_wait shouldn't signal the
    > socket as ready again, right?


    At the time of the first EMFILE, you've filled up the fd space, but not
    the kernel listen backlog. Additions to the backlog trigger new events,
    which you see after the first EMFILE. At a given point, the backlog is
    full, so no new half connections are dropped in there, so no new events
    are generated.
    Again, sleeping on (EMFILE && ET) is bad mojo, and nowhere is it written that
    events should be generated in the EMFILE->no-EMFILE transitions.
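
    The claim that additions to the backlog produce new edge-triggered events, even when nothing has been accepted, can be checked with a small experiment. The sketch below assumes Linux and a working loopback interface. All names in it (make_loopback_listener, connect_client, EdgeCounts, observe_edges) are made up for the illustration. It never calls accept(); it only records how many events epoll_wait() reports at three points.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// Non-blocking TCP listener on 127.0.0.1 with a kernel-chosen port.
// The bound address is returned through `addr`. Returns the fd or -1.
int make_loopback_listener(sockaddr_in& addr)
{
    int l = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    if (l == -1)
        return -1;
    addr = sockaddr_in{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    socklen_t len = sizeof addr;
    if (bind(l, reinterpret_cast<sockaddr*>(&addr), sizeof addr) == -1 ||
        listen(l, 16) == -1 ||
        getsockname(l, reinterpret_cast<sockaddr*>(&addr), &len) == -1) {
        close(l);
        return -1;
    }
    return l;
}

// Blocking connect to `addr`. Returns the client fd or -1.
int connect_client(const sockaddr_in& addr)
{
    int c = socket(AF_INET, SOCK_STREAM, 0);
    if (c == -1)
        return -1;
    if (connect(c, reinterpret_cast<const sockaddr*>(&addr), sizeof addr) == -1) {
        close(c);
        return -1;
    }
    return c;
}

struct EdgeCounts {
    int after_first;  // events reported after the first connection
    int while_idle;   // events reported with nothing new and nothing accepted
    int after_second; // events reported after a second connection
};

EdgeCounts observe_edges()
{
    sockaddr_in addr;
    int l = make_loopback_listener(addr);
    int ep = epoll_create1(0);
    assert(l >= 0 && ep >= 0);
    epoll_event ev = {};
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = l;
    int rc = epoll_ctl(ep, EPOLL_CTL_ADD, l, &ev);
    assert(rc == 0);
    (void)rc;

    epoll_event out[4];
    EdgeCounts counts;
    int c1 = connect_client(addr);
    counts.after_first = epoll_wait(ep, out, 4, 1000);
    counts.while_idle = epoll_wait(ep, out, 4, 200);
    int c2 = connect_client(addr);
    counts.after_second = epoll_wait(ep, out, 4, 1000);
    close(c1);
    close(c2);
    close(ep);
    close(l);
    return counts;
}
```

    If the behaviour is as described above, the three counts should come out as 1, 0 and 1: the undrained queue alone produces no second event, while a newly queued connection does. That would also match the strace earlier in the thread, where epoll_wait() kept reporting the listener after each EMFILE only while new connections were still arriving.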



    - Davide



  15. Re: epoll behaviour after running out of descriptors

    On Sun, Nov 2, 2008 at 11:49 PM, Davide Libenzi wrote:
    > At the time of the first EMFILE, you've filled up the fd space, but not
    > the kernel listen backlog. Additions to the backlog trigger new events,


    Shouldn't ET only fire again *after* you drained the queue? When
    accept returns EMFILE, you did not drain the queue.

    > that you see after the first EMFILE. At a given point, the backlog is
    > full, so no new half connections are dropped in there, so no new events
    > are generated.


    The backlog is 128 entries though, I don't see that many EMFILEs.

    > Again, sleeping on (EMFILE && ET) is bad mojo,


    It's not always best to free up descriptors right away.

    > and nowhere is it written that
    > events should be generated in the EMFILE->no-EMFILE transitions.


    That's true, but I'm saying that this might be handy to have.
