resource leak - FreeBSD


  1. resource leak

    Hello List,

    I am running into a strange problem that points to a resource leak. The problem
    manifests itself after one of our remote systems has been up for around 100
    days. The symptom is that it appears no new processes can be spawned. If I try
    to ssh to the unit, I can see the TCP three-way handshake and then no more
    traffic. Log files such as the cron log show that, when this happens, no more
    entries are written. The unit acts as a firewall, router and VPN appliance, and
    these functions continue to work. We have a C application, periodically started
    from a shell script, that reports various information about the system; it
    stops reporting, while VPNs, OSPF routing and ipfilter firewalling continue to
    work and write to their log files.

    My question is how do I monitor the various resources in the system that could
    prevent the spawning of a new process?

    This is on FreeBSD 6.1, ipsec-tools-6.6, quagga-0.99.3

    Any ideas or directions would be greatly appreciated.


    Thanks,
    Steve
    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/lis...freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"


  2. Re: resource leak

    Jeremy Chadwick wrote:
    > On Wed, Oct 01, 2008 at 07:41:56AM -0400, Stephen Clark wrote:
    >
    > Periodically logging "ps -auxw" output to a file would be useful, as
    > ideally you'd gradually see the list get longer and longer over time;
    > it's possible you have many zombie processes as a result of a parent
    > which is not reaping its children (calling waitpid(2) or its friends).
    >
    > Other things that might come in useful are "fstat" and "vmstat -s".
    >
    > It sounds like your C program relies heavily on system() or execl() and
    > fork(), which is why it's affected -- while the other programs are
    > likely kernel-level.
    >

    Thanks Jeremy,

    I have added those commands to a periodic daily script.

    Another thing I have noticed is that the problem quite often seems to start
    at 2 a.m., right when the periodic daily script runs.

    But I think it is a coincidence, and that we have reached the edge of the
    resource limit and all the jobs spawned by the periodic daily scripts push us
    over it.

    The other thing is that, having logged into some of the systems that have been
    up in the 80-day range, I don't see many (or any) zombies. I just wonder if it
    is an fd leak; fstat should point that out.

    Steve

    --

    "They that give up essential liberty to obtain temporary safety,
    deserve neither liberty nor safety." (Ben Franklin)

    "The course of history shows that as a government grows, liberty
    decreases." (Thomas Jefferson)




  3. Re: resource leak

    On Wed, Oct 01, 2008 at 08:30:26AM -0400, Stephen Clark wrote:
    > The other thing is that, having logged into some of the systems that have
    > been up in the 80-day range, I don't see many (or any) zombies. I just
    > wonder if it is an fd leak; fstat should point that out.


    You might find the below thread beneficial -- an individual came to the
    lists stating that they were running out of fds as a result of some
    Java software running amok on their systems.

    http://lists.freebsd.org/pipermail/f...ead.html#45383
    http://lists.freebsd.org/pipermail/f...er/045383.html

    --
    | Jeremy Chadwick jdc at parodius.com |
    | Parodius Networking http://www.parodius.com/ |
    | UNIX Systems Administrator Mountain View, CA, USA |
    | Making life hard for others since 1977. PGP: 4BD6C0CB |



  4. Re: resource leak

    Jeremy Chadwick wrote:
    > On Wed, Oct 01, 2008 at 08:30:26AM -0400, Stephen Clark wrote:
    >
    > You might find the below thread beneficial -- an individual came to the
    > lists stating that they were running out of fds as a result of some
    > Java software running amok on their systems.
    >
    > http://lists.freebsd.org/pipermail/f...ead.html#45383
    > http://lists.freebsd.org/pipermail/f...er/045383.html
    >

    Thanks. After reading the thread, though: is there a single place in the kernel
    that reports how many fds are currently in use? Does the "no more fds" message
    get logged in /var/log/messages, or only in the kernel log buffer? I haven't
    seen that message in the messages file, and since we are forced to have a
    remote user reboot the box, the kernel buffer is gone.

    Steve



  5. Re: resource leak

    > Thanks. After reading the thread, though: is there a single place in the
    > kernel that reports how many fds are currently in use? Does the "no more
    > fds" message get logged in /var/log/messages, or only in the kernel log
    > buffer? I haven't seen that message in the messages file, and since we are
    > forced to have a remote user reboot the box, the kernel buffer is gone.


    Just a guess, but perhaps:

    vmstat -m | grep -E 'filedesc|Type'

    Regards,
    Josh

