anomalous SIGKILL - Linux



  1. anomalous SIGKILL

    I occasionally get the simple message "Killed" when I try to execute
    a particular program. Using strace on it shows only two lines:
    execve(...) and ...got SIGKILL. Why is it doing this?

    The program runs as a daemon under one name, "drd", and as a user
    interface under another name, "dr", using a symbolic link. The user
    interface code is trivially simple, using only ncurses and two files.

    Also, when this happens, the running daemon is apparently unable to
    execute new processes using system("...") calls, which it normally
    does several times a minute. It is as if new processes can't start,
    but everything else seems to work. The number of processes is small,
    and the CPU load is small.

    -Mike


  2. Re: anomalous SIGKILL

    Mike writes:

    > I occasionally get the simple message "Killed", when I try to execute
    > a
    > particular program. Using strace on it shows only two lines: execve
    > (...) and
    > ...got SIGKILL. Why is it doing this?
    >
    > The program runs as a daemon with one name "drd", and as an user
    > interface with another name "dr", using a symbolic link. The user
    > interface code is trivially simple, using only ncurses and two
    > files.
    >
    > Also when this happens, the running daemon is apparently unable to
    > execute new processes using system ("...") calls, which it normally
    > does several times a minute. It is as if new processes can't start,
    > but everything else seems to work. The number of processes is small,
    > and the CPU load is small.


    Sounds like you've hit the limit for number of processes. Check your
    ulimit settings.

    --
    Måns Rullgård
    mans@mansr.com

  3. Re: anomalous SIGKILL

    Måns Rullgård writes:

    > Mike writes:
    >
    >> ...got SIGKILL. Why is it doing this?


    Perhaps because OOM killer decided that you consumed too much RAM?

    >> Also when this happens, the running daemon is apparently unable to
    >> execute new processes using system ("...") calls


    You mean "just before this happens"?
    It's hard to expect a process that has been killed with SIGKILL to
    be able to execute anything.

    >> but everything else seems to work. The number of processes is small,
    >> and the CPU load is small.

    >
    > Sounds like you've hit the limit for number of processes.


    Not really: hitting that limit will not in and of itself cause the
    process to be terminated with SIGKILL, and he stated that "number
    of processes is small".

    Cheers,
    --
    In order to understand recursion you must first understand recursion.
    Remove /-nsp/ for email.

  4. Re: anomalous SIGKILL

    Paul Pluzhnikov writes:

    > Måns Rullgård writes:
    >
    >> Mike writes:
    >>
    >>> ...got SIGKILL. Why is it doing this?

    >
    > Perhaps because OOM killer decided that you consumed too much RAM?
    >
    >>> Also when this happens, the running daemon is apparently unable to
    >>> execute new processes using system ("...") calls

    >
    > You mean "just before this happens"?
    > It's hard to expect process that has been killed with SIGKILL to
    > be able to execute anything.
    >
    >>> but everything else seems to work. The number of processes is small,
    >>> and the CPU load is small.

    >>
    >> Sounds like you've hit the limit for number of processes.

    >
    > Not really: hitting that limit will not in and of itself cause the
    > process to be terminated with SIGKILL, and he stated that "number


    I realised this just after posting. He could be hitting some other
    ulimit though.

    --
    Måns Rullgård
    mans@mansr.com

  5. Re: anomalous SIGKILL


    > > Perhaps because OOM killer decided that you consumed too much RAM?


    Possibly, but the "top" program shows the daemon using only 6% of
    memory.

    >
    > >>> Also when this happens, the running daemon is apparently unable to
    > >>> execute new processes using system ("...") calls

    >
    > > You mean "just before this happens"?
    > > It's hard to expect process that has been killed with SIGKILL to
    > > be able to execute anything.


    No, the daemon runs continuously. The user interface program runs
    occasionally (same executable program actually, but different
    filename, like gzip/gunzip). When the interface program won't run
    because it gets SIGKILL, that happens precisely when the daemon's
    system("...") calls fail also.
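
A minimal sketch of how a parent process can tell that a child died from a signal rather than exiting on its own. It is written in Python rather than with the C system(3) call the daemon uses, and the helper name describe_exit is made up for illustration. It relies on the subprocess module reporting a child killed by signal N as return code -N on POSIX systems.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Hypothetical child that sends SIGKILL to itself, standing in for
    # a program that gets killed from outside.
    child = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    print(describe_exit(subprocess.run(child).returncode))
```

Logging this for every command the daemon launches would show whether the children are being killed or are never started at all.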

    > >>> but everything else seems to work. The number of processes is small,
    > >>> and the CPU load is small.

    >
    > >> Sounds like you've hit the limit for number of processes.

    >
    > > Not really: hitting that limit will not in and of itself cause the
    > > process to be terminated with SIGKILL, and he stated that "number

    >
    > I realised this just after posting. He could be hitting some other
    > ulimit though.


    The programs are all run as root. "ulimit" returns "unlimited".
    However, killing other processes seems to temporarily solve the
    problem.

    -Mike


  6. Re: anomalous SIGKILL

    Mike writes:

    >> > You mean "just before this happens"?

    >
    > When the interface program won't run because it gets SIGKILL, that
    > happens precisely when the daemon system ("...") calls fail also.


    See if you can run 'strace -f -o /tmp/junk.trace -p <daemon-pid>' when
    the problem happens, and see if you can determine which part of
    system(3) is failing.

    Probably clone(2) is failing, but why? If it's ENOMEM, then something
    is taking up all that memory; if it's EAGAIN, too many processes
    (or perhaps too many threads).
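
The ENOMEM/EAGAIN distinction above can be sketched roughly as follows. This is Python, not the daemon's own code; the function names explain_fork_failure and try_fork are hypothetical, and the wording of the causes simply follows the reasoning in this post.

```python
import errno
import os


def explain_fork_failure(err: int) -> str:
    """Map an errno from a failed fork/clone to its likely cause."""
    if err == errno.ENOMEM:
        return "ENOMEM: the kernel could not find memory for the new process"
    if err == errno.EAGAIN:
        return "EAGAIN: a process or thread limit was probably reached"
    # Anything else: fall back to the symbolic name and system message.
    return f"{errno.errorcode.get(err, err)}: {os.strerror(err)}"


def try_fork() -> str:
    """Attempt one fork and report the outcome from the parent's side."""
    try:
        pid = os.fork()
    except OSError as exc:
        return explain_fork_failure(exc.errno)
    if pid == 0:
        os._exit(0)        # child: exit immediately
    os.waitpid(pid, 0)     # parent: reap the child
    return "fork succeeded"
```

Calling try_fork() from a test program while the problem is occurring would give the same answer strace gives, without having to read a trace file.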

    > The programs are all run as root. "ulimit" returns "unlimited".


    You want 'ulimit -a' (it's highly unusual for all the separate
    limits to be "unlimited").
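
For checking the same limits from inside a program, a minimal sketch using Python's standard resource module. The selection of limits and the labels are my own choice for illustration. The values are the raw getrlimit numbers, so the memory-related ones are in bytes, whereas the shell's ulimit prints them in kB.

```python
import resource

# A few of the limits 'ulimit -a' reports; the labels are illustrative.
LIMITS = {
    "open files (-n)": resource.RLIMIT_NOFILE,
    "max user processes (-u)": resource.RLIMIT_NPROC,
    "virtual memory (-v)": resource.RLIMIT_AS,
    "data seg size (-d)": resource.RLIMIT_DATA,
}


def fmt(value: int) -> str:
    """Render a limit value the way ulimit does for 'no limit'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits() -> dict:
    """Return {label: (soft, hard)} for this process, as strings."""
    out = {}
    for label, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        out[label] = (fmt(soft), fmt(hard))
    return out


if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print(f"{label:26} soft={soft:>20} hard={hard:>20}")
```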

    Cheers,
    --
    In order to understand recursion you must first understand recursion.
    Remove /-nsp/ for email.

  7. Re: anomalous SIGKILL

    On Oct 8, 3:07 am, Paul Pluzhnikov wrote:

    > You want 'ulimit -a' (it's highly unusual for all the separate
    > limits to be "unlimited").


    Ah yes, I see that 'ulimit -a' is not unlimited for root for:

    open files           (-n)              1024
    pipe size            (512 bytes, -p)   8
    max user processes   (-u)              10234

    However, I think that the problem is an out-of-memory condition
    caused by another program, the 'opera' browser. Killing the browser
    cleared up the problem previously, and then today the browser crashed
    while opening a new tab, producing an out-of-memory kernel message in
    "/var/log/messages":
    ...
    Oct 10 00:35:52 A241105 kernel: HighMem: 1*4kB 0*...
    Oct 10 00:35:52 A241105 kernel: Swap cache: add 339056, delete 339009, find 109385/123622, race 0+0
    Oct 10 00:35:52 A241105 kernel: Free swap: 0kB
    Oct 10 00:35:52 A241105 kernel: 327488 pages of RAM
    Oct 10 00:35:52 A241105 kernel: 98112 pages of HIGHMEM
    Oct 10 00:35:52 A241105 kernel: 3790 reserved pages
    Oct 10 00:35:52 A241105 kernel: 10488 pages shared
    Oct 10 00:35:52 A241105 kernel: 47 pages swap cached
    Oct 10 00:35:52 A241105 kernel: Out of Memory: Killed process 2218 (opera).
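
A small sketch for pulling OOM kills out of a log, in Python. The pattern matches only the message wording shown in the log above; other kernel versions phrase this line differently, so treat the regular expression as an assumption. The function name oom_victims is made up.

```python
import re

# Matches the wording in the log above; other kernels word it differently.
OOM_RE = re.compile(r"Out of Memory: Killed process (\d+)\s+\(([^)]+)\)")


def oom_victims(log_text: str) -> list:
    """Return (pid, name) for every OOM kill found in the log text."""
    return [(int(pid), name) for pid, name in OOM_RE.findall(log_text)]


sample = ("Oct 10 00:35:52 A241105 kernel: "
          "Out of Memory: Killed process 2218 (opera).")
print(oom_victims(sample))  # [(2218, 'opera')]
```

Running this over the whole of /var/log/messages would show whether other processes have been killed the same way.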


    This makes me wonder: if RAM size plus swap size were larger than
    4GB, would this still happen? 'Top' shows:

    Mem:  1295052k total, 1258848k used,   36204k free,  23956k buffers
    Swap:  793760k total,  793756k used,       4k free,  56492k cached

    -Mike


  8. Re: anomalous SIGKILL

    The problem has returned, without 'opera' running.

    'Top' shows zero free swap space. It shows the largest memory use
    by X, at 35%.

    As expected, I killed a non-essential process and the problem
    cleared.

    What is going on?

    -Mike




  9. Re: anomalous SIGKILL

    > The problem has returned, without 'opera' running.
    >
    > 'Top' shows zero free swap space. It shows the largest memory use
    > by X, at 35%.
    >
    > As expected, I killed a non-essential process and the problem
    > cleared.
    >
    > What is going on?


    "'Top' shows zero free swap space."

    The kernel believes that it is out of memory. Look in /proc/meminfo,
    "ps axl" (VSZ and RSS), "df" and /proc/mounts (for RAM-based filesystems
    such as /dev/shm, tmpfs, etc.) to see where the pages of RAM+swap went.
    Are any RAM disks in use? [/var/log/messages: "RAMDISK driver initialized:
    xx RAM disks of xxxxxK size 1024 blocksize"] Show "uname -a" and "uptime".
    Contrast with a freshly-booted system. Take snapshots once per hour;
    more often when swap drops below 15% free. For a suspicious process,
    consult /proc/<pid>/smaps.
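
The snapshot idea above could be sketched like this in Python. The function names are hypothetical, the 15% threshold is the one suggested in this post, and the parser assumes the usual "Key: value kB" layout of /proc/meminfo.

```python
import os


def parse_meminfo(text: str) -> dict:
    """Parse 'Key:   value kB' lines into {key: value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_free_percent(info: dict) -> float:
    """Percentage of swap still free; 100.0 if no swap is configured."""
    total = info.get("SwapTotal", 0)
    return 100.0 * info.get("SwapFree", 0) / total if total else 100.0


def swap_is_low(info: dict, threshold: float = 15.0) -> bool:
    """True when free swap has dropped below the threshold percentage."""
    return swap_free_percent(info) < threshold


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print(f"swap free: {swap_free_percent(info):.1f}%  low: {swap_is_low(info)}")
```

Run from cron once per hour, with the output appended to a file, this would give the history needed to see when the swap started to fill.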

    --


  10. Re: anomalous SIGKILL

    Update:
    The problem seems to stem from running a browser (firefox or opera),
    which then causes X to consume large amounts of memory. If those
    programs aren't run after boot-up, then the 800MB of swap never gets
    used at all.

    Oddly, right after reboot, even the RAM usage is small, which then
    gradually rises over many hours to consume almost the entire
    1.3GB of RAM, except for perhaps 20 or 30MB, and settles there.

    Does this seem like normal behavior?

    -Mike



  11. Re: anomalous SIGKILL

    Mike writes:

    > Update:
    > The problem seems to stem from running a browser (firefox or opera),
    > which then causes X to consume large amounts of memory. If those
    > programs aren't run after boot-up, then the 800MB of swap never gets
    > used at all.
    >
    > Oddly, right after reboot, even the RAM usage is small, which then
    > gradually rises over many hours to consume almost the entire
    > 1.3GB of RAM, except for perhaps 20 or 30MB, and settles there.
    >
    > Does this seem like normal behavior?


    Most of that used RAM is probably disk cache. Check /proc/meminfo for
    a breakdown.
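
As a rough illustration of that breakdown: memory actually held by programs is approximately the total minus what is free, in buffers, or in the page cache. The sketch below (Python, hypothetical function name) applies that to the figures from the 'top' output posted earlier in the thread. It is only an approximation, since it ignores details such as shared memory.

```python
def application_kb(total_kb: int, free_kb: int,
                   buffers_kb: int, cached_kb: int) -> int:
    """Memory held by programs: whatever is not free, buffers, or cache."""
    return total_kb - free_kb - buffers_kb - cached_kb


# Figures from the 'top' output posted earlier in the thread (kB).
print(application_kb(1295052, 36204, 23956, 56492))  # 1178400
```

At the time of that 'top' output the cache was small, so nearly all of the RAM really was in use by programs. Right after boot, a large "used" figure that is mostly cache is nothing to worry about.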

    --
    Måns Rullgård
    mans@mansr.com

  12. Re: anomalous SIGKILL

    On Oct 13, 1:53 pm, Måns Rullgård wrote:

    > Most of that used RAM is probably disk cache. Check /proc/meminfo for
    > a breakdown.


    Cached is 745MB, so it's more than half.

    Also, to answer John's question, there is no RAMDISK.

    My question now is, assuming there are no bugs causing a memory leak
    in the browser or X code, should it run out of swap space? Why?

    -Mike


  13. Re: anomalous SIGKILL

    Mike writes:

    > On Oct 13, 1:53 pm, Måns Rullgård wrote:
    >
    >> Most of that used RAM is probably disk cache. Check /proc/meminfo for
    >> a breakdown.

    >
    > Cached is 745MB, so it's more than half.
    >
    > Also, to answer John's question, there is no RAMDISK.
    >
    > My question now is, assuming there are no bugs causing a memory leak
    > in the browser or X code, should it run out of swap space? Why?


    If you try to run too many applications at once you can certainly run
    out of memory, both RAM and swap space. Your only choices are to not
    run all those apps at the same time, install more physical RAM in the
    machine, or create more swap space.

    --
    Måns Rullgård
    mans@mansr.com

  14. Re: anomalous SIGKILL

    On Oct 13, 5:53 pm, Måns Rullgård wrote:
    > Mike writes:


    > > My question now is, assuming there are no bugs causing a memory leak
    > > in the browser or X code, should it run out of swap space? Why?

    >
    > If you try to run too many applications at once you can certainly run
    > out of memory, both RAM and swap space. Your only choices are to not
    > run all those apps at the same time, install more physical RAM in the
    > machine, or create more swap space.


    OK, thanks for the advice. I'll try more swap space.
    However, I don't consider the computer to be heavily loaded.

    Also, it seems odd to me that adding just one more application
    (firefox or the opera browser) gradually causes all of the 800MB of
    swap space to be used up, when otherwise it is never used.

    -Mike


  15. Re: anomalous SIGKILL

    Mike writes:

    > Also, it seems odd to me, that by adding just one
    > more application (firefox or the opera browser), that this
    > gradually causes all of the 800MB swap space to be
    > used up, when otherwise it is never used.


    Running out of swap is almost certainly caused by a leak in either
    the browser(s) or the X server.

    What can you do about it? One of your messages implies that X is
    "bigger" than firefox.

    You may want to make sure you have the latest X; but if that's
    still showing the problem, you can either try to analyze X for
    leaks, or give up and just not run firefox/opera on that system
    (or restart X daily).

    Cheers,
    --
    In order to understand recursion you must first understand recursion.
    Remove /-nsp/ for email.

  16. Re: anomalous SIGKILL

    On Oct 13, 9:30 pm, Paul Pluzhnikov wrote:

    > or give up and just not run firefox/opera on that system
    > (or restart X daily).


    Update:
    I have discovered that the konqueror browser does not cause the
    problem. It comes with KDE for SuSE Linux, but firefox and opera were
    downloaded separately. With konqueror, no swap space gets used. This
    appears to be true even when viewing weather satellite animations,
    which I think are fairly big memory consumers.

    -Mike

