SuSE SLES 9 && Java sometimes terminating with SIG_KILL - Linux



  1. SuSE SLES 9 && Java sometimes terminating with SIG_KILL


    Hello,

    We have in this environment:

    # cat /etc/SuSE-release
    SUSE LINUX Enterprise Server 9 (i586)
    VERSION = 9
    # uname -a
    Linux L000SA03 2.6.5-7.139-bigsmp #1 SMP Fri Jan 14 15:41:33 UTC 2005 i686 i686 i386 GNU/Linux
    # /usr/local/j2sdk1.4.2_03/bin/java -version
    java version "1.4.2_03"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
    Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)

    From time to time (roughly once every five days) we find that
    the java process has been killed with SIGKILL (signal 9). It took
    us some time to figure this out, because the process simply
    disappears; only by starting it from a small shell wrapper were we
    able to catch its exit status of 137 (128 + 9).
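
The wrapper itself is not shown in the post. As a rough illustration of where a value like 137 comes from, here is a minimal C sketch of such a wrapper. The helper name run_and_report and the WRAPPER_MAIN build switch are hypothetical, and this is only one possible way to do it, not the script actually used. It forks, executes the command, waits for it, and maps death by a signal to 128 plus the signal number, which is the value a shell reports in $?.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with arguments argv and return the value
 * a shell would show in $?: the exit code, or 128 + N if signal N killed it.
 * Returns -1 if the child could not be started or waited for. */
static int run_and_report(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        perror("execvp");      /* only reached if exec failed */
        _exit(127);
    }
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    if (WIFSIGNALED(status)) {
        fprintf(stderr, "pid %ld terminated by signal %d\n",
                (long)pid, WTERMSIG(status));
        return 128 + WTERMSIG(status);   /* SIGKILL (9) gives 137 */
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

#ifdef WRAPPER_MAIN
/* Build with -DWRAPPER_MAIN to use as: ./wrapper command [args...] */
int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 2;
    }
    return run_and_report(&argv[1]);
}
#endif
```

Logging the signal number together with a timestamp in such a wrapper makes it easier to line the kill up with cron jobs or other scheduled activity.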

    The system has enough real memory:

    # dmesg | fgrep -i memory
    Memory: 7264160k/8126464k available (2340k kernel code, 74444k reserved, 973k data, 252k init, 6422128k highmem)

    and a swap device of some 16 GByte. Monitoring the moment of the
    kill with vmstat 5 ... does not show any problem either (the kill
    happened within this time frame):

    Sa Sep 9 11:14:53 CEST 2006
    procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
     r  b  swpd     free   buff    cache  si  so  bi     bo  in  cs  us  sy  id  wa
     0  0  2564  1061448  56884  5586524   0   0   1      1   1   1   2   1  96   1
     0  0  2564  1061448  56884  5586524   0   0  11     23   0   0   1   1  98   0
     0  1  2564  1057640  57016  5589476   0   0   2     41   0   0   0   2  97   0
     1  0  2564  1058216  57028  5589464   0   0  33  21599   0   0   2   1  94   3
     0  0  2564  1058312  57032  5589460   0   0   3     58   0   0   2   1  97   0
     0  0  2564  1058440  57032  5589460   0   0   7      9   0   0   1   1  98   0
     0  0  2564  1058312  57032  5589460   0   0   4     13   0   0   1   1  98   0
     0  0  2564  1058440  57032  5589460   0   0   4      9   0   0   1   1  98   0
     0  0  2564  1058184  57032  5589460   0   0   8     13   0   0   1   1  98   0
     0  0  2564  1058312  57036  5589456   0   0   5     12   0   0   1   1  98   0
     0  0  2564  1058312  57036  5589456   0   0   4     30   0   0   1   1  99   0
     0  0  2564  1058568  57036  5589456   0   0   8     20   0   0   1   1  97   0
    Sa Sep 9 11:15:48 CEST 2006

    The peak in 'bo' is caused by a cron-triggered copy of transaction
    logs within the file system; I mention it only for completeness.

    What can we do to nail this down?
    Thx

    matthias


    --
    Matthias Apitz
    Manager Technical Support - OCLC PICA GmbH
    Gruenwalder Weg 28g - 82041 Oberhaching - Germany
    t +49-89-61308 351 - f +49-89-61308 399 - m +49-170-4527211
    e - w http://www.oclcpica.org/ http://guru.UnixLand.de/

  2. Re: SuSE SLES 9 && Java sometimes terminating with SIG_KILL

    Matthias Apitz writes:

    >Hello,


    >We have in this environment:


    ># cat /etc/SuSE-release
    >SUSE LINUX Enterprise Server 9 (i586)
    >VERSION = 9
    ># uname -a
    >Linux L000SA03 2.6.5-7.139-bigsmp #1 SMP Fri Jan 14 15:41:33 UTC 2005 i686 i686 i386 GNU/Linux
    ># /usr/local/j2sdk1.4.2_03/bin/java -version
    >java version "1.4.2_03"
    >Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
    >Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)


    >From time to time (roughly once every five days) we find that
    >the java process has been killed with SIGKILL (signal 9). It took
    >us some time to figure this out, because the process simply
    >disappears; only by starting it from a small shell wrapper were we
    >able to catch its exit status of 137 (128 + 9).


    We still have not found a solution for the above problem. We
    know that it is not the OOM killer, because there are no kernel
    messages about it, and because the kills stop when we run the
    java process under a different user ID from all other parts of
    the application software. This makes me think that some part of
    the application software is killing the wrong (or a random)
    process, either from a shell script or via the kill(2) syscall
    from the application code written in C.

    Is there some way to get the kernel to log each kill(2) syscall
    without changing the kernel source?
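
To make the suspicion about a misdirected kill(2) more concrete: the pid argument of kill(2) does not always name a single process, so a zero or negative value (for example from a failed fork() or a badly parsed PID file) reaches far more than intended, and only processes the caller is permitted to signal are affected. The following C sketch lists the POSIX cases and shows a defensive wrapper. The names classify_kill_pid and checked_kill are hypothetical; nothing here is taken from the application in question, and it is not a diagnosis of the actual fault.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* How kill(2) interprets its pid argument (POSIX semantics). */
enum kill_scope {
    KILL_ONE_PROCESS,      /* pid > 0:   exactly that process */
    KILL_OWN_GROUP,        /* pid == 0:  the caller's process group */
    KILL_ALL_PERMITTED,    /* pid == -1: every process the caller may signal */
    KILL_OTHER_GROUP       /* pid < -1:  the process group with ID -pid */
};

static enum kill_scope classify_kill_pid(pid_t pid)
{
    if (pid > 0)
        return KILL_ONE_PROCESS;
    if (pid == 0)
        return KILL_OWN_GROUP;
    if (pid == -1)
        return KILL_ALL_PERMITTED;
    return KILL_OTHER_GROUP;
}

/* Hypothetical guard: only forward the call when pid names a single
 * process other than init; otherwise fail with EINVAL instead of
 * signalling a whole group. */
static int checked_kill(pid_t pid, int sig)
{
    if (pid <= 1) {
        errno = EINVAL;
        return -1;
    }
    return kill(pid, sig);
}
```

A guard like this does not protect against a stale PID that has since been reused by another process; for that case the caller would have to verify the identity of the target before signalling it.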

    thx

    matthias


  3. Re: SuSE SLES 9 && Java sometimes terminating with SIG_KILL

    Matthias Apitz writes:
    > Matthias Apitz writes:
    >>We have in this environment:

    >
    >># cat /etc/SuSE-release
    >>SUSE LINUX Enterprise Server 9 (i586)
    >>VERSION = 9
    >># uname -a
    >>Linux L000SA03 2.6.5-7.139-bigsmp #1 SMP Fri Jan 14 15:41:33 UTC 2005 i686 i686 i386 GNU/Linux
    >># /usr/local/j2sdk1.4.2_03/bin/java -version
    >>java version "1.4.2_03"
    >>Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_03-b02)
    >>Java HotSpot(TM) Client VM (build 1.4.2_03-b02, mixed mode)

    >
    >>From time to time (roughly once every five days) we find that
    >>the java process has been killed with SIGKILL (signal 9). It took
    >>us some time to figure this out, because the process simply
    >>disappears; only by starting it from a small shell wrapper were we
    >>able to catch its exit status of 137 (128 + 9).

    >
    > We still have not found a solution for the above problem. We
    > know that it is not the OOM killer, because there are no kernel
    > messages about it, and because the kills stop when we run the
    > java process under a different user ID from all other parts of
    > the application software. This makes me think that some part of
    > the application software is killing the wrong (or a random)
    > process, either from a shell script or via the kill(2) syscall
    > from the application code written in C.
    >
    > Is there some way to get the kernel to log each kill(2) syscall
    > without changing the kernel source?


    No. The top-level 'kill system call' is the sys_kill routine in
    .../kernel/signal.c, quoted below:

    asmlinkage long
    sys_kill(int pid, int sig)
    {
        struct siginfo info;

        info.si_signo = sig;
        info.si_errno = 0;
        info.si_code = SI_USER;
        info.si_pid = task_tgid_vnr(current);
        info.si_uid = current->uid;

        return kill_something_info(sig, &info, pid);
    }

    Adding a printk roughly like

        printk(KERN_DEBUG "%d killing %d with signal %d\n",
               info.si_pid, pid, sig);

    shouldn't be too complicated (KERN_DEBUG usually gives the
    message a priority low enough that it is not printed on the
    console).
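
The si_pid and si_code fields that sys_kill fills in are also handed to a user-space handler installed with SA_SIGINFO, so for signals that can be caught the receiver can record who sent them without touching the kernel. The C sketch below illustrates this; the names install_sender_logger, last_sender_pid and last_code are hypothetical. Note the important limitation: SIGKILL can be neither caught nor blocked, so this would only help if the stray signal were a catchable one such as SIGTERM.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Filled in by the handler; sig_atomic_t keeps the stores async-signal-safe. */
static volatile sig_atomic_t last_sender_pid = -1;
static volatile sig_atomic_t last_code = 0;

/* Record the sender's PID and the origin code (SI_USER for kill(2)). */
static void on_signal(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)context;
    last_sender_pid = (sig_atomic_t)info->si_pid;
    last_code = (sig_atomic_t)info->si_code;
}

/* Hypothetical helper: install the recording handler for one signal.
 * Returns 0 on success, -1 on failure (always the case for SIGKILL). */
static int install_sender_logger(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

In a real program the main loop would check last_sender_pid and write it to a log, since calling stdio functions from inside the handler is not async-signal-safe.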

  4. Re: SuSE SLES 9 && Java sometimes terminating with SIG_KILL

    Rainer Weikusat writes:

    >> is there some way to get the kernel logging each kill(2) syscall
    >> without changing the source of the kernel?


    >No. The top-level 'kill system call' is the sys_kill routine in
    >../kernel/signal.c, quoted below:


    ...

    >shouldn't be too complicated (KERN_DEBUG usually gives the
    >message a priority low enough that it is not printed on the
    >console).


    Thanks for the feedback and the hint; I was looking for a way
    that avoids recompiling the kernel, because this problem occurs
    only on one or two production systems at a customer site.

    matthias

  5. Re: SuSE SLES 9 && Java sometimes terminating with SIG_KILL

    Matthias Apitz writes:
    > Rainer Weikusat writes:
    >
    >>> is there some way to get the kernel logging each kill(2) syscall
    >>> without changing the source of the kernel?

    >
    >>No. The top-level 'kill system call' is the sys_kill routine in
    >>../kernel/signal.c, quoted below:

    >
    > ...
    >
    >>shouldn't be too complicated (KERN_DEBUG usually gives the
    >>message a priority low enough that it is not printed on the
    >>console).

    >
    > Thanks for the feedback and the hint; I was looking for a way
    > that avoids recompiling the kernel, because this problem occurs
    > only on one or two production systems at a customer site.


    Although it is probably of no use in this particular situation,
    it should be possible to achieve this with 'kprobes'.
