Dying processes (inetd, cron, syslogd, sshd) - SCO



Thread: Dying processes (inetd, cron, syslogd, sshd)

  1. Dying processes (inetd, cron, syslogd, sshd)

    Hi,

    We are having problems on various SMP machines (5.0.6a + rs506a
    installed) where, at times of large load, most of the running processes
    just seem to stop (e.g. inetd, cron, syslogd, sshd, ...). This always
    seems to occur at times of large stress on the disks, but we have never
    managed to put our finger on exactly what is causing it. When it does
    happen, not only does the inetd process die, but also cron and syslog,
    which makes it very tricky for us to put anything in place to try and
    catch what is happening.

    We are able to ping the machine when it happens, and also log in at
    the console and over a modem, but not over a telnet or ssh connection.

    We have had an issue open with SCO before who advised us to install
    scodb and set it to trigger when the inetd process stops - and when it
    does to get a sysdump. We have tried this, but the sysdump created was
    too big for swap - do you know of any way from within scodb to reduce
    the size of the sysdump created?

    This machine (which has had the problem once a day for the last three
    days) is used as a backup server in our office. All that runs on it is
    two rsyncs of our main machine - one for the mail/uucp spools, and one
    for the main data. The problem has always occurred during these rsyncs,
    normally when transferring a large file.

    Any help on this would be much appreciated - or even advice as to which
    way to look to find the answer to what is happening would be much
    appreciated.

    Again, thanks,

    Keith


  2. Re: Dying processes (inetd, cron, syslogd, sshd)

    On Mon, 1 Aug 2005 17:13:24 +0100, keith@actual-systems.com wrote
    (in message <1122912804.902870.308720@g44g2000cwa.googlegroups. com>):

    > We are having problems on various SMP machines (5.0.6a + rs506a
    > installed) where at times of large load most of the running processes
    > just seem to stop (e.g. inetd, cron, syslogd, sshd,....) This always
    > seems to occur at times of large stress to the disks, but we have never
    > managed to put our fingers on exactly what is causing it. When it does
    > happen not only does the inetd process die, but also cron and syslog
    > which makes it very tricky for us to put anything in place to try and
    > catch what is happening.
    >
    > We are able to ping the machine when it happens, and also log in at
    > the console and over a modem, but not over a telnet or ssh connection.


    When you say stop, do you mean that the process dies (quits) or is still
    there but no longer doing anything ?

    Can you kill a process which has stopped ?


    It sounds similar to a problem we had at work when we first got our IBM
    server (X250) with ServRaid controller. There was a known bug in the firmware
    which would result in disk writes failing to complete in a random and
    intermittent manner - it was fixed by a firmware update.

    The symptoms were that every now and then (fortunately it came to light
    during pre-changover testing) there would be an error in the raid controller
    log, NOT the OS logs, and the write would never complete. So the system would
    be in a position where the raid controller thinks it's had an error and never
    completes the operation, and the OS is waiting for the controller to either
    complete or report an error (which it never did). The result of this is that
    whatever process was trying to write would halt waiting for the IO to
    complete, and since it was waiting at kernel level (I assume) the process was
    unkillable.

    What's more, any other process that tried to access the block in question
    would also halt waiting for the IO request to complete. One day, the fault
    was somewhere important and over a period of a few hours, every process 'just
    stopped' and the only thing left was to hit the power switch. When the thing
    rebooted and we looked at the data, it became apparent that the system had
    stopped writing data the day before (I guess the buffer flushing process got
    stuck) !

    Simon


  3. Re: Dying processes (inetd, cron, syslogd, sshd)

    When it happens the processes are no longer there - they just
    disappear.

    The machine is still running and we can login to the console and work
    so it sounds different to your issue.

    Keith


  4. Re: Dying processes (inetd, cron, syslogd, sshd)

    Anyone have any ideas on this problem?

    What would be the outcome if you had one process that kept on wanting
    more and more resource?

    Do other processes hold onto the resource they have or will they
    eventually get 'bullied' out of the resource they are using and
    essentially stop (which theoretically would give the results I am
    seeing.)

    Any ideas? Or does anyone have any suggestions as to how I could track
    down what is causing this to happen?

    Thanks again,

    Keith


  5. Re: Dying processes (inetd, cron, syslogd, sshd)

    keith@actual-systems.com wrote:

    > Anyone have any ideas on this problem?


    I posted on August 1st, but never saw it come back to me. This time I'm
    Bcc'ing you so you'll see it even if USENET swallows it again...

    keith@actual-systems.com wrote:

    > We are having problems on various SMP machines (5.0.6a + rs506a
    > installed) where at times of large load most of the running processes
    > just seem to stop (e.g. inetd, cron, syslogd, sshd,....) This always
    > seems to occur at times of large stress to the disks, but we have never
    > managed to put our fingers on exactly what is causing it. When it does
    > happen not only does the inetd process die, but also cron and syslog
    > which makes it very tricky for us to put anything in place to try and
    > catch what is happening.
    >
    > We are able to ping the machine when it happens, and also log in at
    > the console and over a modem, but not over a telnet or ssh connection.
    >
    > We have had an issue open with SCO before who advised us to install
    > scodb and set it to trigger when the inetd process stops - and when it
    > does to get a sysdump. We have tried this, but the sysdump created was
    > too big for swap - do you know of any way from within scodb to reduce
    > the size of the sysdump created?
    >
    > This machine (which has had the problem once a day for the last three
    > days) is used as a backup server in our office. All that runs on it is
    > two rsyncs of our main machine - one for the mail/uucp spools, and one
    > for the main data. The problem has always occurred during these rsyncs,
    > normally when transferring a large file.


    scodb can't reduce the size of a crash dump, but you can force the dumps
    to fit by limiting the amount of memory seen by the kernel. To do this,
    append " mem=1m-100m" to DEFBOOTSTR in /etc/default/boot (substituting a
    bit less than the actual size of your dump area in place of "100m").

    The load you describe would probably run in 12MB of RAM, but don't limit
    memory more than you have to. The problem might be memory size-related.
    You want to keep as much as you can of the machine's normal memory size.
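    The sizing rule above can be sketched as a tiny script (the `mem_limit`
    helper name is made up for illustration; the real step is still appending
    the resulting string to DEFBOOTSTR in /etc/default/boot yourself):

```shell
#!/bin/sh
# Build the "mem=" boot-string limiter described above: cap kernel-visible
# memory at the dump (swap) area size, shorted by 1MB to be safe.
# mem_limit is a hypothetical helper, not an OSR5 command.
mem_limit() {
    swapmb=$1                      # dump area size in MB
    echo "mem=1m-$((swapmb - 1))m"
}

mem_limit 100    # prints: mem=1m-99m
```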

    [new material begins]

    > What would be the outcome if you had one process that kept on wanting
    > more and more resource?


    There are some problem scenarios like that. A common one is a process
    spinning out of control, allocating more and more memory. It will
    eventually use all available memory; its next allocation attempt will
    fail, and in most cases it will then die. Unless you have changed the
    defaults, such a process usually writes a core dump. On OSR5, during
    the dumping of a process's core, the process continues to own all of its
    memory until the dump is complete. This means that the machine remains
    critically out of memory for a long time. The process may have grown
    nearly as large as your combined RAM + swap. To dump it, not only does
    the kernel have to write that much data, it also may have to page a
    large portion of it in from swap. This can take many minutes with large
    memory and a slow disk...

    During that period, other processes that try to allocate memory will
    usually fail. Their subsequent behavior depends on their error
    handling. Some will dump core, some will exit gracefully, some may even
    stay up. And some may get into weird catatonic states.

    > Do other processes hold onto the resource they have or will they
    > eventually get 'bullied' out of the resource they are using and
    > essentially stop (which theoretically would give the results I am
    > seeing.)


    For memory, a "hog" process will cause others to get written out to
    swap, but those processes still "own" their memory (it will get paged
    back in if they need to access it). The troubles happen when a process
    tries to allocate more memory while the system is strapped.

    There are probably other resources where similar things could happen.

    > Any ideas? Or does anyone have any suggestions as to how I could track
    > down what is causing this to happen?


    If you had a process spin out and dump, it would leave a huge core file
    that you would be able to find. If a process spins out and dies
    _without_ leaving a dump, a more subtle trace is left. Normally, OSR5
    doesn't use any swap at all; `swap -l` will have identical values in the
    "blocks" and "free" columns. ("Normal" modern systems have enough RAM
    that they never need to invoke the tremendous performance loss of
    swapping.) After such an incident, `swap -l` will show quite a bit of
    swap in use. This represents pages that got pushed out, and whose
    processes have never actually needed to access them since the incident.
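    That after-the-fact check can be scripted; a sketch, assuming the last
    two `swap -l` columns are "blocks" and "free" as described above
    (`swap_in_use` is a made-up helper name; verify the column layout
    against your own system's output):

```shell
#!/bin/sh
# Report whether any swap has been used, given `swap -l` style output on
# stdin. Assumes the last two columns are "blocks" and "free"; if free is
# ever less than blocks, some pages have been pushed out at some point.
swap_in_use() {
    awk 'NR > 1 { blocks = $(NF-1); free = $NF;
                  if (free < blocks) used = 1 }
         END { print (used ? "swap has been used" : "swap untouched") }'
}

# Example with canned output resembling an OSR5 `swap -l`:
swap_in_use <<'EOF'
path             dev  swaplo blocks  free
/dev/swap        1,41      0 524288 524288
EOF
# prints: swap untouched
```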

    What does `crash` "p" show in the "EVENT" column for the hung processes?

    >Bela<


  6. Re: Dying processes (inetd, cron, syslogd, sshd)

    Thanks Bela.

    I will give those things a try and get back to you on the results.

    Hope all is going well for you now that you have left SCO.

    Keith


    Bela Lubkin wrote:
    > [full quote of the previous message trimmed]


  7. Re: Dying processes (inetd, cron, syslogd, sshd)

    Hi,
    Just a couple of updates on the suggestions Bela gave.

    I've done a 'find' for any core files and didn't find any new ones, so
    they are not being created. I have yet to have the problem happen again,
    so I haven't done a 'swap -l' to see if any swap is in use when the
    problem occurs.

    While I've not used 'crash' to see what the EVENT column shows, I don't
    actually think it will show anything, as a ps done after the problem has
    occurred doesn't show the processes - they have just disappeared.

    I've set 'mem=1m-100m' on both my machines here and will try to force
    the problem to happen tonight; hopefully I will have more questions
    tomorrow.

    Thanks again,

    Keith


  8. Re: Dying processes (inetd, cron, syslogd, sshd)

    keith@actual-systems.com wrote:

    > I've done a 'find' for any core files and didn't find any new ones, so
    > they are not being created. I have yet to have the problem happen again,
    > so I haven't done a 'swap -l' to see if any swap is in use when the
    > problem occurs.


    Do at least one before the system goes weird, so you'll know what's
    normal.

    > While I've not used 'crash' to see what the EVENT column shows, I don't
    > actually think it will show anything, as a ps done after the problem has
    > occurred doesn't show the processes - they have just disappeared.


    Ah, I don't think I had understood properly. Disappearing processes
    could have been killed by a low-memory situation, but also by another
    process deliberately killing them, or all sorts of other things.

    > I've set 'mem=1m-100m' on both my machines here and will try to force
    > the problem to happen tonight; hopefully I will have more questions
    > tomorrow.


    Is that the actual size of swap on the machines? Like I said, you
    shouldn't restrain memory tighter than necessary: use "mem=1m-%dm",
    filling in the actual size of swap on each machine (short it by 1MB to
    be sure).

    >Bela<


  9. Re: Dying processes (inetd, cron, syslogd, sshd)

    I have now had the problem happen again and have a sysdump created from
    within scodb (the watchpoint dropped us into debug mode when the inetd
    process stopped).

    I have a few questions now.
    1. From looking at this sysdump how can I tell what has caused this
    process to stop running?
    2. Bela, in your previous answer you mentioned that 'Disappearing
    processes could have been killed by a low-memory situation' - is there
    any way to tell if this is what has happened (or does the answer to the
    first question give me the answer to the second....)

    Again thanks for all your help,

    Keith


  10. Re: Dying processes (inetd, cron, syslogd, sshd)

    keith@actual-systems.com wrote:

    > I have now had the problem happen again and have a sysdump created from
    > within scodb (the watchpoint dropped us into debug mode when the inetd
    > process stopped).
    >
    > I have a few questions now.
    > 1. From looking at this sysdump how can I tell what has caused this
    > process to stop running?


    Exactly how did you trigger scodb on death of the process? If you had
    it catch the right thing, you should be able to see what signal caused
    the death. If that signal was raised by the process's own action (e.g.
    `kill(getpid(), SIGTERM)', or by trying to allocate memory when none was
    available) -- the evidence will be on its own stack. If it was killed
    by another process (`kill(1234, SIGTERM)'), that process's stack will
    tell the story.

    Start by doing a "stack -p 1234", where 1234 is the PID of inetd.

    > 2. Bela, in your previous answer you mentioned that 'Disappearing
    > processes could have been killed by a low-memory situation' - is there
    > any way to tell if this is what has happened (or does the answer to the
    > first question give me the answer to the second....)


    It might. But you can also examine the system's memory state. The 4
    most interesting variables are the ones shown by `sar -r`, although one
    is spelled differently: freemem, freeswap, availrmem and availsmem. You
    can display them by just entering their names:

    scodb:1> freemem
    48E4B
    scodb:2> freeswap
    100000
    scodb:3> availrmem
    5DBE6
    scodb:4> availsmem
    146949

    That corresponds to:

    $ sar -r 1 1

    SCO_SV deeptht 3.2v5.0.6 i80386 08/12/2005

    15:40:42 freemem freeswp availrmem availsmem (-r)
    15:40:43 299572 8388608 383979 1337833

    `scodb` output is in hexadecimal. All of the variables are in 4K
    pages; `sar -r` output is in 4K pages except "freeswp", which is in
    512-byte sectors, so we need to multiply it by 8:

    $ bc
    ibase=16
    48E4B; 100000*8; 5DBE6; 146949
    298571
    8388608
    383974
    1337673

    The numbers are slightly different because I'm looking at a live system.
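    As an aside, the same hex-to-decimal conversion can be done with a shell
    printf instead of bc (`hex2dec` is an ad-hoc helper name; the values are
    the ones from the scodb session above):

```shell
#!/bin/sh
# scodb prints kernel variables in hex; convert to decimal to compare with
# `sar -r`. The kernel's freeswap is in 4K pages, while sar's freeswp is in
# 512-byte sectors, so multiply the page count by 8.
hex2dec() { printf '%d\n' "0x$1"; }

hex2dec 48E4B                       # freemem in pages          -> 298571
echo $(( $(hex2dec 100000) * 8 ))   # freeswap pages as sectors -> 8388608
```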

    If your system is deeply out of memory, `availsmem' will be near 0.
    Anything under about 1000 is desperate. `freemem' and `freeswap' may or
    may not be near 0, depending on exactly what the memory-eating process
    is doing. `availrmem' is unlikely to be low.
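    That threshold check can be automated against `sar -r`-style output; a
    sketch, where the 1000-page cutoff is the rule of thumb above, the column
    layout is assumed from the sar output shown earlier, and
    `check_availsmem` is a made-up name:

```shell
#!/bin/sh
# Flag a desperate memory situation from `sar -r`-style output on stdin.
# Assumes data lines have a numeric second field and availsmem (in 4K
# pages) as the final column; "desperate" means under ~1000 pages.
check_availsmem() {
    awk 'NF >= 5 && $2 ~ /^[0-9]+$/ { availsmem = $NF }
         END { print (availsmem < 1000 ? "desperate" : "ok") }'
}

check_availsmem <<'EOF'
15:40:42 freemem freeswp availrmem availsmem (-r)
15:40:43  299572 8388608    383979   1337833
EOF
# prints: ok
```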

    Lack of memory is just one possible cause...

    Show me the "stack -p ..." output for inetd. That'll give me some
    idea...

    >Bela<


  11. Re: Dying processes (inetd, cron, syslogd, sshd)

    Bela,
    We were tripped into scodb as we had setup a watch point on the inetd
    process.

    We work out the pid of inetd using crash (p | grep inetd) - then we
    convert it to hex and go into scodb.

    -------------
    # crash
    dumpfile = /dev/mem, namelist = /unix, outfile = stdout
    > p | grep inetd

    29 s 256 1 256 0 76 0 selwait inetd load nxec
    > q


    (slot 29 = 1D in hex)

    Ctrl+alt+D
    debug0:1> bp wl &proc[1d].p_utime
    debug0:2> q
    --------------------------------
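    The decimal-to-hex conversion in the step above can be done with printf
    rather than by hand (`to_hex` is an ad-hoc name):

```shell
#!/bin/sh
# crash's `p' prints slot numbers and PIDs in decimal, but scodb
# expressions like `proc[1d]' want hex. printf %x does the conversion.
to_hex() {
    printf '%x\n' "$1"
}

to_hex 29     # prints: 1d
to_hex 327    # prints: 147
```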

    Then once this problem occurred we were in debug mode and I did the
    following to get the dumpfile I have.

    -------------------
    debug0:4> bp dis *
    debug0:5> sysdump()
    debug0:6> bc *
    debug0:7> q
    # sysdump -i /dev/swap -fbhm -o /usr/tmp/dump
    ---------------------


    Now the following is the output when I used scodb to look at the
    dumpfile I have - I would have followed your advice about getting the
    PID of inetd, but I think the process must have already stopped by the
    time I had the dump, so I just did 'stack' to give you everything and
    then 'p' from within crash to get all the running processes.

    ----------------------------------------------------------------
    back:/usr/tmp # scodb -d dump
    dumpfile = dump
    namelist = dump
    stunfile = dump
    varifile = dump
    PID 0147: /etc/inetd
    scodb:1>
    scodb:1> stack
    E0000DC0 exit2(0, 9) <- psig+18F
    E0000DE8 psig(? 8057644, 0, 0, F01D4D98) <- systrap+39F
    E0000E10 systrap(E0000E1C) <- scall_nokentry+14
    scodb:2> q


    # crash -d dump
    dumpfile = dump, namelist = dump, outfile = stdout
    > p

    PROC TABLE SIZE = 83
    SLOT ST   PID  PPID  PGRP UID PRI CPU EVENT      NAME          FLAGS
       0  s     0     0     0   0  95   0 runout     sched         load sys lock nwak
       1  r     1     0     0   0  80   0            init          load nwak
       2  s     2     0     0   0  95   0 vhand      vhand         load sys lock nwak nxec
       3  r     3     0     0   0  95   0            bdflush       load sys lock nwak nxec
       4  r     4     0     0   0  36   0 CPU1       idle proc     load sys lock nxec 2000000
       5  p     5     0     0   0  36   0 CPU2       idle proc     load sys lock nxec 2000000
       6  s     6     0     0   0  95   0 kmd_id     kmdaemon      load sys lock nwak nxec
       7  r     7     1     0   0  95   0            htepi_daemon  load sys lock nwak
       8  s     8     0     0   0  95   0 pbintrpool strd          load sys lock nwak nxec
      10  r    55     1    55   0  73   0            ifor_pmd      load nxec exit
      11  z    57    55    55   0  76   0            zombie        nou nxec exit
      12  r    52     1    50   0  80   0            syslogd       load nwak nxec exit
      13  s    43     1     0   0  95   0 0xc1031150 htepi_daemon  load sys lock nwak
      14  r   524     1   524   0  75   0            getty         load
      15  r    85     1    71   0  75   0            strerr        load exit
      18  r   525     1   525   0  75   0            getty         load
      20  r   526     1   526   0  75   0            getty         load
      21  r   276     1   276   0  81   0            cron          load nwak nxec exit
      25  s   206     1     0   0  95   0 0xc1035150 htepi_daemon  load sys lock nwak
      26  r   210     1     0   0  95   0            htepi_daemon  load sys lock nwak
      29  z   568     1     0 204  76   0            zombie        nou exit
      31  z   384     1   384   0  76   0            zombie        nou nxec exit
      33  z   382     1   382   0  76   0            zombie        nou nxec exit
      34  p   327     1   327   0  76   0            inetd         load nxec exit
      35  r 17987     1 17987   0  80   0            getty         load nwak
      41  r   392     1   384   0  75   0            lockd         load nxec exit
      42  r   393   392   384   0  75   0            lockd         load nxec
      43  r   394   392   384   0  75   0            lockd         load nxec
      44  r   395   392   384   0  75   0            lockd         load nxec
      45  r   396   392   384   0  75   0            lockd         load nxec
      47  r   434     1   432   0  80   0            calserver     load nwak nxec exit
      48  r 17869     1   276   0  76   0            rsync         load exit
      49  r   442     1   442   0  81   0            caldaemon     load nwak exit
      50  r   449     1   449   0  66   0            prngd         load nxec exit
      53  r   527     1   527   0  75   0            getty         load
      54  r   507     1   507   0  76   0            sshd          load nxec exit
      55  r   528     1   528   0  75   0            getty         load
      56  r   529     1   529   0  75   0            getty         load
      57  r   530     1   530   0  75   0            getty         load
      58  r   531     1   531   0  75   0            getty         load
      59  r   532     1   532   0  75   0            getty         load
      60  r   533     1   533   0  75   0            getty         load
      61  r   534     1   534   0  75   0            getty         load
      62  r   535     1   535   0  80   0            sdd           load nwak exit
      63  z 17871 17869   276   0  88   0            zombie        nou exit
      64  z 17240   276   276   0  73   0            zombie        nou exit
      65  r 17872     1   276   0  75   0            rcmd          load nxec exit
      68  s 17900 17869   276   0  95   0 0xfc2e3b90 rsync         load nwak nxec
    > q

    ------------------------

    And also looking at the memory as suggested shows it wasn't critically
    close to its limits (unless this is showing the freemem etc. after the
    inetd process had stopped.)

    ------------------------
    scodb:2> freemem
    6EEA0
    scodb:3> freeswap
    100000
    scodb:4> availrmem
    6FB0C
    scodb:5> availsmem
    16E941
    ------------------------


    As always your help here is very much appreciated.

    Thanks,

    Keith


  12. Re: Dying processes (inetd, cron, syslogd, sshd)

    keith@actual-systems.com wrote:

    > We were tripped into scodb as we had setup a watch point on the inetd
    > process.
    >
    > We work out the pid of inetd using crash (p | grep inetd) - then we
    > convert it to hex and go into scodb.
    >
    > -------------
    > # crash
    > dumpfile = /dev/mem, namelist = /unix, outfile = stdout
    > > p | grep inetd

    > 29 s 256 1 256 0 76 0 selwait inetd load nxec
    > > q

    >
    > (PID 29=1D in hex)
    >
    > Ctrl+alt+D
    > debug0:1> bp wl &proc[1d].p_utime
    > debug0:2> q


    Ok, that should work. You can skip the `crash` part, use "ps()" in
    scodb and pick out inetd's PID there (already printed in hex). But
    you can't pipe it to "grep inetd", so that's a bit of a negative...

    > Then once this problem occurred we were in debug mode and I did the
    > following to get the dumpfile I have.
    >
    > -------------------
    > debug0:4> bp dis *
    > debug0:5> sysdump()
    > debug0:6> bc *
    > debug0:7> q
    > # sysdump -i /dev/swap -fbhm -o /usr/tmp/dump
    > ---------------------
    >
    > Now the following is the output when I used scodb to look at the
    > dumpfile I have - I would have followed your advice about getting the
    > PID of inetd, but I think the process must have already stopped by the
    > time I had the dump, so I just did 'stack' to give you everything and
    > then 'p' from within crash to get all the running processes.


    Again, "ps()" in scodb will do that.

    The process was dying when the dump was taken.

    Your three pieces of evidence here are inconsistent with each other.
    You've set up to break when PID 0x1D is dying. The `scodb -d dump`
    output from after the breakpoint says it's showing PID 0x147. And the
    `crash -d dump` output has an inetd PID 0x34. So at best I have to
    assume these are from three different events.

    > back:/usr/tmp # scodb -d dump
    > dumpfile = dump
    > namelist = dump
    > stunfile = dump
    > varifile = dump
    > PID 0147: /etc/inetd
    > scodb:1>
    > scodb:1> stack
    > E0000DC0 exit2(0, 9) <- psig+18F
    > E0000DE8 psig(? 8057644, 0, 0, F01D4D98) <- systrap+39F
    > E0000E10 systrap(E0000E1C) <- scall_nokentry+14
    > scodb:2> q


    `exit2(0, 9)' is, I believe, delivering a signal 9 (SIGKILL) to the
    process. There are few things inside the kernel that generate SIGKILL.
    This suggests a user process is deliberately killing the dying
    processes.

    > # crash -d dump
    > dumpfile = dump, namelist = dump, outfile = stdout
    > > p

    > PROC TABLE SIZE = 83
    > SLOT ST PID PPID PRI EVENT NAME FLAGS
    > 0 s 0 0 95 runout sched load sys lock nwak
    > 1 r 1 0 80 init load nwak
    > 2 s 2 0 95 vhand vhand load sys lock nwak nxec
    > 3 r 3 0 95 bdflush load sys lock nwak nxec
    > 4 r 4 0 36 CPU1 idle proc load sys lock nxec 2000000
    > 5 p 5 0 36 CPU2 idle proc load sys lock nxec 2000000
    > 6 s 6 0 95 kmd_id kmdaemon load sys lock nwak nxec
    > 7 r 7 1 95 htepi_daemon load sys lock nwak
    > 8 s 8 0 95 pbintrpool strd load sys lock nwak nxec
    > 10 r 55 1 73 ifor_pmd <-- load nxec exit
    > 11 z 57 55 76 zombie <-- nou nxec exit
    > 12 r 52 1 80 syslogd <-- load nwak nxec exit
    > 13 s 43 1 95 0xc1031150 htepi_daemon load sys lock nwak
    > 14 r 524 1 75 getty load
    > 15 r 85 1 75 strerr <-- load exit
    > 18 r 525 1 75 getty load
    > 20 r 526 1 75 getty load
    > 21 r 276 1 81 cron <-- load nwak nxec exit
    > 25 s 206 1 95 0xc1035150 htepi_daemon load sys lock nwak
    > 26 r 210 1 95 htepi_daemon load sys lock nwak
    > 29 z 568 1 76 zombie <-- nou exit
    > 31 z 384 1 76 zombie <-- nou nxec exit
    > 33 z 382 1 76 zombie <-- nou nxec exit
    > 34 p 327 1 76 inetd <-- load nxec exit
    > 35 r 17987 1 80 getty load nwak
    > 41 r 392 1 75 lockd <-- load nxec exit
    > 42 r 393 392 75 lockd load nxec
    > 43 r 394 392 75 lockd load nxec
    > 44 r 395 392 75 lockd load nxec
    > 45 r 396 392 75 lockd load nxec
    > 47 r 434 1 80 calserver <-- load nwak nxec exit
    > 48 r 17869 1 76 rsync <-- load exit
    > 49 r 442 1 81 caldaemon <-- load nwak exit
    > 50 r 449 1 66 prngd <-- load nxec exit
    > 53 r 527 1 75 getty load
    > 54 r 507 1 76 sshd <-- load nxec exit
    > 55 r 528 1 75 getty load
    > 56 r 529 1 75 getty load
    > 57 r 530 1 75 getty load
    > 58 r 531 1 75 getty load
    > 59 r 532 1 75 getty load
    > 60 r 533 1 75 getty load
    > 61 r 534 1 75 getty load
    > 62 r 535 1 80 sdd <-- load nwak exit
    > 63 z 17871 17869 88 zombie <-- nou exit
    > 64 z 17240 276 73 zombie <-- nou exit
    > 65 r 17872 1 75 rcmd <-- load nxec exit
    > 68 s 17900 17869 95 0xfc2e3b90 rsync load nwak nxec


    I deleted some less interesting columns from the table.

    Notice the large number of processes with the "exit" flag. I marked
    them with "<--" to make them visually obvious. These are all dying.
    Does this list more or less match your experience of what dies in one of
    these events? (Your mental list should be a little longer since there
    are a few here that have gotten far enough along in dying that we can't
    see what they were running, just "zombie").

    To capture a better dump next time, I suggest rigging the system up so
    that it breaks into scodb when _any_ of these processes die. That means
    you'll catch it as early as possible -- possibly while the "attacker"
    process is still issuing the SIGKILLs.

    > And also looking at the memory as suggested shows it wasn't critically
    > close to it's limits (unless this is showing the freemem etc after the
    > inetd process has stopped.)
    >
    > scodb:2> freemem
    > 6EEA0
    > scodb:3> freeswap
    > 100000
    > scodb:4> availrmem
    > 6FB0C
    > scodb:5> availsmem
    > 16E941


    Right, that makes it look like it has nothing to do with memory.

    > As always your help here is very much appreciated.


    I'm wondering if we should take it offline -- I feel like I should be
    looking directly at one of these dumps (hopefully one collected as early
    as possible in an event).

    Instead of setting up individual breakpoints for all the processes you
    expect to die, you can use the fact that it's SIGKILL to your advantage.
    Take a look at the code for kill():

    scodb> u kill+175
    kill+175 movl %eax,4(%edi)
    kill+178 cmpl %eax,9
    kill+17D jne kill+18C
    kill+17F movb %al,-1(%ebp)

    If your kernel matches the one I'm looking at here, kill+17F is an
    address only reached during a `kill(pid, 9)' call. (If your kernel is
    different: do "u kill", then type "/,9". It will scroll forward to the
    right point in the disassembly. Hit <Enter> a few times after that,
    then "q" to exit the disassembler.)

    Set a breakpoint on that ("bp kill+17F").

    Test it by:

    $ sleep 100 &
    [1] 27388
    $ kill -9 27388

    .... and you should drop into the debugger. Quit.

    There shouldn't be a lot of SIGKILLing going on in a normal working
    system. Hopefully this won't trigger until the problem event. If it
    does, you might have to find another way.

    But if it does false trigger, you should also try to understand who is
    killing -9 whom -- it might be an important clue. Suppose a particular
    daemon has a habit of creating a child, letting it run for a while, then
    killing it. Suppose its memory of the child PID is corrupted somehow.
    e.g. what if it does:

    do_something &
    childpid=$!
    ...
    ps -ef | grep $childpid | awk '{ print "kill -9 " $2 }' | sh

    This is horribly bad code, do not use it as an example of anything
    besides trouble...

    So this "works fine" for a while, when $childpid is 12345. But one day
    it runs its child right after the system PID counter has rolled over,
    and the child gets PID 23. `ps -ef | grep 23` is likely to match
    several processes other than the intended victim.
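    (A safer version of that pattern, for contrast - a minimal sketch in
    plain sh, where `sleep 100` stands in for the real daemon's child:
    keep the PID the shell handed you and kill exactly that, with no
    ps/grep pattern matching to catch innocent bystanders.)

    ```shell
    #!/bin/sh
    # Safer child-killing pattern: use the recorded PID directly.
    sleep 100 &                 # stand-in for "do_something &"
    childpid=$!
    # ... time passes ...
    kill -9 "$childpid"
    wait "$childpid"            # reap it; a SIGKILLed child reports 128+9
    echo "child exited with status $?"
    ```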

    Anyway, if you trap on delivery of signal 9, you're likely to catch the
    problem happening.

    And if it happens without hitting that trigger, you learn that it
    _isn't_ a process-to-process kill, it's actually coming directly out of
    the kernel -- also very interesting news.

    >Bela<


  13. Re: Dying processes (inetd, cron, syslogd, sshd)

    I'm very confused by the difference in the PIDs shown for inetd in the
    output from crash and scodb - they were both using the same dump file
    (the example of how it was set up was a different one.)


    You are correct about the 'exiting' processes - they all seem to be the
    processes that are killed when this happens - so it does look like the
    dump was triggered at the correct time.

    I did a bit of testing on a machine here doing 'kill -9' to various
    processes and the result was the 'exit2(0, 9)' as you predicted -
    however when I set up the breakpoint as 'bp kill+175' I can't seem to
    get it to trigger no matter what processes I kill. I also tried to
    trigger on lines above and below this but they either triggered at
    every kill or not at all. Is there another line that we can set up as a
    trigger?

    On these servers the problem seems to happen when they are running
    rsyncs - so I'm going to grab the source for the version that we are
    running and see if there is anything in there doing kills similar to
    your example.

    Again thanks for your help.

    Keith


  14. Re: Dying processes (inetd, cron, syslogd, sshd)

    keith@actual-systems.com wrote:

    > I'm very confused by the difference in the PIDs shown for inetd in the
    > output from crash and scodb - they were both using the same dump file
    > (the example of how it was set up was a different one.)


    Hmmm. It's possible that inetd had forked at that time; but they really
    look like they should be the same.

    > You are correct about the 'exiting' processes - they all seem to be the
    > processes that are killed when this happens - so it does look like the
    > dump was triggered at the correct time.


    I'm sure it was triggered at the right time at a gross level, now we're
    looking for finer precision.

    > I did a bit of testing on a machine here doing 'kill -9' to various
    > processes and the result was the 'exit2(0, 9)' as you predicted -
    > however when I set up the breakpoint as 'bp kill+175' I can't seem to
    > get it to trigger no matter what processes I kill. I also tried to
    > trigger on lines above and below this but they either triggered at
    > every kill or not at all. Is there another line that we can set up as a
    > trigger?


    `kill+175' was the start of the disassembly I looked at:

    scodb> u kill+175
    kill+175 movl %eax,4(%edi)
    kill+178 cmpl %eax,9
    kill+17D jne kill+18C
    kill+17F movb %al,-1(%ebp)

    The correct breakpoint here is "bp kill+17F" -- immediately after it has
    compared the signal number to 9 (and branched elsewhere if it wasn't 9).

    > On these servers the problem seems to happen when they are running
    > rsyncs - so I'm going to grab the source for the version that we are
    > running and see if there is anything in there doing kills similar to
    > your example.


    Hmmm... what version is it? This appears in the OLDNEWS file in rsync
    source:

    " NEWS for rsync 2.5.5, aka Snowy River (2 Apr 2002)
    " Protocol: 26 (unchanged)
    " Changes since 2.5.4:
    "
    " BUG FIXES:
    "
    " * Fix situation where failure to fork (e.g. because out of process
    " slots) would cause rsync to kill all processes owned by the
    " current user. Yes, really! (Paul Haas, Martin Pool)

    That seems like it could possibly have something to do with the
    problem... ;-}

    On the other hand, it looks like it would have killed even more
    processes, but would have done so with SIGUSR1. So unless this is a
    hacked version of rsync, it probably isn't precisely that problem.

    >Bela<


  15. Re: Dying processes (inetd, cron, syslogd, sshd)

    Hi Bela,

    I'm still having problems with the kill+17F line. If I set 'bp
    kill+17F' I can kill -9 any process I want and it doesn't trigger.

    I've put the output from 'u kill' below so you can make sure that it is
    the same version you are running - but from looking at lines 175-17F
    in your earlier example, it looks to be the same.

    scodb:1> u kill
    kill pushl %ebp
    kill+1 movl %eax,pid_stamp
    kill+6 movl %ebp,%esp
    kill+8 subb %esp,1C
    kill+B pushl %edi
    kill+C movl -14(%ebp),%eax
    kill+F pushl %esi
    kill+10 movl %edi,u.u_ap
    kill+16 pushl %ebx
    kill+17 movl %eax,4(%edi)
    kill+1A testl %eax,%eax
    kill+1C jl kill+25
    kill+1E cmpl %eax,20
    kill+23 jl kill+44
    kill+25 popl %ebx
    kill+26 popl %esi
    kill+27 movb u.u_error,16
    kill+2E popl %edi
    kill+2F movl %esp,%ebp
    kill+31 popl %ebp
    kill+32 ret
    kill+33 nop
    kill+34 movb u.u_error,3
    kill+3B popl %ebx
    kill+3C popl %esi
    kill+3D popl %edi
    kill+3E movl %esp,%ebp
    kill+40 popl %ebp
    kill+41 ret
    kill+42 nop
    kill+43 nop
    kill+44 movl %eax,u.u_renv
    kill+49 testl %eax,3000000
    kill+4E jne kill+61
    kill+50 movl %eax,4(%edi)
    kill+53 cmpl %eax,14
    kill+58 jne kill+61
    kill+5A movl 4(%edi),16
    kill+61 movw %si,(%edi)
    kill+64 testw %si,%si
    kill+67 jle kill+70
    kill+69 movl %ebx,&proc[1]
    kill+6E jmp kill+75
    kill+70 movl %ebx,&proc[2]
    kill+75 movl %eax,u.u_procp
    kill+7A testw %si,%si
    kill+7D movl -C(%ebp),%eax
    kill+80 jne kill+8B
    kill+82 movw %ax,10(%eax)
    kill+86 testw %ax,%ax
    kill+89 je kill+34
    kill+8B movb -1(%ebp),3
    kill+8F movl %edx,v.ve_proc
    kill+95 cmpl %ebx,%edx
    kill+97 jnb kill+1EE
    kill+9D movb %al,(%ebx)
    kill+9F testb %al,%al
    kill+A1 je kill+1DB
    kill+A7 movw %ax,12(%ebx)
    kill+AB testw %si,%si
    kill+AE movw -6(%ebp),%ax
    kill+B2 jle kill+BD
    kill+B4 cmpw %ax,%si
    kill+B7 jne kill+1DB
    kill+BD testw %si,%si
    kill+C0 jne kill+D3
    kill+C2 movl %eax,-C(%ebp)
    kill+C5 movw %ax,10(%eax)
    kill+C9 cmpw 10(%ebx),%ax
    kill+CD jne kill+1DB
    kill+D3 cmpb %si,FFFFFFFF
    kill+D7 jnl kill+EA
    kill+D9 movsx %edx,%si
    kill+DC negl %edx
    kill+DE movsx %eax,10(%ebx)
    kill+E2 cmpl %eax,%edx
    kill+E4 jne kill+1DB
    kill+EA movl %eax,50(%ebx)
    kill+ED movl %ecx,-14(%ebp)
    kill+F0 movl %eax,4C(%eax)
    kill+F3 subl %eax,%ecx
    kill+F5 jnle kill+1DB
    kill+FB movl %eax,4(%edi)
    kill+FE cmpl %eax,19
    kill+103 jne kill+112
    kill+105 movl %eax,-C(%ebp)
    kill+108 movl %edx,C(%ebx)
    kill+10B movl %ecx,C(%eax)
    kill+10E cmpl %ecx,%edx
    kill+110 je kill+18C
    kill+112 movw %ax,A(%ebx)
    kill+116 xorl %edx,%edx
    kill+118 movw %dx,u.u_uid
    kill+11F movw -4(%ebp),%ax
    kill+123 testl %edx,%edx
    kill+125 je kill+148
    kill+127 xorl %eax,%eax
    kill+129 movw %ax,8(%ebx)
    kill+12D cmpl %edx,%eax
    kill+12F je kill+148
    kill+131 movl %eax,-4(%ebp)
    kill+134 andl %eax,FFFF
    kill+139 cmpl %edx,%eax
    kill+13B movl %eax,0
    kill+140 setnz %al
    kill+143 jmp kill+14A
    kill+145 nop
    kill+146 nop
    kill+147 nop
    kill+148 xorl %eax,%eax
    kill+14A testl %eax,%eax
    kill+14C je kill+16D
    kill+14E xorl %eax,%eax
    kill+150 xorl %edx,%edx
    kill+152 movw %ax,u.u_ruid
    kill+158 movw %dx,8(%ebx)
    kill+15C cmpl %eax,%edx
    kill+15E je kill+16D
    kill+160 movl %edx,-4(%ebp)
    kill+163 andl %edx,FFFF
    kill+169 cmpl %eax,%edx
    kill+16B jne kill+17F
    kill+16D cmpl %ebx,&proc[1]
    kill+173 jne kill+18C
    kill+175 movl %eax,4(%edi)
    kill+178 cmpl %eax,9
    kill+17D jne kill+18C
    kill+17F int3
    kill+180 incl %ebp
    kill+181 decl %esp
    kill+183 addl %esi,50(%ebp)
    kill+186 movb -1(%ebp),1
    kill+18A jmp kill+1D6
    kill+18C movb -1(%ebp),0
    kill+190 movl %eax,4(%edi)
    kill+193 testl %eax,%eax
    kill+195 je kill+1D6
    kill+197 pushl %eax
    kill+198 pushl %ebx
    kill+199 call sigqueue_zalloc
    kill+19E addb %esp,8
    kill+1A1 movl %ecx,%eax
    kill+1A3 testl %ecx,%ecx
    kill+1A5 je kill+1C3
    kill+1A7 movl %edx,4(%edi)
    kill+1AA movl 4(%ecx),%edx
    kill+1AD movl %edx,-C(%ebp)
    kill+1B0 movw %dx,12(%edx)
    kill+1B4 movw 10(%ecx),%dx
    kill+1B8 movl %edx,-C(%ebp)
    kill+1BB movw %dx,A(%edx)
    kill+1BF movw 14(%ecx),%dx
    kill+1C3 movsx %eax,-6(%ebp)
    kill+1C7 pushl %eax
    kill+1C8 pushl %ecx
    kill+1C9 movl %ecx,4(%edi)
    kill+1CC pushl %ecx
    kill+1CD pushl %ebx
    kill+1CE call psignalinfo
    kill+1D3 addb %esp,10
    kill+1D6 testw %si,%si
    kill+1D9 jnle kill+1EE
    kill+1DB movl %eax,v.ve_proc
    kill+1E0 addl %ebx,158
    kill+1E6 cmpl %ebx,%eax
    kill+1E8 jb kill+9D
    kill+1EE movb %al,-1(%ebp)
    kill+1F1 testb %al,%al
    kill+1F3 je kill+3B
    kill+1F9 popl %ebx
    kill+1FA popl %esi
    kill+1FB popl %edi
    kill+1FC movb u.u_error,%al
    kill+201 movl %esp,%ebp
    kill+203 popl %ebp
    kill+204 ret
    kill+205 nop
    kill+206 nop
    kill+207 nop
    kill+208 nop
    kill+209 nop
    kill+20A nop
    kill+20B nop
    kill+20C nop
    kill+20D nop
    kill+20E nop
    kill+20F nop


    Unfortunately (or maybe fortunately,) the version of rsync we are using
    is newer than 2.5.5, and it was built directly from the source on the
    rsync site with no hacks.

    back:/ # rsync --version
    rsync version 2.6.3pre1 protocol version 28
    Copyright (C) 1996-2004 by Andrew Tridgell and others

    Capabilities: 32-bit files, socketpairs, hard links, symlinks,
                  batchfiles, inplace, no IPv6, 32-bit system inums,
                  64-bit internal inums


    Thanks again.

    Keith

    P.S. I've also replied to your mail, but it was bounced back to me - so
    if you still haven't received it in a few hours let me know.


  16. Re: Dying processes (inetd, cron, syslogd, sshd)

    keith@actual-systems.com wrote:

    > I'm still having problems with the kill+17F line. If I set 'bp
    > kill+17F' I can kill -9 any process I want and it doesn't trigger.
    >
    > I've put the output from 'u kill' below so you can make sure that it is
    > the same version you are running - but from looking at lines 175-17F
    > in your earlier example, it looks to be the same.
    >
    > scodb:1> u kill
    > kill pushl %ebp
    > kill+1 movl %eax,pid_stamp
    > kill+6 movl %ebp,%esp

    ....
    > kill+175 movl %eax,4(%edi)
    > kill+178 cmpl %eax,9
    > kill+17D jne kill+18C
    > kill+17F int3


    Yeah, that looks like the right code. (Though it seems a little odd
    that scodb disassembled the breakpoint as "int3". That _is_ the
    instruction it uses to mark a breakpoint, but it normally hides
    breakpoints in a disassembly.)

    I need to try it live on my own system, which I am not near at the
    moment. I'll try later today.

    [later] Yeah, that doesn't work. So do this:

    scodb:1> u kill+10
    kill+10 movl %edi,u.u_ap
    kill+16 pushl %ebx
    kill+17 movl %eax,4(%edi)
    kill+1A testl %eax,%eax [hit q]
    scodb> bp mod kill+1a
    [1] quitif %eax != 9 [hit ESC]
    :q

    This makes a breakpoint that is hit _every_ time the kill() system call
    is invoked. "quitif %eax != 9" causes scodb to quit (i.e. return the
    system to normal operation) if it wasn't a kill -9. This kind of
    breakpoint sometimes has side effects; you should only do this on one of
    your test systems.

    When this breakpoint hits, you will be in the context of the process
    issuing the kill -9, not the process being killed. This is OK. We know
    that whatever process is doing it is killing a whole bunch of other
    processes. You can see the PID to which the kill is being sent by
    doing:

    scodb> d %edi

    If it's 0 or negative, it's a single call that kills multiple processes
    (see kill(S) or kill(C)). Investigate that immediately.
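    (To illustrate that convention - a sketch in plain sh, not
    SCO-specific: with job control enabled, a background job leads its own
    process group, and a single kill with a negative PID signals the whole
    group at once.)

    ```shell
    #!/bin/sh
    # kill() pid argument semantics (see kill(S)):
    #   pid >  0 : signal that one process
    #   pid == 0 : signal every process in the caller's process group
    #   pid < -1 : signal every process in process group |pid|
    set -m                   # job control: background jobs get their own pgrp
    sleep 100 &
    pgid=$!                  # the job leader's PID is also its group ID
    kill -9 -- "-$pgid"      # leading minus: kill the whole process group
    wait "$pgid"
    echo "group leader exited with status $?"
    ```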

    If it's positive, it's a single-process kill. Quit and see if the
    debugger pops up again immediately. Look at several of the PIDs,
    quitting after each one. Then start investigating. If you're looking
    at the problem event, you know it's going to issue 10-20 kills. None of
    the processes it's killing will disappear while you quit the debugger
    and cycle to the next kill -- the calling process is going to be able to
    fire off a bunch of kill() system calls before any of the receiving
    processes actually receive the signals and start dying. Remember the
    kernel isn't running when you're sitting at the scodb prompt...

    You might get some false hits (processes doing kill -9's that aren't
    part of the problem). You should be able to recognize the real problem
    case pretty easily.

    Once you know what process is doing it, the rest will probably fall into
    place pretty quickly...

    >Bela<


  17. Re: Dying processes (inetd, cron, syslogd, sshd)

    Excellent, thanks Bela, I'll let you know the outcome.

    Keith


  18. Re: Dying processes (inetd, cron, syslogd, sshd)

    Hi,

    I've had this machine now trip into debug mode, but unfortunately I am
    remote from there so I had to have someone else run the commands you
    wanted and give me the responses.

    From what I can see it was triggered by an rsync - but I don't know how
    to read the output from 'd %edi' to tell you if it was 0 or negative or
    positive.

    ----------------------
    BreakPoint: PID 000005BF /usr/bin/rcmd asa /usr/local/bin/rsync
    --server--

    F01D1676 kill+1A : kill+1a
    ---------------------
    Then the output from 'd %edi' was as follows

    u+1148 000005BF 00000009 00001000 000000000

    Then when scodb was exited it popped straight back in and the output
    for the first part was the same (what triggered it,) but the output
    from 'd %edi' was as follows.

    u+1148 0000052A 00000009 00001000 000000000
    u+1148 00000536 00000009 00001000 000000000
    u+1148 00000542 00000009 00001000 000000000
    u+1148 00000536 00000009 00001000 000000000
    u+1148 00000552 00000009 00001000 000000000
    u+1148 000005BF 00000009 00001000 000000000


    Does this mean anything to you?

    Thanks,

    Keith


  19. Re: Dying processes (inetd, cron, syslogd, sshd)

    keith@actual-systems.com wrote:

    > I've had this machine now trip into debug mode, but unfortunately I am
    > remote from there so I had to have someone else run the commands you
    > wanted and give me the responses.
    >
    > From what I can see it was triggered by an rsync - but I don't know how
    > to read the output from 'd %edi' to tell you if it was 0 or negative or
    > positive.
    >
    > ----------------------
    > BreakPoint: PID 000005BF /usr/bin/rcmd asa /usr/local/bin/rsync
    > --server--
    >
    > F01D1676 kill+1A : kill+1a
    > ---------------------
    > Then the output from 'd %edi' was as follows
    >
    > u+1148 000005BF 00000009 00001000 000000000
    >
    > Then when scodb was exited it popped straight back in and the output
    > for the first part was the same (what triggered it,) but the output
    > from 'd %edi' was as follows.
    >
    > u+1148 0000052A 00000009 00001000 000000000
    > u+1148 00000536 00000009 00001000 000000000
    > u+1148 00000542 00000009 00001000 000000000
    > u+1148 00000536 00000009 00001000 000000000
    > u+1148 00000552 00000009 00001000 000000000
    > u+1148 000005BF 00000009 00001000 000000000
    >
    > Does this mean anything to you?


    Yes, plenty. Each of those "u+1148" lines represents another trip into
    kill(). The first time, we can see that the caller is PID 0x5BF (1471),
    running `rcmd asa rsync`. %edi points to the arguments, here "5BF" and
    "9": it's sending signal 9 to PID 0x5BF -- which is itself. That seems
    odd... is there any chance that your guy at the keyboard mistyped that
    one?

    The next 6 calls show signal 9 being sent to PIDs 1322, 1334, 1346,
    1334, 1362, 1471 (the last one again being the process doing the kills).
    If we assume that the first one was a typo, we can say that PID 1471
    (rcmd) is killing off a series of other PIDs, ending with itself.
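    (The hex-to-decimal translation above is something printf can do for
    you; these are the hex PIDs from the 'd %edi' trace.)

    ```shell
    #!/bin/sh
    # Convert the hex PIDs from the 'd %edi' output to decimal.
    for h in 5BF 52A 536 542 552; do
        printf '0x%s = %d\n' "$h" "0x$h"
    done
    # prints: 0x5BF = 1471, 0x52A = 1322, 0x536 = 1334,
    #         0x542 = 1346, 0x552 = 1362
    ```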

    I experimented with `rcmd`, learning that it does finish by killing
    itself with signal 9. I didn't figure out how to trigger it into a
    killing spree, but there obviously is a way. Since it's killing
    unrelated processes, it's definitely a bug of some sort.

    I'm curious about what's causing `rcmd` to malfunction that way, but
    the main thing is to fix the problem on your system. To do that, let's
    just eliminate the use of `rcmd`.

    Find the script that's doing `rcmd asa rsync`; change it to use a
    different remote command executor. For instance, change it to use `ssh
    asa rsync`. If `rcmd` is adequate then you obviously don't feel the
    need for the level of security offered by `ssh`, but the point here is
    just to use a different tool that doesn't have this bug. You can
    configure `ssh` in ways that allow passwordless remote execution. (It's
    possible that `rsync` itself is directly invoking `rcmd`; in which case
    you can use `rsync --rsh=/usr/bin/ssh` or something like that.)
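    (A sketch of that switch - the host name "asa" comes from the thread,
    but the key filename and data paths here are made-up placeholders, and
    the exact flags should be checked against your installed OpenSSH and
    rsync versions. This is setup/config, not something to run blindly.)

    ```shell
    # One-time setup: a passphrase-less key for the backup job
    # (hypothetical filename):
    ssh-keygen -t rsa -N '' -f "$HOME/.ssh/backup_key"

    # Install the public key on "asa" (run once, by hand):
    #   cat ~/.ssh/backup_key.pub | ssh asa 'cat >> ~/.ssh/authorized_keys'

    # Then run the same rsync over ssh instead of rcmd
    # (placeholder paths; substitute your real source/destination):
    rsync -a --rsh="ssh -i $HOME/.ssh/backup_key" /u/data/ asa:/u/data/
    ```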

    >Bela<


  20. Re: Dying processes (inetd, cron, syslogd, sshd)

    Not a problem - I will do that today.

    I started using rcmd as using ssh seemed to cause this problem to
    happen more often! (and as both machines are on the same LAN I wasn't
    worried about security.) As you said, it's just a case of changing the
    rsync command that is used.

    Where you think there is a typo, that is probably the case, as the
    details were transferred from screen to paper to email.

    Thanks again,

    Keith

