Dying processes (inetd, cron, syslogd, sshd) - SCO

Thread: Dying processes (inetd, cron, syslogd, sshd)

  1. Re: Dying processes (inetd, cron, syslogd, sshd)


    ----- Original Message -----
    From:
    Newsgroups: comp.unix.sco.misc
    To:
    Sent: Wednesday, August 24, 2005 8:11 AM
    Subject: Re: Dying processes (inetd, cron, syslogd, sshd)


    > Not a problem - I will do that today.
    >
    > I started using rcmd as using ssh seemed to cause this problem to
    > happen more often! (And as both machines are on the same LAN I wasn't
    > worried about security.) As you said, it's just a case of changing the
    > rsync command that is used.
    >
    > Where you think there is a typo, that is probably the case, as the
    > details were transferred from screen to paper to email.


    You can also skip ssh (a cpu eater) and rcmd both and install rsync as a
    permanent service.

    if you used to have a command like this running on boxaa:

    rsync -avz --delete /u/asa boxbb:/u

    which replicates /u/asa from boxaa to boxbb

    You can do the following on either boxaa or boxbb, depending on which box
    you want to initiate the transfer.
    It doesn't matter which one is receiving or sending; it matters which one
    you want to initiate the job. Whose cron job calls the shots?
    On the box that is NOT initiating the job (let's say this is boxaa), do the
    following:


    # create 2 config files in /etc
    vi /etc/rsyncd.conf:
    -----top----
    uid = root
    gid = sys
    secrets file = /etc/rsyncd.secrets
    read only = false

    [root]
    path = /
    auth users = root
    ----end----

    vi /etc/rsyncd.secrets:
    ----top----
    root:somepassword
    ----end----


    # create an rc start script
    vi /etc/init.d/rsync:
    ----top----
    #!/bin/ksh
    # start the rsync daemon (it reads /etc/rsyncd.conf by default)

    case "$1" in
    start) /usr/local/bin/rsync --daemon ;;
    esac
    ----end----

    # link it into rc2.d so it runs at boot
    ln -s /etc/init.d/rsync /etc/rc2.d/S99rsync

    # use it manually to start it up now without rebooting
    /etc/init.d/rsync start

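    To double-check that the daemon actually came up, something like this will
    do (adjust to taste):

    # the --daemon process should show up here
    ps -ef | grep '[r]sync'

    # and something should be listening on the rsync port, tcp 873
    netstat -an | grep 873

    # from boxbb you can also ask it to list its modules
    # (works with the config above; the later configs set "list = false")
    rsync otherbox::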

    Now, on boxbb, do the following instead of your previous rsync command:
    RSYNC_PASSWORD=somepassword rsync -avz /u/asa otherbox::root/u
    "somepassword" has nothing to do with the real root password on either box,
    and probably should intentionally not be the same as the root password,
    because it's written in plain text in a file on both machines. It
    just needs to be the same in the rsync command on boxbb and in the
    rsyncd.secrets file on boxaa.

    This is a really insecure setup because it gives the client root privileges
    to write anywhere it wants on boxaa, and the password allowing this to happen
    is written in plain text in the script that has the rsync command. Obviously,
    just for starters, rsyncd.secrets on boxaa and the script on boxbb should
    be chmod a-r so that only root can read them. The cron job running the script
    on boxbb runs as root and will read it fine. The rc script on boxaa also runs
    as root.
    But this setup is a direct analog of the rcmd-as-root or
    ssh-as-root you were using.
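    Concretely that's just something like this; the boxbb path is only a
    placeholder for wherever your mirror script lives, and mode 600 does the
    same job as a-r here since root ignores the read bit anyway:

    # on boxaa
    chmod 600 /etc/rsyncd.secrets

    # on boxbb (placeholder path for your mirror script)
    chmod 600 /usr/local/bin/mirror_asa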


    A safer setup is this: create a module in rsync that only sees a certain
    directory (and all its subdirectories), and write to that instead of /:

    BOXAA:
    /etc/rsyncd.conf:
    -----top----
    uid = root
    gid = sys
    secrets file = /etc/rsyncd.secrets
    read only = false
    list = false

    [asa]
    path = /u/asa
    auth users = asa
    ----end----

    /etc/rsyncd.secrets:
    ----top----
    asa:asapassword
    ----end----

    BOXBB:
    RSYNC_PASSWORD=asapassword rsync -avz --delete /u/asa/* asa@otherbox::asa


    That says to connect to otherbox as rsync user "asa", to module asa, and put
    things into the top of that module (which we know is /u/asa).



    If /u/asa/* expands to too large an argument list, then you can make a less
    safe arrangement that allows writing to all of /u, so that you can specify
    /u/asa as the source,
    like this:

    BOXAA:
    /etc/rsyncd.conf:
    -----top----
    uid = root
    gid = sys
    secrets file = /etc/rsyncd.secrets
    read only = false
    list = false

    [u]
    path = /u
    auth users = u
    ----end----

    /etc/rsyncd.secrets:
    ----top----
    u:upassword
    ----end----

    BOXBB:
    RSYNC_PASSWORD=upassword rsync -avz --delete /u/asa u@otherbox::u


    Note: you do not have to create a unix user "asa" or "u" anywhere.
    You did not have to use the "user@" part of the syntax in my first example
    because I'm assuming the script is running as root in a cron job.
    Although the rsync user is not related to the unix user anywhere, if you
    don't specify a user to use with the "::" syntax, then it defaults to using
    your current unix login name.
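    For example, run as root on boxbb these two are the same thing in the
    typical case (the password is the one from rsyncd.secrets, not any unix
    password):

    RSYNC_PASSWORD=somepassword rsync -avz /u/asa otherbox::root/u
    RSYNC_PASSWORD=somepassword rsync -avz /u/asa root@otherbox::root/u

    Run as any other unix login, the first form would offer that login name
    instead, so spelling out the user@ part is the safer habit in scripts.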

    In all cases shown here, the rsync daemon on boxaa is still being started by
    root and the config file tells it to continue operating as root, which means
    that it has permission to create, destroy, overwrite any files anywhere, and
    it has permission to set any ownerships and permissions. So any files that
    get sent from boxbb get their permissions and ownership duplicated as well
    as their contents.

    But the module "asa" only "sees" /u/asa, so a client using that module
    cannot write anywhere outside of /u/asa. And in my examples I also made it
    so that only the rsync user "asa" has permission to use module asa. Not even
    root can get around that since rsync is only concerned with the contents of
    rsyncd.secrets and rsyncd.conf, and no "root" exists in my later examples,
    and even if it did, root is not listed on the "auth users" for the asa
    module. You could make a unix-like arrangement where root can do everything
    and asa can do a subset by having both root and asa modules in rsyncd.conf .

    BOXAA:
    /etc/rsyncd.conf:
    -----top----
    uid = root
    gid = sys
    secrets file = /etc/rsyncd.secrets
    read only = false
    list = false

    [root]
    path = /
    auth users = root

    [asa]
    path = /u/asa
    auth users = asa
    ----end----

    /etc/rsyncd.secrets:
    ----top----
    root:somepassword
    asa:asapassword
    ----end----


    A typical use of this type of arrangement might be:
    BOXBB:
    RSYNC_PASSWORD=asapassword rsync -avz --delete /u/asa/* asa@otherbox::asa
    RSYNC_PASSWORD=somepassword rsync -avz /etc/default/asa otherbox::root/etc/default

    Your cron job uses the asa module, and yet you can still use the root module
    manually to do anything you want.
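    If the twice-daily mirror lives in root's crontab on boxbb, the entry could
    be as simple as this (the times and the >/dev/null are just examples; cron
    hands the line to sh, so the VAR=value prefix and the glob both work):

    # root's crontab on boxbb: mirror /u/asa at 06:00 and 18:00
    0 6,18 * * * RSYNC_PASSWORD=asapassword /usr/local/bin/rsync -avz --delete /u/asa/* asa@otherbox::asa >/dev/null 2>&1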

    Actually, for manual, transient, rootly tasks, you can still use your
    existing familiar rcmd-based commands. None of the above affects the rcmd or
    ssh based functionality in the slightest.
    So you can try setting it up and testing it while any cron jobs continue to
    do what they're already doing.

    For the record, I have used rsync a lot and for quite a while, both in
    twice-daily mirror scripts between numerous hosts and for random manual
    admin tasks, as a more convenient alternative to ftp/sftp.
    And I have not had any problem with killing sprees like that, so you might
    want to try the version of rsync I'm using. The actual binaries are
    here:
    http://www.aljex.com/bkw/sco/#rsync

    rsync.tar.bz2 is the latest version; there is also the previous version
    I used. That previous version I used a _lot_, all over the place. It has
    received a lot more testing than the new one, but I have been using the new
    one heavily too (though not on many hosts yet) since the creation date shown
    on the page.
    The previous version is the full source tree after compiling, so you have to
    copy the binaries out manually, or run "make install" if you have
    "make".
    The unversioned tar is always the latest version (that I've built, anyway)
    with just the binaries and man pages already in their installed absolute
    paths; just untar it and it already works as a client. Its compiled-in
    defaults match its own install path and default to using rcmd, so if you
    put the same version on two boxes, you don't have to specify --rsync-path=
    or --rsh=,
    just "rsync -avz --delete source target".
    If "rsync --version" crashes, install oss646c.
    Then try the recipe above.
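    For the unversioned tar, installing is basically just unpacking it;
    something like this, assuming bzip2 is installed and the tarball was
    downloaded to /tmp:

    # unpack the pre-built binaries and man pages into place
    cd /
    bzip2 -dc /tmp/rsync.tar.bz2 | tar xvf -

    # quick smoke test; if this dumps core, install oss646c first
    /usr/local/bin/rsync --version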

    Replacing the rsync binary is the only thing that might affect your existing
    cron jobs, depending on how the cron job is written.
    The cron job might or might not start using the new binary, and only if that
    happens might the options used need minor adjusting, due to differences in
    the compiled-in defaults for --rsync-path and --rsh.

    Doing it this way:
    * doesn't require user equivalency (/.rhosts, /etc/hosts, /etc/hosts.equiv);
    all you need to set up is rsyncd.conf & rsyncd.secrets
    * the tcp traffic is handled directly by rsync itself, meaning:
    - no ssh encryption/decryption eating cpu on both boxes
    - you can use or not use compression to better suit the job at hand
    (use -av on a dir full of jpg/tiff/png, use -avz on a database)
    - if rcmd or ssh is doing anything bad like dropping connections,
    hanging/stalling, or your killing spree issue, well, they are not used any more
    - you can play with rsync options like --blocking-io to see if they help
    with connection reliability problems; such options might not have as much
    effect when the traffic really goes over some other client & server like
    rcmd or ssh, if those don't have equivalent options themselves.

    Brian K. White -- brian@aljex.com -- http://www.aljex.com/bkw/
    +++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++.
    filePro BBx Linux SCO FreeBSD #callahans Satriani Filk!


  2. Re: Dying processes (inetd, cron, syslogd, sshd)

    Brian K. White wrote:

    > ----- Original Message -----
    > From:


    > > I started using rcmd as using ssh seemed to cause this problem to
    > > happen more often! (and as both machines are on the same LAN I wasn't
    > > worried about security.) As you said, it's just a case of changing the
    > > rsync command that is used.

    >
    > You can also skip ssh (a cpu eater) and rcmd both and install rsync as a
    > permanent service.
    >
    > if you used to have a command like this running on boxaa:
    >
    > rsync -avz --delete /u/asa boxbb:/u
    >
    > which replicates /u/asa from boxaa to boxbb
    >

    [...]
    > now, on boxbb, do the following instead of your previous rsync command:
    > RSYNC_PASSWORD=somepassword rsync -avz /u/asa otherbox::root/u


    [heavily chopped]

    I just wanted to comment that this does sound like a good idea. My
    previous quick glance at the `rsync` man page left me with the
    impression that it always used `rsh` unless overridden to e.g. `ssh`.
    With Brian's examples in mind, I went back and read about the "::" and
    "rsync://" naming schemes and the fact that this doesn't involve a
    separate remote execution protocol. Switching from remote execution to
    remote copying is a strength reduction -- it slightly improves the
    security of the process. Plus, of course, `rsync` is 100% designed to
    talk to itself; eliminating the oblique complexities of a remote command
    executor should be helpful. And finally it _will_ avoid the killing
    spree problem since `rcmd` (rsh) will be out of the picture.

    >Bela<


  3. Re: Dying processes (inetd, cron, syslogd, sshd)


    Bela Lubkin wrote:
    > Brian K. White wrote:
    >
    > > [...]
    >
    > [heavily chopped]
    >
    > I just wanted to comment that this does sound like a good idea. My
    > previous quick glance at the `rsync` man page left me with the
    > impression that it always used `rsh` unless overridden to e.g. `ssh`.
    > With Brian's examples in mind, I went back and read about the "::" and
    > "rsync://" naming schemes and the fact that this doesn't involve a
    > separate remote execution protocol. Switching from remote execution to
    > remote copying is a strength reduction -- it slightly improves the
    > security of the process. Plus, of course, `rsync` is 100% designed to
    > talk to itself; eliminating the oblique complexities of a remote command
    > executor should be helpful. And finally it _will_ avoid the killing
    > spree problem since `rcmd` (rsh) will be out of the picture.
    >
    > >Bela<


    It also sounds like a good idea to me - however, it may take a little
    time to set up, so for the moment I have just changed to using ssh
    rather than rcmd. This method is something that may be very helpful, as
    I am relying more and more on rsync, and a way to do so with less
    overhead is definitely a good thing.

    Keith


  4. Re: Dying processes (inetd, cron, syslogd, sshd)

    Hi Bela,

    I sent you an email a couple of days ago - I was wondering if you
    received it or if you want me to resend it.

    It was to discuss what you had proposed in a previous email to me on
    this subject.

    Thanks,
    Keith


  5. SOLUTION, Re: Dying processes (inetd, cron, syslogd, sshd)

    A few months ago, Keith Crymble of Actual Systems posted a puzzle.
    Multiple systems running SCO OpenServer Release 5.0.6 + rs506a were
    having an intermittent problem. The symptom was that most of the
    processes on the system would suddenly die without warning.

    After some public discussion, Keith and I agreed that the most likely
    way to solve the problem was for me to access the systems directly. We
    arranged to do this under the auspices of the company I am now working
    for. We also agreed that when the problem was solved, I would post the
    story so that others might avoid the problem in the future.

    So...

    In brief, the problem was being caused by an old, buggy version of the
    pseudo-random number generator, `prngd`. The eventual solution was
    simply to upgrade the machines to a more recent version of `prngd`.

    Now for some details. I will describe the discovery process, the actual
    cause of the problem, the important details of the symptoms, and the
    solution.

    We arranged for me to have console access to two live backup machines
    which were experiencing the problem. Keith configured his firewall to
    allow me to `ssh` in to a master machine. From that system I could `cu`
    to COM1 of each of the test machines. Keith configured scodb into their
    kernels, and booted them with COM1 as their consoles. Now I had remote
    live kernel debugger access.

    I installed GNU `screen` on the master machine so that I could connect
    and disconnect at will without losing console output from the test
    machines.

    I established kernel breakpoints to enter the debugger whenever certain
    system processes (like `cron`, `inetd` etc.) died. Then it was a matter
    of waiting for an "event".

    In the first event, the process was being killed by signal 9 (SIGKILL).
    Many other processes had pending SIGKILL signals. The timing made me
    think that the problem was being caused by a single multi-process kill
    rather than a series of individual kills. The kill(S) system call
    accepts special arguments of "-PGID" (negative process group ID) to kill
    all processes in a particular group; or -1 to kill all processes in the
    system. Unfortunately, the process responsible for the call had
    finished and gone on to other business; I didn't catch it in the act.
    This might have been different on a single-CPU system, but Keith's test
    systems were MP. While I was examining the dying process on one CPU
    (probably in the few milliseconds after the debugger prompt came up),
    the culprit was finishing his dirty work on the other CPU.

    Now that I thought it was a multi-process SIGKILL, I put a breakpoint
    into kill(S) itself, essentially:

    if (signal is SIGKILL and process to kill is < 0 [multiple procs])
        breakpoint

    This eventually triggered as well. Previous evidence had led me to
    suspect the remote-command program, `rcmd` (`rsh` on other *ix systems).
    `rcmd` was already implicated because, according to Keith, the events
    always happened while large files were being copied across the network.
    Also, I had run `truss` on it and observed that it used SIGKILL to kill
    off its own child process. I suspected a race condition where `rcmd`
    was getting confused about its child process's ID and mistakenly doing
    `kill(-1, 9)'.

    So I was fairly surprised when the actual caller was `prngd`! Sure
    enough, it was calling `kill(-1, 9)'. But why??

    Keith's systems were running `prngd` version 0.9.6. I found the
    development site for `prngd`,

    http://www.aet.tu-cottbus.de/persone...tls/prngd.html

    and its FTP repository,

    ftp://ftp.aet.tu-cottbus.de/pub/post...related/prngd/

    Fortunately there was an "old" subdirectory with 40 previous versions of
    the `prngd` source. This allowed me to see how this bug appeared and
    later disappeared.

    `prngd` is the pseudo-random number generator; on OpenServer, this is
    primarily used by `ssh`. (Interestingly, Keith had earlier mentioned
    that the problem seemed to get worse when they used `ssh` instead of
    `rcmd` for their large file copies. Now that we've identified `prngd`
    as the culprit, this makes sense...) Cryptographic protocols tend to
    use random numbers to make it harder to crack the encryption. `prngd`
    creates "pseudo" random numbers by running what it calls "entropy
    gathering commands". On OSR5, these are commands like `ps -efl`,
    `netstat -in`, `df`, and `tail -200 /var/adm/syslog`. The command list
    is stored in /etc/prngd.conf.

    Since `prngd` is portable to many operating systems, it has to deal with
    all sorts of unusual conditions. One of these is this: some time in the
    past, it had run into entropy gathering commands which would sometimes
    hang. You can imagine that a command like `netstat -in`, which dives
    into kernel networking structures, might have some obscure bugs. In
    order to protect itself from possible hangs, `prngd` monitors the
    entropy gathering commands and kills them off if they run for too long.
    (Since this appears in the first public release of `prngd`, it looks
    like the author anticipated the hanging problem without necessarily
    having seen it.)

    The code for this in `prngd` 0.9.6 looked sort of like this:

    pid = (start the entropy gathering command, report its PID)
    ...
    if (too much time has gone by: entropy gathering command seems hung)
        if (pid != -1)
            kill(pid, SIGKILL)

    It also set up a signal handler to receive SIGCHLD, which notifies a
    parent process when its subprocess dies. Some of the code in that
    handler looked like this:

    pid = -1 /* note no entropy gathering command currently running */

    This led to a race condition. The fatal sequence went like this:

    pid = (start the entropy gathering command, report its PID)
    ...
    if (too much time has gone by: entropy gathering command seems hung)
        if (pid != -1)

            /* at this moment, pid isn't -1 */

            /* also at this moment, the entropy gathering command
               finishes, so we enter the SIGCHLD handler */

                ...
                pid = -1
                ...

            /* back in the main code */

            kill(pid, SIGKILL)

    The main code intended to kill off the one entropy gathering process,
    but it got tricked into sending SIGKILL to PID -1. Which means "kill
    every process on the system". Whoops.

    The timing window between `if (pid != -1)' and `kill(pid, SIGKILL)' is
    small. You might think "that will never happen!". People who program
    in multithreaded / multitasking environments soon learn that this sort
    of "race condition" _always_ eventually shows up. The CPU is running
    millions or billions of instructions per second and this window is only
    a few instructions wide, but you can be sure it will eventually get hit.
    If the consequences were less catastrophic, it might never be noticed.
    For instance if the only consequence was that, once every few million
    runs, an "entropy gathering process" would hang and never finish (but
    `prngd` kept running), it might never have been fixed.
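    For illustration only, here is a rough C sketch of one standard way such a
    window gets closed (this is generic, not necessarily the actual change that
    went into 0.9.8; the names are made up for the sketch): block SIGCHLD
    around the test-and-kill so the handler cannot reset the pid in between,
    and never hand kill() anything that is not a positive PID.

    #include <signal.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Shared with the SIGCHLD handler, which resets it when the child exits. */
    static volatile pid_t gatherer_pid = -1;

    static void chld_handler(int sig)
    {
        (void)sig;
        while (waitpid(-1, NULL, WNOHANG) > 0)
            ;                       /* reap the finished child(ren) */
        gatherer_pid = -1;          /* no entropy gathering command running */
    }

    static void install_chld_handler(void)
    {
        struct sigaction sa;

        sa.sa_handler = chld_handler;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = SA_RESTART;
        sigaction(SIGCHLD, &sa, NULL);
    }

    /* Called when the entropy gathering command appears to be hung. */
    static void kill_hung_gatherer(void)
    {
        sigset_t block, old;

        sigemptyset(&block);
        sigaddset(&block, SIGCHLD);

        /* Keep the handler from changing gatherer_pid between the test and
           the kill(); that gap is exactly the window the 0.9.6 code left open. */
        sigprocmask(SIG_BLOCK, &block, &old);
        if (gatherer_pid > 0)       /* "> 0", never "!= -1": refuse pid 0 and -1 */
            kill(gatherer_pid, SIGKILL);
        sigprocmask(SIG_SETMASK, &old, NULL);
    }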

    I started checking other versions of `prngd`. The first public release
    of `prngd` was 0.1.0, on 2000-07-03, and it already had this bug. 0.9.6
    was published on 2001-02-19. One short week later, 2001-02-26, version
    0.9.8 was released -- including the fix for this bug. (The current
    version is 0.9.29, released 2004-07-12...) The bug lasted about 7 1/2
    months.

    So this was an already known bug in a system daemon. The tricky part
    was tracking back from the symptoms to their cause.

    Now, a word about the symptoms. Keith had reported that "most of the
    running processes just seem to stop". Here's what was actually
    happening. When `prngd` called `kill(-1, 9)', this sent SIGKILL to
    every _eligible_ process. The kill(C) man page says:

    " If the effective user ID of the sender is root, send the
    " signal to all processes (except processes 0 and 1).

    This is an imperfect description, because certain other processes are
    also exempt. The signal is not sent to any kernel processes. On OSR5,
    these include "sched", "vhand", "bdflush", "CPUn idle process",
    "kmdaemon", "vddaemon", "strd", "htepi_daemon", "dtdaemon", and possibly
    others.

    Perhaps even more importantly, process 1 -- `init` -- is exempt. One of
    init's jobs is to restart certain processes if they die. After an
    "event" on one of Keith's systems, `init` restarted all the processes it
    was responsible for: about a dozen `getty` processes (for the console
    multiscreens), and the daemon starting daemon, `sdd`.

    By the time a human got a look at the system, after an event which
    theoretically killed "every" process on the system, there were about 20
    processes running.

    Now you too can recognize the symptoms of a "kill(-1, 9)" call...

    The solution, of course, was very simple. Keith installed a newer
    version of `prngd`, and the problem hasn't happened once in the last
    month. It used to happen several times a week.

    I recommend checking the `prngd` (sometimes called `in.prngd`) binaries
    on all of your systems running any sort of *ix OS -- OpenServer,
    UnixWare, Linux, Mac OS X, BSD, whatever. If you find a version earlier
    than 0.9.8, upgrade to a more recent version. (`prngd` version 0.9.10
    came after 0.9.9.) All *ix systems are vulnerable to some amount of
    trouble from this bug. Even if `prngd` is run under its own separate
    user ID, it would still succeed in killing _itself_, thus shutting down
    the pseudo random number service.
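    A crude but serviceable check, when you aren't sure how a given `prngd`
    was installed (the paths are only common guesses, and this assumes the
    `strings` utility is available):

    # look for a version string inside whichever prngd binary is present
    for f in /usr/local/sbin/prngd /usr/local/bin/prngd /usr/sbin/in.prngd
    do
        [ -x "$f" ] && { echo "== $f"; strings "$f" | grep -i prngd | grep '0\.'; }
    done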

    Given the age of the problematic versions, you won't find many of them
    running around out there. But it's still worth checking.

    Likewise, if you ever have "nearly all" processes on a system die,
    consider whether it might have been a global kill. As we saw here, this
    doesn't necessarily leave the process table completely blank.

    =============================================================================

    I am available for Unix problem solving and security consultation
    through IS-Data, LLC, a Santa Cruz consultancy specializing in network
    security. www.is-data.net.

    >Bela<

