forking from a large active process in solaris - Unix


Thread: forking from a large active process in solaris

  1. Re: forking from a large active process in solaris

    In article
    <2c392a6c-f1d2-45f0-b66c-ed43240d10dd@c22g2000prc.googlegroups.com>,
    David Schwartz wrote:

    > On Sep 12, 7:49 pm, Barry Margolin wrote:
    >
    > > The point is that if you CAN report a problem synchronously, this is
    > > preferable.

    >
    > You can report any problem synchronously if you assume it will happen.
    > You cannot report any problem synchronously at a time when you cannot
    > be sure it will happen.


    With synchronous error reporting, the lack of an error means that the
    system has done everything possible to ensure that your request can be
    satisfied. There may be conditions outside its capability to handle
    ("acts of god"), but those within its capabilities should be dealt with.

    With asynchronous error reporting, a successful system call doesn't tell
    you much.

    Another important difference is in how the application can handle the
    errors. If the error can be detected at a predictable place in the
    code, it's easier to put a good error handler there. It can ensure that
    fork() isn't done within a critical section. Reasonable error messages
    can be printed or logged. If the program fails asynchronously with a
    signal, it's much harder to do any of these things.
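
    To make the contrast concrete (an editorial sketch, not part of the
    original post; fork_or_report is a hypothetical helper): with
    synchronous reporting the failure surfaces at the fork() call itself,
    at a predictable point where a handler can log something sensible.

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Hypothetical helper: fork and report failure at a predictable place. */
    pid_t fork_or_report(void)
    {
        pid_t pid = fork();
        if (pid == -1) {
            /* With swap reservation, an EAGAIN or ENOMEM failure here
               typically means the system could not reserve backing store
               for the child's copy-on-write pages. */
            fprintf(stderr, "fork failed: %s\n", strerror(errno));
        }
        return pid;
    }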

    >
    > > Some things you can't do anything about, like power failures and memory
    > > chip errors, and those will necessarily produce asynchronous errors.
    > > But you should do as much as you can to avoid that type of error
    > > detection.

    >
    > I agree. So the question is which category this kind of problem falls
    > into. Rational people can disagree, so it's made a configurable option
    > on many operating systems.


    It's basically a difference between optimism and pessimism.

    Reserving swap space at fork time is pessimistic. It assumes the
    worst, that the child process may eventually write to all the
    copy-on-write pages, and fails immediately if it can tell that this
    won't work. It's actually even more pessimistic than that, since it
    assumes that the child will touch all the pages before enough other
    processes free up swap space to allow it.

    Lazy allocation is at the opposite end of the spectrum, it's optimistic.
    It assumes that the process won't modify enough pages to run out of
    swap, or that other processes will return swap space in time to satisfy
    the child's needs.

    Swap space reservation is like a credit check. Lazy allocation is like
    giving a loan to a friend.

    --
    Barry Margolin, barmar@alum.mit.edu
    Arlington, MA
    *** PLEASE post questions in newsgroups, not directly to me ***
    *** PLEASE don't copy me on replies, I'll read them in the group ***

  2. Re: forking from a large active process in solaris

    John Tsiombikas writes:
    > On 2008-09-12, Rainer Weikusat wrote:
    >>>
    >>> Which maybe could have been accomplished. Obviously, you don't
    >>> know much about writing correctly functioning software.

    >>
    >> 'Obviously', you are really confused regarding memory allocation
    >> requirements and COW-fork implementations: A page needs to be copied
    >>
    >> ... more CoW-related-rambling we all know how CoW works, thank you
    >> ...


    This was a technical description of the procedure, including what is
    and what isn't known to the operating system versus what is and isn't
    known to someone aware of what a set of applications actually does.

    So, we both agree that the description is basically correct.

    >>> And how do you accomplish "detecting the error" under Linux
    >>> (supposing you haven't disabled this behavior).

    >>
    >> Which error?

    >
    > The error that the system doesn't have enough memory to give you...
    > Look, it's very simple... you can either detect that error by fork
    > returning -1, or you can "detect" it by your process (and/or possibly
    > other processes) being killed without explanation. How can you possibly
    > support that the second case is preferable?


    Because, as per the technical description I gave, the system cannot
    'detect' this situation at fork. It can make an uneducated guess based
    on its current state and an unfounded assumption about the future
    behaviour of the application which called fork. Based on this guess,
    it can refuse to attempt something which it might well be capable of
    doing, requested by someone who has the means to determine whether it
    will succeed and to assess the possible consequences in case it won't.

    In the end, this is a resource usage policy question: should the
    system try as hard as possible to service a request, possibly failing
    to do so, or should it rather refuse to do anything at all when it
    presently cannot determine the chances of success in advance? Not
    trying causes an immediate malfunction. Trying may cause a more
    inconvenient future malfunction.


  3. Re: forking from a large active process in solaris

    Casper H.S. Dik writes:
    > John Tsiombikas writes:
    >
    >>P.S. The semantics of fork MUST be maintained, however there is need for
    >>a system call with different semantics (i.e. when you don't care about
    >>duplication of the whole process memory), and there is such a system
    >>call, it's called vfork (although it's not always properly implemented).

    >
    > Even without that semantics, fork() will always take O(N) where N is
    > the size of the process address map.


    Not necessarily. The page tables could be shared, too.

    > vfork() is better but should be used in other primitives:


    vfork cannot portably be used for anything, because the behaviour is
    undefined in case of a failed exec. Even if this weren't so, it
    would still suspend the parent process for an indefinite amount of
    time.

  4. Re: forking from a large active process in solaris

    Rainer Weikusat writes:

    >Casper H.S. Dik writes:
    >> John Tsiombikas writes:
    >>
    >>>P.S. The semantics of fork MUST be maintained, however there is need for
    >>>a system call with different semantics (i.e. when you don't care about
    >>>duplication of the whole process memory), and there is such a system
    >>>call, it's called vfork (although it's not always properly implemented).

    >>
    >> Even without that semantics, fork() will always take O(N) where N is
    >> the size of the process address map.


    >Not necessarily. The page tables could be shared, too.


    You still need to mark the pages as "clone when changed".

    >> vfork() is better but should be used in other primitives:


    >vfork cannot portably be used for anything, because the behaviour is
    >undefined in case of a failed exec. Even if this wasn't this way, it
    >would still suspend the parent process for an indefinite amount of
    >time.


    Which is why I wrote that you should use the primitives which might use
    vfork such as system() and posix_spawn(), not use vfork() itself.
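
    (Editorial sketch, not part of Casper's post: using posix_spawn() so
    that the library, not the application, decides whether vfork() or
    some equivalent lightweight mechanism is used underneath. The spawned
    command and its arguments are placeholders.)

    #include <spawn.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>

    extern char **environ;

    int main(void)
    {
        pid_t pid;
        char *argv[] = { "ls", "-l", "/tmp", NULL };

        /* posix_spawnp may be implemented with vfork() or another
           lightweight mechanism; the caller does not need to care. */
        int rc = posix_spawnp(&pid, "ls", NULL, NULL, argv, environ);
        if (rc != 0) {
            fprintf(stderr, "posix_spawnp: %s\n", strerror(rc));
            return 1;
        }

        waitpid(pid, NULL, 0);   /* reap the child, as with fork()/exec() */
        return 0;
    }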

    Casper
    --
    Expressed in this posting are my opinions. They are in no way related
    to opinions held by my employer, Sun Microsystems.
    Statements on Sun products included here are not gospel and may
    be fiction rather than truth.

  5. Re: forking from a large active process in solaris

    Casper H.S. Dik writes:
    > Rainer Weikusat writes:


    [...]

    >>vfork cannot portably be used for anything, because the behaviour is
    >>undefined in case of a failed exec. Even if this wasn't this way, it
    >>would still suspend the parent process for an indefinite amount of
    >>time.

    >
    > Which is why I wrote that you should use the primitives which might use
    > vfork such as system() and posix_spawn(), not use vfork() itself.


    'system' is a library routine which starts the 'system command
    interpreter' and causes it to execute a single command. This is not 'a
    primitive' by any sensible definition of the term. Starting a shell is
    completely useless, except insofar as it adds a few more failure
    opportunities, unless the intent is to actually make use of facilities
    provided by the shell which are not easily available without invoking
    it. 'Executing programs' is not among these facilities. 'posix_spawn'
    is (still) an interface intended mainly for small, embedded systems,
    such as those lacking an MMU.

    That both happen to be useful to work around deficiencies (or
    'peculiarities') of the Solaris fork implementation is certainly of
    some interest for applications exclusively targeting newer
    versions of Solaris. But just ignoring the issue and assuming that
    market pressure will sooner or later result in either the Solaris fork
    being taught a reasonable behaviour or in Solaris just withering away
    is another attractive option. For instance, 'posix_spawn' does not
    exist in my usual target environment, starting a shell without purpose
    needs a lot more time and memory than I am willing to spend, and
    lastly, this system does not have any possibility of swapping, so the
    chances of ever encountering Solaris there (or in any other fairly
    small, embedded device) are zero.

  6. Re: forking from a large active process in solaris

    On Sep 12, 3:45 pm, David Schwartz wrote:
    > On Sep 12, 6:36 am, James Kanze wrote:


    > > Nobody is saying anything like it. Correct is what is
    > > necessary for reliable applications to work correctly. I've
    > > worked on a number of critical applications, and we test
    > > that we do reliably detect all possible error conditions.
    > > If we can't disable such lazy allocation, we can't use the
    > > system. It's that simple. The code has to be correct.


    > So shouldn't 'fork' always fail, since the system cannot be
    > sure that power will remain on for as long as the 'fork'ed
    > process might want to run? Isn't it better to warn now that we
    > might have to fail horribly later, when the application can
    > sensibly handle the inability to ensure the child can run to
    > completion?


    > Heck, memory might get hit by a cosmic ray. Shouldn't 'malloc'
    > always fail, so the application doesn't take the risk of
    > seeing the wrong data later? Surely the application has some
    > sensible way to handle 'malloc' failure, but it can't be
    > expected to properly handle an uncorrectable ECC error.


    Sure. No system is 100% reliable. But by the same token, you
    don't want to take arbitrary steps to reduce reliability.

    > Your definition of "reliable" is simply not useful in many
    > contexts. It may be useful in yours, which is why it's often
    > handy to be able to disable overcommitment.


    Or vice versa. There are certainly trade-offs, and not all
    systems need that high a degree of reliability. Ideally, it
    should be an option, on a per process basis. IIRC, AIX has it
    disabled by default, but will overcommit if you define the right
    environment variable. Which sounds like the ideal solution.

    The problem with having it on by default is that far too many
    people don't even realize that it exists, and so don't know to
    turn it off, even when their application requires it.

    --
    James Kanze (GABI Software) email:james.kanze@gmail.com
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

  7. Re: forking from a large active process in solaris

    On Sep 12, 6:35 pm, Rainer Weikusat wrote:
    > James Kanze writes:
    > > On Sep 12, 12:09 pm, Rainer Weikusat wrote:
    > >> James Kanze writes:


    > > [...]
    > >> >> AFAIK, AIX does it differently (as does Linux).


    > >> > I'm not sure.


    > >> So, why didn't you simply check it, instead of posting your
    > >> speculations?


    > > Because I don't have access to them.


    > You have access to all of the related documentation, which IBM
    > happens to publish online.


    I'd still have to find the relevant parts of the documentation,
    which would take significant time.

    > > I have been told that this broken behavior of AIX has been
    > > fixed,


    > You can attach 'loaded' adjectives to anything you want to
    > attach them to and you are completely free to never quote any
    > actual reference supporting your opinion, but both serves only
    > to devalue your standpoint: Apparently, you cannot support it
    > with arguments and are not aware of references supporting it.


    Basically, because it's not really that important. It's been
    some time since I've had to worry about AIX. (My current
    worries are Linux, which suffers from the same problem.) At the
    time I first looked into the question, there was considerable
    talk of AIX, because it had only recently dropped the policy of
    supporting only deferred commit, because of customer pressure.
    (At least at that time, IBM was trying to get AIX accepted for
    critical applications.)

    The real question concerns the appropriateness of lazy commit
    (AKA the system telling lies---and I know, that's a loaded
    assessment). I don't dispute that there are times when it can
    be appropriate, and the reduced reliability is more than made up
    for by other benefits which it provides. But it does provide a
    serious security problem, and shouldn't occur without the user
    being aware of it; IOW, it should only occur if the user
    explicitly requests it.

    From a practical point of view, you can't allow it if you're
    running a server, and you probably don't want it active for
    anything connected to the network (but maybe I'm just being
    paranoid about the latter).

    > Sun claims that 'AIX, Linux and HP-UX' support overcommitment:


    They "support" it (although I don't know about HP-UX). All of
    them can, as far as I know, turn it off: it was turned off on
    the only development I've done under HP/UX; I don't know whether
    this was the default, or whether it was a result of some special
    configuration by the sysadmins, but the application I was
    working on (network management) certainly wouldn't have accepted
    it, and I tested it. The one time I developed under AIX, we
    didn't get far enough to start testing anything, but since I'd
    heard about the issue, I raised the question, and was assured
    (by IBM) that it was no longer a problem---that deferred
    allocation only applied to processes which explicitly asked for
    it. Under Linux, of course, you use sysctl in your rc files to
    turn it off globally; regretfully, there's no way to turn it off
    except for certain processes. (And the last time I checked,
    Linux' behavior when a page allocation failed was a disaster,
    resulting, in certain cases, in critical system processes being
    killed. I believe some distributions have patches for this.)
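
    (Editorial aside, not part of the post: the global knob referred to
    here is the vm.overcommit_memory sysctl. A strict-accounting
    configuration in /etc/sysctl.conf might look roughly like this; the
    values are illustrative, not a recommendation.)

    # 2 = strict accounting: refuse commits beyond swap + ratio of RAM
    vm.overcommit_memory = 2
    # percentage of physical RAM counted towards the commit limit
    vm.overcommit_ratio = 50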

    > Some operating systems (such as Linux, IBM AIX, and HP-UX)
    > have a feature called memory overcommit (also known as lazy
    > swap allocation). In a memory overcommit mode, malloc() does
    > not reserve swap space and always returns a non-NULL pointer,
    > regardless of whether there is enough VM on the system to
    > support it or not.


    > http://developers.sun.com/solaris/ar...ubprocess.html


    > This is already a text with an agenda, because it mixes two
    > different things (how memory allocation requests made by an
    > application are handled vs what the kernel does or doesn't
    > allocate autonomously as part of a fork), trying to paint them
    > as identical.


    Regretfully, some systems (e.g. Linux) treat them as identical.

    > Obviously the conclusion


    > malloc() does not reserve swap space and always returns a
    > non-NULL pointer,


    > =>


    > The fork() call never fails because of the lack of VM.


    > is a non-sequitur.


    Or just an extreme oversimplification.

    > 'malloc' is an interface to some memory allocator implemented
    > as part of the C-library which uses unspecified system
    > interfaces (brk and/or mmap) to actually allocate virtual
    > memory. It isn't used as part of the fork system call. The
    > assumption that an application intends to actually use memory
    > it allocated is sensible. The assumption that a significant
    > portion of the address space of a process which called fork
    > will need to be copied is something rather different.


    I understand this point of view; I see what you're getting at.
    The problem remains that some part of the memory will be used by
    the forked process, and we don't have any real way of saying how
    much. Should the system pre-allocate, say, five pages, on the
    grounds that that will be enough?

    In the end, of course, the real problem is that fork/exec was
    broken from the start. The whole idea that you first "copy" the
    entire process, when all you really want to inherit from it is
    its standard input and output, is rather ludicrous, when you
    think about it. For many years, however, rather than fix it,
    various hacks like vfork have been tried to work around it. But
    now that we have posix_spawn, there's no need (except that some
    implementations, like Linux, implement posix_spawn as a fork
    followed by an exec).

    > [...]


    > >> [...]
    > >> > Otherwise, they're broken, and can't be used where
    > >> > reliability is an issue.


    > >> 'Reliable failure because of possibly incorrect assumptions
    > >> about the future' still doesn't increase the 'reliability' of the
    > >> system to actually perform useful work.


    > > Reliably detecting failure conditions is an important part of
    > > reliability.


    > What I wrote was that a system which fails to accomplish
    > something is not capable of reliably doing 'something'.


    The reason it fails is that it cannot determine whether it can
    do it reliably or not, and that it has no effective means of
    controlling and reporting the error later.

    > After all, it doesn't do it. A radio which never plays a sound
    > is not a reliable radio. It is a broken radio (which
    > 'reliably' fails to fulfil its purpose). A radio which
    > sometimes fails is a more reliable radio than the former: It
    > fulfils its purpose at least intermittently, so it may
    > actually be useful, while the other is just decoration.


    A radio that fails when you most need it is not as reliable as
    one that doesn't fail, once you've verified beforehand that it
    should work.

    > This has no relation to your general statement above (which
    > could be disputed on its own, but I am not going to do that).


    > > [...]
    > >> There is really no need to continuously repeat that 'correct
    > >> is whatever Solaris happens to do'.


    > > Nobody is saying anything like it. Correct is what is
    > > necessary for reliable applications to work correctly. I've
    > > worked on a number of critical applications, and we test
    > > that we do reliably detect all possible error conditions.
    > > If we can't disable such lazy allocation, we can't use the
    > > system. It's that simple. The code has to be correct.


    > If 'code' fails to work in situations where it could have
    > worked, it is broken. That simple.


    If code fails without notification, when it could have verified
    the condition beforehand and notified, then it is broken.

    In the end, it's a question of false positives vs. false
    negatives. When reliability is an issue, you simply cannot
    accept false positives (yes, it's working). In other contexts,
    of course, it all depends.

    > [...]


    > >> Personally, quoting an ex-colleague of mine "I don't give a
    > >> flying ****" what adjectives some set of people would like to
    > >> attach to a failure to accomplish something which could have
    > >> been accomplished.


    > > Which maybe could have been accomplished. Obviously, you
    > > don't know much about writing correctly functioning
    > > software.


    > 'Obviously', you are really confused regarding memory
    > allocation requirements and COW-fork implementations: A page
    > needs to be copied whenever more than one process is currently
    > referencing it and one of those processes writes to this page.
    > At this time, the system needs to 'somehow' find an unused
    > page, copy the contents of the other into it and modify the
    > page tables of the writing process to refer to the new page,
    > while all others can continue to share the old one. This means
    > that the amount of pages which will have to be copied in
    > future is a number less than or equal to the amount of
    > COW-shared pages times the number of processes sharing them
    > minus one. That's the only 'reliable' information available to
    > the system, because anything else depends on the future
    > behaviour of the application which forked. I can know
    > something about the future behaviour of this application,
    > because I know the code. Hence, I can conclude that it will be
    > possible to perform an intended operation, while the system
    > cannot conclude that it won't. It can only assume that,
    > leading to spurious failures whenever this assumption was
    > wrong.
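
    (Editorial worked example, not part of either post: a parent with
    100,000 COW-shared pages that forks once can, in the worst case, need
    100,000 x (2 - 1) = 100,000 page copies, i.e. a full duplicate of the
    shared pages; how many copies are actually needed depends entirely on
    which pages are written to afterwards.)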


    Which would all be very fine IF you could somehow inform the
    system about this, so that the fork would fail if the resources
    you expect weren't available.

    > This is actually even more complicated because both the amount
    > of available swap space and the amount of free RAM vary over
    > time, depending on the [future] behaviour of other
    > applications running on the same system[*]. The only
    > 'reliable' way for the system to determine if it can copy a
    > page which needs to be copied is when the corresponding page
    > fault needs to be handled: By this time, the required
    > resources are available or are not available.


    The system can ensure that the resources are available up front.
    It will, of course, reserve more resources than are probably
    necessary, but that's the price you pay for reliability.

    > Anything else is guesswork. That the system may now be
    > incapable of performing an operation (which may and likely
    > will not even be necessary) does not mean it will be incapable
    > at the time when the operation becomes necessary.


    >[*] And 'future administrative actions': More swap space could
    > be added, eg if the presently available one is running low.


    > > And how do you accomplish "detecting the error" under Linux
    > > (supposing you haven't disabled this behavior).


    > Which error?


    That the fork actually does fail? Note that the first process
    to actually write to the page might be the parent, so it will be
    the parent which is killed, and not the child. (Actually, at
    different times, different strategies have been tried. Older
    versions of AIX, for example, would start by killing the
    processes which used the most memory; back then, that generally
    meant emacs. In at least one case known to me, Linux killed the
    init process, so no further log-ins were possible.)

    I can understand your wanting the function to work. It's a
    perfectly reasonable desire. But the fork() interface doesn't
    permit supporting it in a safe manner. On an isolated machine,
    which you control completely, if you know what you are doing,
    it's perfectly fine to activate deferred allocation. As a
    default policy, it's an invitation to DoS attacks, and it easily
    catches users unawares, with other users paying for your
    carelessness.

    > >> It didn't do what I wanted it to do, despite it could have,
    > >> which I knew beforehand.


    > > How did you know it? You can't know, that's the whole point.


    > I can, because I can know the actual memory requirements,
    > because I know the application code. This should actually be
    > self-evident to anyone.


    So your application code never calls a system library after the
    fork? Neither on the child side nor in the parent? And you can
    communicate somehow exactly how many pages you need to the
    system, so it can guarantee this small number?

    --
    James Kanze (GABI Software) email:james.kanze@gmail.com
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


  8. Re: forking from a large active process in solaris

    On Sep 13, 4:53 am, David Schwartz wrote:
    > On Sep 12, 7:49 pm, Barry Margolin wrote:


    > > The point is that if you CAN report a problem synchronously,
    > > this is preferable.


    > You can report any problem synchronously if you assume it will
    > happen. You cannot report any problem synchronously at a time
    > when you cannot be sure it will happen.


    It's largely a question of contract, and user expectations.
    What does (or should) fork guarantee?

    > > Some things you can't do anything about, like power failures
    > > and memory chip errors, and those will necessarily prodyce
    > > asynchronous errors. But you should do as much as you can
    > > to avoid that type of error detection.


    > I agree. So the question is which category this kind of
    > problem falls into. Rational people can disagree, so it's made
    > a configurable option on many operating systems.


    The issues are more subtle than that, and there are really more
    than just two choices. There's a difference, for example,
    between making it a poorly documented system-wide configuration
    parameter, and being able to define the behavior on a process by
    process basis. There's a difference depending on whether the
    default behavior is what most people expect (whether their
    expectations are reasonable or not). There's a very big
    difference depending on what happens when you run out of memory
    later. Neither of the two systems I currently have available is
    perfect: Solaris defines a safe and robust default, but provides
    no way of modifying it for special cases. Linux has an almost
    unacceptable default (if nothing else because of its behavior
    when a page allocation fails), and the only means of modifying
    it is poorly documented, and global.

    --
    James Kanze (GABI Software) email:james.kanze@gmail.com
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

  9. Re: forking from a large active process in solaris

    Barry Margolin writes:

    [...]

    > It's basically a difference between optimism and pessimism.
    >
    > Reserving swap space at fork time is pessimistic. It assumes the
    > worst, that the child process may eventually write to all the
    > copy-on-write pages, and fails immediately if it can tell that this
    > won't work. It's actually even more pessimistic than that, since it
    > assumes that the child will touch all the pages before enough other
    > processes free up swap space to allow it.
    >
    > Lazy allocation is at the opposite end of the spectrum, it's optimistic.
    > It assumes that the process won't modify enough pages to run out of
    > swap, or that other processes will return swap space in time to satisfy
    > the child's needs.
    >
    > Swap space reservation is like a credit check. Lazy allocation is like
    > giving a loan to a friend.


    It is not that simple. The fork system call dates back to UNIX
    V6. There was no support for virtual memory at that time, meaning
    there was no other option except to physically copy all of the core
    image associated with a process on fork. The first variant with
    virtual memory support didn't change this. The invention of the COW-fork
    effectively introduced a new operation in place of the traditional fork:
    copy as much of the process as is needed for it to actually start
    running and do anything else on demand. Forcing a swap space
    reservation as part of a COW-fork basically emulates the traditional
    fork call transparently, and the error returned in case of a failure to
    reserve swap space roughly means 'If I [the system] had been forced to
    copy all of the invoking process, as was the case for the
    pre-COW fork, this operation would have failed'. But this is a
    fictional error, because the system did not have to copy all of the
    process. The fact that the necessary resources were not available
    now should not have caused the fork to fail, because they weren't
    needed, and they may be available when they are actually needed, which
    may well (and likely will) be 'never' for at least some part of the
    process address space, especially when the forked process is going to
    start another program after having done a few setup calls.

    Why should the system provide an emulation of the pre-'virtual
    memory' UNIX(*) fork implementation instead of allowing applications to
    exploit the features of the actual, current implementation, provided
    doing so would be useful? Kirk McKusick's reason for wanting to prohibit
    that is 'this is not acceptable' (that the kernel may need to kill a
    process if it actually runs out of memory), and that is not a reason.
    It's a decree provided without a reason.

    I seriously doubt that many of the people arguing in favor of keeping
    the traditional restriction have ever been forced to deal with a
    system where memory is actually tight. I happen to have. The 'device
    platform' I presently mostly code for is an ARM9 (200 MHz) based network
    appliance with 64M of RAM and no way of swapping anything. This means
    there is an immediate problem with the traditional dogma and COW:
    there is no way to ever reserve swap space; consequently, the only way
    to preserve 'acceptable semantics' would be to fall back to the
    original fork algorithm and physically copy the whole process on fork.
    This would be very slow compared to COW, and would waste a lot of
    scarce RAM in case a significant part of the parent address space
    would not have had to be copied if COW without swap space reservation
    had been used. OTOH, this isn't exactly highly reliable hardware and it
    doesn't really need to be either. Random, hardware-induced failures
    occur on these systems, but as long as they are operational the
    majority of the time, this doesn't really matter. I have yet to meet
    a network appliance of any type, even up to high-end 'professional'
    hardware, which would not need a reboot or power-cycle every once in a
    while. Occasionally killing a process in case of a temporary RAM
    shortage isn't going to make matters worse, especially since killing
    a critical process can easily be avoided[*] and the resulting damage can
    usually be mitigated by software, eg because init[**] will restart a
    system daemon in case of an untimely death, this being a basic
    robustness requirement, taking into account that not only can the
    hardware fail and the system run out of RAM but additionally - *gasp*
    - software bugs exist which cause unmotivated process deaths, too.
    [*] This is/was using the Linux 2.4 OOM killer, custom-tuned
    to select the 'most useless' process first and to never select
    anything critical.

    [**] A custom init which is more oriented towards being an
    (easily machine-controllable) process monitor.
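
    (Editorial aside, not from the post: the 2.4-era tuning described in
    footnote [*] was done in kernel code. On newer kernels a rough
    userspace approximation is to write an adjustment to
    /proc/<pid>/oom_score_adj; -1000 exempts a process from the OOM
    killer entirely. A minimal sketch, assuming a sufficiently recent
    kernel and adequate privileges:)

    #include <stdio.h>

    /* Sketch: exempt the calling process from the Linux OOM killer. */
    static int oom_exempt_self(void)
    {
        FILE *f = fopen("/proc/self/oom_score_adj", "w");
        if (!f)
            return -1;
        fputs("-1000\n", f);
        return fclose(f);
    }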

    That would be the general situation. Just judging from that, I would
    already state that 'wasting a lot of RAM and CPU time in order to
    copy data which will most likely not need to be copied is not
    acceptable here', because it would constantly degrade the usefulness
    of the device in order to cause an otherwise rare failure to occur
    much more often, but 'in a more nicely coloured way', according to the
    opinions of some people.

    The specific situation I remember was that (until the end of December
    last year), this device used the Avira (H+BEDV) 'antivir' daemon for
    transparent AV-scanning on POP3 mail download. The way this program
    worked at that time was that it would build an in-core signature
    database on startup (in daemon mode) and then listen on a specific
    PF_UNIX stream socket for clients to connect and request scans of
    certain files. The daemon would then fork and the new process would
    deal with the client. By the time 'we' got rid of this [expletive],
    the master process had an RSS of 40M. This means it would have been
    impossible to copy the process in order to serve the single client
    (being connected at any given point in time). But since the actual
    scan server never wrote to its signature database (after I had
    complained about the fact that it did for long enough), this wasn't a
    problem at all: All of the signature memory could be shared among
    parent and child because it was completely read-only. The worst thing
    which would (rarely) happen was that the kernel would kill the antivir
    process in order to intermittently use the memory for something
    different and the worst possible user-visible effect was that a
    POP3-connection could break down and the mail download would need to
    be restarted (which, in turn, caused the scanning daemon to be
    restarted). And this could easily happen every now and then just
    because of transient connectivity problems, too.

    That someone, no matter who it may be, would not consider this to be
    an 'acceptable use' of a UNIX(*)-like system does not figure at all
    here, because the alternative would have been to never provide this
    service instead of providing it during the overwhelming majority of
    the time this program was being used.




  10. Re: forking from a large active process in solaris

    On Sep 15, 2:17 am, James Kanze wrote:

    > The problem with having it on by default is that far too many
    > people don't even realize that it exists, and so don't know to
    > turn it off, even when their application requires it.


    The problem with having it off by default would be exactly the same.

    DS

  11. Re: forking from a large active process in solaris

    On Sep 15, 6:57 pm, Rainer Weikusat wrote:
    > Barry Margolin writes:


    > [...]
    > I seriously doubt that many of the people arguing in favor of
    > keeping the traditional restriction have ever been forced to
    > deal with a system where memory is actually tight.


    Does 256 KB count as tight? (Admittedly, that was with a very
    old Unix, before virtual memory, and fork() really did copy.)
    Or 48MB under Solaris, with less than 100MB swap (up until about
    four years ago).

    [...]
    > I have yet to meet
    > a network appliance of any type, even up to high-end 'professional'
    > hardware, which would not need a reboot or power-cycle every once in a
    > while.


    Like every five or ten years? The network management systems
    I've worked on ran for years at a time without rebooting. For
    that matter, I probably only rebooted my 48MB Solaris system
    five or six times in the ten years I used it (after the first
    year---I started with Solaris 2.2, and that did have to be
    rebooted fairly often).

    And since you started by mentioning historical context: the
    historical context is that Unix was originally designed around a
    lot of small (and very small) processes, collaborating. The
    problem was a non-problem, despite restricted memory and no
    virtual memory, because processes were so small: I know that
    most of ours back then had a memory footprint of less than 32KB.
    Times have changed, and the original concept really doesn't
    scale.

    --
    James Kanze (GABI Software) email:james.kanze@gmail.com
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

  12. Re: forking from a large active process in solaris

    James Kanze writes:
    > On Sep 15, 6:57 pm, Rainer Weikusat wrote:
    >> Barry Margolin writes:

    >
    >> [...]
    >> I seriously doubt that many of the people arguing in favor of
    >> keeping the traditional restriction have ever been forced to
    >> deal with a system where memory is actually tight.

    >
    > Does 256 KB count as tight?


    Not unless the system would actually run out of memory during normal
    operation.

    [...]

    >> I have yet to meet
    >> a network appliance of any type, even up to high-end 'professional'
    >> hardware, which would not need a reboot or power-cycle every once in a
    >> while.

    >
    > Like every five or ten years? The network management systems
    > I've worked on ran for years at a time without rebooting.


    The reason I wrote 'appliance' was because I was referring to
    one. Nowadays, this means 'PC hardware' or at least PC components
    maybe combined with a non-x86-processor, eg 'desktop' IDE disks which
    suddenly cease to function a few times per year or RAM without any
    provision for correcting memory errors.

    [...]

    > And since you started by mentioning historical context: the
    > historical context is that Unix was originally designed around a
    > lot of small (and very small) processes, collaborating. The
    > problem was a non-problem, despite restricted memory and no
    > virtual memory, because processes were so small: I know that
    > most of ours back then had a memory footprint of less than 32KB.
    > Times have changed, and the original concept really doesn't
    > scale.


    I fail to understand how this (as usual, unbacked) assertion about
    'the times' would relate to my text. It presumably doesn't.

  13. Re: forking from a large active process in solaris

    James Kanze writes:
    > On Sep 12, 3:45 pm, David Schwartz wrote:
    >> On Sep 12, 6:36 am, James Kanze wrote:
    >> > Nobody is saying anything like it. Correct is what is
    >> > necessary for reliable applications to work correctly. I've
    >> > worked on a number of critical applications, and we test
    >> > that we do reliably detect all possible error conditions.
    >> > If we can't disable such lazy allocation, we can't use the
    >> > system. It's that simple. The code has to be correct.

    >
    >> So shouldn't 'fork' always fail, since the system cannot be
    >> sure that power will remain on for as long as the 'fork'ed
    >> process might want to run? Isn't it better to warn now that we
    >> might have to fail horribly later, when the application can
    >> sensibly handle the inability to ensure the child can run to
    >> completion?

    >
    >> Heck, memory might get hit by a cosmic ray. Shouldn't 'malloc'
    >> always fail, so the application doesn't take the risk of
    >> seeing the wrong data later? Surely the application has some
    >> sensible way to handle 'malloc' failure, but it can't be
    >> expected to properly handle an uncorrectable ECC error.

    >
    > Sure. No system is 100% reliable. But by the same token, you
    > don't want to take arbitrary steps to reduce reliability.


    Constantly referring to the inability to perform a certain function for
    no particular reason as 'reliability' doesn't make it 'reliability'.

  14. Re: forking from a large active process in solaris

    James Kanze writes:
    > On Sep 12, 6:35 pm, Rainer Weikusat wrote:


    [...]

    > The real question concerns the appropriateness of lazy commit
    > (AKA the system telling lies---and I know, that's a loaded
    > assessment).


    It is a wrong assessment: No part of the system which performed a fork
    without 'reserving' backing store has 'told' anything. OTOH, the
    system which refuses the fork without knowing if the reserved memory
    would have been needed has wrongfully claimed that a success was not
    possible.

    > I don't dispute that there are times when it can
    > be appropriate, and the reduced reliability is more than made up
    > for by other benefits which it provides.


    100% chance for failure can be called 'a reliably occurring
    phenomenon'. But a system which never works is completely useless,
    no matter how 'reliable' it may be at not working as intended.

    > But it does provide a serious security problem,


    Buzzword bingo?

    [...]

    > From a practical point of view, you can't allow it if you're
    > running a server, and you probably don't want it active for
    > anything connected to the network (but maybe I'm just being
    > paranoid about the latter).


    From a practical point of view, the network itself is already
    unreliable, and the mere fact that 'an [inter-]network' is being used
    at all already means 'best effort service' (which may fail for
    indeterminate periods of time at any time). And the problem is not 'lazy
    allocation of swap space for new processes' but 'insufficient
    resources to actually perform a certain (set of) function(s)'. The
    fix for this is to provide sufficient resources so that the normal
    operations the system is intended to perform can be performed
    successfully. A 'reserve backing store during fork' implementation just
    needs more resources to perform the same function reliably than a
    'lazy swap allocation' implementation would need. Additionally, as I
    wrote in the other posting, 'best effort' service may really be
    sufficient (especially for anything 'connected to a network'), provided
    the real-world failure rate is low enough to not significantly
    decrease the overall reliability of the internetwork system, including
    the services provided using it.

    [...]

    >> Some operating systems (such as Linux, IBM AIX, and HP-UX)
    >> have a feature called memory overcommit (also known as lazy
    >> swap allocation). In a memory overcommit mode, malloc() does
    >> not reserve swap space and always returns a non-NULL pointer,
    >> regardless of whether there is enough VM on the system to
    >> support it or not.

    >
    >> http://developers.sun.com/solaris/ar...ubprocess.html

    >
    >> This is already a text with an agenda, because it mixes two
    >> different things (how memory allocation requests made by an
    >> application are handled vs what the kernel does or doesn't
    >> allocate autonomously as part of a fork), trying to paint them
    >> as identical.

    >
    > Regretfully, some systems (e.g. Linux) treat them as identical.


    Reportedly, some applications (SAP) allocate vastly more memory than
    they actually ever use, and 'always returning non-null from malloc' is
    a workaround for that. Personally, I would rather have memory allocation
    system calls which would only claim to have provided memory which is
    actually available, because this would be better for the applications
    I care about. People running one of these 'other applications' prefer
    the other option, for obvious reasons.

    >> Obviously the conclusion

    >
    >> malloc() does not reserve swap space and always returns a
    >> non-NULL pointer,

    >
    >> =>

    >
    >> The fork() call never fails because of the lack of VM.

    >
    >> is a non-sequitur.

    >
    > Or just an extreme oversimplification.
    >
    >> 'malloc' is an interface to some memory allocator implemented
    >> as part of the C-library which uses unspecified system
    >> interfaces (brk and/or mmap) to actually allocate virtual
    >> memory. It isn't used as part of the fork system call. The
    >> assumption that an application intends to actually use memory
    >> it allocated is sensible. The assumption that a significant
    >> portion of the address space of a process which called fork
    >> will need to be copied is something rather different.

    >
    > I understand this point of view; I see what you're getting at.
    > The problem remains that some part of the memory will be used by
    > the forked process, and we don't have any real way of saying how
    > much. Should the system pre-allocate, say, five pages, on the
    > grounds that that will be enough?


    Insofar providing the functionality itself is concerned, it shouldn't
    pre-allocate anything. Any number is going to be wrong.

    > In the end, of course, the real problem is that fork/exec was
    > broken from the start. The whole idea that you first "copy" the
    > entire process, when all you really want to inherit from it is
    > its standard input and output, is rather ludicrous,


    Luckily, that's an idea no-one ever had: 'fork' creates a new process
    executing the same program. Insofar as this can be gathered from

    http://cm.bell-labs.com/cm/cs/who/dmr/hist.html

    the original implementation just wrote the invoking process to the
    swap area, turning it into an inactive process, and switched the
    existing 'core image' over to the new process (presumably, when
    switching back to the parent process, the unchanged original copy of
    the 'core image' was then read back into RAM). This provides a
    convenient way to create coprocesses or run any other program with a
    'mostly unmodified' environment, because the new process can just
    perform the necessary operations before executing the other program.
    But 'multiple processes' executing (different parts of) the same
    program are useful, too. A simple example would be 'executing init
    scripts during boot': It is a lot faster to just fork the shell
    running the main rc-script and let the forked shell source the script
    to be started now than to exec a new shell for each script (on the
    device platform I mentioned elsewhere). Another possibility would be
    to include sh-code in shell pipelines: If there was no fork, each part
    of a pipeline would need to be a separate, executable file. Since
    there is, the shell just forks the required number of processes,
    connects them via pipes, and each forked copy of the shell then
    executes whatever its pipeline command happens to be, ie invoke some
    external utility or run an input processing loop ...
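
    (Editorial sketch, not from the post: the pipeline mechanism described
    above, a shell forking two children and connecting them through a pipe
    before each execs its command; the commands used here are arbitrary
    placeholders.)

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Minimal sketch of "cmd1 | cmd2" as a shell would set it up. */
    int main(void)
    {
        int fd[2];

        if (pipe(fd) == -1) {
            perror("pipe");
            return 1;
        }

        if (fork() == 0) {              /* left-hand side of the pipeline */
            dup2(fd[1], STDOUT_FILENO);
            close(fd[0]);
            close(fd[1]);
            execlp("ls", "ls", "/", (char *)NULL);
            _exit(127);                 /* exec failed */
        }

        if (fork() == 0) {              /* right-hand side of the pipeline */
            dup2(fd[0], STDIN_FILENO);
            close(fd[0]);
            close(fd[1]);
            execlp("wc", "wc", "-l", (char *)NULL);
            _exit(127);
        }

        close(fd[0]);
        close(fd[1]);
        while (wait(NULL) > 0)          /* reap both children */
            ;
        return 0;
    }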

    Coming to think of it, obviously, use of pipelines must be
    deprecated in Solaris, because it is not possible to provide
    this functionality without using fork. But I guess there is a
    special exception for 'the system command interpreter', being
    the one program being 'more equal' than all the others, thereby
    getting rid of another new concept invented for UNIX(*) in
    favor of the 1950s approach which came before it ... and you
    already wrote that implementing applications as a set of
    cooperating processes would be obsolete nowadays, which is
    again the traditional, pre-UNIX-approach: link all application
    code intended to be used on the system together to a large,
    happy binary, because anything else is

    a) too complicated
    b) too slow

    and

    c) structuring code really sucks

    ... and this is even useful in compiled binaries:

    static void pppd_loader(void)
    {
        int fds[2], rc;

        rc = socketpair(PF_UNIX, SOCK_STREAM, 0, fds);
        if (rc == -1) {
            syslog(LOG_ERR, "%s: socketpair: %m(%d)", __func__, errno);
            return;
        }

        switch (fork()) {
        case -1:
            syslog(LOG_ERR, "%s: fork: %m(%d)", __func__, errno);
            exit(1);

        default:
            break;

        case 0:
            /* child: load the PPP kernel modules, signal readiness over
               the socketpair, then wait until the parent's end is closed,
               ie until pppd has terminated */
            openlog("pppd-loader", LOG_PID | LOG_PERROR, LOG_DAEMON);
            close(fds[1]);

            run_script("/libexec/load-ppp-modules");

            syslog(LOG_INFO, "%s: starting normal operation", __func__);

            write(*fds, fds + 1, sizeof(*fds));
            do
                rc = read(*fds, fds + 1, sizeof(*fds));
            while (rc != 0);

            syslog(LOG_INFO, "%s: pppd terminated", __func__);

            run_script("/libexec/unload-ppp-modules");
            _exit(0);
        }

        /* parent: block until the child has loaded the modules */
        close(*fds);
        read(fds[1], fds, sizeof(*fds));
    }

    [...]

    >> [...]

    >
    >> >> [...]
    >> >> > Otherwise, they're broken, and can't be used where
    >> >> > reliability is an issue.

    >
    >> >> 'Reliable failure because of possibly incorrect assumptions
    >> >> about the future' still doesn't increase the 'reliability' of the
    >> >> system to actually perform useful work.

    >
    >> > Reliably detecting failure conditions is an important part of
    >> > reliability.

    >
    >> What I wrote was that a system which fails to accomplish
    >> something is not capable of reliably doing 'something'.

    >
    > The reason it fails is that it cannot determine whether it can
    > do it reliably or not,


    Sloppily speaking, the reason 'it fails' is that it would have failed
    if it had been a V6 or V7 UNIX(*) running on a PDP-11, too (or 32V on
    VAX).

    [...]

    >> After all, it doesn't do it. A radio which never plays a sound
    >> is not a reliable radio. It is a broken radio (which
    >> 'reliably' fails to fulfil its purpose). A radio which
    >> sometimes fails is a more reliable radio than the former: It
    >> fulfils its purpose at least intermittently, so it may
    >> actually be useful, while the other is just decoration.

    >
    > A radio that fails when you most need it is not as reliable as
    > one that doesn't fail, once you've verified beforehand that it
    > should work.


    That's beside the point of my example: intermittent usability,
    especially if it is really only intermittent non-usability, produces
    useful results more reliably (more frequently than never) than
    unusability does. If 'most of the time' is not enough, the solution is
    to provide the necessary memory resources to turn it into 'always', not
    to prevent 'works most of the time' by specifically designed software
    barriers.

    >> > [...]
    >> >> There is really no need to continuously repeat that 'correct
    >> >> is whatever Solaris happens to do'.

    >
    >> > Nobody is saying anything like it. Correct is what is
    >> > necessary for reliable applications to work correctly. I've
    >> > worked on a number of critical applications, and we test
    >> > that we do reliably detect all possible error conditions.
    >> > If we can't disable such lazy allocation, we can't use the
    >> > system. It's that simple. The code has to be correct.

    >
    >> If 'code' fails to work in situations where it could have
    >> worked, it is broken. That simple.

    >
    > If code fails without notification, when it could have verified
    > the condition beforehand and notified, then it is broken.


    And if the tooth fairy had a beard, she could possibly want to shave
    it. There is no way to 'verify beforehand' if the new process will run
    to completion by making an arbitrary assumption regarding its future
    memory use. This could actually cause a failure, eg when a malloc-call
    done by the new process fails, causing it to abort, because the
    available swap space is already occupied by a page inherited from the
    parent neither parent nor child will ever write to after the fork.

    [...]

    Since the text I deleted IMHO contains just a few more repetitions of
    'failure is reliability' and the other misconceptions I already wrote
    about, I am going to ignore it.

    >> > And how do you accomplish "detecting the error" under Linux
    >> > (supposing you haven't disabled this behavior).

    >
    >> Which error?

    >
    > That the fork actually does fail?


    It didn't, because the new process was created.

    > Note that the first process to actually write to the page might be
    > the parent, so it will be the parent which is killed, and not the
    > child.


    Who would have guessed that ...

    > (Actually, at different times, different strategies have been tried.
    > Older versions of AIX, for example, would start by killing the
    > processes which used the most memory; back then, that generally
    > meant emacs. In at least one case known to me, Linux killed the
    > init process, so no further log-in's were possible.)


    That's at least a well-known urban legend dating back to the
    pre-2.6-days. You don't repeat it accurately, though. 'Killing init'
    will cause a kernel panic, which necessitates a reboot. But 'logins'
    are only affected as a side effect by this.

    > I can understand your wanting the function to work. It's a
    > perfectly reasonable desire.


    What you have apparently managed to not understand so far is that the
    functionality *does* work and you, Sun Microsystems, Marshall Kirk
    McKusick and presumably a host of other people who have been exposed
    to a BSD overdose at some time in the past would like this to be
    different.

    > But the fork() interface doesn't permit supporting it in a safe
    > manner.


    'The fork interface' creates new processes. Independently of this, the
    system cannot run a set of processes which collectively need more
    virtual memory than what is available to it. 'Lack of resources'
    cannot be fixed by coding arbitrary policies into the kernel. Causing
    the system to pretend it had run out of memory while it actually
    hasn't is just an arbitrary limit (of a weird kind), which leads to
    artificial failures which could otherwise have been avoided.

    [...]

    > As a default policy, it's an invitation to DoS attacks,
    > and it easily catches users unawares, with other users paying for
    > your carelessness.


    A DoS-attack affecting other users of a general timesharing system
    could be implemented really easily (in the absence of resource limits)
    by creating a large process and causing that to fork until all of the
    available swap space has been consumed. At this point, every
    (userspace) memory allocation on the system will fail, and no new
    processes can be created anymore, which will make any kind of
    administrative intervention at least difficult.

    The solution to this problem is to configure resource limits on such
    systems. A workaround could be that the kernel just kills a few
    user processes in order to free memory.
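
    (Editorial sketch, not from the post: the per-process limits referred
    to can be configured from code with setrlimit(); the values below are
    arbitrary examples, and RLIMIT_NPROC, unlike RLIMIT_AS, is common but
    not universal.)

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit lim;

        /* Cap the address space of this process and, by inheritance,
           its descendants; the value is illustrative only. */
        lim.rlim_cur = lim.rlim_max = 256UL * 1024 * 1024;
        if (setrlimit(RLIMIT_AS, &lim) == -1)
            perror("setrlimit(RLIMIT_AS)");

        /* Cap the number of processes this user may have. */
        lim.rlim_cur = lim.rlim_max = 64;
        if (setrlimit(RLIMIT_NPROC, &lim) == -1)
            perror("setrlimit(RLIMIT_NPROC)");

        return 0;
    }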

    >> >> It didn't do what I wanted it to do, despite it could have,
    >> >> which I knew beforehand.

    >
    >> > How did you know it? You can't know, that's the whole point.

    >
    >> I can, because I can know the actual memory requirements,
    >> because I know the application code. This should actually be
    >> self-evident to anyone.

    >
    > So your application code never calls a system library after the
    > fork?


    'System libraries' cannot be 'called by application code'.

