forking from a large active process in solaris - Unix




  1. forking from a large active process in solaris

    Hi folks. I hope I've got the info right here:
    If I'm mistaken please correct.

    We have a number of very large
    and fairly active Tomcat http server java jvm processes
    (gigabytes - near real memory size).
    The system identifies itself as SunOS 5.10.

    Every now and again it would be very nice to fork a subprocess
    from the jvm to do something that is simple outside the jvm but
    difficult
    inside.

    I'm told that you shouldn't do this in Solaris under these
    circumstances because fork() will result
    in blowing out the memory or other problems --
    either because fork() tries to copy the
    whole memory space or because fork() does too much work trying
    to keep track of changing pages or something.

    2 questions:

    1. What is the behaviour of fork() under conditions like these?
    (refs?)

    2. Is there any way to get solaris/java to just launch a shell command
    in a fresh memory space without worrying about the parent memory
    space? I don't want to fork a copy of tomcat -- I just want to run
    another program....

    And as always:

    3. If I'm completely confused and making a fool of myself, just
    explain
    why. I can take it.

    Thanks, -- Aaron Watters

    ---
    If the Earth has been sucked into a black hole before you
    read this message no reply is needed.


  2. Re: forking from a large active process in solaris

    On Sep 10, 6:52 am, Aaron Watters wrote:

    > 2. Is there any way to get solaris/java to just launch a shell command
    >    in a fresh memory space without worrying about the parent memory
    >    space? I don't want to fork a copy of tomcat -- I just want to run
    >    another program....


    Have a small, light program whose sole purpose is to fork/exec. Then
    ask that program to fork for you.

    DS

  3. Re: forking from a large active process in solaris

    David Schwartz wrote:
    > On Sep 10, 6:52 am, Aaron Watters wrote:
    >
    >> 2. Is there any way to get solaris/java to just launch a shell command
    >> in a fresh memory space without worrying about the parent memory
    >> space? I don't want to fork a copy of tomcat -- I just want to run
    >> another program....

    >
    > Have a small, light program whose sole purpose is to fork/exec. Then
    > ask that program to fork for you.


    Presumably he means that you should start this program early (before
    memory grows) and keep it around, sending requests to it via pipe. That
    would be a good solution.

    But it's worth noting that this is the (remaining) reason that vfork
    exists. You should use truss or dtrace on your JVM to see whether it
    uses fork or vfork to start new processes (write a trivial program to do
    so). If it uses vfork, problem solved. If fork, see above.

    Note that there is the traditional Runtime#exec and the newer
    ProcessBuilder, both in java.lang. You might want to test both in case
    they differ.
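
    For reference, the vfork-and-exec pattern under discussion can be
    sketched in C as below. The child borrows the parent's address space
    until the exec, so nothing is copied or reserved, but per the vfork
    documentation the child may only call an exec function or _exit. The
    wrapper name vfork_exec_wait is mine, not from this thread.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `path` via vfork/exec and wait for it. Returns the program's
   exit status, or -1 on failure. The child must only exec or _exit:
   plain exit() or returning would corrupt the shared parent state. */
static int vfork_exec_wait(const char *path, char *const argv[]) {
    pid_t pid = vfork();
    if (pid == 0) {
        execv(path, argv);
        _exit(127);               /* exec failed */
    }
    if (pid < 0)
        return -1;
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    Note that vfork also suspends the calling thread until the child has
    exec'd, which is one of the objections raised further down the thread.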

    AS

  4. Re: forking from a large active process in solaris

    On Sep 10, 3:52 pm, Aaron Watters wrote:
    > Hi folks. I hope I've got the info right here:
    > If I'm mistaken please correct.


    > We have a number of very large and fairly active Tomcat http
    > server java jvm processes (gigabytes - near real memory size).
    > The system identifies itself as SunOS 5.10.


    Otherwise known as Solaris 10. (Sun has a somewhat obfuscatory
    policy with regard to versioning.)

    > Every now and again it would be very nice to fork a subprocess
    > from the jvm to do something that is simple outside the jvm
    > but difficult inside.


    > I'm told that you shouldn't do this in Solaris under these
    > circumstances because fork() will result in blowing out the
    > memory or other problems -- either because fork() tries to
    > copy the whole memory space or because fork() does too much
    > work trying to keep track of changing pages or something.


    > 2 questions:


    > 1. What is the behaviour of fork() under conditions like
    > these? (refs?)


    Solaris uses copy on write in its implementation of fork, which
    means that it only copies pages (and other information) which
    are different in the two processes; it does not copy the entire
    process. On the other hand, Solaris does reserve virtual memory
    space to do the copy if it becomes necessary, and if such space
    isn't available, the fork fails. (This is the only correct
    solution when using copy on write, since there is no way of
    reporting the error later.)

    > 2. Is there any way to get solaris/java to just launch a shell command
    > in a fresh memory space without worrying about the parent memory
    > space? I don't want to fork a copy of tomcat -- I just want to run
    > another program....


    Use posix_spawn. That's what it's there for.

    For more information:
    http://developers.sun.com/solaris/ar...ubprocess.html.
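
    The posix_spawn suggestion can be sketched as a small wrapper like the
    one below; the function name spawn_and_wait and the abbreviated error
    handling are mine, not from the article linked above.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Spawn `path` with the given argv as a fresh process, wait for it,
   and return its exit status, or -1 on error. Unlike fork, no copy of
   the parent's address space is made or reserved. */
static int spawn_and_wait(const char *path, char *const argv[]) {
    pid_t pid;
    if (posix_spawn(&pid, path, NULL, NULL, argv, environ) != 0)
        return -1;
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

    The third and fourth arguments (file actions and spawn attributes)
    are left NULL here; they are where redirections and signal setup go
    when the child needs more than a plain exec.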

    --
    James Kanze (GABI Software) email:james.kanze@gmail.com
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

  5. Re: forking from a large active process in solaris

    Aaron Watters wrote:

    > We have a number of very large
    > and fairly active Tomcat http server java jvm processes
    > (gigabytes - near real memory size).
    > The system identifies itself as SunOS 5.10.
    >
    > Every now and again it would be very nice to fork a subprocess
    > from the jvm to do something that is simple outside the jvm but
    > difficult
    > inside.
    >
    > I'm told that you shouldn't do this in Solaris under these
    > circumstances because fork() will result
    > in blowing out the memory or other problems --
    > either because fork() tries to copy the
    > whole memory space or because fork() does too much work trying
    > to keep track of changing pages or something.


    Actually, Solaris fork won't copy the whole process memory. However, it
    will reserve enough space to be able to do so if necessary. So all you
    need is to configure enough swap space to satisfy it. That swap space
    won't be used in practice, but the OS needs to know it is there just in
    case, to guarantee correct semantics.

    > 2. Is there any way to get solaris/java to just launch a shell command
    > in a fresh memory space without worrying about the parent memory
    > space? I don't want to fork a copy of tomcat -- I just want to run
    > another program....


    Vote for:
    http://bugs.sun.com/view_bug.do?bug_id=5049299


  6. Re: forking from a large active process in solaris

    Marc writes:
    > Aaron Watters wrote:


    [...]

    >> Every now and again it would be very nice to fork a subprocess
    >> from the jvm to do something that is simple outside the jvm but
    >> difficult
    >> inside.
    >>
    >> I'm told that you shouldn't do this in Solaris under these
    >> circumstances because fork() will result
    >> in blowing out the memory or other problems --
    >> either because fork() tries to copy the
    >> whole memory space or because fork() does too much work trying
    >> to keep track of changing pages or something.

    >
    > Actually, Solaris fork won't copy the whole process memory. However, it
    > will reserve enough space to be able to do so if necessary. So all you
    > need is to configure enough swap space to satisfy it. That swap space
    > won't be used in practice, but the OS needs to know it is there just in
    > case, to guarantee correct semantics.


    There are no 'correct semantics' for this case (except when defining
    'correct' as 'whatever Solaris does'). The kernel can make one of two
    assumptions:

    - Both processes will need separate copies of each writeable
    page the parent had at the time of the fork. So, if there
    isn't sufficient 'paging space' available for this, cause
    the (fork-)operation to fail right away.

    - The system will likely not be required to provide all of the
    memory at any given point in time, so there is at least a
    chance of success. Hence, let the fork succeed and delay the
    'failure' until there is actually a problem, i.e. more virtual
    memory than available would be needed.

    The drawback associated with the second assumption is that this
    failure will likely not be a 'clean' system call error return the
    application may even know how to deal with, but would happen at an
    arbitrary time as side effect of a write-attempt of one of the two
    processes. OTOH, assumption #2 is always true for 'program executing
    another program'.

    This implies that the Solaris fork is deliberately designed not to be
    a sensible primitive for a 'start another program' operation, in order
    to favor 'something else'. Any real-world example of 'something else'
    would be greatly appreciated (by me): what type of application creates
    a large address space with a large number of writable pages and forks
    into two (n) independent processes writing to all/most of this address
    space, but without allocating significant amounts of 'new memory'?

  7. Re: forking from a large active process in solaris

    Rainer Weikusat wrote:

    >> Actually, Solaris fork won't copy the whole process memory. However, it
    >> will reserve enough space to be able to do so if necessary. So all you
    >> need is to configure enough swap space to satisfy it. That swap space
    >> won't be used in practice, but the OS needs to know it is there just in
    >> case, to guarantee correct semantics.

    >
    > There are no 'correct semantics' for this case (except when defining
    > 'correct' as 'whatever Solaris does').


    Er, I believe there is. After fork, the child is supposed to have a full
    copy of the process memory, to which you can write normally. The sharing
    copy-on-write optimization is just an implementation detail, but it
    should not change the semantics. There is a reason fork can fail with
    ENOMEM. This is the same problem as an overcommitted malloc that
    successfully allocates memory but where using this memory fails for lack
    of space. How are you supposed to handle errors in your code if the
    errors are never reported?

    > The kernel can make one of two assumptions:
    >
    > - Both processes will need separate copies of each writeable
    > page the parent had at the time of the fork. So, if there
    > isn't sufficient 'paging space' available for this, cause
    > the (fork-)operation to fail right away.


    The kernel needs to be ready for this.

    > - The system will likely not be required to provide all of the
    > memory at any given point in time, so, there is at least a
    > chance of success. Hence, let the fork succeed and delay the
    > 'failure' until there is actually a problem, i.e. more virtual
    > memory than available would be needed.
    >
    > The drawback associated with the second assumption is that this
    > failure will likely not be a 'clean' system call error return the
    > application may even know how to deal with, but would happen at an
    > arbitrary time as side effect of a write-attempt of one of the two
    > processes.


    This drawback is not small, it changes the semantics and makes the
    program behaviour completely erratic and unpredictable.

    > OTOH, assumption #2 is always true for 'program executing
    > another program'.
    >
    > This implies that the Solaris fork is deliberately designed to not be
    > a sensible primitive of a 'start another program'-operation, in order
    > to favor 'something else'.


    The solaris fork is designed to work for forks where you want a copy of
    the process (say you want 10 apache processes, because the threaded
    version is not perfect yet). At the same time, the copy-on-write
    optimization makes it a sensible option to start another program, as long
    as you bothered to configure some swap space. "start another program"
    alternatives exist, and popen, system, posix_spawn all use vfork instead
    of fork. Note that the solaris vfork has restrictions on how it can be
    safely used, so it is safer to use one of the wrappers I mentioned.

    Anyway, what is so hard about adding some swap space? Disks are so cheap
    compared to memory...

    I can understand the principle of: let it run as long as there is a
    possibility it might succeed, and there are indeed cases where it is a
    sensible choice, but I disagree with despising the safer alternatives.

  8. Re: forking from a large active process in solaris

    Marc writes:
    > Rainer Weikusat wrote:
    >>> Actually, Solaris fork won't copy the whole process memory. However, it
    >>> will reserve enough space to be able to do so if necessary. So all you
    >>> need is to configure enough swap space to satisfy it. That swap space
    >>> won't be used in practice, but the OS needs to know it is there just in
    >>> case, to guarantee correct semantics.

    >>
    >> There are no 'correct semantics' for this case (except when defining
    >> 'correct' as 'whatever Solaris does').

    >
    > Er, I believe there is.


    As I wrote: "whatever Solaris happens to do". I already guessed that.
    AFAIK, AIX does it differently (as does Linux).

    > After fork, the child is supposed to have a full
    > copy of the process memory, to which you can write normally. The sharing
    > copy-on-write optimization is just an implementation detail, but it
    > should not change the semantics. There is a reason fork can fail with
    > ENOMEM. This is the same problem as an overcommitted malloc that
    > successfully allocates memory but where using this memory fails for lack
    > of space.


    It is different, because the process does not 'allocate' anything. The
    kernel does (or doesn't) based on some assumptions regarding future
    behaviour of the old and the new process.

    [...]

    >> The kernel can make one of two assumptions:
    >>
    >> - Both processes will need separate copies of each writeable
    >> page the parent had at the time of the fork. So, if there
    >> isn't sufficient 'paging space' available for this, cause
    >> the (fork-)operation to fail right away.

    >
    > The kernel needs to be ready for this.


    The kernel decidedly does not 'need' to synthesize gratuitous
    failures based on guesses about the future. It may do so, and may
    sensibly do so, if there is some expected benefit.

    >> - The system will likely not be required to provide all of the
    >> memory at any given point in time, so, there is at least a
    >> chance of success. Hence, let the fork succeed and delay the
    > >> 'failure' until there is actually a problem, i.e. more virtual
    >> memory than available would be needed.
    >>
    >> The drawback associated with the second assumption is that this
    >> failure will likely not be a 'clean' system call error return the
    >> application may even know how to deal with, but would happen at an
    >> arbitrary time as side effect of a write-attempt of one of the two
    >> processes.

    >
    > This drawback is not small, it changes the semantics and makes the
    > program behaviour completely erratic and unpredictable.


    It doesn't do anything except what I wrote above. And that happens
    only if the 'worst case' assumption which had been made for no
    particular reason turns out to be true. The net effect is that some
    process may fail to fulfil its intended purpose. Otherwise, it is
    certain that this will happen, because the kernel enforces the failure
    proactively.

    Depending on what happens in the future, either of the two assumptions
    could have been the more sensible choice.

    >> OTOH, assumption #2 is always true for 'program executing
    >> another program'.
    >>
    >> This implies that the Solaris fork is deliberately designed to not be
    >> a sensible primitive of a 'start another program'-operation, in order
    >> to favor 'something else'.

    >
    > The solaris fork is designed to work for forks where you want a copy of
    > the process (say you want 10 apache processes, because the threaded
    > version is not perfect yet).


    It favors something where I cannot think of a single, actual usage
    example (and neither can you, apparently) to the detriment of a (in my
    experience) much more common operation, namely, execute another
    program (or even just create another process) to accomplish an
    operation different from the purpose of the 'main process'. I rarely
    'just run a command'. Usually, there is some kind of additional setup
    needed, which can be done without affecting the parent in the forked
    process and without the need to actually copy more than about one page
    of the stack.

    > At the same time, the copy-on-write optimization makes it a sensible
    > option to start another program, as long as you bothered to
    > configure some swap space. "start another program" alternatives
    > exist, and popen, system, posix_spawn all use vfork instead
    > of fork.


    Maybe they do so on Solaris meanwhile. But the fact that 'Solaris fork'
    implements (IMHO) 'weird tradeoffs', which could hurt me, is no reason
    to start a command interpreter to start another program or to suspend
    the parent process until this has been done.

    > Note that the solaris vfork has restrictions on how it can be
    > safely used, so it is safer to use one of the wrappers I mentioned.


    You didn't mention any 'vfork wrapper', but

    - a stdio-demonstration routine which came with the original
    V7 stdio

    - another stdio demonstration routine, likely originally built
    on top of the former

    both of which happen to execute a shell and

    - a primitive intended to be useful on systems w/o MMU

    It is somewhat amusing that 'the most advanced OS on the planet' needs to
    resort to behaving like, say, an ARM7 CPU or 3BSD in order to work around
    its own implementation for an apparently frequent case where this
    implementation simply makes the wrong tradeoffs.

    [...]

    > I can understand the principle of: let it run as long as there is a
    > possibility it might succeed, and there are indeed cases where it is a
    > sensible choice, but I disagree with despising the safer alternatives.


    'Immediate failure now' isn't safer than 'possible failure in the
    future'. Especially taking into account that the kernel actually
    attempts to second-guess the competence of the sysadmin and/or
    developer, namely, that whoever it was will not have been capable of
    taking the actual capabilities of a particular system into account
    correctly. I have quite a few 'fork examples' here where the actually
    needed additional amount of memory is exactly 4K. For this particular
    system, a spurious fork failure could cause considerable damage,
    because it would mean that some shell script intended to do a
    user-transparent reconfiguration did not complete, which, in bad
    cases, could necessitate shipping a replacement unit to the affected
    customer (the same is, of course, true for any real OOM situation).

  9. Re: forking from a large active process in solaris

    On Sep 11, 6:43 pm, Rainer Weikusat wrote:
    > Marc writes:
    > > Rainer Weikusat wrote:
    > >>> Actually, Solaris fork won't copy the whole process
    > >>> memory. However, it will reserve enough space to be able
    > >>> to do so if necessary. So all you need is to configure
    > >>> enough swap space to satisfy it. That swap space won't be
    > >>> used in practice, but the OS needs to know it is there
    > >>> just in case, to guarantee correct semantics.


    > >> There are no 'correct semantics' for this case (except when
    > >> defining 'correct' as 'whatever Solaris does').


    > > Er, I believe there is.

    > As I wrote: "whatever Solaris happens to do". I already
    > guessed that.


    What Posix requires, in fact. What is necessary for a reliable
    system.

    > AFAIK, AIX does it differently (as does Linux).


    I'm not sure. I think the behavior is configurable in both
    cases. Otherwise, they're broken, and can't be used where
    reliability is an issue. (I know that in the past, AIX didn't
    commit memory in sbrk, either. This was changed because of the
    problems it caused---you can't use such systems if reliability
    is an issue.)

    > > After fork, the child is supposed to have a full copy of the
    > > process memory, to which you can write normally. The sharing
    > > copy-on-write optimization is just an implementation detail,
    > > but it should not change the semantics. There is a reason
    > > fork can fail with ENOMEM. This is the same problem as an
    > > overcommitted malloc that successfully allocates memory but
    > > where using this memory fails for lack of space.


    > It is different, because the process does not 'allocate'
    > anything. The kernel does (or doesn't) based on some
    > assumptions regarding future behaviour of the old and the new
    > process.


    No. The copy on write is an optimization, based on the "as-if"
    principle: the formal semantics are that fork() makes a complete
    copy (or returns an error). If the system does not behave
    exactly as if fork() made a complete copy (modulo performance
    issues), then it is broken.

    > [...]


    > >> The kernel can make one of two assumptions:

    >
    > >> - Both processes will need separate copies of each writeable
    > >> page the parent had at the time of the fork. So, if there
    > >> isn't sufficient 'paging space' available for this, cause
    > >> the (fork-)operation to fail right away.


    > > The kernel needs to be ready for this.


    > The kernel decidedly does not 'need' to synthesize gratuitous
    > failures based on guesses about the future. It may do so, and
    > may sensibly do so, if there is some expected benefit.


    The kernel is required by Posix to make a complete copy of the
    process. If it doesn't do this, it must nevertheless ensure
    that its behavior is as if it had made the copy.

    Posix introduced posix_spawn precisely because of this problem:
    a conformant system is required to pre-allocate the memory,
    which it often won't actually need.

    > >> - The system will likely not be required to provide all of the
    > >> memory at any given point in time, so, there is at least a
    > >> chance of success. Hence, let the fork succeed and delay the
    > >> 'failure' until there is actually a problem, i.e. more virtual
    > >> memory than available would be needed.


    > >> The drawback associated with the second assumption is that this
    > >> failure will likely not be a 'clean' system call error return the
    > >> application may even know how to deal with, but would happen at an
    > >> arbitrary time as side effect of a write-attempt of one of the two
    > >> processes.


    > > This drawback is not small, it changes the semantics and makes the
    > > program behaviour completely erratic and unpredictable.


    > It doesn't do anything except what I wrote above. And that
    > happens only if the 'worst case' assumption which had been
    > made for no particular reason turns out to be true.


    It only happens if it happens, yes. And it's not allowed to
    happen in a conformant system. And you can't write reliable
    software if you can't depend on the system behaving according to
    specification.

    > The net effect is that some process may fail to fulfil its
    > intended purpose. Otherwise, it is certain that this will
    > happen, because the kernel enforces the failure proactively.


    > Depending on what happens in the future, either of the two
    > assumptions could have been the more sensible choice.


    The difference is that one conforms to the Posix standard
    and allows writing reliable systems, and the other doesn't.

    > >> OTOH, assumption #2 is always true for 'program executing
    > >> another program'.


    > >> This implies that the Solaris fork is deliberately designed
    > >> to not be a sensible primitive of a 'start another
    > >> program'-operation, in order to favor 'something else'.


    > > The solaris fork is designed to work for forks where you
    > > want a copy of the process (say you want 10 apache
    > > processes, because the threaded version is not perfect yet).


    > It favors something where I cannot think of a single, actual
    > usage example (and neither can you, apparently) to the
    > detriment of a (in my experience) much more common operation,
    > namely, execute another program (or even just create another
    > process) to accomplish an operation different from the purpose
    > of the 'main process'.


    Well, I've occasionally forked without an exec in the child
    (although only with a small process). More to the point: how
    much memory should the system pre-allocate, to ensure that you
    can at least execute until the exec? What are the rules that
    you propose instead of those of Posix?

    > I rarely 'just run a command'. Usually, there is some kind of
    > additional setup needed, which can be done without affecting
    > the parent in the forked process and without the need to
    > actually copy more than about one page of the stack.


    "About" doesn't sound very specific to me.

    > > At the same time, the copy-on-write optimization makes it a
    > > sensible option to start another program, as long as you
    > > bothered to configure some swap space. "start another
    > > program" alternatives exist, and popen, system, posix_spawn
    > > all use vfork instead of fork.


    > Maybe they do so on Solaris meanwhile. But that 'Solaris fork'
    > implements (IMHO) 'weird tradeoffs',


    That Solaris fork implements the Posix standard. And what is
    required for writing reliable code.

    > which could hurt me, is no reason to start a command
    > interpreter to start another program or to suspend the parent
    > process until this has been done.


    There's certainly some argument for supporting both models, but
    IMHO, the default should be for reliability.

    > [...]
    > > I can understand the principle of: let it run as long as
    > > there is a possibility it might succeed, and there are
    > > indeed cases where it is a sensible choice, but I disagree
    > > with despising the safer alternatives.


    > 'Immediate failure now' isn't safer than 'possible failure in
    > the future'.


    Getting an immediate error message in case of failure, with the
    reason for the error, allows handling it. Having the child
    process crash sometime later doesn't. Killing some other random
    process (as happens in Linux) is even worse. (Hopefully it's
    since been corrected, but I'm aware of one case where it killed
    init, making further log in's impossible.)

    --
    James Kanze (GABI Software) email:james.kanze@gmail.com
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

  10. Re: forking from a large active process in solaris

    James Kanze writes:
    > On Sep 11, 6:43 pm, Rainer Weikusat wrote:
    >> Marc writes:
    >> > Rainer Weikusat wrote:
    >> >>> Actually, Solaris fork won't copy the whole process
    >> >>> memory. However, it will reserve enough space to be able
    >> >>> to do so if necessary. So all you need is to configure
    >> >>> enough swap space to satisfy it. That swap space won't be
    >> >>> used in practice, but the OS needs to know it is there
    >> >>> just in case, to guarantee correct semantics.

    >
    >> >> There are no 'correct semantics' for this case (except when
    >> >> defining 'correct' as 'whatever Solaris does').

    >
    >> > Er, I believe there is.

    >> As I wrote: "whatever Solaris happens to do". I already
    >> guessed that.

    >
    > What Posix requires, in fact. What is necessary for a reliable
    > system.


    A system which causes operations that could have succeeded, given the
    available amount of resources, to 'reliably fail' instead, because the
    system has no information about the future behaviour of the involved
    program(s) and assumes the worst, is not 'reliable', except insofar as
    'it never works' is considered a reliable quality.

    Your statement that SUS would 'require' this behaviour is wrong. The
    fork-specification states that

    ERRORS

    The fork() function shall fail if:

    [EAGAIN]

    The system lacked the necessary resources to create
    another process

    [...]

    The fork() function may fail if:

    [ENOMEM]
    Insufficient storage space is available.

    and the rationale says that

    The [ENOMEM] error value is reserved for those implementations
    that detect and distinguish such a condition. This condition
    occurs when an implementation detects that there is not enough
    memory to create the process. This is intended to be returned
    when [EAGAIN] is inappropriate because there can never be
    enough memory (either primary or secondary storage) to perform
    the operation. Since fork() duplicates an existing process,
    this must be a condition where there is sufficient memory for
    one such process, but not for two.

    [...]

    Part of the reason for including the optional error [ENOMEM]
    is because the SVID specifies it and it should be reserved for
    the error condition specified there. *The condition is not
    applicable on many implementations*.

    >> AFAIK, AIX does it differently (as does Linux).

    >
    > I'm not sure.


    So, why didn't you simply check it, instead of posting your
    speculations?

    [...]

    > Otherwise, they're broken, and can't be used where
    > reliability is an issue.


    'Reliable failure because of possibly incorrect assumptions about the
    future' still doesn't increase the 'reliability' of the system to
    actually perform useful work.

    >> > After fork, the child is supposed to have a full copy of the
    >> > process memory, to which you can write normally. The sharing
    >> > copy-on-write optimization is just an implementation detail,
    >> > but it should not change the semantics. There is a reason
    >> > fork can fail with ENOMEM. This is the same problem as an
    >> > overcommitted malloc that successfully allocates memory but
    >> > where using this memory fails for lack of space.

    >
    >> It is different, because the process does not 'allocate'
    >> anything. The kernel does (or doesn't) based on some
    >> assumptions regarding future behaviour of the old and the new
    >> process.

    >
    > No. The copy on write is an optimization, based on the "as-if"
    > principle: the formal semantics are that fork() makes a complete
    > copy (or returns an error). If the system does not behave
    > exactly as if fork() made a complete copy (modulo performance
    > issues), then it is broken.


    There is really no need to continuously repeat that 'correct is
    whatever Solaris happens to do'. Personally, quoting an ex-colleague
    of mine, "I don't give a flying ****" what adjectives some set of
    people would like to attach to a failure to accomplish something which
    could have been accomplished. It didn't do what I wanted it to do,
    despite the fact that it could have, which I knew beforehand.

    [...]

    > Posix introduced posix_spawn precisely because of this problem:
    > a conformant system is required to pre-allocate the memory,
    > which it often won't actually need.


    Quoting the rationale for it:

    The posix_spawn() function and its close relation
    posix_spawnp() have been introduced to overcome the following
    perceived difficulties with fork(): the fork() function is
    difficult or impossible to implement without swapping or
    dynamic address translation.

    * Swapping is generally too slow for a realtime
    environment.

    * Dynamic address translation is not available everywhere
    that POSIX might be useful.

    * Processes are too useful to simply option out of POSIX
    whenever it must run without address translation or
    other MMU services.

    I cannot see anything wrt 'working around fork implementations
    unusable for this task because someone is really convinced they
    should be' in this text.

    [...]

    >> It favors something where I cannot think of a single, actual
    >> usage example (and neither can you, apparently) to the
    >> detriment of a (in my experience) much more common operation,
    >> namely, execute another program (or even just create another
    >> process) to accomplish an operation different from the purpose
    >> of the 'main process'.

    >
    > Well, I've occasionally forked without an exec in the child
    > (although only with a small process).


    That was not what I was writing about.

    [...]

    >> 'Immediate failure now' isn't safer than 'possible failure in
    >> the future'.

    >
    > Getting an immediate error message in case of failure, with the
    > reason for the error, allows handling it.


    There is no way to handle this error: The system won't create a new
    process and that's it. There are various ways to work around this, but
    - as usual - my preference would be to first code for the general case
    and add implementation-specific workarounds as needed.

    That Solaris (according to Sun) cannot do something Linux, AIX and
    HP-UX can do isn't really my problem.
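    As an aside for the original poster's problem (launching a small
    external program from a multi-gigabyte process), the portable way to
    sidestep this whole argument is posix_spawn(), which an implementation
    is free to build on vfork()-style machinery so that no copy of the
    parent's address space is ever accounted for. A minimal sketch, not a
    definitive implementation, assuming /bin/echo exists on the target box:

    ```c
    #include <spawn.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    extern char **environ;

    /* Spawn argv[0] with arguments argv and wait for it.  Unlike a plain
     * fork()+exec(), posix_spawn() need not account for a full copy of
     * the (possibly huge) parent address space.  Returns the child's
     * exit status, or -1 if the spawn or wait failed. */
    int run_command(char *const argv[])
    {
        pid_t pid;
        int err = posix_spawn(&pid, argv[0], NULL, NULL, argv, environ);
        if (err != 0) {
            fprintf(stderr, "posix_spawn: %s\n", strerror(err));
            return -1;
        }
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }

    int main(void)
    {
        char *argv[] = { "/bin/echo", "hello from a fresh process", NULL };
        return run_command(argv) == 0 ? 0 : 1;
    }
    ```

    The helper name run_command is of course made up; the point is only
    that posix_spawn() reports failure synchronously via its return value,
    without requiring a full duplicate of the parent.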

  11. Re: forking from a large active process in solaris

    > No. The copy on write is an optimization, based on the "as-if"
    > principle: the formal semantics are that fork() makes a complete
    > copy (or returns an error). If the system does not behave
    > exactly as if fork() made a complete copy (modulo performance
    > issues), then it is broken.


    You utterly misunderstand the 'as-if' rule. The whole point of "as-if"
    is that you can make optimizations that change behavior, even if a
    compliant application can detect those changes. Otherwise, all
    effective optimizations would be illegal, since the application could
    detect that it's running faster.

    All that is required to comply with the standard is that you comply
    with the standard. The "as-if" rule can only *RELAX* a standard, it
    can never add requirements.

    DS

  12. Re: forking from a large active process in solaris

    On Sep 12, 12:09 pm, Rainer Weikusat wrote:
    > James Kanze writes:


    [...]
    > >> AFAIK, AIX does it differently (as does Linux).


    > > I'm not sure.


    > So, why didn't you simply check it, instead of posting your
    > speculations?


    Because I don't have access to them. I have been told that this
    broken behavior of AIX has been fixed, and I have been told that
    you can disable it for Linux, but I don't currently work on such
    systems. In the case of AIX, from what I understand, the
    behavior was changed because a number of large customers
    informed IBM that the machine couldn't be used otherwise.

    > [...]
    > > Otherwise, they're broken, and can't be used where
    > > reliability is an issue.


    > 'Reliable failure because of possibly incorrect assumptions
    > about the future' still doesn't increase the 'reliability' of the
    > system to actually perform useful work.


    Reliably detecting failure conditions is an important part of
    reliability.

    [...]
    > There is really no need to continuously repeat that 'correct is
    > whatever Solaris happens to do'.


    Nobody is saying anything like it. Correct is what is necessary
    for reliable applications to work correctly. I've worked on a
    number of critical applications, and we test that we do reliably
    detect all possible error conditions. If we can't disable such
    lazy allocation, we can't use the system. It's that simple.
    The code has to be correct.

    > Personally, quoting an ex-colleague of mine "I don't give a
    > flying ****" what adjectives some set of people would like to
    > attach to a failure to accomplish something which could have
    > been accomplished.


    Which maybe could have been accomplished. Obviously, you don't
    know much about writing correctly functioning software.

    And how do you accomplish "detecting the error" under Linux
    (supposing you haven't disabled this behavior)?

    > It didn't do what I wanted it to do, despite the fact that it
    > could have, which I knew beforehand.


    How did you know it? You can't know, that's the whole point.

    --
    James Kanze (GABI Software) email:james.kanze@gmail.com
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

  13. Re: forking from a large active process in solaris

    On Sep 12, 6:36 am, James Kanze wrote:

    > Nobody is saying anything like it. Correct is what is necessary
    > for reliable applications to work correctly. I've worked on a
    > number of critical applications, and we test that we do reliably
    > detect all possible error conditions. If we can't disable such
    > lazy allocation, we can't use the system. It's that simple.
    > The code has to be correct.


    So shouldn't 'fork' always fail, since the system cannot be sure that
    power will remain on for as long as the 'fork'ed process might want to
    run? Isn't it better to warn now that we might have to fail horribly
    later, when the application can sensibly handle the inability to
    ensure the child can run to completion?

    Heck, memory might get hit by a cosmic ray. Shouldn't 'malloc' always
    fail, so the application doesn't take the risk of seeing the wrong
    data later? Surely the application has some sensible way to handle
    'malloc' failure, but it can't be expected to properly handle an
    uncorrectable ECC error.

    Your definition of "reliable" is simply not useful in many contexts.
    It may be useful in yours, which is why it's often handy to be able to
    disable overcommitment.

    DS

  14. Re: forking from a large active process in solaris

    David Schwartz writes:
    >> No. The copy on write is an optimization, based on the "as-if"
    >> principle: the formal semantics are that fork() makes a complete
    >> copy (or returns an error). If the system does not behave
    >> exactly as if fork() made a complete copy (modulo performance
    >> issues), then it is broken.

    >
    > You utterly misunderstand the 'as-if' rule. The whole point of "as-if"
    > is that you can make optimizations that change behavior, even if a
    > compliant application can detect those changes. Otherwise, all
    > effective optimizations would be illegal, since the application could
    > detect that it's running faster.


    The real issue here is 'detection': There is no way to 'detect' that a
    memory page which never needs to be copied could not have been copied
    due to lack of space if copying it had become necessary, because there
    is generally no way to 'detect' something which doesn't exist.

  15. Re: forking from a large active process in solaris

    James Kanze writes:
    > On Sep 12, 12:09 pm, Rainer Weikusat wrote:
    >> James Kanze writes:

    >
    > [...]
    >> >> AFAIK, AIX does it differently (as does Linux).

    >
    >> > I'm not sure.

    >
    >> So, why didn't you simply check it, instead of posting your
    >> speculations?

    >
    > Because I don't have access to them.


    You have access to all of the related documentation, which IBM happens
    to publish online.

    > I have been told that this broken behavior of AIX has been fixed,


    You can attach 'loaded' adjectives to anything you want to attach them
    to and you are completely free to never quote any actual reference
    supporting your opinion, but both serve only to devalue your
    standpoint: apparently, you cannot support it with arguments and are
    not aware of references supporting it.

    Sun claims that 'AIX, Linux and HP-UX' support overcommitment:

    Some operating systems (such as Linux, IBM AIX, and HP-UX)
    have a feature called memory overcommit (also known as lazy
    swap allocation). In a memory overcommit mode, malloc() does
    not reserve swap space and always returns a non-NULL pointer,
    regardless of whether there is enough VM on the system to
    support it or not.

    http://developers.sun.com/solaris/ar...ubprocess.html

    This is already a text with an agenda, because it mixes two different
    things (how memory allocation requests made by an application are
    handled vs what the kernel does or doesn't allocate autonomously as
    part of a fork), trying to paint them as identical. Obviously the
    conclusion

    malloc() does not reserve swap space and always returns a
    non-NULL pointer,

    =>

    The fork() call never fails because of the lack of VM.

    is a non-sequitur. 'malloc' is an interface to some memory allocator
    implemented as part of the C library which uses unspecified system
    interfaces (brk and/or mmap) to actually allocate virtual memory.
    It isn't used as part of the fork system call. The assumption that an
    application intends to actually use memory it allocated is sensible.
    The assumption that a significant portion of the address space of a
    process which called fork will need to be copied is something rather
    different.

    [...]

    >> [...]
    >> > Otherwise, they're broken, and can't be used where
    >> > reliability is an issue.

    >
    >> 'Reliably failure because of possibly incorrect assumptions
    >> about the future' still doesn't increase the 'reliably' of the
    >> system to actually perform useful work.

    >
    > Reliably detecting failure conditions is an important part of
    > reliability.


    What I wrote was that a system which fails to accomplish something is
    not capable of reliably doing 'something'. After all, it doesn't do
    it. A radio which never plays a sound is not a reliable radio. It is a
    broken radio (which 'reliably' fails to fulfil its purpose). A radio
    which sometimes fails is a more reliable radio than the former: It
    fulfils its purpose at least intermittently, so it may actually be
    useful, while the other is just decoration.

    This has no relation to your general statement above (which could be
    disputed on its own, but I am not going to do that).

    > [...]
    >> There is really no need to continously repeat that 'correct is
    >> whatever Solaris happens to do'.

    >
    > Nobody is saying anything like it. Correct is what is necessary
    > for reliable applications to work correctly. I've worked on a
    > number of critical applications, and we test that we do reliably
    > detect all possible error conditions. If we can't disable such
    > lazy allocation, we can't use the system. It's that simple.
    > The code has to be correct.


    If 'code' fails to work in situations where it could have worked,
    it is broken. That simple.

    [...]

    >> Personally, quoting an ex-colleague of mine "I don't give a
    >> flying ****" what adjectives some set of people would like to
    >> attach to a failure to accomplish something which could have
    >> been accomplished.

    >
    > Which maybe could have been accomplished. Obviously, you don't
    > know much about writing correctly functioning software.


    'Obviously', you are really confused regarding memory allocation
    requirements and COW-fork implementations: A page needs to be copied
    whenever more than one process is currently referencing it and one of
    those processes writes to this page. At this time, the system needs to
    'somehow' find an unused page, copy the contents of the other into it
    and modify the page tables of the writing process to refer to the new
    page, while all others can continue to share the old one. This means
    that the amount of pages which will have to be copied in future is a
    number less than or equal to the amount of COW-shared pages times the
    number of processes sharing them minus one. That's the only 'reliable'
    information available to the system, because anything else depends on
    the future behaviour of the application which forked. I can know
    something about the future behaviour of this application, because I
    know the code. Hence, I can conclude that it will be possible to
    perform an intended operation, while the system cannot conclude that
    it won't. It can only assume that, leading to spurious failures
    whenever this assumption was wrong.

    This is actually even more complicated because both the amount of
    available swap space and the amount of free RAM vary over time,
    depending on the [future] behaviour of other applications running on
    the same system[*]. The only 'reliable' way for the system to determine
    if it can copy a page which needs to be copied is when the
    corresponding page fault needs to be handled: By this time, the
    required resources are available or are not available. Anything else
    is guesswork. That the system may now be incapable of performing an
    operation (which may and likely will not even be necessary) does not
    mean it will be incapable at the time when the operation becomes
    necessary.
    [*] And 'future administrative actions': More swap space could
    be added, eg if the presently available one is running low.

    > And how do you accomplish "detecting the error" under Linux
    > (supposing you haven't disabled this behavior)?


    Which error?

    >> It didn't do what I wanted it to do, despite the fact that it
    >> could have, which I knew beforehand.

    >
    > How did you know it? You can't know, that's the whole point.


    I can, because I can know the actual memory requirements, because I
    know the application code. This should actually be self-evident to
    anyone.


  16. Re: forking from a large active process in solaris

    On 2008-09-12, Rainer Weikusat wrote:
    >>
    >> Which maybe could have been accomplished. Obviously, you don't
    >> know much about writing correctly functioning software.

    >
    > 'Obviously', you are really confused regarding memory allocation
    > requirements and COW-fork implementations: A page needs to be copied
    >
    > ... more CoW-related-rambling we all know how CoW works, thank you ...
    >
    >> And how do you accomplish "detecting the error" under Linux
    >> (supposing you haven't disabled this behavior).

    >
    > Which error?


    The error that the system doesn't have enough memory to give you...
    Look, it's very simple... you can either detect that error by fork
    returning -1, or you can "detect" it by your process (and/or possibly
    other processes) being killed without explanation. How can you possibly
    support that the second case is preferable?
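    The synchronous case looks like this in code; whether the
    ENOMEM/EAGAIN branch is ever taken for a large parent is exactly what
    is being argued about, but when it is, the caller at least gets to
    react. A minimal sketch (the helper name try_fork is made up):

    ```c
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* fork() and have the child exit immediately.  Returns 0 on success,
     * otherwise the errno fork() reported -- the synchronous
     * notification being discussed (ENOMEM/EAGAIN when the kernel
     * refuses to commit the duplicate). */
    int try_fork(void)
    {
        pid_t pid = fork();
        if (pid == -1) {
            fprintf(stderr, "fork: %s\n", strerror(errno));
            return errno;              /* caller can back off, retry, ... */
        }
        if (pid == 0)
            _exit(0);                  /* child */
        waitpid(pid, NULL, 0);         /* parent reaps */
        return 0;
    }

    int main(void)
    {
        return try_fork();
    }
    ```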

    As others explained previously, CoW-based fork is just a performance
    optimization. The semantics are "you've got a duplicate of the forked
    process". If the semantics are changed to: "you've got a duplicate which
    might run, or might fail at any time, during any page fault without
    notice" then it's perfectly obvious that's a broken implementation.

    After skimming a bit through the Design & Implementation of the 4.4BSD
    operating system, which I remembered saying something to that effect,
    McKusick et al. apparently agree too:

    To avoid running out of memory resources, the kernel must ensure that
    it does not promise to provide more virtual memory than it is able to
    deliver.
    ...
    The reason for this restriction is to ensure that processes get
    synchronous notification of memory limitations. Specifically, a
    process should get an error back from a system call (such as sbrk,
    fork, or mmap)
    ...
    Trouble arises when it has no free pages to service the fault and no
    available swap space to save an active page. Here, the kernel has no
    choice but to send a segmentation-fault signal to the process
    unfortunate enough to be page faulting. Such asynchronous
    notification of insufficient memory resources is unacceptable.


    P.S. The semantics of fork MUST be maintained, however there is need for
    a system call with different semantics (i.e. when you don't care about
    duplication of the whole process memory), and there is such a system
    call, it's called vfork (although it's not always properly implemented).
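    The vfork() idiom mentioned above looks like this, with the usual
    caveat that the child borrows the parent's address space and may only
    call _exit() or an exec function before the parent resumes (anything
    else is undefined). A sketch under that assumption, using /bin/echo
    as a stand-in for the real helper program:

    ```c
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* vfork()+exec: no address-space duplication at all, so no swap
     * reservation to argue about.  The child must do nothing but exec or
     * _exit.  Returns the child's exit status, or -1 on failure. */
    int vfork_exec(char *const argv[])
    {
        pid_t pid = vfork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            execv(argv[0], argv);
            _exit(127);               /* exec failed */
        }
        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }

    int main(void)
    {
        char *argv[] = { "/bin/echo", "spawned without copying the parent", NULL };
        return vfork_exec(argv) == 0 ? 0 : 1;
    }
    ```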

    --
    John Tsiombikas (Nuclear / Mindlapse)
    http://nuclear.sdf-eu.org/

  17. Re: forking from a large active process in solaris

    John Tsiombikas writes:

    >P.S. The semantics of fork MUST be maintained, however there is need for
    >a system call with different semantics (i.e. when you don't care about
    >duplication of the whole process memory), and there is such a system
    >call, it's called vfork (although it's not always properly implemented).


    Even without those semantics, fork() will always take O(N) where N is
    the size of the process address map. vfork() is better but should be
    used in other primitives: system(), posix_spawn().

    Casper
    --
    Expressed in this posting are my opinions. They are in no way related
    to opinions held by my employer, Sun Microsystems.
    Statements on Sun products included here are not gospel and may
    be fiction rather than truth.

  18. Re: forking from a large active process in solaris

    On Sep 12, 11:47 am, John Tsiombikas wrote:

    > The error that the system doesn't have enough memory to give you...
    > Look, it's very simple... you can either detect that error by fork
    > returning -1, or you can "detect" it by your process (and/or possibly
    > other processes) being killed without explanation. How can you possibly
    > support that the second case is preferable?


    Right -- which is better? Police search people in violation of their
    rights? Or police get shot and killed by people they didn't search out
    of respect for their rights?

    Where does the most common case -- where everything just works with
    overcommit -- fit in your world?

    > As others explained previously, CoW-based fork is just a performance
    > optimization. The semantics are "you've got a duplicate of the forked
    > process". If the semantics are changed to: "you've got a duplicate which
    > might run, or might fail at any time, during any page fault without
    > notice" then it's perfectly obvious that's a broken implementation.


    Except that's how every computer works. The power might fail at any
    time. The CPU might blow up. The disk drive may fail. Computers can
    fail, and we don't refuse all operations just because a subsequent
    failure condition might cause them to fail later.

    > To avoid running out of memory resources, the kernel must ensure that
    > it does not promise to provide more virtual memory than it is able to
    > deliver.


    Except the OS cannot know how much it is able to deliver. Imagine a
    system that shuts down memory sticks on ECC errors. Should it fail
    every 'malloc' because every memory stick might fail?

    > P.S. The semantics of fork MUST be maintained, however there is need for
    > a system call with different semantics (i.e. when you don't care about
    > duplication of the whole process memory), and there is such a system
    > call, it's called vfork (although it's not always properly implemented).


    The semantics of no operation require the OS to predict future failure
    conditions and fail operations that it can successfully perform now
    because a future failure might make it unable to complete them.

    DS

  19. Re: forking from a large active process in solaris

    In article
    <5b2d7390-67b5-4e5c-ada2-3fe0555a0f60@a2g2000prm.googlegroups.com>,
    David Schwartz wrote:

    > On Sep 12, 11:47 am, John Tsiombikas wrote:
    >
    > > The error that the system doesn't have enough memory to give you...
    > > Look, it's very simple... you can either detect that error by fork
    > > returning -1, or you can "detect" it by your process (and/or possibly
    > > other processes) being killed without explanation. How can you possibly
    > > support that the second case is preferable?

    >
    > Right -- which is better? Police search people in violation of their
    > rights? Or police get shot and killed by people they didn't search out
    > of respect for their rights?
    >
    > Where does the most common case -- where everything just works with
    > overcommit -- fit in your world?
    >
    > > As others explained previously, CoW-based fork is just a performance
    > > optimization. The semantics are "you've got a duplicate of the forked
    > > process". If the semantics are changed to: "you've got a duplicate which
    > > might run, or might fail at any time, during any page fault without
    > > notice" then it's perfectly obvious that's a broken implementation.

    >
    > Except that's how every computer works. The power might fail at any
    > time. The CPU might blow up. The disk drive may fail. Computers can
    > fail, and we don't refuse all operations just because a subsequent
    > failure condition might cause them to fail later.


    The point is that if you CAN report a problem synchronously, this is
    preferable.

    Some things you can't do anything about, like power failures and memory
    chip errors, and those will necessarily produce asynchronous errors.
    But you should do as much as you can to avoid that type of error
    detection.

    --
    Barry Margolin, barmar@alum.mit.edu
    Arlington, MA
    *** PLEASE post questions in newsgroups, not directly to me ***
    *** PLEASE don't copy me on replies, I'll read them in the group ***

  20. Re: forking from a large active process in solaris

    On Sep 12, 7:49 pm, Barry Margolin wrote:

    > The point is that if you CAN report a problem synchronously, this is
    > preferable.


    You can report any problem synchronously if you assume it will happen.
    You cannot report any problem synchronously at a time when you cannot
    be sure it will happen.

    > Some things you can't do anything about, like power failures and memory
    > chip errors, and those will necessarily produce asynchronous errors.
    > But you should do as much as you can to avoid that type of error
    > detection.


    I agree. So the question is which category this kind of problem falls
    into. Rational people can disagree, so it's made a configurable option
    on many operating systems.

    DS
