sbrk() broken? - Linux


Thread: sbrk() broken?

  1. sbrk() broken?

    Hi,

    I run the following code

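    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        void* first = sbrk(0);   /* current program break */
        void* old = first;

        int n = 0;
        void* p = 0;
        /* grow the break one 4096-byte step at a time until sbrk() fails */
        while ((p = sbrk(4096)) != (void*)(-1)) {
            old = p;
            n++;
        }
        printf("first %p last %p diff %.3f GByte n %d\n",
               first, old, (double)((size_t)old - (size_t)first)/1024/1024/1024, n);
        return 0;
    }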

    and get on a host with uname -a output

    Linux tahiti 2.6.17.8y #3 SMP PREEMPT Mon Sep 11 16:37:48 CEST 2006 i686 unknown unknown GNU/Linux

    this result:

    first 0x806d000 last 0xb7a00000 diff 2.744 GByte n 719252

    This corresponds nicely with the sum of RAM and swap space.

    So far so good.

    However, on another host:

    Linux imhotep 2.6.5-7.139-smp #1 SMP Fri Jan 14 15:41:33 UTC 2005 x86_64
    x86_64 x86_64 GNU/Linux

    I get

    first 0x806d000 last 0x55553000 diff 1.208 GByte n 316647

    though this host has got 8 GByte of RAM and 4 GByte of swap

    (the program above was compiled and linked with -m32) and I was the only
    active user.

    ulimit -a shows no limitations.

    Any idea, what's going on?

    Thanks a lot,

    -ulrich

    [nosave]
    ----------------------------------------------------------------------------
    Ulrich Lauther ph: +49 89 636 48834 fx: ... 636 42284
    Siemens CT SE 6 Internet: Ulrich.Lauther@siemens.com

  2. Re: sbrk() broken?


    Ulrich Lauther wrote:

    > However, on another host:
    >
    > Linux imhotep 2.6.5-7.139-smp #1 SMP Fri Jan 14 15:41:33 UTC 2005 x86_64
    > x86_64 x86_64 GNU/Linux
    >
    > I get
    >
    > first 0x806d000 last 0x55553000 diff 1.208 GByte n 316647
    >
    > though this host has got 8 GByte of RAM and 4 GByte of swap
    >
    > (the program above was compiled and linked with -m32) and I was the only
    > active user.
    >
    > ulimit -a shows no limitations.
    >
    > Any idea, what's going on?


    The 'brk' system call, which is what you are ultimately calling,
    allocates contiguous virtual memory. My bet is that for some reason
    there is only 1.2GB of contiguous virtual memory available to that
    process, perhaps because of calls (direct or indirect) to 'mmap'.

    It may help to modify the program to output the contents of
    '/proc/self/maps' when the 'sbrk' fails.

    DS
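
A minimal sketch of the suggestion above, assuming Linux with /proc mounted. The helper name dump_file is illustrative and not from the thread. It uses only open/read/write, on the assumption that buffered stdio might want heap memory at exactly the moment the heap can no longer grow.

```c
#include <fcntl.h>
#include <unistd.h>

/* Copy the file at `path` to the descriptor `out_fd` using only
 * open/read/write, so nothing here needs to allocate heap memory.
 * Returns the number of bytes copied, or -1 on error. */
long dump_file(const char *path, int out_fd)
{
    char buf[4096];
    long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {               /* write() may be partial */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                close(fd);
                return -1;
            }
            off += w;
        }
        total += n;
    }
    close(fd);
    return n < 0 ? -1 : total;
}
```

Calling dump_file("/proc/self/maps", STDERR_FILENO) right after the loop in the test program ends would show the address-space layout at the point where sbrk() gave up.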


  3. Re: sbrk() broken?

    David Schwartz wrote:

    : Ulrich Lauther wrote:

    : > However, on another host:
    : >
    : > Linux imhotep 2.6.5-7.139-smp #1 SMP Fri Jan 14 15:41:33 UTC 2005 x86_64
    : > x86_64 x86_64 GNU/Linux
    : >
    : > I get
    : >
    : > first 0x806d000 last 0x55553000 diff 1.208 GByte n 316647
    : >
    : > though this host has got 8 GByte of RAM and 4 GByte of swap
    : >
    : > (the program above was compiled and linked with -m32) and I was the only
    : > active user.
    : >
    : > ulimit -a shows no limitations.
    : >
    : > Any idea, what's going on?

    : The 'brk' system call, which is what you are ultimately calling,
    : allocates contiguous virtual memory. My bet is that for some reason
    : there is only 1.2GB of contiguous virtual memory available to that
    : process, perhaps because of calls (direct or indirect) to 'mmap'.

    : It may help to modify the program to output the contents of
    : '/proc/self/maps' when the 'sbrk' fails.

    I did this and processed the output to show the size of free memory
    ranges, assuming that this is indicated by a "0" in column 6.
    This gives:

    08048000-08049000 r-xp 00000000 00:1a 108330 /home/lauther/turbo/test/test
    08049000-0804a000 rwxp 00000000 00:1a 108330 /home/lauther/turbo/test/test
    0804a000-55554000 rwxp 0804a000 00:00 0
    free range 1297129472 Bytes 1.208 GByte
    55555000-5556b000 r-xp 00000000 38:05 586380 /lib/ld-2.3.3.so
    5556b000-5556c000 rwxp 00016000 38:05 586380 /lib/ld-2.3.3.so
    5556c000-5556d000 rwxp 5556c000 00:00 0
    free range 4096 Bytes 0.000 GByte
    5556d000-55659000 r-xp 00000000 00:1a 253432 /home/lauther/turbo/lib_linux_x86_32_dyn_r_g++/libturbo.so
    55659000-55685000 rwxp 000ec000 00:1a 253432 /home/lauther/turbo/lib_linux_x86_32_dyn_r_g++/libturbo.so
    55685000-55687000 rwxp 55685000 00:00 0
    free range 8192 Bytes 0.000 GByte
    556a0000-55798000 r-xp 00000000 38:05 720984 /usr/X11R6/lib/libX11.so.6.2
    55798000-5579c000 rwxp 000f7000 38:05 720984 /usr/X11R6/lib/libX11.so.6.2
    5579c000-55868000 r-xp 00000000 38:05 704498 /usr/lib/libtk8.4.so
    55868000-55874000 rwxp 000cb000 38:05 704498 /usr/lib/libtk8.4.so
    55874000-55875000 rwxp 55874000 00:00 0
    free range 4096 Bytes 0.000 GByte
    55875000-55911000 r-xp 00000000 38:05 704219 /usr/lib/libtcl8.4.so
    55911000-5591b000 rwxp 0009c000 38:05 704219 /usr/lib/libtcl8.4.so
    5591b000-5591d000 rwxp 5591b000 00:00 0
    free range 8192 Bytes 0.000 GByte
    5591d000-5591f000 r-xp 00000000 38:05 586389 /lib/libdl.so.2
    5591f000-55920000 rwxp 00002000 38:05 586389 /lib/libdl.so.2
    55920000-559dd000 r-xp 00000000 00:19 4005872 /sw/gcc4.0.2/lib/libstdc++.so.6.0.6
    559dd000-559f5000 rwxp 000bc000 00:19 4005872 /sw/gcc4.0.2/lib/libstdc++.so.6.0.6
    559f5000-559fb000 rwxp 559f5000 00:00 0
    free range 24576 Bytes 0.000 GByte
    559fb000-55a1c000 r-xp 00000000 38:05 586406 /lib/tls/libm.so.6
    55a1c000-55a1d000 rwxp 00020000 38:05 586406 /lib/tls/libm.so.6
    55a1d000-55a27000 r-xp 00000000 00:19 4005868 /sw/gcc4.0.2/lib/libgcc_s.so.1
    55a27000-55a28000 rwxp 00009000 00:19 4005868 /sw/gcc4.0.2/lib/libgcc_s.so.1
    55a28000-55b32000 r-xp 00000000 38:05 586405 /lib/tls/libc.so.6
    55b32000-55b3a000 rwxp 00109000 38:05 586405 /lib/tls/libc.so.6
    55b3a000-55b3d000 rwxp 55b3a000 00:00 0
    free range 12288 Bytes 0.000 GByte
    55b3d000-55b4d000 r-xp 00000000 00:17 3216114 /sw/local/lib/libz.so.1.2.1
    55b4d000-55b4e000 rwxp 0000f000 00:17 3216114 /sw/local/lib/libz.so.1.2.1
    55b4e000-55b4f000 rwxp 55b4e000 00:00 0
    free range 4096 Bytes 0.000 GByte
    55b4f000-55b5e000 r-xp 00000000 38:05 704142 /usr/lib/libbz2.so.1.0.0
    55b5e000-55b5f000 rwxp 0000f000 38:05 704142 /usr/lib/libbz2.so.1.0.0
    55b5f000-55b60000 rwxp 55b5f000 00:00 0
    free range 4096 Bytes 0.000 GByte
    ffffb000-ffffd000 rwxp ffffb000 00:00 0
    free range 8192 Bytes 0.000 GByte
    ffffd000-ffffe000 rw-p ffffd000 00:00 0
    free range 4096 Bytes 0.000 GByte
    ffffe000-fffff000 ---p 00000000 00:00 0
    free range 4096 Bytes 0.000 GByte
    : DS

    So 1.2 GByte is the largest free range.
    However, I have got 8 GByte of RAM and 4 GByte should be addressable in
    32-bit mode.
    The processor is an AMD Opteron(tm) Processor 252.

    On my notebook, where things work as expected, I have an Intel T2300
    processor and CONFIG_HIGHMEM4G=y.

    I don't see such a parameter on the Opteron host. Is this feature
    missing there?

    And what's going on below 08048000 ?


    Thanks for listening,

    -ulrich

    [nosave]
    ----------------------------------------------------------------------------
    Ulrich Lauther ph: +49 89 636 48834 fx: ... 636 42284
    Siemens CT SE 6 Internet: Ulrich.Lauther@siemens.com

  4. Re: sbrk() broken?


    Ulrich Lauther wrote:

    > So 1.2 GByte is the largest free range.


    So there you have it. This is one of the reasons 'sbrk' is a poor
    choice for memory allocation of large blocks.

    > However, I have got 8 GByte of RAM and 4 GByte should be addressable in
    > 32-bit mode.
    > The processor is a AMD Opteron(tm) Processor 252.
    >
    > On my notebook, where things work as expected, I got an Intel T2300
    > processor and CONFIG_HIGHMEM4G=y.
    >
    > I don't see such a parameter on the Opteron host. Is this feature there
    > missing?


    That won't solve the problem, which is basically that you are insisting
    on contiguous memory. Your memory map has holes in it.

    Why are you using 'sbrk' to allocate so much memory? You should be
    using 'mmap'. It is not reasonable to insist that such a large block be
    contiguous.

    > And what's going on below 08048000 ?


    Probably the kernel's mappings are below 08000000. I'm not sure what's
    between 08000000 and 08048000, perhaps special mappings made visible to
    user processes by the kernel.

    DS


  5. Re: sbrk() broken?

    > : It may help to modify the program to output the contents of
    > : '/proc/self/maps' when the 'sbrk' fails.
    >
    > I did this and processed the output to show the size of free memory
    > ranges, assuming that this is indicated by a "0" in column 6.


    No, that is an indication of where the backing storage (non-RAM) lives.
    "00:1a 108330" means inode 108330 on device 00:1a (which in fact is
    /home/lauther/turbo/test/test), while "00:00 0" means anonymous memory
    backed by swap space.

    > This gives:
    >
    > 08048000-08049000 r-xp 00000000 00:1a 108330 /home/lauther/turbo/test/test
    > 08049000-0804a000 rwxp 00000000 00:1a 108330 /home/lauther/turbo/test/test
    > 0804a000-55554000 rwxp 0804a000 00:00 0
    > free range 1297129472 Bytes 1.208 GByte


    No, the free range is 55554000 to 55555000, which is 4KB.

    > 55555000-5556b000 r-xp 00000000 38:05 586380 /lib/ld-2.3.3.so
    > 5556b000-5556c000 rwxp 00016000 38:05 586380 /lib/ld-2.3.3.so
    > 5556c000-5556d000 rwxp 5556c000 00:00 0


    [snip]

    > 55b5e000-55b5f000 rwxp 0000f000 38:05 704142 /usr/lib/libbz2.so.1.0.0
    > 55b5f000-55b60000 rwxp 55b5f000 00:00 0
    > free range 4096 Bytes 0.000 GByte
    > ffffb000-ffffd000 rwxp ffffb000 00:00 0


    No, the free range is 55b60000 to something like ffffb000,
    which is about 0xaa49b000 or 2.8 GB. I say "something like"
    because ffffb000 is in the stack segment, and the stack is
    allowed to grow [downwards] to about 8MB by default, so the
    upper bound of the free range might be 0xfff7b000 or so.
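
The corrected reading of the map can be sketched in code: free space is the gap between the end of one entry and the start of the next. This is a minimal sketch that assumes the text is in /proc/<pid>/maps format and sorted by address; the function name largest_gap is illustrative. It ignores the stack-growth allowance mentioned above.

```c
#include <stdio.h>
#include <string.h>

/* Scan text in /proc/<pid>/maps format (one "start-end ..." entry per line,
 * sorted by address) and return the size in bytes of the largest unmapped
 * gap between consecutive entries. The gap's start is stored in *gap_start. */
unsigned long long largest_gap(const char *maps, unsigned long long *gap_start)
{
    unsigned long long start, end, prev_end = 0, best = 0;
    int have_prev = 0;

    while (maps && *maps) {
        if (sscanf(maps, "%llx-%llx", &start, &end) == 2) {
            if (have_prev && start > prev_end && start - prev_end > best) {
                best = start - prev_end;
                *gap_start = prev_end;
            }
            prev_end = end;
            have_prev = 1;
        }
        maps = strchr(maps, '\n');      /* advance to the next line */
        if (maps)
            maps++;
    }
    return best;
}
```

Applied to the entries quoted above, the largest gap runs from 55b60000 to ffffb000, which is 0xaa49b000 = 2,856,955,904 bytes: roughly 2.86 GB in decimal units, or 2.661 GByte when dividing by 1024 three times as the test program does.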

    > And what's going on below 08048000 ?


    For an historical reason having to do with a very early *NIX clone
    on x86 hardware about 20 years ago, 0x08048000 is the default
    start address for an ET_EXEC file when built by /bin/ld.
    Then, 0x08048000 was the top of the stack. Now, it is a convenient
    "hole" that is most often empty but sometimes is used by clever software.
    Usually it does not matter that it is unused: it is only 128MB, after all.

    If you really cannot afford a machine that has a 64-bit address space
    (such as inexpensive boxes powered by AMD Semprons), then you should visit
    http://BitWagon.com/tub/tub.html to see a workaround.

    --

  6. Re: sbrk() broken?



    On Jan 5, 4:04 am, "David Schwartz" wrote:
    > Ulrich Lauther wrote:
    > > So 1.2 GByte is the largest free range.
    >
    > So there you have it. This is one of the reasons 'sbrk' is a poor
    > choice for memory allocation of large blocks.
    >
    > > However, I have got 8 GByte of RAM and 4 GByte should be addressable in
    > > 32-bit mode.
    > > The processor is a AMD Opteron(tm) Processor 252.
    > >
    > > On my notebook, where things work as expected, I got an Intel T2300
    > > processor and CONFIG_HIGHMEM4G=y.
    > >
    > > I don't see such a parameter on the Opteron host. Is this feature there
    > > missing?
    >
    > That won't solve the problem, which is basically that you are insisting
    > on contiguous memory. Your memory map has holes in it.

    If what sbrk wants is virtually contiguous memory, why can't the kernel
    meet the requirement by adjusting the page tables to make the hole disappear?

    > Why are you using 'sbrk' to allocate so much memory? You should be
    > using 'mmap'. It is not reasonable to insist that such a large block be
    > contiguous.
    >
    > > And what's going on below 08048000 ?
    >
    > Probably the kernel's mappings are below 08000000. I'm not sure what's
    > between 08000000 and 08048000, perhaps special mappings made visible to
    > user processes by the kernel.
    >
    > DS



  7. Re: sbrk() broken?


    Bin Chen wrote:

    > If what sbrk wants is virtually contiguous memory, why can't the kernel
    > meet the requirement by adjusting the page tables to make the hole disappear?


    If the kernel changes the page tables so that mappings were in a
    different virtual address, the program would crash. No program can
    tolerate having its memory mappings changed from underneath it unless
    the virtual address stays the same.

    The kernel can certainly map and unmap *physical* memory, and the
    process won't know or care. But it cannot change virtual addresses. The
    code would break as its pointers suddenly point to the wrong place or
    the wrong code is suddenly running.

    DS


  8. Re: sbrk() broken?

    "Bin Chen" writes:
    > On Jan 5, 4:04 am, "David Schwartz" wrote:
    >> Ulrich Lauther wrote:
    >> > So 1.2 GByte is the largest free range.
    >>
    >> So there you have it. This is one of the reasons 'sbrk' is a poor
    >> choice for memory allocation of large blocks.
    >>
    >> > However, I have got 8 GByte of RAM and 4 GByte should be addressable in
    >> > 32-bit mode.
    >> > The processor is a AMD Opteron(tm) Processor 252.
    >> >
    >> > On my notebook, where things work as expected, I got an Intel T2300
    >> > processor and CONFIG_HIGHMEM4G=y.
    >> >
    >> > I don't see such a parameter on the Opteron host. Is this feature there
    >> > missing?
    >>
    >> That won't solve the problem, which is basically that you are insisting
    >> on contiguous memory. Your memory map has holes in it.
    >
    > If what sbrk wants is virtually contiguous memory, why can't the kernel
    > meet the requirement by adjusting the page tables to make the hole disappear?


    Because it is a hole in the virtual address space.

  9. Re: sbrk() broken?

    John Reiser wrote:
    : > : It may help to modify the program to output the contents of
    : > : '/proc/self/maps' when the 'sbrk' fails.
    : >
    : > I did this and processed the output to show the size of free memory
    : > ranges, assuming that this is indicated by a "0" in column 6.

    : No, that is an indication of where the backing storage (non-RAM) lives.
    : "00:1a 108330" means inode 108330 on device 00:1a (which in fact is
    : /home/lauther/turbo/test/test), while "00:00 0" means anonymous memory
    : backed by swap space.

    o.k., I revised my test program to count free blocks correctly and BEFORE
    all the sbrk() calls.
    Adding up all free space and analysing the size of free ranges, I now get:

    total free blocks 3.869 GByte maximum free range 2.661 GByte, which seems
    reasonable in 32 bit addressing mode.

    BTW, I wasn't asking (explicitly) for contiguous free space, but lived with
    the naive belief (which may have been true in the old days) that the memory
    layout is

    | program | heap | free space | stack |
                     ^
                     |
                   break


    I now changed my application to allocate memory using mmap, in chunks of 64
    pages.
    There remains an open question:
    I understood, there is a limit to the number of memory areas that can be
    mapped by a process simultaneously. So how big should I make my chunks to
    avoid problems in this respect?

    Thanks for the feedback; I learned a lot!

    -ulrich

    --
    -lauther

    [nosave]
    ----------------------------------------------------------------------------
    Ulrich Lauther ph: +49 89 636 48834 fx: ... 636 42284
    Siemens CT SE 6 Internet: Ulrich.Lauther@siemens.com

  10. Re: sbrk() broken?

    > BTW, I wasn't asking (explicitly) for contiguous free space, but lived with
    > the naive belief (which may have been true in the old days) that the memory
    > layout is
    >
    > | program | heap | free space | stack |
    >                  ^
    >                  |
    >                break


    Such an organization is inconvenient for mmap(0, ). So on 32-bit machines
    the kernel starts "mmap space" at TASK_UNMAPPED_BASE, which is 1/3 of
    PROCESS_SIZE, which typically is 3GB or 2GB. Also, the whole idea of
    brk() and sbrk() ought to disappear because the Linux kernel does not
    allow the value of brk(0) to be virtualized; it is set *only* by execve().
    Instead, the kernel ought to provide a binary interface to /proc/self/maps
    (such as the VirtualQuery API of Win32) so that it would be much easier
    for user code to manage address space co-operatively [main program, subsystems,
    debugger, auditor, ...] in a unified manner. brk() is dreck.
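
As a rough worked example of the "1/3 of the process size" rule described above, the sketch below rounds one third of a given process size up to a page boundary. It assumes 4 KiB pages, and the function names are illustrative. The value 0xFFFFE000 used in the note that follows is an assumption standing in for the nearly 4 GB available to a 32-bit process on a 64-bit kernel; it is not stated in the thread.

```c
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096ULL   /* assumption: 4 KiB pages */

/* One third of the user address-space size, rounded up to a page boundary. */
uint64_t mmap_base_guess(uint64_t process_size)
{
    uint64_t third = process_size / 3;
    return (third + PAGE_SIZE_ASSUMED - 1) & ~(PAGE_SIZE_ASSUMED - 1);
}

/* Room left for the break to grow before it reaches the mmap area. */
uint64_t brk_headroom(uint64_t heap_start, uint64_t process_size)
{
    uint64_t base = mmap_base_guess(process_size);
    return base > heap_start ? base - heap_start : 0;
}
```

With a 3 GB process size this gives 0x40000000. With the assumed 0xFFFFE000 it gives 0x55555000, which matches where ld-2.3.3.so appears in the map posted earlier, and brk_headroom(0x0804a000, 0xFFFFE000) comes to 0x4D50B000 bytes, about 1.208 GByte.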

    > I now changed my application to allocate memory using mmap, in chunks of 64
    > pages.
    > There remains an open question:
    > I understood, there is a limit to the number of memory areas that can be
    > mapped by a process simultaneously. So how big should I make my chunks to
    > avoid problems in this respect?


    A couple years ago "too many vma" was a problem. Better kernel code
    has reduced most of the bad effects, but simultaneously mapping only a few
    pages from each of hundreds of thousands of files still causes slowness.
    Chunks of 256KB to 1MB certainly are big enough. By default the malloc()
    of glibc-2.4 uses 1MB as a threshold to switch from general heap to mmap.
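
To relate chunk size to the number of mappings, a back-of-the-envelope sketch may help. It assumes the worst case in which every chunk stays a separate mapping (the kernel may merge adjacent anonymous mappings, which would lower the count). The function name chunks_needed is illustrative. On Linux the per-process limit is exposed as the vm.max_map_count sysctl, readable at /proc/sys/vm/max_map_count; check its value on the target system.

```c
#include <stdint.h>

/* Number of separate mmap() chunks needed to cover `total` bytes when each
 * chunk is `chunk` bytes, rounding up. Returns 0 if chunk is 0. */
uint64_t chunks_needed(uint64_t total, uint64_t chunk)
{
    if (chunk == 0)
        return 0;
    return total / chunk + (total % chunk != 0);
}
```

For the 0xaa49b000-byte free range discussed earlier in the thread, chunks of 64 pages (256 KB with 4 KB pages) give 10899 chunks, and 1 MB chunks give 2725.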

    --


  11. Re: sbrk() broken?

    John Reiser writes:

    [...]

    > Also, the whole idea of brk() and sbrk() ought to disappear because
    > the Linux kernel does not allow the value of brk(0) to be
    > virtualized; it is set *only* by execve().


    What is this supposed to mean and why is it a reason that something
    'must go away'?

    > Instead, the kernel ought to provide a binary interface to /proc/self/maps
    > (such as the VirtualQuery API of Win32) so that it would be much easier
    > for user code to manage address space co-operatively [main program, subsystems,
    > debugger, auditor, ...] in a unified manner.


    Why should the kernel do so?

    > brk() is dreck.


    It is a very convenient routine to use for dynamic memory allocations
    in code that does not need the full flexibility of dynamic allocation
    and deallocation.

    Not everything is a C++(KDE-)-GUI-application that needs to map 2^25
    dynamic libraries to print 'Hello World' in a way that causes
    everything slower than a 10GHz CPU with 12TB of RAM and a 1230PB
    IDE-disk (with an average seek time of 4 hours) to grind to a halt.

  12. Re: sbrk() broken?

    Rainer Weikusat wrote:
    > John Reiser writes:
    >>Also, the whole idea of brk() and sbrk() ought to disappear because
    >>the Linux kernel does not allow the value of brk(0) to be
    >>virtualized; it is set *only* by execve().
    >
    > What is this supposed to mean and why is it a reason that something
    > 'must go away'?


    brk() has been making it cumbersome and difficult to implement user-level
    virtualization and cooperative kernel+user address-space management for
    several years. It is a significant problem for program compressors,
    in-process debuggers, memory access checkers, execution auditors of
    several kinds.

    >>Instead, the kernel ought to provide a binary interface to /proc/self/maps
    >>(such as the VirtualQuery API of Win32) so that it would be much easier
    >>for user code to manage address space co-operatively [main program, subsystems,
    >>debugger, auditor, ...] in a unified manner.

    >
    >
    > Why should the kernel do so?
    >
    >
    >>brk() is dreck.

    >
    >
    > It is a very convenient routine to use for dynamic memory allocations
    > in code that does not need the full flexibility of dynamic allocation
    > and deallocation.


    brk() is "too simple"; there is not enough indirection involved.
    Its current implementation as a bare system call, for which SYS_brk(0)
    reports a value that can be set *only* by an actual execve(), means
    that user-mode virtualization of execve() has significant problems.
    For instance, the virtualizer must find and intercept all __NR_brk
    system calls, even the ones which are inlined and do not use linkage
    through an external symbol (such as the brk() commonly provided by glibc.)
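
To make the "bare system call" point concrete, here is a minimal sketch that queries the break without going through the library's brk() or sbrk() symbols. It assumes Linux and the syscall() wrapper from glibc; the function name raw_current_break is illustrative.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Ask the kernel for the current program break with the raw system call,
 * bypassing the library's brk()/sbrk() wrappers. On Linux, a request the
 * kernel cannot satisfy (such as address 0) leaves the break unchanged and
 * returns its current value. */
void *raw_current_break(void)
{
    return (void *)syscall(SYS_brk, 0L);
}
```

A tool that only interposes on the library's brk() and sbrk() symbols would not see a call like this one, which is the interception difficulty described above.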

    The minimum required change is a new system call SYS_setbrk(void *)
    which sets the value returned by subsequent calls to SYS_brk(0).

    The better, further change is a binary interface to /proc/self/maps
    which provides fast and deterministic information about arbitrary
    portions of the address space. (Probe an arbitrary address,
    properly quote any filenames, etc.) This would make it much easier
    to virtualize execve(), and it would be easy to implement a user-
    mode brk() subroutine in terms of the new interface.

    > Not everything is a C++(KDE-)-GUI-application that needs to map 2^25
    > dynamic libraries to print 'Hello World' in a way that cause
    > everything slower than a 10Ghz CPU with 12TB of RAM and a 1230PB
    > IDE-disk (with an average seek time of 4 hours) to grind to a halt.


    Yes, there are simple applications. But some large and complicated apps
    are reasonably required. The Linux kernel's current support of SYS_brk()
    for simple apps is making it more difficult than necessary to implement
    large and complicated apps, and even to verify that the simple apps
    are correct, robust, can adapt to environments having low resources, etc.

    --

  13. Re: sbrk() broken?

    John Reiser writes:
    > Rainer Weikusat wrote:
    >> John Reiser writes:
    >>>Also, the whole idea of brk() and sbrk() ought to disappear because
    >>>the Linux kernel does not allow the value of brk(0) to be
    >>>virtualized; it is set *only* by execve().
    >>
    >> What is this supposed to mean and why is it a reason that something
    >> 'must go away'?

    >
    > brk() has been making it cumbersome and difficult to implement user-level
    > virtualization and cooperative kernel+user address-space management for
    > several years. It is a significant problem for program compressors,
    > in-process debuggers, memory access checkers, execution auditors of
    > several kinds.


    This is still basically an assertion without a supporting
    reason. Assuming it to be true, all of the examples you mentioned

    a) are fringe cases
    b) have been implemented

    So, where's the problem?

    >>>Instead, the kernel ought to provide a binary interface to /proc/self/maps
    >>>(such as the VirtualQuery API of Win32) so that it would be much easier
    >>>for user code to manage address space co-operatively [main program, subsystems,
    >>>debugger, auditor, ...] in a unified manner.

    >>
    >>
    >> Why should the kernel do so?
    >>
    >>
    >>>brk() is dreck.


    Repetition doesn't make an opinion without a reason for it any
    different.

    >> It is a very convenient routine to use for dynamic memory allocations
    >> in code that does not need the full flexibility of dynamic allocation
    >> and deallocation.

    >
    > brk() is "too simple"; there is not enough indirection involved.


    Less indirection usually means 'less overhead', so this could be
    regarded as a feature (eg uclibc contains a memory allocator, which,
    according to the configuration help, is 'very fast' because it only
    uses brk).

    > Its current implementation as a bare system call, for which SYS_brk(0)
    > reports a value that can be set *only* by an actual execve(), means
    > that user-mode virtualization of execve() has significant problems.
    > For instance, the virtualizer must find and intercept all __NR_brk
    > system calls, even the ones which are inlined and do not use linkage
    > through an external symbol (such as the brk() commonly provided by
    > glibc.)
    >
    > The minimum required change is a new system call SYS_setbrk(void *)
    > which sets the value returned by subsequent calls to SYS_brk(0).


    That sounds like an extraordinarily simple kernel modification.

    [...]

    >> Not everything is a C++(KDE-)-GUI-application that needs to map 2^25
    >> dynamic libraries to print 'Hello World' in a way that cause
    >> everything slower than a 10Ghz CPU with 12TB of RAM and a 1230PB
    >> IDE-disk (with an average seek time of 4 hours) to grind to a halt.

    >
    > Yes, there are simple applications. But some large and complicated apps
    > are reasonably required. The Linux kernel's current support of SYS_brk()
    > for simple apps is making it more difficult than necessary to implement
    > large and complicated apps,


    That's another assertion. If the application is already 'large and
    complicated' (and not just 'bloated and mazy', for instance), why is
    the additional complication so much more important than the other
    inherent complication?

    > and even to verify that the simple apps are correct, robust, can
    > adapt to environments having low resources, etc.


    'Correctness' is a somewhat nebulous term for real world software,
    which, except in really simple cases, will contain parts that are
    expected to work 'most of the time' because something that works 'all
    of the time' cannot possibly be implemented (eg anything that does
    network communication. TCP isn't really reliable and no protocol
    even could be). I do not understand the meaning of 'robust' in this
    context. Generally, applications that are simple enough to use brk()
    for dynamic memory allocation will work in 'low resource'
    environments if possible (application cannot work with less memory
    than it needs).

    BTW, there is an actual problem with brk(): It doesn't fit into the
    address space model of a virtual memory operating system very well,
    where parts of the address space may come and go at the leisure of the
    application, by using mmap or dlopen, for instance. This means there
    need to be at least two separate 'dynamic areas', one for mappings and
    one for brk and this will limit the amount of memory available to
    both.
