sharing memory map between processes (same parent) - Unix



  1. Re: sharing memory map between processes (same parent)

    On Fri, 02 May 2008 11:24:24 +0200 moi wrote:
    | On Fri, 02 May 2008 08:41:58 +0000, phil-news-nospam wrote:
    |
    |> On Thu, 01 May 2008 12:14:42 +0200 moi wrote:
    |> | On Thu, 01 May 2008 03:22:48 +0000, phil-news-nospam wrote:
    |> |
    |> |> I want to overlap I/O _just_ _enough_ to keep the disk working
    |> |> without the gaps of turnaround through the process that requested
    |> |> the I/O. Maybe two processes will be enough. Or maybe they won't
    |> |> be enough to let the disk do enough elevator operation to minimize
    |> |> seeks and maximize data read at each seek. These are things I need
    |> |> to explore, and am not yet ready to ask about them. OTOH, once I'm
    |> |> within a couple percent of theoretically perfect throughput, it is
    |> |> probably impractical to improve upon just because the speedup
    |> |> wouldn't really be noticed.
    |> |
    |> | BTW: you mentioned passing filedescriptors/mmap()ped regions.
    |> | Have you considered using the filesystem as a way to communicate?
    |> | Does the file have a link to the filesystem?
    |> | Why not let the slave processes open() before mmap()?
    |>
    |> I don't know what you mean by slave processes. Of course mmap() needs
    |> an opened descriptor.
    |
    | Slave processes are child processes which do the hard work. In the pre-
    | thread era, slave processes were used to perform the disk-I/O (blocking
    | on disk reads) and putting the resulting buffers into shared memory. In
    | your case you use mmap(), which causes the slaves to "block" on
    | pagefaults instead of explicit reads, but the mechanism is basically the
    | same.

    I understand that concept. In what I have been giving as an example, BOTH
    process A and process B are slave processes, so there is no distinction.
    What I wanted the mapping-transfer ability for is so that slave process A
    can block in mmap() for the next file while slave process B concurrently
    blocks on page faults accessing the previous file's mapping.


    | Of course mmap() needs an open filedescriptor.

    At the time of the mmap() call. But at the time the memory is accessed,
    the descriptor can be closed.


    | In the OP you mentioned the method by which it should be passed from
    | parent to child.
    | As has been pointed out, there should hardly be any difference between mmap()
    | before fork() or mmap() after fork(). The amount of work for the kernel
    | is the same, but you'll need one or two additional system calls to ask
    | the kernel to get them done. This overhead is probably negligible
    | compared to the fork().

    There won't be any fork(). Process A will be long-lived, as will process B.
    There might also be other processes, such as one to read directories to get
    file names, another to do the descriptor open, a third to do the mapping,
    and a fourth to access the mapping (quite likely via a write() call using
    the mapped space as the data, writing it to another file), as in the
    sketch below.
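    A minimal sketch of that fourth stage, assuming hypothetical file names
    and abbreviated error handling: the mapped region itself is handed to
    write() as the data buffer, so the pages are faulted in as write()
    consumes them.

        /* Sketch: copy a file by mmap()ing the source and using the
         * mapping directly as the write() buffer. */
        #include <fcntl.h>
        #include <stdio.h>
        #include <sys/mman.h>
        #include <sys/stat.h>
        #include <unistd.h>

        int main(void)
        {
            int in = open("input.dat", O_RDONLY);   /* hypothetical name */
            struct stat st;
            if (in < 0 || fstat(in, &st) < 0) { perror("open/fstat"); return 1; }

            /* Map the whole source file; no data is read yet. */
            char *src = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, in, 0);
            if (src == MAP_FAILED) { perror("mmap"); return 1; }
            close(in);                  /* the mapping keeps the file alive */

            int out = open("output.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (out < 0) { perror("open"); return 1; }

            /* write() touches the mapped pages, faulting them in on demand. */
            if (write(out, src, st.st_size) != st.st_size)
                perror("write");

            munmap(src, st.st_size);
            close(out);
            return 0;
        }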


    | If your program *needs* the result of a disk-read, in order to issue the
    | _next_ read, it *cannot* benefit from parallel reads and cache filling,
    | unless the footprint is small enough to prefetch (almost) everything into
    | the page cache. (parallel queries may benefit, though).

    Hopefully, madvise(,,MADV_SEQUENTIAL) will give the kernel a chance to read
    ahead for that file. That might include madvise(,,MADV_WILLNEED), too.
    Then madvise(,,MADV_DONTNEED) would be done for each part of the mapping that
    the process accessing it is done with.
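    A sketch of that madvise() sequence, assuming a hypothetical consume()
    routine and an arbitrary 4 MiB window; only the advice calls are the
    point here.

        #include <stddef.h>
        #include <sys/mman.h>

        #define WINDOW (4 * 1024 * 1024)        /* arbitrary example size */

        void consume(const char *p, size_t n);  /* hypothetical consumer */

        void scan(char *map, size_t len)
        {
            /* Hint sequential access and ask for read-ahead up front. */
            madvise(map, len, MADV_SEQUENTIAL);
            madvise(map, len, MADV_WILLNEED);

            for (size_t off = 0; off < len; off += WINDOW) {
                size_t n = (len - off < WINDOW) ? len - off : WINDOW;
                consume(map + off, n);
                /* Done with this window; let the kernel reclaim the pages. */
                madvise(map + off, n, MADV_DONTNEED);
            }
        }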


    | Basically it just depends on the (sizeof footprint / sizeof pagecache).
    | Plus some other factors you did not specify.

    Factors I would be exploring to tune things to the optimal values.


  2. Re: sharing memory map between processes (same parent)

    phil-news-nospam@ipal.net writes:
    >On Thu, 01 May 2008 20:45:18 GMT Scott Lurndal wrote:
    >
    >| 1) For the size of the mapping, add page table entries (PTE) to the
    >| page tables for the process virtual address space. Note that the
    >| page tables are per-process. These PTEs will be marked not-present
    >| with a reference to the appropriate block in the backing file (or swap
    >| for anonymous mappings).
    >
    >How does it know where the "appropriate block" is? It can't make that
    >reference in terms of an open descriptor because the descriptor can be
    >closed after mmap(), and the file (or portion thereof) is still mapped.
    >So it would need to figure out what blocks are being mapped during the
    >mmap() call, or somehow make the reference in terms of the inode of the
    >file.


    Once you've called mmap for a given file descriptor, the incore inode
    will get an additional reference from the mapping. Thus, the close()
    which follows the mmap simply releases the file descriptor, but because
    of the reference by the mapping, the file is not really closed; the
    inode remains in memory until unmap or exit.
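    In code, that point looks like this (a minimal sketch, assuming the
    caller already knows the length to map):

        #include <fcntl.h>
        #include <sys/mman.h>
        #include <unistd.h>

        char *map_file(const char *path, size_t len)
        {
            int fd = open(path, O_RDONLY);
            if (fd < 0)
                return NULL;
            char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
            close(fd);          /* mapping holds its own inode reference */
            /* p[0]..p[len-1] stay readable despite the close(). */
            return (p == MAP_FAILED) ? NULL : p;
        }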

    When a page fault occurs as a result of one of the not-present PTEs in
    the page table for the mapping, it will consult the in-core inode
    associated with the mapping to map the page into a sequence of blocks
    in the file (which may require multiple discontiguous disk reads to
    satisfy, e.g. if your filesystem uses blocks smaller than the page size).

    >
    >
    >| Note that the 'open' of the file preceded the call to mmap, and this is
    >| the step that will load the on-disk inode into the inode cache in memory.
    >| Note however, that through the completion of the mmap, none of the backing
    >| file content has been loaded into memory, and it won't be until referenced.
    >
    >However, I suspect the blocks have been looked up.


    The inode contains the block mappings for the file (or more specifically,
    it is the root of the table/tree/object mapping real disk sectors to
    blocks of file data).

    >
    >
    >| So, whether the mapping is transferred to a second process, or the second
    >| process calls mmap, the exact same sequence of underlying operating system
    >| events needs to occur. Thus, there can be no performance benefit to being
    >| able to transfer the mapping (aside from the relatively small 'open' overhead,
    >| which can be ameliorated by shipping the fd to the new process directly, then
    >| mmapping it).
    >
    >The sequence of finding the blocks to map could be overlapped with I/O being
    >done by that process that will soon get the mmapped memory passed to it.
    >Process B will read the file from the mmapped memory. For the very first
    >file, there would be no benefit. For subsequent files, there could be some
    >overlapping and maybe a speedup.


    Finding the blocks to map is a very fast, very efficient in-memory
    operation (depending on the level of indirection in the inode block
    mapping). The mapping is on-demand (i.e. at page fault time).

    There is no possibility for speedup here. You're looking for a chimera.
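    The fd-shipping approach mentioned in the quote above, sending the
    descriptor over a Unix-domain socket with SCM_RIGHTS so the receiver
    can mmap() it itself, looks roughly like this (a sketch, with
    abbreviated error handling):

        #include <string.h>
        #include <sys/socket.h>
        #include <sys/uio.h>

        int send_fd(int sock, int fd)
        {
            char byte = 0;
            struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
            union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
            struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                                  .msg_control = u.buf,
                                  .msg_controllen = sizeof u.buf };
            struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
            cm->cmsg_level = SOL_SOCKET;
            cm->cmsg_type  = SCM_RIGHTS;
            cm->cmsg_len   = CMSG_LEN(sizeof(int));
            memcpy(CMSG_DATA(cm), &fd, sizeof(int));
            return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
        }

        int recv_fd(int sock)
        {
            char byte;
            struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
            union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
            struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                                  .msg_control = u.buf,
                                  .msg_controllen = sizeof u.buf };
            if (recvmsg(sock, &msg, 0) <= 0)
                return -1;
            struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
            if (cm == NULL || cm->cmsg_type != SCM_RIGHTS)
                return -1;
            int fd;
            memcpy(&fd, CMSG_DATA(cm), sizeof(int));
            return fd;      /* receiver can now mmap() this descriptor */
        }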


    >| As far as two processes mapping the same file, they'll be sharing the same
    >| pages of physical memory assuming MAP_SHARED; clearly the processor caches
    >| will only be warm if both processes sharing the mapping are executing
    >| on the same core/socket.
    >
    >In my case, once process A transfers it to process B, process A no longer
    >needs it. If the mechanism that might have existed merely made a copy,
    >then process A would have to munmap().


    Regardless, since processes don't share page tables, the unmap and
    remap would need to occur. If you want the behaviour you've described
    in this paragraph you should eschew mmap and let the operating system
    file cache do its business.
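    What "letting the operating system file cache do its business" amounts
    to in practice is a plain read()/write() loop, relying on kernel
    read-ahead instead of page faults; a minimal sketch:

        #include <unistd.h>

        /* Copy from one descriptor to another through the page cache. */
        ssize_t copy_fd(int in, int out)
        {
            char buf[65536];
            ssize_t n, total = 0;
            while ((n = read(in, buf, sizeof buf)) > 0) {
                if (write(out, buf, n) != n)
                    return -1;
                total += n;
            }
            return (n < 0) ? -1 : total;
        }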

    >
    >| It's not clear that you (phil) realize that the call to mmap(2) itself doesn't
    >| cause any data to be read from the backing file/device, but rather the
    >| data is read when the mapped virtual address is referenced (absent any
    >| os initiated optimizations like read-ahead).
    >
    >I'm still not convinced. The mapping setup has to reference the file pages
    >in some way, and it cannot depend on the file remaining open since the file
    >descriptor mmap() used can be closed after mmap() is done, and that will not
    >affect the pages that were mapped.


    See above. You might want to poke around in some real kernels to get
    an idea of what is happening under the covers. Linux, BSD, OpenSolaris
    all do the same general things.

    scott

  3. Re: sharing memory map between processes (same parent)

    On Fri, 02 May 2008 17:47:04 GMT Scott Lurndal wrote:

    | Once you've called mmap for a given file descriptor, the incore inode
    | will get an additional reference from the mapping. Thus, the close()
    | which follows the mmap simply releases the file descriptor, but because
    | of the reference by the mapping, the file is not really closed; the
    | inode remains in memory until unmap or exit.

    That would have to be tracked per page, since munmap() can release parts
    of the mapping. But clearly there is a need to reference the inode, else
    deletion of the file could result in disk blocks being reused while the
    process is still working on them, corrupting its data.
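    That deletion case is easy to demonstrate (a sketch, error handling
    omitted): the mapping pins the inode, so unlink() removes only the
    directory entry and the mapped pages stay intact.

        #include <fcntl.h>
        #include <sys/mman.h>
        #include <unistd.h>

        char *map_then_unlink(const char *path, size_t len)
        {
            int fd = open(path, O_RDONLY);
            char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
            close(fd);
            unlink(path);   /* directory entry gone; inode pinned by mapping */
            return p;       /* still readable; blocks freed after munmap() */
        }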


    | When a page fault occurs as a result of one of the not-present PTEs in
    | the page table for the mapping, it will consult the in-core inode
    | associated with the mapping to map the page into a sequence of blocks
    | in the file (which may require multiple discontiguous disk reads to
    | satisfy, e.g. if your filesystem uses blocks smaller than the page size).

    So it maintains the full relation, not just a direct block relation?


    |>However, I suspect the blocks have been looked up.
    |
    | The inode contains the block mappings for the file (or more specifically,
    | it is the root of the table/tree/object mapping real disk sectors to
    | blocks of file data).

    You could still do it as I suggested. Of course, if it is done by means
    of reference to the inode and a deferred block lookup, then it would have
    shown no advantage upon testing of the (non-existent) feature.


    |>| So, whether the mapping is transferred to a second process, or the second
    |>| process calls mmap, the exact same sequence of underlying operating system
    |>| events needs to occur. Thus, there can be no performance benefit to being
    |>| able to transfer the mapping (aside from the relatively small 'open' overhead,
    |>| which can be ameliorated by shipping the fd to the new process directly, then
    |>| mmapping it).
    |>
    |>The sequence of finding the blocks to map could be overlapped with I/O being
    |>done by that process that will soon get the mmapped memory passed to it.
    |>Process B will read the file from the mmapped memory. For the very first
    |>file, there would be no benefit. For subsequent files, there could be some
    |>overlapping and maybe a speedup.
    |
    | Finding the blocks to map is a very fast, very efficient in-memory
    | operation (depending on the level of indirection in the inode block
    | mapping). The mapping is on-demand (i.e. at page fault time).
    |
    | There is no possibility for speedup here. You're looking for a chimera.

    There could have been had it been done another way.


    |>| As far as two processes mapping the same file, they'll be sharing the same
    |>| pages of physical memory assuming MAP_SHARED; clearly the processor caches
    |>| will only be warm if both processes sharing the mapping are executing
    |>| on the same core/socket.
    |>
    |>In my case, once process A transfers it to process B, process A no longer
    |>needs it. If the mechanism that might have existed merely made a copy,
    |>then process A would have to munmap().
    |
    | Regardless, since processes don't share page tables, the unmap and
    | remap would need to occur. If you want the behaviour you've described
    | in this paragraph you should eschew mmap and let the operating system
    | file cache do its business.

    Had there been a mapping transfer syscall, it could have done its thing by
    just relinking the appropriate structures from one VM to another, and thus
    not repeating all the work of the original mmap() call (which would be a
    benefit had that call pre-indexed the pages).

