mmap to read a huge file - Linux




  1. mmap to read a huge file

    Hi,

    I recently bought an Athlon64, so I installed a 64-bit distro and tried
    to mmap() a big file (more than 4 GB) and read it through a pointer.

    As I read the file, the memory usage of this process grows to 800 MB
    in top (the virtual memory is about the size of the file), and my
    computer becomes unresponsive, much more so than when I run
    "cat hugefile > /dev/null".

    I would expect the kernel to use much less memory for the buffers
    (well, a reasonable amount).

    My code is essentially:

    - open(argv[1], O_RDONLY | O_DIRECT)
    - mmap(0, infos.st_size, PROT_READ | PROT_EXEC, PROT_READ | PROT_EXEC, fd,
    0)

    Do you have any ideas? Thank you for your attention.


  2. Re: mmap to read a huge file

    Roger That writes:

    > Hi,
    >
    > I recently bought an Athlon64, so I installed a 64-bit distro and tried
    > to "mmap" a big file (more than 4 GB) and read it via a pointer.
    >
    > As I read the file, the memory usage for this process grows up to 800MB
    > in "top" (the virtual memory is about the size of the file), and my
    > computer becomes unresponsive. Much more than if I do "cat hugefile
    > > /dev/null".

    > I would expect that the kernel would use much less memory for the buffers
    > (well, a reasonable amount).
    >
    > So my code is mainly:
    >
    > - open(argv[1], O_RDONLY | O_DIRECT)
    > - mmap(0, infos.st_size, PROT_READ | PROT_EXEC, PROT_READ | PROT_EXEC, fd,
    > 0)


    Specifying PROT_EXEC is a bad idea, unless the file really contains
    machine code that you intend to execute.

    > Do you have an idea? Thank you for your attention


    You could try giving the kernel some hints with madvise().

    --
    Måns Rullgård
    mru@inprovide.com
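One way to act on that hint, sketched here as an illustration rather than anything posted in the thread: keep a single read-only mapping, announce sequential access with MADV_SEQUENTIAL, and after each chunk has been scanned call madvise() with MADV_DONTNEED on it so those pages leave the process's resident set. The function name count_nul_with_madvise and the 64 MiB chunk size are assumptions; the chunk size should be a multiple of the page size because madvise() requires a page-aligned start address.

```cpp
#include <algorithm>
#include <cstddef>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not from the thread): count NUL bytes in a file
// through one read-only mapping, releasing each scanned chunk from this
// process's page tables. Returns -1 on error.
long long count_nul_with_madvise(const char *path,
                                 std::size_t chunk = std::size_t(64) << 20)
{
    if (chunk == 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    if (size == 0) {                  // mmap() rejects a zero length
        close(fd);
        return 0;
    }

    void *addr = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                        // the mapping stays valid after close()
    if (addr == MAP_FAILED)
        return -1;

    char *base = static_cast<char *>(addr);
    madvise(base, size, MADV_SEQUENTIAL);   // only a hint; result ignored

    long long nuls = 0;
    for (std::size_t off = 0; off < size; off += chunk) {
        const std::size_t len = std::min(chunk, size - off);
        nuls += std::count(base + off, base + off + len, '\0');
        // Done with this range: let its pages leave the resident set.
        // The start address must be page-aligned, hence chunk should be a
        // multiple of the page size.
        madvise(base + off, len, MADV_DONTNEED);
    }

    munmap(addr, size);
    return nuls;
}
```

MADV_DONTNEED only drops the pages from this process's mapping; the file data itself stays in the page cache until the kernel decides to reclaim it.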

  3. Re: mmap to read a huge file


    Roger That wrote:

    > As I read the file, the memory usage for this process grows up to 800MB
    > in "top" (the virtual memory is about the size of the file), and my
    > computer becomes unresponsive. Much more than if I do "cat hugefile
    > > /dev/null".

    > I would expect that the kernel would use much less memory for the buffers
    > (well, a reasonable amount).
    >
    > So my code is mainly:
    >
    > - open(argv[1], O_RDONLY | O_DIRECT)
    > - mmap(0, infos.st_size, PROT_READ | PROT_EXEC, PROT_READ | PROT_EXEC, fd,
    > 0)
    >
    > Do you have an idea? Thank you for your attention


    Can you test without O_DIRECT? It would be helpful to know if this is
    an issue with O_DIRECT or an issue with the mapping even if done the
    more common way.

    DS
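For comparison with the "cat hugefile > /dev/null" test in the original post, here is a minimal sketch (not from the thread) of the same NUL-byte count done with plain read() calls into a single reused buffer, the access pattern cat uses. The function name count_nul_read and the 1 MiB buffer size are illustrative assumptions. The process's own memory stays at roughly one buffer, whereas a file mapping that has been read through counts its touched, still-resident pages in top's RES column.

```cpp
#include <algorithm>
#include <cerrno>
#include <cstddef>
#include <vector>

#include <fcntl.h>
#include <unistd.h>

// Hypothetical baseline: count NUL bytes by read()ing the file into one
// reused buffer. Returns -1 on error.
long long count_nul_read(const char *path, std::size_t buf_size = 1 << 20)
{
    if (buf_size == 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    std::vector<char> buf(buf_size);   // 1 MiB by default, an arbitrary size
    long long nuls = 0;
    for (;;) {
        ssize_t n = read(fd, buf.data(), buf.size());
        if (n == 0)
            break;                     // end of file
        if (n < 0) {
            if (errno == EINTR)
                continue;              // interrupted by a signal: retry
            close(fd);
            return -1;
        }
        nuls += std::count(buf.begin(), buf.begin() + n, '\0');
    }
    close(fd);
    return nuls;
}
```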


  4. Re: mmap to read a huge file

    Roger That wrote:
    > - mmap(0, infos.st_size, PROT_READ | PROT_EXEC, PROT_READ | PROT_EXEC, fd,
    > 0)


    Here you are incorrectly specifying PROT_READ | PROT_EXEC for both the
    "prot" and the "flags" parameter. For "flags" you must use the MAP_*
    flags like MAP_SHARED or MAP_PRIVATE.
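A minimal sketch of the call with the arguments in the right places: PROT_* values go in the third argument, MAP_* values in the fourth. The helper name map_file_readonly is hypothetical. It also compares the result against MAP_FAILED, since mmap() reports errors that way rather than by returning a null pointer.

```cpp
#include <cstddef>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: map a whole file read-only.
// Returns nullptr on failure; on success *len receives the mapped length.
const char *map_file_readonly(const char *path, std::size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return nullptr;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {  // mmap() rejects length 0
        close(fd);
        return nullptr;
    }

    void *p = mmap(nullptr,                      // let the kernel choose the address
                   static_cast<std::size_t>(st.st_size),
                   PROT_READ,                    // prot: allowed access to the pages
                   MAP_PRIVATE,                  // flags: how the mapping is shared
                   fd, 0);
    close(fd);                                   // the mapping outlives the descriptor
    if (p == MAP_FAILED)                         // not NULL on error
        return nullptr;

    *len = static_cast<std::size_t>(st.st_size);
    return static_cast<const char *>(p);
}
```

The caller releases the mapping with munmap() once it is done with the data.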


  5. Re: mmap to read a huge file

    Kaz Kylheku wrote:

    > Roger That wrote:
    >> - mmap(0, infos.st_size, PROT_READ | PROT_EXEC, PROT_READ | PROT_EXEC,
    >> fd, 0)

    >
    > Here you are incorrectly specifying PROT_READ | PROT_EXEC for both the
    > "prot" and the "flags" parameter. For "flags" you must use the MAP_*
    > flags like MAP_SHARED or MAP_PRIVATE.


    I'm sorry, I should have re-read my message.

    It was:

    mmap(0, infos.st_size, PROT_READ | PROT_EXEC, MAP_PRIVATE | MAP_DENYWRITE,
    fd, 0);

  6. Re: mmap to read a huge file

    Måns Rullgård wrote:

    > Specifying PROT_EXEC is a bad idea, unless the file really contains
    > machine code that you intend to execute.


    OK, I removed PROT_EXEC, and I also removed O_DIRECT from open() as
    David Schwartz suggested.

    So my flags are now:
    - open: O_RDONLY
    - mmap: PROT_READ, MAP_PRIVATE | MAP_DENYWRITE

    > You could try giving the kernel some hints with madvise().


    As that did not change anything, I tried madvise() with the
    MADV_SEQUENTIAL flag, but I got the same result.

    I forgot to mention the kernel version:

    Linux hobbes 2.6.15-26-amd64-k8 #1 SMP PREEMPT Thu Aug 3 03:11:38 UTC 2006
    x86_64 GNU/Linux.

    Thank you for your answers.

    Here is the whole program (though I don't think there is anything
    interesting in it):

    --- 8< --- 8< --- 8< --- 8< ---

    #include <algorithm>   // std::count
    #include <iostream>    // std::cout, std::cerr

    #include <fcntl.h>     // open()
    #include <sys/mman.h>  // mmap(), madvise(), munmap()
    #include <sys/stat.h>  // fstat()
    #include <sys/types.h>
    #include <unistd.h>    // close()

    int main(int argc, char *argv[])
    {
        if (argc != 3)
        {
            std::cerr << "Syntax: mmap filename text\n";
            return 1;
        }

        int fd = open(argv[1], O_RDONLY);
        if (fd == -1)
        {
            std::cerr << "Could not open '" << argv[1] << "'\n";
            return 2;
        }

        struct stat infos;
        if (fstat(fd, &infos) != 0)
        {
            std::cerr << "Could not fstat\n";
            return 3;
        }

        // MAP_DENYWRITE is ignored by Linux (see mmap(2)); kept as in the original
        void *addr = mmap(0,
                          infos.st_size,
                          PROT_READ,
                          MAP_PRIVATE | MAP_DENYWRITE,
                          fd,
                          0);
        // mmap() signals failure with MAP_FAILED, not with a null pointer
        if (addr == MAP_FAILED)
        {
            std::cerr << "Could not mmap\n";
            return 4;
        }
        char *pdata = static_cast<char *>(addr);

        if (madvise(pdata, infos.st_size, MADV_SEQUENTIAL) != 0)
        {
            std::cerr << "Could not madvise\n";
            return 5;
        }

        char *pdata_end = pdata + infos.st_size;

        // count the number of NUL bytes in the file
        std::cout << std::count(pdata, pdata_end, '\0') << std::endl;

        munmap(pdata, infos.st_size);
        close(fd);
        return 0;
    }
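Since MADV_SEQUENTIAL made no visible difference, another option, which was not suggested in the thread, is to map only a window of the file at a time and munmap() it before mapping the next one, so at most one window is ever mapped. The sketch assumes a hypothetical count_nul_windowed() function and an arbitrary default of 16384 pages per window. mmap() requires the file offset to be a multiple of the page size, which is why the window is sized in pages obtained from sysconf(_SC_PAGESIZE).

```cpp
#include <algorithm>
#include <cstddef>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical sketch of windowed mapping: count NUL bytes while only one
// window of the file is mapped at any time. Returns -1 on error.
long long count_nul_windowed(const char *path, std::size_t window_pages = 16384)
{
    const long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || window_pages == 0)
        return -1;
    // Offsets passed to mmap() must be page multiples, so the window
    // length is a whole number of pages.
    const std::size_t window = window_pages * static_cast<std::size_t>(page);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    const std::size_t size = static_cast<std::size_t>(st.st_size);

    long long nuls = 0;
    for (std::size_t off = 0; off < size; off += window) {
        const std::size_t len = std::min(window, size - off);
        void *p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd,
                       static_cast<off_t>(off));
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        const char *w = static_cast<const char *>(p);
        nuls += std::count(w, w + len, '\0');
        munmap(p, len);   // release this window before mapping the next one
    }

    close(fd);
    return nuls;
}
```

The trade-off is one mmap()/munmap() pair per window instead of a single mapping; larger windows mean fewer system calls but more of the file mapped at once.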




