Faking dense files... - Linux

Thread: Faking dense files...

  1. Faking dense files...

    Hi all,

    I'm wondering if there is a simple and/or recommended way to
    create dense files on a Linux file system other than JFS. As data for
    a file starts coming in over a network connection, it would be
    desirable to pre-allocate the entire file before writing to it.
    However, doing a seek() to, for example, the 3GByte mark of a new file
    and writing some data there does NOT reserve the blocks necessary for
    all of the bytes in between zero and 3GB (I realize this is for space
    efficiency on the drive, but it's not desirable in our case).

    One alternative would be to write a dummy byte every 4K (if blocks
    are 4K on the device in question) from zero to the expected end of the
    file. This would ensure that every block necessary for the file is
    reserved and has a single dummy byte written to it, but this still
    takes a good amount of time for large files.
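
    For illustration, a rough sketch of that dummy-byte approach in C (the
    file name, target size, and block size below are made-up examples, and
    error handling is abbreviated):

        /* Touch one byte per block so every block of the file gets allocated. */
        #define _FILE_OFFSET_BITS 64   /* large-file support on 32-bit builds */
        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(void)
        {
            const off_t target = 3LL * 1024 * 1024 * 1024;  /* expected file size */
            const off_t blksz  = 4096;                      /* file system block size */
            char zero = 0;

            int fd = open("incoming.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (fd < 0) { perror("open"); return 1; }

            for (off_t off = 0; off < target; off += blksz) {
                if (lseek(fd, off, SEEK_SET) == (off_t)-1) { perror("lseek"); return 1; }
                if (write(fd, &zero, 1) != 1) { perror("write"); return 1; }
            }

            close(fd);
            return 0;
        }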

    QUESTION: Without switching to JFS, is there a simpler, more
    efficient, more highly recommended way of creating dense files on a
    Linux file system? The main issue here is that the file in question is
    created and then pre-allocated as the first data packet comes in--while
    the pre-allocation is happening, the socket buffer (no matter how
    large) quickly fills & overflows. Is there a faster way of doing this?

    Thank you in advance for any thoughts, hints, tips, suggestions.

    Allan Stirrett.


  2. Re: Faking dense files...

    "astirret@tandbergtv.com" writes:

    > Hi all,
    >
    > I'm wondering if there is a simple and/or recommended way to
    > create dense files on a Linux file system other than JFS. As data for
    > a file starts coming in over a network connection, it would be
    > desirable to pre-allocate the entire file before writing to it.
    > However, doing a seek() to, for example, the 3GByte mark of a new file
    > and writing some data there does NOT reserve the blocks necessary for
    > all of the bytes in between zero and 3GB (I realize this is for space
    > efficiency on the drive, but it's not desirable in our case).
    >
    > One alternative would be to write a dummy byte every 4K (if blocks
    > are 4K on the device in question) from zero to the expected end of the
    > file. This would ensure that every block necessary for the file is
    > reserved and has a single dummy byte written to it, but this still
    > takes a good amount of time for large files.
    >
    > QUESTION: Without switching to JFS, is there a simpler, more
    > efficient, more highly recommended way of creating dense files on a
    > Linux file system? The main issue here is that the file in question is
    > created and then pre-allocated as the first data packet comes in--while
    > the pre-allocation is happening, the socket buffer (no matter how
    > large) quickly fills & overflows. Is there a faster way of doing
    > this?


    How about using a dedicated raw disk? I have to wonder, though, with
    the data rates you're talking about, whether the media write speed of
    a drive will be able to keep up....
    --
    Joseph J. Pfeiffer, Jr., Ph.D. Phone -- (505) 646-1605
    Department of Computer Science FAX -- (505) 646-1002
    New Mexico State University http://www.cs.nmsu.edu/~pfeiffer

  3. Re: Faking dense files...

    On 2007-01-08, astirret@tandbergtv.com wrote:
    > Hi all,
    >
    > I'm wondering if there is a simple and/or recommended way to
    > create dense files on a Linux file system other than JFS. As data for
    > a file starts coming in over a network connection, it would be
    > desirable to pre-allocate the entire file before writing to it.
    > However, doing a seek() to, for example, the 3GByte mark of a new file
    > and writing some data there does NOT reserve the blocks necessary for
    > all of the bytes in between zero and 3GB (I realize this is for space
    > efficiency on the drive, but it's not desirable in our case).


    If you want speed, write to a raw partition.


    If you want a 3 GiB file full of zeros, use the command

    dd if=/dev/zero of=yourfilename bs=1M count=3072

    > One alternative would be to write a dummy byte every 4K (if blocks
    > are 4K on the device in question) from zero to the expected end of the
    > file. This would ensure that every block necessary for the file is
    > reserved and has a single dummy byte written to it, but this still
    > takes a good amount of time for large files.


    It's unlikely to be significantly faster than writing zeros to the whole file.

    For security reasons, Linux erases the blocks as they are written, so
    pre-allocating and then writing is going to take twice as long as just
    writing once.

    Or you could try using FAT32...

    > QUESTION: Without switching to JFS, is there a simpler, more
    > efficient, more highly recommended way of creating dense files on a
    > Linux file system? The main issue here is that the file in question is
    > created and then pre-allocated as the first data packet comes in--while
    > the pre-allocation is happening, the socket buffer (no matter how
    > large) quickly fills & overflows. Is there a faster way of doing this?


    Make your own filesystem?



    --

    Bye.
    Jasen

  4. Re: Faking dense files...

    "astirret@tandbergtv.com" writes:

    [...]

    > QUESTION: Without switching to JFS, is there a simpler, more
    > efficient, more highly recommended way of creating dense files on a
    > Linux file system? The main issue here is that the file in question is
    > created and then pre-allocated as the first data packet comes in--while
    > the pre-allocation is happening, the socket buffer (no matter how
    > large) quickly fills & overflows. Is there a faster way of doing
    > this?


    Not doing preallocation? What is that supposed to be useful for?

  5. Re: Faking dense files...

    On 2007-01-08, astirret@tandbergtv.com wrote:
    > Hi all,
    >
    > I'm wondering if there is a simple and/or recommended way to
    > create dense files on a Linux file system other than JFS. As data for
    > a file starts coming in over a network connection, it would be
    > desirable to pre-allocate the entire file before writing to it.
    > However, doing a seek() to, for example, the 3GByte mark of a new file
    > and writing some data there does NOT reserve the blocks necessary for
    > all of the bytes in between zero and 3GB (I realize this is for space
    > efficiency on the drive, but it's not desirable in our case).


    How about letting the OS take care of OS stuff and having the application do
    application stuff?

    mmap() is probably the most efficient way to write files, but with files in
    the 3 GB range you're probably going to need a 64-bit Linux to map the whole
    file in a single chunk. AIUI, if the mmap() succeeds there's room for the
    file you've created; also, writes to the disk will be automatic, transparent,
    and fully asynchronous.
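
    Roughly, something like this (file name and size are made-up examples;
    note that ftruncate() only sets the length, so the blocks themselves are
    not allocated until the pages are actually written):

        #define _FILE_OFFSET_BITS 64
        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <unistd.h>

        int main(void)
        {
            const off_t size = 3LL * 1024 * 1024 * 1024;  /* needs 64-bit address space */

            int fd = open("incoming.dat", O_RDWR | O_CREAT | O_TRUNC, 0644);
            if (fd < 0) { perror("open"); return 1; }

            /* Set the file length first; the file is still sparse at this point. */
            if (ftruncate(fd, size) < 0) { perror("ftruncate"); return 1; }

            char *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED) { perror("mmap"); return 1; }

            /* Incoming data is copied straight into the mapping; the kernel
               flushes dirty pages to disk asynchronously. */
            memcpy(p, "example packet", 14);

            munmap(p, size);
            close(fd);
            return 0;
        }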

    Bye.
    Jasen

  6. Re: Faking dense files...

    jasen wrote:
    > How about letting the OS take care of OS stuff and having the application do
    > application stuff?
    >
    > mmap() is probably the most efficient way to write files, but with files in
    > the 3 GB range you're probably going to need a 64-bit Linux to map the whole
    > file in a single chunk. AIUI, if the mmap() succeeds there's room for the
    > file you've created; also, writes to the disk will be automatic, transparent,
    > and fully asynchronous.
    >
    > Bye.
    > Jasen


    Thanks to all who have replied to this query. From what I'm seeing,
    here's a bit of explanation to clear things up.

    1) Our customers or partners determine the file system upon which they
    will run our app, so making our own or choosing one specific FS won't
    work (it would be nice, but...)

    2) Our app could be receiving files as large as 8 GB or more, so 64-bit
    support is a definite requirement. Equally important is a fast
    pre-allocation scheme for files this large.

    3) The main impetus behind pre-allocation is two-fold: contiguous
    file space for efficient playback, but more importantly ensuring that
    another app doesn't eat the free disk space we saw when the file
    started coming in (some of our target platforms are tight on disk, and
    we're not the only swimmer in the pool). Example:
    - an 8 GB file starts coming in to our app, and there is 10 GB free on
      disk
    - while the 8 GB arrives, another app creates and fills a 3 GB file
      (legal in a sparse FS)
    - by the time we get to the 7 GB mark of OUR file, we hit disk full
    - as you can see, the current sparse-file pre-allocation doesn't
      help us, hence this post.

    I'm looking for a semi- to fully-portable trick to quickly pre-allocate
    a large file, basically just asking the OS to reserve N contiguous
    blocks for this new file, without actually writing anything to the
    file. I've run some tests doing 4K writes vs. writing 1 byte every 4K,
    and, as was mentioned, there's not much difference, and both take a
    LONG time considering data for the file is still incoming.
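
    (The kind of call I have in mind is something like posix_fallocate();
    whether a given C library / file system actually reserves the blocks
    without writing them is another question -- implementations may fall
    back to touching every block, which is exactly the slow path described
    above. A sketch, with a made-up helper name:)

        #define _FILE_OFFSET_BITS 64
        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        /* Ask for all of the file's blocks up front; returns 0 on success. */
        int preallocate(const char *path, off_t size)
        {
            int fd = open(path, O_WRONLY | O_CREAT, 0644);
            if (fd < 0) { perror("open"); return -1; }

            int err = posix_fallocate(fd, 0, size);  /* 0 on success, an errno value on failure */
            close(fd);

            if (err != 0) { fprintf(stderr, "posix_fallocate failed: %d\n", err); return -1; }
            return 0;
        }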

    Without playing down at the raw disk block level, I'm getting the
    feeling there might not be a "solid" solution to this. Thanks again
    for your help, but if this information makes you think of anything
    else, your input is VERY welcome.

    Allan Stirrett.


  7. Re: Faking dense files...

    If you can't choose the file system, the customer can't expect you to
    change the behaviour of the one he installed.

    -Michael

  8. Re: Faking dense files...

    "Allan Stirrett" writes:

    [...]

    > 2) Our app could be receiving files as high as 8 GB or more, so 64-bit
    > support is a definite requirement. Equally important is a fast
    > pre-allocation scheme for files this large.
    >
    > 3) The main impetus behind pre-allocation is two-fold: contiguous
    > file space for efficient playback,


    Doesn't work this way.

    > but more importantly ensuring that
    > another app doesn't eat the free disk space we saw when the file
    > started coming in (some of our target platforms are tight on disk, and
    > we're not the only swimmer in the pool). Example:
    > - 8GB file starts coming in to our app, and there is 10 GB free on
    > disk
    > - while the 8 GB arrives, another app creates and fills a 3 GB file
    > (legal in a sparse FS).
    > - by the time we get to the 7 GB mark of OUR file, we hit disk
    > full.


    If separate applications try to store more data on a disk than there is
    free space available, some or all of them will necessarily fail to store
    what they wanted to store.

  9. Re: Faking dense files...

    In comp.os.linux.embedded Rainer Weikusat wrote:

    |> but more importantly ensuring that
    |> another app doesn't eat the free disk space we saw when the file
    |> started coming in (some of our target platforms are tight on disk, and
    |> we're not the only swimmer in the pool). Example:
    |> - 8GB file starts coming in to our app, and there is 10 GB free on
    |> disk
    |> - while the 8 GB arrives, another app creates and fills a 3 GB file
    |> (legal in a sparse FS).
    |> - by the time we get to the 7 GB mark of OUR file, we hit disk
    |> full.
    |
    | If separate applications try to store more data on a disk than free
    | space is available, some or all of them will necessarily fail to store
    | what they wanted to store.

    I think he wants to be able to force the dense allocation as a way to
    verify that the space is indeed available and cannot become unavailable.
    The 2nd process he mentioned that would come along and grab 3GB would be
    the one to fail, and it should fail before the 3GB it gets from somewhere
    is actually sent. The sender needs to know if the space is _committed_
    before sending, is my guess. The trouble is, the process doing this is
    going to have to literally write all these blocks, so the answer to the
    question "can you commit to this much space?" is going to take a while
    to answer for larger amounts of space.

    --
    |---------------------------------------/----------------------------------|
    | Phil Howard KA9WGN (ka9wgn.ham.org) / Do not send to the address below |
    | first name lower case at ipal.net / spamtrap-2007-01-10-1708@ipal.net |
    |------------------------------------/-------------------------------------|

  10. Re: Faking dense files...

    Allan Stirrett wrote:

    > - 8GB file starts coming in to our app, and there is 10 GB free on
    > disk
    > - while the 8 GB arrives, another app creates and fills a 3 GB file
    > (legal in a sparse FS).
    > - by the time we get to the 7 GB mark of OUR file, we hit disk
    > full.
    > - as you can see, the current sparse file pre-allocation doesn't
    > help us, hence this post.


    Would it not be better to make the application keep track of the failure,
    and give the user the option to resume the download, or to cancel it and
    delete the part already written?

    --
    Salu2

  11. Re: Faking dense files...

    > ensuring that
    > another app doesn't eat the free disk space we saw when the file


    I suppose there are file systems that allow for restricting the space
    available for users/groups. IMHO this would be the correct way to ensure
    that an application (running in the appropriate user account) always can
    get a defined minimum of the disk space.

    -Michael

  12. Re: Faking dense files...

    phil-news-nospam@ipal.net writes:
    > In comp.os.linux.embedded Rainer Weikusat wrote:
    >
    > |> but more importantly ensuring that
    > |> another app doesn't eat the free disk space we saw when the file
    > |> started coming in (some of our target platforms are tight on disk, and
    > |> we're not the only swimmer in the pool). Example:
    > |> - 8GB file starts coming in to our app, and there is 10 GB free on
    > |> disk
    > |> - while the 8 GB arrives, another app creates and fills a 3 GB file
    > |> (legal in a sparse FS).
    > |> - by the time we get to the 7 GB mark of OUR file, we hit disk
    > |> full.
    > |
    > | If separate applications try to store more data on a disk than free
    > | space is available, some or all of them will necessarily fail to store
    > | what they wanted to store.
    >
    > I think he wants to be able to force the dense allocation as a way to
    > verify that the space is indeed available and cannot become unavailable.
    > The 2nd process he mentioned that would come along and grab 3GB would be
    > the one to fail, and it should fail before the 3GB it gets from somewhere
    > is actually sent. The sender needs to know if the space is _committed_
    > before sending, is my guess.


    I understand what he would like to do, but that doesn't change the
    fact that this is basically a policy decision made by whoever is using
    the respective applications. That someone needs to have enough disk
    space available for all concurrently running tasks; if there isn't
    enough disk space available, some or all of them will fail. The
    solution is either not to have different applications compete for
    resources that are scarce enough that there will be a 'loser' and a
    'winner', or to accept that this will happen.



  13. Re: Faking dense files...

    Michael Schnell wrote:
    > > ensuring that
    > > another app doesn't eat the free disk space we saw when the file

    >
    > I suppose there are file systems that allow for restricting the space
    > available for users/groups. IMHO this would be the correct way to ensure
    > that an application (running in the appropriate user account) always can
    > get a defined minimum of the disk space.
    >
    > -Michael


    Another option might be to create a partition of a given size
    specifically for our app, to restrict our app and other apps in that
    way, though this would not be changeable after machine setup.

    However, given both options, we would prefer to handle the available
    space issue within our app, without outside requirements. Thank you
    for the suggestion.


  14. Re: Faking dense files...


    Julián Albo wrote:
    > Allan Stirrett wrote:
    >
    > > - 8GB file starts coming in to our app, and there is 10 GB free on
    > > disk
    > > - while the 8 GB arrives, another app creates and fills a 3 GB file
    > > (legal in a sparse FS).
    > > - by the time we get to the 7 GB mark of OUR file, we hit disk
    > > full.
    > > - as you can see, the current sparse file pre-allocation doesn't
    > > help us, hence this post.

    >
    > Would it not be better to make the application keep track of the failure,
    > and give the user the option to resume the download, or to cancel it and
    > delete the part already written?
    >
    > --
    > Salu2


    Since our app runs as a daemon, user interaction cannot be relied upon
    in that way. Our best bet is to inform the user/operator BEFORE the
    reception that it's not possible, rather than after the failure.


  15. Re: Faking dense files...

    "Allan Stirrett" writes:
    > Michael Schnell wrote:
    >> > ensuring that
    >> > another app doesn't eat the free disk space we saw when the file

    >>
    >> I suppose there are file systems that allow for restricting the space
    >> available for users/groups. IMHO this would be the correct way to ensure
    >> that an application (running in the appropriate user account) always can
    >> get a defined minimum of the disk space.
    >>
    >> -Michael

    >
    > Another option might be to create a partition of a given size
    > specifically for our app, to restrict our app and other apps in that
    > way, though this would not be changeable after machine setup.
    >
    > However, given both options, we would prefer to handle the available
    > space issue within our app, without outside requirements.


    Just because you really want the river to flow upstream does not mean
    it is going to happen.

  16. Re: Faking dense files...

    Rainer Weikusat wrote:
    > phil-news-nospam@ipal.net writes:
    > > In comp.os.linux.embedded Rainer Weikusat wrote:
    > >
    > > |> but more importantly ensuring that
    > > |> another app doesn't eat the free disk space we saw when the file
    > > |> started coming in (some of our target platforms are tight on disk, and
    > > |> we're not the only swimmer in the pool). Example:
    > > |> - 8GB file starts coming in to our app, and there is 10 GB free on
    > > |> disk
    > > |> - while the 8 GB arrives, another app creates and fills a 3 GB file
    > > |> (legal in a sparse FS).
    > > |> - by the time we get to the 7 GB mark of OUR file, we hit disk
    > > |> full.
    > > |
    > > | If separate applications try to store more data on a disk than free
    > > | space is available, some or all of them will necessarily fail to store
    > > | what they wanted to store.
    > >
    > > I think he wants to be able to force the dense allocation as a way to
    > > verify that the space is indeed available and cannot become unavailable.
    > > The 2nd process he mentioned that would come along and grab 3GB would be
    > > the one to fail, and it should fail before the 3GB it gets from somewhere
    > > is actually sent. The sender needs to know if the space is _committed_
    > > before sending, is my guess.

    >
    > I understand what he would like to do, but that doesn't change the
    > fact that this is basically a policy decision made by someone using
    > the respective applications. That someone needs to have enough disk
    > space available for all concurrently running tasks and if there isn't
    > enough disk space available, some or all of them will fail and the
    > solution is to either not have different applications compete for
    > resources that are scarce enough that there will be a 'loser' and a
    > 'winner' or to accept that this will happen.


    In our case, we want to know that we can complete the reception BEFORE
    starting, rather than at the failure point. Depending on bit rate
    selected and size of the file being sent, it could be half a day before
    the transmission is complete, and to find out THEN that someone else
    used up our disk space is not a good thing.

    If the disk space is available at the start, we should succeed, no
    matter how long it takes and what else is running on the machine. If
    space is not available, we should log a message to that effect and
    not start reception. Does that make sense?

    To that end, we need a method to reliably and portably have the file
    system reserve those blocks for us when we START. However, I'm getting
    the impression that currently it can't be done in a timely-enough
    fashion for files as large as we're working with.
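
    (The cheap thing we can do at start time is a free-space check, e.g. with
    statvfs() -- sketch below, helper name made up -- but a check is only a
    snapshot, not a reservation, which is the root of the problem.)

        #include <stdio.h>
        #include <sys/statvfs.h>

        /* Returns 1 if at least `needed` bytes appear to be free on the file
           system containing `path`, 0 otherwise.  This is only a snapshot;
           another writer can still consume the space afterwards. */
        int space_looks_available(const char *path, unsigned long long needed)
        {
            struct statvfs sv;

            if (statvfs(path, &sv) != 0) { perror("statvfs"); return 0; }

            unsigned long long avail =
                (unsigned long long)sv.f_bavail * sv.f_frsize;
            return avail >= needed;
        }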


  17. Re: Faking dense files...

    Allan Stirrett writes:
    > Another option might be to create a partition of a given size
    > specifically for our app, to restrict our app and other apps in that way,
    > though this would not be changeable after machine setup.


    Consider quotas.
    --
    John Hasler
    john@dhh.gt.org
    Dancing Horse Hill
    Elmwood, WI USA

  18. Re: Faking dense files...

    In comp.os.linux.embedded Rainer Weikusat wrote:
    | phil-news-nospam@ipal.net writes:
    |> In comp.os.linux.embedded Rainer Weikusat wrote:
    |>
    |> |> but more importantly ensuring that
    |> |> another app doesn't eat the free disk space we saw when the file
    |> |> started coming in (some of our target platforms are tight on disk, and
    |> |> we're not the only swimmer in the pool). Example:
    |> |> - 8GB file starts coming in to our app, and there is 10 GB free on
    |> |> disk
    |> |> - while the 8 GB arrives, another app creates and fills a 3 GB file
    |> |> (legal in a sparse FS).
    |> |> - by the time we get to the 7 GB mark of OUR file, we hit disk
    |> |> full.
    |> |
    |> | If separate applications try to store more data on a disk than free
    |> | space is available, some or all of them will necessarily fail to store
    |> | what they wanted to store.
    |>
    |> I think he wants to be able to force the dense allocation as a way to
    |> verify that the space is indeed available and cannot become unavailable.
    |> The 2nd process he mentioned that would come along and grab 3GB would be
    |> the one to fail, and it should fail before the 3GB it gets from somewhere
    |> is actually sent. The sender needs to know if the space is _committed_
    |> before sending, is my guess.
    |
    | I understand what he would like to do, but that doesn't change the
    | fact that this is basically a policy decision made by someone using
    | the respective applications. That someone needs to have enough disk
    | space available for all concurrently running tasks and if there isn't
    | enough disk space available, some or all of them will fail and the
    | solution is to either not have different applications compete for
    | resources that are scarce enough that there will be a 'loser' and a
    | 'winner' or to accept that this will happen.

    Competing for resources is not an uncommon issue. But such competition
    does operate more gracefully if the later tasks can at least know BEFORE
    they take certain steps that the space is committed to them. Perhaps
    those steps are irreversible once started. In such a case a way to deal
    with the resource competition is to try to get it and at least know if
    it cannot be had right now, perhaps to try again later.

    There's a deadlock possibility if this isn't done. Suppose some other
    process will take the arriving files, process them in some way, and
    remove them when done. But that process cannot operate unless and until
    the file is completely available (for example listing the contents of a
    PKZIP file, archiving the list, and removing the zip file). Suppose there
    is 10 GB of space to work with to stage the files as they arrive over the
    network. If the logic of the receiving daemon were to just pause when the
    space is full, waiting for it to empty out, this will produce a deadlock
    when 2 very large files together exceed the space (say 2 files of 6GB that
    have transmitted 5GB so far). There is no way out but to abort at least
    one of these. And if the nature of the sender is that it is committed
    once sending starts, that can mean loss of data. The way around that is
    to get a space commitment and if the commitment fails, try later or be put
    in a queue.

    --
    |---------------------------------------/----------------------------------|
    | Phil Howard KA9WGN (ka9wgn.ham.org) / Do not send to the address below |
    | first name lower case at ipal.net / spamtrap-2007-01-11-1407@ipal.net |
    |------------------------------------/-------------------------------------|

  19. Re: Faking dense files...

    Policies are changeable at any time.

    If your app has the appropriate rights, it _can_ change the policies.
    E.g. all other apps are executed by users belonging to a certain
    group, while your app is executed by a user belonging to another group.
    Now (given an appropriate file system) your app can set the combined
    quota of that group to the disk size minus the additional space it
    needs beyond what is already there.

    -Michael

  20. Re: Faking dense files...

    In comp.os.linux.embedded Allan Stirrett wrote:

    | To that end, we need a method to reliably and portably have the file
    | system reserve those blocks for us when we START. However, I'm getting
    | the impression that currently it can't be done in a timely-enough
    | fashion for files as large as we're working with.

    Writing out the blocks to commit the space is certainly not instant, but
    it is a lot faster than many networks. Just write some zeros with dd and
    time it to see how long it takes.

    Another alternative is to suitably partition the disk and ensure that
    only a certain program is allowed to write in that space. That program
    then manages the space numerically to reserve it, rather than using the
    filesystem to do so. It will also need to know when the space is released
    so it can increment the available number back up.
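
    The bookkeeping side of that is simple enough -- something along these
    lines, with made-up names and a made-up pool size:

        #include <pthread.h>
        #include <stdint.h>

        static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
        static uint64_t bytes_free = 10ULL * 1024 * 1024 * 1024;  /* size of the dedicated space */

        /* Commit `bytes` of the pool to a caller; returns 1 on success, 0 if
           the request does not fit. */
        int reserve_space(uint64_t bytes)
        {
            int ok = 0;
            pthread_mutex_lock(&lock);
            if (bytes <= bytes_free) { bytes_free -= bytes; ok = 1; }
            pthread_mutex_unlock(&lock);
            return ok;
        }

        /* Called when a staged file is deleted, returning its space to the pool. */
        void release_space(uint64_t bytes)
        {
            pthread_mutex_lock(&lock);
            bytes_free += bytes;
            pthread_mutex_unlock(&lock);
        }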

    --
    |---------------------------------------/----------------------------------|
    | Phil Howard KA9WGN (ka9wgn.ham.org) / Do not send to the address below |
    | first name lower case at ipal.net / spamtrap-2007-01-11-1419@ipal.net |
    |------------------------------------/-------------------------------------|
