How to store big files contiguously on hd - Unix

Thread: How to store big files contiguously on hd

  1. How to store big files contiguously on hd

    Hi,

    I would like to know if there is a way to store big files
    contiguously on the hard disk with Linux.
    I don't care about the disk space lost.
    I don't care about the time it takes to write the big files to the disk.
    But, I DO CARE about the time to READ the big files.

    For example, if I have 12 big files of 2 GB max to place on my hard disk,
    one solution would be to allocate 24 GB on my hard disk and associate
    2 contiguous GB on the disk with each file.

    But I want to use a FILESYSTEM; I want to use fopen/fclose/fread/fwrite
    to MANAGE these big files. I DON'T want to use a raw hard disk with
    read/write-sector functions only.

    I hope I make myself clear.

    And I hope someone has the answer to my question.

    Thanks in advance.

    Cyrille CHEVROT


  2. Re: How to store big files contiguously on hd

    chevrot79@yahoo.fr wrote:
    > Hi,
    >
    > I would like to know if there is a way to store big files
    > contiguously on the hard disk with Linux.


    Yes, there is. But, not within a standard filesystem.

    [snip]

    > But I want to use a FILESYSTEM; I want to use fopen/fclose/fread/fwrite
    > to MANAGE these big files.


    First off, you have an invalid assumption in the above statement. You
    assume that fopen() et al can only be used against files in a formatted
    filesystem. This is incorrect.

    > I DON'T want to use a raw hard disk with read/write-sector functions only.


    Again, an invalid assumption (although less obvious). You assume that
    raw disk access must be performed using some sort of "sector
    read/write" function only.

    None of the filesystem formats that I am familiar with permit you to
    control whether or not specific files are allocated to contiguous
    blocks on the media. In fact, AFAIK, they don't have any control for
    that at all.

    So, to get contiguous blocks, you are going to have to read/write from
    something that doesn't have a filesystem on it, and manage the space
    yourself.

    All the standard file operations (open()/read()/write()/lseek()/close()
    and fopen()/fread()/fwrite()/fseek()/fclose()) work when you use them
    against a file in the system, whether the file resides in a formatted
    filesystem or not.

    So, allocate yourself a partition on your hard drive, and fopen() etc.
    the device file for that partition. You have direct access to
    everything in that partition, and can control exactly where data goes.
    fread()/fwrite()/fseek()/read()/write()/lseek() work sequentially from
    the beginning of the partition, so you have the guarantee of contiguous
    placement of data.

    Example to write four contiguous 1k blocks of binary 0 to a "file":

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        FILE *raw;
        int count;
        char buff[1024] = { 0 };                 /* one 1k block of zeros */

        /* open the raw device node read/write in binary mode */
        if ((raw = fopen("/dev/fd0", "r+b")) == NULL)
            exit(EXIT_FAILURE);
        for (count = 0; count < 4; ++count)      /* blocks land back to back */
            fwrite(buff, sizeof buff, 1, raw);
        fclose(raw);
        return 0;
    }
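
    (Note: /dev/fd0 is just the first floppy drive, used here as a convenient
    example of a device node you can scribble on. For a hard disk you would
    open the device file of the partition you set aside - /dev/hda5, say -
    and you need write permission on that node. Point it at a scratch device
    only; it overwrites whatever is there.)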

    HTH
    --
    Lew Pitcher


  3. Re: How to store big files contiguously on hd

    > > I would like to know if there is a way to store big files
    > > contiguously on the hard disk with Linux.

    >
    > Yes, there is. But, not within a standard filesystem.


    And with a non-standard filesystem?


    > example to write four contigious 1k blocks of binary 0 to a "file" :
    > {
    > FILE *raw;
    > int count;
    > char buff[1024] = { 0 };
    >
    > if ((raw = fopen("/dev/fd0","r+b")) == NULL) exit(1);
    > for (count = 0; count < 4; ++count)
    > fwrite(buff, sizeof buff, 1, raw);
    > fclose(raw);
    > }
    >


    This is an interesting solution; however, I have two questions:

    - What about bad blocks? In fact, I mainly want a filesystem to MANAGE
    BAD BLOCKS. Does this solution manage them?

    - What about 512-byte alignment constraints? Are we obliged to manage
    them "manually" or not?

    We currently use a very similar solution under Windows using fileread()
    but we have problems with these two points.

    Thanks in advance,

    --
    Cyrille CHEVROT


  4. Re: How to store big files contiguously on hd

    On 8 Sep 2006 09:14:55 -0700
    "chevrot79@yahoo.fr" wrote:

    > > > I would like to know if there is a way to store big files
    > > > contiguously on the hard disk with Linux.


    Why do they need to be contiguous?

    > > Yes, there is. But, not within a standard filesystem.

    >
    > And with non standard filesystem ?
    >
    >
    > > example to write four contigious 1k blocks of binary 0 to a "file" :
    > > {
    > > FILE *raw;
    > > int count;
    > > char buff[1024] = { 0 };
    > >
    > > if ((raw = fopen("/dev/fd0","r+b")) == NULL) exit(1);
    > > for (count = 0; count < 4; ++count)
    > > fwrite(buff, sizeof buff, 1, raw);
    > > fclose(raw);
    > > }
    > >

    >
    > This is an interessant solution, however I have two questions :
    >
    > - What about bad blocks ? In fact, I mainly want a filesystem to
    > MANAGE BAD BLOCKS. Does this solution manage them ?


    Filesystems managing bad blocks - what century are you living in? Bad
    blocks are handled at the device level, not the file system (or should
    be). SCSI disks give access to the primary defect list (established by
    the manufacturer) and the grown defect list. The latter should be
    empty, or else the disk is on its way to disk heaven and should be
    replaced. IDE/ATA disks do their own defect management, and you'd
    typically use SMART to know when a disk is dying.

    If you're concerned about data integrity, use MD5 or SHA1 hashes to
    verify the contents of the raw slice/partition.

    > - What about 512 bytes alignment constraints ? Are we oblige to manage
    > them "manually" or not ?


    On Unix systems, raw devices are streams of bytes. The block size
    of the underlying hardware doesn't matter. Obviously, when it comes to
    performance, using a fairly large buffer (8192 bytes, for example)
    would give the best results. Notice that this is caused by buffer
    optimisation, not the geometry of the disk. Both SCSI and ATA disks are
    addressed by block number, and the drive firmware knows where to get
    each block. For example, blocks that have been marked bad during
    manufacturing result in head movement even though the block numbers are
    sequential.
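
    As a minimal sketch of the hash check suggested above (this assumes
    OpenSSL's SHA1 functions, so link with -lcrypto; the device path is only
    a placeholder), reading the raw device in largish chunks:

    #include <stdio.h>
    #include <stdlib.h>
    #include <openssl/sha.h>

    int main(int argc, char **argv)
    {
        const char *path = (argc > 1) ? argv[1] : "/dev/sdb1"; /* placeholder */
        unsigned char buf[8192];           /* fairly large buffer, as above */
        unsigned char md[SHA_DIGEST_LENGTH];
        SHA_CTX ctx;
        size_t n;
        FILE *raw;

        if ((raw = fopen(path, "rb")) == NULL) {
            perror(path);
            return EXIT_FAILURE;
        }

        SHA1_Init(&ctx);
        while ((n = fread(buf, 1, sizeof buf, raw)) > 0)
            SHA1_Update(&ctx, buf, n);     /* hash the whole slice/partition */
        SHA1_Final(md, &ctx);
        fclose(raw);

        for (int i = 0; i < SHA_DIGEST_LENGTH; ++i)
            printf("%02x", md[i]);         /* print the digest in hex */
        putchar('\n');
        return EXIT_SUCCESS;
    }

    Run it once after writing the data and keep the digest; any later run
    that prints a different value means the contents changed or failed to
    read cleanly.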

    In other words, what you try to do is not possible. Once you're on
    RAID and SAN type storage, you haven't got a clue where the data is
    stored anyway - the devices the computer sees are synthetic.

    > We currently use a very similar solution under Windows using fileread
    > () but we have problems with this two points.


    I would be amazed that you have bad blocks on modern disks, even with
    Windows.

    Don't forget that with the technique above, you can only have ONE file
    on a raw partition - there's no support for multiple files,
    directories, or anything else.

    What exactly are you trying to achieve?

    --
    Stefaan A Eeckels
    --
    How's it supposed to get the respect of management if you've got just
    one guy working on the project? It's much more impressive to have a
    battery of programmers slaving away. -- Jeffrey Hobbs (comp.lang.tcl)

  5. Re: How to store big files contiguously on hd


    Stefaan A Eeckels wrote:

    > On 8 Sep 2006 09:14:55 -0700
    > "chevrot79@yahoo.fr" wrote:
    >
    > > > > I would like to know if there is a way to store big files
    > > > > contiguously on the hard disk with Linux.

    >
    > Why do they need to be contiguous?


    For a better transfer rate when reading.
    The big files will be placed once on the hard disk and then they will be
    transferred off the hard disk many times. The transfer rate must be the
    best possible.

    > > - What about bad blocks ? In fact, I mainly want a filesystem to
    > > MANAGE BAD BLOCKS. Does this solution manage them ?

    >
    > Filesystems managing bad blocks - what century are you living in? Bad
    > blocks are handled at the device level, not the file system (or should
    > be). SCSI disks give access to the primary defect list (established by
    > the manufacturer) and the grown defect list. The latter should be
    > empty, or else the disk is on its way to disk heaven and should be
    > replaced. IDE/ATA disks do their own defect management, and you'd
    > typically use SMART to know when a disk is dying.
    >
    > If you're concerned about data integrity, use MD5 or SHA1 hashes to
    > verify the contents of the raw slice/partition.


    If you prefer, I want a filesystem which is able to manage the defect
    list of bad sectors (I think "bad sectors" is more accurate than "bad
    blocks"). Indeed, even if the device CREATES the defect list, someone
    (the device driver) needs to USE this defect list. And I don't want to
    bury my hard disk when the first bad sector appears!
    Scandisk is not XIX-century software, AFAIK?


    > > - What about 512 bytes alignment constraints ? Are we oblige to manage
    > > them "manually" or not ?

    >
    > On Unix systems, raw devices are streams of bytes. The block size
    > of the underlying hardware doesn't matter. Obviously, when it comes to
    > performance, using a a fairly large buffer (8192 bytes, for example)
    > would give the best results.

    This is good news

    > Notice that this is caused by buffer
    > optimisation, not the geometry of the disk. Both SCSI and ATA disks are
    > addressed by block number, and the drive firmware knows where to get
    > each block. For example, blocks that have been marked bad during
    > manufacturing result in head movement even though the block numbers are
    > sequential.
    >
    > In other words, what you try to do is not possible. Once you're on
    > RAID and SAN type storage, you haven't got a clue where the data is
    > stored anyway - the devices the computer sees are synthetic.


    I just want the best transfer rate. What is THE solution?
    Tell me if I'm wrong, but a fragmented file on a hard disk will always
    be slower to read than a "contiguous" file, even if "contiguous" is not
    really contiguous.

    > > We currently use a very similar solution under Windows using fileread
    > > () but we have problems with this two points.

    >
    > I would be amazed that you have bad blocks on modern disks, even with
    > Windows

    This is really interesting.
    Do you have some stats on that subject?
    Indeed, our big files are data files, not executables. So, if bad
    sectors are very infrequent, the solution may be: don't manage them.
    But to do this I mustn't stall when reading a bad sector. Is that the
    case?

    > Don't forget that with the technique above, you can only have ONE file
    > on a raw partition - there's no support for multiple files,
    > directories, or anything else.

    I know, that is the problem.
    That's why I said above that I want a filesystem, I want fopen(). Indeed,
    if I have two files, I want two fopen() calls, and I don't want one
    fopen() and one fseek(). The latter solution is not pretty and is hard to
    manage. That's why I still don't have THE solution.

    >
    > What exactly are you trying to achieve?

    See the beginning of this post.

    --
    Cyrille CHEVROT


  6. Re: How to store big files contiguously on hd

    On 8 Sep 2006 10:57:50 -0700
    "chevrot79@yahoo.fr" wrote:

    >
    > Stefaan A Eeckels wrote:
    >
    > > On 8 Sep 2006 09:14:55 -0700
    > > "chevrot79@yahoo.fr" wrote:
    > >
    > > > > > I would like to know if there is a way to store big files
    > > > > > contiguously on the hard disk with Linux.

    > >
    > > Why do they need to be contiguous?

    >
    > For better transfer rate when reading.
    > The big files will be place once on the hard disk and then they will
    > be transfer out of the hard disk many times. The transfer rate must
    > be the best.


    Reading? Get a RAID 5 setup. It'll suck at writing, but you get
    impressive read speeds and you can lose a disk without data loss.

    > > > - What about bad blocks ? In fact, I mainly want a filesystem to
    > > > MANAGE BAD BLOCKS. Does this solution manage them ?

    > >
    > > Filesystems managing bad blocks - what century are you living in?
    > > Bad blocks are handled at the device level, not the file system (or
    > > should be). SCSI disks give access to the primary defect list
    > > (established by the manufacturer) and the grown defect list. The
    > > latter should be empty, or else the disk is on its way to disk
    > > heaven and should be replaced. IDE/ATA disks do their own defect
    > > management, and you'd typically use SMART to know when a disk is
    > > dying.
    > >
    > > If you're concerned about data integrity, use MD5 or SHA1 hashes to
    > > verify the contents of the raw slice/partition.

    >
    > If you prefer I want a filesystem which is able to manage the defect
    > list of bad sectors (I think bad sectors is more accurate than bad
    > block). Indeed, even if the device CREATES the defect list, someone
    > (the device driver) need to USE this defect list. And I don't want to
    > bury my hard disk when the first bad sector appear !


    It's the disk firmware itself that handles the bad sectors. With ATA
    disks you don't even get to see the list of bad blocks, so file systems
    cannot handle blocks they have no way of knowing about. And in any case, the
    firmware in the drive will _already_ have substituted a spare block, so
    you'll be replacing a perfectly OK block (the spare one).

    SCSI and ATA disks present themselves to the OS as 1..n blocks of 512
    bytes. No more, no less. No heads, no cylinders, no sectors. They're
    still mentioned in the PC BIOS for compatibility's sake, but that's all.

    > Scandisk is not a XIX century software AFAIK ?


    Very much XX century, I'm afraid.

    > > In other words, what you try to do is not possible. Once you're on
    > > RAID and SAN type storage, you haven't got a clue where the data is
    > > stored anyway - the devices the computer sees are synthetic.

    >
    > I just want the best transfer rate. What is THE solution.
    > Tell me if I'm wrong, but a fragmented file on hard disk will be
    > always slower to read than a "contiguous" file, even if contiguous is
    > not really contiguous.


    The OS typically will do a nice job of keeping files well organised for
    optimal access. In most cases, fragmentation actually _helps_
    performance, especially when several files are accessed simultaneously.

    About the only time a really, really contiguous file will make a
    difference is when it is the only file on disk - then and only then the
    heads can move step by step over the disk. And, oh miracle, if you've
    only a single file on a file system, it'll be as close to contiguous as
    it can get.

    Take it from me, the OS will do a better job than you can ever do.

    > > > We currently use a very similar solution under Windows using
    > > > fileread () but we have problems with this two points.

    > >
    > > I would be amazed that you have bad blocks on modern disks, even
    > > with Windows


    > This is really interesting.
    > Have you some stats on that subject ?
    > Indeed, our big files are data files, not executable. So, if bad
    > sectors are very infrequents,the solution may be : not manage them.
    > But to do this I mustn't stall on a bad sector reading. Is it the case
    > ?


    Look, whatever way you use to manage bad sectors, there will always be
    an access failure that causes a block to be marked bad. Thus, there
    will be an effect on the reading (or writing) operation that is taking
    place. This is technology, not magic. As I told you, ATA disk drives do
    not give you access to their defect lists, so stop thinking about them.
    The rule of thumb is - once a disk starts growing new defects, you
    replace it pronto, or you will have a disaster on your hands.

    > > Don't forget that with the technique above, you can only have ONE
    > > file on a raw partition - there's no support for multiple files,
    > > directories, or anything else.


    > I know that is the problem.
    > That's why, I have said above I want filesystem, I want fopen().


    fopen() is merely opening a file with user-space buffering. It's not
    dependent on having a "file system".

    > Indeed if I have two files, I want two fopen(), and I don't want one
    > fopen() and one fseek(). The latter solution is not cute and hard to
    > manage. That's why I still don't have THE solution.


    That's correct - but a file system and two fopen()s, followed by
    reading the two files simultaneously will result in head movement
    _even_ if you don't issue an fseek(), and even when the two files are
    contiguous. Actually, you'll have, on average, more and longer head
    seeks when they _are_ contiguous because the OS cannot position the
    blocks for optimal access. Think about it, and you'll see the light.

    Have you actually benchmarked the file system and the raw partition
    approaches? Measure, don't speculate.

    It's dead easy as long as you have a couple of spare raw partitions,
    and the appropriate access rights (remember, on Unix everything is a
    file). The program doesn't need to change, just give it the appropriate
    names as parameters.
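
    For what it's worth, here is a minimal sketch of such a benchmark (plain
    C; the 64 KB buffer size is only an assumption to vary, and you should
    clear the page cache between runs - on Linux, for instance, via
    /proc/sys/vm/drop_caches - so you measure the disk rather than cached
    data):

    #define _POSIX_C_SOURCE 200112L
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(int argc, char **argv)
    {
        static char buf[1 << 16];          /* 64 KB read buffer (assumption) */
        struct timespec t0, t1;
        long long total = 0;
        size_t n;
        FILE *in;

        if (argc < 2) {
            fprintf(stderr, "usage: %s <file-or-device>\n", argv[0]);
            return EXIT_FAILURE;
        }
        if ((in = fopen(argv[1], "rb")) == NULL) {
            perror(argv[1]);
            return EXIT_FAILURE;
        }

        clock_gettime(CLOCK_MONOTONIC, &t0);
        while ((n = fread(buf, 1, sizeof buf, in)) > 0)
            total += (long long)n;         /* sequential read, nothing else */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        fclose(in);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%lld bytes in %.3f s = %.1f MB/s\n",
               total, secs, total / (1024.0 * 1024.0) / secs);
        return EXIT_SUCCESS;
    }

    Point it at the file in the filesystem and at the raw partition holding
    the same data, and compare the numbers instead of guessing.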

    Take care,

    --
    Stefaan A Eeckels
    --
    A human being should be able to change a diaper, plan an invasion,
    butcher a hog, conn a ship, design a building, write a sonnet, balance
    accounts, build a wall, set a bone, comfort the dying, take orders,
    give orders, cooperate, act alone, solve equations, analyze a new
    problem, pitch manure, program a computer, cook a tasty meal, fight
    efficiently, die gallantly. Specialization is for
    insects. -- Robert A. Heinlein

  7. Re: How to store big files contiguously on hd

    On 2006-09-08, chevrot79@yahoo.fr wrote:
    > I just want the best transfer rate. What is THE solution.


    Multiple stripes on multiple fast-spinning disks, made to work together.

    See how that fits with earlier comments that insisting on a specific
    property of the files in relation to the disks has been meaningless
    for quite some time now?


    > Tell me if I'm wrong,


    You're wrong. You focus too narrowly on what you think the problem
    is, without checking if that's really the case. Unsurprisingly, your
    assumptions are completely pointless in the world of synthetic disk
    geometries as reported by modern storage devices.

    If you had described the *actual* goal (``large files stored once, read
    many times, need lots of throughput'' but perhaps asked a bit wordier),
    then that could have foregone detailed discussion of why your insistence
    on this particular set of details is pointless. But anyway.


    > but a fragmented file on hard disk will be always slower to read than
    > a "contiguous" file, even if contiguous is not really contiguous.


    ``Fragmented'' on unix file systems tends to mean something completely
    different than it does in the cosy windows world. In the latter,
    it arose from lots of small bits of disk in random order chained
    together to represent a single file. This was caused by both simplistic
    allocation and simplistic storage management in the ``FAT'' filesystem.

    Most unix filesystems work completely different. At least some also tend
    to put a maximum of disk use per file per single ``area'' of the disk to
    avoid creating worst case access patterns for other files.

    But the chunks the files do get are fairly big, so even for big files
    consecutive reads can be done in multi-megabyte chunks, and thus much
    more quickly than in the dreaded ``fragmented'' case on windows. The
    difference is of the order of (a minimum of) one seek per 512 bytes
    for fragmented FAT compared to one seek per (say) 64MB for a modern
    unix filesystem. You will have a hard time squeezing noticeably more
    performance out of the disk by inventing your own filesystem, nevermind
    using ``raw'' disks without some filesystem.
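
    To put very rough numbers on that (assuming, purely for illustration, a
    2 GB file and about 10 ms per seek): one seek per 512 bytes is roughly 4
    million seeks, i.e. on the order of ten hours spent seeking, while one
    seek per 64 MB extent is 32 seeks, i.e. a fraction of a second. The raw
    transfer time is the same in both cases; only the seek overhead differs.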

    Coming back to the unix view of ``fragments'', those are usually partial
    storage blocks to optimize size allocation notably for small files.
    They are only used at the end of a file, so a file contains at most one
    partial block containing one or a few of those fragments.


    >> I would be amazed that you have bad blocks on modern disks, even with
    >> Windows

    > This is really interesting.
    > Have you some stats on that subject ?


    As already explained: Modern disks ``hide'' bad blocks. They have a bit
    of extra capacity that they don't show you, and they internally remap
    bad bits of disk to substitutes. The effect of that is that any bad
    sector statistics are zero until the disk runs out of substitutes,
    at which point the disk is practically dead already and it's time to
    replace it.


    > Indeed, our big files are data files, not executable. So, if bad
    > sectors are very infrequents,the solution may be : not manage them.


    But serve bad bits of disk anyway? What are you trying to do,
    micromanage bad sectors? Are you some kind of manager? Hello?

    Let me repeat what has been repeatedly pointed out to you already: You
    Don't Get To See Dead Bits Of Disk Before The Disk Needs Replacing.

    That's right, the disk won't let you even near its bad bits until
    shortly before its demise. So there's no point trying to manage that.

    The only thing you can do is ask it every so often how well it is and
    if it isn't all cosy you start planning its replacement. Some storage
    vendors automate that too: If you have a fancy setup the first thing
    you'll notice is a fresh disk in the mail with instructions to replace a
    specific one from the storage array, *before* that particular disk shows
    up as needing replacement in the daily status report.


    --
    j p d (at) d s b (dot) t u d e l f t (dot) n l .
    This message was originally posted on Usenet in plain text.
    Any other representation, additions, or changes do not have my
    consent and may be a violation of international copyright law.

  8. Re: How to store big files contiguously on hd

    > Have you actually benchmarked the file system and the raw partition
    > approaches? Measure, don't speculate.

    I have benchmarked ... but not enough!
    Currently, I don't have the results at hand, but they were roughly the
    same.
    Sometimes the filesystem was better.
    Sometimes the raw partition was better.
    It depends on defragmentation, whether the application uses threads, and
    other parameters ...
    You are right, I must delve into this subject.

    --
    Cyrille CHEVROT


  9. Re: How to store big files contiguously on hd

    > If you had described the *actual* goal (``large files stored once, read
    > many times, need lots of throughput'' but perhaps asked a bit wordier),

    And perhaps English is not my native language, so keep cool!
    OK?

    --
    Cyrille CHEVROT


  10. Re: How to store big files contiguously on hd

    On 2006-09-08, chevrot79@yahoo.fr wrote:
    [attribution missing]
    >> If you had described the *actual* goal (``large files stored once, read
    >> many times, need lots of throughput'' but perhaps asked a bit wordier),

    > And perhaps English is not my native language, so keep cool !
    > OK ?


    Perhaps that situation is the same for many more people on the internet.
    I wasn't criticising your language, either, but the number of false
    assumptions in your approach. But if that wasn't clear, you might
    consider improving your command of the English language, indeed.


    --
    j p d (at) d s b (dot) t u d e l f t (dot) n l .
    This message was originally posted on Usenet in plain text.
    Any other representation, additions, or changes do not have my
    consent and may be a violation of international copyright law.

  11. Re: How to store big files contiguously on hd

    chevrot79@yahoo.fr wrote:
    > I would like to know if there is a way to store big files
    > contiguously on the hard disk with Linux.
    > I don't care about the hd space loss.
    > I don't care about the time to write the big files on the hd.
    > But, I DO CARE about the time to READ the big files.


    > But I want to use a FILESYSTEM; I want to use fopen/fclose/fread/fwrite
    > to MANAGE these
    > big files. I DON'T want to use a raw hard disk with read/write-sector
    > functions only.


    You might want to look at the XFS filesystem. Originally released by
    SGI on IRIX, it was open-sourced and has been included in Linux. I
    don't know how mature or complete the capabilities are, but the
    filesystem has a notion of realtime files. This was probably added
    because SGI machines were often used for multimedia and needed the
    ability to stream data to/from hard disk with deterministic I/O speeds.

    In googling about this topic, I ran across some hints that XFS realtime
    data may not be fully supported on Linux, so you probably want to
    research that if you decide to go this direction.
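
    If you do try XFS realtime files, the flag is (going from memory here, so
    verify the names against the XFS headers and documentation on your
    system) set per file with the XFS_IOC_FSSETXATTR ioctl on a freshly
    created, still-empty file, and it only works on a filesystem made with a
    realtime section (mkfs.xfs -r ...). A rough sketch:

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <xfs/xfs.h>    /* struct fsxattr, XFS_XFLAG_REALTIME, ioctls */

    /* Mark a new, empty file as an XFS realtime file; returns the open fd. */
    int make_realtime(const char *path)
    {
        struct fsxattr fsx;
        int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0644);

        if (fd < 0) {
            perror(path);
            return -1;
        }
        if (ioctl(fd, XFS_IOC_FSGETXATTR, &fsx) < 0) {
            perror("XFS_IOC_FSGETXATTR");
            close(fd);
            return -1;
        }
        fsx.fsx_xflags |= XFS_XFLAG_REALTIME;  /* set before any data is written */
        if (ioctl(fd, XFS_IOC_FSSETXATTR, &fsx) < 0) {
            perror("XFS_IOC_FSSETXATTR");
            close(fd);
            return -1;
        }
        return fd;    /* caller writes the big file through this descriptor */
    }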

    By the way, striped disks are a common solution to this problem in
    the audio recording world. Even with a lot of seeking, striping can
    still be a serious win, because even if the filesystem is completely
    fragmented, disk #1 can seek to chunk N while disk #2 is seeking to
    chunk N+1. That is, the seeking can be parallelized. So even on
    heavily fragmented files, you can still improve your performance.

    - Logan

  12. Re: How to store big files contiguously on hd

    On 8 Sep 2006 05:14:44 -0700, chevrot79@yahoo.fr wrote:
    > Hi,
    >
    > I would like to know if there is a way to store big files
    > contiguously on the hard disk with Linux.


    Why?


  13. Re: How to store big files contiguously on hd

    On 8 Sep 2006 10:57:50 -0700, chevrot79@yahoo.fr wrote:
    >
    > Stefaan A Eeckels wrote:
    >
    >> Why do they need to be contiguous?

    >
    > For better transfer rate when reading.


    Better than what? Is your bottleneck in system performance really
    because of this? How do you know this?

    >> If you're concerned about data integrity, use MD5 or SHA1 hashes to
    >> verify the contents of the raw slice/partition.


    > If you prefer I want a filesystem which is able to manage the defect
    > list of bad sectors (I think bad sectors is more accurate than bad
    > block). Indeed, even if the device CREATES the defect list, someone
    > (the device driver) need to USE this defect list. And I don't want to
    > bury my hard disk when the first bad sector appear !
    > Scandisk is not a XIX century software AFAIK ?


    The point is, the filesystem never needs to manage the bad blocks list
    because it never needs to know about them in the first place.

    >> On Unix systems, raw devices are streams of bytes. The block size
    >> of the underlying hardware doesn't matter. Obviously, when it comes to
    >> performance, using a a fairly large buffer (8192 bytes, for example)
    >> would give the best results.


    > This is good news


    Yes. I think it might be a good idea if you can tell us what problem
    you're actually trying to solve, rather than ask us how to implement
    something that doesn't actually make any sense to do as an effort to
    solve whatever that is.

    >> In other words, what you try to do is not possible. Once you're on
    >> RAID and SAN type storage, you haven't got a clue where the data is
    >> stored anyway - the devices the computer sees are synthetic.

    >
    > I just want the best transfer rate. What is THE solution.


    (sigh). It depends.

    >> What exactly are you trying to achieve?

    > See at the beginning of this post.


    No, his question still stands, you've explained a what, but you haven't
    defined a "why you're doing this in the first place". Helping someone
    with a lot of work to arrive at the wrong solution isn't helping them.


  14. Re: How to store big files contiguously on hd

    On 8 Sep 2006 14:37:31 -0700, chevrot79@yahoo.fr wrote:
    >> If you had described the *actual* goal (``large files stored once, read
    >> many times, need lots of throughput'' but perhaps asked a bit wordier),

    > And perhaps English is not my native language, so keep cool !
    > OK ?


    The problem is not with your English, it's with your approach. And tell
    us who you're quoting.


  15. Re: How to store big files contiguously on hd


    Dave Hinz wrote:

    > >> On Unix systems, raw devices are streams of bytes. The block size
    > >> of the underlying hardware doesn't matter. Obviously, when it comes to
    > >> performance, using a a fairly large buffer (8192 bytes, for example)
    > >> would give the best results.

    >
    > > This is good news

    >
    > Yes. I think it might be a good idea if you can tell us what problem
    > you're actually trying to solve, rather than ask us how to implement
    > something that doesn't actually make any sense to do as an effort to
    > solve whatever that is.


    (snip)

    > >> What exactly are you trying to achieve?

    > > See at the beginning of this post.

    >
    > No, his question still stands, you've explained a what, but you haven't
    > defined a "why you're doing this in the first place". Helping someone
    > with a lot of work to arrive at the wrong solution isn't helping them.


    Because I must obey. And I need arguments to change my boss's point of
    view. That's it!

    From the beginning I have been against a raw-data solution, and I want a
    filesystem.

    Now, the question is: which filesystem, with Unix (Linux, in fact) and
    with Windows? We currently develop a Windows solution but we want to
    migrate from Windows to Linux in the next few months.

    --
    Cyrille CHEVROT


  16. Re: How to store big files contiguously on hd

    jpd wrote:
    > ``Fragmented'' on unix file systems tends to mean something completely
    > different than it does in the cosy windows world. In the latter,
    > it arose from lots of small bits of disk in random order chained
    > together to represent a single file. This was caused by both simplistic
    > allocation and simplistic storage management in the ``FAT'' filesystem.
    >
    > Most unix filesystems work completely different. At least some also tend
    > to put a maximum of disk use per file per single ``area'' of the disk to
    > avoid creating worst case access patterns for other files.
    >
    > But the chunks the files do get are fairly big, so even for big files
    > consecutive reads can be done in multi-megabyte chunks, and thus much
    > more quickly than in the dreaded ``fragmented'' case on windows. The
    > difference is of the order of (a minimum of) one seek per 512 bytes
    > for fragmented FAT compared to one seek per (say) 64MB for a modern
    > unix filesystem.


    Very interesting.

    Just some questions:

    - What about NTFS? (We want to migrate from Windows to Linux in the
    next few months, so currently I want information on Linux _and_ Windows
    filesystems.)

    - What is the best filesystem on Linux for our goal (``large files
    stored once, read many times, need lots of throughput'')? ext2fs,
    ext3fs, reiserfs, xfs, jfs?

    - Can we change the size of the chunks the files get (64 MB for a modern
    Unix filesystem), for benchmarking purposes?

    > You will have a hard time squeezing noticeably more
    > performance out of the disk by inventing your own filesystem, nevermind
    > using ``raw'' disks without some filesystem.


    I know, but I must obey my boss. And the aim of this thread is to
    gather arguments to convince him to use a filesystem. That's the point.

    --
    Cyrille CHEVROT


  17. Re: How to store big files contiguously on hd

    On 2006-09-09, chevrot79@yahoo.fr wrote:
    > Just some questions :
    >
    > - What about NTFS ? (We want to migrate from windows to linux in the
    > next months, so currently I want information on linux _and_ windows
    > filesystems)


    You'll have to find other sources for that, as I haven't really paid
    attention to ntfs at all. Some vague memories have a notion that it
    a) isn't at all like FAT, and b) it might even have actual contiguous
    file support, if only you can find a way to get at it.

    That problem is quite endemic with ntfs, though. It also supports
    ``file versioning'' through multiple forks (like the macintosh data and
    resource forks, but generalised), only nobody uses it because there is
    no good interface to use it. Also, those parts will not have seen
    widespread use and might be buggy.


    > - What is the best filesystem with linux for our goal (``large files
    > stored once, read many times, need lots of throughput'') ? ext2fs,
    > ext3fs, reiserfs, xfs, jfs ?


    There are numerous benchmarks Out There. Point of research.


    > - Can we change the chunks (64MB for a modern unix filesystem) the
    > files do get (for benchmark purpose) ?


    Can't give a definite answer for all FSes, but for some, yes, you can
    change the maximum per-area allocation cap. There's quite a lot you can
    tune and tinker with, although some parameters can only be changed at FS
    creation time.

    I think you'll find that --beyond a certain size-- the distance from the
    center of the disk you put something has a much greater influence on
    throughput, than whether the blocks read are N MB or N*4 MB.


    > I kown but I must obey to my boss. And the aim of this thread is to
    > have arguments to convince him to use a filesystem. That's the point.


    You've been told by various people, in their own free time, for free,
    with among them quite a lot of experience, that the entire approach is
    technically unsound. If being told is not enough, then it is time to do
    research yourself. Or hire some of these people to tell your boss in
    person that he's wrong.


    --
    j p d (at) d s b (dot) t u d e l f t (dot) n l .
    This message was originally posted on Usenet in plain text.
    Any other representation, additions, or changes do not have my
    consent and may be a violation of international copyright law.

  18. Re: How to store big files contiguously on hd

    On 9 Sep 2006 13:27:23 -0700, chevrot79@yahoo.fr wrote:
    >
    > Dave Hinz wrote:


    >> Yes. I think it might be a good idea if you can tell us what problem
    >> you're actually trying to solve, rather than ask us how to implement
    >> something that doesn't actually make any sense to do as an effort to
    >> solve whatever that is.

    (snip)
    >> No, his question still stands, you've explained a what, but you haven't
    >> defined a "why you're doing this in the first place". Helping someone
    >> with a lot of work to arrive at the wrong solution isn't helping them.


    > Because I must obey. And I need arguments to change my boss point of
    > view. That's it !


    Why don't you start there. What is your boss telling you needs to be
    solved. Did he actually say "find a way to contiguously write large
    files using Linux and make sure you map the bad blocks"? If so, you may
    need a boss upgrade. I just applied one and it's quite refreshing to do
    so.

    > Now, the question is : which filesystem : with unix (linux indeed) and
    > with windows. We currently develop a windows solution but we want to
    > migrate from windows to linux in the next months.


    What's the actual requirement please?

  19. Re: How to store big files contiguously on hd

    Dave Hinz wrote:

    > On 9 Sep 2006 13:27:23 -0700, chevrot79@yahoo.fr wrote:
    > >
    > > Dave Hinz wrote:

    >
    > >> Yes. I think it might be a good idea if you can tell us what problem
    > >> you're actually trying to solve, rather than ask us how to implement
    > >> something that doesn't actually make any sense to do as an effort to
    > >> solve whatever that is.

    > (snip)
    > >> No, his question still stands, you've explained a what, but you haven't
    > >> defined a "why you're doing this in the first place". Helping someone
    > >> with a lot of work to arrive at the wrong solution isn't helping them.

    >
    > > Because I must obey. And I need arguments to change my boss point of
    > > view. That's it !

    >
    > Why don't you start there. What is your boss telling you needs to be
    > solved. Did he actually say "find a way to contiguously write large
    > files using Linux and make sure you map the bad blocks"? If so, you may
    > need a boss upgrade. I just applied one and it's quite refreshing to do
    > so.


    First of all, my boss is great. The only problem is that he is an
    electronics engineer. And in that field he does a great job. However,
    when it comes to the computer science field, he isn't as good.
    He actually said: "raw data on hard disk store contiguously without
    filesystem".
    I think trying to convince him to change his mind is a better thing than
    doing a boss upgrade.

    --
    Cyrille CHEVROT


  20. Re: How to store big files contiguously on hd

    jpd wrote:

    > You've been told by various people, in their own free time, for free,
    > with among them quite a lot of experience, that the entire approach is
    > technically unsound. If being told is not enough, then it is time to do
    > research yourself. Or hire some of these people to tell your boss in
    > person that he's wrong.



    Now I have what I was looking for: arguments. Now I will try to
    convince him.

    End of the story.

    --
    Cyrille CHEVROT

