How to store big files contiguously on hd - Unix
-
How to store big files contiguously on hd
Hi,
I would like to know if there is a way to store big files
contiguously on a hard disk under Linux.
I don't care about wasted disk space.
I don't care about the time it takes to write the big files to the disk.
But I DO CARE about the time to READ the big files.
For example, if I have 12 big files of at most 2 GB each to place on my
hard disk, one solution would be to allocate 24 GB on the disk and
assign 2 contiguous GB to each file.
But I want to use a FILESYSTEM. I want to use fopen/fclose/fread/fwrite
to MANAGE these big files. I DON'T want to use a raw hard disk with
read/write sector functions only.
I hope I have made myself clear,
and I hope someone has an answer to my question.
Thanks in advance.
Cyrille CHEVROT
-
Re: How to store big files contiguously on hd
chevrot79@yahoo.fr wrote:
> Hi,
>
> I would like to know if there are a mean to store big files
> contiguously on hd with linux.
Yes, there is. But, not within a standard filesystem.
[snip]
> But I want to use a FILESYSTEM, I want to use fopen/fclose/fread/fwrite
> to MANAGE this big files.
First off, you have an invalid assumption in the above statement. You
assume that fopen() et al can only be used against files in a formatted
filesystem. This is incorrect.
>I DON'T want to use a raw hard disk with read/write sectors functions only.
Again, an invalid assumption (although less obvious). You assume that
raw disk access must be performed using some sort of "sector
read/write" function only.
None of the filesystem formats that I am familiar with permits you to
control whether or not specific files are allocated to contiguous
blocks on the media. In fact, AFAIK, they don't have any control for
that at all.
So, to get contiguous blocks, you are going to have to read/write from
something that doesn't have a filesystem on it, and manage the space
yourself.
All the standard file operations (open()/read()/write()/lseek()/close()
and fopen()/fread()/fwrite()/fseek()/fclose()) work when you use them
against a file in the system, whether the file resides in a formatted
filesystem or not.
So, allocate yourself a partition on your hard drive, and fopen() etc
the device file for that partition. You have direct access to
everything in that partition, and can control exactly where data goes.
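fread()/fwrite()/fseek()/read()/write()/lseek() work sequentially from
the beginning of the partition, so you have the guarantee of contiguous
placement of data.
example to write four contiguous 1k blocks of binary 0 to a "file" :
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *raw;
    int count;
    char buff[1024] = { 0 };

    /* open the device file itself; /dev/fd0 (the floppy) is only an example */
    if ((raw = fopen("/dev/fd0","r+b")) == NULL) exit(1);
    for (count = 0; count < 4; ++count)
        if (fwrite(buff, sizeof buff, 1, raw) != 1) exit(1);  /* stop on a failed write */
    if (fclose(raw) != 0) exit(1);  /* fclose() flushes the buffer, so check it too */
    return 0;
}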
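A minimal sketch of the "manage the space yourself" idea, not something taken from the post: each big file gets a fixed 2 GB slot, and slot number i starts at byte i * 2 GB from the beginning of the partition. The names SLOT_SIZE, slot_offset() and open_slot() are made up for illustration, and the sketch assumes a POSIX system where fseeko() and a 64-bit off_t are available (the long offset taken by fseek() may be too small for 24 GB).

```c
#define _FILE_OFFSET_BITS 64   /* ask for a 64-bit off_t on 32-bit Linux */
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical layout: every "file" owns one fixed 2 GB slot. */
#define SLOT_SIZE ((off_t)2 * 1024 * 1024 * 1024)

/* Byte offset at which slot number `slot` starts on the device. */
off_t slot_offset(int slot)
{
    return (off_t)slot * SLOT_SIZE;
}

/* Open the device (or any file) and position the stream at the start
   of the given slot. Returns NULL if the open or the seek fails.
   Nothing here stops the caller from reading past the end of the slot. */
FILE *open_slot(const char *device, int slot, const char *mode)
{
    FILE *fp = fopen(device, mode);

    if (fp == NULL)
        return NULL;
    if (fseeko(fp, slot_offset(slot), SEEK_SET) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

With this layout, open_slot("/dev/sdb1", 3, "rb") would return a stream positioned at the fourth file; the device name is only an example. The caller still has to remember how long each file really is, which is exactly the bookkeeping a filesystem would otherwise do.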
HTH
--
Lew Pitcher
-
Re: How to store big files contiguously on hd
> > I would like to know if there are a mean to store big files
> > contiguously on hd with linux.
>
> Yes, there is. But, not within a standard filesystem.
And with a non-standard filesystem?
> example to write four contiguous 1k blocks of binary 0 to a "file" :
> {
> FILE *raw;
> int count;
> char buff[1024] = { 0 };
>
> if ((raw = fopen("/dev/fd0","r+b")) == NULL) exit(1);
> for (count = 0; count < 4; ++count)
> fwrite(buff, sizeof buff, 1, raw);
> fclose(raw);
> }
>
This is an interesting solution, however I have two questions:
- What about bad blocks? In fact, I mainly want a filesystem to MANAGE
BAD BLOCKS. Does this solution manage them?
- What about 512-byte alignment constraints? Do we have to manage
them "manually" or not?
We currently use a very similar solution under Windows using fileread()
but we have problems with these two points.
Thanks in advance,
--
Cyrille CHEVROT
-
Re: How to store big files contiguously on hd
On 8 Sep 2006 09:14:55 -0700
"chevrot79@yahoo.fr" wrote:
> > > I would like to know if there are a mean to store big files
> > > contiguously on hd with linux.
Why do they need to be contiguous?
> > Yes, there is. But, not within a standard filesystem.
>
> And with non standard filesystem ?
>
>
> > example to write four contiguous 1k blocks of binary 0 to a "file" :
> > {
> > FILE *raw;
> > int count;
> > char buff[1024] = { 0 };
> >
> > if ((raw = fopen("/dev/fd0","r+b")) == NULL) exit(1);
> > for (count = 0; count < 4; ++count)
> > fwrite(buff, sizeof buff, 1, raw);
> > fclose(raw);
> > }
> >
>
> This is an interesting solution, however I have two questions :
>
> - What about bad blocks ? In fact, I mainly want a filesystem to
> MANAGE BAD BLOCKS. Does this solution manage them ?
Filesystems managing bad blocks - what century are you living in? Bad
blocks are handled at the device level, not the file system (or should
be). SCSI disks give access to the primary defect list (established by
the manufacturer) and the grown defect list. The latter should be
empty, or else the disk is on its way to disk heaven and should be
replaced. IDE/ATA disks do their own defect management, and you'd
typically use SMART to know when a disk is dying.
If you're concerned about data integrity, use MD5 or SHA1 hashes to
verify the contents of the raw slice/partition.
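A sketch of verifying the contents of a raw slice with a checksum. To stay self-contained it uses the FNV-1a 64-bit hash as a stand-in; that choice is an assumption of this example, not the MD5 or SHA1 suggested above, and FNV-1a only catches accidental corruption, not deliberate tampering. A real setup would more likely call a library implementation of MD5 or SHA1. The function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

#define FNV64_OFFSET_BASIS 14695981039346656037ULL
#define FNV64_PRIME        1099511628211ULL

/* Fold n bytes into a running FNV-1a hash and return the new value. */
uint64_t fnv1a64_update(uint64_t h, const unsigned char *p, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i) {
        h ^= p[i];
        h *= FNV64_PRIME;
    }
    return h;
}

/* Hash `length` bytes starting at the current position of fp.
   Returns 0 and stores the hash on success, -1 on a short read or error. */
int fnv1a64_stream(FILE *fp, uint64_t length, uint64_t *out)
{
    unsigned char buf[8192];
    uint64_t h = FNV64_OFFSET_BASIS;

    while (length > 0) {
        size_t want = length < sizeof buf ? (size_t)length : sizeof buf;
        size_t got = fread(buf, 1, want, fp);

        if (got == 0)
            return -1;
        h = fnv1a64_update(h, buf, got);
        length -= got;
    }
    *out = h;
    return 0;
}
```

The idea is to compute the hash once when the data is written, store it somewhere outside the raw partition, and compare it with a freshly computed value whenever the data needs checking.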
> - What about 512 bytes alignment constraints ? Are we oblige to manage
> them "manually" or not ?
On Unix systems, raw devices are streams of bytes. The block size
of the underlying hardware doesn't matter. Obviously, when it comes to
performance, using a fairly large buffer (8192 bytes, for example)
would give the best results. Notice that this is caused by buffer
optimisation, not the geometry of the disk. Both SCSI and ATA disks are
addressed by block number, and the drive firmware knows where to get
each block. For example, blocks that have been marked bad during
manufacturing result in head movement even though the block numbers are
sequential.
In other words, what you try to do is not possible. Once you're on
RAID and SAN type storage, you haven't got a clue where the data is
stored anyway - the devices the computer sees are synthetic.
> We currently use a very similar solution under Windows using fileread
> () but we have problems with this two points.
I would be amazed that you have bad blocks on modern disks, even with
Windows 
Don't forget that with the technique above, you can only have ONE file
on a raw partition - there's no support for multiple files,
directories, or anything else.
What exactly are you trying to achieve?
--
Stefaan A Eeckels
--
How's it supposed to get the respect of management if you've got just
one guy working on the project? It's much more impressive to have a
battery of programmers slaving away. -- Jeffrey Hobbs (comp.lang.tcl)
-
Re: How to store big files contiguously on hd
Stefaan A Eeckels a écrit :
> On 8 Sep 2006 09:14:55 -0700
> "chevrot79@yahoo.fr" wrote:
>
> > > > I would like to know if there are a mean to store big files
> > > > contiguously on hd with linux.
>
> Why do they need to be contiguous?
For a better transfer rate when reading.
The big files will be placed on the hard disk once and then they will be
transferred off the hard disk many times. The transfer rate must be the
best possible.
> > - What about bad blocks ? In fact, I mainly want a filesystem to
> > MANAGE BAD BLOCKS. Does this solution manage them ?
>
> Filesystems managing bad blocks - what century are you living in? Bad
> blocks are handled at the device level, not the file system (or should
> be). SCSI disks give access to the primary defect list (established by
> the manufacturer) and the grown defect list. The latter should be
> empty, or else the disk is on its way to disk heaven and should be
> replaced. IDE/ATA disks do their own defect management, and you'd
> typically use SMART to know when a disk is dying.
>
> If you're concerned about data integrity, use MD5 or SHA1 hashes to
> verify the contents of the raw slice/partition.
If you prefer: I want a filesystem which is able to manage the defect
list of bad sectors (I think "bad sectors" is more accurate than "bad
blocks"). Indeed, even if the device CREATES the defect list, someone
(the device driver) needs to USE this defect list. And I don't want to
bury my hard disk when the first bad sector appears!
Scandisk is not XIX century software, AFAIK?
> > - What about 512 bytes alignment constraints ? Are we oblige to manage
> > them "manually" or not ?
>
> On Unix systems, raw devices are streams of bytes. The block size
> of the underlying hardware doesn't matter. Obviously, when it comes to
> performance, using a fairly large buffer (8192 bytes, for example)
> would give the best results.
This is good news 
> Notice that this is caused by buffer
> optimisation, not the geometry of the disk. Both SCSI and ATA disks are
> addressed by block number, and the drive firmware knows where to get
> each block. For example, blocks that have been marked bad during
> manufacturing result in head movement even though the block numbers are
> sequential.
>
> In other words, what you try to do is not possible. Once you're on
> RAID and SAN type storage, you haven't got a clue where the data is
> stored anyway - the devices the computer sees are synthetic.
I just want the best transfer rate. What is THE solution?
Tell me if I'm wrong, but a fragmented file on a hard disk will always
be slower to read than a "contiguous" file, even if contiguous is not
really contiguous.
> > We currently use a very similar solution under Windows using fileread
> > () but we have problems with this two points.
>
> I would be amazed that you have bad blocks on modern disks, even with
> Windows 
This is really interesting.
Do you have some stats on that subject?
Indeed, our big files are data files, not executables. So, if bad
sectors are very infrequent, the solution may be not to manage them at all.
But for that to work, a read must not stall on a bad sector. Is that
the case?
> Don't forget that with the technique above, you can only have ONE file
> on a raw partition - there's no support for multiple files,
> directories, or anything else.
I know, that is the problem.
That's why I said above that I want a filesystem, I want fopen(). Indeed,
if I have two files, I want two fopen() calls, and I don't want one fopen()
and one fseek(). The latter solution is neither elegant nor easy to manage.
That's why I still don't have THE solution.
>
> What exactly are you trying to achieve?
See the beginning of this post.
--
Cyrille CHEVROT
-
Re: How to store big files contiguously on hd
On 8 Sep 2006 10:57:50 -0700
"chevrot79@yahoo.fr" wrote:
>
> Stefaan A Eeckels a écrit :
>
> > On 8 Sep 2006 09:14:55 -0700
> > "chevrot79@yahoo.fr" wrote:
> >
> > > > > I would like to know if there are a mean to store big files
> > > > > contiguously on hd with linux.
> >
> > Why do they need to be contiguous?
>
> For better transfer rate when reading.
> The big files will be place once on the hard disk and then they will
> be transfer out of the hard disk many times. The transfer rate must
> be the best.
Reading? Get a RAID 5 setup. It'll suck at writing, but you get
impressive read speeds and you can lose a disk without data loss.
> > > - What about bad blocks ? In fact, I mainly want a filesystem to
> > > MANAGE BAD BLOCKS. Does this solution manage them ?
> >
> > Filesystems managing bad blocks - what century are you living in?
> > Bad blocks are handled at the device level, not the file system (or
> > should be). SCSI disks give access to the primary defect list
> > (established by the manufacturer) and the grown defect list. The
> > latter should be empty, or else the disk is on its way to disk
> > heaven and should be replaced. IDE/ATA disks do their own defect
> > management, and you'd typically use SMART to know when a disk is
> > dying.
> >
> > If you're concerned about data integrity, use MD5 or SHA1 hashes to
> > verify the contents of the raw slice/partition.
>
> If you prefer I want a filesystem which is able to manage the defect
> list of bad sectors (I think bad sectors is more accurate than bad
> block). Indeed, even if the device CREATES the defect list, someone
> (the device driver) need to USE this defect list. And I don't want to
> bury my hard disk when the first bad sector appear !
It's the disk firmware itself that handles the bad sectors. With ATA
disks you don't even get to see the list of bad blocks, so file systems
cannot handle blocks they have no way of knowing. And in any case, the
firmware in the drive will _already_ have substituted a spare block, so
you'll be replacing a perfectly OK block (the spare one).
SCSI and ATA disks present themselves to the OS as 1..n blocks of 512
bytes. No more, no less. No heads, no cylinders, no sectors. They're
still mentioned in the PC BIOS for compatibility's sake, but that's all.
> Scandisk is not a XIX century software AFAIK ?
Very much XX century, I'm afraid.
> > In other words, what you try to do is not possible. Once you're on
> > RAID and SAN type storage, you haven't got a clue where the data is
> > stored anyway - the devices the computer sees are synthetic.
>
> I just want the best transfer rate. What is THE solution.
> Tell me if I'm wrong, but a fragmented file on hard disk will be
> always slower to read than a "contiguous" file, even if contiguous is
> not really contiguous.
The OS typically will do a nice job keeping files well organised for
optimal access. In most cases, fragmentation actually _helps_
performance, especially when several files are accessed simultaneously.
About the only time a really, really contiguous file will make a
difference is when it is the only file on disk - then and only then the
heads can move step by step over the disk. And, oh miracle, if you've
only a single file on a file system, it'll be as close to contiguous as
it can get.
Take it from me, the OS will do a better job than you can ever do.
> > > We currently use a very similar solution under Windows using
> > > fileread () but we have problems with this two points.
> >
> > I would be amazed that you have bad blocks on modern disks, even
> > with Windows 
> This is really interesting.
> Have you some stats on that subject ?
> Indeed, our big files are data files, not executable. So, if bad
> sectors are very infrequents,the solution may be : not manage them.
> But to do this I mustn't stall on a bad sector reading. Is it the case
> ?
Look, whatever way you use to manage bad sectors, there will always be
an access failure that causes a block to be marked bad. Thus, there
will be an effect on the reading (or writing) operation that is taking
place. This is technology, not magic. As I told you, ATA disk drives do
not give you access to their defect lists, so stop thinking about them.
The rule of thumb is - once a disk starts growing new defects, you
replace it pronto, or you will have a disaster on your hands.
> > Don't forget that with the technique above, you can only have ONE
> > file on a raw partition - there's no support for multiple files,
> > directories, or anything else.
> I know that is the problem.
> That's why, I have said above I want filesystem, I want fopen().
fopen() is merely opening a file with user-space buffering. It's not
dependent on having a "file system".
> Indeed if I have two files, I want two fopen(), and I don't want one
> fopen() and one fseek(). The latter solution is not cute and hard to
> manage. That's why I still don't have THE solution.
That's correct - but a file system and two fopen()s, followed by
reading the two files simultaneously will result in head movement
_even_ if you don't issue an fseek(), and even when the two files are
contiguous. Actually, you'll have, on average, more and longer head
seeks when they _are_ contiguous because the OS cannot position the
blocks for optimal access. Think about it, and you'll see the light 
Have you actually benchmarked the file system and the raw partition
approaches? Measure, don't speculate.
It's dead easy as long as you have a couple of spare raw partitions,
and the appropriate access rights (remember, on Unix everything is a
file). The program doesn't need to change, just give it the appropriate
names as parameters.
Take care,
--
Stefaan A Eeckels
--
A human being should be able to change a diaper, plan an invasion,
butcher a hog, conn a ship, design a building, write a sonnet, balance
accounts, build a wall, set a bone, comfort the dying, take orders,
give orders, cooperate, act alone, solve equations, analyze a new
problem, pitch manure, program a computer, cook a tasty meal, fight
efficiently, die gallantly. Specialization is for
insects. -- Robert A. Heinlein
-
Re: How to store big files contiguously on hd
Begin <1157738270.776660.61720@p79g2000cwp.googlegroups.com>
On 2006-09-08, chevrot79@yahoo.fr wrote:
> I just want the best transfer rate. What is THE solution.
Multiple stripes on multiple fast-spinning disks, made to work together.
See how that fits with earlier comments that insisting on a specific
property of the files in relation to the disks has been meaningless
for quite some time now?
> Tell me if I'm wrong,
You're wrong. You focus too narrowly on what you think the problem
is, without checking if that's really the case. Unsurprisingly, your
assumptions are completely pointless in the world of synthetic disk
geometries as reported by modern storage devices.
If you had described the *actual* goal (``large files stored once, read
many times, need lots of throughput'' but perhaps asked a bit wordier),
then that could have foregone detailed discussion of why your insistence
on this particular set of details is pointless. But anyway.
> but a fragmented file on hard disk will be always slower to read than
> a "contiguous" file, even if contiguous is not really contiguous.
``Fragmented'' on unix file systems tends to mean something completely
different than it does in the cosy windows world. In the latter,
it arose from lots of small bits of disk in random order chained
together to represent a single file. This was caused by both simplistic
allocation and simplistic storage management in the ``FAT'' filesystem.
Most unix filesystems work completely differently. At least some also tend
to put a maximum of disk use per file per single ``area'' of the disk to
avoid creating worst case access patterns for other files.
But the chunks the files do get are fairly big, so even for big files
consecutive reads can be done in multi-megabyte chunks, and thus much
more quickly than in the dreaded ``fragmented'' case on windows. The
difference is of the order of (a minimum of) one seek per 512 bytes
for fragmented FAT compared to one seek per (say) 64MB for a modern
unix filesystem. You will have a hard time squeezing noticeably more
performance out of the disk by inventing your own filesystem, never mind
using ``raw'' disks without some filesystem.
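The arithmetic behind that comparison can be sketched as follows. The function names are hypothetical, and the average seek time is an assumed figure that the caller supplies, not a measurement.

```c
#include <stdint.h>

/* Number of contiguous pieces, and so the minimum number of seeks,
   needed to read file_size bytes when each piece holds extent_size
   bytes. Rounds up; extent_size must not be zero. */
uint64_t min_seeks(uint64_t file_size, uint64_t extent_size)
{
    return (file_size + extent_size - 1) / extent_size;
}

/* Total time in seconds spent seeking, for an ASSUMED average
   seek time given in milliseconds. */
double seek_seconds(uint64_t seeks, double avg_seek_ms)
{
    return (double)seeks * avg_seek_ms / 1000.0;
}
```

For a 2 GB file, min_seeks() gives 4,194,304 seeks at one per 512 bytes and 32 seeks at one per 64 MB. At an assumed 10 ms per seek that is roughly 41,943 seconds of seeking in the worst fragmented case, against about a third of a second.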
Coming back to the unix view of ``fragments'', those are usually partial
storage blocks to optimize size allocation notably for small files.
They are only used at the end of a file, so a file contains at most one
partial block containing one or a few of those fragments.
>> I would be amazed that you have bad blocks on modern disks, even with
>> Windows 
> This is really interesting.
> Have you some stats on that subject ?
As already explained: modern disks ``hide'' bad blocks. They have a bit
of extra capacity that they don't show you, and they internally remap
bad bits of disk to substitutes taken from it. The effect is that any bad
sector statistics stay at zero until the disk runs out of spare capacity,
at which point the disk is practically dead already and it's time to
replace it.
> Indeed, our big files are data files, not executable. So, if bad
> sectors are very infrequents,the solution may be : not manage them.
But serve bad bits of disk anyway? What are you trying to do,
micromanage bad sectors? Are you some kind of manager? Hello?
Let me repeat what has been repeatedly pointed out to you already: You
Don't Get To See Dead Bits Of Disk Before The Disk Needs Replacing.
That's right, the disk won't let you even near its bad bits until
shortly before its demise. So there's no point trying to manage that.
The only thing you can do is ask it every so often how well it is and
if it isn't all cosy you start planning its replacement. Some storage
vendors automate that too: If you have a fancy setup the first thing
you'll notice is a fresh disk in the mail with instructions to replace a
specific one from the storage array, *before* that particular disk shows
up as needing replacement in the daily status report.
--
j p d (at) d s b (dot) t u d e l f t (dot) n l .
This message was originally posted on Usenet in plain text.
Any other representation, additions, or changes do not have my
consent and may be a violation of international copyright law.
-
Re: How to store big files contiguously on hd
> Have you actually benchmarked the file system and the raw partition
> approaches? Measure, don't speculate.
I have benchmarked ... but not enough!
I don't have the results at hand right now, but they were roughly the
same.
Sometimes the filesystem was better.
Sometimes the raw partition was better.
It depends on defragmentation, whether the application uses threads, and
other parameters ...
You are right, I must dig deeper into this subject.
--
Cyrille CHEVROT
-
Re: How to store big files contiguously on hd
> If you had described the *actual* goal (``large files stored once, read
> many times, need lots of throughput'' but perhaps asked a bit wordier),
And perhaps English is not my native language, so keep cool !
OK ?
--
Cyrille CHEVROT
-
Re: How to store big files contiguously on hd
Begin <1157751451.242773.77830@i3g2000cwc.googlegroups.com>
On 2006-09-08, chevrot79@yahoo.fr wrote:
[attribution missing]
>> If you had described the *actual* goal (``large files stored once, read
>> many times, need lots of throughput'' but perhaps asked a bit wordier),
> And perhaps English is not my native language, so keep cool !
> OK ?
Perhaps that situation is the same for many more people on the internet.
I wasn't criticising your language, either, but the number of false
assumptions in your approach. But if that wasn't clear, you might
consider improving your command of the English language, indeed.
--
j p d (at) d s b (dot) t u d e l f t (dot) n l .
-
Re: How to store big files contiguously on hd
chevrot79@yahoo.fr wrote:
> I would like to know if there are a mean to store big files
> contiguously on hd with linux.
> I don't care about the hd space loss.
> I don't care about the time to write the big files on the hd.
> But, I DO CARE about the time to READ the big files.
> But I want to use a FILESYSTEM, I want to use fopen/fclose/fread/fwrite
> to MANAGE this
> big files. I DON'T want to use a raw hard disk with read/write sectors
> functions only.
You might want to look at the XFS filesystem. Originally released by
SGI on IRIX, it was open-sourced and has been included in Linux. I
don't know how mature or complete the capabilities are, but the
filesystem has a notion of realtime files. This was probably added
because SGI machines were often used for multimedia and needed the
ability to stream data to/from hard disk with deterministic I/O speeds.
In googling about this topic, I ran across some hints that XFS realtime
data may not be fully supported on Linux, so you probably want to
research that if you decide to go this direction.
By the way, striped disks are a common solution to this problem in
the audio recording world. Even with a lot of seeking, striping can
still be a serious win, because even if the filesystem is completely
fragmented, disk #1 can seek to chunk N while disk #2 is seeking to
chunk N+1. That is, the seeking can be parallelized. So even on
heavily fragmented files, you can still improve your performance.
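The parallel seeking described above follows from how a striped set lays out data. A sketch of the usual round-robin mapping, with a made-up struct and function name and with the chunk size and disk count left as parameters; real RAID implementations may differ in detail.

```c
#include <stdint.h>

/* Where a logical byte lands in a simple striped (RAID 0 style) set. */
struct stripe_loc {
    unsigned disk;         /* index of the disk holding the byte */
    uint64_t disk_offset;  /* byte offset within that disk */
};

/* Map a logical byte offset to a disk and an offset on that disk.
   chunk_size and ndisks must not be zero. */
struct stripe_loc stripe_map(uint64_t logical, uint64_t chunk_size,
                             unsigned ndisks)
{
    struct stripe_loc loc;
    uint64_t chunk = logical / chunk_size;   /* "chunk N" in the text */
    uint64_t within = logical % chunk_size;  /* position inside the chunk */

    loc.disk = (unsigned)(chunk % ndisks);
    loc.disk_offset = (chunk / ndisks) * chunk_size + within;
    return loc;
}
```

With two disks and 64 KB chunks, chunk 0 lands on disk 0, chunk 1 on disk 1 and chunk 2 back on disk 0, so consecutive chunks can be fetched by different disks at the same time.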
- Logan
-
Re: How to store big files contiguously on hd
On 8 Sep 2006 05:14:44 -0700, chevrot79@yahoo.fr wrote:
> Hi,
>
> I would like to know if there are a mean to store big files
> contiguously on hd with linux.
Why?
-
Re: How to store big files contiguously on hd
On 8 Sep 2006 10:57:50 -0700, chevrot79@yahoo.fr wrote:
>
> Stefaan A Eeckels a écrit :
>
>> Why do they need to be contiguous?
>
> For better transfer rate when reading.
Better than what? Is your bottleneck in system performance really
because of this? How do you know this?
>> If you're concerned about data integrity, use MD5 or SHA1 hashes to
>> verify the contents of the raw slice/partition.
> If you prefer I want a filesystem which is able to manage the defect
> list of bad sectors (I think bad sectors is more accurate than bad
> block). Indeed, even if the device CREATES the defect list, someone
> (the device driver) need to USE this defect list. And I don't want to
> bury my hard disk when the first bad sector appear !
> Scandisk is not a XIX century software AFAIK ?
The point is, the filesystem never needs to manage the bad blocks list
because it never needs to know about them in the first place.
>> On Unix systems, raw devices are streams of bytes. The block size
>> of the underlying hardware doesn't matter. Obviously, when it comes to
>> performance, using a fairly large buffer (8192 bytes, for example)
>> would give the best results.
> This is good news 
Yes. I think it might be a good idea if you can tell us what problem
you're actually trying to solve, rather than ask us how to implement
something that doesn't actually make any sense to do as an effort to
solve whatever that is.
>> In other words, what you try to do is not possible. Once you're on
>> RAID and SAN type storage, you haven't got a clue where the data is
>> stored anyway - the devices the computer sees are synthetic.
>
> I just want the best transfer rate. What is THE solution.
(sigh). It depends.
>> What exactly are you trying to achieve?
> See at the beginning of this post.
No, his question still stands, you've explained a what, but you haven't
defined a "why you're doing this in the first place". Helping someone
with a lot of work to arrive at the wrong solution isn't helping them.
-
Re: How to store big files contiguously on hd
On 8 Sep 2006 14:37:31 -0700, chevrot79@yahoo.fr wrote:
>> If you had described the *actual* goal (``large files stored once, read
>> many times, need lots of throughput'' but perhaps asked a bit wordier),
> And perhaps English is not my native language, so keep cool !
> OK ?
The problem is not with your English, it's with your approach. And tell
us who you're quoting.
-
Re: How to store big files contiguously on hd
Dave Hinz a écrit :
> >> On Unix systems, raw devices are streams of bytes. The block size
> >> of the underlying hardware doesn't matter. Obviously, when it comes to
> >> performance, using a fairly large buffer (8192 bytes, for example)
> >> would give the best results.
>
> > This is good news 
>
> Yes. I think it might be a good idea if you can tell us what problem
> you're actually trying to solve, rather than ask us how to implement
> something that doesn't actually make any sense to do as an effort to
> solve whatever that is.
(snip)
> >> What exactly are you trying to achieve?
> > See at the beginning of this post.
>
> No, his question still stands, you've explained a what, but you haven't
> defined a "why you're doing this in the first place". Helping someone
> with a lot of work to arrive at the wrong solution isn't helping them.
Because I must obey. And I need arguments to change my boss's point of
view. That's it!
Since the beginning I have been against a raw data solution and I have
wanted a filesystem.
Now the question is: which filesystem, on unix (Linux, in fact) and
on Windows? We are currently developing a Windows solution but we want to
migrate from Windows to Linux in the coming months.
--
Cyrille CHEVROT
-
Re: How to store big files contiguously on hd
jpd a écrit :
> ``Fragmented'' on unix file systems tends to mean something completely
> different than it does in the cosy windows world. In the latter,
> it arose from lots of small bits of disk in random order chained
> together to represent a single file. This was caused by both simplistic
> allocation and simplistic storage management in the ``FAT'' filesystem.
>
> Most unix filesystems work completely different. At least some also tend
> to put a maximum of disk use per file per single ``area'' of the disk to
> avoid creating worst case access patterns for other files.
>
> But the chunks the files do get are fairly big, so even for big files
> consecutive reads can be done in multi-megabyte chunks, and thus much
> more quickly than in the dreaded ``fragmented'' case on windows. The
> difference is of the order of (a minimum of) one seek per 512 bytes
> for fragmented FAT compared to one seek per (say) 64MB for a modern
> unix filesystem.
Very interesting.
Just some questions:
- What about NTFS? (We want to migrate from Windows to Linux in the
coming months, so for now I want information on Linux _and_ Windows
filesystems.)
- What is the best Linux filesystem for our goal (``large files
stored once, read many times, need lots of throughput'')? ext2fs,
ext3fs, reiserfs, xfs, jfs?
- Can we change the size of the chunks that files get (64MB for a
modern unix filesystem), for benchmarking purposes?
> You will have a hard time squeezing noticeably more
> performance out of the disk by inventing your own filesystem, nevermind
> using ``raw'' disks without some filesystem.
I know, but I must obey my boss. And the aim of this thread is to
get arguments to convince him to use a filesystem. That's the point.
--
Cyrille CHEVROT
-
Re: How to store big files contiguously on hd
Begin <1157835083.046040.55020@m73g2000cwd.googlegroups.com>
On 2006-09-09, chevrot79@yahoo.fr wrote:
> Just some questions :
>
> - What about NTFS ? (We want to migrate from windows to linux in the
> next months, so currently I want information on linux _and_ windows
> filesystems)
You'll have to find other sources for that, as I haven't really paid
attention to ntfs at all. Some vague memories have a notion that it
a) isn't at all like FAT, and b) it might even have actual contiguous
file support, if only you can find a way to get at it.
That problem is quite endemic with ntfs, though. It also supports
``file versioning'' through multiple forks (like the macintosh data and
resource forks, but generalised), only nobody uses it because there is
no good interface to use it. Also, those parts will have not seen
widespread use and might be buggy.
> - What is the best filesystem with linux for our goal (``large files
> stored once, read many times, need lots of throughput'') ? ext2fs,
> ext3fs, reiserfs, xfs, jfs ?
There's numerous benchmarks Out There. Point of research.
> - Can we change the chunks (64MB for a modern unix filesystem) the
> files do get (for benchmark purpose) ?
Can't give a definite answer for all FSes, but for some, yes, you can
change the maximum per-area allocation cap. There's quite a lot you can
tune and tinker with, although some parameters can only be changed at FS
creation time.
I think you'll find that --beyond a certain size-- the distance from the
center of the disk you put something has a much greater influence on
throughput, than whether the blocks read are N MB or N*4 MB.
> I kown but I must obey to my boss. And the aim of this thread is to
> have arguments to convince him to use a filesystem. That's the point.
You've been told by various people, in their own free time, for free,
with among them quite a lot of experience, that the entire approach is
technically unsound. If being told is not enough, then it is time to do
research yourself. Or hire some of these people to tell your boss in
person that he's wrong.
--
j p d (at) d s b (dot) t u d e l f t (dot) n l .
-
Re: How to store big files contiguously on hd
On 9 Sep 2006 13:27:23 -0700, chevrot79@yahoo.fr wrote:
>
> Dave Hinz a écrit :
>> Yes. I think it might be a good idea if you can tell us what problem
>> you're actually trying to solve, rather than ask us how to implement
>> something that doesn't actually make any sense to do as an effort to
>> solve whatever that is.
(snip)
>> No, his question still stands, you've explained a what, but you haven't
>> defined a "why you're doing this in the first place". Helping someone
>> with a lot of work to arrive at the wrong solution isn't helping them.
> Because I must obey. And I need arguments to change my boss point of
> view. That's it !
Why don't you start there. What is your boss telling you needs to be
solved. Did he actually say "find a way to contiguously write large
files using Linux and make sure you map the bad blocks"? If so, you may
need a boss upgrade. I just applied one and it's quite refreshing to do
so.
> Now, the question is : which filesystem : with unix (linux indeed) and
> with windows. We currently develop a windows solution but we want to
> migrate from windows to linux in the next months.
What's the actual requirement please?
-
Re: How to store big files contiguously on hd
Dave Hinz a écrit :
> On 9 Sep 2006 13:27:23 -0700, chevrot79@yahoo.fr wrote:
> >
> > Dave Hinz a écrit :
>
> >> Yes. I think it might be a good idea if you can tell us what problem
> >> you're actually trying to solve, rather than ask us how to implement
> >> something that doesn't actually make any sense to do as an effort to
> >> solve whatever that is.
> (snip)
> >> No, his question still stands, you've explained a what, but you haven't
> >> defined a "why you're doing this in the first place". Helping someone
> >> with a lot of work to arrive at the wrong solution isn't helping them.
>
> > Because I must obey. And I need arguments to change my boss point of
> > view. That's it !
>
> Why don't you start there. What is your boss telling you needs to be
> solved. Did he actually say "find a way to contiguously write large
> files using Linux and make sure you map the bad blocks"? If so, you may
> need a boss upgrade. I just applied one and it's quite refreshing to do
> so.
First of all, my boss is great. The only problem is that he is an
electronics engineer, and in that field he does a great job. However,
when it comes to computer science, he isn't as good.
He actually said: "raw data on hard disk, stored contiguously, without
filesystem".
I think trying to convince him to change his mind is better than
doing a boss upgrade.
--
Cyrille CHEVROT
-
Re: How to store big files contiguously on hd
jpd a écrit :
> You've been told by various people, in their own free time, for free,
> with among them quite a lot of experience, that the entire approach is
> technically unsound. If being told is not enough, then it is time to do
> research yourself. Or hire some of these people to tell your boss in
> person that he's wrong.
Now I have what I was looking for: arguments. Now I will try to
convince him.
End of the story.
--
Cyrille CHEVROT