very poor ext3 write performance on big filesystems? - Kernel

  1. very poor ext3 write performance on big filesystems?

    I have a 1.2 TB (of which 750 GB is used) filesystem which holds
    almost 200 million files.
    1.2 TB doesn't make this filesystem that big, but 200 million files
    is a decent number.


    Most of the files are hardlinked multiple times, some of them are
    hardlinked thousands of times.


    Recently I began removing some unneeded files (or hardlinks) and, to
    my surprise, it is taking longer than I initially expected.


    After the cache is emptied (echo 3 > /proc/sys/vm/drop_caches) I can usually
    remove about 50000-200000 files with moderate performance. I see up to
    5000 kB/s read/write from/to the disk, and the wa (I/O wait) value
    reported by top is usually 20-70%.


    After that, I/O wait grows to 99%, and disk write speed drops to
    50 kB/s - 200 kB/s (fifty to two hundred kilobytes/s).


    Is it normal for the write speed to go down to only a few dozen
    kilobytes/s? Is it because of that many seeks? Can it be optimized
    somehow? The machine has loads of free memory; perhaps it could be
    used better?


    Also, writing big files is very slow: it takes more than 4 minutes to
    write and sync a 655 MB file (so, only about 2.6 MB/s). Fragmentation,
    perhaps?

    + dd if=/dev/zero of=testfile bs=64k count=10000
    10000+0 records in
    10000+0 records out
    655360000 bytes (655 MB) copied, 3,12109 seconds, 210 MB/s
    + sync
    0.00user 2.14system 4:06.76elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+883minor)pagefaults 0swaps
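The 210 MB/s that dd reports above only reflects copying data into the page cache; almost all of the elapsed time is spent in the following sync. A minimal sketch, assuming GNU coreutils dd, that makes dd include the flush in its own timing via conv=fdatasync (write_test is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# write_test FILE COUNT: write COUNT blocks of 64 KiB of zeros to FILE.
# conv=fdatasync makes GNU dd call fdatasync() on FILE before exiting,
# so the throughput it prints includes flushing the data to disk,
# not just copying it into the page cache.
write_test() {
  local file=$1 count=$2
  dd if=/dev/zero of="$file" bs=64k count="$count" conv=fdatasync
}
```

For example, write_test testfile 10000 repeats the 655 MB test above in a single timed step.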


    # df -h
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sda        1,2T  697G  452G  61% /mnt/iscsi_backup

    # df -i
    Filesystem     Inodes IUsed IFree IUse% Mounted on
    /dev/sda         154M   20M  134M   13% /mnt/iscsi_backup
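The df -i output above shows about 20M inodes in use, while the post counts almost 200 million files. That gap is consistent with heavy hardlinking: every hardlink is an extra directory entry pointing at an existing inode. A minimal sketch, assuming GNU findutils, for measuring names versus distinct inodes on a subtree (count_names_and_inodes is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# count_names_and_inodes DIR: print how many non-directory names exist
# under DIR and how many distinct inodes they refer to.
# -xdev keeps find on one filesystem, where inode numbers are unique.
count_names_and_inodes() {
  local dir=$1 names inodes
  # One '.' per name, so names containing newlines are still counted once.
  names=$(( $(find "$dir" -xdev ! -type d -printf '.' | wc -c) ))
  # Distinct inode numbers among those names.
  inodes=$(( $(find "$dir" -xdev ! -type d -printf '%i\n' | sort -u | wc -l) ))
  echo "names=$names inodes=$inodes"
}
```

On a tree of this size the sort step alone needs a lot of time and temporary space, so it is better run on a single backup generation than on the whole filesystem.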




    --
    Tomasz Chmielewski
    http://wpkg.org

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: very poor ext3 write performance on big filesystems?

    Tomasz Chmielewski writes:
    >
    > Is it normal to expect the write speed go down to only few dozens of
    > kilobytes/s? Is it because of that many seeks? Can it be somehow
    > optimized?


    I have similar problems on my linux source partition which also
    has a lot of hard linked files (although probably not quite
    as many as you do). It seems like hard linking prevents
    some of the heuristics ext* uses to generate non-fragmented
    disk layouts, and the resulting seeking makes things slow.

    What has helped a bit was to recreate the file system with -O ^dir_index;
    dir_index seems to cause more seeks.

    Keeping enough free space is also a good idea, because that
    gives the file system code better choices on where to place data.

    -Andi


  3. Re: very poor ext3 write performance on big filesystems?

    On Mon, Feb 18, 2008 at 03:03:44PM +0100, Andi Kleen wrote:
    > Tomasz Chmielewski writes:
    > >
    > > Is it normal to expect the write speed go down to only few dozens of
    > > kilobytes/s? Is it because of that many seeks? Can it be somehow
    > > optimized?

    >
    > I have similar problems on my linux source partition which also
    > has a lot of hard linked files (although probably not quite
    > as many as you do). It seems like hard linking prevents
    > some of the heuristics ext* uses to generate non fragmented
    > disk layouts and the resulting seeking makes things slow.


    ext3 tries to keep inodes in the same block group as their containing
    directory. If you have lots of hard links, obviously it can't really
    do that, especially since we don't have a good way at mkdir time to
    tell the filesystem, "Psst! This is going to be a hard link clone of
    that directory over there, put it in the same block group".

    > What has helped a bit was to recreate the file system with -O^dir_index
    > dir_index seems to cause more seeks.


    Part of it may have simply been recreating the filesystem, not
    necessarily removing the dir_index feature. Dir_index speeds up
    individual lookups, but it slows down workloads that do a readdir
    followed by a stat of all of the files in the directory. You can work
    around this by calling readdir(), sorting all of the entries by inode
    number, and then calling open or stat or whatever. So this can help
    out for workloads that are doing find or rm -r on a dir_index
    filesystem. Basically, it helps for some things and hurts for others.
    Once things are in the cache it doesn't matter, of course.
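A minimal shell sketch of the workaround just described (read the directory, sort the entries by inode number, then stat them in that order). It assumes GNU findutils and a GNU coreutils recent enough to support cut -z; stat_in_inode_order is a hypothetical name, and this is not the LD_PRELOAD Ted refers to:

```shell
#!/usr/bin/env bash
# stat_in_inode_order DIR: list the entries of DIR (one level only),
# sort them by inode number, then stat them in that order.
# NUL-separated records keep names with spaces or newlines intact.
stat_in_inode_order() {
  local dir=$1
  find "$dir" -mindepth 1 -maxdepth 1 -printf '%i\t%p\0' |
    sort -z -n |                  # ascending inode number
    cut -z -f2- |                 # drop the inode column, keep the path
    xargs -0 -r stat -c '%i %n'   # stat each path in sorted order
}
```

Sorting by inode number approximates on-disk inode table order, so the stat calls walk the inode tables roughly sequentially instead of in hash order.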

    The following LD_PRELOAD can help in some cases. Mutt has this hack
    built in for maildir directories, which helps.

    > Also keeping enough free space is also a good idea because that
    > allows the file system code better choices on where to place data.


    Yep, that too.

    - Ted



  4. Re: very poor ext3 write performance on big filesystems?

    On Mon, Feb 18, 2008 at 09:16:41AM -0500, Theodore Tso wrote:
    > ext3 tries to keep inodes in the same block group as their containing
    > directory. If you have lots of hard links, obviously it can't really
    > do that, especially since we don't have a good way at mkdir time to
    > tell the filesystem, "Psst! This is going to be a hard link clone of
    > that directory over there, put it in the same block group".


    Hmm, you think such a hint interface would be worth it?

    >
    > > What has helped a bit was to recreate the file system with -O^dir_index
    > > dir_index seems to cause more seeks.

    >
    > Part of it may have simply been recreating the filesystem, not


    Undoubtedly.

    > necessarily removing the dir_index feature. Dir_index speeds up
    > individual lookups, but it slows down workloads that do a readdir


    But only for large directories, right? For kernel-source-like
    directory sizes it seems to be a general loss.

    -Andi


  5. Re: very poor ext3 write performance on big filesystems?

    Theodore Tso schrieb:

    (...)

    >> What has helped a bit was to recreate the file system with -O^dir_index
    >> dir_index seems to cause more seeks.

    >
    > Part of it may have simply been recreating the filesystem, not
    > necessarily removing the dir_index feature.


    You mean, copy data somewhere else, mkfs a new filesystem, and copy data
    back?

    Unfortunately, doing it at the file level is not possible in a
    reasonable amount of time.

    I tried to copy that filesystem once (when it was much smaller) with
    "rsync -a -H", but after 3 days, rsync was still building an index and
    didn't copy any file.


    Also, as files/hardlinks come and go, it would degrade again.


    Are there better choices than ext3 for a filesystem with lots of
    hardlinks? ext4, once it's ready? xfs?


    --
    Tomasz Chmielewski
    http://wpkg.org

  6. Re: very poor ext3 write performance on big filesystems?

    On Mon, Feb 18, 2008 at 04:18:23PM +0100, Andi Kleen wrote:
    > On Mon, Feb 18, 2008 at 09:16:41AM -0500, Theodore Tso wrote:
    > > ext3 tries to keep inodes in the same block group as their containing
    > > directory. If you have lots of hard links, obviously it can't really
    > > do that, especially since we don't have a good way at mkdir time to
    > > tell the filesystem, "Psst! This is going to be a hard link clone of
    > > that directory over there, put it in the same block group".

    >
    > Hmm, you think such a hint interface would be worth it?


    It would definitely help ext2/3/4. An interesting question is whether
    it would help enough other filesystems that it's worth adding.

    > > necessarily removing the dir_index feature. Dir_index speeds up
    > > individual lookups, but it slows down workloads that do a readdir

    >
    > But only for large directories right? For kernel source like
    > directory sizes it seems to be a general loss.


    On my todo list is a hack which sorts directory entries by inode
    number inside the kernel for smallish directories (say, fewer
    than 2-3 blocks), where using kernel memory to store the
    directory entries is acceptable. That would speed up dir_index
    performance for kernel-source-like directory sizes without needing
    to use the spd_readdir LD_PRELOAD hack.

    But yes, right now, if you know that your directories are almost
    always going to be kernel-source-like in size, then omitting dir_index
    is probably going to be a good idea.

    - Ted

  7. Re: very poor ext3 write performance on big filesystems?

    On Mon, Feb 18, 2008 at 04:02:36PM +0100, Tomasz Chmielewski wrote:
    > I tried to copy that filesystem once (when it was much smaller) with "rsync
    > -a -H", but after 3 days, rsync was still building an index and didn't copy
    > any file.


    If you're going to copy the whole filesystem don't use rsync! Use cp
    or a tar pipeline to move the files.
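A minimal sketch of such a tar pipeline, assuming GNU tar (copy_tree_tar is a hypothetical name). tar streams the tree as it walks it and only needs to remember inodes that have more than one link, so later names for the same inode are archived as hard links:

```shell
#!/usr/bin/env bash
# copy_tree_tar SRC DST: copy the contents of SRC into DST through a
# tar pipeline. Additional names for an already-archived inode are
# stored as hard links, so the hardlink structure is recreated in DST.
copy_tree_tar() {
  local src=$1 dst=$2
  mkdir -p "$dst"
  # -p on extraction preserves permissions.
  (cd "$src" && tar -cf - .) | (cd "$dst" && tar -xpf -)
}
```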

    > Also, as files/hardlinks come and go, it would degrade again.


    Yes...

    > Are there better choices than ext3 for a filesystem with lots of hardlinks?
    > ext4, once it's ready? xfs?


    All filesystems are going to have problems keeping inodes close to
    directories when you have huge numbers of hard links.

    I'd really need to know exactly what kind of operations you were
    trying to do that were causing problems before I could say for sure.
    Yes, you said you were removing unneeded files, but how were you doing
    it? With rm -r of old hard-linked directories? How big are the
    average files involved? Etc.

    - Ted

  8. Re: very poor ext3 write performance on big filesystems?

    On Mon, Feb 18, 2008 at 10:16:32AM -0500, Theodore Tso wrote:
    > On Mon, Feb 18, 2008 at 04:02:36PM +0100, Tomasz Chmielewski wrote:
    > > I tried to copy that filesystem once (when it was much smaller) with "rsync
    > > -a -H", but after 3 days, rsync was still building an index and didn't copy
    > > any file.

    >
    > If you're going to copy the whole filesystem don't use rsync!


    Yes, I managed to kill systems (drive them badly into OOM and
    very long swap storms) with rsync -H in the past too. Something is very
    wrong with rsync's implementation of this.

    > Use cp
    > or a tar pipeline to move the files.


    Are you sure cp handles hardlinks correctly? I know tar does,
    but I have my doubts about cp.

    -Andi

  9. Re: very poor ext3 write performance on big filesystems?

    On Mon, Feb 18, 2008 at 04:57:25PM +0100, Andi Kleen wrote:
    > > Use cp
    > > or a tar pipeline to move the files.

    >
    > Are you sure cp handles hardlinks correctly? I know tar does,
    > but I have my doubts about cp.


    I *think* GNU cp does the right thing with --preserve=links. I'm not
    100% sure, though --- like you, probably, I always use tar for moving
    or copying directory hierarchies.

    - Ted

  10. Re: very poor ext3 write performance on big filesystems?

    Theodore Tso schrieb:

    >> Are there better choices than ext3 for a filesystem with lots of hardlinks?
    >> ext4, once it's ready? xfs?

    >
    > All filesystems are going to have problems keeping inodes close to
    > directories when you have huge numbers of hard links.
    >
    > I'd really need to know exactly what kind of operations you were
    > trying to do that were causing problems before I could say for sure.
    > Yes, you said you were removing unneeded files, but how were you doing
    > it? With rm -r of old hard-linked directories?


    Yes, with rm -r.


    > How big are the
    > average files involved? Etc.


    It's hard to estimate the average size of a file. I'd say there are not
    many files bigger than 50 MB.

    Basically, it's a filesystem where backups are kept. Backups are made
    with BackupPC [1].

    Imagine a full rootfs backup of 100 Linux systems.

    Instead of compressing and writing "/bin/bash" 100 times, once for each
    system, we do it once and hardlink it. Then keep 40 backup generations,
    and you have 4000 hardlinks.

    For individual or user files, the number of hardlinks will of course
    be smaller.

    The directories I want to remove usually have the structure of a
    "normal" Linux rootfs, nothing special there (other than that most of
    the files have multiple hardlinks).


    I noticed that using write-back helps a tiny bit, but since dm and md
    don't support write barriers, I'm not very eager to use it.


    [1] http://backuppc.sf.net
    http://backuppc.sourceforge.net/faq/..._design_issues



    --
    Tomasz Chmielewski
    http://wpkg.org


  11. Re: very poor ext3 write performance on big filesystems?

    On Mon, Feb 18, 2008 at 05:16:55PM +0100, Tomasz Chmielewski wrote:
    > Theodore Tso schrieb:
    >
    >> I'd really need to know exactly what kind of operations you were
    >> trying to do that were causing problems before I could say for sure.
    >> Yes, you said you were removing unneeded files, but how were you doing
    >> it? With rm -r of old hard-linked directories?

    >
    > Yes, with rm -r.


    You should definitely try the spd_readdir hack; that will help reduce
    the seek times. This will probably help on any block-group-oriented
    filesystem, including XFS, etc.

    >> How big are the
    >> average files involved? Etc.

    >
    > It's hard to estimate the average size of a file. I'd say there are not
    > many files bigger than 50 MB.


    Well, Ext4 will help for files bigger than 48k.

    The other thing that might help you is using an external journal
    on a separate hard drive (either for ext3 or ext4). That will help
    alleviate some of the seek storms going on, since the journal is
    only written sequentially, and putting it on a separate hard drive
    will help remove some of the contention on the main drive.

    I assume that your 1.2 TB filesystem is located on a RAID array; did
    you use the mke2fs -E stride option to make sure all of the bitmaps
    don't get concentrated on one hard drive spindle? One failure mode
    that can happen with a 4+1 RAID 5 setup is that all of the block and
    inode bitmaps end up getting laid out on a single hard drive, so it
    becomes a bottleneck for bitmap-intensive workloads, including
    "rm -rf". So that's another thing that might be going on. If you run
    "dumpe2fs", look at the block numbers for the block and inode
    allocation bitmaps, and find that they are all landing on the same
    physical hard drive, then that's very clearly the biggest problem
    given an "rm -rf" workload. You should also be able to see this
    visually: if one hard drive has its light almost constantly on and
    the other ones don't have much activity, that's probably what is
    happening.
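A rough sketch of both checks, under a simplified model that is an assumption and not Ted's method: the array is treated as plain striping over the data disks, and RAID 5 parity rotation is ignored. raid_stride and bitmap_columns are hypothetical helpers; the 64 KiB chunk, 4 KiB block, 4 data disk figures are examples only:

```shell
#!/usr/bin/env bash
# Simplified model (assumption): blocks are striped over DATA_DISKS data
# columns in CHUNK_KB pieces; RAID 5 parity rotation is ignored.

# raid_stride CHUNK_KB BLOCK_KB: filesystem blocks per RAID chunk, i.e. the
# value for "mke2fs -E stride=N" (e.g. mke2fs -j -E stride=16 /dev/sdX).
raid_stride() {
  echo $(( $1 / $2 ))
}

# bitmap_columns CHUNK_KB BLOCK_KB DATA_DISKS: read one block number per
# line on stdin and print how many of them fall into each data column.
# If every bitmap lands in the same column, that disk is the bottleneck.
bitmap_columns() {
  awk -v c="$1" -v b="$2" -v d="$3" '
    BEGIN { d += 0 }
    NF    { col = int($1 * b / c) % d; n[col]++ }
    END   { for (i = 0; i < d; i++) if (i in n) printf "column %d: %d\n", i, n[i] }'
}

# Example (needs root and the real device):
#   dumpe2fs /dev/sda 2>/dev/null | grep -o 'Block bitmap at [0-9]*' |
#     awk '{print $4}' | bitmap_columns 64 4 4
```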

    - Ted

  12. Re: very poor ext3 write performance on big filesystems?

    Tomasz Chmielewski wrote:
    > I have a 1.2 TB (of which 750 GB is used) filesystem which holds
    > almost 200 millions of files.
    > 1.2 TB doesn't make this filesystem that big, but 200 millions of files
    > is a decent number.
    >
    >
    > Most of the files are hardlinked multiple times, some of them are
    > hardlinked thousands of times.
    >
    >
    > Recently I began removing some of unneeded files (or hardlinks) and to
    > my surprise, it takes longer than I initially expected.
    >
    >
    > After cache is emptied (echo 3 > /proc/sys/vm/drop_caches) I can usually
    > remove about 50000-200000 files with moderate performance. I see up to
    > 5000 kB read/write from/to the disk, wa reported by top is usually 20-70%.
    >
    >
    > After that, waiting for IO grows to 99%, and disk write speed is down to
    > 50 kB/s - 200 kB/s (fifty - two hundred kilobytes/s).
    >
    >
    > Is it normal to expect the write speed go down to only few dozens of
    > kilobytes/s? Is it because of that many seeks? Can it be somehow
    > optimized? The machine has loads of free memory, perhaps it could be
    > uses better?
    >
    >
    > Also, writing big files is very slow - it takes more than 4 minutes to
    > write and sync a 655 MB file (so, a little bit more than 1 MB/s) -
    > fragmentation perhaps?


    It would be really interesting if you tried your workload with XFS. In my
    experience, XFS considerably outperforms ext3 on big (more than a few
    hundred MB) disks.

    Vlad

  13. Re: very poor ext3 write performance on big filesystems?

    Theodore Tso schrieb:

    (...)

    > The following ld_preload can help in some cases. Mutt has this hack
    > encoded in for maildir directories, which helps.


    It doesn't work very reliably for me.

    For some reason, it sometimes hangs (doesn't remove any files; rm -rf
    just stalls), or segfaults.


    Since most of the ideas in this thread assume (re)creating the
    filesystem from scratch: would tuning /proc/sys/vm/dirty_ratio and
    /proc/sys/vm/dirty_background_ratio help a bit?


    --
    Tomasz Chmielewski
    http://wpkg.org

  14. Re: very poor ext3 write performance on big filesystems?

    On Tuesday 19 February 2008, Tomasz Chmielewski wrote:
    > Theodore Tso schrieb:
    >
    > (...)
    >
    > > The following ld_preload can help in some cases. Mutt has this hack
    > > encoded in for maildir directories, which helps.

    >
    > It doesn't work very reliable for me.
    >
    > For some reason, it hangs for me sometimes (doesn't remove any files, rm
    > -rf just stalls), or segfaults.


    You can go the low-tech route (assuming your file names don't have spaces in
    them)

    find . -printf "%i %p\n" | sort -n | awk '{print $2}' | xargs rm
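A variant of this pipeline that does not break on spaces and also removes the directories that plain xargs rm leaves behind (rm without -r refuses to delete them). This is a minimal sketch assuming GNU findutils and a coreutils with cut -z; rm_tree_inode_order is a hypothetical name:

```shell
#!/usr/bin/env bash
# rm_tree_inode_order DIR: remove everything under DIR, including DIR
# itself (like rm -r). Non-directories are unlinked in ascending inode
# order; the emptied directories are removed afterwards.
rm_tree_inode_order() {
  local dir=$1
  find "$dir" -xdev ! -type d -printf '%i\t%p\0' |
    sort -z -n |                  # ascending inode number
    cut -z -f2- |                 # keep only the path
    xargs -0 -r rm -f --
  # -depth visits children before parents, so each directory is
  # already empty by the time -empty is tested on it.
  find "$dir" -xdev -depth -type d -empty -delete
}
```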

    >
    >
    > As most of the ideas here in this thread assume (re)creating a new
    > filesystem from scratch - would perhaps playing with
    > /proc/sys/vm/dirty_ratio and /proc/sys/vm/dirty_background_ratio help a
    > bit?


    Probably not. You're seeking between all the inodes on the box, and probably
    not bound by the memory used.

    -chris

  15. Re: very poor ext3 write performance on big filesystems?

    Chris Mason schrieb:
    > On Tuesday 19 February 2008, Tomasz Chmielewski wrote:
    >> Theodore Tso schrieb:
    >>
    >> (...)
    >>
    >>> The following ld_preload can help in some cases. Mutt has this hack
    >>> encoded in for maildir directories, which helps.

    >> It doesn't work very reliable for me.
    >>
    >> For some reason, it hangs for me sometimes (doesn't remove any files, rm
    >> -rf just stalls), or segfaults.

    >
    > You can go the low-tech route (assuming your file names don't have spaces in
    > them)
    >
    > find . -printf "%i %p\n" | sort -n | awk '{print $2}' | xargs rm


    Why should it make a difference?

    Does "find" find filenames/paths faster than "rm -r"?

    Or is "find once/remove once" faster than "find files/rm files/find
    files/rm files/...", which I suppose "rm -r" does?


    --
    Tomasz Chmielewski
    http://wpkg.org

  16. Re: very poor ext3 write performance on big filesystems?

    On Tuesday 19 February 2008, Tomasz Chmielewski wrote:
    > Chris Mason schrieb:
    > > On Tuesday 19 February 2008, Tomasz Chmielewski wrote:
    > >> Theodore Tso schrieb:
    > >>
    > >> (...)
    > >>
    > >>> The following ld_preload can help in some cases. Mutt has this hack
    > >>> encoded in for maildir directories, which helps.
    > >>
    > >> It doesn't work very reliable for me.
    > >>
    > >> For some reason, it hangs for me sometimes (doesn't remove any files, rm
    > >> -rf just stalls), or segfaults.

    > >
    > > You can go the low-tech route (assuming your file names don't have spaces
    > > in them)
    > >
    > > find . -printf "%i %p\n" | sort -n | awk '{print $2}' | xargs rm

    >
    > Why should it make a difference?


    It does something similar to Ted's LD_PRELOAD, sorting the results from
    readdir by inode number before using them. You will still seek quite a lot
    between the directory entries, but operations on the files themselves will go
    in a much more optimal order. It might help.

    >
    > Does "find" find filenames/paths faster than "rm -r"?
    >
    > Or is "find once/remove once" faster than "find files/rm files/find
    > files/rm files/...", which I suppose "rm -r" does?


    rm -r removes things in the order that readdir returns them. In your
    hard-linked tree (on almost any FS), this order will be very random. The
    sorting is probably the best you can do from userland to optimize the ordering.

    -chris


  17. Re: very poor ext3 write performance on big filesystems?

    Theodore Tso wrote:
    ...
    > The following ld_preload can help in some cases. Mutt has this hack
    > encoded in for maildir directories, which helps.

    ...

    Oddly enough, that same spd_readdir() preload craps out here too
    when used with "rm -r" on largish directories.

    I added a bit more debugging to it, and it always craps out like this:

    opendir dir=0x805ad10((nil))
    Readdir64 dir=0x805ad10 pos=0/289/290
    Readdir64 dir=0x805ad10 pos=1/289/290
    Readdir64 dir=0x805ad10 pos=2/289/290
    Readdir64 dir=0x805ad10 pos=3/289/290
    Readdir64 dir=0x805ad10 pos=4/289/290
    ...
    Readdir64 dir=0x805ad10 pos=287/289/290
    Readdir64 dir=0x805ad10 pos=288/289/290
    Readdir64 dir=0x805ad10 pos=289/289/290
    Readdir64 dir=0x805ad10 pos=0/289/290
    Readdir64: dirstruct->dp=(nil)
    Readdir64: ds=(nil)
    Segmentation fault (core dumped)


    Always. The "rm -r" loops over the directory, as shown above,
    and then tries to re-access entry 0 somehow, at which point
    it discovers that it's been NULLed out.

    Which is weird, because the local seekdir() was never called,
    and the code never zeroed/freed that memory itself
    (I've got printfs in there..).

    Nulling out the qsort has no effect, and smaller/larger
    ALLOC_STEPSIZE values don't seem to matter.

    But.. when the entire tree is in RAM (freshly unpacked .tar),
    it seems to have no problems with it. As opposed to an uncached tree.

    Peculiar.. I wonder where the bug is?

  18. Re: very poor ext3 write performance on big filesystems?

    Mark Lord wrote:
    > Theodore Tso wrote:
    > ..
    >> The following ld_preload can help in some cases. Mutt has this hack
    >> encoded in for maildir directories, which helps.

    > ..
    >
    > Oddly enough, that same spd_readdir() preload craps out here too
    > when used with "rm -r" on largish directories.
    >
    > I added a bit more debugging to it, and it always craps out like this:
    > opendir dir=0x805ad10((nil))
    > Readdir64 dir=0x805ad10 pos=0/289/290
    > Readdir64 dir=0x805ad10 pos=1/289/290
    > Readdir64 dir=0x805ad10 pos=2/289/290
    > Readdir64 dir=0x805ad10 pos=3/289/290
    > Readdir64 dir=0x805ad10 pos=4/289/290
    > ...
    > Readdir64 dir=0x805ad10 pos=287/289/290
    > Readdir64 dir=0x805ad10 pos=288/289/290
    > Readdir64 dir=0x805ad10 pos=289/289/290
    > Readdir64 dir=0x805ad10 pos=0/289/290
    > Readdir64: dirstruct->dp=(nil)
    > Readdir64: ds=(nil)
    > Segmentation fault (core dumped)
    >
    > Always. The "rm -r" loops over the directory, as show above,
    > and then tries to re-access entry 0 somehow, at which point
    > it discovers that it's been NULLed out.
    >
    > Which is weird, because the local seekdir() was never called,
    > and the code never zeroed/freed that memory itself
    > (I've got printfs in there..).
    >
    > Nulling out the qsort has no effect, and smaller/larger
    > ALLOC_STEPSIZE values don't seem to matter.
    >
    > But.. when the entire tree is in RAM (freshly unpacked .tar),
    > it seems to have no problems with it. As opposed to an uncached tree.

    ...

    I take back that last point -- it also fails even when the tree *is* cached.

  19. Re: very poor ext3 write performance on big filesystems?

    Mark Lord wrote:
    > Theodore Tso wrote:
    > ..
    >> The following ld_preload can help in some cases. Mutt has this hack
    >> encoded in for maildir directories, which helps.

    > ..
    >
    > Oddly enough, that same spd_readdir() preload craps out here too
    > when used with "rm -r" on largish directories.


    From looking at the code, I think I've found at least one bug in opendir:
    ....
    > dnew = realloc(dirstruct->dp,
    > dirstruct->max * sizeof(struct dir_s));

    ....

    Shouldn't this be: "...*sizeof(struct dirent_s));"?

    --
    Paulo Marques - www.grupopie.com

    "Nostalgia isn't what it used to be."

  20. Re: very poor ext3 write performance on big filesystems?

    Paulo Marques wrote:
    > Mark Lord wrote:
    >> Theodore Tso wrote:
    >> ..
    >>> The following ld_preload can help in some cases. Mutt has this hack
    >>> encoded in for maildir directories, which helps.

    >> ..
    >>
    >> Oddly enough, that same spd_readdir() preload craps out here too
    >> when used with "rm -r" on largish directories.

    >
    > From looking at the code, I think I've found at least one bug in opendir:
    > ...
    >> dnew = realloc(dirstruct->dp,
    >> dirstruct->max * sizeof(struct dir_s));

    > ...
    >
    > Shouldn't this be: "...*sizeof(struct dirent_s));"?

    ...

    Yeah, that's one bug.
    Another is that ->fd is frequently left uninitialized, yet later used.

    Fixing those didn't change the null pointer deaths, though.


