veritas filesystem and directories with large number of files - HP UX


  1. veritas filesystem and directories with large number of files

    Folks,

    We have directories that contain 1.8 million+ files. The filesystems
    are VERY slow and they are vxfs. Is there a limit on how many files
    can be in a vxfs directory before it takes a performance hit?


    thanks,

    keith

  2. Re: veritas filesystem and directories with large number of files

    codybear writes:
    > Folks,
    >
    > We have directories that contain 1.8 million+ files. The filesystems
    > are VERY slow and they are vxfs. Is there a limit on how many files
    > can be in a vxfs directory before it takes a performance hit?
    >
    >
    > thanks,
    >
    > keith

    Hi Keith,

    From experience I'd say around 100,000. Our backups (via NetBackup) have
    problems if the file count is much higher.

    Yours, Hans Schwengeler

  3. Re: veritas filesystem and directories with large number of files

    On Sun, 10 Aug 2008, keithclay@gmail.com wrote:

    > Folks,
    >
    > We have directories that contain 1.8 million+ files. The filesystems
    > are VERY slow and they are vxfs. Is there a limit on how many files
    > can be in a vxfs directory before it takes a performance hit?
    >
    >
    > thanks,
    >
    > keith


    Those are some fairly large directories. :-)

    What version of HP-UX are you running? And what version of VxFS?

    --
    Carl Davidson (carl.davidson@hp.com)
    Hewlett-Packard Company, Cupertino, CA 95014
    You can't please all of the people any of the time.

  4. Re: veritas filesystem and directories with large number of files

    codybear wrote:
    > We have directories that contain 1.8 million+ files. The filesystems
    > are VERY slow and they are vxfs. Is there a limit on how many files
    > can be in a vxfs directory before it takes a performance hit?


    The experts on the ITRC say you should not use the directory structure
    as a database. 2 million is way too much.
    By adding an extra directory level holding at most 1000 entries each,
    you can really reduce the number of files per directory.

  5. Re: veritas filesystem and directories with large number of files

    codybear writes:

    > Folks,
    >
    > We have directories that contain 1.8 million+ files. The filesystems
    > are VERY slow and they are vxfs. Is there a limit on how many files
    > can be in a vxfs directory before it takes a performance hit?


    Hi!

    Which disk layout version of VxFS do you have? Recent layouts are said to
    handle many files in a directory better. The current layout version for
    VxFS 5 is 7.

    Apart from that, a hash-table-based directory lookup will suffer from hash
    collisions. Every hash table is designed for a certain number of table
    entries and file name lengths. The more hash collisions you have, the more
    lookups (CPU time) are required to resolve a name.

    Also, it highly depends on how you access a directory:
    an "ll" will read all entries, get the attributes of each, and sort them,
    while a "find" will only look up the names. The fastest access is to
    directly probe for a single file, as in "test -f a_name".
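
    For illustration, a minimal Python sketch of those three access patterns
    (the directory /data/incoming and the name a_name are just placeholders):

        import os

        DIRPATH = "/data/incoming"   # placeholder directory

        # "ll"-style: read every entry, stat each one, then sort --
        # a full directory scan plus one attribute lookup per file.
        entries = []
        with os.scandir(DIRPATH) as it:
            for entry in it:
                entries.append((entry.name, entry.stat().st_size))
        entries.sort()

        # "find"-style: names only, no per-file attributes.
        names = os.listdir(DIRPATH)

        # "test -f"-style: a single direct probe, no enumeration at all.
        exists = os.path.isfile(os.path.join(DIRPATH, "a_name"))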

    The general method to optimize lookups is to reduce the number of files in a
    directory (to fewer than 100, I'd suggest). Instead of looking up a file like
    "01234567890", distribute the files (assuming the names are equally
    distributed) in a structure like "01/23/45/67/89/0" (i.e. a 5-level directory
    hierarchy with at most 100 entries each). You could also take the original
    file name, compute a strong hash like MD5 or SHA-1, and use the hash to
    determine the directories. As those hashes distribute quite evenly, you might
    pick any "digits" to distribute the files.

    For example, if your eight files produce these MD5 hashes (fingerprints):
    321d1b34ba06106ad8d15dbd0cff4252
    8643682de5a441e36627fa810c7d5db2
    8fa5e4afb2da5c0e3314c760210d3819
    9c6b1d1b2bbd59b8eaaedbd1c1768a9f
    9d02b629b97700478868c931969ae55a
    9e35af0a9281d878f76cacfaea63ee75
    a345b8cfba5c00ead9cd1c731783645e
    d0b678f97e61987a160d737092a5e3cc

    You could use the first four characters to put your files into
    3/2/1/d/
    8/6/4/3/
    8/f/a/5/
    9/c/6/b/
    9/d/0/2/
    9/e/3/5/
    a/3/4/5/
    d/0/b/6/

    That is 16 entries per directory level and 65536 "buckets" altogether. If you
    use two characters per level (like 32/1d/1b/34/) you'll have 256 entries per
    directory (about four billion buckets over four levels). With just three
    two-character levels you'd have 16 million "buckets".
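
    If you want to implement that scheme, a small Python helper along these lines
    would do it (a sketch only; the base directory, level count, and characters
    per level are arbitrary choices):

        import hashlib
        import os

        def hashed_path(base_dir, file_name, levels=4, chars_per_level=1):
            # Derive the subdirectory chain from the MD5 hash of the name.
            digest = hashlib.md5(file_name.encode("utf-8")).hexdigest()
            parts = [digest[i * chars_per_level:(i + 1) * chars_per_level]
                     for i in range(levels)]
            return os.path.join(base_dir, *parts, file_name)

        # Four one-character levels -> 16**4 = 65536 buckets, matching the
        # 3/2/1/d/ style layout shown above.
        target = hashed_path("/data/store", "01234567890")
        os.makedirs(os.path.dirname(target), exist_ok=True)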

    So if you have control over the application that creates and accesses those
    files, you could easily implement that. Alternatively, you could consider
    using a lightweight database like Sleepycat's (Berkeley DB). I'd advise the
    latter, because when backing up those files (assuming they are rather short),
    the backup software still has to enumerate them all before deciding whether
    to save them or not.
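
    To give a rough idea of the database route, here is a sketch using Python's
    standard dbm module as a stand-in for a Berkeley DB-style store (the file
    name and record contents are made up):

        import dbm

        # One database file replaces millions of tiny files.
        with dbm.open("/data/store/filestore", "c") as db:
            # Store each record keyed by what used to be the file name.
            db[b"01234567890"] = b"contents that used to live in a tiny file"
            # Retrieval is a single keyed lookup, no directory scan needed.
            data = db[b"01234567890"]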

    Not an HP specialist ;-)

    Regards,
    Ulrich
