Re: Too many files in one directory (again) - VMS


Thread: Re: Too many files in one directory (again)

  1. Re: Too many files in one directory (again)

    From: Hein RMS van den Heuvel

    > On Mar 20, 10:19 pm, s...@antinode.org (Steven M. Schweda) wrote:
    > :
    > >    I just pass this along as a reminder that there's still some room
    > > for improvement in dealing with cases like this which, bad design or
    > > not, don't cause nearly so much trouble on other operating systems.

    >
    > Agree with the sentiment.
    > If it is caused by directory IO, then you may be able to improve the
    > speed by setting the sysgen param ACP_MAXREAD to its max of 64.
    > ( $ mcr sysgen show acp_maxread )


    Currently 32 ("Default")
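
    (In case it helps anyone following along, a minimal sketch of raising
    it; this assumes suitable privileges, and whether WRITE ACTIVE is
    enough depends on the parameter being dynamic on your version, else
    put it in MODPARAMS.DAT and run AUTOGEN:)

    $ mcr sysgen
    SYSGEN> show acp_maxread
    SYSGEN> set acp_maxread 64
    SYSGEN> write active     ! or WRITE CURRENT and reboot
    SYSGEN> exit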

    > And the earlier directory pre-allocation is a good hint, if you know
    > what's coming.


    Currently about 22000 blocks, growing by one block about every four
    seconds or so.
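
    (For reference, the allocated-versus-used size can be checked with
    something like this; the directory file name is just a placeholder:)

    $ directory /size=all [-]bad.dir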

    > But why would an end user need to worry about that in the first place?


    Yeah, you'd like to think that if fancy tuning were so critical,
    it'd get done automatically. Even growing the allocation by, say, 25%
    instead of one block might pay off without a big penalty (if multiple
    small allocations were really a time consumer). MONI /SYST shows a
    Direct I/O Rate of about 1900/s, with about 47% of the CPU busy (36% for
    the working process). On the bright side, with 4GB of memory (and
    nothing else to do), the Page Fault Rate is a steady zero.

    > And it will still take hours!


    Or, perhaps, days. It's over 130000 files this morning, though, so
    there may be some hope.

    > Are we sure it is directory IO, or just the file creates themselves?
    > If the files are entered IN ORDER, then the directory adds would NOT
    > be the biggest cost. It would be the INDEXF.SYS IO + the data itself.


    As I recall, UNIX "tar" is generally not very reliable on file order,
    but in this archive the files seem to be pretty well ordered.

    > If the files are NOT in order and go largely to a single directory,
    > and if this sort of thing needs to happen frequently, and I were paid
    > very well or had nothing better to do, then I would:
    > [...]


    It's not the only disqualifier there, but "paid" does stand out.

    ------------------------------------------------------------------------

    Steven M. Schweda sms@antinode-org
    382 South Warwick Street (+1) 651-699-9818
    Saint Paul MN 55105-2547

  2. Re: Too many files in one directory (again)

    Steven M. Schweda wrote:

    >> And the earlier directory pre-allocation is a good hint, if you know
    >> what's coming.

    >
    > Currently about 22000 blocks, growing by one block about every four
    > seconds or so.


    I would think that a pre-allocated .DIR file would have
    helped here. It would be interesting to see how those
    22000 blocks are allocated (cont?, number of frags? and
    so on)... :-)
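
    (One way to look, as a hedged example with an illustrative file name:
    the map area in the file header lists every extent.)

    $ dump /header /blocks=count=0 bad.dir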

    And hope you don't hit SYSTEM-F-HEADERFULL with just a few
    files left... :-)

    How large/small is each individual file?
    Would it be possible (with 4 GB available) to create a DECram disk
    and run the un-tar against that?
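
    (A rough sketch only; the device name, size and qualifiers below are
    assumptions that vary with the VMS/DECram version, so check the DECram
    documentation before trusting them:)

    $ mcr sysman io connect mda0: /driver=sys$mddriver /noadapter
    $ initialize /size=2000000 mda0: tmp    ! size in blocks
    $ mount /system mda0: tmp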

    Jan-Erik.

  3. Re: Too many files in one directory (again)

    Jan-Erik Söderholm wrote:
    > Steven M. Schweda wrote:
    >
    >>> And the earlier directory pre-allocation is a good hint, if you know
    >>> what's coming.

    >>
    >> Currently about 22000 blocks, growing by one block about every four
    >> seconds or so.

    >
    > I would think that a pre-allocated .DIR file would have
    > helped here. It would be interesting to see how those
    > 22000 blocks are allocated (cont?, number of frags? and
    > so on)... :-)


    Jan-Erik, it's a directory, so it's contiguous, *and* it
    only has one extent :-)

    The pre-allocated, humungously-sized directory would have
    sped things up considerably.

  4. Re: Too many files in one directory (again)

    R.A.Omond wrote:
    > Jan-Erik Söderholm wrote:
    >> Steven M. Schweda wrote:
    >>
    >>>> And the earlier directory pre-allocation is a good hint, if you know
    >>>> what's coming.
    >>>
    >>> Currently about 22000 blocks, growing by one block about every four
    >>> seconds or so.

    >>
    >> I would think that a pre-allocated .DIR file would have
    >> helped here. It would be interesting to see how those
    >> 22000 blocks are allocated (cont?, number of frags? and
    >> so on)... :-)

    >
    > Jan-Erik, it's a directory, so it's contiguous, *and* it
    > only has one extent :-)


    OK, ok... :-)
    Then I guess one could have other potential problems with no
    space to extend the DIR file (or to move it somewhere else as a
    whole, if it does that at all; I'm unsure there...).

    > The pre-allocated, humungously-sized directory would have
    > sped things up considerably.


    Yes, and a pre-allocated DIR file (and a correct /HEADERS=n on INIT)
    makes a lot of potential fault cases go away. Right now, at the
    end of the un-tar, it can break for a number of reasons... :-)
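
    (For instance, something along these lines at volume-setup time; the
    numbers and device name are purely illustrative:)

    $ initialize /headers=250000 /cluster_size=16 dka100: bigvol
    $ mount /system dka100: bigvol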

    Jan-Erik.

  5. Re: Too many files in one directory (again)

    On Mar 21, 9:55 am, s...@antinode.org (Steven M. Schweda) wrote:
    > From: Hein RMS van den Heuvel
    >
    > > On Mar 20, 10:19 pm, s...@antinode.org (Steven M. Schweda) wrote:
    > > :
    > > >    I just pass this along as a reminder that there's still some room
    > > > for improvement in dealing with cases like this which, bad design or
    > > > not, don't cause nearly so much trouble on other operating systems.


    > Yeah, you'd like to think that if fancy tuning were so critical,
    > it'd get done automatically. Even growing the allocation by, say, 25%
    > instead of one block might pay off without a big penalty


    It will be growing by at least a disk cluster at a time.
    I don't think it'll honor the SET RMS/EXT setting.
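
    (The cluster size in question can be checked with, e.g., F$GETDVI;
    the device name here is only a placeholder:)

    $ write sys$output f$getdvi("DKA100:","CLUSTER")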

    > MONI /SYST shows a Direct I/O Rate of about 1900/s


    MONI FILE would be interesting
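
    (That is, something along these lines; the interval is arbitrary, and
    MONITOR FCP is added here as a guess at the other useful class:)

    $ monitor file_system_cache /interval=5   ! directory/index cache hit rates
    $ monitor fcp /interval=5                 ! XQP operation rates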

    > > And it will still take hours!

    > Currently about 22000 blocks, growing by one block about every four
    > seconds or so.

    Well, with the file names as per the example, each entry will be
    about 50 bytes ( $ dump/dir/blo=count=1 ).

    So that's 10 entries per block, and 10 files per 4 seconds.
    By that estimate alone it should take:
    $ write sys$output (190000*4/10)
    76000
    seconds, or
    $ write sys$output (190000*4/10)/3600
    21
    hours.

    > Or, perhaps, days. It's over 130000 files this morning, though, so
    > there may be some hope.


    Sounds like you are right on track!

    btw, do a semi-random sample:
    $ pipe dump/dir/blo=(star=10000,count=10) bad.dir | searc sys$pipe "End of records"

    If those -1 (end-of-records) markers are hovering around offset 0x0100,
    then random inserts are happening.
    If they are more around 0x01C0, then the blocks are packed, suggesting
    a series of ordered inserts in the sampled zone.

    To be more precise:

    $ perl -le "foreach (qx(dump/dir [-]hein.dir)){if (/^0(\w+) End/){$c++; $t+=hex($1)}} print $t/$c"
    228
    $ mcr dfu directory/comp [-]hein.dir
    %DFU ... HEIN.DIR;1 : 805 files; was : 66/81, now : 32/81 blocks
    $ perl -le "foreach (qx(dump/dir [-]hein.dir)){if (/^0(\w+) End/){$c++; $t+=hex($1)}} print $t/$c"
    470.25


    I would also suggest a
    $ pipe dump/header/bloc=count=0 bad.dir | searc sys$pipe lbn,Allocated

    A changing LBN will show you the directory being re-allocated and thus
    copied over 32 blocks at a time to its new place. That's 700+
    additional reads and as many extra writes every time it moves.
    That would not explain 1900 IO/sec.
    An average file being inserted in the middle of a directory, causing a
    full block every 10th time or so (depending on split point), could
    just about explain that.
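
    (For instance, a tiny command procedure to watch that over time; the
    file name and one-minute interval are just placeholders:)

    $ loop:
    $   pipe dump/header/blocks=count=0 bad.dir | search sys$pipe "LBN"
    $   wait 00:01:00
    $   goto loop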

    Hein.

  6. Re: Too many files in one directory (again)

    Prior to issuing the TAR command to create 190,000 files, did you just
    CREATE/DIRECTORY or did you CREATE/DIRECTORY/ALLOCATION=xxxxxxx ?

    Would this have made a huge difference in the file creation speed ?

    Would it make sense to SET FILE mydir.dir;1/global=5000 (or whatever) ?
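
    (For the first question, the pre-allocating form would look something
    like the line below; the block count is only a rough guess at what
    190,000 fifty-byte entries need:)

    $ create /directory /allocation=25000 disk:[newdir]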
