Too many files in one directory (again) - VMS

Thread: Too many files in one directory (again)

  1. Too many files in one directory (again)

    I've heard the lecture(s) before, so please spare us all a repeat,
    but I recently had occasion to (try to) unpack a "tar" archive which
    wants to create about 190000 files in one directory. On an HP PA-RISC
    workstation c3700 running HP-UX 11.11 it took about 35 minutes. On an
    HP IA64 workstation zx2000 running VMS V8.3-1H1, it's about eight hours
    into the VMSTAR job, it hasn't created half the files yet, and it does
    not seem to be getting faster as it goes.

    The file names all look like "020989f4d6c2f32768d0535c1815344d.zip",
    "11509dd158a696797eca5700f902ce03.zip", and so on.

    I just pass this along as a reminder that there's still some room
    for improvement in dealing with cases like this which, bad design or
    not, don't cause nearly so much trouble on other operating systems.

    ------------------------------------------------------------------------

    Steven M. Schweda sms@antinode-org
    382 South Warwick Street (+1) 651-699-9818
    Saint Paul MN 55105-2547

  2. Re: Too many files in one directory (again)

    Steven M. Schweda wrote:
    > I've heard the lecture(s) before, so please spare us all a repeat,
    > but I recently had occasion to (try to) unpack a "tar" archive which
    > wants to create about 190000 files in one directory. On an HP PA-RISC
    > workstation c3700 running HP-UX 11.11 it took about 35 minutes. On an
    > HP IA64 workstation zx2000 running VMS V8.3-1H1, it's about eight hours
    > into the VMSTAR job, it hasn't created half the files yet, and it does
    > not seem to be getting faster as it goes.
    >


    Cheer up, it will get slower as it goes! 190,000 files in a directory
    seems rather bizarre no matter what O/S you are talking about.


  3. Re: Too many files in one directory (again)

    Steven M. Schweda wrote:
    > I've heard the lecture(s) before, so please spare us all a repeat,
    > but I recently had occasion to (try to) unpack a "tar" archive which
    > wants to create about 190000 files in one directory. On an HP PA-RISC
    > workstation c3700 running HP-UX 11.11 it took about 35 minutes. On an
    > HP IA64 workstation zx2000 running VMS V8.3-1H1, it's about eight hours
    > into the VMSTAR job, it hasn't created half the files yet, and it does
    > not seem to be getting faster as it goes.
    >
    > The file names all look like "020989f4d6c2f32768d0535c1815344d.zip",
    > "11509dd158a696797eca5700f902ce03.zip", and so on.
    >
    > I just pass this along as a reminder that there's still some room
    > for improvement in dealing with cases like this which,...


    And some of them are under your control, such as using a large
    enough /ALLOCATION=n on the CRE/DIR command. I guess that it
    would also help to INIT the device with a large enough
    /HEADERS=n to begin with.

    Even if 190.000 files "works" on VMS, it might not be what
    VMS was primarily designed for...

    Jan-Erik.

  4. RE: Too many files in one directory (again)


    > -----Original Message-----
    > From: Steven M. Schweda [mailto:sms@antinode.org]
    > Sent: March 20, 2008 10:19 PM
    > To: Info-VAX@Mvb.Saic.Com
    > Subject: Too many files in one directory (again)
    >
    > I've heard the lecture(s) before, so please spare us all a repeat,
    > but I recently had occasion to (try to) unpack a "tar" archive which
    > wants to create about 190000 files in one directory. On an HP PA-RISC
    > workstation c3700 running HP-UX 11.11 it took about 35 minutes. On an
    > HP IA64 workstation zx2000 running VMS V8.3-1H1, it's about eight hours
    > into the VMSTAR job, it hasn't created half the files yet, and it does
    > not seem to be getting faster as it goes.
    >
    > The file names all look like "020989f4d6c2f32768d0535c1815344d.zip",
    > "11509dd158a696797eca5700f902ce03.zip", and so on.
    >
    > I just pass this along as a reminder that there's still some room
    > for improvement in dealing with cases like this which, bad design or
    > not, don't cause nearly so much trouble on other operating systems.
    >
    > ------------------------------------------------------------------------
    >
    > Steven M. Schweda sms@antinode-org
    > 382 South Warwick Street (+1) 651-699-9818
    > Saint Paul MN 55105-2547


    Just a WAG, but since you are doing mostly write activities, can we assume
    that you have removed disk highwater marking and set OpenVMS to the file
    system default that UNIX uses (write back) vs OpenVMS's file system default
    (write through)?

    Regards

    Kerry Main
    Senior Consultant
    HP Services Canada
    Voice: 613-254-8911
    Fax: 613-591-4477
    kerryDOTmainAThpDOTcom
    (remove the DOT's and AT)

    OpenVMS - the secure, multi-site OS that just works.




  5. Re: Too many files in one directory (again)

    "Steven M. Schweda" wrote:
    >
    > I've heard the lecture(s) before, so please spare us all a repeat,
    > but I recently had occasion to (try to) unpack a "tar" archive which
    > wants to create about 190000 files in one directory. On an HP PA-RISC
    > workstation c3700 running HP-UX 11.11 it took about 35 minutes. On an
    > HP IA64 workstation zx2000 running VMS V8.3-1H1, it's about eight hours
    > into the VMSTAR job, it hasn't created half the files yet, and it does
    > not seem to be getting faster as it goes.
    >
    > The file names all look like "020989f4d6c2f32768d0535c1815344d.zip",
    > "11509dd158a696797eca5700f902ce03.zip", and so on.


    Even (especially) if they are not in sequence, pre-allocating the
    directory would likely have been a big help here. Knowing that
    190,000(!!!) files were coming in, I'd have pre-allocated the directory
    to 190000 blocks to prevent the system having to find a new contiguous
    extent every time it needs to extend. I could always SET FILE/TRUNCATE
    it later, if needs be.

    > I just pass this along as a reminder that there's still some room
    > for improvement in dealing with cases like this which, bad design or
    > not, don't cause nearly so much trouble on other operating systems.


    This would cause a different set of problems on other systems, I should
    think.

    David J Dachtera
    (formerly dba) DJE Systems

  6. Re: Too many files in one directory (again)

    Out of curiosity, why would creating 190,000 files in a VMS directory be
    significantly slower than on Unix ?

    What does Unix do (or not do) that makes such a difference ?

  7. Re: Too many files in one directory (again)

    In a situation where one utility (TAR) is creating a zillion files,
    would it make a big difference to have the following:


    TAR unpacks in mydir1.dir
    You create mydir2 through mydir200

    You have a separate process which runs say every minute, and does a
    RENAME [.mydir1]*.* [.mydirX] where X increases with every run and goes
    back to 2 after 200.

    This way, all directories would be of more manageable size, and TAR
    would never need to add files to a humongous directory.
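
The rotation described above can be sketched in Python. This is only an illustration of the idea, not a tested VMS procedure: the names `drain_once`, `staging` and `buckets` are made up for the example, and it assumes nothing else is holding a file open at the moment it is moved (a real version would have to avoid files the extractor is still writing).

```python
import os
import shutil
import tempfile


def drain_once(staging, buckets, next_index):
    """Move every file currently in `staging` into buckets[next_index].

    Returns the index of the bucket to use on the following pass,
    wrapping around to the first bucket after the last one.
    """
    target = buckets[next_index]
    os.makedirs(target, exist_ok=True)
    # Snapshot the listing first so we do not modify the directory mid-scan.
    for entry in list(os.scandir(staging)):
        if entry.is_file():
            shutil.move(entry.path, os.path.join(target, entry.name))
    return (next_index + 1) % len(buckets)


def demo():
    """Simulate four batches of five files drained into three buckets."""
    root = tempfile.mkdtemp()
    staging = os.path.join(root, "mydir1")
    os.makedirs(staging)
    # Hypothetical bucket names mydir2..mydir4 (the post suggests up to 200).
    buckets = [os.path.join(root, f"mydir{n}") for n in range(2, 5)]
    index = 0
    for batch in range(4):
        for i in range(5):
            name = f"file_{batch}_{i}.zip"
            with open(os.path.join(staging, name), "w") as fh:
                fh.write("x")
        index = drain_once(staging, buckets, index)
    counts = [len(os.listdir(b)) for b in buckets]
    leftover = len(os.listdir(staging))
    shutil.rmtree(root)
    return counts, leftover
```

Each pass empties the staging directory, so the directory the extractor writes into never holds more than one interval's worth of files.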

  8. Re: Too many files in one directory (again)

    JF Mezei wrote:
    > Out of curiosity, why would creating 190,000 files in a VMS directory be
    > significantly slower than on Unix ?
    >
    > What does Unix do (or not do) that makes such a difference ?


    Unix does not store the directory entries in ascending alphanumeric
    order! The VMS developers, rightly or wrongly, assumed some degree of
    sanity in the users. I cannot imagine keeping 190,000 files in a
    directory. Performance, with a mere 70,000 files in a directory, is
    something I wish I could forget; I once had the dubious privilege of
    cleaning up such a mess! Without some help from DFU, I might still be
    deleting files, ten years later!!

    A sane VMS user, rather than create 190,000 files, might consider an
    indexed sequential file to hold the information. Unix, of course, since
    it lacks the very concept of "records" does not offer indexed sequential
    files or access records by key. The only operating systems that do, to
    my knowledge, are VMS and IBM O/S 360/370/whatever came after that.


  9. RE: Too many files in one directory (again)

    > -----Original Message-----
    > From: JF Mezei [mailto:jfmezei.spamnot@vaxination.ca]
    > Sent: March 21, 2008 2:18 PM
    > To: Info-VAX@Mvb.Saic.Com
    > Subject: Re: Too many files in one directory (again)
    >
    > Out of curiosity, why would creating 190,000 files in a VMS directory
    > be significantly slower than on Unix ?
    >
    > What does Unix do (or not do) that makes such a difference ?


    May not be specific to this issue, but a few things in general come to
    mind, especially with high write activities:

    - different design philosophy using indexed files vs individual files
    - default system and process parameters are often much different
    - default write philosophy - UNIX write back (speed) vs OpenVMS
    write through (data safety). You can set OpenVMS default for some scenarios
    to be write back with RMS_SEQFILE_WBH (sysgen> help sys_p RMS_SEQFILE_WBH)
    - disk high-water marking is security default on OpenVMS i.e. zero blocks
    before allocating to the process requesting additional space

    Regards

    Kerry Main
    Senior Consultant
    HP Services Canada
    Voice: 613-254-8911
    Fax: 613-591-4477
    kerryDOTmainAThpDOTcom
    (remove the DOT's and AT)

    OpenVMS - the secure, multi-site OS that just works.




  10. Re: Too many files in one directory (again)

    Richard B. Gilbert wrote:

    > Unix does not store the directory entries in ascending alphanumeric
    > order! The VMS developers, rightly or wrongly, assumed some degree of
    > sanity in the users.


    So when I do an "ls" in Unix, it does an in memory sort of the directory
    before listing it ?

    I guess it was a case of providing better performance for DIR vs CREATE
    while Unix provides better performance for CREATE vs DIR

  11. Re: Too many files in one directory (again)

    In article <08032021190335_2020CE0B@antinode.org>, sms@antinode.org (Steven M. Schweda) writes:
    > I've heard the lecture(s) before, so please spare us all a repeat,
    > but I recently had occasion to (try to) unpack a "tar" archive which
    > wants to create about 190000 files in one directory. On an HP PA-RISC
    > workstation c3700 running HP-UX 11.11 it took about 35 minutes. On an
    > HP IA64 workstation zx2000 running VMS V8.3-1H1, it's about eight hours
    > into the VMSTAR job, it hasn't created half the files yet, and it does
    > not seem to be getting faster as it goes.
    >
    > The file names all look like "020989f4d6c2f32768d0535c1815344d.zip",
    > "11509dd158a696797eca5700f902ce03.zip", and so on.


    Can you put a listing in a file using tar -t and then break down
    the output into multiple tar passes into multiple directories?

    I sure think I'd try that if I were in your situation. A little
    time editing to create a script might save a great many hours.
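
A rough Python sketch of the same idea, for illustration only. Instead of several tar passes driven by an edited listing, it makes one pass with the standard `tarfile` module and picks a subdirectory from the first characters of each file name. The function names, the two-character prefix, and the third sample file name are assumptions made for the example.

```python
import io
import os
import shutil
import tarfile
import tempfile


def bucket_for(name, width=2):
    """Pick a subdirectory name from the first `width` characters of the file name."""
    return os.path.basename(name)[:width]


def extract_bucketed(archive_path, dest_root, width=2):
    """Extract regular files into dest_root/<prefix>/<name>; return files per bucket."""
    counts = {}
    with tarfile.open(archive_path) as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Use only the base name so archive paths cannot escape dest_root.
            base = os.path.basename(member.name)
            bucket = bucket_for(base, width)
            target_dir = os.path.join(dest_root, bucket)
            os.makedirs(target_dir, exist_ok=True)
            source = tar.extractfile(member)
            with open(os.path.join(target_dir, base), "wb") as out:
                shutil.copyfileobj(source, out)
            counts[bucket] = counts.get(bucket, 0) + 1
    return counts


def demo():
    """Build a tiny archive and split it by name prefix."""
    root = tempfile.mkdtemp()
    archive = os.path.join(root, "sample.tar")
    names = [
        "020989f4d6c2f32768d0535c1815344d.zip",
        "11509dd158a696797eca5700f902ce03.zip",
        "02b1c3d4.zip",  # made-up name, only to show two files sharing a bucket
    ]
    with tarfile.open(archive, "w") as tar:
        for name in names:
            data = name.encode()
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    counts = extract_bucketed(archive, os.path.join(root, "out"))
    shutil.rmtree(root)
    return counts
```

With hex-looking names like the ones in the original post, a two-character prefix would give at most 256 subdirectories, so 190000 files would average well under a thousand per directory if the names are evenly spread.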


  12. Re: Too many files in one directory (again)

    On Fri, 21 Mar 2008, Richard B. Gilbert wrote:

    > A sane VMS user, rather than create 190,000 files, might consider an
    > indexed sequential file to hold the information. Unix, of course, since
    > it lacks the very concept of "records" does not offer indexed sequential
    > files or access records by key. The only operating systems that do, to
    > my knowledge, are VMS and IBM O/S 360/370/whatever came after that.


    I believe the OS HP most recently deprecated, MPE/iX, also featured
    similar record formats. I encountered this when editing a system file
    with MPE's port of "vi" (which I was more comfortable with than the
    EDIT/EDT style editor MPE uses by default) and ended up corrupting the
    file, requiring the MPE equivalent of CONVERT/FDL to put it back.


  13. Re: Too many files in one directory (again)

    On Mar 21, 2:00 pm, JF Mezei wrote:
    > Richard B. Gilbert wrote:
    > > Unix does not store the directory entries in ascending alphanumeric
    > > order! The VMS developers, rightly or wrongly, assumed some degree of
    > > sanity in the users.

    >
    > So when I do an "ls" in Unix, it does an in memory sort of the directory
    > before listing it ?


    It must, I would think.

    >
    > I guess it was a case of providing better performance for DIR vs CREATE
    > while Unix provides better performance for CREATE vs DIR


    Long ago I asked about the point of keeping directory entries in
    alphabetical order and Carl answered it by saying it speeds up finding
    a given file in a directory because one can then do a binary search.
    So it's more than just DIR. Also, it's more than just CREATE: DELETE,
    RENAME, COPY, etc.

    AEF
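
The trade-off between sorted lookup and sorted insertion can be illustrated with a toy model in Python. This treats a directory as a plain sorted list in memory, which is a simplification and not the on-disk VMS directory format; the function names are made up for the example. It counts how many existing entries have to move to make room for each new name.

```python
import bisect
import hashlib


def sorted_insert_cost(names):
    """Insert names into a sorted list; return the list and total entries shifted."""
    directory = []
    shifted = 0
    for name in names:
        pos = bisect.bisect_left(directory, name)
        # Everything at or after `pos` must move to make room for the new entry.
        shifted += len(directory) - pos
        directory.insert(pos, name)
    return directory, shifted


def lookup(directory, name):
    """Binary search in the sorted list; True if `name` is present."""
    pos = bisect.bisect_left(directory, name)
    return pos < len(directory) and directory[pos] == name


def demo(n=2000):
    """Compare hash-like names arriving in arbitrary order with the same names pre-sorted."""
    hashed = [hashlib.sha256(str(i).encode()).hexdigest()[:32] + ".zip" for i in range(n)]
    in_order = sorted(hashed)
    _, cost_hashed = sorted_insert_cost(hashed)
    directory, cost_sorted = sorted_insert_cost(in_order)
    found = lookup(directory, hashed[0])
    missing = lookup(directory, "missing.zip")
    return cost_hashed, cost_sorted, found, missing
```

In this model, names that arrive already in ascending order are always appended at the end and shift nothing, while hash-like names land at arbitrary positions, so the total amount of shifting grows roughly with the square of the number of files. Lookup stays a binary search either way.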

  14. Re: Too many files in one directory (again)

    JF Mezei wrote:
    > Richard B. Gilbert wrote:
    >
    >> Unix does not store the directory entries in ascending alphanumeric
    >> order! The VMS developers, rightly or wrongly, assumed some degree of
    >> sanity in the users.

    >
    > So when I do an "ls" in Unix, it does an in memory sort of the directory
    > before listing it ?


    Yes. You can ask 'ls' to not sort, using '-f', if you prefer.

    >
    > I guess it was a case of providing better performance for DIR vs CREATE
    > while Unix provides better performance for CREATE vs DIR


    Unix usually tries for the simplest/fastest solution to a problem.
    It leaves the complex stuff for user level programming instead
    of embedding everything into the kernel.

    It's just a different philosophy.



  15. Re: Too many files in one directory (again)

    JF Mezei wrote:
    > I guess it was a case of providing better performance for DIR vs CREATE
    > while Unix provides better performance for CREATE vs DIR


    I don't think it was for optimizing DIR so much as file lookup (for open calls)
    in general. That's why directory files have the nospan record attribute as
    well - to facilitate binary lookup of the directory entries. Of course, this
    optimization usually demonstrates little benefit due to the amount of
    directory-related caching.



    David L. Jones | Phone: (614) 271-6718
    Ohio State University | Internet:
    140 W. 19th St. | jonesd@ecr6.ohio-state.edu
    Columbus, OH 43210 | vman+@osu.edu

    Disclaimer: I'm looking for marbles all day long.

  16. Re: Too many files in one directory (again)

    JF Mezei wrote:
    > Richard B. Gilbert wrote:


    >>Unix does not store the directory entries in ascending alphanumeric
    >>order! The VMS developers, rightly or wrongly, assumed some degree of
    >>sanity in the users.


    > So when I do an "ls" in Unix, it does an in memory sort of the directory
    > before listing it ?


    > I guess it was a case of providing better performance for DIR vs CREATE
    > while Unix provides better performance for CREATE vs DIR


    Yes. I believe there is an option not to sort it.

    Also, tar will write the files in the order they are in the directory
    without sorting. Often for directories with many files that is faster
    than accessing them in sorted order.

    (Maybe not all unix, but most of them.)

    -- glen
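
The difference between reading a directory in stored order and sorting it afterwards can be shown with a small Python sketch. It uses the standard `os.scandir`, which returns entries in whatever order the file system hands them back; the helper names are made up for the example.

```python
import os
import shutil
import tempfile


def list_unsorted(path):
    """Return names in the order the file system returns them (no sorting)."""
    with os.scandir(path) as entries:
        return [entry.name for entry in entries]


def list_sorted(path):
    """Return the same names, sorted in memory after reading the directory."""
    return sorted(list_unsorted(path))


def demo():
    """Create a few files and list them both ways."""
    root = tempfile.mkdtemp()
    for name in ["b.zip", "a.zip", "c.zip"]:
        with open(os.path.join(root, name), "w") as fh:
            fh.write("x")
    unsorted_names = list_unsorted(root)
    sorted_names = list_sorted(root)
    shutil.rmtree(root)
    return unsorted_names, sorted_names
```

The unsorted listing contains the same names but its order depends on the file system, so the sort is extra work done by the listing program rather than by the file system at create time.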

