two approaches to file tree recursion with *at() syscalls - Linux

This is a discussion on two approaches to file tree recursion with *at() syscalls - Linux ; I've noticed that GNU programs (at least in Slackware) have two different approaches to file tree recursion using the new *at() syscalls. 1. Keep each directory level open as a file descriptor. The deeper you go, the more descriptors will ...

+ Reply to Thread
Results 1 to 11 of 11

Thread: two approaches to file tree recursion with *at() syscalls

  1. two approaches to file tree recursion with *at() syscalls

    I've noticed that GNU programs (at least in Slackware) have two different
    approaches to file tree recursion using the new *at() syscalls.

    1. Keep each directory level open as a file descriptor. The deeper you go,
    the more descriptors will be open. Advantage: it ascends even if ".." is
    corrupt. Disadvantage: descent is limited by how many descriptors can be
    open by a process.

    The "du -s" command uses method #1.

    2. Close each directory as subdirectories are descended. Ascent is done by
    opening ".." and closing the previous directory. Only 2 descriptors are
    needed for any depth. Advantage: descent depth is unlimited by how many
    descriptors can be open. Disadvantage: ascent can fail if ".." is corrupt.

    The "rm -fr" command uses method #2.

    I'm curious which method _you_ would choose. Are there any other insights
    useful for making this programming decision?

    --
    |WARNING: Due to extreme spam, googlegroups.com is blocked. Due to ignorance |
    | by the abuse department, bellsouth.net is blocked. If you post to |
    | Usenet from these places, find another Usenet provider ASAP. |
    | Phil Howard KA9WGN (email for humans: first name in lower case at ipal.net) |

  2. Re: two approaches to file tree recursion with *at() syscalls

    phil-news-nospam@ipal.net wrote:
    > I've noticed that GNU programs (at least in Slackware) have two different
    > approaches to file tree recursion using the new *at() syscalls.

    [snip]
    > I'm curious which method _you_ would choose. Are there any other insights
    > useful for making this programming decision?


    There are several other considerations.

    The only practical instance of "file tree" is a single non-directory file.
    Any directory with a parent link ".." has a loop! Both hard links and soft
    links can create other, non-trivial loops. So "file graph" is better
    terminology.

    Is the file graph contained in a single block device (no mounts, no
    off-device symlinks)?

    Are you allowed to create temporary files or directories in the file graph?

    Is the file graph constant and unchanging? At least no mount+unmount,
    no unionfs, no bind mounts, etc. ?

    Is the file graph on a CD-ROM or DVD? Mastering programs generally are
    bug-ridden, particularly with respect to extensions of ISO-9660 (RockRidge,
    Joliet, MtRanier, etc.) It is *likely* that the file graph is malformed:
    bad parent, wrong link count for hardlinks, etc.

    Is the file graph on a removable flash memory device?
    Then breakage is likely due to users removing the device
    while still mounted.

    Is the file graph NTFS? With files that have multiple streams?
    Very few programs understand the intricacies.

    Is the file graph reiserfs, or reiser4? The meaning of "file"
    and "directory" morphs quite a lot.

    Consider keeping a record of the pathname of the current location,
    without regard to PATH_MAX. Consider creating temporary "bread crumb"
    files every few levels, with a pointer (including name, device,
    inode number) to the previous crumb.

    --


  3. Re: two approaches to file tree recursion with *at() syscalls

    On Sat, 19 Jul 2008 17:22:17 -0700 John Reiser wrote:
    | phil-news-nospam@ipal.net wrote:
    |> I've noticed that GNU programs (at least in Slackware) have two different
    |> approaches to file tree recursion using the new *at() syscalls.
    | [snip]
    |> I'm curious which method _you_ would choose. Are there any other insights
    |> useful for making this programming decision?
    |
    | There are several other considerations.
    |
    | The only practical instance of "file tree" is a single non-directory file.
    | Any directory with a parent link ".." has a loop! Both hard links and soft
    | links can create other, non-trivial loops. So "file graph" is better
    | terminology.
    |
    | Is the file graph contained in a single block device (no mounts, no
    | off-device symlinks)?

    No guarantees.


    | Are you allowed to create temporary files or directories in the file graph?

    Sure.


    | Is the file graph constant and unchanging? At least no mount+unmount,
    | no unionfs, no bind mounts, etc. ?

    No guarantees.


    | Is the file graph on a CD-ROM or DVD? Mastering programs generally are
    | bug-ridden, particularly with respect to extensions of ISO-9660 (RockRidge,
    | Joliet, MtRanier, etc.) It is *likely* that the file graph is malformed:
    | bad parent, wrong link count for hardlinks, etc.

    No guarantees.


    | Is the file graph on a removable flash memory device?
    | Then breakage is likely due to users removing the device
    | while still mounted.

    No guarantees.


    | Is the file graph NTFS? With files that have multiple streams?
    | Very few programs understand the intricacies.

    No guarantees.


    | Is the file graph reiserfs, or reiser4? The meaning of "file"
    | and "directory" morphs quite a lot.

    No guarantees.


    | Consider keeping a record of the pathname of the current location,
    | without regard to PATH_MAX. Consider creating temporary "bread crumb"
    | files every few levels, with a pointer (including name, device,
    | inode number) to the previous crumb.

    I am planning to keep the full path as a linked list of node name elements.
    But I do not plan to use that to open anything. It is just a way to output
    the current path for messages.

    --
    |WARNING: Due to extreme spam, googlegroups.com is blocked. Due to ignorance |
    | by the abuse department, bellsouth.net is blocked. If you post to |
    | Usenet from these places, find another Usenet provider ASAP. |
    | Phil Howard KA9WGN (email for humans: first name in lower case at ipal.net) |

  4. Re: two approaches to file tree recursion with *at() syscalls

    phil-news-nospam@ipal.net writes:

    > I've noticed that GNU programs (at least in Slackware) have two different
    > approaches to file tree recursion using the new *at() syscalls.


    I hadn't heard of these new syscalls -- where are they documented?

  5. Re: two approaches to file tree recursion with *at() syscalls

    > | Is the file graph contained in a single block device (no mounts, no
    > | off-device symlinks)?
    >
    > No guarantees. [For this or any other "nasty" condition.]


    In such cases then neither of the two proposed recursion solutions (keep open
    directories on path from root; use just one open directory, plus "..")
    is safe to walk the graph. Because the graph may span devices,
    may contain arbitrary loops, may be mal-formed, may be on a filesystem
    with a different notion of "directory", and may be changing as you walk,
    then you must keep a history of the device number, inode number, name,
    and creation+modification times of each entry, to help you navigate and
    detect and avoid getting lost in loops. Even then, a clever concurrent
    adversary can force an infinite traversal within a finite graph,
    with probability greater than zero.

    --

  6. Re: two approaches to file tree recursion with *at() syscalls

    On Sat, 19 Jul 2008 20:04:37 -0600 Joe Pfeiffer wrote:
    | phil-news-nospam@ipal.net writes:
    |
    |> I've noticed that GNU programs (at least in Slackware) have two different
    |> approaches to file tree recursion using the new *at() syscalls.
    |
    | I hadn't heard of these new syscalls -- where are they documented?

    The usual answer is in the source code (I always hate that answer).

    But most of them are in my man pages, too (Slackware 12.0).

    I read somewhere a while back that they were proposed as a means to get around
    a big mistake in the design of threading (being that all threads have to share
    the same current directory context). Without that issue, one can simply do a
    change of current directory to chase a directory tree. You can't do that in
    threads, although it would be possible to simulate the *at() functions with a
    locking of all threads for the duration of fchdir/open/fchdir to temporarily
    change the directory to what is in a given descriptor (then change it back
    before unlocking). But that slows down threads.

    --
    |WARNING: Due to extreme spam, googlegroups.com is blocked. Due to ignorance |
    | by the abuse department, bellsouth.net is blocked. If you post to |
    | Usenet from these places, find another Usenet provider ASAP. |
    | Phil Howard KA9WGN (email for humans: first name in lower case at ipal.net) |

  7. Re: two approaches to file tree recursion with *at() syscalls

    On Sun, 20 Jul 2008 06:06:26 -0700 John Reiser wrote:
    |> | Is the file graph contained in a single block device (no mounts, no
    |> | off-device symlinks)?
    |>
    |> No guarantees. [For this or any other "nasty" condition.]
    |
    | In such cases then neither of the two proposed recursion solutions (keep open
    | directories on path from root; use just one open directory, plus "..")
    | is safe to walk the graph. Because the graph may span devices,
    | may contain arbitrary loops, may be mal-formed, may be on a filesystem
    | with a different notion of "directory", and may be changing as you walk,
    | then you must keep a history of the device number, inode number, name,
    | and creation+modification times of each entry, to help you navigate and
    | detect and avoid getting lost in loops. Even then, a clever concurrent
    | adversary can force an infinite traversal within a finite graph,
    | with probability greater than zero.

    I know that neither method is perfect. If one of them was, I'd use that.
    What I want to figure out is which method would be more appropriate to use
    in general for most cases, or in a specific case.

    I'm writing a program to do file copying of a tree to another. Mostly it is
    intended as a fast and ordered copy program. But I am also making it handle
    target replacement, which can involve deleting a directory subtree at the
    target, as well. So the code could have tree walk both for copying source
    to target, as well as deletion of (parts of) the target.

    Maybe the specific case(s) will be better served by one of these methods.

    The uses I would apply this program to would not have the tree changing
    during the copying (or of the deletion of a target). Someone else might
    try to do that. It is possible that the source tree could span filesystem
    mounts. I will be tracking device and inode already as part of replicating
    the source's hardlink relationships into the target. It seems that mounts
    simulate fake ".." entries that always go to the parent of the mount point,
    or to "/" itself for the "/" mount point.

    I'm contemplating some kind of hybrid method that uses an open descriptor of
    each directory level, then switch to the other method of closing directories
    and reopen them coming back up when descriptors run out. The catch with that
    is I need a few reserve descriptors, so I need to transition between methods
    before actually running out. I'll need at least 3 reserve descriptors. I'd
    have to trust sysconf(_SC_OPEN_MAX) or something to count them on my own.

    --
    |WARNING: Due to extreme spam, googlegroups.com is blocked. Due to ignorance |
    | by the abuse department, bellsouth.net is blocked. If you post to |
    | Usenet from these places, find another Usenet provider ASAP. |
    | Phil Howard KA9WGN (email for humans: first name in lower case at ipal.net) |

  8. Re: two approaches to file tree recursion with *at() syscalls

    phil-news-nospam@ipal.net writes:

    > On Sat, 19 Jul 2008 20:04:37 -0600 Joe Pfeiffer wrote:
    > |
    > | I hadn't heard of these new syscalls -- where are they documented?
    >
    > The usual answer is in the source code (I always hate that answer).
    >
    > But most of them are in my man pages, too (Slackware 12.0).


    I'm sorry, I'm more ignorant of them than you realize -- could you
    give me the complete name of one of the functions so I can use that to
    try to hunt?

  9. Re: two approaches to file tree recursion with *at() syscalls

    > the complete name of one of the functions

    faccessatd, fchmodat, fchownat, fstatat, futimesat, linkat,
    mkdirat, mknodatd, openat, readlinkat, renameat, symlinkat,
    unlinkat, mkfifoat

    Somewhat related ["system call" with extra argument for
    more 'powerful' interface], there are pread and pwrite
    which avoid a separate lseek, and do not move the pointer.

    --

  10. Re: two approaches to file tree recursion with *at() syscalls

    John Reiser writes:

    >> the complete name of one of the functions

    >
    > faccessatd, fchmodat, fchownat, fstatat, futimesat, linkat,
    > mkdirat, mknodatd, openat, readlinkat, renameat, symlinkat,
    > unlinkat, mkfifoat
    >
    > Somewhat related ["system call" with extra argument for
    > more 'powerful' interface], there are pread and pwrite
    > which avoid a separate lseek, and do not move the pointer.


    thanks

  11. Re: two approaches to file tree recursion with *at() syscalls



    Joe Pfeiffer wrote:
    > John Reiser writes:
    >
    > >> the complete name of one of the functions

    > >
    > > faccessatd, fchmodat, fchownat, fstatat, futimesat, linkat,
    > > mkdirat, mknodatd, openat, readlinkat, renameat, symlinkat,
    > > unlinkat, mkfifoat
    > >
    > > Somewhat related ["system call" with extra argument for
    > > more 'powerful' interface], there are pread and pwrite
    > > which avoid a separate lseek, and do not move the pointer.

    >

    Keep Linux free
    Spam and smoke free as well




+ Reply to Thread