[patch 00/10] vfs: add helpers to check r/o bind mounts - Kernel

This is a discussion on [patch 00/10] vfs: add helpers to check r/o bind mounts - Kernel ; On Thu, Apr 03, 2008 at 12:40:43AM +0100, Al Viro wrote: > Anyway, what the hell for? It's more complex and buys you nothing > useful. BTW, what it gives you is an extra headache for scenarios like task A: ...

+ Reply to Thread
Page 2 of 2 FirstFirst 1 2
Results 21 to 39 of 39

Thread: [patch 00/10] vfs: add helpers to check r/o bind mounts

  1. Re: [patch 01/10] vfs: add path_create() and path_mknod()

    On Thu, Apr 03, 2008 at 12:40:43AM +0100, Al Viro wrote:
    > Anyway, what the hell for? It's more complex and buys you nothing
    > useful.


    BTW, what it gives you is an extra headache for scenarios like

    task A:
    write(fd, ...)
    task B: /* shares descriptor table with A */
    close(fd)
    task C: umount()
    ....
    task A: still writing

    At the very least, you want "that thing is still busy" on normal umount -
    we are still in the middle of write(2) and hell knows how long it's going
    to last. So you need to play with refcount of vfsmount in a very nasty
    way, for all your pains. Again, what for?
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [patch 01/10] vfs: add path_create() and path_mknod()


    On Thu, 2008-04-03 at 00:47 +0100, Al Viro wrote:
    > On Thu, Apr 03, 2008 at 12:40:43AM +0100, Al Viro wrote:
    > > Anyway, what the hell for? It's more complex and buys you nothing
    > > useful.

    >
    > BTW, what it gives you is an extra headache for scenarios like
    >
    > task A:
    > write(fd, ...)
    > task B: /* shares descriptor table with A */
    > close(fd)
    > task C: umount()
    > ...
    > task A: still writing
    >
    > At the very least, you want "that thing is still busy" on normal umount -
    > we are still in the middle of write(2) and hell knows how long it's going
    > to last. So you need to play with refcount of vfsmount in a very nasty
    > way, for all your pains.


    We already call fget_light()/fput_light() around the whole call to
    vfs_write(). Substituting a call to something which takes a reference to
    the new structure is trivial.

    Once the write() syscall is done, and the file is closed, why should we
    refuse the umount request? There should be nothing more going on that
    depends on the namespace.

    > Again, what for?


    It allows you to get rid of the vfsmount 'argument' when opening a file,
    which again lowers the barrier for stacking filesystems.

    As far as the filesystems themselves are concerned, the effect is to
    enforce your assertion that file operations should not depend on the
    namespace.


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [patch 01/10] vfs: add path_create() and path_mknod()

    On Wed, Apr 02, 2008 at 08:42:09PM -0400, Trond Myklebust wrote:

    > > At the very least, you want "that thing is still busy" on normal umount -
    > > we are still in the middle of write(2) and hell knows how long it's going
    > > to last. So you need to play with refcount of vfsmount in a very nasty
    > > way, for all your pains.

    >
    > We already call fget_light()/fput_light() around the whole call to
    > vfs_write(). Substituting a call to something which takes a reference to
    > the new structure is trivial.


    Huh? So you want an extra layer of indirection? descriptor table -> that
    one -> struct file? And refcounting these puppies?

    > It allows you to get rid of the vfsmount 'argument' when opening a file,
    > which again lowers the barrier for stacking filesystems.


    I don't see how that would fix the fundamental breakage in those, but
    anyway... (and yes, ecryptfs has interesting issues, but the look of it).

    > As far as the filesystems themselves are concerned, the effect is to
    > enforce your assertion that file operations should not depend on the
    > namespace.


    I really doubt that it's worth doing in this area... "Don't use ->f_vfsmnt
    in fs code" is easily enforced and struct file is really used outside of
    filesystem code in fs-independent ways.

    IOW, I don't believe that it's worth introducing a new layer between descriptor
    table and files. BTW, that'll complicate union-mount handling (real ones, not
    unionfs under different name) and quite a few other things...
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [patch 01/10] vfs: add path_create() and path_mknod()

    In message <1207183329.20254.49.camel@heimdal.trondhjem.org>, Trond Myklebust writes:
    >
    > On Thu, 2008-04-03 at 00:47 +0100, Al Viro wrote:

    [...]
    > > Again, what for?

    >
    > It allows you to get rid of the vfsmount 'argument' when opening a file,
    > which again lowers the barrier for stacking filesystems.
    >
    > As far as the filesystems themselves are concerned, the effect is to
    > enforce your assertion that file operations should not depend on the
    > namespace.


    I'd be delighted the day I won't have to deal with vfsmounts AT ALL in
    any stacked f/s.

    Erez.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [patch 01/10] vfs: add path_create() and path_mknod()

    On Wed, Apr 02, 2008 at 08:47:06PM -0400, Erez Zadok wrote:
    > In message <1207183329.20254.49.camel@heimdal.trondhjem.org>, Trond Myklebust writes:
    > >
    > > On Thu, 2008-04-03 at 00:47 +0100, Al Viro wrote:

    > [...]
    > > > Again, what for?

    > >
    > > It allows you to get rid of the vfsmount 'argument' when opening a file,
    > > which again lowers the barrier for stacking filesystems.
    > >
    > > As far as the filesystems themselves are concerned, the effect is to
    > > enforce your assertion that file operations should not depend on the
    > > namespace.

    >
    > I'd be delighted the day I won't have to deal with vfsmounts AT ALL in
    > any stacked f/s.


    vfsmounts or the fact that any fs may be mounted in many places?
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [patch 01/10] vfs: add path_create() and path_mknod()

    In message <20080403010016.GU9785@ZenIV.linux.org.uk>, Al Viro writes:
    > On Wed, Apr 02, 2008 at 08:47:06PM -0400, Erez Zadok wrote:
    > > In message <1207183329.20254.49.camel@heimdal.trondhjem.org>, Trond Myklebust writes:
    > > >
    > > > On Thu, 2008-04-03 at 00:47 +0100, Al Viro wrote:

    > > [...]
    > > > > Again, what for?
    > > >
    > > > It allows you to get rid of the vfsmount 'argument' when opening a file,
    > > > which again lowers the barrier for stacking filesystems.
    > > >
    > > > As far as the filesystems themselves are concerned, the effect is to
    > > > enforce your assertion that file operations should not depend on the
    > > > namespace.

    > >
    > > I'd be delighted the day I won't have to deal with vfsmounts AT ALL in
    > > any stacked f/s.

    >
    > vfsmounts or the fact that any fs may be mounted in many places?


    I think the former. The fact that any f/s can be mounted in many places,
    should be fine for a stacked f/s (topology changes as we discussed before
    are a different problem).

    Since you've hinted of major vfs changes post-25, here's my take.

    Right now I (and to a similar extent ecryptfs too) needs to keep track of
    vfsmounts for various reasons:

    - to grab a reference so the lower filesystems/mounts won't disappear on me
    - to pass it to dentry_open (opening the lower file)
    - some fs ops pass a vfsmount (umount_begin, show_options)
    - some fs ops or vfs helpers require a nameidata or struct path which embed
    a vfsmount inside
    - sometimes it's ok to pass NULL for those things, sometimes it's not ok

    Often it also appears that a vfsmount and a dentry tuple are needed
    together, hence struct path. So we recently started using struct path in a
    few places, and using path_get/put. The need to deal with
    pairs by hand can lead to interesting bugs, such as the recent ecryptfs
    bugfix which had to swap the order of a dput() and mntput() sequence.

    The fact that several vfs ops and helpers take different namespace-related
    structures (dentry, nameidata, vfsmount, path), is confusing and hard to
    track. Is there a way to reduce the number of those structures to 1-2 at
    most?

    For a stackable f/s, it's important the the VFS ops *and* the VFS helpers
    (vfs_*, path_*, dentry_open, etc.) use the same namespace structures: then
    there's more symmetry b/t the objects passed to a stacked f/s, and the
    objects it has to pass to VFS helpers (and FWIW, every such object should
    ideally leave a "void *private" field for f/s specific extensibility).

    In the longer run, is there a way that a stackable f/s could traverse the
    namespace below it w/o knowing or caring how they are composed (assorted r-w
    and r-o bind mounts and such)?

    Cheers,
    Erez.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [patch 01/10] vfs: add path_create() and path_mknod()

    On Wed, Apr 02, 2008 at 09:37:01PM -0400, Erez Zadok wrote:
    > Since you've hinted of major vfs changes post-25, here's my take.
    >
    > Right now I (and to a similar extent ecryptfs too) needs to keep track of
    > vfsmounts for various reasons:
    >
    > - to grab a reference so the lower filesystems/mounts won't disappear on me


    Umm... Strictly speaking that's not true; you can grab an active reference
    to superblock and then superblock will stay. vfsmount is usually more
    convenient, but that's it.

    > - to pass it to dentry_open (opening the lower file)
    > - some fs ops pass a vfsmount (umount_begin, show_options)


    Thank Trond for the former, BTW ;-) Both methods will be back to sanity;
    for umount_begin() the only obstacle is cifs mess, where we do unconditional
    (and bogus) work at each umount(2). With that fixed, we'll be back to
    calling it only for forced umount and passing it only superblock.
    ->show_options() can and should revert immediately after 2.6.25; thanks
    for pointing that one out.

    > - some fs ops or vfs helpers require a nameidata or struct path which embed
    > a vfsmount inside


    *ONLY* ->follow_link(). Which has to, by definition... The rest will be
    gone by .26.

    > - sometimes it's ok to pass NULL for those things, sometimes it's not ok


    See above. This crap will be gone. For ->follow_link() nobody is allowed
    to pass NULL as nameidata, period.

    > In the longer run, is there a way that a stackable f/s could traverse the
    > namespace below it w/o knowing or caring how they are composed (assorted r-w
    > and r-o bind mounts and such)?


    Eh? Explain, please...
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [patch 01/10] vfs: add path_create() and path_mknod()

    In message <20080403014622.GW9785@ZenIV.linux.org.uk>, Al Viro writes:
    > On Wed, Apr 02, 2008 at 09:37:01PM -0400, Erez Zadok wrote:
    > > Since you've hinted of major vfs changes post-25, here's my take.
    > >
    > > Right now I (and to a similar extent ecryptfs too) needs to keep track of
    > > vfsmounts for various reasons:
    > >
    > > - to grab a reference so the lower filesystems/mounts won't disappear on me

    >
    > Umm... Strictly speaking that's not true; you can grab an active reference
    > to superblock and then superblock will stay. vfsmount is usually more
    > convenient, but that's it.


    Yes, I do grab both vfsmount and superblock refs. I found out that grabbing
    vfsmount refs wasn't enough to prevent "umount -l" from detaching the f/s on
    which I'm stacked on. So now at mount time (or branch management time), I
    grab those super-refs, as I have them after a successful path_lookup. And,
    since I keep a list of the branches I'm stacked on, I know precisely which
    superblocks' references I need to release when unionfs is unmounted.

    But what do I do if I descend into another lower superblock while looking up
    a lower directory? How do keep track of the superblock refs now? I'd
    basically have to memorize the hierarchy of mounted superblocks somehow?
    How would I know when to release those refs? (hmm, maybe I can rely on
    d_mounted or the like?)

    > > - sometimes it's ok to pass NULL for those things, sometimes it's not ok

    >
    > See above. This crap will be gone. For ->follow_link() nobody is allowed
    > to pass NULL as nameidata, period.


    There's been talk in the past of splitting nameidata into intent structure
    and all the rest. Is that also part of your plan for 26? Intents are
    indeed very useful in ->lookup; the rest I can do without.

    > > In the longer run, is there a way that a stackable f/s could traverse the
    > > namespace below it w/o knowing or caring how they are composed (assorted r-w
    > > and r-o bind mounts and such)?

    >
    > Eh? Explain, please...


    So, in ->lookup, I essentially have to do a lookup on the corresponding
    lower objects. I get a nameidata, I have to create my own nameidata, and
    pass it to lower ->lookup calls. Can the API be changed so I wouldn't need
    to get or pass a nameidata? (maybe just intent struct).

    Also in ->lookup I call lookup_one_len, which is nice b/c it doesn't involve
    vfsmounts or nameidata. But lookup_one_len doesn't use the same prototype
    as ->lookup, so there's some asymmetry here (I often like to see that a
    stacked op passed to the lower f/s uses the same API as what was passed to
    the op in the first place).

    Ironically, since lookup_one_len doesn't involve vfsmounts, but I need them
    for other reasons, I'm forced to live with NULL vfsmounts in some cases, or
    refer to the lower vfsmounts I already had for my root dentry (that makes
    transparently descending into a different vfsmount challenging, if not
    inconsistent).

    For many fs ops, the prototype of the ->op has a perfectly symmetric vfs_op()
    helper. For example, ->mkdir(inode,dentry,mode) and
    vfs_mkdir(inode,dentry,mode). But nothing as simple for lookup, one of the
    most complex ops in unionfs.

    BTW, to be honest, some of extra complications in unionfs (and
    unionfs_lookup) have to do with .wh.* files (whiteouts). But I hope that'll
    get simpler once we have native WH support in the all the major filesystems.

    Cheers,
    Erez.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [patch 01/10] vfs: add path_create() and path_mknod()

    On Wed, Apr 02, 2008 at 10:21:24PM -0400, Erez Zadok wrote:
    > Yes, I do grab both vfsmount and superblock refs. I found out that grabbing
    > vfsmount refs wasn't enough to prevent "umount -l" from detaching the f/s on
    > which I'm stacked on. So now at mount time (or branch management time), I
    > grab those super-refs, as I have them after a successful path_lookup. And,
    > since I keep a list of the branches I'm stacked on, I know precisely which
    > superblocks' references I need to release when unionfs is unmounted.


    How the devil would holding a superblock prevent umount -l?

    > But what do I do if I descend into another lower superblock while looking up
    > a lower directory? How do keep track of the superblock refs now? I'd
    > basically have to memorize the hierarchy of mounted superblocks somehow?
    > How would I know when to release those refs? (hmm, maybe I can rely on
    > d_mounted or the like?)
    >
    > > > - sometimes it's ok to pass NULL for those things, sometimes it's not ok

    > >
    > > See above. This crap will be gone. For ->follow_link() nobody is allowed
    > > to pass NULL as nameidata, period.

    >
    > There's been talk in the past of splitting nameidata into intent structure
    > and all the rest. Is that also part of your plan for 26? Intents are
    > indeed very useful in ->lookup; the rest I can do without.


    intents will die. There'll be a method for final step of lookup + open,
    but that's it (and it'll take preallocated struct file as one of the
    arguments). Please, explain what you want to do with intents, because
    as far as I'm concerned these had been a mistake for a lot of reasons.

    > Ironically, since lookup_one_len doesn't involve vfsmounts, but I need them
    > for other reasons, I'm forced to live with NULL vfsmounts in some cases, or
    > refer to the lower vfsmounts I already had for my root dentry (that makes
    > transparently descending into a different vfsmount challenging, if not
    > inconsistent).


    Details, please. If you just want a snapshot of vfsmount tree, then by
    all means take a bloody snapshot. collect_mounts() is there for purpose.
    If you want mount/umount/etc. changes affect what you have, then I really
    would like to see the semantics you want. Some variation on shared-subtree
    might be close to that...
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: [patch 01/10] vfs: add path_create() and path_mknod()

    > > It is an interface with at least 3 callers at the moment: syscalls,
    > > nfsd and ecryptfs and unionfs in the future. It is an interface
    > > because all external accesses to the filesystem *are* done through the
    > > namespace and not directly on filesystem internals.
    > >
    > > Such direct access would be conceivable, but I don't think we should
    > > base the design on theoretically possible uses. If those uses appear,
    > > we can change the interface to fit that.
    > >
    > > This "move everything to callers" thing is wrong because it just
    > > results in bloat and bugs, none of which is desirable in the kernel.

    >
    > I disagree. First of all, clear separation between operations on
    > _filesystem_, which should all be namespace-agnostic and things
    > that depend on vfsmount is a Good Thing(tm). Think of that as
    > of separation between server (superblock and everything related
    > to it, starting with dentry tree) and clients; mixing those is a
    > bloody bad idea.


    Separation: yes. Exporting it as an interface: I'm not convinced.

    If we ever want to rely on r/o mounts doing what they're supposed to
    do, ie. not allowing modifications to the filesystem, we'd better
    expose an interface that does just that. Relying on callers won't
    work (as demonstrated in practice).

    Wasn't one of the points behind the r/o bind patchset is to make
    remounting read-only race free? If all mounts of a super block are
    read-only, then the super block itself can be safely marked read-only.

    If we allow external entities such as filesystem stacks to bypass this
    rule and access the filesystem directly, then the whole thing is
    basically worthless.

    > What's more, I'm not at all convinced that for nfsd it's the right
    > set of checks, to start with.


    Well, what are the right checks then? From the r/o consistency pov,
    these are the right checks. From NFS' pov there may be places where
    we might want to take a write-ref spanning several operations. But
    that's OK, the checks are cheap (if not, we have other problems).

    > The same goes for future users.


    There's already one place in that interface where users need to know
    the vfsmount which they are operating on: opening a file (and hence
    subsequent I/O on that file). Besides my other points, what is the
    sense in an interface, half of which operates on the namespace, and
    the other half on the filesystem?

    > As for ecryptfs, looking at their "lower_mnt" thing... I'd say that
    > it's a nonsense. For one thing, duplicating a reference into ever
    > dentry out there (and it's simply duplicated) makes no sense whatsoever.
    > For another... I'm not at all sure that remount of the underlying
    > vfsmount r/o *should* take that sucker read-only. And if it should,
    > it's clearly an action that should have a visible effect on superblock
    > flags of ecryptfs.


    If it wants to handle that case nicely, it can monitor /proc/mounts
    and reflect it in it's superblock flags. And it can take a write-ref
    on the underlying fs if it has outstanding dirtyness. But we should
    not _rely_ on ecryptfs to ensure that it's never writing to a
    read-only mount.

    Miklos
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: [patch 01/10] vfs: add path_create() and path_mknod()


    On Wed, 2008-04-02 at 21:54 +0100, Al Viro wrote:
    > On Wed, Apr 02, 2008 at 10:12:48PM +0200, Miklos Szeredi wrote:
    > > From: Miklos Szeredi
    > >
    > > R/o bind mounts require operations which modify the filesystem to be
    > > wrapped in mnt_want_write()/mnt_drop_write(). Create helpers which do
    > > this, so callers won't need to bother, and more importantly, cannot
    > > forget! Call these path_*, analogous to vfs_*. Where there are no
    > > callers of vfs_* left, make them static.
    > >
    > > This series is a cleanup, as well as fixing several places (mostly in
    > > nfsd) where mnt_{want,drop}_write() were missing.
    > >
    > > It will also help with merging You Know What(*) security module, which
    > > needs to know the path within the namespace, and not just within the
    > > filesystem. These helpers will allow the security hooks to be in a
    > > common place, and need not be repeated in all callers.

    >
    > Rot.
    >
    > Places in nfsd must be fixed, LSM hooks *moved* *to* *callers*.


    Please don't move the existing security_inode hooks - they are at
    exactly the right location for SELinux and likely other security modules
    too - after the DAC checks (so no noise in the audit logs from accesses
    that would be denied by DAC anyway), and right before the inode method
    is invoked (so easy to see that the permission check is always invoked
    before access). Moving the existing hooks will break the first property
    unless the DAC checks are also moved and will make it harder to check
    and maintain the second property.

    > And
    > really, by that point I absolutely do not give a damn for these clowns.
    > "Help with merging" implies that they can't be arsed to do _anything_
    > with their code. Just as they could not be arsed to react to any
    > comments (including civil ones) in any manner except "wait for a month
    > and repost without changes". Sod them.
    >
    > And no, "make static where all (two of) current callers have vfsmount"
    > is non-starter. path_...() is (at most) a convenience helper, not a
    > fundamental interface - simply because new callers should not be
    > forced into inventing a fake vfsmount just to use that.
    >
    > I'll look into missing mnt_..._write() in nfsd and fix that. The rest...
    > Sorry.
    > --
    > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
    > the body of a message to majordomo@vger.kernel.org
    > More majordomo info at http://vger.kernel.org/majordomo-info.html

    --
    Stephen Smalley
    National Security Agency

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. Re: [patch 01/10] vfs: add path_create() and path_mknod()

    In message , Miklos Szeredi writes:
    [...]
    > If it wants to handle that case nicely, it can monitor /proc/mounts
    > and reflect it in it's superblock flags. And it can take a write-ref
    > on the underlying fs if it has outstanding dirtyness. But we should
    > not _rely_ on ecryptfs to ensure that it's never writing to a
    > read-only mount.


    I'm all for reducing the need for stackable f/s such as ecryptfs from having
    to remember to manage the write counts. A stackable filesystem has to
    emulate some parts of the VFS when it calls into a lower f/s: in that case,
    having useful VFS helpers that do as much of the work as possible, saves a
    lot of hassle.

    > Miklos


    Cheers,
    Erez.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  13. Re: [patch 01/10] vfs: add path_create() and path_mknod()

    In message <20080403023239.GY9785@ZenIV.linux.org.uk>, Al Viro writes:
    > On Wed, Apr 02, 2008 at 10:21:24PM -0400, Erez Zadok wrote:
    > > Yes, I do grab both vfsmount and superblock refs. I found out that grabbing
    > > vfsmount refs wasn't enough to prevent "umount -l" from detaching the f/s on
    > > which I'm stacked on. So now at mount time (or branch management time), I
    > > grab those super-refs, as I have them after a successful path_lookup. And,
    > > since I keep a list of the branches I'm stacked on, I know precisely which
    > > superblocks' references I need to release when unionfs is unmounted.

    >
    > How the devil would holding a superblock prevent umount -l?


    It doesn't prevent umount -l; but it prevents the lower superblock from
    being kfree()d if there were no references left to it, so I don't try to
    accessed freed memory (and get 6b6b6b6b oopses :-)

    > > But what do I do if I descend into another lower superblock while looking up
    > > a lower directory? How do keep track of the superblock refs now? I'd
    > > basically have to memorize the hierarchy of mounted superblocks somehow?
    > > How would I know when to release those refs? (hmm, maybe I can rely on
    > > d_mounted or the like?)
    > >
    > > > > - sometimes it's ok to pass NULL for those things, sometimes it's not ok
    > > >
    > > > See above. This crap will be gone. For ->follow_link() nobody is allowed
    > > > to pass NULL as nameidata, period.

    > >
    > > There's been talk in the past of splitting nameidata into intent structure
    > > and all the rest. Is that also part of your plan for 26? Intents are
    > > indeed very useful in ->lookup; the rest I can do without.

    >
    > intents will die. There'll be a method for final step of lookup + open,
    > but that's it (and it'll take preallocated struct file as one of the
    > arguments).


    How much of nameidata will still need to be passed to f/s ops? How many vfs
    helpers will still require a nameidata? Hopefully as few as possible.

    Is this preallocated struct file a replacement to the nd->intent.open.file
    that could have returned an allocated struct file? (I could never really
    tell what's the right thing to do with that field.)

    > Please, explain what you want to do with intents, because
    > as far as I'm concerned these had been a mistake for a lot of reasons.


    I said above "intents are indeed very useful" only because they were:

    1. required to be passed to vfs_* methods, which made it hard if the ->op I
    was in didn't have an intent (I have to create temp intents for those).

    2. apparently heavily used in nfs4, to a point where if I stacked on nfs4
    and didn't pass a correctly formatted nameidata, I'd get an oops deep
    inside nfs4 code. (I think some of those nfs4 requirements went away,
    not sure what's left.).

    > > Ironically, since lookup_one_len doesn't involve vfsmounts, but I need them
    > > for other reasons, I'm forced to live with NULL vfsmounts in some cases, or
    > > refer to the lower vfsmounts I already had for my root dentry (that makes
    > > transparently descending into a different vfsmount challenging, if not
    > > inconsistent).

    >
    > Details, please. If you just want a snapshot of vfsmount tree, then by
    > all means take a bloody snapshot. collect_mounts() is there for purpose.
    > If you want mount/umount/etc. changes affect what you have, then I really
    > would like to see the semantics you want. Some variation on shared-subtree
    > might be close to that...


    I'll have to take a closer look at vfsmount tree snapshotting and
    collect_mounts() before I can say how useful they'd be. But if you recall
    my questions at LSF, I asked whether it was possible for me to create a
    readonly directory tree (e.g., r-o bind mounts) or some form of an immutable
    namespace that no one can modify underneath me. You said that r-o bind
    mounts were not intended for that.

    Right now I'm allowing users to modify lower branches, with all the pros and
    cons that it has. But even if I wanted to prevent users from touching any
    lower files below a certain directory, while allowing only unionfs to modify
    those files, it doesn't appear that there's something available I could use.


    I have two other questions/requests:

    1. The less I have to use or know about vfsmounts and nameidata/intents, the
    better. But whatever API changes you make, please consider the symmetry
    between the f/s ->op I'm called with, and the vfs helpers I might use in
    the ->op to pass through to the lower f/s. Ideally the prototypes of the
    ->op and vfs helpers be identical, so I don't have to work too hard to
    locate the lower objects, or worse, having to make them up temporarily.

    2. You mentioned that all this work is scheduled for 26. 25 is nearing
    release. Do you have code already that I can experiment with? A preview
    of things to come? Maybe an example or two of how a filesystem
    (stackable, nfsd, or otherwise) should lookup and open arbitrary files?
    What you mentioned in this discussion thread sounds promising and
    exciting, but may take me a while to apply to my tree (longer than the
    usual merge window, which reportedly will shrink even further, now that
    we have linux-next :-). So the more lead time, the better, please.

    Thanks,
    Erez.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  14. Re: [patch 01/10] vfs: add path_create() and path_mknod()

    > In message <20080403023239.GY9785@ZenIV.linux.org.uk>, Al Viro writes:
    > > On Wed, Apr 02, 2008 at 10:21:24PM -0400, Erez Zadok wrote:
    > > > Yes, I do grab both vfsmount and superblock refs. I found out that grabbing
    > > > vfsmount refs wasn't enough to prevent "umount -l" from detaching the f/s on
    > > > which I'm stacked on. So now at mount time (or branch management time), I
    > > > grab those super-refs, as I have them after a successful path_lookup. And,
    > > > since I keep a list of the branches I'm stacked on, I know precisely which
    > > > superblocks' references I need to release when unionfs is unmounted.

    > >
    > > How the devil would holding a superblock prevent umount -l?

    >
    > It doesn't prevent umount -l; but it prevents the lower superblock from
    > being kfree()d if there were no references left to it, so I don't try to
    > accessed freed memory (and get 6b6b6b6b oopses :-)


    Having a ref on the vfsmount should be enough, regardless of whether
    the mount is attached or not. There's nothing special about a
    detached mount, that anything outside namespace.c should care about.

    Miklos
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  15. [PATCH] Unionfs: use the new path_* VFS helpers

    Miklos and Al,

    As promised, here is a patch for Unionfs to use the new path_* VFS helpers
    which now take a struct path *. I've used Al's vfs git tree plus all 12
    patches you've outlined (ten path_* patches + 2 others). I've run Unionfs
    through my basic regression suite; I haven't done more extensive testing
    yet. This patch can be used in -mm if/when your path_* patches make it
    there (this patch will be a standalone one to be used on top of my
    unionfs.git tree).

    Al, I'm not too pleased with the resulting additional code complexity to
    Unionfs, due to these new helpers. See the diffstat below. These VFS
    helpers have widened the gap between the f/s API and the VFS helpers': the
    helpers now need a struct path/vfsmount, but the most ->ops that a
    filesystem is being called with, don't have a path/vfsmount passed.

    You've said before that a filesystems has no business dealing with vfsmounts
    (with the exception of follow_link). Fine. But the irony now is that the
    new vfs helpers that a filesystems is likely to use, FORCE you to know a lot
    more about vfsmounts! If indeed a filesystem shouldn't have to know or deal
    with vfsmounts, then none of the VFS helpers should require a struct
    vfsmount (or a struct path, or a nameidata, which embed a vfsmount+dentry
    pair). Otherwise, let's admit that some filesystems have to know about
    vfsmounts and hence let's pass them to all appropriate fs ->ops. (BTW, if
    that's done, then it'd also be helpful for a struct vfsmount to have a "void
    *private" into which a stacked f/s can store "lower objects").

    I hope your planned Big VFS Overhaul can reverse this trend by making it
    possible for a filesystem to not know anything (or know very little) about
    vfsmounts and the structs that embed them.

    Thanks,
    Erez.


    commonfops.c | 5 +++-
    copyup.c | 41 ++++++++++++++++++++++++++--------
    dirhelper.c | 10 ++++++--
    inode.c | 70 +++++++++++++++++++++++++++++++++++++++++------------------
    lookup.c | 8 +++++-
    rename.c | 34 +++++++++++++++++++---------
    sioq.c | 10 ++++----
    sioq.h | 1
    subr.c | 22 +++++++++++-------
    union.h | 3 +-
    unlink.c | 10 +++++---
    xattr.c | 11 ++++++---
    12 files changed, 158 insertions(+), 67 deletions(-)



    Unionfs: use new path_* helpers instead of vfs_* helpers

    Signed-off-by: Erez Zadok

    diff --git a/fs/unionfs/commonfops.c b/fs/unionfs/commonfops.c
    index 0fc7963..e9878bb 100644
    --- a/fs/unionfs/commonfops.c
    +++ b/fs/unionfs/commonfops.c
    @@ -35,6 +35,7 @@ static int copyup_deleted_file(struct file *file, struct dentry *dentry,
    struct dentry *tmp_dentry = NULL;
    struct dentry *lower_dentry;
    struct dentry *lower_dir_dentry = NULL;
    + struct path path;

    lower_dentry = unionfs_lower_dentry_idx(dentry, bstart);

    @@ -88,7 +89,9 @@ retry:
    lower_dentry->d_inode);
    }
    lower_dir_dentry = lock_parent(lower_dentry);
    - err = vfs_unlink(lower_dir_dentry->d_inode, lower_dentry);
    + path.dentry = lower_dir_dentry;
    + path.mnt = unionfs_lower_mnt_idx(dentry->d_parent, dbstart(dentry));
    + err = path_unlink(&path, lower_dentry);
    unlock_dir(lower_dir_dentry);

    out:
    diff --git a/fs/unionfs/copyup.c b/fs/unionfs/copyup.c
    index 6d1e461..ea658e6 100644
    --- a/fs/unionfs/copyup.c
    +++ b/fs/unionfs/copyup.c
    @@ -26,13 +26,15 @@
    #ifdef CONFIG_UNION_FS_XATTR
    /* copyup all extended attrs for a given dentry */
    static int copyup_xattrs(struct dentry *old_lower_dentry,
    - struct dentry *new_lower_dentry)
    + struct dentry *new_lower_dentry,
    + struct vfsmount *new_lower_mnt)
    {
    int err = 0;
    ssize_t list_size = -1;
    char *name_list = NULL;
    char *attr_value = NULL;
    char *name_list_buf = NULL;
    + struct path path;

    /* query the actual size of the xattr list */
    list_size = vfs_listxattr(old_lower_dentry, NULL, 0);
    @@ -82,8 +84,9 @@ static int copyup_xattrs(struct dentry *old_lower_dentry,
    goto out;
    }
    /* Don't lock here since vfs_setxattr does it for us. */
    - err = vfs_setxattr(new_lower_dentry, name_list, attr_value,
    - size, 0);
    + path.dentry = new_lower_dentry;
    + path.mnt = new_lower_mnt;
    + err = path_setxattr(&path, name_list, attr_value, size, 0);
    /*
    * Selinux depends on "security.*" xattrs, so to maintain
    * the security of copied-up files, if Selinux is active,
    @@ -93,8 +96,8 @@ static int copyup_xattrs(struct dentry *old_lower_dentry,
    */
    if (err == -EPERM && !capable(CAP_FOWNER)) {
    cap_raise(current->cap_effective, CAP_FOWNER);
    - err = vfs_setxattr(new_lower_dentry, name_list,
    - attr_value, size, 0);
    + err = path_setxattr(&path, name_list, attr_value,
    + size, 0);
    cap_lower(current->cap_effective, CAP_FOWNER);
    }
    if (err < 0)
    @@ -167,12 +170,16 @@ out:
    static int __copyup_ndentry(struct dentry *old_lower_dentry,
    struct dentry *new_lower_dentry,
    struct dentry *new_lower_parent_dentry,
    + struct vfsmount *new_lower_parent_mnt,
    char *symbuf)
    {
    int err = 0;
    umode_t old_mode = old_lower_dentry->d_inode->i_mode;
    struct sioq_args args;

    + args.path.dentry = new_lower_parent_dentry;
    + args.path.mnt = new_lower_parent_mnt;
    +
    if (S_ISDIR(old_mode)) {
    args.mkdir.parent = new_lower_parent_dentry->d_inode;
    args.mkdir.dentry = new_lower_dentry;
    @@ -199,7 +206,9 @@ static int __copyup_ndentry(struct dentry *old_lower_dentry,
    err = args.err;
    } else if (S_ISREG(old_mode)) {
    struct nameidata nd;
    - err = init_lower_nd(&nd, LOOKUP_CREATE);
    +
    + err = init_lower_nd(&nd, args.path.dentry, args.path.mnt,
    + LOOKUP_CREATE);
    if (unlikely(err < 0))
    goto out;
    args.create.nd = &nd;
    @@ -387,6 +396,8 @@ int copyup_dentry(struct inode *dir, struct dentry *dentry, int bstart,
    struct dentry *new_lower_parent_dentry = NULL;
    mm_segment_t oldfs;
    char *symbuf = NULL;
    + struct path path;
    + struct vfsmount *mnt;

    verify_locked(dentry);

    @@ -444,10 +455,13 @@ int copyup_dentry(struct inode *dir, struct dentry *dentry, int bstart,

    /* Now we lock the parent, and create the object in the new branch. */
    new_lower_parent_dentry = lock_parent(new_lower_dentry);
    + mnt = unionfs_lower_mnt_idx(dentry->d_parent, new_bindex);
    + if (!mnt)
    + mnt = unionfs_lower_mnt_idx(sb->s_root, new_bindex);

    /* create the new inode */
    err = __copyup_ndentry(old_lower_dentry, new_lower_dentry,
    - new_lower_parent_dentry, symbuf);
    + new_lower_parent_dentry, mnt, symbuf);

    if (err) {
    __clear(dentry, old_lower_dentry,
    @@ -470,8 +484,9 @@ int copyup_dentry(struct inode *dir, struct dentry *dentry, int bstart,
    goto out_unlink;

    #ifdef CONFIG_UNION_FS_XATTR
    + mnt = unionfs_lower_mnt_idx(dentry, new_bindex);
    /* Selinux uses extended attributes for permissions. */
    - err = copyup_xattrs(old_lower_dentry, new_lower_dentry);
    + err = copyup_xattrs(old_lower_dentry, new_lower_dentry, mnt);
    if (err)
    goto out_unlink;
    #endif /* CONFIG_UNION_FS_XATTR */
    @@ -488,7 +503,9 @@ out_unlink:
    * quota, or something else happened so let's unlink; we don't
    * really care about the return value of vfs_unlink
    */
    - vfs_unlink(new_lower_parent_dentry->d_inode, new_lower_dentry);
    + path.dentry = new_lower_parent_dentry;
    + path.mnt = unionfs_lower_mnt_idx(dentry->d_parent, new_bindex);
    + path_unlink(&path, new_lower_dentry);

    if (copyup_file) {
    /* need to close the file */
    @@ -664,7 +681,7 @@ static void __set_dentry(struct dentry *upper, struct dentry *lower,

    /*
    * This function replicates the directory structure up-to given dentry
    - * in the bindex branch.
    + * in the bindex branch. On success, fill in @mntp;
    */
    struct dentry *create_parents(struct inode *dir, struct dentry *dentry,
    const char *name, int bindex)
    @@ -682,6 +699,7 @@ struct dentry *create_parents(struct inode *dir, struct dentry *dentry,
    int old_bend;
    struct dentry **path = NULL;
    struct super_block *sb;
    + struct vfsmount *lower_parent_mnt;

    verify_locked(dentry);

    @@ -753,6 +771,7 @@ struct dentry *create_parents(struct inode *dir, struct dentry *dentry,
    begin:
    /* get lower parent dir in the current branch */
    lower_parent_dentry = unionfs_lower_dentry_idx(parent_dentry, bindex);
    + lower_parent_mnt = unionfs_lower_mnt_idx(sb->s_root, bindex);
    dput(parent_dentry);

    /* init the values to lookup */
    @@ -796,6 +815,8 @@ begin:
    /* it's a negative dentry, create a new dir */
    lower_parent_dentry = lock_parent(lower_dentry);

    + args.path.dentry = lower_parent_dentry;
    + args.path.mnt = lower_parent_mnt;
    args.mkdir.parent = lower_parent_dentry->d_inode;
    args.mkdir.dentry = lower_dentry;
    args.mkdir.mode = child_dentry->d_inode->i_mode;
    diff --git a/fs/unionfs/dirhelper.c b/fs/unionfs/dirhelper.c
    index 4b73bb6..7a8a654 100644
    --- a/fs/unionfs/dirhelper.c
    +++ b/fs/unionfs/dirhelper.c
    @@ -34,6 +34,7 @@ int do_delete_whiteouts(struct dentry *dentry, int bindex,
    int i;
    struct list_head *pos;
    struct filldir_node *cursor;
    + struct path path;

    /* Find out lower parent dentry */
    lower_dir_dentry = unionfs_lower_dentry_idx(dentry, bindex);
    @@ -69,8 +70,12 @@ int do_delete_whiteouts(struct dentry *dentry, int bindex,
    err = PTR_ERR(lower_dentry);
    break;
    }
    - if (lower_dentry->d_inode)
    - err = vfs_unlink(lower_dir, lower_dentry);
    + if (lower_dentry->d_inode) {
    + path.dentry = lower_dir_dentry;
    + path.mnt = unionfs_lower_mnt_idx(dentry,
    + bindex);
    + err = path_unlink(&path, lower_dentry);
    + }
    dput(lower_dentry);
    if (err)
    break;
    @@ -113,6 +118,7 @@ int delete_whiteouts(struct dentry *dentry, int bindex,
    if (!permission(lower_dir, MAY_WRITE | MAY_EXEC, NULL)) {
    err = do_delete_whiteouts(dentry, bindex, namelist);
    } else {
    + /* __delete_whiteouts doesn't need args.path filled */
    args.deletewh.namelist = namelist;
    args.deletewh.dentry = dentry;
    args.deletewh.bindex = bindex;
    diff --git a/fs/unionfs/inode.c b/fs/unionfs/inode.c
    index a1d7aaf..2cf161e 100644
    --- a/fs/unionfs/inode.c
    +++ b/fs/unionfs/inode.c
    @@ -36,6 +36,7 @@ static int check_for_whiteout(struct dentry *dentry,
    struct dentry *wh_dentry = NULL;
    struct dentry *lower_dir_dentry;
    char *name = NULL;
    + struct path path;

    /*
    * check if whiteout exists in this branch, i.e. lookup .wh.foo
    @@ -60,9 +61,11 @@ static int check_for_whiteout(struct dentry *dentry,

    /* .wh.foo has been found, so let's unlink it */
    lower_dir_dentry = lock_parent_wh(wh_dentry);
    + path.dentry = lower_dir_dentry;
    + path.mnt = unionfs_lower_mnt(dentry);
    /* see Documentation/filesystems/unionfs/issues.txt */
    lockdep_off();
    - err = vfs_unlink(lower_dir_dentry->d_inode, wh_dentry);
    + err = path_unlink(&path, wh_dentry);
    lockdep_on();
    unlock_dir(lower_dir_dentry);

    @@ -97,13 +100,16 @@ out:
    * suitable, also tries branch 0 (which may require a copyup).
    *
    * Return a lower_dentry we can use to create object in, or ERR_PTR.
    + * On success, fills in @mntp with lower_dentry's corresponding vfsmount.
    */
    static struct dentry *find_writeable_branch(struct inode *parent,
    - struct dentry *dentry)
    + struct dentry *dentry,
    + struct vfsmount **mntp)
    {
    int err = -EINVAL;
    int bindex, istart, iend;
    struct dentry *lower_dentry = NULL;
    + struct vfsmount *m;

    istart = ibstart(parent);
    iend = ibend(parent);
    @@ -160,6 +166,11 @@ begin:
    goto out;
    }
    }
    + m = unionfs_lower_mnt_idx(dentry->d_parent, bindex);
    + if (!m)
    + m = unionfs_lower_mnt_idx(dentry->d_sb->s_root, bindex);
    + *mntp = m;
    +
    err = 0; /* all's well */
    out:
    if (err)
    @@ -175,6 +186,7 @@ static int unionfs_create(struct inode *parent, struct dentry *dentry,
    struct dentry *lower_parent_dentry = NULL;
    int valid = 0;
    struct nameidata lower_nd;
    + struct vfsmount *mnt = NULL;

    unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
    unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
    @@ -193,7 +205,7 @@ static int unionfs_create(struct inode *parent, struct dentry *dentry,
    */
    BUG_ON(!valid && dentry->d_inode);

    - lower_dentry = find_writeable_branch(parent, dentry);
    + lower_dentry = find_writeable_branch(parent, dentry, &mnt);
    if (IS_ERR(lower_dentry)) {
    err = PTR_ERR(lower_dentry);
    goto out;
    @@ -205,11 +217,11 @@ static int unionfs_create(struct inode *parent, struct dentry *dentry,
    goto out;
    }

    - err = init_lower_nd(&lower_nd, LOOKUP_CREATE);
    + err = init_lower_nd(&lower_nd, lower_parent_dentry, mnt,
    + LOOKUP_CREATE);
    if (unlikely(err < 0))
    goto out;
    - err = vfs_create(lower_parent_dentry->d_inode, lower_dentry, mode,
    - &lower_nd);
    + err = path_create(&lower_nd.path, lower_dentry, mode, &lower_nd);
    release_lower_nd(&lower_nd, err);

    if (!err) {
    @@ -304,6 +316,7 @@ static int unionfs_link(struct dentry *old_dentry, struct inode *dir,
    struct dentry *lower_dir_dentry = NULL;
    struct dentry *whiteout_dentry;
    char *name = NULL;
    + struct path path;

    unionfs_read_lock(old_dentry->d_sb, UNIONFS_SMUTEX_CHILD);
    unionfs_double_lock_dentry(new_dentry, old_dentry);
    @@ -346,10 +359,11 @@ static int unionfs_link(struct dentry *old_dentry, struct inode *dir,
    lower_dir_dentry = lock_parent_wh(whiteout_dentry);
    err = is_robranch_super(new_dentry->d_sb, dbstart(new_dentry));
    if (!err) {
    + path.dentry = lower_dir_dentry;
    + path.mnt = unionfs_lower_mnt(new_dentry);
    /* see Documentation/filesystems/unionfs/issues.txt */
    lockdep_off();
    - err = vfs_unlink(lower_dir_dentry->d_inode,
    - whiteout_dentry);
    + err = path_unlink(&path, whiteout_dentry);
    lockdep_on();
    }

    @@ -379,10 +393,13 @@ static int unionfs_link(struct dentry *old_dentry, struct inode *dir,
    lower_dir_dentry = lock_parent(lower_new_dentry);
    err = is_robranch(old_dentry);
    if (!err) {
    + path.dentry = lower_dir_dentry;
    + path.mnt = unionfs_lower_mnt(new_dentry->d_parent);
    + if (!path.mnt)
    + path.mnt = unionfs_lower_mnt(new_dentry->d_sb->s_root);
    /* see Documentation/filesystems/unionfs/issues.txt */
    lockdep_off();
    - err = vfs_link(lower_old_dentry, lower_dir_dentry->d_inode,
    - lower_new_dentry);
    + err = path_link(lower_old_dentry, &path, lower_new_dentry);
    lockdep_on();
    }
    unlock_dir(lower_dir_dentry);
    @@ -406,12 +423,16 @@ docopyup:
    bindex);
    lower_old_dentry = unionfs_lower_dentry(old_dentry);
    lower_dir_dentry = lock_parent(lower_new_dentry);
    + path.dentry = lower_dir_dentry;
    + path.mnt = unionfs_lower_mnt_idx(new_dentry, bindex);
    + if (!path.mnt)
    + path.mnt = unionfs_lower_mnt_idx(
    + new_dentry->d_sb->s_root, bindex);
    /* see Documentation/filesystems/unionfs/issues.txt */
    lockdep_off();
    /* do vfs_link */
    - err = vfs_link(lower_old_dentry,
    - lower_dir_dentry->d_inode,
    - lower_new_dentry);
    + err = path_link(lower_old_dentry, &path,
    + lower_new_dentry);
    lockdep_on();
    unlock_dir(lower_dir_dentry);
    goto check_link;
    @@ -463,6 +484,7 @@ static int unionfs_symlink(struct inode *parent, struct dentry *dentry,
    char *name = NULL;
    int valid = 0;
    umode_t mode;
    + struct path path;

    unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
    unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
    @@ -485,7 +507,7 @@ static int unionfs_symlink(struct inode *parent, struct dentry *dentry,
    */
    BUG_ON(!valid && dentry->d_inode);

    - lower_dentry = find_writeable_branch(parent, dentry);
    + lower_dentry = find_writeable_branch(parent, dentry, &path.mnt);
    if (IS_ERR(lower_dentry)) {
    err = PTR_ERR(lower_dentry);
    goto out;
    @@ -498,8 +520,8 @@ static int unionfs_symlink(struct inode *parent, struct dentry *dentry,
    }

    mode = S_IALLUGO;
    - err = vfs_symlink(lower_parent_dentry->d_inode, lower_dentry,
    - symname, mode);
    + path.dentry = lower_parent_dentry;
    + err = path_symlink(&path, lower_dentry, symname, mode);
    if (!err) {
    err = PTR_ERR(unionfs_interpose(dentry, parent->i_sb, 0));
    if (!err) {
    @@ -538,6 +560,7 @@ static int unionfs_mkdir(struct inode *parent, struct dentry *dentry, int mode)
    int whiteout_unlinked = 0;
    struct sioq_args args;
    int valid;
    + struct path path;

    unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
    unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
    @@ -581,9 +604,11 @@ static int unionfs_mkdir(struct inode *parent, struct dentry *dentry, int mode)
    } else {
    lower_parent_dentry = lock_parent_wh(whiteout_dentry);

    - /* found a.wh.foo entry, remove it then do vfs_mkdir */
    + /* found a.wh.foo entry, remove it then do path_mkdir */
    err = is_robranch_super(dentry->d_sb, bstart);
    if (!err) {
    + args.path.dentry = lower_parent_dentry;
    + args.path.mnt = unionfs_lower_mnt(dentry->d_parent);
    args.unlink.parent = lower_parent_dentry->d_inode;
    args.unlink.dentry = whiteout_dentry;
    run_sioq(__unionfs_unlink, &args);
    @@ -629,8 +654,9 @@ static int unionfs_mkdir(struct inode *parent, struct dentry *dentry, int mode)
    goto out;
    }

    - err = vfs_mkdir(lower_parent_dentry->d_inode, lower_dentry,
    - mode);
    + path.dentry = lower_parent_dentry;
    + path.mnt = unionfs_lower_mnt_idx(dentry->d_parent, bindex);
    + err = path_mkdir(&path, lower_dentry, mode);

    unlock_dir(lower_parent_dentry);

    @@ -699,6 +725,7 @@ static int unionfs_mknod(struct inode *parent, struct dentry *dentry, int mode,
    struct dentry *lower_parent_dentry = NULL;
    char *name = NULL;
    int valid = 0;
    + struct path path;

    unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
    unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
    @@ -721,7 +748,7 @@ static int unionfs_mknod(struct inode *parent, struct dentry *dentry, int mode,
    */
    BUG_ON(!valid && dentry->d_inode);

    - lower_dentry = find_writeable_branch(parent, dentry);
    + lower_dentry = find_writeable_branch(parent, dentry, &path.mnt);
    if (IS_ERR(lower_dentry)) {
    err = PTR_ERR(lower_dentry);
    goto out;
    @@ -733,7 +760,8 @@ static int unionfs_mknod(struct inode *parent, struct dentry *dentry, int mode,
    goto out;
    }

    - err = vfs_mknod(lower_parent_dentry->d_inode, lower_dentry, mode, dev);
    + path.dentry = lower_parent_dentry;
    + err = path_mknod(&path, lower_dentry, mode, dev);
    if (!err) {
    err = PTR_ERR(unionfs_interpose(dentry, parent->i_sb, 0));
    if (!err) {
    diff --git a/fs/unionfs/lookup.c b/fs/unionfs/lookup.c
    index 7f512c2..5bfbdf6 100644
    --- a/fs/unionfs/lookup.c
    +++ b/fs/unionfs/lookup.c
    @@ -52,6 +52,7 @@ static noinline_for_stack int is_opaque_dir(struct dentry *dentry, int bindex)
    lookup_one_len(UNIONFS_DIR_OPAQUE, lower_dentry,
    sizeof(UNIONFS_DIR_OPAQUE) - 1);
    } else {
    + /* __is_opaque_dir doesn't need args.path filled */
    args.is_opaque.dentry = lower_dentry;
    run_sioq(__is_opaque_dir, &args);
    wh_lower_dentry = args.ret;
    @@ -603,7 +604,8 @@ void update_bstart(struct dentry *dentry)
    * separate intent-structure, and (2) open_namei() is broken into a VFS-only
    * function and a method that other file systems can call.
    */
    -int init_lower_nd(struct nameidata *nd, unsigned int flags)
    +int init_lower_nd(struct nameidata *nd, struct dentry *dentry,
    + struct vfsmount *mnt, unsigned int flags)
    {
    int err = 0;
    #ifdef ALLOC_LOWER_ND_FILE
    @@ -648,6 +650,10 @@ int init_lower_nd(struct nameidata *nd, unsigned int flags)
    break;
    }

    + /* initialize lower path (dentry+vfsmount) */
    + nd->path.dentry = dentry;
    + nd->path.mnt = mnt;
    +
    return err;
    }

    diff --git a/fs/unionfs/rename.c b/fs/unionfs/rename.c
    index cc16eb2..8f9703a 100644
    --- a/fs/unionfs/rename.c
    +++ b/fs/unionfs/rename.c
    @@ -31,6 +31,7 @@ static int __unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,
    struct dentry *lower_wh_dir_dentry;
    struct dentry *trap;
    char *wh_name = NULL;
    + struct path path;

    lower_new_dentry = unionfs_lower_dentry_idx(new_dentry, bindex);
    lower_old_dentry = unionfs_lower_dentry_idx(old_dentry, bindex);
    @@ -79,9 +80,11 @@ static int __unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,

    lower_wh_dir_dentry = lock_parent_wh(lower_wh_dentry);
    err = is_robranch_super(old_dentry->d_sb, bindex);
    - if (!err)
    - err = vfs_unlink(lower_wh_dir_dentry->d_inode,
    - lower_wh_dentry);
    + if (!err) {
    + path.dentry = lower_wh_dir_dentry;
    + path.mnt = unionfs_lower_mnt_idx(new_dentry, bindex);
    + err = path_unlink(&path, lower_wh_dentry);
    + }

    dput(lower_wh_dentry);
    unlock_dir(lower_wh_dir_dentry);
    @@ -135,8 +138,10 @@ static int __unionfs_rename(struct inode *old_dir, struct dentry *old_dentry,
    err = -ENOTEMPTY;
    goto out_err_unlock;
    }
    - err = vfs_rename(lower_old_dir_dentry->d_inode, lower_old_dentry,
    - lower_new_dir_dentry->d_inode, lower_new_dentry);
    + path.dentry = lower_old_dir_dentry;
    + path.mnt = unionfs_lower_mnt_idx(old_dentry, bindex);
    + err = path_rename(&path, lower_old_dentry,
    + lower_new_dir_dentry->d_inode, lower_new_dentry);
    out_err_unlock:
    if (!err) {
    /* update parent dir times */
    @@ -220,9 +225,12 @@ static int do_unionfs_rename(struct inode *old_dir,

    unlink_dir_dentry = lock_parent(unlink_dentry);
    err = is_robranch_super(old_dir->i_sb, bindex);
    - if (!err)
    - err = vfs_unlink(unlink_dir_dentry->d_inode,
    - unlink_dentry);
    + if (!err) {
    + struct path path;
    + path.dentry = unlink_dir_dentry;
    + path.mnt = unionfs_lower_mnt_idx(new_dentry, bindex);
    + err = path_unlink(&path, unlink_dentry);
    + }

    fsstack_copy_attr_times(new_dentry->d_parent->d_inode,
    unlink_dir_dentry->d_inode);
    @@ -283,6 +291,8 @@ static int do_unionfs_rename(struct inode *old_dir,
    if ((old_bstart != old_bend) || (do_copyup != -1)) {
    struct dentry *lower_parent;
    struct nameidata nd;
    + struct vfsmount *mnt;
    +
    if (!wh_old || wh_old->d_inode || bwh_old < 0) {
    printk(KERN_ERR "unionfs: rename error "
    "(wh_old=%p/%p bwh_old=%d)\n", wh_old,
    @@ -290,12 +300,14 @@ static int do_unionfs_rename(struct inode *old_dir,
    err = -EIO;
    goto out;
    }
    - err = init_lower_nd(&nd, LOOKUP_CREATE);
    + mnt = unionfs_lower_mnt_idx(old_dentry, bwh_old);
    + err = init_lower_nd(&nd, NULL, /* set nd.path.dentry below */
    + mnt, LOOKUP_CREATE);
    if (unlikely(err < 0))
    goto out;
    lower_parent = lock_parent_wh(wh_old);
    - local_err = vfs_create(lower_parent->d_inode, wh_old, S_IRUGO,
    - &nd);
    + nd.path.dentry = lower_parent;
    + local_err = path_create(&nd.path, wh_old, S_IRUGO, &nd);
    unlock_dir(lower_parent);
    if (!local_err) {
    set_dbopaque(old_dentry, bwh_old);
    diff --git a/fs/unionfs/sioq.c b/fs/unionfs/sioq.c
    index 2a8c88e..2d94e85 100644
    --- a/fs/unionfs/sioq.c
    +++ b/fs/unionfs/sioq.c
    @@ -60,7 +60,7 @@ void __unionfs_create(struct work_struct *work)
    struct sioq_args *args = container_of(work, struct sioq_args, work);
    struct create_args *c = &args->create;

    - args->err = vfs_create(c->parent, c->dentry, c->mode, c->nd);
    + args->err = path_create(&args->path, c->dentry, c->mode, c->nd);
    complete(&args->comp);
    }

    @@ -69,7 +69,7 @@ void __unionfs_mkdir(struct work_struct *work)
    struct sioq_args *args = container_of(work, struct sioq_args, work);
    struct mkdir_args *m = &args->mkdir;

    - args->err = vfs_mkdir(m->parent, m->dentry, m->mode);
    + args->err = path_mkdir(&args->path, m->dentry, m->mode);
    complete(&args->comp);
    }

    @@ -78,7 +78,7 @@ void __unionfs_mknod(struct work_struct *work)
    struct sioq_args *args = container_of(work, struct sioq_args, work);
    struct mknod_args *m = &args->mknod;

    - args->err = vfs_mknod(m->parent, m->dentry, m->mode, m->dev);
    + args->err = path_mknod(&args->path, m->dentry, m->mode, m->dev);
    complete(&args->comp);
    }

    @@ -87,7 +87,7 @@ void __unionfs_symlink(struct work_struct *work)
    struct sioq_args *args = container_of(work, struct sioq_args, work);
    struct symlink_args *s = &args->symlink;

    - args->err = vfs_symlink(s->parent, s->dentry, s->symbuf, s->mode);
    + args->err = path_symlink(&args->path, s->dentry, s->symbuf, s->mode);
    complete(&args->comp);
    }

    @@ -96,7 +96,7 @@ void __unionfs_unlink(struct work_struct *work)
    struct sioq_args *args = container_of(work, struct sioq_args, work);
    struct unlink_args *u = &args->unlink;

    - args->err = vfs_unlink(u->parent, u->dentry);
    + args->err = path_unlink(&args->path, u->dentry);
    complete(&args->comp);
    }

    diff --git a/fs/unionfs/sioq.h b/fs/unionfs/sioq.h
    index afb71ee..633b06a 100644
    --- a/fs/unionfs/sioq.h
    +++ b/fs/unionfs/sioq.h
    @@ -63,6 +63,7 @@ struct sioq_args {
    struct work_struct work;
    int err;
    void *ret;
    + struct path path;

    union {
    struct deletewh_args deletewh;
    diff --git a/fs/unionfs/subr.c b/fs/unionfs/subr.c
    index 1a40f63..dbe5e06 100644
    --- a/fs/unionfs/subr.c
    +++ b/fs/unionfs/subr.c
    @@ -26,12 +26,13 @@
    int create_whiteout(struct dentry *dentry, int start)
    {
    int bstart, bend, bindex;
    - struct dentry *lower_dir_dentry;
    + struct dentry *lower_dir_dentry = NULL;
    struct dentry *lower_dentry;
    struct dentry *lower_wh_dentry;
    struct nameidata nd;
    char *name = NULL;
    int err = -EINVAL;
    + struct vfsmount *mnt;

    verify_locked(dentry);

    @@ -87,16 +88,19 @@ int create_whiteout(struct dentry *dentry, int start)
    goto out;
    }

    - err = init_lower_nd(&nd, LOOKUP_CREATE);
    + mnt = unionfs_lower_mnt_idx(dentry->d_parent, bindex);
    + err = init_lower_nd(&nd, NULL, /* lower dentry set below */
    + mnt, LOOKUP_CREATE);
    if (unlikely(err < 0))
    goto out;
    lower_dir_dentry = lock_parent_wh(lower_wh_dentry);
    + nd.path.dentry = lower_dir_dentry;
    err = is_robranch_super(dentry->d_sb, bindex);
    if (!err)
    - err = vfs_create(lower_dir_dentry->d_inode,
    - lower_wh_dentry,
    - ~current->fs->umask & S_IRWXUGO,
    - &nd);
    + err = path_create(&nd.path,
    + lower_wh_dentry,
    + ~current->fs->umask & S_IRWXUGO,
    + &nd);
    unlock_dir(lower_dir_dentry);
    dput(lower_wh_dentry);
    release_lower_nd(&nd, err);
    @@ -189,11 +193,13 @@ int make_dir_opaque(struct dentry *dentry, int bindex)
    goto out;
    }

    - err = init_lower_nd(&nd, LOOKUP_CREATE);
    + err = init_lower_nd(&nd, lower_dentry,
    + unionfs_lower_mnt_idx(dentry, bindex),
    + LOOKUP_CREATE);
    if (unlikely(err < 0))
    goto out;
    if (!diropq->d_inode)
    - err = vfs_create(lower_dir, diropq, S_IRUGO, &nd);
    + err = path_create(&nd.path, diropq, S_IRUGO, &nd);
    if (!err)
    set_dbopaque(dentry, bindex);
    release_lower_nd(&nd, err);
    diff --git a/fs/unionfs/union.h b/fs/unionfs/union.h
    index 735d40c..681eab1 100644
    --- a/fs/unionfs/union.h
    +++ b/fs/unionfs/union.h
    @@ -303,7 +303,8 @@ static inline void unionfs_double_lock_dentry(struct dentry *d1,
    extern int new_dentry_private_data(struct dentry *dentry, int subclass);
    extern void free_dentry_private_data(struct dentry *dentry);
    extern void update_bstart(struct dentry *dentry);
    -extern int init_lower_nd(struct nameidata *nd, unsigned int flags);
    +extern int init_lower_nd(struct nameidata *nd, struct dentry *dentry,
    + struct vfsmount *mnt, unsigned int flags);
    extern void release_lower_nd(struct nameidata *nd, int err);

    /*
    diff --git a/fs/unionfs/unlink.c b/fs/unionfs/unlink.c
    index cad0386..6216f8a 100644
    --- a/fs/unionfs/unlink.c
    +++ b/fs/unionfs/unlink.c
    @@ -67,14 +67,16 @@ static int unionfs_unlink_whiteout(struct inode *dir, struct dentry *dentry)
    dget(lower_dentry);
    err = is_robranch_super(dentry->d_sb, bindex);
    if (!err) {
    + struct path path;
    + path.dentry = lower_dir_dentry;
    + path.mnt = unionfs_lower_mnt_idx(dentry->d_parent,
    + bindex);
    /* see Documentation/filesystems/unionfs/issues.txt */
    lockdep_off();
    if (!S_ISDIR(lower_dentry->d_inode->i_mode))
    - err = vfs_unlink(lower_dir_dentry->d_inode,
    - lower_dentry);
    + err = path_unlink(&path, lower_dentry);
    else
    - err = vfs_rmdir(lower_dir_dentry->d_inode,
    - lower_dentry);
    + err = path_rmdir(&path, lower_dentry);
    lockdep_on();
    }

    diff --git a/fs/unionfs/xattr.c b/fs/unionfs/xattr.c
    index 8001c65..f168bea 100644
    --- a/fs/unionfs/xattr.c
    +++ b/fs/unionfs/xattr.c
    @@ -73,6 +73,7 @@ int unionfs_setxattr(struct dentry *dentry, const char *name,
    {
    struct dentry *lower_dentry = NULL;
    int err = -EOPNOTSUPP;
    + struct path path;

    unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
    unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
    @@ -83,9 +84,10 @@ int unionfs_setxattr(struct dentry *dentry, const char *name,
    }

    lower_dentry = unionfs_lower_dentry(dentry);
    + path.dentry = lower_dentry;
    + path.mnt = unionfs_lower_mnt(dentry);

    - err = vfs_setxattr(lower_dentry, (char *) name, (void *) value,
    - size, flags);
    + err = path_setxattr(&path, (char *) name, (void *) value, size, flags);

    out:
    unionfs_check_dentry(dentry);
    @@ -102,6 +104,7 @@ int unionfs_removexattr(struct dentry *dentry, const char *name)
    {
    struct dentry *lower_dentry = NULL;
    int err = -EOPNOTSUPP;
    + struct path path;

    unionfs_read_lock(dentry->d_sb, UNIONFS_SMUTEX_CHILD);
    unionfs_lock_dentry(dentry, UNIONFS_DMUTEX_CHILD);
    @@ -112,8 +115,10 @@ int unionfs_removexattr(struct dentry *dentry, const char *name)
    }

    lower_dentry = unionfs_lower_dentry(dentry);
    + path.dentry = lower_dentry;
    + path.mnt = unionfs_lower_mnt(dentry);

    - err = vfs_removexattr(lower_dentry, (char *) name);
    + err = path_removexattr(&path, (char *) name);

    out:
    unionfs_check_dentry(dentry);
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  16. Re: [PATCH] Unionfs: use the new path_* VFS helpers

    > As promised, here is a patch for Unionfs to use the new path_* VFS helpers
    > which now take a struct path *. I've used Al's vfs git tree plus all 12
    > patches you've outlined (ten path_* patches + 2 others). I've run Unionfs
    > through my basic regression suite; I haven't done more extensive testing
    > yet. This patch can be used in -mm if/when your path_* patches make it
    > there (this patch will be a standalone one to be used on top of my
    > unionfs.git tree).


    Thanks.

    > Al, I'm not too pleased with the resulting additional code complexity to
    > Unionfs, due to these new helpers. See the diffstat below. These VFS
    > helpers have widened the gap between the f/s API and the VFS helpers': the
    > helpers now need a struct path/vfsmount, but the most ->ops that a
    > filesystem is being called with, don't have a path/vfsmount passed.


    I'm not too pleased either

    What I think would be much cleaner if for example you had functions
    like this:

    void unionfs_lower_path_index(struct dentry *dentry, int bindex,
    struct path *path_out);

    Etc. If you are dealing with paths, you should not be handling
    vfsmounts separately. That part is ugly and just invitation for bugs.

    > You've said before that a filesystems has no business dealing with vfsmounts
    > (with the exception of follow_link). Fine. But the irony now is that the
    > new vfs helpers that a filesystems is likely to use, FORCE you to know a lot
    > more about vfsmounts! If indeed a filesystem shouldn't have to know or deal
    > with vfsmounts, then none of the VFS helpers should require a struct
    > vfsmount (or a struct path, or a nameidata, which embed a vfsmount+dentry
    > pair). Otherwise, let's admit that some filesystems have to know about
    > vfsmounts and hence let's pass them to all appropriate fs ->ops.


    How would that help? AFAICS the vfsmounts on the lower layers should
    have _absolutetly nothing_ to do with the vfsmount on the top layer.

    So no, passing down vfsmounts to the filesystem is wrong (with the
    exception of follow_link, and even in that case only very specialized
    filesystems like proc will ever have a valid use for it).

    > (BTW, if
    > that's done, then it'd also be helpful for a struct vfsmount to have a "void
    > *private" into which a stacked f/s can store "lower objects").


    Oh, please! I think you are confused about the role that vfsmounts
    play in stacking. Vfsmounts place filesystems in the global and
    private mount namespaces. But each and every instance of that
    placement should behave exactly the same (modulo such flags as
    per-mount r/o, etc), and so stacking filesystems, just like any other
    filesystem, should have no need for the knowledge about where they are
    placed in the namespace.

    On the other hand, they _should_ have knowledge of the placement of
    the filesystems they are operating on. So when accessing the lower
    layers, they can conform to the same rules as anything else accessing
    the filesystems. I.e. they shouldn't meddle directly with the
    filesystem, without going through the namespace layer first.

    This is already enforced for dentry_open(), and I don't see the big
    issue with enfocing it for the rest of the vfs_* operations.

    Miklos
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  17. [PATCH] Call LSM functions outside VFS helper functions.

    Hello.

    Miklos Szeredi wrote:
    > > Al, I'm not too pleased with the resulting additional code complexity to
    > > Unionfs, due to these new helpers. See the diffstat below. These VFS
    > > helpers have widened the gap between the f/s API and the VFS helpers': the
    > > helpers now need a struct path/vfsmount, but the most ->ops that a
    > > filesystem is being called with, don't have a path/vfsmount passed.

    >
    > I'm not too pleased either
    >


    I'm reading this thread on tenterhooks.
    If vfsmount is passed to VFS helper functions, that will make
    r/o bind mounts and AppArmor and TOMOYO Linux happy.
    But so far, it seems that "passing vfsmount makes things complicated".

    If the conclusion became "vfsmount should not be passed to
    VFS helper functions", that's OK, but I want you to consider
    the below approach for AppArmor and TOMOYO Linux. This patch is a repost of
    http://kerneltrap.org/mailarchive/li...08/2/17/882024 .

    Regards.
    ----------
    Subject: Call LSM functions outside VFS helper functions.

    This patch allows LSM to check permission using "struct vfsmount"
    without passing "struct vfsmount" to VFS helper functions.
    There is a side effect that conventional permission checks are done twice
    because I want to do DAC checks before MAC checks.

    Signed-off-by: Tetsuo Handa
    ---
    fs/namei.c | 176 ++++++++++++++++++++++++++++++++++++++---------
    fs/open.c | 11 ++
    include/linux/fs.h | 1
    include/linux/security.h | 63 ++++++++++++++++
    net/unix/af_unix.c | 9 ++
    security/dummy.c | 17 ++++
    security/security.c | 15 ++++
    7 files changed, 254 insertions(+), 38 deletions(-)

    --- linux-2.6.25-rc8-mm1.orig/fs/namei.c
    +++ linux-2.6.25-rc8-mm1/fs/namei.c
    @@ -2021,18 +2021,24 @@ fail:
    }
    EXPORT_SYMBOL_GPL(lookup_create);

    -int vfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
    +int pre_vfs_mknod(struct inode *dir, struct dentry *dentry, int mode)
    {
    int error = may_create(dir, dentry, NULL);
    -
    if (error)
    return error;
    -
    if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
    return -EPERM;
    -
    if (!dir->i_op || !dir->i_op->mknod)
    return -EPERM;
    + return 0;
    +}
    +EXPORT_SYMBOL_GPL(pre_vfs_mknod);
    +
    +int vfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
    +{
    + int error = pre_vfs_mknod(dir, dentry, mode);
    + if (error)
    + return error;

    error = devcgroup_inode_mknod(mode, dev);
    if (error)
    @@ -2097,14 +2103,39 @@ asmlinkage long sys_mknodat(int dfd, con
    if (error)
    goto out_dput;
    switch (mode & S_IFMT) {
    + int type;
    case 0: case S_IFREG:
    error = vfs_create(nd.path.dentry->d_inode,dentry,mode,&nd);
    break;
    case S_IFCHR: case S_IFBLK:
    + error = pre_vfs_mknod(nd.path.dentry->d_inode, dentry,
    + mode);
    + if (error)
    + break;
    + if (S_ISCHR(mode))
    + type = TYPE_MKCHAR_ACL;
    + else
    + type = TYPE_MKBLOCK_ACL;
    + error = security_singlepath_permission(type, dentry,
    + nd.path.mnt);
    + if (error)
    + break;
    error = vfs_mknod(nd.path.dentry->d_inode,dentry,mode,
    new_decode_dev(dev));
    break;
    case S_IFIFO: case S_IFSOCK:
    + error = pre_vfs_mknod(nd.path.dentry->d_inode, dentry,
    + mode);
    + if (error)
    + break;
    + if (S_ISFIFO(mode))
    + type = TYPE_MKFIFO_ACL;
    + else
    + type = TYPE_MKSOCK_ACL;
    + error = security_singlepath_permission(type, dentry,
    + nd.path.mnt);
    + if (error)
    + break;
    error = vfs_mknod(nd.path.dentry->d_inode,dentry,mode,0);
    break;
    }
    @@ -2125,15 +2156,21 @@ asmlinkage long sys_mknod(const char __u
    return sys_mknodat(AT_FDCWD, filename, mode, dev);
    }

    -int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
    +static inline int pre_vfs_mkdir(struct inode *dir, struct dentry *dentry)
    {
    int error = may_create(dir, dentry, NULL);
    -
    if (error)
    return error;
    -
    if (!dir->i_op || !dir->i_op->mkdir)
    return -EPERM;
    + return 0;
    +}
    +
    +int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
    +{
    + int error = pre_vfs_mkdir(dir, dentry);
    + if (error)
    + return error;

    mode &= (S_IRWXUGO|S_ISVTX);
    error = security_inode_mkdir(dir, dentry, mode);
    @@ -2172,7 +2209,12 @@ asmlinkage long sys_mkdirat(int dfd, con
    error = mnt_want_write(nd.path.mnt);
    if (error)
    goto out_dput;
    - error = vfs_mkdir(nd.path.dentry->d_inode, dentry, mode);
    + error = pre_vfs_mkdir(nd.path.dentry->d_inode, dentry);
    + if (!error)
    + error = security_singlepath_permission(TYPE_MKDIR_ACL, dentry,
    + nd.path.mnt);
    + if (!error)
    + error = vfs_mkdir(nd.path.dentry->d_inode, dentry, mode);
    mnt_drop_write(nd.path.mnt);
    out_dput:
    dput(dentry);
    @@ -2217,15 +2259,21 @@ void dentry_unhash(struct dentry *dentry
    spin_unlock(&dcache_lock);
    }

    -int vfs_rmdir(struct inode *dir, struct dentry *dentry)
    +static inline int pre_vfs_rmdir(struct inode *dir, struct dentry *dentry)
    {
    int error = may_delete(dir, dentry, 1);
    -
    if (error)
    return error;
    -
    if (!dir->i_op || !dir->i_op->rmdir)
    return -EPERM;
    + return 0;
    +}
    +
    +int vfs_rmdir(struct inode *dir, struct dentry *dentry)
    +{
    + int error = pre_vfs_rmdir(dir, dentry);
    + if (error)
    + return error;

    DQUOT_INIT(dir);

    @@ -2284,7 +2332,12 @@ static long do_rmdir(int dfd, const char
    error = mnt_want_write(nd.path.mnt);
    if (error)
    goto exit3;
    - error = vfs_rmdir(nd.path.dentry->d_inode, dentry);
    + error = pre_vfs_rmdir(nd.path.dentry->d_inode, dentry);
    + if (!error)
    + error = security_singlepath_permission(TYPE_RMDIR_ACL, dentry,
    + nd.path.mnt);
    + if (!error)
    + error = vfs_rmdir(nd.path.dentry->d_inode, dentry);
    mnt_drop_write(nd.path.mnt);
    exit3:
    dput(dentry);
    @@ -2302,15 +2355,21 @@ asmlinkage long sys_rmdir(const char __u
    return do_rmdir(AT_FDCWD, pathname);
    }

    -int vfs_unlink(struct inode *dir, struct dentry *dentry)
    +static inline int pre_vfs_unlink(struct inode *dir, struct dentry *dentry)
    {
    int error = may_delete(dir, dentry, 0);
    -
    if (error)
    return error;
    -
    if (!dir->i_op || !dir->i_op->unlink)
    return -EPERM;
    + return 0;
    +}
    +
    +int vfs_unlink(struct inode *dir, struct dentry *dentry)
    +{
    + int error = pre_vfs_unlink(dir, dentry);
    + if (error)
    + return error;

    DQUOT_INIT(dir);

    @@ -2370,7 +2429,13 @@ static long do_unlinkat(int dfd, const c
    error = mnt_want_write(nd.path.mnt);
    if (error)
    goto exit2;
    - error = vfs_unlink(nd.path.dentry->d_inode, dentry);
    + error = pre_vfs_unlink(nd.path.dentry->d_inode, dentry);
    + if (!error)
    + error = security_singlepath_permission(TYPE_UNLINK_ACL,
    + dentry,
    + nd.path.mnt);
    + if (!error)
    + error = vfs_unlink(nd.path.dentry->d_inode, dentry);
    mnt_drop_write(nd.path.mnt);
    exit2:
    dput(dentry);
    @@ -2406,15 +2471,22 @@ asmlinkage long sys_unlink(const char __
    return do_unlinkat(AT_FDCWD, pathname);
    }

    -int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname, int mode)
    +static inline int pre_vfs_symlink(struct inode *dir, struct dentry *dentry)
    {
    int error = may_create(dir, dentry, NULL);
    -
    if (error)
    return error;
    -
    if (!dir->i_op || !dir->i_op->symlink)
    return -EPERM;
    + return 0;
    +}
    +
    +int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname,
    + int mode)
    +{
    + int error = pre_vfs_symlink(dir, dentry);
    + if (error)
    + return error;

    error = security_inode_symlink(dir, dentry, oldname);
    if (error)
    @@ -2455,7 +2527,13 @@ asmlinkage long sys_symlinkat(const char
    error = mnt_want_write(nd.path.mnt);
    if (error)
    goto out_dput;
    - error = vfs_symlink(nd.path.dentry->d_inode, dentry, from, S_IALLUGO);
    + error = pre_vfs_symlink(nd.path.dentry->d_inode, dentry);
    + if (!error)
    + error = security_singlepath_permission(TYPE_SYMLINK_ACL,
    + dentry, nd.path.mnt);
    + if (!error)
    + error = vfs_symlink(nd.path.dentry->d_inode, dentry, from,
    + S_IALLUGO);
    mnt_drop_write(nd.path.mnt);
    out_dput:
    dput(dentry);
    @@ -2474,21 +2552,18 @@ asmlinkage long sys_symlink(const char _
    return sys_symlinkat(oldname, AT_FDCWD, newname);
    }

    -int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry)
    +static inline int pre_vfs_link(struct dentry *old_dentry, struct inode *dir,
    + struct dentry *new_dentry)
    {
    struct inode *inode = old_dentry->d_inode;
    int error;
    -
    if (!inode)
    return -ENOENT;
    -
    error = may_create(dir, new_dentry, NULL);
    if (error)
    return error;
    -
    if (dir->i_sb != inode->i_sb)
    return -EXDEV;
    -
    /*
    * A link to an append-only or immutable file cannot be created.
    */
    @@ -2498,6 +2573,15 @@ int vfs_link(struct dentry *old_dentry,
    return -EPERM;
    if (S_ISDIR(old_dentry->d_inode->i_mode))
    return -EPERM;
    + return 0;
    +}
    +
    +int vfs_link(struct dentry *old_dentry, struct inode *dir,
    + struct dentry *new_dentry)
    +{
    + int error = pre_vfs_link(old_dentry, dir, new_dentry);
    + if (error)
    + return error;

    error = security_inode_link(old_dentry, dir, new_dentry);
    if (error)
    @@ -2555,7 +2639,16 @@ asmlinkage long sys_linkat(int olddfd, c
    error = mnt_want_write(nd.path.mnt);
    if (error)
    goto out_dput;
    - error = vfs_link(old_nd.path.dentry, nd.path.dentry->d_inode, new_dentry);
    + error = pre_vfs_link(old_nd.path.dentry, nd.path.dentry->d_inode,
    + new_dentry);
    + if (!error)
    + error = security_doublepath_permission(TYPE_LINK_ACL,
    + old_nd.path.dentry,
    + nd.path.dentry,
    + old_nd.path.mnt);
    + if (!error)
    + error = vfs_link(old_nd.path.dentry, nd.path.dentry->d_inode,
    + new_dentry);
    mnt_drop_write(nd.path.mnt);
    out_dput:
    dput(new_dentry);
    @@ -2679,20 +2772,18 @@ static int vfs_rename_other(struct inode
    return error;
    }

    -int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
    - struct inode *new_dir, struct dentry *new_dentry)
    +static inline int pre_vfs_rename(struct inode *old_dir,
    + struct dentry *old_dentry,
    + struct inode *new_dir,
    + struct dentry *new_dentry)
    {
    int error;
    int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
    - const char *old_name;
    -
    if (old_dentry->d_inode == new_dentry->d_inode)
    return 0;
    -
    error = may_delete(old_dir, old_dentry, is_dir);
    if (error)
    return error;
    -
    if (!new_dentry->d_inode)
    error = may_create(new_dir, new_dentry, NULL);
    else
    @@ -2702,6 +2793,17 @@ int vfs_rename(struct inode *old_dir, st

    if (!old_dir->i_op || !old_dir->i_op->rename)
    return -EPERM;
    + return 0;
    +}
    +
    +int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
    + struct inode *new_dir, struct dentry *new_dentry)
    +{
    + int error = pre_vfs_rename(old_dir, old_dentry, new_dir, new_dentry);
    + int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
    + const char *old_name;
    + if (error)
    + return error;

    DQUOT_INIT(old_dir);
    DQUOT_INIT(new_dir);
    @@ -2786,7 +2888,15 @@ static int do_rename(int olddfd, const c
    error = mnt_want_write(oldnd.path.mnt);
    if (error)
    goto exit5;
    - error = vfs_rename(old_dir->d_inode, old_dentry,
    + error = pre_vfs_rename(old_dir->d_inode, old_dentry, new_dir->d_inode,
    + new_dentry);
    + if (!error)
    + error = security_doublepath_permission(TYPE_RENAME_ACL,
    + oldnd.path.dentry,
    + newnd.path.dentry,
    + oldnd.path.mnt);
    + if (!error)
    + error = vfs_rename(old_dir->d_inode, old_dentry,
    new_dir->d_inode, new_dentry);
    mnt_drop_write(oldnd.path.mnt);
    exit5:
    --- linux-2.6.25-rc8-mm1.orig/include/linux/security.h
    +++ linux-2.6.25-rc8-mm1/include/linux/security.h
    @@ -110,6 +110,29 @@ struct request_sock;
    #define LSM_UNSAFE_PTRACE 2
    #define LSM_UNSAFE_PTRACE_CAP 4

    +
    +/* singlepath_permission() and doublepath_permission() */
    +#define TYPE_READ_WRITE_ACL 0 /* open for reading/writing */
    +#define TYPE_EXECUTE_ACL 1 /* execute */
    +#define TYPE_READ_ACL 2 /* open for reading */
    +#define TYPE_WRITE_ACL 3 /* open for writing */
    +#define TYPE_CREATE_ACL 4 /* create */
    +#define TYPE_UNLINK_ACL 5 /* unlink */
    +#define TYPE_MKDIR_ACL 6 /* mkdir */
    +#define TYPE_RMDIR_ACL 7 /* rmdir */
    +#define TYPE_MKFIFO_ACL 8 /* mknod(S_IFIFO) */
    +#define TYPE_MKSOCK_ACL 9 /* mknod(S_IFSOCK) */
    +#define TYPE_MKBLOCK_ACL 10 /* mknod(S_IFBLK) */
    +#define TYPE_MKCHAR_ACL 11 /* mknod(S_IFCHR) */
    +#define TYPE_TRUNCATE_ACL 12 /* truncate */
    +#define TYPE_SYMLINK_ACL 13 /* symlink */
    +#define TYPE_REWRITE_ACL 14 /* open for overwriting */
    +#define MAX_SINGLE_PATH_OPERATION 15
    +
    +#define TYPE_LINK_ACL 0 /* link */
    +#define TYPE_RENAME_ACL 1 /* rename */
    +#define MAX_DOUBLE_PATH_OPERATION 2
    +
    #ifdef CONFIG_SECURITY

    struct security_mnt_opts {
    @@ -337,6 +360,19 @@ static inline void security_free_mnt_opt
    * Returns 0 if @name and @value have been successfully set,
    * -EOPNOTSUPP if no security attribute is needed, or
    * -ENOMEM on memory allocation failure.
    + * @singlepath_permission:
    + * Check permission to operations that involve one pathname.
    + * @operation contains type of operation (e.g. mknod, mkdir, symlink).
    + * @dentry contains the dentry structure for the file.
    + * @mnt contains the vfsmount structure for the file.
    + * Return 0 if permission is granted.
    + * @doublepath_permission:
    + * Check permission to operations that involve two pathnames.
    + * @operation contains type of operation (either link or rename).
    + * @old_dentry contains the dentry structure of the old file.
    + * @new_dentry contains the dentry structure of the new file.
    + * @mnt contains the vfsmount structure for the old file.
    + * Return 0 if permission is granted.
    * @inode_create:
    * Check permission to create a regular file.
    * @dir contains inode structure of the parent of the new file.
    @@ -1355,6 +1391,11 @@ struct security_operations {
    void (*inode_free_security) (struct inode *inode);
    int (*inode_init_security) (struct inode *inode, struct inode *dir,
    char **name, void **value, size_t *len);
    + int (*singlepath_permission) (int operation, struct dentry *dentry,
    + struct vfsmount *mnt);
    + int (*doublepath_permission) (int operation, struct dentry *old_dentry,
    + struct dentry *new_dentry,
    + struct vfsmount *mnt);
    int (*inode_create) (struct inode *dir,
    struct dentry *dentry, int mode);
    int (*inode_link) (struct dentry *old_dentry,
    @@ -1629,6 +1670,11 @@ int security_inode_alloc(struct inode *i
    void security_inode_free(struct inode *inode);
    int security_inode_init_security(struct inode *inode, struct inode *dir,
    char **name, void **value, size_t *len);
    +int security_singlepath_permission(int operation, struct dentry *dentry,
    + struct vfsmount *mnt);
    +int security_doublepath_permission(int operation, struct dentry *old_dentry,
    + struct dentry *new_dentry,
    + struct vfsmount *mnt);
    int security_inode_create(struct inode *dir, struct dentry *dentry, int mode);
    int security_inode_link(struct dentry *old_dentry, struct inode *dir,
    struct dentry *new_dentry);
    @@ -1966,7 +2012,22 @@ static inline int security_inode_init_se
    {
    return -EOPNOTSUPP;
    }
    -
    +
    +static inline int security_singlepath_permission(int operation,
    + struct dentry *dentry,
    + struct vfsmount *mnt)
    +{
    + return 0;
    +}
    +
    +static inline int security_doublepath_permission(int operation,
    + struct dentry *old_dentry,
    + struct dentry *new_dentry,
    + struct vfsmount *mnt)
    +{
    + return 0;
    +}
    +
    static inline int security_inode_create (struct inode *dir,
    struct dentry *dentry,
    int mode)
    --- linux-2.6.25-rc8-mm1.orig/security/dummy.c
    +++ linux-2.6.25-rc8-mm1/security/dummy.c
    @@ -286,6 +286,21 @@ static int dummy_inode_init_security (st
    return -EOPNOTSUPP;
    }

    +static inline int dummy_singlepath_permission(int operation,
    + struct dentry *dentry,
    + struct vfsmount *mnt)
    +{
    + return 0;
    +}
    +
    +static inline int dummy_doublepath_permission(int operation,
    + struct dentry *old_dentry,
    + struct dentry *new_dentry,
    + struct vfsmount *mnt)
    +{
    + return 0;
    +}
    +
    static int dummy_inode_create (struct inode *inode, struct dentry *dentry,
    int mask)
    {
    @@ -1080,6 +1095,8 @@ void security_fixup_ops (struct security
    set_to_dummy_if_null(ops, inode_alloc_security);
    set_to_dummy_if_null(ops, inode_free_security);
    set_to_dummy_if_null(ops, inode_init_security);
    + set_to_dummy_if_null(ops, singlepath_permission);
    + set_to_dummy_if_null(ops, doublepath_permission);
    set_to_dummy_if_null(ops, inode_create);
    set_to_dummy_if_null(ops, inode_link);
    set_to_dummy_if_null(ops, inode_unlink);
    --- linux-2.6.25-rc8-mm1.orig/security/security.c
    +++ linux-2.6.25-rc8-mm1/security/security.c
    @@ -388,6 +388,21 @@ int security_inode_init_security(struct
    }
    EXPORT_SYMBOL(security_inode_init_security);

    +int security_singlepath_permission(int operation, struct dentry *dentry,
    + struct vfsmount *mnt)
    +{
    + return security_ops->singlepath_permission(operation, dentry, mnt);
    +}
    +EXPORT_SYMBOL(security_singlepath_permission);
    +
    +int security_doublepath_permission(int operation, struct dentry *old_dentry,
    + struct dentry *new_dentry,
    + struct vfsmount *mnt)
    +{
    + return security_ops->doublepath_permission(operation, old_dentry,
    + new_dentry, mnt);
    +}
    +
    int security_inode_create(struct inode *dir, struct dentry *dentry, int mode)
    {
    if (unlikely(IS_PRIVATE(dir)))
    --- linux-2.6.25-rc8-mm1.orig/net/unix/af_unix.c
    +++ linux-2.6.25-rc8-mm1/net/unix/af_unix.c
    @@ -822,7 +822,14 @@ static int unix_bind(struct socket *sock
    err = mnt_want_write(nd.path.mnt);
    if (err)
    goto out_mknod_dput;
    - err = vfs_mknod(nd.path.dentry->d_inode, dentry, mode, 0);
    + err = pre_vfs_mknod(nd.path.dentry->d_inode, dentry, mode);
    + if (!err)
    + err = security_singlepath_permission(TYPE_MKSOCK_ACL,
    + dentry,
    + nd.path.mnt);
    + if (!err)
    + err = vfs_mknod(nd.path.dentry->d_inode, dentry,
    + mode, 0);
    mnt_drop_write(nd.path.mnt);
    if (err)
    goto out_mknod_dput;
    --- linux-2.6.25-rc8-mm1.orig/include/linux/fs.h
    +++ linux-2.6.25-rc8-mm1/include/linux/fs.h
    @@ -1125,6 +1125,7 @@ extern void unlock_super(struct super_bl
    extern int vfs_permission(struct nameidata *, int);
    extern int vfs_create(struct inode *, struct dentry *, int, struct nameidata *);
    extern int vfs_mkdir(struct inode *, struct dentry *, int);
    +extern int pre_vfs_mknod(struct inode *, struct dentry *, int);
    extern int vfs_mknod(struct inode *, struct dentry *, int, dev_t);
    extern int vfs_symlink(struct inode *, struct dentry *, const char *, int);
    extern int vfs_link(struct dentry *, struct inode *, struct dentry *);
    --- linux-2.6.25-rc8-mm1.orig/fs/open.c
    +++ linux-2.6.25-rc8-mm1/fs/open.c
    @@ -267,8 +267,10 @@ static long do_sys_truncate(const char _
    error = break_lease(inode, FMODE_WRITE);
    if (error)
    goto put_write_and_out;
    -
    - error = locks_verify_truncate(inode, NULL, length);
    + error = security_singlepath_permission(TYPE_TRUNCATE_ACL,
    + nd.path.dentry, nd.path.mnt);
    + if (!error)
    + error = locks_verify_truncate(inode, NULL, length);
    if (!error) {
    DQUOT_INIT(inode);
    error = do_truncate(nd.path.dentry, length, 0, NULL);
    @@ -324,7 +326,10 @@ static long do_sys_ftruncate(unsigned in
    if (IS_APPEND(inode))
    goto out_putf;

    - error = locks_verify_truncate(inode, file, length);
    + error = security_singlepath_permission(TYPE_TRUNCATE_ACL, dentry,
    + file->f_path.mnt);
    + if (!error)
    + error = locks_verify_truncate(inode, file, length);
    if (!error)
    error = do_truncate(dentry, length, ATTR_MTIME|ATTR_CTIME, file);
    out_putf:
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  18. Re: [PATCH] Call LSM functions outside VFS helper functions.

    On Thu, Apr 10, 2008 at 09:02:57PM +0900, Tetsuo Handa wrote:
    > If the conclusion became "vfsmount should not be passed to
    > VFS helper functions", that's OK, but I want you to consider
    > the below approach for AppArmor and TOMOYO Linux. This patch is a repost of
    > http://kerneltrap.org/mailarchive/li...08/2/17/882024 .


    I'm a little disappointed that you simply repost this patch rather than
    responding to my post from yesterday:

    > How about an approach which doesn't require the vfsmount to be passed
    > down?
    >
    > When the rule is put in place, say "No modifications to /etc/passwd",
    > look up the inode and major:minor of /etc/passwd. If there's a rename,
    > look up the new inode number. If it's mounted elsewhere, it doesn't
    > matter, they still can't modify it because it has the same
    > major:minor:inode.
    >
    > Is this workable?


    Could you respond to this please?

    --
    Intel are signing my paycheques ... these opinions are still mine
    "Bill, look, we understand that you're interested in selling us this
    operating system, but compare it to ours. We can't possibly take such
    a retrograde step."
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  19. Re: [PATCH] Call LSM functions outside VFS helper functions.

    Hi,

    > I'm reading this thread on tenterhooks.
    > If vfsmount is passed to VFS helper functions, that will make
    > r/o bind mounts and AppArmor and TOMOYO Linux happy.
    > But so far, it seems that "passing vfsmount makes things complicated".


    If done right, it shouldn't. At least it didn't seem to complicate
    the parts that I've touched

    > If the conclusion became "vfsmount should not be passed to
    > VFS helper functions", that's OK, but I want you to consider
    > the below approach for AppArmor and TOMOYO Linux. This patch is a repost of
    > http://kerneltrap.org/mailarchive/li...08/2/17/882024 .


    I'm not a big fan of adding new security hooks. Nor of moving the
    existing ones to callers.

    Miklos
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread
Page 2 of 2 FirstFirst 1 2