[patch 00/13] vfs: add helpers to check r/o bind mounts - Kernel
This is a discussion on [patch 00/13] vfs: add helpers to check r/o bind mounts - Kernel ; On Thu, Apr 24, 2008 at 04:58:00PM +0200, Miklos Szeredi wrote:
> Since all the vfs_* functions will become static with path_* being the
> only caller, the compiler will be happy to get rid of that stack frame
> ...
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
On Thu, Apr 24, 2008 at 04:58:00PM +0200, Miklos Szeredi wrote:
> Since all the vfs_* functions will become static with path_* being the
> only caller, the compiler will be happy to get rid of that stack frame
> too.
>
> What is left is the guarantee, that the race-free r/o remounts will
> always work and some obscure caller didn't forget to surround it with
> the r/o checks.
>
> I think it's definitely worth it.
It would be, if we had only single vfs_...() as critical sections. We
do not - see previous mail or read the ****ing tree, already.
We don't even have many callers of each, and with a few we do it's not
obvious that we want to go through vfsmounts (and vfsmount-based checks)
in all of them. So no, I don't buy your argument. Sorry.
I'm not even convinced that they are useful as helpers, at least until
we'd decided what to do with checks in nfsd. Until then we do, as
far as I'm concerned, one place where they would definitely DTRT - fs/namei.c.
And I want more than one caller before merging those, let alone removing
the interface that doesn't require checks to be vfsmount-based for all users.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
> > > Oh, for **** sake... grep and ye shall see. Right next to setattr we
> > > have nfsd4_set_nfs4_acl(), with pair of set_nfsv4_acl_one(). I'd rather
> > > have those two covered by a single will/wont range, TYVM.
> > >
> > > nfsd_create() will happily do vfs_mkdir() and nfsd_create_setattr(). Ditto.
> >
> > Please read the patches? I've left exactly these
> > mnt_want_write/drop_write() calls in place, and removed all the
> > others.
>
> You've left _what_ in place? nfsd4_set_nfs4_acl() currently doesn't have
> that single range - fh_verify() + set_nvfs4_acl_one() + set_nfsv4_acl_one().
> And AFAICS you do nothing of that kind there.
>
Yes I did. See only caller of nfsd4_set_nfs4_acl() in nfs4proc.c.
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
On Thu, Apr 24, 2008 at 05:37:39PM +0200, Miklos Szeredi wrote:
> > > What is left is the guarantee, that the race-free r/o remounts will
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^
> > > always work and some obscure caller didn't forget to surround it with
^^^^^^^^^^^^^^^^^
> Why are those so important? Yes, if we have multiple vfs_() calls,
> surround them with an extra want/drop pair.
Which leaves you with the same need to audit all these suckers anyway.
I'm in principle fine with having such helper functions, *IF* they are
not sold as providing all protection one needs, *IF* you are not expecting
to be able to fold all areas down into them and *IF* original ones are
left intact.
Modulo the like path_rename(), BTW - that one is just plain ugly API.
> > We don't even have many callers of each, and with a few we do it's not
> > obvious that we want to go through vfsmounts (and vfsmount-based checks)
> > in all of them. So no, I don't buy your argument. Sorry.
> >
> > I'm not even convinced that they are useful as helpers, at least until
> > we'd decided what to do with checks in nfsd. Until then we do, as
> > far as I'm concerned, one place where they would definitely DTRT - fs/namei.c.
> > And I want more than one caller before merging those,
>
> unix_bind() -> vfs_mknod()
> sys_mq_unlink() -> vfs_unlink()
> open.c (several) -> notify_change()
> *setxattr() -> vfs_setxattr()
> *removexattr() -> vfs_removexattr()
OK.
> > let alone removing the interface that doesn't require checks to be
> > vfsmount-based for all users.
>
> What users? There are paractically _no_ other users. The ones that
> there are (like reiserfs) should not be using them, and there are
> already some patches cleaning that mess up.
OK, explain me, in small words, WTF should something that wants to do
operations on filesystem tree have a vfsmount. Slowly. And "r/o
bind loses value if it can be bypassed" is a hogwash - fs methods are
still there, so it *can* be bypassed just fine, thank you very much.
It's really up to caller. "But they won't be able to do open()" also
doesn't fly - again, it's up to whoever writes particular piece of code.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
> > > > What is left is the guarantee, that the race-free r/o remounts will
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > always work and some obscure caller didn't forget to surround it with
> ^^^^^^^^^^^^^^^^^
>
> > Why are those so important? Yes, if we have multiple vfs_() calls,
> > surround them with an extra want/drop pair.
>
> Which leaves you with the same need to audit all these suckers anyway.
Not really. Missing such calls would just make the *caller* buggy
(i.e. racy with remount r/o), but it would not make the *filesystem*
buggy. Big difference.
> I'm in principle fine with having such helper functions, *IF* they are
> not sold as providing all protection one needs,
I'm not selling them as that.
> *IF* you are not expecting
> to be able to fold all areas down into them and *IF* original ones are
> left intact.
Left intact for whom, specifically? Another question you've managed
to avoid answering.
> Modulo the like path_rename(), BTW - that one is just plain ugly API.
I'm all open to improvements.
> > > let alone removing the interface that doesn't require checks to be
> > > vfsmount-based for all users.
> >
> > What users? There are paractically _no_ other users. The ones that
> > there are (like reiserfs) should not be using them, and there are
> > already some patches cleaning that mess up.
>
> OK, explain me, in small words, WTF should something that wants to do
> operations on filesystem tree have a vfsmount. Slowly. And "r/o
> bind loses value if it can be bypassed" is a hogwash - fs methods are
> still there, so it *can* be bypassed just fine, thank you very much.
And we know what to do with such users.
> It's really up to caller. "But they won't be able to do open()" also
> doesn't fly - again, it's up to whoever writes particular piece of code.
I understand your theory. But it has zero practical significance.
IOW it doesn't matter that someone _may_ want to access the filesystem
without a vfsmount, if that someone doesn't exist.
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
In message <20080424113950.818688067@szeredi.hu>, Miklos Szeredi writes:
> Patches 1-3 are cleanups/bugfixes preceding the main series. The
> description is in 4/13. This is against latest git.
>
>
>
> R/O bind mounts patches have been reviewed numerous times. Still a
> relatively large number of places remained where checking for R/O
> mounts was missed.
>
> Then I did this series, which basically guarantees, that that cannot
> happen. Al rejected it, and rather fixed some of the remaining
> places. He still missed several, which sort of proves my point.
>
> I think it's totally pointless to continue trying to fix the symptoms
> instead of getting at the root of the problem.
>
> I know that VFS interfaces are a sensitive question, but it would be
> nice it we could have some sanity back in this discussion.
And I'll reiterate my desire (see http://lkml.org/lkml/2008/4/8/394) for
stackable file systems to know as little as possible about vfsmounts (or
their derivatives ala struct path). If I have to call helpers which require
vfsmounts, but similar objects don't get passed to me somehow, then I have
to maintain vfsmounts on my own.
Sure, unionfs already does some of it now (and some of it can be recoded
more cleanly), but I'd like to get rid of much of it if I could -- these new
path_* helpers make me have to maintain vfsmounts even moreso than before.
>
>
> Thanks,
> Miklos
Erez.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
> Sure, unionfs already does some of it now (and some of it can be recoded
> more cleanly), but I'd like to get rid of much of it if I could -- these new
> path_* helpers make me have to maintain vfsmounts even moreso than before.
With the words of Al: "that is hogwash". Stacks either maintain
vfsmounts or they don't. There's no middle ground.
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
In message <20080424124245.GC15214@ZenIV.linux.org.uk>, Al Viro writes:
> On Thu, Apr 24, 2008 at 01:39:50PM +0200, Miklos Szeredi wrote:
> > Then I did this series, which basically guarantees, that that cannot
> > happen. Al rejected it, and rather fixed some of the remaining
> > places. He still missed several, which sort of proves my point.
>
> Which ones have I missed?
>
> > I think it's totally pointless to continue trying to fix the symptoms
> > instead of getting at the root of the problem.
> >
> > I know that VFS interfaces are a sensitive question, but it would be
> > nice it we could have some sanity back in this discussion.
>
> Yes, it would. How about that, for starters:
>
> path_create() et.al. are *wrong* for nfsd; if nothing else, I'm not at
> all convinced that even apparmour wants export path + relative there
> _and_ r/o vs. r/w is decision that doesn't clearly map to ex_mnt flags.
>
> Moreover, it's not at all obvious that we want to drop write access as
> soon as vfs_...() is over in case of nfsd. Some of the stuff done
> immeidately afterwards might very well qualify for inclusion into
> protected area; some of the stuff done immediately _prior_ very likely
> needs that as well - look at fh_verify() and tell me why we don't want
> that "I'll hold write access to vfsmount" to span the area including
> that sucker. If we want the r/o vs r/w policy directly vfsmount-based
> for nfsd, that is.
>
> For ecryptfs it's also bogus - at the very least we need to decide what
> should happen when underlying vfsmount is remounted. Again, I'm less
> than convinced that we want the same way to express r/o vs. r/w policy.
I tend to agree that the mnt_want* bracketing may need to encompass more
than one vfs_* method. When copyup is involved, the problem is only
exacerbated. A copyup may want to perform several actions "atomically", for
example:
1. vfs_mkdir /a
2. vfs_mkdir /a/b
3. vfs_mkdir /a/b/c
4. vfs_create /a/b/c/f
5. vfs_read(source file) -> vfs_write(/a/b/c/f)
These actions are often done in close proximity in unionfs, and in such a
case, it'd be nice if I could want_write only once around this sequence.
However, I'm not sure that for copyup (or ecryptfs), that an vfsmount-level
want/drop write is the right interface. As you pointed out before, Al, some
other mechanism might be needed, perhaps at the superblock level, to declare
one's desire to write the f/s "atomically" (to avoid concurrent topology
changes).
Thanks,
Erez.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
In message , Miklos Szeredi writes:
> > Sure, unionfs already does some of it now (and some of it can be recoded
> > more cleanly), but I'd like to get rid of much of it if I could -- these new
> > path_* helpers make me have to maintain vfsmounts even moreso than before.
>
> With the words of Al: "that is hogwash". Stacks either maintain
> vfsmounts or they don't. There's no middle ground.
So, if I wanted to not maintain vfsmounts at all in unionfs, how can I use
the proposed new path_* helpers which require vfsmounts? Will there be some
other helpers available to perform lower-filesystem operations (e.g., mkdir,
create, unlink, etc.) which won't require passing vfsmounts?
The "or they don't" is not much of an option when I'm forced to use an API
that requires a vfsmount...
> Miklos
Thanks,
Erez.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
In message <20080424134826.GD15214@ZenIV.linux.org.uk>, Al Viro writes:
> On Thu, Apr 24, 2008 at 03:05:21PM +0200, Miklos Szeredi wrote:
[...]
> And yes, we need the counterpart for superblock-level stuff, to deal with
> remaining races (look at fs_may_remount_ro() and puke - it's still racy
> as hell). E.g. unlink should do sb-level "will write" when it drops
> i_nlink to 0 and final removal of inode should do "won't write".
Al, any near-term plans for sb-level "want write" locking as we discussed
briefly at LSF? Being able to do so for copyup in unionfs will hopefully
allow me to prevent concurrent topology changes.
And while we're digressing. It appears to me that when someone wants to
change a single meta-data item in a f/s, then locking rules are simple:
e.g., lock the directory inode. But when you want to modify more than one,
you need a different kind of lock -- hence the s_vfs_rename_mutex. The
latter requires calling lock_rename to find the lowest common ancestor of
the two directories that need locking. This rename-locking protocol appears
to be a special case of the sb-level "want write" idea, right? Moreover, I
don't believe that cross-directory renames are that common, so replacing the
s_vfs_rename_mutex and its use with a more generic sb-level multi-operation
write-lock should have a negligible performance impact.
Thanks,
Erez.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
> > > Sure, unionfs already does some of it now (and some of it can be recoded
> > > more cleanly), but I'd like to get rid of much of it if I could -- these new
> > > path_* helpers make me have to maintain vfsmounts even moreso than before.
> >
> > With the words of Al: "that is hogwash". Stacks either maintain
> > vfsmounts or they don't. There's no middle ground.
>
> So, if I wanted to not maintain vfsmounts at all in unionfs, how can I use
> the proposed new path_* helpers which require vfsmounts? Will there be some
> other helpers available to perform lower-filesystem operations (e.g., mkdir,
> create, unlink, etc.) which won't require passing vfsmounts?
>
> The "or they don't" is not much of an option when I'm forced to use an API
> that requires a vfsmount...
Yes. The vfsmount is already needed for open() and that pretty much
determines the fate of all stacking filesystems, except the ones which
don't want to do file I/O, but that is rather hard to imagine.
And anyway, I don't see the issue here. If the stack wants to work on
a single filesystem directly, just do a bind mount of the original to
a kernel private mount and use that. I'm not sure this can be done
with the current API, but it doesn't sound difficult to implement at
all.
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
On Thu, Apr 24, 2008 at 01:25:58PM -0400, Erez Zadok wrote:
> Al, any near-term plans for sb-level "want write" locking as we discussed
> briefly at LSF? Being able to do so for copyup in unionfs will hopefully
> allow me to prevent concurrent topology changes.
That's pretty close to the top.
> the two directories that need locking. This rename-locking protocol appears
> to be a special case of the sb-level "want write" idea, right?
Not really - that one doesn't provide any exclusion between writers and
that's what we want from rename/rename.
Take a look at Documentation/filesystems/directory-locking for details
of locking scheme...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
In message <20080424142857.GF15214@ZenIV.linux.org.uk>, Al Viro writes:
> On Thu, Apr 24, 2008 at 04:09:18PM +0200, Miklos Szeredi wrote:
[...]
> FWIW, I'm not all that happy about the way ecryptfs_interpose() is done,
> while we are at it. We get the sucker opened by whoever steps on given
> place in the tree first, with subsequent operations done using the resulting
> struct file. With fallback to r/o open. What happens to somebody who
> tries to open it with enough permissions to do r/w?
Yes, ecryptfs_interpose() calls ecryptfs_init_persistent_file() which calls
dentry_open(O_RDWR). What's the proposed solution for this in the face of
r/o vfsmounts? How could ecryptfs avoid calling this dentry_open in the
first place?
For unionfs, all I need is return -EROFS or the like to trigger a possible
copyup.
Erez.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
On Thu, 2008-04-24 at 17:37 +0200, Miklos Szeredi wrote:
> Why are those so important? Yes, if we have multiple vfs_() calls,
> surround them with an extra want/drop pair. We do already do multiple
> overlapping want/drop pairs with O_TRUNC and O_CREAT (AFAIR).
The one that I remember is the pair that we take for O_CREAT and the
nameidata_to_filp() in do_filp_open() that we do on it after the
creation and hold while the file is open.
The only reason for this one is that we want to shut down a potential
race that would allow the file to be created, then still return a -EROFS
because someone did a r/w->r/o transition between the create an the
nameidata_to_filp().
We could pass some information into nameidata_to_filp()->__dentry_open()
telling it that we already have a write and it doesn't need to take one,
but I think having the two nested writes is cleaner.
Is that the case you're thinking of?
-- Dave
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
On Thu, Apr 24, 2008 at 01:29:49PM -0400, Erez Zadok wrote:
> In message <20080424142857.GF15214@ZenIV.linux.org.uk>, Al Viro writes:
> > On Thu, Apr 24, 2008 at 04:09:18PM +0200, Miklos Szeredi wrote:
> [...]
> > FWIW, I'm not all that happy about the way ecryptfs_interpose() is done,
> > while we are at it. We get the sucker opened by whoever steps on given
> > place in the tree first, with subsequent operations done using the resulting
> > struct file. With fallback to r/o open. What happens to somebody who
> > tries to open it with enough permissions to do r/w?
>
> Yes, ecryptfs_interpose() calls ecryptfs_init_persistent_file() which calls
> dentry_open(O_RDWR). What's the proposed solution for this in the face of
> r/o vfsmounts? How could ecryptfs avoid calling this dentry_open in the
> first place?
Doesn't have anything to do with vfsmounts (you have one to deal with and
if it's r/o, it's equivalent to just doing the entire thing on top of r/o
fs; not interesting).
No, what I'm worried about is much simpler. Look: we have a file on
underlying fs, owned by root.root with 644 for permissions. Comes a
luser and tries to open the counterpart of that file in ecryptfs; that
triggers ecryptfs_interpose() and attempts to open file. Of course,
that's going to fail - it's not world-writable. So then it (actually
ecryptfs_init_persistent_file()) falls back to opening with O_RDONLY.
Which succeeds just fine and file (opened r/o) is set as ->lower_file.
Now comes root and tries to open the damn thing r/w. It should be able
to and if it came first it'd get it; as it is, what it gets is ->lower_file
and that puppy is opened read-only and you have no guarantee that underlying
fs will not go bonkers seeing write attempts on it (e.g. open for write
doing a bit more setup of ->private_data, etc.).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
> > Why are those so important? Yes, if we have multiple vfs_() calls,
> > surround them with an extra want/drop pair. We do already do multiple
> > overlapping want/drop pairs with O_TRUNC and O_CREAT (AFAIR).
>
> The one that I remember is the pair that we take for O_CREAT and the
> nameidata_to_filp() in do_filp_open() that we do on it after the
> creation and hold while the file is open.
>
> The only reason for this one is that we want to shut down a potential
> race that would allow the file to be created, then still return a -EROFS
> because someone did a r/w->r/o transition between the create an the
> nameidata_to_filp().
>
> We could pass some information into nameidata_to_filp()->__dentry_open()
> telling it that we already have a write and it doesn't need to take one,
> but I think having the two nested writes is cleaner.
Agreed.
> Is that the case you're thinking of?
Yes. And the same thing is done a little below for similar reasons
around may_open() which does truncation.
Miklos
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
In message <20080424181332.GB5882@ZenIV.linux.org.uk>, Al Viro writes:
> On Thu, Apr 24, 2008 at 01:29:49PM -0400, Erez Zadok wrote:
> > In message <20080424142857.GF15214@ZenIV.linux.org.uk>, Al Viro writes:
> > > On Thu, Apr 24, 2008 at 04:09:18PM +0200, Miklos Szeredi wrote:
> > [...]
> > > FWIW, I'm not all that happy about the way ecryptfs_interpose() is done,
> > > while we are at it. We get the sucker opened by whoever steps on given
> > > place in the tree first, with subsequent operations done using the resulting
> > > struct file. With fallback to r/o open. What happens to somebody who
> > > tries to open it with enough permissions to do r/w?
> >
> > Yes, ecryptfs_interpose() calls ecryptfs_init_persistent_file() which calls
> > dentry_open(O_RDWR). What's the proposed solution for this in the face of
> > r/o vfsmounts? How could ecryptfs avoid calling this dentry_open in the
> > first place?
>
> Doesn't have anything to do with vfsmounts (you have one to deal with and
> if it's r/o, it's equivalent to just doing the entire thing on top of r/o
> fs; not interesting).
>
> No, what I'm worried about is much simpler. Look: we have a file on
> underlying fs, owned by root.root with 644 for permissions. Comes a
> luser and tries to open the counterpart of that file in ecryptfs; that
> triggers ecryptfs_interpose() and attempts to open file. Of course,
> that's going to fail - it's not world-writable. So then it (actually
> ecryptfs_init_persistent_file()) falls back to opening with O_RDONLY.
> Which succeeds just fine and file (opened r/o) is set as ->lower_file.
>
> Now comes root and tries to open the damn thing r/w. It should be able
> to and if it came first it'd get it; as it is, what it gets is ->lower_file
> and that puppy is opened read-only and you have no guarantee that underlying
> fs will not go bonkers seeing write attempts on it (e.g. open for write
> doing a bit more setup of ->private_data, etc.).
Ah, I see. Yes: ecryptfs_init_persistent_file does have this odd "try to
open readwrite and if that failed, try readonly" code there. I can't really
say why it's done that way: Mike? Maybe it was a workaround to not having
the right permissions to open that persistent file?
This could be a similar issue to how I had to handle the .wh.* files in
unionfs. Because they are separate files, and they get created/destroyed
directly by unionfs, I cannot rely on the caller's capabilities to be able
to create/remove these .wh.* files. So in several places I have to call
cap_raise temporarily. (Of course, if/when there'd be native whiteout
support in the most most common file systems, I could happily rip out this
whiteout code. :-)
Erez.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
In message <20080424173013.GA5882@ZenIV.linux.org.uk>, Al Viro writes:
> On Thu, Apr 24, 2008 at 01:25:58PM -0400, Erez Zadok wrote:
> > Al, any near-term plans for sb-level "want write" locking as we discussed
> > briefly at LSF? Being able to do so for copyup in unionfs will hopefully
> > allow me to prevent concurrent topology changes.
>
> That's pretty close to the top.
>
> > the two directories that need locking. This rename-locking protocol appears
> > to be a special case of the sb-level "want write" idea, right?
>
> Not really - that one doesn't provide any exclusion between writers and
> that's what we want from rename/rename.
When you say "that one", do you mean the sb-level idea? If the sb-level
"want write" not provide exclusion among writers, then how can I prevent
lower topology changes while copyup takes place?
If your sb-level "want write" idea won't provide that exclusion at the sb
level, then how about we elevate the s_vfs_rename_mutex into a generic
s_vfs_multi_dir_change_mutex of sorts, which can be used by rename as well
as copyup?
> Take a look at Documentation/filesystems/directory-locking for details
> of locking scheme...
OK. lock_rename() at most locks one "parent" and one "child" inodes. So
what happens to all the ones in b/t? Suppose I have a path /a/b/c/d/e/f and
someone wants to mv /a/b/c/d/e to /a/b/; in that case lock_rename at least
grabs the mutex on /a/b, right? Can someone go in the middle and try to
muck with the "c/d/" parts in between?
From what I gathered, lock_rename won't be enough for me to prevent topology
changes during copyup; I'd really need to lock an entire arbitrarily deep
path of inodes. That just seems so complex and prone to bugs, that it might
just be easier to have a single sb-level mutex for these complex
multi-directory operations. I'm not sure how much performance hit this'd
be, thou, and if there's a more efficient way to do so.
If you think lock_rename will be sufficient for me to deal with copyups
vs. topology changes, then I'll start using it.
Thanks,
Erez.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
On Thu, Apr 24, 2008 at 03:40:03PM -0400, Erez Zadok wrote:
> Ah, I see. Yes: ecryptfs_init_persistent_file does have this odd
> "try to open readwrite and if that failed, try readonly" code there.
> I can't really say why it's done that way: Mike? Maybe it was a
> workaround to not having the right permissions to open that
> persistent file?
The notion was that of "best effort," and it is sloppy.
I think the right way to do it will be to allow up to two persistent
files. If the user with read-only perms opens, then open the
persistent file RO. Then a user with write-only (but not read) perms
opens; open another persistent file WO. Allow subsequent reads or
writes by the various users to go through whichever persistent file
has the right access. Then a user with RW access opens the file; close
both the RO file and the WO file and open a new file RW. All the
while, make sure that ecryptfs_open() performs all the requisite perm
checks.
If this sounds reasonable, I will get working on a patch to do this.
Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
In message <20080424201630.GC18686@localhost.austin.ibm.com>, Michael Halcrow writes:
> On Thu, Apr 24, 2008 at 03:40:03PM -0400, Erez Zadok wrote:
> > Ah, I see. Yes: ecryptfs_init_persistent_file does have this odd
> > "try to open readwrite and if that failed, try readonly" code there.
> > I can't really say why it's done that way: Mike? Maybe it was a
> > workaround to not having the right permissions to open that
> > persistent file?
>
> The notion was that of "best effort," and it is sloppy.
>
> I think the right way to do it will be to allow up to two persistent
> files. If the user with read-only perms opens, then open the
> persistent file RO. Then a user with write-only (but not read) perms
> opens; open another persistent file WO. Allow subsequent reads or
> writes by the various users to go through whichever persistent file
> has the right access. Then a user with RW access opens the file; close
> both the RO file and the WO file and open a new file RW. All the
> while, make sure that ecryptfs_open() performs all the requisite perm
> checks.
Correct me if I'm wrong, but what you call "persistent" file is simply (in
my stacking-speek :-) the lower file, right? If so, then calling it
persistent may be confusing, b/c you could be stacked on top of a volatile
f/s (e.g., tmpfs or even *gasp* another stackable f/s) -- yes, even if it
may not make much sense from a security perspective.
The comment above ecryptfs_init_persistent_file() states that you only ever
keep a single open file for every lower inode. Was this a security decision
(say, to prevent different access modes to the same ciphertext?); or was
that the attempt to workaround the limitation of the
address_space_operations API (that ->writepage doesn't get a struct file,
which you need to pass to vfs_write)?
I'm unclear what you mean above wrt two open files: do you mean
1. two struct file's, each pointing to a SEPARATE copy of the physical lower
file?
or
2. two struct file's, BOTH pointing to the same lower physical file?
Option #1 may be racy and I'm not clear what happens if a data file gets
'forked' that way: do you then have to merge it later on?
Or, are you propsing to keep alternating b/t an open struct file for RO, and
another open struct file for RW (or WO)? If you alternate b/t the two,
keeping only one open at a time, what happens if two threads want to access
the same file: one reading and another writing the file. Does that mean
that for every read(2) or write(2) you'll have to essentially release one
open file and open another (could slow things down a lot)?
> If this sounds reasonable, I will get working on a patch to do this.
>
> Mike
Cheers,
Erez.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
-
Re: [patch 00/13] vfs: add helpers to check r/o bind mounts
On Thu, Apr 24, 2008 at 06:39:07PM -0400, Erez Zadok wrote:
> In message <20080424201630.GC18686@localhost.austin.ibm.com>,
> Michael Halcrow writes:
> > I think the right way to do it will be to allow up to two persistent
> > files. If the user with read-only perms opens, then open the
> > persistent file RO. Then a user with write-only (but not read) perms
> > opens; open another persistent file WO. Allow subsequent reads or
> > writes by the various users to go through whichever persistent file
> > has the right access. Then a user with RW access opens the file; close
> > both the RO file and the WO file and open a new file RW. All the
> > while, make sure that ecryptfs_open() performs all the requisite perm
> > checks.
>
> Correct me if I'm wrong, but what you call "persistent" file is
> simply (in my stacking-speek :-) the lower file, right? If so, then
> calling it persistent may be confusing, b/c you could be stacked on
> top of a volatile f/s (e.g., tmpfs or even *gasp* another stackable
> f/s) -- yes, even if it may not make much sense from a security
> perspective.
So long as the eCryptfs inode is alive, it will have at least one open
file in the corresponding inode for the lower filesystem. This file is
"persistent" for the life of the eCryptfs inode.
> The comment above ecryptfs_init_persistent_file() states that you
> only ever keep a single open file for every lower inode. Was this a
> security decision (say, to prevent different access modes to the
> same ciphertext?); or was that the attempt to workaround the
> limitation of the address_space_operations API (that ->writepage
> doesn't get a struct file, which you need to pass to vfs_write)?
I intend to use the persistent file as a means to use vfs_read() and
vfs_write() in readpage() and writepage().
> I'm unclear what you mean above wrt two open files: do you mean
>
> 1. two struct file's, each pointing to a SEPARATE copy of the physical lower
> file?
> or
> 2. two struct file's, BOTH pointing to the same lower physical file?
I intned two lower files, both opened against the same lower inode,
via any dentry that happened to be used to get to that inode.
> Option #1 may be racy and I'm not clear what happens if a data file
> gets 'forked' that way: do you then have to merge it later on?
>
> Or, are you propsing to keep alternating b/t an open struct file for
> RO, and another open struct file for RW (or WO)? If you alternate
> b/t the two, keeping only one open at a time, what happens if two
> threads want to access the same file: one reading and another
> writing the file.
I proposed two open files, one with RO access, and one with WO access,
in the circumstance that one process only had read access on open, and
the other process only had write access on open. A write-only
ecryptfs_open() would result in an eCryptfs file struct that only used
the WO persistent file, after the appropriate permission checks are
done.
> Does that mean that for every read(2) or write(2) you'll have to
> essentially release one open file and open another (could slow
> things down a lot)?
Not really; I was proposing to keep two lower files open to use for
write-only or read-only operations, and if I ever got RW permission on
a file linked to the lower inode, I would just get rid of both and
replace them with a single RW persistent file. All while maintaining
proper permission checks in ecryptfs_open(), so a RO ecryptfs_open()
would only ever be allowed to read from the RW persistent file.
Really, this business with possibly two lower persistent files is
probably too much of a game to play. Rusty suggested that perhaps I
should just create a root-owned kernel thread in eCryptfs that would
have the responsibility of acquiring a RW lower persistent file,
regardless of the euid and permissions of whatever process made the
initial syscall that brought the eCryptfs inode into existence. I am
tempted to go that route instead.
Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/