On Tue, 25 Mar 2008, Alexander V. Chernikov wrote:

> I have made patches solving first 4 problems These patches are available at
> http://ipfw.ru/patches/ unionfs2.diff fixes fs mounting onto upper layer,
> unionfs_lmount.diff fixes lower unionfs_threads.diff and unionfs_unix.diff
> fixes cases 2) and 3) unionfs_rename.diff fixes case with renaming
> Can anybody comment/review ?

Dear Alexander,

Unfortunately, I don't know too much about unionfs. However, I can comment on
the UNIX domain socket patch:

> --- sys/fs/unionfs/union_subr.c.orig 2008-03-13 23:10:32.000000000 +0300
> +++ sys/fs/unionfs/union_subr.c 2008-03-13 23:17:34.000000000 +0300
> @@ -160,6 +160,8 @@
> unp->un_path[cnp->cn_namelen] = '\0';
> }
> vp->v_type = (uppervp != NULLVP ? uppervp->v_type : lowervp->v_type);
> + if (vp->v_type == VSOCK)
> + vp->v_socket = (uppervp != NULLVP) ? uppervp->v_socket : lowervp->v_socket;
> if ((lowervp != NULLVP) && (lowervp->v_type == VDIR))
> vp->v_mountedhere = lowervp->v_mountedhere;
> vp->v_data = unp;

I'm a bit worried about this assignment, as it represents an untracked alias
for the socket. Let me explain why:

UNIX domain sockets may have file system bindings, allowing them to use the
file system namespace as a rendezvous for communication. Typical use is that
a socket is created, bind() is called on it with a path in some location like
/var/run/log. Other processes turn up and connect() to the path, causing a
file system lookup to reach the vnode of the socket, and then the socket code
follows vp->v_socket to find the socket to connect to. When a bound socket is
closed, we follow a back-pointer from the UNIX domain socket to the vnode, and
then clear the pointer. Doing this in a race-free manner is somewhat tricky,
and I'm not 100% convinced it's correct currently, although it appears to be
somewhat close to right.

The upshot of all this is that if you copy the pointer value to other vnodes,
such as vnodes on upper layer, the UNIX domain socket code won't clear those
pointers before freeing the socket they point at. This means that the above
code snippet may lead to a v_socket pointer on a higher layer vnode pointing
at the right socket, the wrong socket, or possibly some other bit of freed and
maybe reused memory.

You can imagine a number of schemes to replicate pointer changes around or
track the various outstanding references, but I think a more fundamental
question is whether this is in fact the right behavior at all. The premise of
is that writes flow up, but not down, and "connections" to sockets are
read-write events, not read events, most typically. If you're using unionfs
to take a template system and "broadcast it" to many jails, you probably don't
want all the jails talking to the same syslogd, you want them each talking to
their own. When syslogd in a jail finds a disconnected socket, which is
effectively what a NULL v_socket pointer means, in /var/run/log, it should be
unlinking it and creating a new socket, not reusing the existing file on disk.

Robert N M Watson
Computer Laboratory
University of Cambridge
freebsd-fs@freebsd.org mailing list
To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"