On Thu, 21 Dec 2006, David Xu wrote:

> On Thursday 21 December 2006 02:18, Daniel Eischen wrote:
>> On Wed, 20 Dec 2006, Robert Watson wrote:
>>> On Wed, 13 Dec 2006, Daniel Eischen wrote:
>>>> Anyway, this was just a thought/idea. I don't mean to argue against any
>>>> of the other reasons why this isn't a good idea.
>>> Whatever may be implemented to solve this issue will require a fairly
>>> serious re-working of how we implement file descriptor reference counting
>>> in the kernel. Do you propose similar "cancellation" of other system
>>> calls blocked on the file descriptor, including select(), etc? Typically
>>> these system calls interact with the underlying object associated with the
>>> file descriptor, not the file descriptor itself, and often, they act
>>> directly on the object and release the file descriptor before performing
>>> their operation. I think before we can put any reasonable implementation
>>> proposal on the table, we need a clear set of requirements:

>> [ ... ]
>>> While providing Solaris-like semantics here makes some amount of sense,
>>> this is a very tricky area, and one where we're still refining performance
>>> behavior, reference counting behavior, etc. I don't think there will be
>>> any easy answers, and we need to think through the semantic and
>>> performance implications of any change very carefully before starting to
>>> implement.
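The sketch below illustrates the point above that system calls act on the underlying object rather than on the descriptor. It is a minimal model under stated assumptions. The fd table, struct file_obj, obj_hold(), obj_drop() and fd_close() are hypothetical names, not FreeBSD's struct file or its lookup and release routines, and all locking is omitted. An in-flight operation holds its own reference on the object, so close() only clears the table slot, and the operation keeps running on the object until it drops that reference.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified open-file object (not FreeBSD's struct file). */
struct file_obj {
    int refcount;   /* one ref for the fd-table slot plus one per in-flight op */
    int torn_down;  /* set when the last reference goes away */
};

#define NFDS 8
static struct file_obj *fd_table[NFDS];   /* fd -> object; locking omitted */

/* Install an object in an empty slot; the slot owns one reference. */
static int fd_install(int fd, struct file_obj *fp)
{
    if (fd < 0 || fd >= NFDS || fd_table[fd] != NULL)
        return -EBADF;
    fp->refcount = 1;
    fp->torn_down = 0;
    fd_table[fd] = fp;
    return 0;
}

/* Start of read()/select()/...: look up the fd once and take a reference
 * on the object; after this the fd itself is no longer consulted. */
static struct file_obj *obj_hold(int fd)
{
    if (fd < 0 || fd >= NFDS || fd_table[fd] == NULL)
        return NULL;                 /* the syscall would fail with EBADF */
    fd_table[fd]->refcount++;
    return fd_table[fd];
}

/* Drop a reference; the last one tears the object down. */
static void obj_drop(struct file_obj *fp)
{
    if (--fp->refcount == 0)
        fp->torn_down = 1;
}

/* close(): clears the slot and drops only the slot's reference, so any
 * operation already holding the object is not interrupted. */
static int fd_close(int fd)
{
    struct file_obj *fp;

    if (fd < 0 || fd >= NFDS || (fp = fd_table[fd]) == NULL)
        return -EBADF;
    fd_table[fd] = NULL;
    obj_drop(fp);
    return 0;
}
```

In this model a read() that called obj_hold(3) before another thread's close(3) still finishes against the same object; only new lookups of fd 3 fail. Any "cancel on close" semantics would have to be layered on top of this.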

>> I don't think the behavior here has to be any different than what we
>> currently (or desire to) do with regard to (unblocked) signals interrupting
>> threads waiting on IO. You can spend a lot of time thinking about how
>> close() should affect IO operations on the same file descriptor, but a very
>> simple approach is to treat them the same as if the operations were
>> interrupted by a signal. I'm not suggesting it is implemented the same
>> way, just that it seems to make a lot of sense to me that the behavior is
>> consistent between the two.

> I think the main concern is whether we will record every thread using an fd.
> That means when you call read() on an fd, you record your thread pointer in
> the fd's thread list. When someone wants to close the fd, it has to notify
> all the threads in the list and set a flag for each one, indicating that the
> thread was interrupted because the fd was closed. When the thread returns
> from the deep code path to the read() syscall, it should check the flag and
> return EBADF to the user if it was set. Whatever the mechanism, a reserved
> signal or TDF_INTERRUPT could interrupt the thread. But since there are many
> file operations, I don't know if we are willing to pay such overhead on
> every file syscall; extra locking is not welcome.
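The per-fd thread list scheme described above can be modelled in a few functions. This is a single-threaded sketch under stated assumptions: struct fdesc, struct thread, fd_enter(), fd_close_notify() and fd_exit() are hypothetical names. The woken field merely stands in for whatever wakeup would really be used, such as a reserved signal or TDF_INTERRUPT. Locking is left out, but a real version would need a per-fd lock around every enter and exit, which is the overhead being questioned.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical thread record (illustrative, not FreeBSD's struct thread). */
struct thread {
    int interrupted_by_close;    /* set by close(): "your fd went away" */
    int woken;                   /* stands in for a signal / TDF_INTERRUPT */
    struct thread *next_waiter;  /* link in the fd's thread list */
};

/* Hypothetical per-fd state carrying the list of threads using it. */
struct fdesc {
    int open;
    struct thread *waiters;
};

/* Entry to read() etc.: record the calling thread on the fd's list.
 * A real kernel would need a lock here on every file syscall. */
static int fd_enter(struct fdesc *fd, struct thread *td)
{
    if (!fd->open)
        return -EBADF;
    td->interrupted_by_close = 0;
    td->woken = 0;
    td->next_waiter = fd->waiters;
    fd->waiters = td;
    return 0;
}

/* close(): mark the fd closed, then flag and "wake" every listed thread. */
static void fd_close_notify(struct fdesc *fd)
{
    fd->open = 0;
    for (struct thread *td = fd->waiters; td != NULL; td = td->next_waiter) {
        td->interrupted_by_close = 1;
        td->woken = 1;
    }
}

/* Return path of read(): unlink the thread and, if close() flagged it,
 * report EBADF instead of whatever the lower layers returned. */
static int fd_exit(struct fdesc *fd, struct thread *td, int result)
{
    for (struct thread **pp = &fd->waiters; *pp != NULL;
         pp = &(*pp)->next_waiter) {
        if (*pp == td) {
            *pp = td->next_waiter;
            break;
        }
    }
    return td->interrupted_by_close ? -EBADF : result;
}
```

A thread that leaves with fd_exit() before the close() is unaffected, while one still on the list gets EBADF even if the lower layers reported EINTR.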

Yes, as well as adding quite a bit of complexity and opening the door for some
rather odd/unfortunate races. You can inspect the bulk of the Solaris
implementation by looking at three spots:


In closeandsetf(), you can see that an additional layer of indirection
associated with the file descriptor is maintained in order to count consumers
of a particular fd, not just of the open file record, and that the set of
active fds for each thread is tracked. When a close() is performed and there
are still other open consumers, the process is suspended and all threads are
inspected to see whether the fd is active for each thread, in which case a
thread flag marking the fd as stale is set. I believe that the interrupt here
is an implicit part of the process suspend/restart, and that in post_syscall()
the EINTR returns are remapped to EBADF.
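The Solaris-style approach described above can be modelled roughly as follows. This is a hypothetical sketch, not Solaris source. The names struct proc, fd_consumers, active_fd, stale_fd, fd_begin(), fd_end(), fd_close() and post_syscall_remap() are illustrative, and the process suspend/restart is not modelled at all. It shows the two pieces being described: a per-fd consumer count plus a per-thread active-fd set, and an exit hook that turns the interrupted call's EINTR into EBADF.

```c
#include <errno.h>
#include <stdbool.h>

#define NFDS     16
#define NTHREADS 4

/* Hypothetical per-thread state: which fds it is using, and a stale flag. */
struct thread {
    bool active_fd[NFDS];
    bool stale_fd;
};

/* Hypothetical process: the extra indirection counts users of each fd. */
struct proc {
    int fd_consumers[NFDS];
    bool fd_open[NFDS];
    struct thread threads[NTHREADS];
};

/* Entry to a syscall on fd: bump the fd's consumer count and mark it
 * active for this thread (the per-call cost in the critical path). */
static int fd_begin(struct proc *p, struct thread *td, int fd)
{
    if (fd < 0 || fd >= NFDS || !p->fd_open[fd])
        return -EBADF;
    p->fd_consumers[fd]++;
    td->active_fd[fd] = true;
    return 0;
}

static void fd_end(struct proc *p, struct thread *td, int fd)
{
    p->fd_consumers[fd]--;
    td->active_fd[fd] = false;
}

/* close(): if other consumers remain, inspect every thread (the real code
 * suspends the process first) and flag those with the fd active. */
static int fd_close(struct proc *p, int fd)
{
    if (fd < 0 || fd >= NFDS || !p->fd_open[fd])
        return -EBADF;
    p->fd_open[fd] = false;
    if (p->fd_consumers[fd] > 0) {
        for (int i = 0; i < NTHREADS; i++)
            if (p->threads[i].active_fd[fd])
                p->threads[i].stale_fd = true;
    }
    return 0;
}

/* post_syscall()-style hook: an EINTR caused by the close becomes EBADF;
 * the flag is consumed either way. */
static int post_syscall_remap(struct thread *td, int error)
{
    int ret = error;

    if (td->stale_fd) {
        if (error == -EINTR)
            ret = -EBADF;
        td->stale_fd = false;
    }
    return ret;
}
```

Even in this toy form, every fd-based syscall pays for fd_begin() and fd_end(). In a real kernel those would also need synchronization, which is where the performance concern below comes from.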

That extra level of indirection and use tracking will be both complex and a
performance hit in a critical kernel path. I'm not opposed to investigating
implementing something along these lines, but I think we should defer this for
some time while we sort out more pressing issues in our kernel file
descriptor/socket/etc code and revisit this in a few months. We will need to
carefully evaluate the performance costs, and if they are significant, figure
out how to avoid this causing a significant hit. It's worth observing that
removing one level of reference counting from the socket send/receive paths
(using the file descriptor reference instead of the socket reference) made a
5%+ difference in high speed send performance.

Robert N M Watson
Computer Laboratory
University of Cambridge
freebsd-arch@freebsd.org mailing list