On Sun, 2 Dec 2007, Poul-Henning Kamp wrote:

> In message <3bbf2fe10712012231p2945111cma2faed2299167d3a@mail.gmail.com>, "Attilio Rao" writes:
>> 2007/12/1, Poul-Henning Kamp :
>>> Here is my proposed new timeout API for 8.x.
>>> The primary objective is to make it possible to have multiple timeout
>>> "providers" of possibly different kind, so that we can have per-cpu or
>>> per-net-stack timeout handling.

>> I have a question, then.

> I have no idea what the answer to your question is; I'm focusing on
> providing the ability. How we subsequently decide to use it is up to others.

Well, I think there is an important question to be discussed regarding
combinatorics, context switching, and the ability to provide multiple callout
threads. People have found the facility to provide their own worker threads
and work pools surprisingly useful for taskqueue(9), so I find the concept of
providing separate callout wheels for different sorts of work appealing -- we
could, for example, group high priority callouts in a separate thread from low
priority callouts, avoiding priority inversion scenarios where high priority
callouts in effect wait for low priority callouts due to the scheduling that
occurs in callout(9) processing. However, this leads to a few concerns:
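To make the separate-wheels idea concrete, here is a minimal userland sketch of a hashed timing wheel, loosely in the spirit of the one callout(9) uses. Every name in it (toy_wheel, toy_callout_reset, toy_wheel_tick, toy_count, WHEEL_SIZE) is hypothetical and not the kernel API, and the servicing threads are omitted. A high priority class and a low priority class would each get their own toy_wheel, driven by its own thread, so a slow low priority handler could not delay a high priority one.

```c
#include <assert.h>
#include <stddef.h>

#define WHEEL_SIZE 8                  /* hypothetical; must be a power of two */
#define WHEEL_MASK (WHEEL_SIZE - 1)

struct toy_callout {
    unsigned long expire;             /* absolute tick at which to fire */
    void (*func)(void *);
    void *arg;
    struct toy_callout *next;
};

struct toy_wheel {
    struct toy_callout *bucket[WHEEL_SIZE];
    unsigned long now;                /* current tick */
};

/* Schedule c to fire delta ticks from now; delta may exceed WHEEL_SIZE. */
static void
toy_callout_reset(struct toy_wheel *w, struct toy_callout *c,
    unsigned long delta, void (*func)(void *), void *arg)
{
    assert(delta >= 1);
    c->expire = w->now + delta;
    c->func = func;
    c->arg = arg;
    struct toy_callout **b = &w->bucket[c->expire & WHEEL_MASK];
    c->next = *b;
    *b = c;
}

/* Advance one tick and run every callout due on it. Entries sharing the
   bucket but due on a later lap of the wheel are left in place. */
static int
toy_wheel_tick(struct toy_wheel *w)
{
    int fired = 0;

    w->now++;
    struct toy_callout **pp = &w->bucket[w->now & WHEEL_MASK];
    while (*pp != NULL) {
        struct toy_callout *c = *pp;
        if (c->expire == w->now) {
            *pp = c->next;            /* unlink before calling the handler */
            c->func(c->arg);
            fired++;
        } else {
            pp = &c->next;
        }
    }
    return fired;
}

/* Example handler: counts how many times it has run. */
static void
toy_count(void *arg)
{
    (*(int *)arg)++;
}
```

Two callouts scheduled 3 and 3 + WHEEL_SIZE ticks ahead share a bucket, but only the first fires on tick 3; the second fires one full lap later.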

- If we have several wheels in several threads, we risk significantly
increasing the level of context switching if callouts exist in multiple
wheels that fire at the same time intervals and same offsets. Today, those
"context switches" occur in a single thread and don't require interacting
with the system scheduler, saving a full stack, etc., effectively making
callout handlers into co-routines.

- There has been quite a bit of discussion about effectively slapping
[MAXCPUS] onto the current callout wheel and lock, and starting up a callout
thread per-CPU in order to allow workloads to be load-balanced. If no CPU
preference is specified, then it lands on CPU 0 (or the like), and otherwise
a consumer can request a preference to run the callout on a specific CPU.
Good reasons to do this include avoiding lock contention by introducing
affinities for workload, and load balancing for heavy callout users. I
specifically have TCP in mind, needless to say, and it is one of our largest
callout consumers. How would this strategy play out in the new
infrastructure -- are you proposing TCP establish a thread and a group for
each CPU, or is that a facility (affinity/CPU binding) that the timeout
facility will provide for it, allowing TCP simply to express a CPU
preference for a timeout when registering or rescheduling it?

- For more naive users of the timeout facility, do you have any thinking on
how we might load balance the timeouts as part of the facility you are
designing? On busy systems, the callout thread can become quite a CPU hog,
and it could be that transparent load balancing offers a benefit for
consumers that are not aware of how to do their own load balancing. FWIW, I
believe that in cases where we have a non-naive consumer, there are
significant benefits to allowing it to manage its own balancing, as it can
take into account data affinities, the potential for lock contention, etc.
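The context switching concern in the first point can be put as simple arithmetic: if each wheel has its own thread, the number of thread dispatches is the number of distinct (wheel, tick) pairs, whereas a single wheel needs only one dispatch per distinct tick. The sketch below just counts that; the struct and function names are hypothetical illustrations, not anything in callout(9).

```c
#include <assert.h>
#include <stddef.h>

struct firing {
    int wheel;              /* which wheel, and hence which thread, owns it */
    unsigned long tick;     /* tick on which the callout fires */
};

/* One dispatch per distinct (wheel, tick) pair: all callouts a single
   thread finds due on a given tick run back to back in that thread. */
static size_t
count_dispatches(const struct firing *f, size_t n)
{
    size_t dispatches = 0;

    assert(f != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (f[j].wheel == f[i].wheel && f[j].tick == f[i].tick) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            dispatches++;
    }
    return dispatches;
}
```

Three callouts due on tick 10 cost one dispatch if they all live on wheel 0, but three dispatches (and potentially three scheduler round trips) if they are spread over wheels 0, 1 and 2.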
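One way the facility itself could answer both the CPU preference question and the naive consumer question is sketched below: an explicit CPU preference is honoured as-is, and a callout with no preference goes to the least-loaded per-CPU wheel rather than always to CPU 0. The names (toy_callout_pick_cpu, toy_callout_place, CPU_ANY, TOY_NCPU) and the use of a pending-callout count as the load metric are assumptions for illustration only.

```c
#include <assert.h>

#define TOY_NCPU 4          /* hypothetical CPU count */
#define CPU_ANY  (-1)       /* hypothetical "no preference" value */

/* Pending callouts per CPU wheel, a stand-in for a real load metric. */
static unsigned int toy_pending[TOY_NCPU];

/* Honour an explicit preference; otherwise pick the least-loaded wheel,
   preferring the lowest CPU number on ties. */
static int
toy_callout_pick_cpu(int pref)
{
    assert(pref == CPU_ANY || (pref >= 0 && pref < TOY_NCPU));
    if (pref != CPU_ANY)
        return pref;
    int best = 0;
    for (int cpu = 1; cpu < TOY_NCPU; cpu++)
        if (toy_pending[cpu] < toy_pending[best])
            best = cpu;
    return best;
}

/* Place one callout and account for it on the chosen wheel. */
static int
toy_callout_place(int pref)
{
    int cpu = toy_callout_pick_cpu(pref);

    toy_pending[cpu]++;
    return cpu;
}
```

A consumer like TCP would always pass its own preference, keeping control over data affinity, while a naive consumer passing CPU_ANY gets spread across wheels without having to know anything about balancing.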

I have plans in the early 8.x development cycle to break down the pcbinfo
locks and start balancing TCP work across CPUs via a weak affinity model
(processing can happen on other CPUs, but we prefer not to for reasons of lock
contention, cache cleanliness, etc). This in practice should also mean
assigning the callouts for a TCP connection to run on the CPU it has an
affinity for, for exactly the same reasons. This means that, one way or
another, I need the ability to do this in the next three months, and I want to
make sure that these plans are compatible with, and ideally facilitated by,
any reworking of the callout facility.
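A weak affinity model of this kind needs a stable mapping from a connection to its preferred CPU, so that the connection's processing and its callouts agree. The sketch below hashes a hypothetical IPv4 4-tuple with a simple multiplicative mix; the struct, the function name and the hash are assumptions, not the hash FreeBSD actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical connection identifier (IPv4 4-tuple). */
struct toy_conn {
    uint32_t laddr, faddr;
    uint16_t lport, fport;
};

/* Map a connection to a preferred CPU. Any stable hash would do; this
   one mixes the tuple and applies Knuth's multiplicative constant. */
static unsigned int
toy_conn_cpu(const struct toy_conn *c, unsigned int ncpu)
{
    assert(ncpu > 0);
    uint32_t h = c->laddr ^ c->faddr ^
        (((uint32_t)c->lport << 16) | c->fport);
    h *= 2654435761u;
    return (h >> 16) % ncpu;
}
```

Every timer for the connection would then be scheduled with toy_conn_cpu() as its CPU preference, so the timer fires on the same CPU, and touches the same locks and cache lines, as the rest of that connection's work.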

Robert N M Watson
Computer Laboratory
University of Cambridge
freebsd-arch@freebsd.org mailing list