On Fri, 1 Jul 2005, Peter Edwards wrote:

> Ever since the introduction of a separate ktrace worker thread for
> writing output, there's the distinct possibility that ktrace output will
> drop requests. For some processes, it's actually inevitable: as long as
> the traced processes can sustain a rate of generating ktrace events
> faster than the ktrace thread can write them, you'll eventually run out
> of ktrace requests.
> I'd like to propose that rather than just dropping the request on the
> floor, we at least configurably allow ktraced threads to block until
> there are resources available to satisfy their requests.

There are two benefits to the current ktrace dispatch model:

(1) Avoiding untimely sleeping in the execution paths of threads that are
being traced.

(2) Allowing the traced thread to run ahead asynchronously, hopefully
impacting performance less.

For a few years now I've thought that I actually preferred the old model,
where processes (now threads) would hang a "current record" off of their
process (now thread) structure and fill it in as they went along. Its
upsides mirror the downsides of the current model: because threads never
run fully asynchronously with respect to queueing the records to disk, you
don't run into "drop" scenarios; the traced process slows down instead.
Likewise, its downsides are the two benefits of the current model listed
above.

In the audit code, we pull from a common record queue, but we allocate the
record when the system call starts for each process -- if there aren't
records available (or various other reliability-related conditions fail,
such as adequate disk space), we stall the thread entering the kernel
until we can satisfy its record allocation requirements.
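
As a rough userspace analogy of that allocation policy (not the actual
audit code), the sketch below contrasts a "try" allocation, which fails
when the pool is empty, with a "wait" allocation, which stalls the caller
until the writer returns a record. The names rec_pool, rec_alloc_try,
rec_alloc_wait and rec_free are hypothetical, and POSIX threads stand in
for the kernel's own sleep primitives.

```c
#include <pthread.h>

#define POOL_RECORDS 4	/* illustrative pool size, not a real tunable */

/* Hypothetical record pool; only the count of free records is modelled. */
struct rec_pool {
	pthread_mutex_t	lock;
	pthread_cond_t	avail;
	int		nfree;
};

#define REC_POOL_INITIALIZER \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, POOL_RECORDS }

/* Drop-style allocation: return 1 on success, 0 if no record is free. */
int
rec_alloc_try(struct rec_pool *p)
{
	int ok;

	pthread_mutex_lock(&p->lock);
	ok = (p->nfree > 0);
	if (ok)
		p->nfree--;
	pthread_mutex_unlock(&p->lock);
	return (ok);
}

/* Audit-style allocation: stall the caller until a record is free. */
void
rec_alloc_wait(struct rec_pool *p)
{
	pthread_mutex_lock(&p->lock);
	while (p->nfree == 0)
		pthread_cond_wait(&p->avail, &p->lock);
	p->nfree--;
	pthread_mutex_unlock(&p->lock);
}

/* Called by the writer once a record has been committed to disk. */
void
rec_free(struct rec_pool *p)
{
	pthread_mutex_lock(&p->lock);
	p->nfree++;
	pthread_cond_signal(&p->avail);
	pthread_mutex_unlock(&p->lock);
}
```

A traced thread would call rec_alloc_wait() on system call entry, so the
only place it can stall is a point where sleeping is known to be safe.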

There are two cases where I really run into problems with the current
model:

(1) When I'm interacting with a slow file system, such as NFS over
100Mbps, I will always lose records, because it doesn't take long for
the process to get well ahead of the write-behind.

(2) When I trace more than one process at a time, the volume of records
overwhelms the write-behind.
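
Both cases come down to the same arithmetic: the traced threads produce
records faster than the writer drains them, so any finite queue fills up.
A small tick-based simulation makes that concrete. This is only a sketch;
the function simulate(), its parameters, and the struct sim_result fields
are invented for illustration and do not correspond to anything in the
ktrace code.

```c
/* Hypothetical outcome of one simulated trace run. */
struct sim_result {
	unsigned long written;	/* records the writer drained */
	unsigned long dropped;	/* records lost because the queue was full */
	unsigned long stalled;	/* ticks the traced thread had to wait */
};

/*
 * The traced thread wants to emit `total` records, at most `produce` per
 * tick. The writer drains at most `drain` records per tick from a queue
 * holding at most `capacity`. With block == 0 the overflow is dropped;
 * with block != 0 the thread keeps the overflow and retries next tick.
 * Returns all zeroes if drain or capacity is 0 (the run could not end).
 */
struct sim_result
simulate(unsigned long total, unsigned long produce, unsigned long drain,
    unsigned long capacity, int block)
{
	struct sim_result r = { 0, 0, 0 };
	unsigned long remaining = total, queued = 0;

	if (drain == 0 || capacity == 0)
		return (r);
	while (remaining > 0 || queued > 0) {
		unsigned long want = remaining < produce ? remaining : produce;
		unsigned long room = capacity - queued;
		unsigned long accepted = want < room ? want : room;
		unsigned long out;

		queued += accepted;
		if (block) {
			/* Blocking model: unqueued records wait for room. */
			remaining -= accepted;
			if (accepted < want)
				r.stalled++;
		} else {
			/* Current model: unqueued records are lost. */
			remaining -= want;
			r.dropped += want - accepted;
		}
		/* Writer side: drain what it can this tick. */
		out = queued < drain ? queued : drain;
		queued -= out;
		r.written += out;
	}
	return (r);
}
```

With 10 records produced per tick against 4 drained, the dropping variant
loses records for as long as the imbalance lasts, while the blocking
variant writes every record at the cost of stalled ticks.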

Write coalescing etc. is already provided "for free" by pushing the writes
down into the file system, so other than slowing down the traced process a
little, I think we don't lose much by moving back to the old model. And if
we pre-commit the record storage on system call entry (with the exception
of paths, which generally require potential sleeps anyway), we probably
won't hurt performance all that much, and we avoid sleeping in bad places.
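
To make the shape of that concrete, here is a minimal userspace sketch of
a per-thread "current record" that is committed on system call entry and
written out by the thread itself on exit. Everything in it is
hypothetical: struct thread_ctx, struct trace_rec, trace_enter() and
trace_exit() are illustrative names, and an in-memory array stands in for
the trace file.

```c
#include <stddef.h>

#define LOG_MAX 16	/* illustrative capacity of the stand-in trace log */

struct trace_rec {
	int	syscall_no;
	long	retval;
};

/* Stand-in for the trace file; a real tracer would write to a file. */
struct trace_log {
	struct trace_rec recs[LOG_MAX];
	size_t		 n;
};

/* Hypothetical per-thread state holding the "current record". */
struct thread_ctx {
	struct trace_rec cur;
	int		 have_cur;
};

/* System call entry: commit the record storage before any work starts. */
void
trace_enter(struct thread_ctx *td, int syscall_no)
{
	td->cur.syscall_no = syscall_no;
	td->cur.retval = 0;
	td->have_cur = 1;
}

/*
 * System call exit: fill in the result and write the record from the
 * traced thread itself. Returns 1 if a record was written, 0 if there
 * was no current record or the stand-in log is full (where a real
 * implementation would wait rather than give up).
 */
int
trace_exit(struct thread_ctx *td, long retval, struct trace_log *log)
{
	if (!td->have_cur || log->n >= LOG_MAX)
		return (0);
	td->cur.retval = retval;
	log->recs[log->n++] = td->cur;
	td->have_cur = 0;
	return (1);
}
```

Because the storage already exists by the time the system call body runs,
nothing in the middle of the call has to allocate, and the only wait is
the write at exit.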

Robert N M Watson
freebsd-arch@freebsd.org mailing list