An explanation for the kernel team: certain types of processes seem
to freeze the entire user desktop for long periods of time (many
seconds). I don't know all the factors that are involved, but programs
that start up and immediately make a large number of I/O calls exhibit
this behavior. One of these is the startup process of a typical apt
frontend: on startup, it scans the apt database to build some additional
information in memory.

For reasons that are unclear to me, this only occurs when the process
is running as root, and only when it's run via "sudo" from an X session.
I'm not 100% positive that this is a kernel bug, but I don't think it's
an apt bug, and I think we need help from someone who knows more about
the kernel.

For your convenience, I've attached the three test programs I wrote to
reproduce this behavior. They use the apt cache by default, but I
imagine you can point them at any large cache and see the same behavior.



It looks to me like the kernel isn't doing a very good job of allowing
interactive processes to run while a newly started process is doing a
lot of I/O. I'm not surprised by a little bit of jitter, but I've seen
the X server become unresponsive for over 30 seconds when running the
"test2" program I mocked up to demonstrate this problem. That seems
wrong to me.


On Fri, Mar 21, 2008 at 12:10:44PM -0400, Justin Pryzby was heard to say:
> Thanks for the analysis. Why does it only affect aptitude sometimes
> (after an upgrade is aborted)? Is that due to some cache? Why isn't
> apt/itude using mmap (?)?


They *are* using mmap; I wrote the example without mmap to check
whether the problem was specific to mmap or independent of it. (I'm not
up to date on how the kernel is organized, but not using mmap might
eliminate some of the work the memory management subsystem has to do.)
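
To make the mmap vs. non-mmap distinction concrete, here is a rough
sketch of scanning a file both ways. It is not one of the attached test
programs; the function names scan_with_read and scan_with_mmap, the
buffer size, and the byte-sum "work" are all made up for illustration.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sum every byte of `path` using plain read()
 * calls into a fixed buffer. Returns -1 on error. */
long long scan_with_read(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char buf[65536];
    long long sum = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++)
            sum += buf[i];
    }
    close(fd);
    return n < 0 ? -1 : sum;
}

/* Hypothetical helper: sum every byte of `path` through a read-only
 * private mapping, so the data is faulted in page by page.
 * Returns -1 on error. */
long long scan_with_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping remains valid */
    if (p == MAP_FAILED)
        return -1;

    long long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];
    munmap(p, len);
    return sum;
}
```

Both functions should return the same value for the same file; the
difference is only in whether the data arrives via read() or via page
faults on the mapping.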

I don't know whether caching effects are involved, but it wouldn't
surprise me. OTOH, neither would finding out that caching effects
aren't involved. The only pattern I can see is that I'm more likely to
see a freeze if I've recently run a program that reads the package
cache.

That would suggest that maybe the problem is more likely to occur
when the package cache is loaded into the system cache. That makes me
wonder whether the sheer volume of requests for buffers is overloading
the system somehow (but then why does it only happen with UID 0?).

> Out of curiosity, what kernel and hardware are you using?


Kernel 2.6.26-1-686 on a Fujitsu P7120. I haven't tried this with
other kernel versions.

> Apparently, all that's necessary is to loop around lseek with a
> nonzero "offset".


Ooh, that's right. Nice catch. I've attached a test3.c that does
just this.
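
For anyone reading without the attachment, a minimal sketch of such a
loop might look like the following. This is not the attached test3.c:
the function name seek_loop, its parameters, and the iteration count
and offset used in main are assumptions chosen for illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open `path` and call lseek() `iterations` times
 * with the same nonzero `offset`, never reading any data.
 * Returns the number of successful lseek() calls, or -1 if the file
 * cannot be opened. */
long seek_loop(const char *path, long iterations, off_t offset)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    long done = 0;
    for (long i = 0; i < iterations; i++) {
        if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
            break;
        done++;
    }
    close(fd);
    return done;
}

#ifdef SEEK_LOOP_MAIN
#include <stdio.h>

/* Build with -DSEEK_LOOP_MAIN to get a standalone program. */
int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 2;
    }
    /* Iteration count and offset are arbitrary illustrative values. */
    long n = seek_loop(argv[1], 10000000L, 4096);
    if (n < 0) {
        perror(argv[1]);
        return 1;
    }
    printf("%ld lseek calls completed\n", n);
    return 0;
}
#endif
```

To compare against the behavior described above, one would presumably
run the resulting binary on a large file both as a normal user and via
sudo from an X session.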

I'm going to reassign this to both the kernel and apt. I'm not sure
what's going on, but I don't think I have the expertise to track it down
fully, and I think it *may* be some sort of kernel misbehavior.

Daniel