On Fri, 31 Oct 2008, Dave Chinner wrote:

> This is a fairly major change in behaviour to the writeback path on
> NUMA systems and so has the potential to introduce subtle new
> issues. Hence I'm asking about the level of testing and exposure
> it has had. It doesn't really sound like it has had much coverage
> to me....

I've regression tested it to ensure that there is no significant
degradation for unconstrained (non-cpuset) configurations, which behave
identically to a single system-wide cpuset whose ratios match the global
sysctls.

So there should be no noticeable difference for non-cpuset users or users
who attach all system tasks to the root cpuset with this new feature.

To simulate large NUMA systems, numa=fake=32 was used on my small 4G
machine for 32 system nodes. A 4G file was created from /dev/zero and a
non-iterative version of my aforementioned test program (see the end of
this message) was used to mmap(), dirty, and msync(). This was done with
Linus' latest -git both with and without this patchset.

Although only 32 nodes were online, both kernels were built with
CONFIG_NODES_SHIFT=8 so that MAX_NUMNODES > BITS_PER_LONG, which exercises
the patchset's slowpath (the kmalloc() for a struct address_space's
dirty_nodes).
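The condition that selects the slowpath can be illustrated in userspace. This is only a sketch under stated assumptions: `struct dirty_nodes_sketch`, `dirty_nodes_init()` and the helper names are hypothetical and are not taken from the patchset, and `calloc()` stands in for the kernel's `kmalloc()`. The only facts relied on are that MAX_NUMNODES is `1 << CONFIG_NODES_SHIFT` and that the separate allocation is needed once the node bitmap no longer fits in one long.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Number of possible nodes for a given CONFIG_NODES_SHIFT value. */
static int max_numnodes(int nodes_shift)
{
	assert(nodes_shift >= 0 && nodes_shift < 31);
	return 1 << nodes_shift;
}

/* Number of unsigned longs needed to hold a bitmap of nbits bits. */
static size_t bits_to_longs(int nbits)
{
	return ((size_t)nbits + BITS_PER_LONG - 1) / BITS_PER_LONG;
}

/* Hypothetical stand-in for per-mapping dirty node tracking. */
struct dirty_nodes_sketch {
	int on_heap;              /* 1 if the bitmap needed its own allocation */
	unsigned long inline_bits; /* used when every node fits in one long */
	unsigned long *heap_bits;  /* used otherwise (the "slowpath") */
};

/* Returns 0 on success, -1 if the slowpath allocation fails. */
static int dirty_nodes_init(struct dirty_nodes_sketch *d, int nodes_shift)
{
	int nodes = max_numnodes(nodes_shift);

	d->inline_bits = 0;
	d->heap_bits = NULL;
	d->on_heap = nodes > BITS_PER_LONG;
	if (d->on_heap) {
		/* calloc() stands in for kmalloc() plus zeroing. */
		d->heap_bits = calloc(bits_to_longs(nodes),
				      sizeof(unsigned long));
		if (!d->heap_bits)
			return -1;
	}
	return 0;
}
```

With a shift of 8 this gives 256 possible nodes, which exceeds BITS_PER_LONG on both 32-bit and 64-bit builds, so the allocation is taken even though only 32 nodes are online. A shift of 5 would stay on the inline path.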

Without this patchset (five iterations):

real    2m43.824s   2m41.842s   2m44.103s   2m42.176s   2m43.692s
user    0m18.946s   0m18.875s   0m18.747s   0m18.969s   0m19.091s
sys     0m7.722s    0m7.817s    0m7.618s    0m7.635s    0m7.450s

Device:       tps   MB_read/s   MB_wrtn/s   MB_read   MB_wrtn
sda        376.68       18.60       17.90      4269      4108
sda        367.37       18.13       17.45      4269      4108
sda        388.59       19.19       18.46      4269      4107
sda        399.44       19.73       18.98      4269      4106
sda        407.15       20.08       19.33      4271      4110

With this patchset (five iterations):

real    2m46.337s   2m44.321s   2m46.406s   2m43.237s   2m47.165s
user    0m19.014s   0m18.985s   0m19.146s   0m18.778s   0m18.937s
sys     0m7.008s    0m7.180s    0m7.067s    0m7.116s    0m7.188s

Device:       tps   MB_read/s   MB_wrtn/s   MB_read   MB_wrtn
sda        399.10       19.71       18.97      4269      4109
sda        396.85       19.55       18.78      4272      4111
sda        404.72       19.96       19.23      4270      4112
sda        403.51       19.93       19.18      4269      4108
sda        396.33       19.54       18.80      4276      4114
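
For anyone who wants to compare the two sets of runs numerically, the averages can be computed with a few lines of C. This is a sketch rather than part of the test setup: the helper names `mean()` and `pct_change()` are made up for illustration, and the arrays are the "real" and "sys" times from the runs above converted to seconds.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative change from base to val, in percent. */
static double pct_change(double base, double val)
{
	return (val - base) / base * 100.0;
}

/* "real" times in seconds, five iterations each. */
static const double real_without[5] = {
	163.824, 161.842, 164.103, 162.176, 163.692
};
static const double real_with[5] = {
	166.337, 164.321, 166.406, 163.237, 167.165
};

/* "sys" times in seconds, five iterations each. */
static const double sys_without[5] = {
	7.722, 7.817, 7.618, 7.635, 7.450
};
static const double sys_with[5] = {
	7.008, 7.180, 7.067, 7.116, 7.188
};
```

Calling `pct_change(mean(real_without, 5), mean(real_with, 5))` gives roughly +1.5% for wall time, and the same call on the sys arrays gives roughly -7%. With only five samples per configuration these are rough indications, not statistically rigorous results.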

> I'm concerned right now because Nick and others (including myself)
> have recently found lots of nasty data integrity problems in the
> writeback path that we are currently trying to test fixes for.
> It's not a good time to introduce new behaviours as that will
> definitely perturb any regression testing we are running....

Do we need a development freeze of the writeback path for features that
would be targeted to 2.6.29 at the earliest?

Nasty data integrity problems seem like something that would be addressed
during the -rc stage.


#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(int argc, char **argv)
{
	void *addr;
	unsigned long length;
	unsigned long i;
	int fd;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <file> <length>\n", argv[0]);
		exit(1);
	}

	fd = open(argv[1], O_RDWR, 0644);
	if (fd < 0) {
		fprintf(stderr, "Cannot open file %s\n", argv[1]);
		exit(1);
	}

	length = strtoul(argv[2], NULL, 0);
	if (!length) {
		fprintf(stderr, "Invalid length %s\n", argv[2]);
		exit(1);
	}

	addr = mmap(0, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
		    0);
	if (addr == MAP_FAILED) {
		fprintf(stderr, "mmap() failed\n");
		exit(1);
	}

	/* Dirty every byte of the mapping, then force synchronous writeback. */
	for (i = 0; i < length; i++)
		((char *)addr)[i]++;
	msync(addr, length, MS_SYNC);
	return 0;
}