Hi Andi,

I'm not sure if you're the right person for this but I hope you are!

I've noticed that on NUMA systems (Opterons) the kernel allocates
memory on non-local nodes for processes running on node0 even if
local memory is available. (Kernel 2.6.25 and above)

Currently I'm playing around with a quad-socket quad-core Opteron but
I've observed this behavior on other Opteron systems as well.

Hardware specs:
1x Supermicro H8QM3-2
4x Quadcore Opteron
16x 2GiB (8 GiB memory per node)

OS:
currently openSUSE 10.3 but I've observed this on other distros as well
Kernel: 2.6.22.* (openSUSE) / 2.6.25.4 / 2.6.25.5 / 2.6.27 (vanilla
config)

Steps to reproduce:
Start an application which needs a lot of memory and watch the memory
usage per node (I'm using "watch -n 1 numactl --hardware" to watch the
memory usage per node)
A quick and dirty program which allocates a big array and writes data
into the array is enough!
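
Something along these lines would do. This is only a minimal sketch, not
the exact program behind the numbers in this mail; the helper name
fill_array and the one-write-per-page stride are illustrative
assumptions. It allocates the array and writes one byte into every page
so that the kernel really has to back the whole range with physical
memory:

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Allocate `bytes` of memory and write one byte into every page, so the
 * kernel has to back the whole range with physical pages.
 *
 * Returns the buffer (the caller frees it) or NULL if malloc failed.
 * The number of pages written is stored in *pages_touched.
 *
 * fill_array is a hypothetical helper name used for illustration only.
 */
unsigned char *fill_array(size_t bytes, size_t *pages_touched)
{
    long page = sysconf(_SC_PAGESIZE);
    unsigned char *buf;
    size_t off;

    assert(pages_touched != NULL);
    *pages_touched = 0;

    if (page <= 0)
        page = 4096;            /* fallback if sysconf() fails */

    buf = malloc(bytes);
    if (buf == NULL)
        return NULL;

    /* One write per page is enough to force a physical allocation. */
    for (off = 0; off < bytes; off += (size_t)page) {
        buf[off] = 1;
        (*pages_touched)++;
    }

    return buf;
}
```

Calling fill_array() from main() with the desired size and then waiting
(for example on getchar()) before free() keeps the pages resident long
enough to watch the per-node counters.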

In my setup I'm allocating an array of ~7GiB memory size in a
single-threaded application.
Startup: numactl --cpunodebind=X ./app
For X=1,2,3 it works as expected, all memory is allocated on the local
node.
For X=0 I can see the memory being allocated on node0 only until ~3GiB
are left "free" on node0. At this point the kernel starts using memory
from node1 for the app!

For parallel real-world apps I've seen a performance penalty of 30%
compared to older kernels!

numactl --cpunodebind=0 --membind=0 ./app "solves" the problem in this
case but that's not the point!

--

Regards,
Oliver Weihe
