The recent changes in cifs have helped a lot with dbench performance.
Mounting cifs version 1.33 (current development tree of cifs) to current
Samba 3 (loopback on same host, to eliminate most network adapter
effects) showed about a tenfold improvement over older cifs -

Running dbench version with 20 processes (mainline kernel 2.6.12-pre)
a) local jfs mount (as a sanity check) gets about 30MB/sec
b) current cifs development tree version (version 1.33) to Samba server
on same box = 2.8MB/sec
(dbench starts faster, but memory presumably gets fragmented, and it
slows down to about 2.8MB steady state)
c) cifs from six months ago (version 1.26) = 0.27MB/sec
(it starts about 15% slower than current cifs then slows
way down presumably as memory gets fragmented)

This is big progress - 10x improvement and since dbench is heavily write
oriented I should have plenty of room to double the cifs performance
again (with the recent async readahead and writebehind patches still to
evaluate, and writev support still to add and a double copy in the read
path), even without having to optimize the Samba server side. Obviosly
this is going to be slower over the network than local due to
duplication of inodes/caching and network & samba server delays - but I
would like to get within a factor of 3 of local performance under this
stress write case. I also need to measure the NFSv3
performance over loopback to local nfsd server to see how
that compares with dbench on this test system.