On Wed, 2005-02-09 at 17:01, Jeremy Allison wrote:
> On Wed, Feb 09, 2005 at 04:56:25PM -0800, Terry Griffin wrote:
> > Hi all,
> >
> > I'm having a very strange Samba VFS performance problem. Hoping
> > you can provide some clues.
> >
> > I've implemented a custom Samba VFS module. Functionally everything
> > is fine and the module does what it's supposed to do.
> >
> > With a Windows 2000 client, the VFS module introduces an expected
> > throughput hit of 10-20% on large writes (over Gigabit Ethernet).
> > But with a Linux CIFS client, the throughput hit is more like 90%!
> >
> > The Linux/CIFS and W2K client throughput numbers are similar in
> > the case where the custom VFS module is not in use.
> >
> > What about a VFS module could cause such drastically different
> > results between a W2K client and a Linux/CIFS client? And is there
> > something I can do to improve the Linux numbers? I've fiddled with
> > the directio and CIFSMaxBufSize options on the client side, to no
> > avail. I've fiddled with all the usual tuning parameters on the Samba
> > side, again with no improvement.
> >
> > Linux (on both the CIFS client and Samba server) is Fedora Core 2
> > with kernel 2.6.10. The Samba version is 3.0.10.

>
> Try using cachegrind :
>
> http://developer.kde.org/~sewardj/do..._techdocs.html
>
> Profile the fast and slow cases and look for differences. Use
> smbd -i to look at one instance.
>
> It's what I use to track down code performance problems in Samba.
>
> Jeremy.


Well, I've made a little progress after getting sidetracked for
a while.

The main difference between W2K as a client and Linux/CIFS is the
size of the writes. When copying a large file, W2K sends 64K per write
request, while Linux/CIFS sends only 4K. There doesn't seem to be
anything I can do to get Linux/CIFS to send anything other than 4K
chunks.
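
A quick count shows why the chunk size could matter so much, assuming
each write pays a roughly fixed cost in the module (one
aio_write/aio_error/aio_return cycle plus the SMB request itself). That
fixed-cost model is an assumption, not a measurement, and the 100 MiB
file size and the write_requests() helper are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of write requests needed to move
 * file_size bytes when each request carries at most chunk bytes
 * (ceiling division). */
uint64_t write_requests(uint64_t file_size, uint64_t chunk)
{
    assert(chunk > 0);
    return (file_size + chunk - 1) / chunk;
}

/* For a hypothetical 100 MiB copy:
 *   write_requests(100ULL << 20, 4096)  == 25600   (Linux/CIFS, 4K)
 *   write_requests(100ULL << 20, 65536) == 1600    (W2K, 64K)
 * i.e. 16 times as many requests, so any fixed per-request cost in
 * the VFS module is paid 16 times as often. */
```

This only shows that per-request overhead is multiplied by 16; it does
not by itself prove that this overhead accounts for the whole 90% hit.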

The other dimension to the problem is that I'm using asynchronous
I/O (AIO) in my VFS module's write/pwrite functions (aio_write,
aio_error, and aio_return). The AIO routines seem especially
inefficient for 4K writes but perform nicely with 64K writes.
Oddly, I don't see the same performance difference between 4K and 64K
writes with the synchronous I/O functions, even though in 2.6.x the
synchronous I/O functions are supposedly just wrappers around the AIO
functions.
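
For reference, here is a minimal sketch of the pattern described above:
a pwrite()-style call built on POSIX AIO that queues the write with
aio_write(), waits until aio_error() no longer reports EINPROGRESS, and
reaps the result with aio_return(). It is not the actual module code and
not part of the Samba VFS API; the name aio_pwrite_blocking() is
hypothetical, and it assumes the module waits for each write to finish
before returning to smbd. One thing worth knowing when comparing it with
plain pwrite(): glibc implements the POSIX aio_* calls in user space
with helper threads rather than on top of the kernel's native AIO, so
every request includes a thread handoff. On older glibc versions this
needs to be linked with -lrt.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>     /* open() flags, for callers of the helper */
#include <signal.h>    /* SIGEV_NONE */
#include <string.h>
#include <unistd.h>    /* ssize_t, off_t, pread(), close() */

/* Hypothetical helper (not Samba's API): a pwrite()-style call done
 * through POSIX AIO. Returns bytes written, or -1 with errno set. */
ssize_t aio_pwrite_blocking(int fd, const void *buf, size_t count, off_t offset)
{
    struct aiocb cb;
    const struct aiocb *list[1];
    int err;

    assert(buf != NULL || count == 0);

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = (void *)buf;          /* aiocb takes a non-const buffer */
    cb.aio_nbytes = count;
    cb.aio_offset = offset;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;   /* no completion signal */

    if (aio_write(&cb) == -1)
        return -1;                     /* request could not be queued */

    /* aio_error() reports EINPROGRESS until the request completes.
     * aio_suspend() sleeps instead of busy-polling; if it returns early
     * (e.g. EINTR), the loop simply re-checks the status. */
    list[0] = &cb;
    while ((err = aio_error(&cb)) == EINPROGRESS)
        (void)aio_suspend(list, 1, NULL);

    /* aio_return() must be called once to release the request. */
    ssize_t ret = aio_return(&cb);
    if (err != 0) {
        errno = err;
        return -1;
    }
    return ret;
}
```

With 4K writes this whole submit/wait/reap cycle runs 16 times as often
as with 64K writes for the same amount of data, which is one plausible
place for the extra cost to show up in a cachegrind profile.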

Anyway, as before, any clues would be appreciated, especially clues
that get me writes larger than 4K between a Linux/CIFS client and a
Samba server.

Thanks,
Terry