Summary: Surprisingly large unshared memory usage
Product: rsync
Version: 2.6.7
Platform: x86
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P3
Component: core

I'm running a command like "rsync -vrltH --delete -pgo --stats -z -D
--numeric-ids -i --link-dest=foo blah:bar baz" (part of a dirvish run) with an
input fileset of about 2.4 million files (400K of those files are actually
hardlinked to each other on the sending machine, and remain that way on the
receiving machine---and in fact all but about 30 of them haven't changed, so
virtually all 2.4M of those files also wind up hardlinked to the --link-dest
directory; this is about 280G total).

It takes about 10 minutes to scan a filesystem of this size, and both the
sending & receiving machines' rsyncs slowly expand to about 200M during this
scan; that's understandable. But then, as soon as the scan is done, the second
rsync process on the receiving side inflates (over the course of about 5 seconds
or so) to -another- 200M. I don't think I'm being faked out by shared memory
being reported twice, since the free memory on the machine declines
precipitously at exactly the same time. This isn't quite screwing me yet (the
machine's got half a gig of RAM and very little else that must stay resident
during the run), but if the filesystem gets much bigger, I fear massive
thrashing due to swapping. (Really, what I'll have to do is buy more RAM.)

I was under the impression that this wasn't supposed to happen---that rsync
tried hard not to modify lots of pages after the fork, and that Linux (I'm
running Ubuntu Breezy, which has a 2.6 kernel) had copy-on-write fork semantics.
Is the essentially instantaneous inflation of the second rsync process
happening because of either the -H or the --link-dest, or is it a bug?
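One way to rule out double-counted shared memory would be to sum the per-mapping figures the kernel reports for each rsync process. The following is a minimal sketch, assuming a kernel that exposes /proc/<pid>/smaps (the 2.6 kernel mentioned above may be too old for it); the function names summarize_smaps and read_smaps are hypothetical helpers, not part of rsync or any library.

```python
import re
from collections import defaultdict

# Matches smaps field lines such as "Private_Dirty:      12 kB".
# Mapping header lines (address ranges) do not match and are skipped.
_FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def summarize_smaps(text):
    """Sum every kB-valued field across all mappings in smaps-format text."""
    totals = defaultdict(int)
    for line in text.splitlines():
        m = _FIELD.match(line.strip())
        if m:
            totals[m.group(1)] += int(m.group(2))
    return dict(totals)


def read_smaps(pid):
    """Read and summarize /proc/<pid>/smaps (assumes the file exists)."""
    with open(f"/proc/{pid}/smaps") as f:
        return summarize_smaps(f.read())
```

If the Private_Dirty total of the second receiving-side process grows by roughly 200M while its Shared_* totals shrink, the pages really are being copied rather than reported twice.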

[This transfer also accumulates about an hour of CPU time on this Athlon 1200MHz
CPU; I assume this is due to the expense of -H, and works out to about 1.5
milliseconds of processing per file, assuming I haven't goofed on the math; this
is about a million instructions (or 21000 non-cached memory fetches) per file.
I'd love it if this could be brought down, but I'm probably being unrealistic
about an essentially O(n^2) algorithm...]
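The per-file figures above can be rechecked with a few lines of Python. This is only a sketch of the arithmetic: the inputs are the numbers quoted in this report, the variable names are made up for illustration, and the cycle count is an upper bound on instructions only if one assumes at most one instruction per clock cycle.

```python
# Inputs taken from the report above.
files = 2_400_000          # files in the transfer
cpu_seconds = 60 * 60      # about an hour of CPU time
clock_hz = 1.2e9           # 1200MHz CPU
fetches_per_file = 21_000  # non-cached memory fetches quoted per file

# CPU time spent per file, in milliseconds (about 1.5 ms).
per_file_ms = cpu_seconds / files * 1000

# Clock cycles available per file (about 1.8 million); "about a million
# instructions" then corresponds to a bit under two cycles per instruction.
cycles_per_file = cpu_seconds / files * clock_hz

# Implied cost of one non-cached fetch, in nanoseconds (about 71 ns).
ns_per_fetch = cpu_seconds / files / fetches_per_file * 1e9

print(f"{per_file_ms:.2f} ms per file")
print(f"{cycles_per_file:,.0f} cycles per file")
print(f"{ns_per_fetch:.0f} ns per non-cached fetch")
```

So the 1.5 ms estimate holds, and the instruction and fetch figures are consistent with it as order-of-magnitude estimates.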
