Yan wrote:
> I am currently writing my first kernel module to extract data from
> the kernel and bring it into user space. As I understand it, FreeBSD
> only loads the pages of the text segment that it is about to use, by
> registering its handler for page faults and bringing in more pages
> from the binary as needed. I want to track this progress and report
> the number of pages it brought in when the process exits.
> My current approach is as follows:
> I register a callback in my module using EVENTHANDLER_REGISTER
> with the 'process_exit' event name. That gives me the proc structure
> on exit. I then try to find the information I need via the p_vmspace
> member, and use vm_tsize for the number of pages. But this is the
> number of pages in the virtual address space of the text segment,
> which does not correspond to how many pages were actually brought
> in.
> For context, I am trying to write a tiny utility that watches the
> execution of a process and tracks how many pages were actually used
> during its execution. It passes that knowledge to a (hopefully soon
> to be written) user-space utility that generates a new binary using
> only those pages for the text segment. (If an execution were to go
> outside those pages, a segfault is okay.)
> Any help is appreciated.

The problem with that is the fact that text pages are not
"owned" by a process. If you run the same binary several
times, the pages will be brought into memory only once.
So if you're running a webserver with hundreds of httpd
processes, the text pages of httpd are brought into memory
only once, unless some of them (or even all of them) were
reclaimed for other uses during a low-memory situation and
had to be brought in again. Note that text pages are _not_
paged out to swap space, because that would just be a waste.
Instead they are simply discarded, because they can be paged
in from the binary again.

So, if you really need to assign page-ins to a certain
process, you must make sure that the binary in question
was never executed before (since the last reboot), and make
sure that it is not executed again until the first process
terminates. Also note that you might easily miss code
paths that are not executed during one run, but required
during another run, so you might not get all pages that
are necessary to run the binary under all circumstances.

I also think that the pager will bring multiple pages
into memory at once, as an optimization, because doing so
for every single page on demand is inefficient, especially
with large binaries. (I'm not 100% sure how that works,
though. Someone will certainly correct me if necessary.)

By the way, /usr/bin/time with the -l option is very useful
to display paging activity during the execution of a
process. Here's an example of running lynx twice in a row
(I hadn't used it since the last reboot, but all the
libraries had already been used):

$ /usr/bin/time -l lynx -dump {some URL} >/dev/null
259 page reclaims
37 page faults
10 block input operations
$ /usr/bin/time -l lynx -dump {some URL} >/dev/null
243 page reclaims
0 page faults
0 block input operations

Note that the size of the binary is about 1 MB, which is
about 265 pages of 4 KB each:
-r-xr-xr-x 1 root wheel 1086480 Jul 20 2005 lynx
Not all of those pages are text pages, though, of course.

By the way, even if you manage to find out which pages
are actually used by a binary, it is probably non-trivial
to map that information back to parts of the binary file,
let alone to build a binary that contains only those
parts. You'll have to fight with the run-time linker.
Uhm ... the more I think about the whole thing, the more
potential problems come to mind, so I'd better stop now.

Best regards

Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

"We, the unwilling, led by the unknowing,
are doing the impossible for the ungrateful.
We have done so much, for so long, with so little,
we are now qualified to do anything with nothing."
        -- Mother Teresa