ksyms pseudo driver - FreeBSD



Thread: ksyms pseudo driver

  1. ksyms pseudo driver

    Hi,

    I have created a ksyms pseudo driver for FreeBSD. Included below is the
    man page. The diffs to the kernel source, the main source files, etc. can
    be found at:

    http://people.FreeBSD.org/~sson/ksyms/

    The reason I created this driver is for dtrace and the port of the
    OpenSolaris lockstat(1M) command to FreeBSD. The ksyms driver allows a
    process to get a quick snapshot of the kernel symbol table, including
    the symbols from any loaded modules.

    Unlike most other implementations, this ksyms driver maps memory in the
    process space to store the snapshot at the time /dev/ksyms is opened.
    It also checks whether the process already has a snapshot open and
    won't allow it to open /dev/ksyms again until it closes (and unmaps)
    its already opened snapshot. Of course, this requires the read()
    handler to bounce the buffer into the kernel before it is written
    back out to userspace. (Maybe there is a simple way to do a userspace
    to userspace copy instead?) The reason I went to all this trouble is to
    keep /dev/ksyms from turning into an easy way to exhaust all the kernel
    memory (unintentionally or intentionally).

    Let me know if you have any questions, comments, suggestions, and/or
    reasons why something like this should never be included in FreeBSD.

    Best Regards,

    -stacey.

    -----------------------------------------------------------------------------------
    KSYMS(4)            FreeBSD Kernel Interfaces Manual            KSYMS(4)

    NAME
    ksyms -- kernel symbol table interface

    SYNOPSIS
    device ksyms

    DESCRIPTION
    The /dev/ksyms character device provides a read-only interface to a
    snapshot of the kernel symbol table. The in-kernel symbol manager is
    designed to be able to handle many types of symbol tables; however, only
    elf(5) symbol tables are supported by this device. The ELF format image
    contains two sections: a symbol table and a corresponding string table.

    Symbol Table
    The SYMTAB section contains the symbol table entries
    present in the current running kernel, including the symbol
    table entries of any loaded modules. The symbols are
    ordered by kernel module load time, starting with the kernel
    file symbols, followed by the first loaded module's
    symbols and so on.

    String Table
    The STRTAB section contains the symbol name strings from
    the kernel and any loaded modules that the symbol table
    entries reference.

    ELF-formatted symbol table data read from the /dev/ksyms file represents
    the state of the kernel at the time when the device is opened. Since
    /dev/ksyms has no text or data, most of the fields are initialized to
    NULL. The ksyms driver does not block the loading or unloading of
    modules into the kernel while the /dev/ksyms file is open, so the
    snapshot may contain stale data.

    IOCTLS
    The ioctl(2) command codes below are defined in .

    The (third) argument to the ioctl(2) should be a pointer to the type
    indicated.

    KIOCGSIZE (size_t)
    Returns the total size of the current symbol table. This
    can be used when allocating a buffer to make a copy of the
    kernel symbol table.

    KIOCGADDR (void *)
    Returns the address of the kernel symbol table mapped in
    the process memory.

    FILES
    /dev/ksyms

    ERRORS
    An open(2) of /dev/ksyms will fail if:

    [EBUSY] The device is already open. A process must close
    /dev/ksyms before it can be opened again.

    [ENOMEM] There is a resource shortage in the kernel.

    [ENXIO] The driver was unsuccessful in creating a snapshot of
    the kernel symbol table. This may occur if the kernel
    was in the process of loading or unloading a module.

    SEE ALSO
    ioctl(2), nlist(3), elf(5), kldload(8)

    HISTORY
    A ksyms device exists in many different operating systems. This
    implementation is similar in function to the Solaris and NetBSD ksyms
    driver.

    The ksyms driver first appeared in FreeBSD 8.0 to support lockstat(1).

    BUGS
    Because files can be dynamically linked into the kernel at any time,
    the symbol information can vary. When you open the /dev/ksyms file, you
    have access to an ELF image which represents a snapshot of the state of
    the kernel symbol information at that instant in time. Keeping the device
    open does not block the loading or unloading of kernel modules. To get a
    new snapshot you must close and re-open the device.

    A process is only allowed to have the /dev/ksyms file open once at a
    time. The process must close /dev/ksyms before it is allowed to open it
    again.

    The ksyms driver uses the calling process' memory address space to store
    the snapshot. ioctl(2) can be used to get the memory address where the
    symbol table is stored to save kernel memory. mmap(2) may also be used
    but it will map it to another address.

    AUTHORS
    The ksyms driver was written by Stacey Son under the direction of
    John Birrell.

    FreeBSD 8.0                   April 5, 2008                   FreeBSD 8.0





    _______________________________________________
    freebsd-arch@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-arch
    To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"


  2. Re: ksyms pseudo driver

    Stacey Son [sson@freebsd.org] wrote:
    >
    > The reason I created this driver is for dtrace and the port of the
    > opensolaris lockstat(1M) command to FreeBSD. The ksyms driver allows a
    > process to get a quick
    > snapshot of the kernel symbol table including the symbols from any
    > loaded modules.


    Very cool! After doing some Solaris work, I've really missed lockstat!
    This would also be useful for hwpmc.

    > its already opened snapshot first. Of course, this requires the read()
    > handler to bounce the buffer into the kernel first before it is written
    > back out to userspace. (Maybe there is a simple way to do an userspace
    > to userspace copy instead?) The reason I went to all this trouble is to
    > keep /dev/ksyms from turning into an easy way to exhaust all the kernel
    > memory (unintentionally or intentionally).


    Instead of doing the copy in the kernel, can you just have a simple
    ioctl which returns the address and size of the snapshot? Then the
    userspace side can do the copy itself.


    Drew



  3. Re: ksyms pseudo driver

    Andrew Gallatin wrote:
    >> its already opened snapshot first. Of course, this requires the read()
    >> handler to bounce the buffer into the kernel first before it is written
    >> back out to userspace. (Maybe there is a simple way to do an userspace
    >> to userspace copy instead?) The reason I went to all this trouble is to
    >> keep /dev/ksyms from turning into an easy way to exhaust all the kernel
    >> memory (unintentionally or intentionally).
    >>

    >
    > Instead of doing the copy in the kernel, can you just have a simple
    > ioctl which returns the address and size of the snapshot? Then the
    > userspace side can do the copy itself.
    >

    Actually that is what the ioctls do now... You can just open
    /dev/ksyms to create the snapshot and then use ioctl() to get the size
    and address where the buffer is mapped. Or you can use mmap().

    IOCTLS
    The ioctl(2) command codes below are defined in .

    The (third) argument to the ioctl(2) should be a pointer to the type
    indicated.

    KIOCGSIZE (size_t)
    Returns the total size of the current symbol table.

    KIOCGADDR (void *)
    Returns the address of the kernel symbol table mapped in
    the process memory.

    -stacey.




  4. Re: ksyms pseudo driver

    On Fri, Jul 11, 2008 at 08:18:25PM -0500, Stacey Son wrote:
    > Andrew Gallatin wrote:
    > >>its already opened snapshot first. Of course, this requires the read()
    > >>handler to bounce the buffer into the kernel first before it is written
    > >>back out to userspace. (Maybe there is a simple way to do an userspace
    > >>to userspace copy instead?) The reason I went to all this trouble is to
    > >>keep /dev/ksyms from turning into an easy way to exhaust all the kernel
    > >>memory (unintentionally or intentionally).
    > >>

    > >
    > >Instead of doing the copy in the kernel, can you just have a simple
    > >ioctl which returns the address and size of the snapshot? Then the
    > >userspace side can do the copy itself.
    > >

    > Actually that is what the ioctls do now... You can just open
    > /dev/ksyms to create the snapshot and then use ioctl() to get the size
    > and address where the buffer is mapped. Or you can use mmap().


    Most likely I am missing some obvious reason here, but to me it looks
    like you are doing it in reverse. The natural setup would be to require
    userspace to supply allocated memory to the driver, which the driver
    then fills with the symbol table. This solves the problem of
    exhausting kernel address space.

    As usual, when the user-supplied region is too small, the driver should
    return both an error and the new required size. Admittedly the size
    is volatile and may be too small for the next call too. In practice,
    though, the kernel symbol table does not change very often, so I think
    a single iteration would succeed most of the time.
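
    The caller-supplied-buffer scheme proposed above can be sketched in
    plain userspace C. This is only an illustration under stated
    assumptions: struct fake_symtab, fake_fill() and fetch_symtab() are
    hypothetical names invented here, not part of the ksyms driver or any
    FreeBSD interface, and the choice of ENOMEM as the "too small" error
    is also an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's symbol table (not a real API). */
struct fake_symtab {
    const char *data;
    size_t      len;
};

/*
 * "Driver" side: copy the table into buf if it fits, otherwise fail.
 * Either way, *needed reports the size required right now.
 */
int fake_fill(const struct fake_symtab *t, void *buf, size_t buflen,
    size_t *needed)
{
    *needed = t->len;
    if (buflen < t->len)
        return ENOMEM;          /* assumed error code for "too small" */
    if (t->len > 0)
        memcpy(buf, t->data, t->len);
    return 0;
}

/*
 * Caller side: resize the buffer to the reported size and try again,
 * up to max_tries attempts. Returns 0 on success or an errno value.
 */
int fetch_symtab(const struct fake_symtab *t, void **buf_out,
    size_t *len_out, int max_tries)
{
    void *buf = NULL, *tmp;
    size_t buflen = 0, needed = 0;
    int i;

    assert(t != NULL && buf_out != NULL && len_out != NULL);
    for (i = 0; i < max_tries; i++) {
        if (fake_fill(t, buf, buflen, &needed) == 0) {
            *buf_out = buf;
            *len_out = needed;
            return 0;
        }
        tmp = realloc(buf, needed);
        if (tmp == NULL)
            break;
        buf = tmp;
        buflen = needed;
    }
    free(buf);
    *buf_out = NULL;
    return ENOMEM;
}
```

    With an empty initial buffer the first attempt only learns the size,
    so a call such as fetch_symtab(&t, &buf, &len, 3) normally succeeds on
    its second attempt unless the table grows in between.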



  5. Re: ksyms pseudo driver

    Kostik Belousov wrote:
    > Most likely, I miss some obvious reason there. But for me it looks
    > like you do it in the reverse. The natural setup would be to require
    > userspace to supply an allocated memory to the driver, and then the
    > driver fills the memory with symbol table. This solves the problem of
    > exhaustion of kernel address space.
    >


    The snapshot of the consolidated symbol table is made when /dev/ksyms is
    opened. The storage for the snapshot is allocated in the memory map of
    the calling process. No kernel address space is used for the snapshot.

    A temporary buffer is allocated in kernel space in the read() handler
    (ksyms_read). Right now, for a read, it does two copies: one from
    user space to the temporary kernel space buffer and a second copy from
    the kernel space temp buffer and back out to user space. Ideally, it
    would be nice to do just one user space to user space copy directly in
    the kernel.
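
    The two-copy read path described above can be modeled in userspace
    to show the cost. This is a sketch only: bounce_copy() and
    BOUNCE_SIZE are names made up for the illustration, and both copies
    are plain memcpy() calls standing in for whatever the real
    ksyms_read handler uses to move data across the user/kernel
    boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BOUNCE_SIZE 64  /* arbitrary chunk size chosen for this sketch */

/*
 * Copy len bytes from src to dst through a small intermediate buffer.
 * Each chunk is copied twice: once into the bounce buffer (the "user to
 * kernel" step) and once out of it (the "kernel to user" step).
 * Returns the number of copy operations performed.
 */
size_t bounce_copy(void *dst, const void *src, size_t len)
{
    unsigned char bounce[BOUNCE_SIZE];
    size_t done = 0, copies = 0;

    assert(len == 0 || (dst != NULL && src != NULL));
    while (done < len) {
        size_t n = len - done;

        if (n > BOUNCE_SIZE)
            n = BOUNCE_SIZE;
        /* First copy: source snapshot into the temporary buffer. */
        memcpy(bounce, (const unsigned char *)src + done, n);
        /* Second copy: temporary buffer out to the destination. */
        memcpy((unsigned char *)dst + done, bounce, n);
        done += n;
        copies += 2;
    }
    return copies;
}
```

    A direct user space to user space copy would halve the count that
    bounce_copy() returns, which is the saving being wished for here.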

    > As usual, when user-supplied region is too small, driver shall return
    > both an error and new required size. It is understandable that the size
    > is volatile and may be too small for the next call too. But, in fact,
    > kernel symtable does not change too often, so I think even the one
    > iteration mostly succeed.
    >


    The reason the driver tries three times to create a valid snapshot is
    that I couldn't figure out a way (without creating a lock reversal) to
    temporarily keep modules from being loaded or unloaded while the
    snapshot is created. I agree that it should be able to create the
    snapshot on the first iteration in most cases.
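
    One way to picture a bounded retry like this is the userspace sketch
    below. It is an assumption-laden model, not the driver's code: the
    generation counter, struct symstate, take_snapshot() and the two
    race hooks are all hypothetical, and how the real driver detects an
    invalid snapshot is not stated in this thread. The ENXIO result
    mirrors the error documented in the man page.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SNAP_TRIES 3

/* Hypothetical stand-in for the kernel's symbol state. */
struct symstate {
    unsigned    generation;  /* assumed: bumped on each module load/unload */
    const char *syms;        /* current symbol bytes */
    size_t      len;
};

/* Hook run between sizing and copying, to simulate a concurrent kldload. */
typedef void (*race_hook)(struct symstate *);

/* Demo hooks: a module load during the first attempt only, or every time. */
void race_once(struct symstate *st)   { if (st->generation == 0) st->generation++; }
void race_always(struct symstate *st) { st->generation++; }

/*
 * Size the table, allocate, then copy only if nothing changed in between.
 * Gives up with ENXIO after SNAP_TRIES attempts.
 */
int take_snapshot(struct symstate *st, race_hook hook, char **out,
    size_t *out_len)
{
    int attempt;

    assert(st != NULL && out != NULL && out_len != NULL);
    for (attempt = 0; attempt < SNAP_TRIES; attempt++) {
        unsigned gen = st->generation;
        size_t len = st->len;
        char *buf = malloc(len ? len : 1);

        if (buf == NULL)
            return ENOMEM;
        if (hook != NULL)
            hook(st);               /* the window where modules may change */
        if (st->generation == gen) {
            memcpy(buf, st->syms, len);
            *out = buf;
            *out_len = len;
            return 0;
        }
        free(buf);                  /* sizes are stale: start over */
    }
    return ENXIO;
}
```

    The sketch validates before copying so that it never reads past a
    table that shrank; no lock is held across the window, which is why a
    retry limit is needed at all.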

    BTW, you may have noticed the ksyms driver now uses your per-open file
    private data code which I like much better than using clone_create() for
    per-descriptor storage.

    Best Regards,

    -stacey.




  6. Re: ksyms pseudo driver

    On Sun, Jul 13, 2008 at 11:22:55PM -0500, Stacey Son wrote:
    > Kostik Belousov wrote:
    > >Most likely, I miss some obvious reason there. But for me it looks
    > >like you do it in the reverse. The natural setup would be to require
    > >userspace to supply an allocated memory to the driver, and then the
    > >driver fills the memory with symbol table. This solves the problem of
    > >exhaustion of kernel address space.
    > >

    >
    > The snapshot of the consolidated symbol table is made when /dev/ksyms is
    > opened. The storage for the snapshot is allocated in the memory map of
    > the calling process. No kernel address space is used for the snapshot.

    Again, why is it done this way? Why not create the snapshot when the
    user process issues an ioctl that supplies the necessary usermode
    memory to the driver?

    >
    > A temporary buffer is allocated in kernel space in the read() handler
    > (ksyms_read). Right now, for a read, it does two copies: one from
    > user space to the temporary kernel space buffer and a second copy from
    > the kernel space temp buffer and back out to user space. Ideally, it
    > would be nice to do just one user space to user space copy directly in
    > the kernel.
    >
    > >As usual, when user-supplied region is too small, driver shall return
    > >both an error and new required size. It is understandable that the size
    > >is volatile and may be too small for the next call too. But, in fact,
    > >kernel symtable does not change too often, so I think even the one
    > >iteration mostly succeed.
    > >

    >
    > The reason the driver tries three times to create a valid snapshot is I
    > couldn't figure out a way (without creating a lock reversal) to
    > temporarily keep modules from being loaded or unloaded while the
    > snapshot is created. I agree that it should be able to create the
    > snapshot on the first iteration in most cases.
    >
    > BTW, you may have noticed the ksyms driver now uses your per-open file
    > private data code which I like much better than using clone_create() for
    > per-descriptor storage.

    Does it work? Do you have any suggestions for the KPI?



  7. Re: ksyms pseudo driver

    Kostik Belousov wrote:
    >> The snapshot of the consolidated symbol table is made when /dev/ksyms is
    >> opened. The storage for the snapshot is allocated in the memory map of
    >> the calling process. No kernel address space is used for the snapshot.
    >>

    > Again, why this is done this way ? Why not creating snapshot when the
    > user process issues ioctl that supplies neccessary usermode memory
    > to the driver ?
    >


    The main reason it is written as a pseudo driver is so it can be used
    with standard command-line utilities. For example, see the ksyms
    example in the dtrace manual
    (http://wikis.sun.com/display/DTrace/Structs+and+Unions). I guess it
    would still be possible to do it the way you are suggesting, but it
    would require a special 'cat', or something, to allocate the user space
    buffer and pass it to the driver before it starts reading the symbol
    table. You could then pipe the output of the "special ksyms cat" to the
    actual command-line program you wanted to use. Of course, if you had to
    use a "special ksyms cat" then there would be no reason to make this a
    pseudo driver. You could simply make it a system call and eliminate a
    lot of code and calls into the kernel.
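
    For reference, this is roughly all a standard utility does with a
    file: open it and read() until end of file, with no ioctl and no
    prior knowledge of the size. A minimal sketch follows; slurp() is a
    hypothetical helper name, and passing "/dev/ksyms" as the path
    assumes a system where this driver is loaded.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read a whole file the way cat(1) would: open it and read() until EOF.
 * On success returns a malloc'd buffer and stores its length in *len_out.
 * On failure returns NULL with errno set.
 */
void *slurp(const char *path, size_t *len_out)
{
    size_t cap = 4096, len = 0;
    char *buf, *tmp;
    ssize_t n;
    int fd, saved;

    assert(path != NULL && len_out != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    buf = malloc(cap);
    if (buf == NULL) {
        saved = ENOMEM;
        goto fail;
    }
    for (;;) {
        if (len == cap) {               /* buffer full: double it */
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                saved = ENOMEM;
                goto fail;
            }
            buf = tmp;
            cap *= 2;
        }
        n = read(fd, buf + len, cap - len);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: retry the read */
            saved = errno;
            goto fail;
        }
        if (n == 0)
            break;                      /* end of file */
        len += (size_t)n;
    }
    close(fd);
    *len_out = len;
    return buf;
fail:
    free(buf);                          /* free(NULL) is harmless */
    close(fd);
    errno = saved;
    return NULL;
}
```

    Because the reader never tells the driver how big its buffer is,
    the snapshot has to exist before the first read() arrives, which is
    why it is taken at open time.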

    >> BTW, you may have noticed the ksyms driver now uses your per-open file
    >> private data code which I like much better than using clone_create() for
    >> per-descriptor storage.
    >>

    > Does it work ? Do you have any suggestions for the KPI ?
    >

    Yes, it seems to work much better than the previous method
    (clone_create) but more testing is needed.

    I was having problems with the clone_create() method when I was running
    some testing code that would rapidly open /dev/ksyms: open() would
    fail. I am guessing there may be a race condition between when the
    device is cloned and when it is actually opened.

    I'll let you know if I have some suggestions for the KPI.

    -stacey.


  8. Re: ksyms pseudo driver

    On Tue, Jul 15, 2008 at 08:14:28AM -0500, Stacey Son wrote:
    > Kostik Belousov wrote:
    > >>The snapshot of the consolidated symbol table is made when /dev/ksyms is
    > >>opened. The storage for the snapshot is allocated in the memory map of
    > >>the calling process. No kernel address space is used for the snapshot.
    > >>

    > >Again, why this is done this way ? Why not creating snapshot when the
    > >user process issues ioctl that supplies neccessary usermode memory
    > >to the driver ?
    > >

    >
    > The main reason it is written as a pseudo driver is so it can be used
    > with standard command-line utilities. For example, see the ksyms
    > example in the dtrace manual
    > (http://wikis.sun.com/display/DTrace/Structs+and+Unions). I guess it
    > could still be possible to do in the way you are suggesting but it would
    > require a special 'cat', or something, to allocate the user space buffer
    > and then pass that in driver before it starts reading the symbol table.
    > You could then pipe the output of the "special ksyms cat" to the actual
    > command-line program you wanted to use. Of course, if you had to use
    > a "special ksyms cat" then there would be no reason to make this a
    > pseudo driver. You could simply make it a system call and eliminate a
    > lot of code and calls into the kernel.


    Would dd bs= work as the "special cat"? procfs'
    /proc/pid/map has a similar problem, and there was a procmap program
    in ports. I believe dd is enough.



  9. Re: ksyms pseudo driver

    Stacey Son wrote:

    > The main reason it is written as a pseudo driver is so it can be used
    > with standard command-line utilities. For example, see the ksyms


    Ah, now everything is perfectly clear to me. Your method is
    very clever indeed.

    Just out of curiosity, how much memory will the entire symbol
    + strings table require? How often do typical consumers (like dtrace)
    request them?

    Drew


  10. Re: ksyms pseudo driver

    Andrew Gallatin wrote:
    > Ah, now everything is perfectly clear to me. Your method is
    > very clever indeed.
    >
    > Just out of curiosity, how much memory will the entire symbol
    > + strings table require? How often do typical consumers (like dtrace)
    > request them?


    On an AMD64 "Generic" kernel with only the ksyms module loaded it is
    1523847 bytes.

    lockstat(1M) will open and read /dev/ksyms once each time it is
    invoked. For dtrace, it depends on the script, but there shouldn't be
    any reason for it to read it more than once either.

    -stacey.


