System Call Overhead v. Function Call Overhead Question, etc. - Linux


Thread: System Call Overhead v. Function Call Overhead Question, etc.

  1. System Call Overhead v. Function Call Overhead Question, etc.


    I have a design specification to implement on Linux that places heavy
    stress on avoiding system calls wherever possible, while paying no mind
    to how many function calls are made in user mode.

    If I remember correctly from my kernel classes long ago, a system call is
    just a kernel function call. The switch from user mode to kernel mode is
    done through a trap mechanism that is quite efficient, so, all other
    things being equal, a system call should not be orders of magnitude more
    costly than a function call. Both need to save the current context,
    branch, and then return and restore the previous context. One is done on
    the user stack, the other on the kernel stack. The change to and from
    kernel mode is the only extra overhead for a system call, and that can't
    be all that expensive.

    Is that right?


    The same goes for malloc/free and new/delete. The specification passes on
    useful STL containers because they use heap memory allocation. As I
    understand it, malloc is only expensive when an sbrk call is involved to
    get a chunk of memory from the OS; otherwise the library dispenses
    preallocated memory blocks, and the cost is not that great, comparable to
    creating stack objects.

    Is that so?

    Many thanks.



  2. Re: System Call Overhead v. Function Call Overhead Question, etc.

    On Wed, 28 Mar 2007 07:16:55 +0000, Tiglath wrote:

    > If I remember correctly from the kernel classes long ago, a system call
    > is just a kernel function call. The switch from user mode to kernel
    > mode was done through a trap mechanism that was quite efficient, so
    > therefore all other things being equal a system call should not be
    > magnitudes more costly than a function call. Both need to save the
    > current context, branch, and return and restore the previous context.
    > One is done in the user stack the other in the kernel stack. The
    > change to and from kernel mode is the only extra overhead for a system
    > call and that can't be all that expensive.
    > Is that right?


    Pretty much, yeah. Linux user-mode / kernel-mode context switches are
    relatively light, as these things go. The major difference is that the
    process scheduler will probably take advantage of the system call in
    order to switch to a higher-priority process, but that would happen at
    the next clock tick anyway.

    Worry more about using an appropriate algorithm and keeping tight
    locality of reference to avoid unnecessary CPU cache misses.

    For example, on most x86 machines this code:

    struct {
        unsigned char a;
        unsigned char b[ 64 ];
        unsigned char c;
    } s;

    printf( "%d\n", s.a + s.c );

    will run from 10 to 100 times slower than

    struct {
        unsigned char a;
        unsigned char c;
        unsigned char b[ 64 ];
    } s;

    printf( "%d\n", s.a + s.c );

    because in the first layout a and c can land on different CPU cache
    lines.

    > The same with malloc/free, new/delete. It passes on using useful STL
    > containers because they use heap memory allocation. As I understand
    > malloc is only expensive when there is an sbreak involved to get a chunk
    > of memory from the OS. Otherwise the library dispenses preallocated
    > memory blocks and the cost is not that great and comparable to creating
    > stack objects.
    > Is that so ?


    Nope, not that way.

    malloc() does try to recycle blocks from its current free list, but even
    when it can't, it does a lazy allocation that just updates the process's
    virtual memory map; you don't incur the allocation penalty until you
    actually touch the memory page, and the resulting page fault fills in the
    actual memory.

    HTH

  3. Re: System Call Overhead v. Function Call Overhead Question, etc.



    Thank you for your reply.

    I wasn't aware that a thread could be scheduled out before its time slice
    was over because it made a system call. I thought that the switch to
    kernel mode was done with a single instruction, "sysenter" if I recall
    correctly, which causes a software interrupt that results in the kernel's
    syscall handler starting execution at once, without calling the
    scheduler. The system call handler does additional work checking for
    trace flags and so on, but if the thread has time left to run it shall
    not be preempted by a higher-priority process. Interesting. That seems a
    good reason to avoid system calls as much as possible.


    "Tommy Reynolds" wrote in message
    news:pan.2007.03.28.18.40.35@MegaCoder.com...
    >> The same with malloc/free, new/delete. It passes on using useful STL
    >> containers because they use heap memory allocation. As I understand
    >> malloc is only expensive when there is an sbreak involved to get a chunk
    >> of memory from the OS. Otherwise the library dispenses preallocated
    >> memory blocks and the cost is not that great and comparable to creating
    >> stack objects.
    >> Is that so ?

    >
    > Nope, not that way.
    >
    > malloc() does try to recycle blocks from its current free list, but
    > even when it can't, it does a lazy allocation that just updates the
    > process's virtual memory map; you don't incur the allocation penalty
    > until you actually touch the memory page, and the resulting page fault
    > fills in the actual memory.


    In the multi-threaded environment I am working on, the application is I/O
    bound, so CPU power is not a worry, but malloc remains a concern because
    of lock contention. I understand that the Linux heap manager uses more
    than one lock (one per arena, or sub-heap), but I have no idea how one
    can get a thread to always use the same sub-heap to reduce lock
    contention.

    Thanks again.



