SUNRPC problem with 2.6.26 and beyond - Kernel

Thread: SUNRPC problem with 2.6.26 and beyond

  1. SUNRPC problem with 2.6.26 and beyond

    I have a dual quad-core Xeon system running software
    (http://www.unidata.ucar.edu/software/ldm) that relays and processes
    weather data through RPC calls, keeping a queue of data in a memory
    mapped file. Up until 2.6.26 the system has run just fine (for example
    2.6.25.17). But starting with 2.6.26 through 2.6.27.2 the system runs
    into a problem after approximately 24 hours. The symptom is that the
    processing slows down to a crawl. Using "top" I can see that the System
    time is up over 90%, with almost no User and Wait time. If I stop and
    restart the software, most of the time it gets better - but sometimes it
    takes a reboot to fix the problem. I have an identical system that does
    just processing and ingesting data from remote systems, and it does not
    have this problem. I have tried a number of different kernel
    configurations, but they all show the same problem.

    I suspect a problem with SUNRPC. I notice that there were a large
    number of SUNRPC patches in 2.6.26. I am looking for suggestions on how
    to pin down which patches are causing the problem. Are there ways to
    figure out where in the kernel the time is being spent? I am willing to work
    on isolating the problem, but I need some suggestions on the best way to
    do it given the large number of SUNRPC patches in 2.6.26 and the fact
    that each experiment takes a day.
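
One way to put numbers on the "System time over 90%" symptom is to sample the aggregate counters in /proc/stat at intervals. The sketch below is an illustration only, not part of LDM or of the original report: the function names are made up, and it assumes the documented field order of the "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal). It does not show where in the kernel the time goes, only how the split evolves, which might help timestamp the onset during a day-long run.

```python
import time


def read_cpu_times(stat_text):
    """Return the first eight aggregate CPU counters from /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Assumed order: user nice system idle iowait irq softirq steal.
            # Later fields (guest time) are already counted in user, so skip them.
            values = [int(v) for v in fields[1:9]]
            return values + [0] * (8 - len(values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_shares(before, after):
    """Percentage of elapsed CPU time per category between two samples."""
    delta = [b - a for a, b in zip(before, after)]
    total = sum(delta)
    if total <= 0:
        raise ValueError("counters did not advance between samples")

    def pct(ticks):
        return 100.0 * ticks / total

    return {
        "user": pct(delta[0] + delta[1]),   # user + nice
        "system": pct(delta[2]),
        "idle": pct(delta[3]),
        "iowait": pct(delta[4]),
        "irq": pct(delta[5] + delta[6]),    # hard + soft interrupts
    }


def sample(interval=5.0, path="/proc/stat"):
    """Read the counters twice, `interval` seconds apart, and return shares."""
    with open(path) as f:
        before = read_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = read_cpu_times(f.read())
    return cpu_shares(before, after)
```

Calling sample() once a minute and logging the result with a timestamp would be one way to use it.
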
    --

    Dr. Harry Edmon E-MAIL: harry@atmos.washington.edu
    206-543-0547 harry@washington.edu
    Dept of Atmospheric Sciences FAX: 206-543-0308
    University of Washington, Box 351640, Seattle, WA 98195-1640

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: SUNRPC problem with 2.6.26 and beyond - try again with response in correct place.

    Trond Myklebust wrote:
    >
    > The kernel sunrpc interface is not exported to user land: the glibc code
    > uses its own, entirely separate implementation of sunrpc.
    >
    > I therefore cannot see how your application's RPC calls can be affected
    > by kernel sunrpc changes.
    >
    > Cheers
    > Trond
    >
    >

    Then how do you explain the large system time used with 2.6.26 and
    beyond? Is it some other patch I should be looking at?
    --

    Dr. Harry Edmon E-MAIL: harry@atmos.washington.edu
    206-543-0547 harry@washington.edu
    Dept of Atmospheric Sciences FAX: 206-543-0308
    University of Washington, Box 351640, Seattle, WA 98195-1640

  3. Re: SUNRPC problem with 2.6.26 and beyond

    Then how do you explain the large system time used with 2.6.26 and
    beyond? Is it some other patch I should be looking at?

    Trond Myklebust wrote:
    >
    > The kernel sunrpc interface is not exported to user land: the glibc code
    > uses its own, entirely separate implementation of sunrpc.
    >
    > I therefore cannot see how your application's RPC calls can be affected
    > by kernel sunrpc changes.
    >
    > Cheers
    > Trond
    >
    >



    --
    Dr. Harry Edmon E-MAIL: harry@atmos.washington.edu
    206-543-0547 harry@washington.edu
    Dept of Atmospheric Sciences FAX: 206-543-0308
    University of Washington, Box 351640, Seattle, WA 98195-1640

  4. Re: SUNRPC problem with 2.6.26 and beyond

    On Wed, 2008-10-22 at 08:35 -0700, Harry Edmon wrote:
    > I have a dual quad-core Xeon system running software
    > (http://www.unidata.ucar.edu/software/ldm) that relays and processes
    > weather data through RPC calls, keeping a queue of data in a memory
    > mapped file. Up until 2.6.26 the system has run just fine (for example
    > 2.6.25.17). But starting with 2.6.26 through 2.6.27.2 the system runs
    > into a problem after approximately 24 hours. The symptom is that the
    > processing slows down to a crawl. Using "top" I can see that the System
    > time is up over 90%, with almost no User and Wait time. If I stop and
    > restart the software, most of the time it gets better - but sometimes it
    > takes a reboot to fix the problem. I have an identical system that does
    > just processing and ingesting data from remote systems, and it does not
    > have this problem. I have tried a number of different kernel
    > configurations, but they all show the same problem.
    >
    > I suspect a problem with SUNRPC. I notice that there were a large
    > number of SUNRPC patches in 2.6.26. I am looking for suggestions on how
    > to pin down which patches are causing the problem. Are there ways to
    > figure out where in the kernel the time is being spent? I am willing to work
    > on isolating the problem, but I need some suggestions on the best way to
    > do it given the large number of SUNRPC patches in 2.6.26 and the fact
    > that each experiment takes a day.


    The kernel sunrpc interface is not exported to user land: the glibc code
    uses its own, entirely separate implementation of sunrpc.

    I therefore cannot see how your application's RPC calls can be affected
    by kernel sunrpc changes.

    Cheers
    Trond

  5. Re: SUNRPC problem with 2.6.26 and beyond - try again with response in correct place.

    On Wed, 2008-10-22 at 15:55 -0700, Harry Edmon wrote:
    > Trond Myklebust wrote:
    > >
    > > The kernel sunrpc interface is not exported to user land: the glibc code
    > > uses its own, entirely separate implementation of sunrpc.
    > >
    > > I therefore cannot see how your application's RPC calls can be affected
    > > by kernel sunrpc changes.
    > >
    > > Cheers
    > > Trond
    > >
    > >

    > Then how do you explain the large system time used with 2.6.26 and
    > beyond? Is it some other patch I should be looking at?


    I'm not explaining it. I'm saying that nothing outside the kernel NFS
    and NLM code uses the kernel sunrpc implementation. Your userland RPC
    calls are using glibc's implementation of sunrpc. Those are unaffected
    by patches to the kernel sunrpc layer.

    If you are seeing a hang, then I suggest you start by using the strace
    utility to figure out which system call is actually involved.

    Cheers
    Trond

  6. Re: SUNRPC problem with 2.6.26 and beyond - try again with response in correct place.

    Trond Myklebust wrote:
    > On Wed, 2008-10-22 at 15:55 -0700, Harry Edmon wrote:
    >
    >> Trond Myklebust wrote:
    >>
    >>> The kernel sunrpc interface is not exported to user land: the glibc code
    >>> uses its own, entirely separate implementation of sunrpc.
    >>>
    >>> I therefore cannot see how your application's RPC calls can be affected
    >>> by kernel sunrpc changes.
    >>>
    >>> Cheers
    >>> Trond
    >>>
    >>>
    >>>

    >> Then how do you explain the large system time used with 2.6.26 and
    >> beyond? Is it some other patch I should be looking at?
    >>

    >
    > I'm not explaining it. I'm saying that nothing outside the kernel NFS
    > and NLM code uses the kernel sunrpc implementation. Your userland RPC
    > calls are using glibc's implementation of sunrpc. Those are unaffected
    > by patches to the kernel sunrpc layer.
    >
    > If you are seeing a hang, then I suggest you start by using the strace
    > utility to figure out which system call is actually involved.
    >
    > Cheers
    > Trond
    >
    >

    The problem is that it is not hanging. The processes are running
    through a lot of system calls. It is just that the system time jumps
    up to over 95% on all 8 processors with 2.6.26 and beyond. I never see
    that with 2.6.25.17. I will try looking again and see if there are
    certain calls that are taking a lot of time.
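
To see which calls dominate, the summary printed by "strace -c" can be ranked by time. The following is a minimal sketch, assuming the usual column layout of that summary (% time, seconds, usecs/call, calls, an optional errors column, syscall); the names SyscallRow and parse_strace_summary are hypothetical. By default "strace -c" counts system time per call, which is the quantity that jumps in this report.

```python
from typing import List, NamedTuple


class SyscallRow(NamedTuple):
    name: str
    seconds: float   # time attributed to this syscall by strace
    calls: int
    errors: int


def parse_strace_summary(text: str) -> List[SyscallRow]:
    """Parse `strace -c` summary text and return rows sorted by time, descending."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields, or 6 when the errors column is filled in.
        if len(fields) not in (5, 6) or fields[-1] == "total":
            continue
        try:
            float(fields[0])            # "% time": only used to detect data rows
            seconds = float(fields[1])
            calls = int(fields[3])
            errors = int(fields[4]) if len(fields) == 6 else 0
        except ValueError:
            continue                    # header or dashed separator line
        rows.append(SyscallRow(fields[-1], seconds, calls, errors))
    return sorted(rows, key=lambda r: r.seconds, reverse=True)


def format_top(rows: List[SyscallRow], limit: int = 5) -> str:
    """Render the most expensive syscalls as aligned text."""
    lines = [f"{r.name:<16}{r.seconds:>12.6f}s{r.calls:>10} calls" for r in rows[:limit]]
    return "\n".join(lines)
```

The input would come from something like "strace -c -f -p <pid>" attached for a short while to one of the busy processes, with the summary saved to a file through "-o". Comparing the ranking on 2.6.25.17 against 2.6.26 under similar load could narrow down which system call became expensive.
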