[PATCH] lockd: convert reclaimer thread to kthread interface - Kernel

This is a discussion on [PATCH] lockd: convert reclaimer thread to kthread interface - Kernel ; My understanding is that there is a push to turn the kernel_thread interface into a non-exported symbol and move all kernel threads to use the kthread API. This patch changes lockd to use kthread_run to spawn the reclaimer thread. I've ...

+ Reply to Thread
Results 1 to 11 of 11

Thread: [PATCH] lockd: convert reclaimer thread to kthread interface

  1. [PATCH] lockd: convert reclaimer thread to kthread interface

    My understanding is that there is a push to turn the kernel_thread
    interface into a non-exported symbol and move all kernel threads to use
    the kthread API. This patch changes lockd to use kthread_run to spawn
    the reclaimer thread.

    I've made the assumption here that the extra module references taken
    when we spawn this thread are unnecessary and removed them. I've also
    added a KERN_ERR printk that pops if the thread can't be spawned to warn
    the admin that the locks won't be reclaimed.

    I consider this patch 2.6.29 material.

    Signed-off-by: Jeff Layton
    ---
    fs/lockd/clntlock.c | 14 +++++++++-----
    1 files changed, 9 insertions(+), 5 deletions(-)

    diff --git a/fs/lockd/clntlock.c b/fs/lockd/clntlock.c
    index 8307dd6..fcc2378 100644
    --- a/fs/lockd/clntlock.c
    +++ b/fs/lockd/clntlock.c
    @@ -14,6 +14,7 @@
    #include
    #include
    #include
    +#include

    #define NLMDBG_FACILITY NLMDBG_CLIENT

    @@ -191,11 +192,15 @@ __be32 nlmclnt_grant(const struct sockaddr *addr, const struct nlm_lock *lock)
    void
    nlmclnt_recovery(struct nlm_host *host)
    {
    + struct task_struct *task;
    +
    if (!host->h_reclaiming++) {
    nlm_get_host(host);
    - __module_get(THIS_MODULE);
    - if (kernel_thread(reclaimer, host, CLONE_FS | CLONE_FILES) < 0)
    - module_put(THIS_MODULE);
    + task = kthread_run(reclaimer, host, "%s-reclaim", host->h_name);
    + if (IS_ERR(task))
    + printk(KERN_ERR "lockd: unable to spawn reclaimer "
    + "thread. Locks for %s won't be reclaimed! "
    + "(%ld)\n", host->h_name, PTR_ERR(task));
    }
    }

    @@ -207,7 +212,6 @@ reclaimer(void *ptr)
    struct file_lock *fl, *next;
    u32 nsmstate;

    - daemonize("%s-reclaim", host->h_name);
    allow_signal(SIGKILL);

    down_write(&host->h_rwsem);
    @@ -261,5 +265,5 @@ restart:
    nlm_release_host(host);
    lockd_down();
    unlock_kernel();
    - module_put_and_exit(0);
    + return 0;
    }
    --
    1.5.5.1

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH] lockd: convert reclaimer thread to kthread interface

    On Wed, 29 Oct 2008 07:15:45 -0400
    Jeff Layton wrote:

    > My understanding is that there is a push to turn the kernel_thread
    > interface into a non-exported symbol and move all kernel threads to use
    > the kthread API. This patch changes lockd to use kthread_run to spawn
    > the reclaimer thread.
    >
    > I've made the assumption here that the extra module references taken
    > when we spawn this thread are unnecessary and removed them. I've also
    > added a KERN_ERR printk that pops if the thread can't be spawned to warn
    > the admin that the locks won't be reclaimed.
    >
    > I consider this patch 2.6.29 material.
    >
    > Signed-off-by: Jeff Layton
    > ---
    > fs/lockd/clntlock.c | 14 +++++++++-----
    > 1 files changed, 9 insertions(+), 5 deletions(-)
    >
    > diff --git a/fs/lockd/clntlock.c b/fs/lockd/clntlock.c
    > index 8307dd6..fcc2378 100644
    > --- a/fs/lockd/clntlock.c
    > +++ b/fs/lockd/clntlock.c
    > @@ -14,6 +14,7 @@
    > #include
    > #include
    > #include
    > +#include
    >
    > #define NLMDBG_FACILITY NLMDBG_CLIENT
    >
    > @@ -191,11 +192,15 @@ __be32 nlmclnt_grant(const struct sockaddr *addr, const struct nlm_lock *lock)
    > void
    > nlmclnt_recovery(struct nlm_host *host)
    > {
    > + struct task_struct *task;
    > +
    > if (!host->h_reclaiming++) {
    > nlm_get_host(host);
    > - __module_get(THIS_MODULE);
    > - if (kernel_thread(reclaimer, host, CLONE_FS | CLONE_FILES) < 0)
    > - module_put(THIS_MODULE);
    > + task = kthread_run(reclaimer, host, "%s-reclaim", host->h_name);
    > + if (IS_ERR(task))
    > + printk(KERN_ERR "lockd: unable to spawn reclaimer "
    > + "thread. Locks for %s won't be reclaimed! "
    > + "(%ld)\n", host->h_name, PTR_ERR(task));
    > }
    > }
    >
    > @@ -207,7 +212,6 @@ reclaimer(void *ptr)
    > struct file_lock *fl, *next;
    > u32 nsmstate;
    >
    > - daemonize("%s-reclaim", host->h_name);
    > allow_signal(SIGKILL);
    >
    > down_write(&host->h_rwsem);
    > @@ -261,5 +265,5 @@ restart:
    > nlm_release_host(host);
    > lockd_down();
    > unlock_kernel();
    > - module_put_and_exit(0);
    > + return 0;
    > }


    Looks OK to me. I assume the SIGKILL handling has been carefully tested?


    Is it correct to emit a warning and keep going if the thread didn't
    start? Or would it be safer&saner to fail the whole mount (or whatever
    syscall we're doing here..)



    I see this:

    /* Why are we leaking memory here? --okir */
    if (signalled())
    continue;

    is that still true? It seems unlikely that what appears to be a pretty
    gross leak has been around for so long.

    This code needs some BKL-removal love.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [PATCH] lockd: convert reclaimer thread to kthread interface

    On Mon, 3 Nov 2008 13:12:15 -0800
    Andrew Morton wrote:

    > On Wed, 29 Oct 2008 07:15:45 -0400
    > Jeff Layton wrote:
    >
    > > My understanding is that there is a push to turn the kernel_thread
    > > interface into a non-exported symbol and move all kernel threads to use
    > > the kthread API. This patch changes lockd to use kthread_run to spawn
    > > the reclaimer thread.
    > >
    > > I've made the assumption here that the extra module references taken
    > > when we spawn this thread are unnecessary and removed them. I've also
    > > added a KERN_ERR printk that pops if the thread can't be spawned to warn
    > > the admin that the locks won't be reclaimed.
    > >
    > > I consider this patch 2.6.29 material.
    > >
    > > Signed-off-by: Jeff Layton
    > > ---
    > > fs/lockd/clntlock.c | 14 +++++++++-----
    > > 1 files changed, 9 insertions(+), 5 deletions(-)
    > >
    > > diff --git a/fs/lockd/clntlock.c b/fs/lockd/clntlock.c
    > > index 8307dd6..fcc2378 100644
    > > --- a/fs/lockd/clntlock.c
    > > +++ b/fs/lockd/clntlock.c
    > > @@ -14,6 +14,7 @@
    > > #include
    > > #include
    > > #include
    > > +#include
    > >
    > > #define NLMDBG_FACILITY NLMDBG_CLIENT
    > >
    > > @@ -191,11 +192,15 @@ __be32 nlmclnt_grant(const struct sockaddr *addr, const struct nlm_lock *lock)
    > > void
    > > nlmclnt_recovery(struct nlm_host *host)
    > > {
    > > + struct task_struct *task;
    > > +
    > > if (!host->h_reclaiming++) {
    > > nlm_get_host(host);
    > > - __module_get(THIS_MODULE);
    > > - if (kernel_thread(reclaimer, host, CLONE_FS | CLONE_FILES) < 0)
    > > - module_put(THIS_MODULE);
    > > + task = kthread_run(reclaimer, host, "%s-reclaim", host->h_name);
    > > + if (IS_ERR(task))
    > > + printk(KERN_ERR "lockd: unable to spawn reclaimer "
    > > + "thread. Locks for %s won't be reclaimed! "
    > > + "(%ld)\n", host->h_name, PTR_ERR(task));
    > > }
    > > }
    > >
    > > @@ -207,7 +212,6 @@ reclaimer(void *ptr)
    > > struct file_lock *fl, *next;
    > > u32 nsmstate;
    > >
    > > - daemonize("%s-reclaim", host->h_name);
    > > allow_signal(SIGKILL);
    > >
    > > down_write(&host->h_rwsem);
    > > @@ -261,5 +265,5 @@ restart:
    > > nlm_release_host(host);
    > > lockd_down();
    > > unlock_kernel();
    > > - module_put_and_exit(0);
    > > + return 0;
    > > }

    >
    > Looks OK to me. I assume the SIGKILL handling has been carefully tested?
    >


    Not by me, though I don't think this patch will make that any better or
    worse. It should just change how the thread is spawned. My testing
    mostly consisted of making sure that we could reclaim locks after this
    was applied.

    >
    > Is it correct to emit a warning and keep going if the thread didn't
    > start? Or would it be safer&saner to fail the whole mount (or whatever
    > syscall we're doing here..)
    >
    >
    >
    > I see this:
    >
    > /* Why are we leaking memory here? --okir */
    > if (signalled())
    > continue;
    >
    > is that still true? It seems unlikely that what appears to be a pretty
    > gross leak has been around for so long.
    >


    Well, just before that we do this:

    list_del_init(&fl->fl_u.nfs_fl.list);

    ....and a little while after, we do this:

    list_add_tail(&fl->fl_u.nfs_fl.list, &host->h_granted);

    ....so I assume the fact that the fl doesn't end up back on a list if
    we're signalled is the problem. If so, then yes, it does look like we're
    still leaking memory there.

    I've never heard of anyone needing to signal the reclaimer thread, so
    maybe we should just make it ignore all signals?

    > This code needs some BKL-removal love.


    Yep. All of lockd does, though IIRC that's held up by dependencies on
    the BKL in generic VFS locking code. Pulling the BKL out of lockd is
    probably going to be painful since it's almost surely hiding some
    races.

    --
    Jeff Layton
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [PATCH] lockd: convert reclaimer thread to kthread interface

    On Mon, 3 Nov 2008 13:12:15 -0800
    Andrew Morton wrote:

    > On Wed, 29 Oct 2008 07:15:45 -0400
    > Jeff Layton wrote:
    >
    > > My understanding is that there is a push to turn the kernel_thread
    > > interface into a non-exported symbol and move all kernel threads to use
    > > the kthread API. This patch changes lockd to use kthread_run to spawn
    > > the reclaimer thread.
    > >
    > > I've made the assumption here that the extra module references taken
    > > when we spawn this thread are unnecessary and removed them. I've also
    > > added a KERN_ERR printk that pops if the thread can't be spawned to warn
    > > the admin that the locks won't be reclaimed.
    > >
    > > I consider this patch 2.6.29 material.
    > >
    > > Signed-off-by: Jeff Layton
    > > ---
    > > fs/lockd/clntlock.c | 14 +++++++++-----
    > > 1 files changed, 9 insertions(+), 5 deletions(-)
    > >
    > > diff --git a/fs/lockd/clntlock.c b/fs/lockd/clntlock.c
    > > index 8307dd6..fcc2378 100644
    > > --- a/fs/lockd/clntlock.c
    > > +++ b/fs/lockd/clntlock.c
    > > @@ -14,6 +14,7 @@
    > > #include
    > > #include
    > > #include
    > > +#include
    > >
    > > #define NLMDBG_FACILITY NLMDBG_CLIENT
    > >
    > > @@ -191,11 +192,15 @@ __be32 nlmclnt_grant(const struct sockaddr *addr, const struct nlm_lock *lock)
    > > void
    > > nlmclnt_recovery(struct nlm_host *host)
    > > {
    > > + struct task_struct *task;
    > > +
    > > if (!host->h_reclaiming++) {
    > > nlm_get_host(host);
    > > - __module_get(THIS_MODULE);
    > > - if (kernel_thread(reclaimer, host, CLONE_FS | CLONE_FILES) < 0)
    > > - module_put(THIS_MODULE);
    > > + task = kthread_run(reclaimer, host, "%s-reclaim", host->h_name);
    > > + if (IS_ERR(task))
    > > + printk(KERN_ERR "lockd: unable to spawn reclaimer "
    > > + "thread. Locks for %s won't be reclaimed! "
    > > + "(%ld)\n", host->h_name, PTR_ERR(task));
    > > }
    > > }
    > >
    > > @@ -207,7 +212,6 @@ reclaimer(void *ptr)
    > > struct file_lock *fl, *next;
    > > u32 nsmstate;
    > >
    > > - daemonize("%s-reclaim", host->h_name);
    > > allow_signal(SIGKILL);
    > >
    > > down_write(&host->h_rwsem);
    > > @@ -261,5 +265,5 @@ restart:
    > > nlm_release_host(host);
    > > lockd_down();
    > > unlock_kernel();
    > > - module_put_and_exit(0);
    > > + return 0;
    > > }

    >
    > Looks OK to me. I assume the SIGKILL handling has been carefully tested?
    >
    >
    > Is it correct to emit a warning and keep going if the thread didn't
    > start? Or would it be safer&saner to fail the whole mount (or whatever
    > syscall we're doing here..)
    >


    Forgot to answer this part...

    This thread gets kicked off when the server has rebooted and we need to
    reclaim our locks. There isn't a syscall on which we can return an
    error to the user.

    Aside from just warning the admin, I'm not sure what we can do here. We
    might be able to start making all syscalls on the mount fail somehow,
    but I don't think we have infrastructure for that and that may be
    overkill anyway. I suppose we could also go to sleep and try to spawn the
    thread again, but there's no guarantee of success there.

    >
    >
    > I see this:
    >
    > /* Why are we leaking memory here? --okir */
    > if (signalled())
    > continue;
    >
    > is that still true? It seems unlikely that what appears to be a pretty
    > gross leak has been around for so long.
    >
    > This code needs some BKL-removal love.
    > --
    > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
    > the body of a message to majordomo@vger.kernel.org
    > More majordomo info at http://vger.kernel.org/majordomo-info.html
    >



    --
    Jeff Layton
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [PATCH] lockd: convert reclaimer thread to kthread interface

    On Nov 3, 2008, at 19:19, Jeff Layton wrote:

    > On Mon, 3 Nov 2008 13:12:15 -0800
    > Andrew Morton wrote:
    >
    >> On Wed, 29 Oct 2008 07:15:45 -0400
    >> Jeff Layton wrote:
    >>
    >>> My understanding is that there is a push to turn the kernel_thread
    >>> interface into a non-exported symbol and move all kernel threads
    >>> to use
    >>> the kthread API. This patch changes lockd to use kthread_run to
    >>> spawn
    >>> the reclaimer thread.
    >>>
    >>> I've made the assumption here that the extra module references taken
    >>> when we spawn this thread are unnecessary and removed them. I've
    >>> also
    >>> added a KERN_ERR printk that pops if the thread can't be spawned
    >>> to warn
    >>> the admin that the locks won't be reclaimed.
    >>>
    >>> I consider this patch 2.6.29 material.
    >>>
    >>> Signed-off-by: Jeff Layton
    >>> ---
    >>> fs/lockd/clntlock.c | 14 +++++++++-----
    >>> 1 files changed, 9 insertions(+), 5 deletions(-)
    >>>
    >>> diff --git a/fs/lockd/clntlock.c b/fs/lockd/clntlock.c
    >>> index 8307dd6..fcc2378 100644
    >>> --- a/fs/lockd/clntlock.c
    >>> +++ b/fs/lockd/clntlock.c
    >>> @@ -14,6 +14,7 @@
    >>> #include
    >>> #include
    >>> #include
    >>> +#include
    >>>
    >>> #define NLMDBG_FACILITY NLMDBG_CLIENT
    >>>
    >>> @@ -191,11 +192,15 @@ __be32 nlmclnt_grant(const struct sockaddr
    >>> *addr, const struct nlm_lock *lock)
    >>> void
    >>> nlmclnt_recovery(struct nlm_host *host)
    >>> {
    >>> + struct task_struct *task;
    >>> +
    >>> if (!host->h_reclaiming++) {
    >>> nlm_get_host(host);
    >>> - __module_get(THIS_MODULE);
    >>> - if (kernel_thread(reclaimer, host, CLONE_FS |
    >>> CLONE_FILES) < 0)
    >>> - module_put(THIS_MODULE);
    >>> + task = kthread_run(reclaimer, host, "%s-reclaim", host-
    >>> >h_name);
    >>> + if (IS_ERR(task))
    >>> + printk(KERN_ERR "lockd: unable to spawn reclaimer "
    >>> + "thread. Locks for %s won't be reclaimed! "
    >>> + "(%ld)\n", host->h_name, PTR_ERR(task));
    >>> }
    >>> }
    >>>
    >>> @@ -207,7 +212,6 @@ reclaimer(void *ptr)
    >>> struct file_lock *fl, *next;
    >>> u32 nsmstate;
    >>>
    >>> - daemonize("%s-reclaim", host->h_name);
    >>> allow_signal(SIGKILL);
    >>>
    >>> down_write(&host->h_rwsem);
    >>> @@ -261,5 +265,5 @@ restart:
    >>> nlm_release_host(host);
    >>> lockd_down();
    >>> unlock_kernel();
    >>> - module_put_and_exit(0);
    >>> + return 0;
    >>> }

    >>
    >> Looks OK to me. I assume the SIGKILL handling has been carefully
    >> tested?
    >>
    >>
    >> Is it correct to emit a warning and keep going if the thread didn't
    >> start? Or would it be safer&saner to fail the whole mount (or
    >> whatever
    >> syscall we're doing here..)
    >>

    >
    > Forgot to answer this part...
    >
    > This thread gets kicked off when the server has rebooted and we need
    > to
    > reclaim our locks. There isn't a syscall on which we can return an
    > error to the user.
    >
    > Aside from just warning the admin, I'm not sure what we can do here.
    > We
    > might be able to start making all syscalls on the mount fail somehow,
    > but I don't think we have infrastructure for that and that may be
    > overkill anyway. I suppose we could also go to sleep and try to
    > spawn the
    > thread again, but there's no guarantee of success there.
    >>
    >>
    >>
    >>
    >>
    >>


    At some point RSN we should implement SIGLOST. That would be the
    closest thing we have to a *NIX standard for reporting the loss of
    filesystem state to the application.

    Trond
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [PATCH] lockd: convert reclaimer thread to kthread interface

    On Mon, 2008-11-03 at 19:19 -0500, Jeff Layton wrote:
    > On Mon, 3 Nov 2008 13:12:15 -0800
    > Andrew Morton wrote:
    >
    > > On Wed, 29 Oct 2008 07:15:45 -0400
    > > Jeff Layton wrote:
    > >
    > > > My understanding is that there is a push to turn the kernel_thread
    > > > interface into a non-exported symbol and move all kernel threads to use
    > > > the kthread API. This patch changes lockd to use kthread_run to spawn
    > > > the reclaimer thread.
    > > >
    > > > I've made the assumption here that the extra module references taken
    > > > when we spawn this thread are unnecessary and removed them. I've also
    > > > added a KERN_ERR printk that pops if the thread can't be spawned to warn
    > > > the admin that the locks won't be reclaimed.
    > > >
    > > > I consider this patch 2.6.29 material.
    > > >
    > > > Signed-off-by: Jeff Layton
    > > > ---
    > > > fs/lockd/clntlock.c | 14 +++++++++-----
    > > > 1 files changed, 9 insertions(+), 5 deletions(-)
    > > >
    > > > diff --git a/fs/lockd/clntlock.c b/fs/lockd/clntlock.c
    > > > index 8307dd6..fcc2378 100644
    > > > --- a/fs/lockd/clntlock.c
    > > > +++ b/fs/lockd/clntlock.c
    > > > @@ -14,6 +14,7 @@
    > > > #include
    > > > #include
    > > > #include
    > > > +#include
    > > >
    > > > #define NLMDBG_FACILITY NLMDBG_CLIENT
    > > >
    > > > @@ -191,11 +192,15 @@ __be32 nlmclnt_grant(const struct sockaddr *addr, const struct nlm_lock *lock)
    > > > void
    > > > nlmclnt_recovery(struct nlm_host *host)
    > > > {
    > > > + struct task_struct *task;
    > > > +
    > > > if (!host->h_reclaiming++) {
    > > > nlm_get_host(host);
    > > > - __module_get(THIS_MODULE);
    > > > - if (kernel_thread(reclaimer, host, CLONE_FS | CLONE_FILES) < 0)
    > > > - module_put(THIS_MODULE);
    > > > + task = kthread_run(reclaimer, host, "%s-reclaim", host->h_name);
    > > > + if (IS_ERR(task))
    > > > + printk(KERN_ERR "lockd: unable to spawn reclaimer "
    > > > + "thread. Locks for %s won't be reclaimed! "
    > > > + "(%ld)\n", host->h_name, PTR_ERR(task));
    > > > }
    > > > }
    > > >
    > > > @@ -207,7 +212,6 @@ reclaimer(void *ptr)
    > > > struct file_lock *fl, *next;
    > > > u32 nsmstate;
    > > >
    > > > - daemonize("%s-reclaim", host->h_name);
    > > > allow_signal(SIGKILL);
    > > >
    > > > down_write(&host->h_rwsem);
    > > > @@ -261,5 +265,5 @@ restart:
    > > > nlm_release_host(host);
    > > > lockd_down();
    > > > unlock_kernel();
    > > > - module_put_and_exit(0);
    > > > + return 0;
    > > > }

    > >
    > > Looks OK to me. I assume the SIGKILL handling has been carefully tested?
    > >
    > >
    > > Is it correct to emit a warning and keep going if the thread didn't
    > > start? Or would it be safer&saner to fail the whole mount (or whatever
    > > syscall we're doing here..)
    > >

    >
    > Forgot to answer this part...
    >
    > This thread gets kicked off when the server has rebooted and we need to
    > reclaim our locks. There isn't a syscall on which we can return an
    > error to the user.
    >
    > Aside from just warning the admin, I'm not sure what we can do here. We
    > might be able to start making all syscalls on the mount fail somehow,
    > but I don't think we have infrastructure for that and that may be
    > overkill anyway. I suppose we could also go to sleep and try to spawn the
    > thread again, but there's no guarantee of success there.


    We should consider implementing SIGLOST. That is the closest thing that
    we have to a *NIX standard for signalling that remote filesystem state
    has been lost.

    Cheers
    Trond

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [PATCH] lockd: convert reclaimer thread to kthread interface

    On Tue, 04 Nov 2008 07:41:47 -0500
    Trond Myklebust wrote:

    > On Mon, 2008-11-03 at 19:19 -0500, Jeff Layton wrote:
    > > On Mon, 3 Nov 2008 13:12:15 -0800
    > > Andrew Morton wrote:
    > >
    > > > On Wed, 29 Oct 2008 07:15:45 -0400
    > > > Jeff Layton wrote:
    > > >
    > > > > My understanding is that there is a push to turn the kernel_thread
    > > > > interface into a non-exported symbol and move all kernel threads to use
    > > > > the kthread API. This patch changes lockd to use kthread_run to spawn
    > > > > the reclaimer thread.
    > > > >
    > > > > I've made the assumption here that the extra module references taken
    > > > > when we spawn this thread are unnecessary and removed them. I've also
    > > > > added a KERN_ERR printk that pops if the thread can't be spawned to warn
    > > > > the admin that the locks won't be reclaimed.
    > > > >
    > > > > I consider this patch 2.6.29 material.
    > > > >
    > > > > Signed-off-by: Jeff Layton
    > > > > ---
    > > > > fs/lockd/clntlock.c | 14 +++++++++-----
    > > > > 1 files changed, 9 insertions(+), 5 deletions(-)
    > > > >
    > > > > diff --git a/fs/lockd/clntlock.c b/fs/lockd/clntlock.c
    > > > > index 8307dd6..fcc2378 100644
    > > > > --- a/fs/lockd/clntlock.c
    > > > > +++ b/fs/lockd/clntlock.c
    > > > > @@ -14,6 +14,7 @@
    > > > > #include
    > > > > #include
    > > > > #include
    > > > > +#include
    > > > >
    > > > > #define NLMDBG_FACILITY NLMDBG_CLIENT
    > > > >
    > > > > @@ -191,11 +192,15 @@ __be32 nlmclnt_grant(const struct sockaddr *addr, const struct nlm_lock *lock)
    > > > > void
    > > > > nlmclnt_recovery(struct nlm_host *host)
    > > > > {
    > > > > + struct task_struct *task;
    > > > > +
    > > > > if (!host->h_reclaiming++) {
    > > > > nlm_get_host(host);
    > > > > - __module_get(THIS_MODULE);
    > > > > - if (kernel_thread(reclaimer, host, CLONE_FS | CLONE_FILES) < 0)
    > > > > - module_put(THIS_MODULE);
    > > > > + task = kthread_run(reclaimer, host, "%s-reclaim", host->h_name);
    > > > > + if (IS_ERR(task))
    > > > > + printk(KERN_ERR "lockd: unable to spawn reclaimer "
    > > > > + "thread. Locks for %s won't be reclaimed! "
    > > > > + "(%ld)\n", host->h_name, PTR_ERR(task));
    > > > > }
    > > > > }
    > > > >
    > > > > @@ -207,7 +212,6 @@ reclaimer(void *ptr)
    > > > > struct file_lock *fl, *next;
    > > > > u32 nsmstate;
    > > > >
    > > > > - daemonize("%s-reclaim", host->h_name);
    > > > > allow_signal(SIGKILL);
    > > > >
    > > > > down_write(&host->h_rwsem);
    > > > > @@ -261,5 +265,5 @@ restart:
    > > > > nlm_release_host(host);
    > > > > lockd_down();
    > > > > unlock_kernel();
    > > > > - module_put_and_exit(0);
    > > > > + return 0;
    > > > > }
    > > >
    > > > Looks OK to me. I assume the SIGKILL handling has been carefully tested?
    > > >
    > > >
    > > > Is it correct to emit a warning and keep going if the thread didn't
    > > > start? Or would it be safer&saner to fail the whole mount (or whatever
    > > > syscall we're doing here..)
    > > >

    > >
    > > Forgot to answer this part...
    > >
    > > This thread gets kicked off when the server has rebooted and we need to
    > > reclaim our locks. There isn't a syscall on which we can return an
    > > error to the user.
    > >
    > > Aside from just warning the admin, I'm not sure what we can do here. We
    > > might be able to start making all syscalls on the mount fail somehow,
    > > but I don't think we have infrastructure for that and that may be
    > > overkill anyway. I suppose we could also go to sleep and try to spawn the
    > > thread again, but there's no guarantee of success there.

    >
    > We should consider implementing SIGLOST. That is the closest thing that
    > we have to a *NIX standard for signalling that remote filesystem state
    > has been lost.
    >


    Very interesting. I hadn't heard of SIGLOST before, but it does seem
    like something we should implement. CIFS might also be able to use it
    too. CIFS doesn't have a grace period, so lock reclaims are always
    iffy...

    While we're on the subject of signals...

    Do you have any thoughts/objections to just making the reclaimer thread
    ignore them altogether? That would simplify the code a bit.

    I think I may have been wrong before as well. Now that I look closer,
    I'm not sure that we're actually leaking memory if the reclaimer is
    signaled. The file_locks do end up not being on the h_granted list
    anymore, but I think that just keeps the kernel from attempting to
    reclaim them again (for instance, if a new reclaimer thread is spawned
    after this one exits).

    --
    Jeff Layton
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [PATCH] lockd: convert reclaimer thread to kthread interface

    On Tue, 2008-11-04 at 13:42 -0500, Jeff Layton wrote:
    > While we're on the subject of signals...
    >
    > Do you have any thoughts/objections to just making the reclaimer thread
    > ignore them altogether? That would simplify the code a bit.


    How does the administrator then get out of the situation where the
    server dies (permanently) in the middle of a reclaim?

    Forced unmounts won't help here, since they only signal the NFS
    requests, and are in any case per-filesystem, not per-server.

    I suppose one could use soft RPC calls, but what should the retry policy
    be for that case?

    Trond

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [PATCH] lockd: convert reclaimer thread to kthread interface

    On Tue, 04 Nov 2008 14:26:21 -0500
    Trond Myklebust wrote:

    > On Tue, 2008-11-04 at 13:42 -0500, Jeff Layton wrote:
    > > While we're on the subject of signals...
    > >
    > > Do you have any thoughts/objections to just making the reclaimer thread
    > > ignore them altogether? That would simplify the code a bit.

    >
    > How does the administrator then get out of the situation where the
    > server dies (permanently) in the middle of a reclaim?
    >


    Erm...Reboot?

    Ok, I'm convinced. I suppose that's a good enough argument for
    continuing to allow SIGKILL. I guess the only change we need to make to
    this patch for now is to remove the "memory leak" comment (unless there
    is a leak and I'm just not seeing it).

    --
    Jeff Layton
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: [PATCH] lockd: convert reclaimer thread to kthread interface

    On Tue, 2008-11-04 at 14:46 -0500, Jeff Layton wrote:
    > On Tue, 04 Nov 2008 14:26:21 -0500
    > Trond Myklebust wrote:
    >
    > > On Tue, 2008-11-04 at 13:42 -0500, Jeff Layton wrote:
    > > > While we're on the subject of signals...
    > > >
    > > > Do you have any thoughts/objections to just making the reclaimer thread
    > > > ignore them altogether? That would simplify the code a bit.

    > >
    > > How does the administrator then get out of the situation where the
    > > server dies (permanently) in the middle of a reclaim?
    > >

    >
    > Erm...Reboot?
    >
    > Ok, I'm convinced. I suppose that's a good enough argument for
    > continuing to allow SIGKILL. I guess the only change we need to make to
    > this patch for now is to remove the "memory leak" comment (unless there
    > is a leak and I'm just not seeing it).


    Hold on... I'm not saying that I'm absolutely wedded to the idea of
    SIGKILL. I'm just stating the reason for allowing it in the first place.

    All booting NLM servers will have a finite grace period during which
    lock recovery is allowed, so it is obvious that retrying each RPC call
    forever is not a good solution. The questions are then "How long do you
    wait before giving up?" and "What do you do after timing out?".

    One solution may be to let the administrator set a time-out via a
    sysctl, and then set a policy for how to deal with the failure. A
    reasonable set of possible policies may be to either retry recovery at a
    later time, or to wait for a new reboot notification from the server, or
    at some point to start sending out SIGLOST to the applications...

    Cheers
    Trond

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: [PATCH] lockd: convert reclaimer thread to kthread interface

    On Tue, 04 Nov 2008 15:17:14 -0500
    Trond Myklebust wrote:

    > On Tue, 2008-11-04 at 14:46 -0500, Jeff Layton wrote:
    > > On Tue, 04 Nov 2008 14:26:21 -0500
    > > Trond Myklebust wrote:
    > >
    > > > On Tue, 2008-11-04 at 13:42 -0500, Jeff Layton wrote:
    > > > > While we're on the subject of signals...
    > > > >
    > > > > Do you have any thoughts/objections to just making the reclaimer thread
    > > > > ignore them altogether? That would simplify the code a bit.
    > > >
    > > > How does the administrator then get out of the situation where the
    > > > server dies (permanently) in the middle of a reclaim?
    > > >

    > >
    > > Erm...Reboot?
    > >
    > > Ok, I'm convinced. I suppose that's a good enough argument for
    > > continuing to allow SIGKILL. I guess the only change we need to make to
    > > this patch for now is to remove the "memory leak" comment (unless there
    > > is a leak and I'm just not seeing it).

    >
    > Hold on... I'm not saying that I'm absolutely wedded to the idea of
    > SIGKILL. I'm just stating the reason for allowing it in the first place.
    >
    > All booting NLM servers will have a finite grace period during which
    > lock recovery is allowed, so it is obvious that retrying each RPC call
    > forever is not a good solution. The questions are then "How long do you
    > wait before giving up?" and "What do you do after timing out?".
    >
    > One solution may be to let the administrator set a time-out via a
    > sysctl, and then set a policy for how to deal with the failure. A
    > reasonable set of possible policies may be to either retry recovery at a
    > later time, or to wait for a new reboot notification from the server, or
    > at some point to start sending out SIGLOST to the applications...
    >


    Fair enough -- those are good ideas. For now, I think keeping the
    signaling in place is probably reasonable since we do want to allow the
    admin to take down the thread.

    Long term, adding better mechanisms to handle failed lock reclaims is
    something that ought to be on the to-do list.

    --
    Jeff Layton
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread