[PATCH 1/4] pid_ns: zap_pid_ns_processes: fix the ->child_reaper changing - Kernel

This is a discussion on [PATCH 1/4] pid_ns: zap_pid_ns_processes: fix the ->child_reaper changing - Kernel ; zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong. Yes, we have already killed all tasks in this namespace, and sys_wait4() doesn't see any child. But this doesn't mean ->children list is empty, we may have EXIT_DEAD tasks which are not ...

+ Reply to Thread
Results 1 to 5 of 5

Thread: [PATCH 1/4] pid_ns: zap_pid_ns_processes: fix the ->child_reaper changing

  1. [PATCH 1/4] pid_ns: zap_pid_ns_processes: fix the ->child_reaper changing

    zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong.

    Yes, we have already killed all tasks in this namespace, and sys_wait4()
    doesn't see any child. But this doesn't mean ->children list is empty,
    we may have EXIT_DEAD tasks which are not visible to do_wait(). In that
    case the subsequent forget_original_parent() will crash the kernel because
    it will try to re-parent these tasks to the NULL reaper.

    Even if there are no childs, it is not good that forget_original_parent()
    uses reaper == NULL.

    Change the code to set ->child_reaper = init_pid_ns.child_reaper instead.
    We could use pid_ns->parent->child_reaper as well, I think this does not
    really matter. These EXIT_DEAD tasks are not visible to the new ->parent
    after re-parenting, they will silently do release_task() eventually.

    Note that we must change ->child_reaper, otherwise forget_original_parent()
    will use reaper == father, and in that case we will hit the (correct)
    BUG_ON(!list_empty(&father->children)).

    Signed-off-by: Oleg Nesterov

    --- 2.6.27-rc4/kernel/pid_namespace.c~1_ZAP_DONT_CLEAR_REAPER 2008-07-30 13:12:49.000000000 +0400
    +++ 2.6.27-rc4/kernel/pid_namespace.c 2008-08-24 17:22:59.000000000 +0400
    @@ -179,9 +179,12 @@ void zap_pid_ns_processes(struct pid_nam
    rc = sys_wait4(-1, NULL, __WALL, NULL);
    } while (rc != -ECHILD);

    -
    - /* Child reaper for the pid namespace is going away */
    - pid_ns->child_reaper = NULL;
    + /*
    + * We can not clear ->child_reaper or leave it alone.
    + * There may by stealth EXIT_DEAD tasks on ->children,
    + * forget_original_parent() must move them somewhere.
    + */
    + pid_ns->child_reaper = init_pid_ns.child_reaper;
    acct_exit_ns(pid_ns);
    return;
    }

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [PATCH 1/4] pid_ns: zap_pid_ns_processes: fix the ->child_reaper changing

    Quoting Oleg Nesterov (oleg@tv-sign.ru):
    > zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong.
    >
    > Yes, we have already killed all tasks in this namespace, and sys_wait4()
    > doesn't see any child. But this doesn't mean ->children list is empty,
    > we may have EXIT_DEAD tasks which are not visible to do_wait(). In that
    > case the subsequent forget_original_parent() will crash the kernel because
    > it will try to re-parent these tasks to the NULL reaper.
    >
    > Even if there are no childs, it is not good that forget_original_parent()
    > uses reaper == NULL.
    >
    > Change the code to set ->child_reaper = init_pid_ns.child_reaper instead.
    > We could use pid_ns->parent->child_reaper as well, I think this does not
    > really matter. These EXIT_DEAD tasks are not visible to the new ->parent
    > after re-parenting, they will silently do release_task() eventually.
    >
    > Note that we must change ->child_reaper, otherwise forget_original_parent()
    > will use reaper == father, and in that case we will hit the (correct)
    > BUG_ON(!list_empty(&father->children)).
    >
    > Signed-off-by: Oleg Nesterov


    Well while it looked correct to me all along, I couldn't get the
    testcase to cause an oops. But I see where do_exit() calls
    zap_pid_ns_processes() (through group_exit()) before
    forget_original_parent(), which can/should end up dereferencing
    the null pid_ns, so clearly this is needed and correct.

    Acked-by: Serge Hallyn

    Thanks, Oleg.

    -serge

    > --- 2.6.27-rc4/kernel/pid_namespace.c~1_ZAP_DONT_CLEAR_REAPER 2008-07-30 13:12:49.000000000 +0400
    > +++ 2.6.27-rc4/kernel/pid_namespace.c 2008-08-24 17:22:59.000000000 +0400
    > @@ -179,9 +179,12 @@ void zap_pid_ns_processes(struct pid_nam
    > rc = sys_wait4(-1, NULL, __WALL, NULL);
    > } while (rc != -ECHILD);
    >
    > -
    > - /* Child reaper for the pid namespace is going away */
    > - pid_ns->child_reaper = NULL;
    > + /*
    > + * We can not clear ->child_reaper or leave it alone.
    > + * There may by stealth EXIT_DEAD tasks on ->children,
    > + * forget_original_parent() must move them somewhere.


    Actually this comment is a little bit misleading - the null
    deref will happen regardless of whether there are children,
    right?

    > + */
    > + pid_ns->child_reaper = init_pid_ns.child_reaper;
    > acct_exit_ns(pid_ns);
    > return;
    > }

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [PATCH 1/4] pid_ns: zap_pid_ns_processes: fix the ->child_reaper changing

    Oleg Nesterov [oleg@tv-sign.ru] wrote:
    | zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong.
    |
    | Yes, we have already killed all tasks in this namespace, and sys_wait4()
    | doesn't see any child. But this doesn't mean ->children list is empty,
    | we may have EXIT_DEAD tasks which are not visible to do_wait(). In that
    | case the subsequent forget_original_parent() will crash the kernel because
    | it will try to re-parent these tasks to the NULL reaper.
    |
    | Even if there are no childs, it is not good that forget_original_parent()
    | uses reaper == NULL.
    |
    | Change the code to set ->child_reaper = init_pid_ns.child_reaper instead.
    | We could use pid_ns->parent->child_reaper as well, I think this does not
    | really matter. These EXIT_DEAD tasks are not visible to the new ->parent
    | after re-parenting, they will silently do release_task() eventually.
    |
    | Note that we must change ->child_reaper, otherwise forget_original_parent()
    | will use reaper == father, and in that case we will hit the (correct)
    | BUG_ON(!list_empty(&father->children)).
    |
    | Signed-off-by: Oleg Nesterov

    Acked-by: Sukadev Bhattiprolu

    |
    | --- 2.6.27-rc4/kernel/pid_namespace.c~1_ZAP_DONT_CLEAR_REAPER 2008-07-30 13:12:49.000000000 +0400
    | +++ 2.6.27-rc4/kernel/pid_namespace.c 2008-08-24 17:22:59.000000000 +0400
    | @@ -179,9 +179,12 @@ void zap_pid_ns_processes(struct pid_nam
    | rc = sys_wait4(-1, NULL, __WALL, NULL);
    | } while (rc != -ECHILD);
    |
    | -
    | - /* Child reaper for the pid namespace is going away */
    | - pid_ns->child_reaper = NULL;
    | + /*
    | + * We can not clear ->child_reaper or leave it alone.
    | + * There may by stealth EXIT_DEAD tasks on ->children,
    | + * forget_original_parent() must move them somewhere.
    | + */
    | + pid_ns->child_reaper = init_pid_ns.child_reaper;
    | acct_exit_ns(pid_ns);
    | return;
    | }
    |
    | --
    | To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    | the body of a message to majordomo@vger.kernel.org
    | More majordomo info at http://vger.kernel.org/majordomo-info.html
    | Please read the FAQ at http://www.tux.org/lkml/
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [PATCH 1/4] pid_ns: zap_pid_ns_processes: fix the ->child_reaper changing

    > Signed-off-by: Oleg Nesterov

    Acked-by: Pavel Emelyanov
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [PATCH 1/4] pid_ns: zap_pid_ns_processes: fix the ->child_reaper changing

    On 08/26, Serge E. Hallyn wrote:
    >
    > Quoting Oleg Nesterov (oleg@tv-sign.ru):
    > > zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong.
    > > ...
    > > Even if there are no childs, it is not good that forget_original_parent()
    > > uses reaper == NULL.
    > > ...

    >
    > Well while it looked correct to me all along, I couldn't get the
    > testcase to cause an oops.


    Perhaps you can start several instances or increase the delay, I
    forgot that usleep() uses usecs, not msecs. In any case I agree,
    the test-case is oversimplified, but since it worked for me I didn't
    try to make it more "stubborn".

    > Acked-by: Serge Hallyn


    Thanks!

    > > + /*
    > > + * We can not clear ->child_reaper or leave it alone.
    > > + * There may by stealth EXIT_DEAD tasks on ->children,
    > > + * forget_original_parent() must move them somewhere.

    >
    > Actually this comment is a little bit misleading - the null
    > deref will happen regardless of whether there are children,
    > right?


    Yes, the comment could be better

    But no, we won't oops without children. Please note that
    forget_original_parent() uses reaper inside the
    list_for_each_entry_safe(father->children) loop. ->children is
    empty, reaper is not used at all.

    But please also note the "not good" part of the changelog above.


    Thanks to all for review!

    Oleg.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread