[PATCH 0/10] OpenVZ kernel based checkpointing/restart (v2) - Kernel

This is a discussion on [PATCH 0/10] OpenVZ kernel based checkpointing/restart (v2) - Kernel ; On Wed, 2008-10-22 at 12:44 +0200, Louis Rilling wrote: > On Wed, Oct 22, 2008 at 12:06:19PM +0200, Greg Kurz wrote: > > On Wed, 2008-10-22 at 11:25 +0200, Louis Rilling wrote: > > > Do you checkpoint uninterruptible syscalls ...

+ Reply to Thread
Page 2 of 3 FirstFirst 1 2 3 LastLast
Results 21 to 40 of 41

Thread: [PATCH 0/10] OpenVZ kernel based checkpointing/restart (v2)

  1. Re: [Devel] Re: [PATCH 08/10] Introduce functions to restart a process

    On Wed, 2008-10-22 at 12:44 +0200, Louis Rilling wrote:
    > On Wed, Oct 22, 2008 at 12:06:19PM +0200, Greg Kurz wrote:
    > > On Wed, 2008-10-22 at 11:25 +0200, Louis Rilling wrote:
    > > > Do you checkpoint uninterruptible syscalls as well? If only interruptible
    > > > syscalls are checkpointed, I'd say that either this syscall uses ERESTARTSYS or
    > > > ERESTART_RESTARTBLOCK, and then signal handling code already does the trick, or
    > > > this syscall does not restart itself when interrupted, and well, this is life,
    > > > userspace just sees -EINTR, which is allowed by the syscall spec.
    > > > Actually this is how we checkpoint/migrate tasks in interruptible syscalls in
    > > > Kerrighed and this works.
    > > >
    > > > Louis
    > > >

    > >
    > > I don't know Kerrighed internals but I understand you perform checkpoint
    > > with a signal handler. Right ?

    >
    > Right. This is an kernel-internal-only signal, so all signals remain available
    > for userspace.
    >
    > > This approach has a huge benefit: the
    > > signal handling code do all the arch dependant stuff to save registers
    > > in user memory.

    >
    > Hm, I'm not sure to understand what you mean here. We just rely on arch code
    > that jumps to signal handling to correctly setup struct pt_regs, which is then
    > passed to the checkpoint code. So yes, userspace registers are mostly saved by
    > existing arch code. But in x86-64 for instance, segment registers still need to
    > be saved by the checkpoint code (a bit like copy_thread() does), and I don't
    > know arch-independent functions doing this.
    >


    You're right, some segment registers need to be saved on x86 also... I
    should have written 'most of' in my previous mail.

    --
    Gregory Kurz gkurz@fr.ibm.com
    Software Engineer @ IBM/Meiosys http://www.ibm.com
    Tel +33 (0)534 638 479 Fax +33 (0)561 400 420

    "Anarchy is about taking complete responsibility for yourself."
    Alan Moore.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [Devel] Re: [PATCH 08/10] Introduce functions to restart a process



    Andrey Mirkin wrote:
    > On Wednesday 22 October 2008 13:25 Louis Rilling wrote:
    >> On Wed, Oct 22, 2008 at 12:49:54PM +0400, Andrey Mirkin wrote:
    >>> On Monday 20 October 2008 13:23 Cedric Le Goater wrote:
    >>>> Hello Andrey !
    >>>>
    >>>>> diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
    >>>>> index 109792b..a4848a3 100644
    >>>>> --- a/arch/x86/kernel/entry_32.S
    >>>>> +++ b/arch/x86/kernel/entry_32.S
    >>>>> @@ -225,6 +225,7 @@ ENTRY(ret_from_fork)
    >>>>> GET_THREAD_INFO(%ebp)
    >>>>> popl %eax
    >>>>> CFI_ADJUST_CFA_OFFSET -4
    >>>>> +ret_from_fork_tail:
    >>>>> pushl $0x0202 # Reset kernel eflags
    >>>>> CFI_ADJUST_CFA_OFFSET 4
    >>>>> popfl
    >>>>> @@ -233,6 +234,26 @@ ENTRY(ret_from_fork)
    >>>>> CFI_ENDPROC
    >>>>> END(ret_from_fork)
    >>>>>
    >>>>> +ENTRY(i386_ret_from_resume)
    >>>>> + CFI_STARTPROC
    >>>>> + pushl %eax
    >>>>> + CFI_ADJUST_CFA_OFFSET 4
    >>>>> + call schedule_tail
    >>>>> + GET_THREAD_INFO(%ebp)
    >>>>> + popl %eax
    >>>>> + CFI_ADJUST_CFA_OFFSET -4
    >>>>> + movl (%esp), %eax
    >>>>> + testl %eax, %eax
    >>>>> + jz 1f
    >>>>> + pushl %esp
    >>>>> + call *%eax
    >>>>> + addl $4, %esp
    >>>>> +1:
    >>>>> + addl $256, %esp
    >>>>> + jmp ret_from_fork_tail
    >>>>> + CFI_ENDPROC
    >>>>> +END(i386_ret_from_resume)
    >>>> Could you explain why you need to do this
    >>>>
    >>>> call *%eax
    >>>>
    >>>> is it related to the freezer code ?
    >>> It is not related to the freezer code actually.
    >>> That is needed to restart syscalls. Right now I don't have a code in my
    >>> patchset which restarts a syscall, but later I plan to add it.
    >>> In OpenVZ checkpointing we restart syscalls if process was caught in
    >>> syscall during checkpointing.

    >> Do you checkpoint uninterruptible syscalls as well? If only interruptible
    >> syscalls are checkpointed, I'd say that either this syscall uses
    >> ERESTARTSYS or ERESTART_RESTARTBLOCK, and then signal handling code already
    >> does the trick, or this syscall does not restart itself when interrupted,
    >> and well, this is life, userspace just sees -EINTR, which is allowed by the
    >> syscall spec.
    >> Actually this is how we checkpoint/migrate tasks in interruptible syscalls
    >> in Kerrighed and this works.

    >
    > We checkpoint only interruptible syscalls. Some syscalls do not restart
    > themself, that is why after restarting a process we restart syscall to
    > complete it.


    Can you please elaborate on this ? I don't recall having had issues
    with that.

    Thanks,

    Oren.

    >
    > Andrey
    > _______________________________________________
    > Containers mailing list
    > Containers@lists.linux-foundation.org
    > https://lists.linux-foundation.org/m...nfo/containers

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [Devel] Re: [PATCH 06/10] Introduce functions to dump mm

    On Monday 20 October 2008 21:21 Dave Hansen wrote:
    > On Sat, 2008-10-18 at 03:11 +0400, Andrey Mirkin wrote:
    > > +static void page_get_desc(struct vm_area_struct *vma, unsigned long
    > > addr, + struct page_desc *pdesc, cpt_context_t *
    > > ctx) +{
    > > + struct mm_struct *mm = vma->vm_mm;
    > > + pgd_t *pgd;
    > > + pud_t *pud;
    > > + pmd_t *pmd;
    > > + pte_t *ptep, pte;
    > > + spinlock_t *ptl;
    > > + struct page *pg = NULL;
    > > + pgoff_t linear_index = (addr - vma->vm_start)/PAGE_SIZE +
    > > vma->vm_pgoff; +
    > > + pdesc->index = linear_index;
    > > + pdesc->shared = 0;
    > > + pdesc->mm = CPT_NULL;
    > > +
    > > + if (vma->vm_flags & VM_IO) {
    > > + pdesc->type = PD_ABSENT;
    > > + return;
    > > + }
    > > +
    > > + pgd = pgd_offset(mm, addr);
    > > + if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
    > > + goto out_absent;
    > > + pud = pud_offset(pgd, addr);
    > > + if (pud_none(*pud) || unlikely(pud_bad(*pud)))
    > > + goto out_absent;
    > > + pmd = pmd_offset(pud, addr);
    > > + if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
    > > + goto out_absent;
    > > +#ifdef CONFIG_X86
    > > + if (pmd_huge(*pmd)) {
    > > + eprintk("page_huge\n");
    > > + goto out_unsupported;
    > > + }
    > > +#endif

    >
    > I take it you know that this breaks with the 1GB (x86_64) and 16GB (ppc)
    > large pages.
    >
    > Since you have the VMA, why not use is_vm_hugetlb_page()?

    Right now I'm checking VM_HUGETLB flag on VMAs in dump_one_vma().
    This checks were added for sanity purpose just to throw out all unsupported
    right now cases.

    Andrey
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [Devel] Re: [PATCH 08/10] Introduce functions to restart a process

    On Wednesday 22 October 2008 14:46 Louis Rilling wrote:
    > On Wed, Oct 22, 2008 at 02:12:12PM +0400, Andrey Mirkin wrote:
    > > On Wednesday 22 October 2008 13:25 Louis Rilling wrote:
    > > > On Wed, Oct 22, 2008 at 12:49:54PM +0400, Andrey Mirkin wrote:
    > > > > On Monday 20 October 2008 13:23 Cedric Le Goater wrote:
    > > > > > Hello Andrey !
    > > > > >
    > > > > > > diff --git a/arch/x86/kernel/entry_32.S
    > > > > > > b/arch/x86/kernel/entry_32.S index 109792b..a4848a3 100644
    > > > > > > --- a/arch/x86/kernel/entry_32.S
    > > > > > > +++ b/arch/x86/kernel/entry_32.S
    > > > > > > @@ -225,6 +225,7 @@ ENTRY(ret_from_fork)
    > > > > > > GET_THREAD_INFO(%ebp)
    > > > > > > popl %eax
    > > > > > > CFI_ADJUST_CFA_OFFSET -4
    > > > > > > +ret_from_fork_tail:
    > > > > > > pushl $0x0202 # Reset kernel eflags
    > > > > > > CFI_ADJUST_CFA_OFFSET 4
    > > > > > > popfl
    > > > > > > @@ -233,6 +234,26 @@ ENTRY(ret_from_fork)
    > > > > > > CFI_ENDPROC
    > > > > > > END(ret_from_fork)
    > > > > > >
    > > > > > > +ENTRY(i386_ret_from_resume)
    > > > > > > + CFI_STARTPROC
    > > > > > > + pushl %eax
    > > > > > > + CFI_ADJUST_CFA_OFFSET 4
    > > > > > > + call schedule_tail
    > > > > > > + GET_THREAD_INFO(%ebp)
    > > > > > > + popl %eax
    > > > > > > + CFI_ADJUST_CFA_OFFSET -4
    > > > > > > + movl (%esp), %eax
    > > > > > > + testl %eax, %eax
    > > > > > > + jz 1f
    > > > > > > + pushl %esp
    > > > > > > + call *%eax
    > > > > > > + addl $4, %esp
    > > > > > > +1:
    > > > > > > + addl $256, %esp
    > > > > > > + jmp ret_from_fork_tail
    > > > > > > + CFI_ENDPROC
    > > > > > > +END(i386_ret_from_resume)
    > > > > >
    > > > > > Could you explain why you need to do this
    > > > > >
    > > > > > call *%eax
    > > > > >
    > > > > > is it related to the freezer code ?
    > > > >
    > > > > It is not related to the freezer code actually.
    > > > > That is needed to restart syscalls. Right now I don't have a code in
    > > > > my patchset which restarts a syscall, but later I plan to add it. In
    > > > > OpenVZ checkpointing we restart syscalls if process was caught in
    > > > > syscall during checkpointing.
    > > >
    > > > Do you checkpoint uninterruptible syscalls as well? If only
    > > > interruptible syscalls are checkpointed, I'd say that either this
    > > > syscall uses ERESTARTSYS or ERESTART_RESTARTBLOCK, and then signal
    > > > handling code already does the trick, or this syscall does not restart
    > > > itself when interrupted, and well, this is life, userspace just sees
    > > > -EINTR, which is allowed by the syscall spec.
    > > > Actually this is how we checkpoint/migrate tasks in interruptible
    > > > syscalls in Kerrighed and this works.

    > >
    > > We checkpoint only interruptible syscalls. Some syscalls do not restart
    > > themself, that is why after restarting a process we restart syscall to
    > > complete it.

    >
    > I guess you do that to avoid breaking application that are badly written
    > and do not handle -EINTR correctly with interruptible syscalls. Right?


    Right, also this is needed to restart some syscalls (like pause) from kernel
    without returning to user space. Let me explain it in more details. There is
    a gap when process will be in user space just before entering syscall again.
    At this time a signal can be delivered to process and it even can be handled.
    So, we will miss a signal which must interrupt pause syscall.

    Andrey
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [Devel] Re: [PATCH 08/10] Introduce functions to restart a process

    On Wednesday 22 October 2008 19:25 Oren Laadan wrote:
    > Andrey Mirkin wrote:
    > > On Wednesday 22 October 2008 13:25 Louis Rilling wrote:
    > >> On Wed, Oct 22, 2008 at 12:49:54PM +0400, Andrey Mirkin wrote:
    > >>> On Monday 20 October 2008 13:23 Cedric Le Goater wrote:
    > >>>> Hello Andrey !
    > >>>>
    > >>>>> diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
    > >>>>> index 109792b..a4848a3 100644
    > >>>>> --- a/arch/x86/kernel/entry_32.S
    > >>>>> +++ b/arch/x86/kernel/entry_32.S
    > >>>>> @@ -225,6 +225,7 @@ ENTRY(ret_from_fork)
    > >>>>> GET_THREAD_INFO(%ebp)
    > >>>>> popl %eax
    > >>>>> CFI_ADJUST_CFA_OFFSET -4
    > >>>>> +ret_from_fork_tail:
    > >>>>> pushl $0x0202 # Reset kernel eflags
    > >>>>> CFI_ADJUST_CFA_OFFSET 4
    > >>>>> popfl
    > >>>>> @@ -233,6 +234,26 @@ ENTRY(ret_from_fork)
    > >>>>> CFI_ENDPROC
    > >>>>> END(ret_from_fork)
    > >>>>>
    > >>>>> +ENTRY(i386_ret_from_resume)
    > >>>>> + CFI_STARTPROC
    > >>>>> + pushl %eax
    > >>>>> + CFI_ADJUST_CFA_OFFSET 4
    > >>>>> + call schedule_tail
    > >>>>> + GET_THREAD_INFO(%ebp)
    > >>>>> + popl %eax
    > >>>>> + CFI_ADJUST_CFA_OFFSET -4
    > >>>>> + movl (%esp), %eax
    > >>>>> + testl %eax, %eax
    > >>>>> + jz 1f
    > >>>>> + pushl %esp
    > >>>>> + call *%eax
    > >>>>> + addl $4, %esp
    > >>>>> +1:
    > >>>>> + addl $256, %esp
    > >>>>> + jmp ret_from_fork_tail
    > >>>>> + CFI_ENDPROC
    > >>>>> +END(i386_ret_from_resume)
    > >>>>
    > >>>> Could you explain why you need to do this
    > >>>>
    > >>>> call *%eax
    > >>>>
    > >>>> is it related to the freezer code ?
    > >>>
    > >>> It is not related to the freezer code actually.
    > >>> That is needed to restart syscalls. Right now I don't have a code in my
    > >>> patchset which restarts a syscall, but later I plan to add it.
    > >>> In OpenVZ checkpointing we restart syscalls if process was caught in
    > >>> syscall during checkpointing.
    > >>
    > >> Do you checkpoint uninterruptible syscalls as well? If only
    > >> interruptible syscalls are checkpointed, I'd say that either this
    > >> syscall uses ERESTARTSYS or ERESTART_RESTARTBLOCK, and then signal
    > >> handling code already does the trick, or this syscall does not restart
    > >> itself when interrupted, and well, this is life, userspace just sees
    > >> -EINTR, which is allowed by the syscall spec.
    > >> Actually this is how we checkpoint/migrate tasks in interruptible
    > >> syscalls in Kerrighed and this works.

    > >
    > > We checkpoint only interruptible syscalls. Some syscalls do not restart
    > > themself, that is why after restarting a process we restart syscall to
    > > complete it.

    >
    > Can you please elaborate on this ? I don't recall having had issues
    > with that.


    Right now in 2.6.18 kernel we restarts in such a way pause, rt_sigtimedwait
    and futex syscalls. Recently futex syscall was reworked and we will not need
    such hooks for it.

    Andrey
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [Devel] Re: [PATCH 08/10] Introduce functions to restart a process

    On Wednesday 22 October 2008 16:47 Cedric Le Goater wrote:
    > >>> +ENTRY(i386_ret_from_resume)
    > >>> + CFI_STARTPROC
    > >>> + pushl %eax
    > >>> + CFI_ADJUST_CFA_OFFSET 4
    > >>> + call schedule_tail
    > >>> + GET_THREAD_INFO(%ebp)
    > >>> + popl %eax
    > >>> + CFI_ADJUST_CFA_OFFSET -4
    > >>> + movl (%esp), %eax
    > >>> + testl %eax, %eax
    > >>> + jz 1f
    > >>> + pushl %esp
    > >>> + call *%eax
    > >>> + addl $4, %esp
    > >>> +1:
    > >>> + addl $256, %esp
    > >>> + jmp ret_from_fork_tail
    > >>> + CFI_ENDPROC
    > >>> +END(i386_ret_from_resume)
    > >>
    > >> Could you explain why you need to do this
    > >>
    > >> call *%eax
    > >>
    > >> is it related to the freezer code ?

    > >
    > > It is not related to the freezer code actually.
    > > That is needed to restart syscalls. Right now I don't have a code in my
    > > patchset which restarts a syscall, but later I plan to add it.
    > > In OpenVZ checkpointing we restart syscalls if process was caught in
    > > syscall during checkpointing.

    >
    > ok. I get it now. why 256 bytes of extra stack ? I'm sure it's not random.

    We are putting special structure on stack, which is used at the very end of
    the whole restart procedure to restore complex states (ptrace is one of such
    cases). Right now I don't need to use this structure as we have a deal with
    simple cases, but reservation of 256 bytes on stack is needed for future.

    Andrey
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [Devel] Re: [PATCH 08/10] Introduce functions to restart a process

    On Monday 20 October 2008 17:25 Louis Rilling wrote:
    > On Sat, Oct 18, 2008 at 03:11:36AM +0400, Andrey Mirkin wrote:
    > > Functions to restart process, restore its state, fpu and registers are
    > > added.

    >
    > [...]
    >
    > > diff --git a/checkpoint/rst_process.c b/checkpoint/rst_process.c
    > > new file mode 100644
    > > index 0000000..b9f745e
    > > --- /dev/null
    > > +++ b/checkpoint/rst_process.c
    > > @@ -0,0 +1,277 @@
    > > +/*
    > > + * Copyright (C) 2008 Parallels, Inc.
    > > + *
    > > + * Author: Andrey Mirkin
    > > + *
    > > + * This program is free software; you can redistribute it and/or
    > > + * modify it under the terms of the GNU General Public License as
    > > + * published by the Free Software Foundation, version 2 of the
    > > + * License.
    > > + *
    > > + */
    > > +
    > > +#include
    > > +#include
    > > +#include
    > > +#include
    > > +#include
    > > +
    > > +#include "checkpoint.h"
    > > +#include "cpt_image.h"
    > > +
    > > +#define HOOK_RESERVE 256
    > > +
    > > +struct thr_context {
    > > + struct completion complete;
    > > + int error;
    > > + struct cpt_context *ctx;
    > > + struct task_struct *tsk;
    > > +};
    > > +
    > > +int local_kernel_thread(int (*fn)(void *), void * arg, unsigned long
    > > flags, pid_t pid) +{
    > > + pid_t ret;
    > > +
    > > + if (current->fs == NULL) {
    > > + /* do_fork_pid() hates processes without fs, oopses. */
    > > + eprintk("local_kernel_thread: current->fs==NULL\n");
    > > + return -EINVAL;
    > > + }
    > > + if (!try_module_get(THIS_MODULE))
    > > + return -EBUSY;
    > > + ret = kernel_thread(fn, arg, flags);
    > > + if (ret < 0)
    > > + module_put(THIS_MODULE);
    > > + return ret;
    > > +}
    > > +
    > > +static unsigned int decode_task_flags(unsigned int task_flags)
    > > +{
    > > + unsigned int flags = 0;
    > > +
    > > + if (task_flags & (1 << CPT_PF_EXITING))
    > > + flags |= PF_EXITING;
    > > + if (task_flags & (1 << CPT_PF_FORKNOEXEC))
    > > + flags |= PF_FORKNOEXEC;
    > > + if (task_flags & (1 << CPT_PF_SUPERPRIV))
    > > + flags |= PF_SUPERPRIV;
    > > + if (task_flags & (1 << CPT_PF_DUMPCORE))
    > > + flags |= PF_DUMPCORE;
    > > + if (task_flags & (1 << CPT_PF_SIGNALED))
    > > + flags |= PF_SIGNALED;
    > > +
    > > + return flags;
    > > +
    > > +}
    > > +
    > > +int rst_restore_task_struct(struct task_struct *tsk, struct
    > > cpt_task_image *ti, + struct cpt_context *ctx)
    > > +{
    > > + int i;
    > > +
    > > + /* Restore only saved flags, comm and tls for now */
    > > + tsk->flags = decode_task_flags(ti->cpt_flags);
    > > + clear_tsk_thread_flag(tsk, TIF_FREEZE);
    > > + memcpy(tsk->comm, ti->cpt_comm, TASK_COMM_LEN);
    > > + for (i = 0; i < GDT_ENTRY_TLS_ENTRIES; i++) {
    > > + tsk->thread.tls_array[i].a = ti->cpt_tls[i] & 0xFFFFFFFF;
    > > + tsk->thread.tls_array[i].b = ti->cpt_tls[i] >> 32;
    > > + }
    > > +
    > > + return 0;
    > > +}
    > > +
    > > +static int rst_restore_fpustate(struct task_struct *tsk, struct
    > > cpt_task_image *ti, + struct cpt_context *ctx)
    > > +{
    > > + struct cpt_obj_bits hdr;
    > > + int err;
    > > + char *buf;
    > > +
    > > + clear_stopped_child_used_math(tsk);
    > > +
    > > + err = rst_get_object(CPT_OBJ_BITS, &hdr, sizeof(hdr), ctx);
    > > + if (err < 0)
    > > + return err;
    > > +
    > > + buf = kmalloc(hdr.cpt_size, GFP_KERNEL);
    > > + if (!buf)
    > > + return -ENOMEM;
    > > +
    > > + err = ctx->read(buf, hdr.cpt_size, ctx);
    > > + if (err)
    > > + goto out;
    > > +
    > > + if (hdr.cpt_content == CPT_CONTENT_X86_FPUSTATE && cpu_has_fxsr) {
    > > + memcpy(&tsk->thread.xstate, buf,
    > > + sizeof(struct i387_fxsave_struct));
    > > + if (ti->cpt_flags & CPT_PF_USED_MATH)
    > > + set_stopped_child_used_math(tsk);
    > > + }
    > > +#ifndef CONFIG_X86_64
    > > + else if (hdr.cpt_content == CPT_CONTENT_X86_FPUSTATE_OLD &&
    > > + !cpu_has_fxsr) {
    > > + memcpy(&tsk->thread.xstate, buf,
    > > + sizeof(struct i387_fsave_struct));
    > > + if (ti->cpt_flags & CPT_PF_USED_MATH)
    > > + set_stopped_child_used_math(tsk);
    > > + }
    > > +#endif
    > > +
    > > +out:
    > > + kfree(buf);
    > > + return err;
    > > +}
    > > +
    > > +static u32 decode_segment(u32 segid)
    > > +{
    > > + if (segid == CPT_SEG_ZERO)
    > > + return 0;
    > > +
    > > + /* TLS descriptors */
    > > + if (segid <= CPT_SEG_TLS3)
    > > + return ((GDT_ENTRY_TLS_MIN + segid - CPT_SEG_TLS1) << 3) + 3;
    > > +
    > > + /* LDT descriptor, it is just an index to LDT array */
    > > + if (segid >= CPT_SEG_LDT)
    > > + return ((segid - CPT_SEG_LDT) << 3) | 7;
    > > +
    > > + /* Check for one of standard descriptors */
    > > + if (segid == CPT_SEG_USER32_DS)
    > > + return __USER_DS;
    > > + if (segid == CPT_SEG_USER32_CS)
    > > + return __USER_CS;
    > > +
    > > + eprintk("Invalid segment reg %d\n", segid);
    > > + return 0;
    > > +}
    > > +
    > > +static int rst_restore_registers(struct task_struct *tsk, struct
    > > cpt_context *ctx) +{
    > > + struct cpt_x86_regs ri;
    > > + struct pt_regs *regs = task_pt_regs(tsk);
    > > + extern char i386_ret_from_resume;
    > > + int err;
    > > +
    > > + err = rst_get_object(CPT_OBJ_X86_REGS, &ri, sizeof(ri), ctx);
    > > + if (err < 0)
    > > + return err;
    > > +
    > > + tsk->thread.sp = (unsigned long) regs;
    > > + tsk->thread.sp0 = (unsigned long) (regs+1);
    > > + tsk->thread.ip = (unsigned long) &i386_ret_from_resume;
    > > +
    > > + tsk->thread.gs = decode_segment(ri.cpt_gs);
    > > + tsk->thread.debugreg0 = ri.cpt_debugreg[0];
    > > + tsk->thread.debugreg1 = ri.cpt_debugreg[1];
    > > + tsk->thread.debugreg2 = ri.cpt_debugreg[2];
    > > + tsk->thread.debugreg3 = ri.cpt_debugreg[3];
    > > + tsk->thread.debugreg6 = ri.cpt_debugreg[6];
    > > + tsk->thread.debugreg7 = ri.cpt_debugreg[7];
    > > +
    > > + regs->bx = ri.cpt_bx;
    > > + regs->cx = ri.cpt_cx;
    > > + regs->dx = ri.cpt_dx;
    > > + regs->si = ri.cpt_si;
    > > + regs->di = ri.cpt_di;
    > > + regs->bp = ri.cpt_bp;
    > > + regs->ax = ri.cpt_ax;
    > > + regs->orig_ax = ri.cpt_orig_ax;
    > > + regs->ip = ri.cpt_ip;
    > > + regs->flags = ri.cpt_flags;
    > > + regs->sp = ri.cpt_sp;
    > > +
    > > + regs->cs = decode_segment(ri.cpt_cs);
    > > + regs->ss = decode_segment(ri.cpt_ss);
    > > + regs->ds = decode_segment(ri.cpt_ds);
    > > + regs->es = decode_segment(ri.cpt_es);
    > > + regs->fs = decode_segment(ri.cpt_fs);
    > > +
    > > + tsk->thread.sp -= HOOK_RESERVE;
    > > + memset((void*)tsk->thread.sp, 0, HOOK_RESERVE);
    > > +
    > > + return 0;
    > > +}
    > > +
    > > +static int restart_thread(void *arg)
    > > +{
    > > + struct thr_context *thr_ctx = arg;
    > > + struct cpt_context *ctx;
    > > + struct cpt_task_image *ti;
    > > + int err;
    > > +
    > > + current->state = TASK_UNINTERRUPTIBLE;
    > > +
    > > + ctx = thr_ctx->ctx;
    > > + ti = kmalloc(sizeof(*ti), GFP_KERNEL);
    > > + if (!ti)
    > > + return -ENOMEM;
    > > +
    > > + err = rst_get_object(CPT_OBJ_TASK, ti, sizeof(*ti), ctx);
    > > + if (!err)
    > > + err = rst_restore_task_struct(current, ti, ctx);
    > > + /* Restore mm here */
    > > + if (!err)
    > > + err = rst_restore_fpustate(current, ti, ctx);
    > > + if (!err)
    > > + err = rst_restore_registers(current, ctx);
    > > +
    > > + thr_ctx->error = err;
    > > + complete(&thr_ctx->complete);
    > > +
    > > + if (!err && (ti->cpt_state & (EXIT_ZOMBIE|EXIT_DEAD))) {
    > > + do_exit(ti->cpt_exit_code);
    > > + } else {
    > > + __set_current_state(TASK_UNINTERRUPTIBLE);
    > > + }
    > > +
    > > + kfree(ti);
    > > + schedule();
    > > +
    > > + eprintk("leaked %d/%d %p\n", task_pid_nr(current),
    > > task_pid_vnr(current), current->mm); +
    > > + module_put(THIS_MODULE);

    >
    > I'm sorry, I still do not understand what you are doing with this
    > self-module pinning stuff. AFAICS, we should not get here unless there is a
    > bug. So the checkpoint module ref count is never decreased, right?
    >
    > Could you detail what is this self-module pinning for? As I already told
    > you, this looks like a bogus solution to avoid unloading the checkpoint
    > module during restart.


    Actually right now module ref count increase/decrease is not needed.
    But in some cases restore work should be done only after unfreezing the
    process. So, in this case we should grab ref count during process creation
    and put it after this special work is done.
    I will rework this place and send it in next version to make it more clear how
    it will be used in future.

    Andrey
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [Devel] Re: [PATCH 08/10] Introduce functions to restart a process

    On Thu, 2008-10-23 at 13:54 +0400, Andrey Mirkin wrote:
    > We are putting special structure on stack, which is used at the very end of
    > the whole restart procedure to restore complex states (ptrace is one of such
    > cases). Right now I don't need to use this structure as we have a deal with
    > simple cases, but reservation of 256 bytes on stack is needed for future.


    Wow. So you're saying that, if this patch is accepted, we simply need
    to accept that anything being checkpointed will use an extra 256 bytes
    of stack? Seems like something to perhaps put in the changelog rather
    than some completely undocumented assembly nugget.

    -- Dave

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [Devel] Re: [PATCH 06/10] Introduce functions to dump mm

    On Thu, 2008-10-23 at 12:43 +0400, Andrey Mirkin wrote:
    > > > +#ifdef CONFIG_X86
    > > > + if (pmd_huge(*pmd)) {
    > > > + eprintk("page_huge\n");
    > > > + goto out_unsupported;
    > > > + }
    > > > +#endif

    > >
    > > I take it you know that this breaks with the 1GB (x86_64) and 16GB (ppc)
    > > large pages.
    > >
    > > Since you have the VMA, why not use is_vm_hugetlb_page()?

    > Right now I'm checking VM_HUGETLB flag on VMAs in dump_one_vma().
    > This checks were added for sanity purpose just to throw out all unsupported
    > right now cases.


    I'm telling you that it's no good. Not only should this path never be
    hit. But, if it is, you'll crash anyway in some cases.

    It's a bad check. At best it misleads the reader to think that you've
    covered your bases.

    -- Dave

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: [Devel] Re: [PATCH 08/10] Introduce functions to restart a process

    On Thu, 2008-10-23 at 13:00 +0400, Andrey Mirkin wrote:
    >
    > > >>> It is not related to the freezer code actually.
    > > >>> That is needed to restart syscalls. Right now I don't have a code in my
    > > >>> patchset which restarts a syscall, but later I plan to add it.
    > > >>> In OpenVZ checkpointing we restart syscalls if process was caught in
    > > >>> syscall during checkpointing.
    > > >>
    > > >> Do you checkpoint uninterruptible syscalls as well? If only
    > > >> interruptible syscalls are checkpointed, I'd say that either this
    > > >> syscall uses ERESTARTSYS or ERESTART_RESTARTBLOCK, and then signal
    > > >> handling code already does the trick, or this syscall does not restart
    > > >> itself when interrupted, and well, this is life, userspace just sees
    > > >> -EINTR, which is allowed by the syscall spec.
    > > >> Actually this is how we checkpoint/migrate tasks in interruptible
    > > >> syscalls in Kerrighed and this works.
    > > >
    > > > We checkpoint only interruptible syscalls. Some syscalls do not restart
    > > > themself, that is why after restarting a process we restart syscall to
    > > > complete it.

    > >
    > > Can you please elaborate on this ? I don't recall having had issues
    > > with that.

    >
    > Right now in 2.6.18 kernel we restarts in such a way pause, rt_sigtimedwait
    > and futex syscalls. Recently futex syscall was reworked and we will not need
    > such hooks for it.


    Could you elaborate on this a bit?

    If the futex syscall was reworked, perhaps we can do the same for
    rt_sigtimedwait() and get rid of this code completely.

    -- Dave

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: [Devel] Re: [PATCH 08/10] Introduce functions to restart a process

    On Thursday 23 October 2008 17:57 Dave Hansen wrote:
    > On Thu, 2008-10-23 at 13:00 +0400, Andrey Mirkin wrote:
    > > > >>> It is not related to the freezer code actually.
    > > > >>> That is needed to restart syscalls. Right now I don't have a code
    > > > >>> in my patchset which restarts a syscall, but later I plan to add
    > > > >>> it. In OpenVZ checkpointing we restart syscalls if process was
    > > > >>> caught in syscall during checkpointing.
    > > > >>
    > > > >> Do you checkpoint uninterruptible syscalls as well? If only
    > > > >> interruptible syscalls are checkpointed, I'd say that either this
    > > > >> syscall uses ERESTARTSYS or ERESTART_RESTARTBLOCK, and then signal
    > > > >> handling code already does the trick, or this syscall does not
    > > > >> restart itself when interrupted, and well, this is life, userspace
    > > > >> just sees -EINTR, which is allowed by the syscall spec.
    > > > >> Actually this is how we checkpoint/migrate tasks in interruptible
    > > > >> syscalls in Kerrighed and this works.
    > > > >
    > > > > We checkpoint only interruptible syscalls. Some syscalls do not
    > > > > restart themself, that is why after restarting a process we restart
    > > > > syscall to complete it.
    > > >
    > > > Can you please elaborate on this ? I don't recall having had issues
    > > > with that.

    > >
    > > Right now in 2.6.18 kernel we restarts in such a way pause,
    > > rt_sigtimedwait and futex syscalls. Recently futex syscall was reworked
    > > and we will not need such hooks for it.

    >
    > Could you elaborate on this a bit?
    >
    > If the futex syscall was reworked, perhaps we can do the same for
    > rt_sigtimedwait() and get rid of this code completely.


    Well, we can try to rework rt_sigtimedwait(), but we will still need this code
    in the future to restart pause syscall from kernel without returning to user
    space. Also this code will be needed to restore some complex states.
    As concerns pause syscall I have already written to Louis about the problem we
    are trying to solve with this code. There is a gap when process will be in
    user space just before entering syscall again. At this time a signal can be
    delivered to process and it even can be handled. So, we will miss a signal
    which must interrupt pause syscall.

    Andrey
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. Re: [Devel] Re: [PATCH 08/10] Introduce functions to restart a process

    On Thursday 23 October 2008 17:49 Dave Hansen wrote:
    > On Thu, 2008-10-23 at 13:54 +0400, Andrey Mirkin wrote:
    > > We are putting special structure on stack, which is used at the very end
    > > of the whole restart procedure to restore complex states (ptrace is one
    > > of such cases). Right now I don't need to use this structure as we have a
    > > deal with simple cases, but reservation of 256 bytes on stack is needed
    > > for future.

    >
    > Wow. So you're saying that, if this patch is accepted, we simply need
    > to accept that anything being checkpointed will use an extra 256 bytes
    > of stack? Seems like something to perhaps put in the changelog rather
    > than some completely undocumented assembly nugget.


    This 256 bytes will be used only during restart procedure and only by our
    module. As you can see in i386_ret_from_resume we are restoring it back. So,
    when process will return to user space it will not have extra 256 bytes
    reserved on stack already. I will add information about it to documentation
    and changelog.

    Andrey
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  13. Re: [Devel] Re: [PATCH 06/10] Introduce functions to dump mm

    On Thursday 23 October 2008 17:51 Dave Hansen wrote:
    > On Thu, 2008-10-23 at 12:43 +0400, Andrey Mirkin wrote:
    > > > > +#ifdef CONFIG_X86
    > > > > + if (pmd_huge(*pmd)) {
    > > > > + eprintk("page_huge\n");
    > > > > + goto out_unsupported;
    > > > > + }
    > > > > +#endif
    > > >
    > > > I take it you know that this breaks with the 1GB (x86_64) and 16GB
    > > > (ppc) large pages.
    > > >
    > > > Since you have the VMA, why not use is_vm_hugetlb_page()?

    > >
    > > Right now I'm checking VM_HUGETLB flag on VMAs in dump_one_vma().
    > > This checks were added for sanity purpose just to throw out all
    > > unsupported right now cases.

    >
    > I'm telling you that it's no good. Not only should this path never be
    > hit. But, if it is, you'll crash anyway in some cases.
    >
    > It's a bad check. At best it misleads the reader to think that you've
    > covered your bases.


    Agree, I will rework this.

    Andrey
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  14. Re: [Devel] Re: [PATCH 05/10] Introduce function to dump process

    On Monday 20 October 2008 15:02 Louis Rilling wrote:
    > Hi,
    >
    > On Sat, Oct 18, 2008 at 03:11:33AM +0400, Andrey Mirkin wrote:
    > > Functions to dump task struct, fpu state and registers are added.
    > > All IDs are saved from the POV of process (container) namespace.

    >
    > Just a couple of little comments, in case this series should keep on
    > living.
    >
    > [...]
    >
    > > diff --git a/checkpoint/cpt_process.c b/checkpoint/cpt_process.c
    > > new file mode 100644
    > > index 0000000..58f608d
    > > --- /dev/null
    > > +++ b/checkpoint/cpt_process.c
    > > @@ -0,0 +1,236 @@
    > > +/*
    > > + * Copyright (C) 2008 Parallels, Inc.
    > > + *
    > > + * Author: Andrey Mirkin
    > > + *
    > > + * This program is free software; you can redistribute it and/or
    > > + * modify it under the terms of the GNU General Public License as
    > > + * published by the Free Software Foundation, version 2 of the
    > > + * License.
    > > + *
    > > + */
    > > +
    > > +#include
    > > +#include
    > > +#include
    > > +#include
    > > +#include
    > > +
    > > +#include "checkpoint.h"
    > > +#include "cpt_image.h"
    > > +
    > > +static unsigned int encode_task_flags(unsigned int task_flags)
    > > +{
    > > + unsigned int flags = 0;
    > > +
    > > + if (task_flags & PF_EXITING)
    > > + flags |= (1 << CPT_PF_EXITING);
    > > + if (task_flags & PF_FORKNOEXEC)
    > > + flags |= (1 << CPT_PF_FORKNOEXEC);
    > > + if (task_flags & PF_SUPERPRIV)
    > > + flags |= (1 << CPT_PF_SUPERPRIV);
    > > + if (task_flags & PF_DUMPCORE)
    > > + flags |= (1 << CPT_PF_DUMPCORE);
    > > + if (task_flags & PF_SIGNALED)
    > > + flags |= (1 << CPT_PF_SIGNALED);
    > > + if (task_flags & PF_USED_MATH)
    > > + flags |= (1 << CPT_PF_USED_MATH);
    > > +
    > > + return flags;
    > > +
    > > +}
    > > +
    > > +int cpt_dump_task_struct(struct task_struct *tsk, struct cpt_context
    > > *ctx) +{
    > > + struct cpt_task_image *t;
    > > + int i;
    > > + int err;
    > > +
    > > + t = kzalloc(sizeof(*t), GFP_KERNEL);
    > > + if (!t)
    > > + return -ENOMEM;
    > > +
    > > + t->cpt_len = sizeof(*t);
    > > + t->cpt_type = CPT_OBJ_TASK;
    > > + t->cpt_hdrlen = sizeof(*t);
    > > + t->cpt_content = CPT_CONTENT_ARRAY;
    > > +
    > > + t->cpt_state = tsk->state;
    > > + t->cpt_flags = encode_task_flags(tsk->flags);
    > > + t->cpt_exit_code = tsk->exit_code;
    > > + t->cpt_exit_signal = tsk->exit_signal;
    > > + t->cpt_pdeath_signal = tsk->pdeath_signal;
    > > + t->cpt_pid = task_pid_nr_ns(tsk, ctx->nsproxy->pid_ns);
    > > + t->cpt_tgid = task_tgid_nr_ns(tsk, ctx->nsproxy->pid_ns);
    > > + t->cpt_ppid = tsk->parent ?
    > > + task_pid_nr_ns(tsk->parent, ctx->nsproxy->pid_ns) : 0;
    > > + t->cpt_rppid = tsk->real_parent ?
    > > + task_pid_nr_ns(tsk->real_parent, ctx->nsproxy->pid_ns) : 0;
    > > + t->cpt_pgrp = task_pgrp_nr_ns(tsk, ctx->nsproxy->pid_ns);
    > > + t->cpt_session = task_session_nr_ns(tsk, ctx->nsproxy->pid_ns);
    > > + t->cpt_old_pgrp = 0;
    > > + if (tsk->signal->tty_old_pgrp)
    > > + t->cpt_old_pgrp = pid_vnr(tsk->signal->tty_old_pgrp);
    > > + t->cpt_leader = tsk->group_leader ? task_pid_vnr(tsk->group_leader) :
    > > 0;

    >
    > Why pid_vnr() here, and task_*_nr_ns() above? According to the introducing
    > comment, I'd expect something like pid_nr_ns(tsk->signal->tty_old_pgrp,
    > tsk->nsproxy->pid_ns), and the same for tsk->group_leader.
    >
    > IIUC, pid_vnr() is correct only if ctx->nsproxy->pid_ns ==
    > tsk->nsproxy->pid_ns == current->nsproxy->pid_ns, and I expect current to
    > live in a different pid_ns.
    >
    > Comments?


    Oh, you right here. I have already fixed it in my tree, but accidentally wrong
    patch were sent.

    >
    > > + t->cpt_utime = tsk->utime;
    > > + t->cpt_stime = tsk->stime;
    > > + t->cpt_utimescaled = tsk->utimescaled;
    > > + t->cpt_stimescaled = tsk->stimescaled;
    > > + t->cpt_gtime = tsk->gtime;
    > > + t->cpt_prev_utime = tsk->prev_utime;
    > > + t->cpt_prev_stime = tsk->prev_stime;
    > > + t->cpt_nvcsw = tsk->nvcsw;
    > > + t->cpt_nivcsw = tsk->nivcsw;
    > > + t->cpt_start_time = cpt_timespec_export(&tsk->start_time);
    > > + t->cpt_real_start_time = cpt_timespec_export(&tsk->real_start_time);
    > > + t->cpt_min_flt = tsk->min_flt;
    > > + t->cpt_maj_flt = tsk->maj_flt;
    > > + memcpy(t->cpt_comm, tsk->comm, TASK_COMM_LEN);
    > > + for (i = 0; i < GDT_ENTRY_TLS_ENTRIES; i++) {
    > > + t->cpt_tls[i] = (((u64)tsk->thread.tls_array[i].b) << 32) +
    > > + tsk->thread.tls_array[i].a;
    > > + }
    > > + /* TODO: encode thread flags and status like task flags */
    > > + t->cpt_thrflags = task_thread_info(tsk)->flags & ~(1< > > + t->cpt_thrstatus = task_thread_info(tsk)->status;
    > > + t->cpt_user = tsk->user->uid;
    > > + t->cpt_uid = tsk->uid;
    > > + t->cpt_euid = tsk->euid;
    > > + t->cpt_suid = tsk->suid;
    > > + t->cpt_fsuid = tsk->fsuid;
    > > + t->cpt_gid = tsk->gid;
    > > + t->cpt_egid = tsk->egid;
    > > + t->cpt_sgid = tsk->sgid;
    > > + t->cpt_fsgid = tsk->fsgid;
    > > +
    > > + err = ctx->write(t, sizeof(*t), ctx);
    > > +
    > > + kfree(t);
    > > + return err;
    > > +}
    > > +
    > > +static int cpt_dump_fpustate(struct task_struct *tsk, struct cpt_context
    > > *ctx) +{
    > > + struct cpt_obj_bits hdr;
    > > + int err;
    > > + int content;
    > > + unsigned long size;
    > > +
    > > + content = CPT_CONTENT_X86_FPUSTATE;
    > > + size = sizeof(struct i387_fxsave_struct);
    > > +#ifndef CONFIG_X86_64
    > > + if (!cpu_has_fxsr) {
    > > + size = sizeof(struct i387_fsave_struct);
    > > + content = CPT_CONTENT_X86_FPUSTATE_OLD;
    > > + }
    > > +#endif
    > > +
    > > + hdr.cpt_len = sizeof(hdr) + size;
    > > + hdr.cpt_type = CPT_OBJ_BITS;
    > > + hdr.cpt_hdrlen = sizeof(hdr);
    > > + hdr.cpt_content = content;
    > > + hdr.cpt_size = size;
    > > + err = ctx->write(&hdr, sizeof(hdr), ctx);
    > > + if (!err)
    > > + ctx->write(tsk->thread.xstate, size, ctx);

    >
    > Should check the error code of the line above, right?

    Right, thanks!

    [...]

    > > +int cpt_dump_task(struct task_struct *tsk, struct cpt_context *ctx)
    > > +{
    > > + int err;
    > > +
    > > + err = cpt_dump_task_struct(tsk, ctx);
    > > +
    > > + /* Dump task mm */
    > > +
    > > + if (!err)
    > > + cpt_dump_fpustate(tsk, ctx);

    >
    > error checking...
    >
    > > + if (!err)
    > > + cpt_dump_registers(tsk, ctx);

    >
    > error checking...

    Thanks again, will fix it.


    Andrey
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  15. Re: [Devel] Re: [PATCH 05/10] Introduce function to dump process

    On Monday 20 October 2008 21:48 Serge E. Hallyn wrote:
    > Quoting Andrey Mirkin (major@openvz.org):
    > > + t->cpt_uid = tsk->uid;
    > > + t->cpt_euid = tsk->euid;
    > > + t->cpt_suid = tsk->suid;
    > > + t->cpt_fsuid = tsk->fsuid;
    > > + t->cpt_gid = tsk->gid;
    > > + t->cpt_egid = tsk->egid;
    > > + t->cpt_sgid = tsk->sgid;
    > > + t->cpt_fsgid = tsk->fsgid;

    >
    > I don't see where any of these are restored. (Obviously, I wanted
    > to think about how you're verifying the restarter's authorization
    > to do so)


    Well, right now I don't use them during restore to simplify restart procedure
    and make it more clear for reviewers. In OpenVZ we are doing all restart
    procedure with root's privileges and relying on fact that all such IDs will
    be the same during restart (as we are restarting a container and its file
    system will be the same during restart).

    Andrey
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  16. Re: [Devel] Re: [PATCH 08/10] Introduce functions to restart a process



    Andrey Mirkin wrote:
    > On Thursday 23 October 2008 17:57 Dave Hansen wrote:
    >> On Thu, 2008-10-23 at 13:00 +0400, Andrey Mirkin wrote:
    >>>>>>> It is not related to the freezer code actually.
    >>>>>>> That is needed to restart syscalls. Right now I don't have a code
    >>>>>>> in my patchset which restarts a syscall, but later I plan to add
    >>>>>>> it. In OpenVZ checkpointing we restart syscalls if process was
    >>>>>>> caught in syscall during checkpointing.
    >>>>>> Do you checkpoint uninterruptible syscalls as well? If only
    >>>>>> interruptible syscalls are checkpointed, I'd say that either this
    >>>>>> syscall uses ERESTARTSYS or ERESTART_RESTARTBLOCK, and then signal
    >>>>>> handling code already does the trick, or this syscall does not
    >>>>>> restart itself when interrupted, and well, this is life, userspace
    >>>>>> just sees -EINTR, which is allowed by the syscall spec.
    >>>>>> Actually this is how we checkpoint/migrate tasks in interruptible
    >>>>>> syscalls in Kerrighed and this works.
    >>>>> We checkpoint only interruptible syscalls. Some syscalls do not
    >>>>> restart themself, that is why after restarting a process we restart
    >>>>> syscall to complete it.
    >>>> Can you please elaborate on this ? I don't recall having had issues
    >>>> with that.
    >>> Right now in 2.6.18 kernel we restarts in such a way pause,
    >>> rt_sigtimedwait and futex syscalls. Recently futex syscall was reworked
    >>> and we will not need such hooks for it.

    >> Could you elaborate on this a bit?
    >>
    >> If the futex syscall was reworked, perhaps we can do the same for
    >> rt_sigtimedwait() and get rid of this code completely.

    >
    > Well, we can try to rework rt_sigtimedwait(), but we will still need this code
    > in the future to restart pause syscall from kernel without returning to user
    > space. Also this code will be needed to restore some complex states.
    > As concerns pause syscall I have already written to Louis about the problem we
    > are trying to solve with this code. There is a gap when process will be in
    > user space just before entering syscall again. At this time a signal can be
    > delivered to process and it even can be handled. So, we will miss a signal
    > which must interrupt pause syscall.


    I'm not convinced that you a real race exists, and even if it does, I'm not
    convinced that hacking the assembly entry/exit code is the best way to do it.

    Let me explain:

    You are concerned about a race in which a signal is delivered to a task
    that resumes from restart to user space and is about to (re)invoke 'pause()'
    (because the restart so arranged its EIP and registers).

    This almost always means that the user code is buggy and relies on specific
    scheduling, because you can usually find a scheduling (without the C/R) where
    the intended recipient of the signal was delayed and only calls pause() after
    the signal is delivered.

    For instance, if the sequence of events is:
    A calls pause() -> checkpoint -> restart ->
    B signals A -> A calls pause() (after restart),
    then the following sequence is possible(*) without C/R:
    B signals A -> A calls pause()
    because normally B cannot assume anything about when A is actually,
    really, is suspended (which means the programmer did an imperfect job).

    I said "almost always" and "usually", because there is one case where the
    alternative schedule: task B could, prior to sending the signal, "ensure"
    that task A is already sleeping within the 'pause()' syscall. While this
    is possible, it is definitely unusual, and in fact I never code that does
    that. And what if the sysadmin send SIGSTOP followed by SIGCONT ? In
    short, such code is simply broken.

    More importantly, if you think about the operation and semantics of the
    freezer cgroup - similar behavior is to be expected when you freeze and
    then thaw a container.

    Specifically, when you freeze the container that has a task in sys_pause(),
    then that task will abort the syscall become frozen. As soon as it becomes
    unfrozen, it will return to user space (with the EIP "rewinded") only to
    re-invoke the syscall. So the same "race" remains even if you only freeze
    and then thaw, regardless of C/R.

    Moreover, I argue that basically when you return from a sys_restart(), the
    entire container should, by default, remain in frozen state - just like it
    is with sys_checkpoint(). An explicit thaw will make the container resume
    execution.

    Therefore, there are two options: the first is to decide that this behavior
    - going back to user space to re-invoke the syscall - is valid. In this case
    you don't need a special hack for returning from sys_restart(). The second
    option is to decide that it is broken, in which case you need to also fix
    the freezer code. Personally, I think that this behavior is valid and need
    not be fixed.

    Finally, even if you do want to fix the behavior for this pathologic case,
    I don't see why you'd want to do it in this manner. Instead, you can add a
    simple test prior to returning from sys_restart(), something like this:

    ...
    /* almost done: now handle special cases: */
    if (our last syscall == __NR_pause) {
    ret = sys_pause();
    } else if (our last syscall == __NR_futex) {
    do some stuff;
    ret = sys_futex();
    } else {
    ret = what-we-want-to-return
    }
    /* finally, return to user space */
    return ret;
    }

    I'm not quite know what other "complex states" you refer to; but I wonder
    whether that code "needed to restore some complex states" could not be
    implemented along the same idea.

    The upside is clear: the code is less obscure, simple to debug, and not
    architecture-dependent. (hehe .. it even runs faster because it saves a
    whole kernel->user->kernel switch, what do you know !).

    Oren.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  17. Re: [PATCH 10/10] Add support for multiple processes



    Andrey Mirkin wrote:
    > The whole tree of processes can be checkpointed and restarted now.
    > Shared objects are not supported yet.
    >
    > Signed-off-by: Andrey Mirkin
    > ---
    > checkpoint/cpt_image.h | 2 +
    > checkpoint/cpt_process.c | 24 +++++++++++++
    > checkpoint/rst_process.c | 85 +++++++++++++++++++++++++++-------------------
    > 3 files changed, 76 insertions(+), 35 deletions(-)
    >
    > diff --git a/checkpoint/cpt_image.h b/checkpoint/cpt_image.h
    > index e1fb483..f370df2 100644
    > --- a/checkpoint/cpt_image.h
    > +++ b/checkpoint/cpt_image.h
    > @@ -128,6 +128,8 @@ struct cpt_task_image {
    > __u64 cpt_nivcsw;
    > __u64 cpt_min_flt;
    > __u64 cpt_maj_flt;
    > + __u32 cpt_children_num;
    > + __u32 cpt_pad;
    > } __attribute__ ((aligned (8)));
    >
    > struct cpt_mm_image {
    > diff --git a/checkpoint/cpt_process.c b/checkpoint/cpt_process.c
    > index 1f7a54b..d73ec3c 100644
    > --- a/checkpoint/cpt_process.c
    > +++ b/checkpoint/cpt_process.c
    > @@ -40,6 +40,19 @@ static unsigned int encode_task_flags(unsigned int task_flags)
    >
    > }
    >
    > +static int cpt_count_children(struct task_struct *tsk, struct cpt_context *ctx)
    > +{
    > + int num = 0;
    > + struct task_struct *child;
    > +
    > + list_for_each_entry(child, &tsk->children, sibling) {
    > + if (child->parent != tsk)
    > + continue;
    > + num++;
    > + }
    > + return num;
    > +}
    > +


    I noticed that don't take the appropriate locks when browsing through
    tasks lists (siblings, children, global list). Although I realize that
    the container should be frozen at this time, I keep wondering if this
    is indeed always safe.

    For instance, are you protected against an OOM killer that might just
    occur uninvited and kill one of those tasks ?

    Can the administrator force an un-freeze of the container ? Or perhaps
    some error condition if the kernel cause that ?

    > int cpt_dump_task_struct(struct task_struct *tsk, struct cpt_context *ctx)
    > {
    > struct cpt_task_image *t;
    > @@ -102,6 +115,7 @@ int cpt_dump_task_struct(struct task_struct *tsk, struct cpt_context *ctx)
    > t->cpt_egid = tsk->egid;
    > t->cpt_sgid = tsk->sgid;
    > t->cpt_fsgid = tsk->fsgid;
    > + t->cpt_children_num = cpt_count_children(tsk, ctx);
    >
    > err = ctx->write(t, sizeof(*t), ctx);
    >
    > @@ -231,6 +245,16 @@ int cpt_dump_task(struct task_struct *tsk, struct cpt_context *ctx)
    > err = cpt_dump_fpustate(tsk, ctx);
    > if (!err)
    > err = cpt_dump_registers(tsk, ctx);
    > + if (!err) {
    > + struct task_struct *child;
    > + list_for_each_entry(child, &tsk->children, sibling) {
    > + if (child->parent != tsk)
    > + continue;
    > + err = cpt_dump_task(child, ctx);
    > + if (err)
    > + break;


    Here too.

    [...]

    Oren.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  18. Re: [Devel] Re: [PATCH 08/10] Introduce functions to restart a process

    On Sunday 26 October 2008 01:10 Oren Laadan wrote:
    > Andrey Mirkin wrote:
    > > On Thursday 23 October 2008 17:57 Dave Hansen wrote:
    > >> On Thu, 2008-10-23 at 13:00 +0400, Andrey Mirkin wrote:
    > >>>>>>> It is not related to the freezer code actually.
    > >>>>>>> That is needed to restart syscalls. Right now I don't have a code
    > >>>>>>> in my patchset which restarts a syscall, but later I plan to add
    > >>>>>>> it. In OpenVZ checkpointing we restart syscalls if process was
    > >>>>>>> caught in syscall during checkpointing.
    > >>>>>>
    > >>>>>> Do you checkpoint uninterruptible syscalls as well? If only
    > >>>>>> interruptible syscalls are checkpointed, I'd say that either this
    > >>>>>> syscall uses ERESTARTSYS or ERESTART_RESTARTBLOCK, and then signal
    > >>>>>> handling code already does the trick, or this syscall does not
    > >>>>>> restart itself when interrupted, and well, this is life, userspace
    > >>>>>> just sees -EINTR, which is allowed by the syscall spec.
    > >>>>>> Actually this is how we checkpoint/migrate tasks in interruptible
    > >>>>>> syscalls in Kerrighed and this works.
    > >>>>>
    > >>>>> We checkpoint only interruptible syscalls. Some syscalls do not
    > >>>>> restart themself, that is why after restarting a process we restart
    > >>>>> syscall to complete it.
    > >>>>
    > >>>> Can you please elaborate on this ? I don't recall having had issues
    > >>>> with that.
    > >>>
    > >>> Right now in 2.6.18 kernel we restarts in such a way pause,
    > >>> rt_sigtimedwait and futex syscalls. Recently futex syscall was reworked
    > >>> and we will not need such hooks for it.
    > >>
    > >> Could you elaborate on this a bit?
    > >>
    > >> If the futex syscall was reworked, perhaps we can do the same for
    > >> rt_sigtimedwait() and get rid of this code completely.

    > >
    > > Well, we can try to rework rt_sigtimedwait(), but we will still need this
    > > code in the future to restart pause syscall from kernel without returning
    > > to user space. Also this code will be needed to restore some complex
    > > states. As concerns pause syscall I have already written to Louis about
    > > the problem we are trying to solve with this code. There is a gap when
    > > process will be in user space just before entering syscall again. At this
    > > time a signal can be delivered to process and it even can be handled. So,
    > > we will miss a signal which must interrupt pause syscall.

    >
    > I'm not convinced that you a real race exists, and even if it does, I'm not
    > convinced that hacking the assembly entry/exit code is the best way to do
    > it.


    Well, as I already told pause() syscall is is not only one case why we need to
    do some additional job in that place.

    > Let me explain:
    >
    > You are concerned about a race in which a signal is delivered to a task
    > that resumes from restart to user space and is about to (re)invoke
    > 'pause()' (because the restart so arranged its EIP and registers).
    >
    > This almost always means that the user code is buggy and relies on specific
    > scheduling, because you can usually find a scheduling (without the C/R)
    > where the intended recipient of the signal was delayed and only calls
    > pause() after the signal is delivered.
    >
    > For instance, if the sequence of events is:
    > A calls pause() -> checkpoint -> restart ->
    > B signals A -> A calls pause() (after restart),
    > then the following sequence is possible(*) without C/R:
    > B signals A -> A calls pause()
    > because normally B cannot assume anything about when A is actually,
    > really, is suspended (which means the programmer did an imperfect job).


    You right here. Both sequences are possible in theory. You will be surprised
    but in practice we found out that probability to miss a signal in case of C/R
    is much higher then during ordinary execution.

    > I said "almost always" and "usually", because there is one case where the
    > alternative schedule: task B could, prior to sending the signal, "ensure"
    > that task A is already sleeping within the 'pause()' syscall. While this
    > is possible, it is definitely unusual, and in fact I never code that does
    > that. And what if the sysadmin send SIGSTOP followed by SIGCONT ? In
    > short, such code is simply broken.
    >
    > More importantly, if you think about the operation and semantics of the
    > freezer cgroup - similar behavior is to be expected when you freeze and
    > then thaw a container.
    >
    > Specifically, when you freeze the container that has a task in sys_pause(),
    > then that task will abort the syscall become frozen. As soon as it becomes
    > unfrozen, it will return to user space (with the EIP "rewinded") only to
    > re-invoke the syscall. So the same "race" remains even if you only freeze
    > and then thaw, regardless of C/R.


    Exactly. But during freeze/unfreeze probability to catch such situation is
    very low. In our tests we were tried to checkpoint/restart LTP tests. And
    this "race" were triggered during restart almost in 100% of tests.

    > Moreover, I argue that basically when you return from a sys_restart(), the
    > entire container should, by default, remain in frozen state - just like it
    > is with sys_checkpoint(). An explicit thaw will make the container resume
    > execution.


    No doubt. That is how we do in OpenVZ, after restart the container remains
    frozen. And we need to thaw it to resume its execution.

    > Therefore, there are two options: the first is to decide that this behavior
    > - going back to user space to re-invoke the syscall - is valid. In this
    > case you don't need a special hack for returning from sys_restart(). The
    > second option is to decide that it is broken, in which case you need to
    > also fix the freezer code. Personally, I think that this behavior is valid
    > and need not be fixed.


    I still believe that we need to fix such behaviour during restart as in
    practice it is very easy to trigger it.

    > Finally, even if you do want to fix the behavior for this pathologic case,
    > I don't see why you'd want to do it in this manner. Instead, you can add a
    > simple test prior to returning from sys_restart(), something like this:
    >
    > ...
    > /* almost done: now handle special cases: */
    > if (our last syscall == __NR_pause) {
    > ret = sys_pause();
    > } else if (our last syscall == __NR_futex) {
    > do some stuff;
    > ret = sys_futex();
    > } else {
    > ret = what-we-want-to-return
    > }
    > /* finally, return to user space */
    > return ret;
    > }


    This only works if we do not want to stay in frozen state after restart. Or am
    I missed something?

    > I'm not quite know what other "complex states" you refer to; but I wonder
    > whether that code "needed to restore some complex states" could not be
    > implemented along the same idea.


    In the same manner we are restoring for instance ptrace.

    > The upside is clear: the code is less obscure, simple to debug, and not
    > architecture-dependent. (hehe .. it even runs faster because it saves a
    > whole kernel->user->kernel switch, what do you know !).

    In our case we also do not need a switch and the code actually not very
    complicated.

    Andrey
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  19. Re: [Devel] Re: [PATCH 03/10] Introduce context structure needed during checkpointing/restart

    On Monday 20 October 2008 21:02 Dave Hansen wrote:
    > On Sat, 2008-10-18 at 03:11 +0400, Andrey Mirkin wrote:
    > > +typedef struct cpt_context
    > > +{
    > > + pid_t pid; /* should be changed to ctid later */
    > > + int ctx_id; /* context id */
    > > + struct list_head ctx_list;
    > > + int refcount;
    > > + int ctx_state;
    > > + struct semaphore main_sem;

    >
    > Does this really need to be a semaphore or is a mutex OK?

    Actually mutex is enough here.

    > > + int errno;

    >
    > Could you hold off on adding these things to the struct until the patch
    > where they're actually used? It's hard to judge this without seeing
    > what you do with it.

    I will try not to introduce variables and functions which are not used in
    future.

    >
    > > + struct file *file;
    > > + loff_t current_object;
    > > +
    > > + struct list_head object_array[CPT_OBJ_MAX];
    > > +
    > > + int (*write)(const void *addr, size_t count, struct cpt_context *ctx);
    > > + int (*read)(void *addr, size_t count, struct cpt_context *ctx);
    > > +} cpt_context_t;

    >
    > Man, this is hard to review. I was going to try and make sure that your
    > refcounting was right and atomic, but there's no use of it in this patch
    > except for the initialization and accessor functions. Darn.

    For simplicity I will throw out all this stuff completely.

    >
    > > +extern int debug_level;

    >
    > I'm going to go out on a limb here and say that "debug_level" is
    > probably a wee bit too generic of a variable name.

    I will change it to something else.

    >
    > > +#define cpt_printk(lvl, fmt, args...) do { \
    > > + if (lvl <= debug_level) \
    > > + printk(fmt, ##args); \
    > > + } while (0)

    >
    > I think you can use pr_debug() here, too, just like Oren did.

    Will switch to pr_debug().

    >
    > > +struct cpt_context * context_alloc(void)
    > > +{
    > > + struct cpt_context *ctx;
    > > + int i;
    > > +
    > > + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
    > > + if (!ctx)
    > > + return NULL;
    > > +
    > > + init_MUTEX(&ctx->main_sem);
    > > + ctx->refcount = 1;
    > > +
    > > + ctx->current_object = -1;
    > > + ctx->write = file_write;
    > > + ctx->read = file_read;
    > > + for (i = 0; i < CPT_OBJ_MAX; i++) {
    > > + INIT_LIST_HEAD(&ctx->object_array[i]);
    > > + }
    > > +
    > > + return ctx;
    > > +}
    > > +
    > > +void context_release(struct cpt_context *ctx)
    > > +{
    > > + ctx->ctx_state = CPT_CTX_ERROR;
    > > +
    > > + kfree(ctx);
    > > +}
    > > +
    > > +static void context_put(struct cpt_context *ctx)
    > > +{
    > > + if (!--ctx->refcount)
    > > + context_release(ctx);
    > > +}
    > > +
    > > static int checkpoint(pid_t pid, int fd, unsigned long flags)
    > > {
    > > - return -ENOSYS;
    > > + struct file *file;
    > > + struct cpt_context *ctx;
    > > + int err;
    > > +
    > > + err = -EBADF;
    > > + file = fget(fd);
    > > + if (!file)
    > > + goto out;
    > > +
    > > + err = -ENOMEM;
    > > + ctx = context_alloc();
    > > + if (!ctx)
    > > + goto out_file;
    > > +
    > > + ctx->file = file;
    > > + ctx->ctx_state = CPT_CTX_DUMPING;
    > > +
    > > + /* checkpoint */
    > > + err = -ENOSYS;
    > > +
    > > + context_put(ctx);
    > > +
    > > +out_file:
    > > + fput(file);
    > > +out:
    > > + return err;
    > > }

    >
    > So, where is context_get()? Is there only single-threaded access to the
    > refcount? If so, why do we even need it? We should probably just use
    > context_release() driectly.

    The idea is that in future we should be able to keep a context for incremental
    checkpointing. That is why we need context get/put functions. Right now it is
    not used, so I will drop it.

    > If there is multithreaded access to context_put() or the refcount, then
    > they're unsafe without additional locking.

    Access to refcount will be protected with context mutex.

    Thanks for comments.

    Actually I'm not sure if I will continue with my own patch set, but I will
    take into account all your comments during porting my functionality to Oren's
    tree.

    Andrey
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  20. Re: [Devel] Re: [PATCH 10/10] Add support for multiple processes

    On Monday 27 October 2008 18:58 Oren Laadan wrote:
    > Andrey Mirkin wrote:
    > > The whole tree of processes can be checkpointed and restarted now.
    > > Shared objects are not supported yet.
    > >
    > > Signed-off-by: Andrey Mirkin
    > > ---
    > > checkpoint/cpt_image.h | 2 +
    > > checkpoint/cpt_process.c | 24 +++++++++++++
    > > checkpoint/rst_process.c | 85
    > > +++++++++++++++++++++++++++------------------- 3 files changed, 76
    > > insertions(+), 35 deletions(-)
    > >
    > > diff --git a/checkpoint/cpt_image.h b/checkpoint/cpt_image.h
    > > index e1fb483..f370df2 100644
    > > --- a/checkpoint/cpt_image.h
    > > +++ b/checkpoint/cpt_image.h
    > > @@ -128,6 +128,8 @@ struct cpt_task_image {
    > > __u64 cpt_nivcsw;
    > > __u64 cpt_min_flt;
    > > __u64 cpt_maj_flt;
    > > + __u32 cpt_children_num;
    > > + __u32 cpt_pad;
    > > } __attribute__ ((aligned (8)));
    > >
    > > struct cpt_mm_image {
    > > diff --git a/checkpoint/cpt_process.c b/checkpoint/cpt_process.c
    > > index 1f7a54b..d73ec3c 100644
    > > --- a/checkpoint/cpt_process.c
    > > +++ b/checkpoint/cpt_process.c
    > > @@ -40,6 +40,19 @@ static unsigned int encode_task_flags(unsigned int
    > > task_flags)
    > >
    > > }
    > >
    > > +static int cpt_count_children(struct task_struct *tsk, struct
    > > cpt_context *ctx) +{
    > > + int num = 0;
    > > + struct task_struct *child;
    > > +
    > > + list_for_each_entry(child, &tsk->children, sibling) {
    > > + if (child->parent != tsk)
    > > + continue;
    > > + num++;
    > > + }
    > > + return num;
    > > +}
    > > +

    >
    > I noticed that don't take the appropriate locks when browsing through
    > tasks lists (siblings, children, global list). Although I realize that
    > the container should be frozen at this time, I keep wondering if this
    > is indeed always safe.

    You right here. We need to take tasklist_lock to be sure that everything will
    be consistent.

    > For instance, are you protected against an OOM killer that might just
    > occur uninvited and kill one of those tasks ?


    OOM killer can't kill one of those tasks as all of them should be frozen at
    that time and be in uninterruptible state. So, we can just do not think about
    OOM at that time. But anyway you right and we need locking around browsing
    tasks list.

    > Can the administrator force an un-freeze of the container ? Or perhaps
    > some error condition if the kernel cause that ?


    The main idea is that context should be protected with a mutex and only one
    process can do some activity (freeze, dump, unfreeze, kill) with a container.
    Right now it is not implemented at all, but in future it will be added.

    Andrey

    > > int cpt_dump_task_struct(struct task_struct *tsk, struct cpt_context
    > > *ctx) {
    > > struct cpt_task_image *t;
    > > @@ -102,6 +115,7 @@ int cpt_dump_task_struct(struct task_struct *tsk,
    > > struct cpt_context *ctx) t->cpt_egid = tsk->egid;
    > > t->cpt_sgid = tsk->sgid;
    > > t->cpt_fsgid = tsk->fsgid;
    > > + t->cpt_children_num = cpt_count_children(tsk, ctx);
    > >
    > > err = ctx->write(t, sizeof(*t), ctx);
    > >
    > > @@ -231,6 +245,16 @@ int cpt_dump_task(struct task_struct *tsk, struct
    > > cpt_context *ctx) err = cpt_dump_fpustate(tsk, ctx);
    > > if (!err)
    > > err = cpt_dump_registers(tsk, ctx);
    > > + if (!err) {
    > > + struct task_struct *child;
    > > + list_for_each_entry(child, &tsk->children, sibling) {
    > > + if (child->parent != tsk)
    > > + continue;
    > > + err = cpt_dump_task(child, ctx);
    > > + if (err)
    > > + break;

    >
    > Here too.
    >
    > [...]
    >
    > Oren.
    >
    > _______________________________________________
    > Containers mailing list
    > Containers@lists.linux-foundation.org
    > https://lists.linux-foundation.org/m...nfo/containers
    >
    > _______________________________________________
    > Devel mailing list
    > Devel@openvz.org
    > https://openvz.org/mailman/listinfo/devel

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread
Page 2 of 3 FirstFirst 1 2 3 LastLast