[RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work - Kernel

This is a discussion on [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work - Kernel ; Suggested by Ingo. Checkpoint/restart is going to be a long effort to get things working. We're going to have a lot of things that we know just don't work for a long time. That doesn't mean that it will be ...

+ Reply to Thread
Page 1 of 2 1 2 LastLast
Results 1 to 20 of 34

Thread: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work

  1. [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work


    Suggested by Ingo.

    Checkpoint/restart is going to be a long effort to get things working.
    We're going to have a lot of things that we know just don't work for
    a long time. That doesn't mean that it will be useless, it just means
    that there's some complicated features that we are going to have to
    work incrementally to fix.

    This patch introduces a new mechanism to help the checkpoint/restart
    developers. A new function pair: task/process_deny_checkpoint() is
    created. When called, these tell the kernel that we *know* that the
    process has performed some activity that will keep it from being
    properly checkpointed.

    The 'flag' is an atomic_t for now so that we can have some level
    of atomicity and make sure to only warn once.

    For now, this is a one-way trip. Once a process is no longer
    'may_checkpoint' capable, neither it nor its children ever will be.
    This can, of course, be fixed up in the future. We might want to
    reset the flag when a new pid namespace is created, for instance.


    Signed-off-by: Dave Hansen
    ---

    linux-2.6.git-dave/include/linux/checkpoint.h | 31 +++++++++++++++++++++++++-
    linux-2.6.git-dave/include/linux/sched.h | 3 ++
    linux-2.6.git-dave/kernel/fork.c | 10 ++++++++
    3 files changed, 43 insertions(+), 1 deletion(-)

    diff -puN include/linux/checkpoint.h~flag include/linux/checkpoint.h
    --- linux-2.6.git/include/linux/checkpoint.h~flag 2008-10-09 11:56:57.000000000 -0700
    +++ linux-2.6.git-dave/include/linux/checkpoint.h 2008-10-09 11:56:57.000000000 -0700
    @@ -10,8 +10,11 @@
    * distribution for more details.
    */

    -#include
    #include
    +#include
    +#include
    +
    +#ifdef CONFIG_CHECKPOINT_RESTART

    #define CR_VERSION 2

    @@ -94,4 +97,30 @@ extern void file_pos_write(struct file *
    #define cr_debug(fmt, args...) \
    pr_debug("[CR:%s] " fmt, __func__, ## args)

    +static inline void __task_deny_checkpointing(struct task_struct *task,
    + char *file, int line)
    +{
    + if(!atomic_dec_and_test(&task->may_checkpoint))
    + return;
    + printk(KERN_INFO "process performed an action that can not be "
    + "checkpointed at: %s:%d\n", file, line);
    + WARN_ON(1);
    +}
    +#define process_deny_checkpointing(p) __task_deny_checkpointing(p, __FILE__, __LINE__)
    +
    +/*
    + * For now, we're not going to have a distinction between
    + * tasks and processes for the purpose of c/r. But, allow
    + * these two calls anyway to make new users at least think
    + * about it.
    + */
    +#define task_deny_checkpointing(p) __task_deny_checkpointing(p, __FILE__, __LINE__)
    +
    +#else
    +
    +static inline void task_deny_checkpointing(struct task_struct *task) {}
    +static inline void process_deny_checkpointing(struct task_struct *task) {}
    +
    +#endif
    +
    #endif /* _CHECKPOINT_CKPT_H_ */
    diff -puN include/linux/sched.h~flag include/linux/sched.h
    --- linux-2.6.git/include/linux/sched.h~flag 2008-10-09 11:56:57.000000000 -0700
    +++ linux-2.6.git-dave/include/linux/sched.h 2008-10-09 11:56:57.000000000 -0700
    @@ -1301,6 +1301,9 @@ struct task_struct {
    int latency_record_count;
    struct latency_record latency_record[LT_SAVECOUNT];
    #endif
    +#ifdef CONFIG_CHECKPOINT_RESTART
    + atomic_t may_checkpoint;
    +#endif
    };

    /*
    diff -puN kernel/fork.c~flag kernel/fork.c
    --- linux-2.6.git/kernel/fork.c~flag 2008-10-09 11:56:57.000000000 -0700
    +++ linux-2.6.git-dave/kernel/fork.c 2008-10-09 11:56:57.000000000 -0700
    @@ -194,6 +194,13 @@ void __init fork_init(unsigned long memp
    init_task.signal->rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
    init_task.signal->rlim[RLIMIT_SIGPENDING] =
    init_task.signal->rlim[RLIMIT_NPROC];
    +
    +#ifdef CONFIG_CHECKPOINT_RESTART
    + /*
    + * This probably won't stay set for long...
    + */
    + atomic_set(&init_task.may_checkpoint, 1);
    +#endif
    }

    int __attribute__((weak)) arch_dup_task_struct(struct task_struct *dst,
    @@ -244,6 +251,9 @@ static struct task_struct *dup_task_stru
    tsk->btrace_seq = 0;
    #endif
    tsk->splice_pipe = NULL;
    +#ifdef CONFIG_CHECKPOINT_RESTART
    + atomic_set(&tsk->may_checkpoint, atomic_read(&orig->may_checkpoint));
    +#endif
    return tsk;

    out:
    _
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

    Quoting Dave Hansen (dave@linux.vnet.ibm.com):
    >
    > These are just a few simple examples of things we know we can't
    > checkpoint now. There are plenty more, but this should give
    > everyone an idea how this will look in practice.
    >
    >
    > Signed-off-by: Dave Hansen
    > ---
    >
    > linux-2.6.git-dave/fs/aio.c | 7 +++++++
    > linux-2.6.git-dave/ipc/mqueue.c | 3 +++
    > linux-2.6.git-dave/net/socket.c | 5 +++++
    > 3 files changed, 15 insertions(+)
    >
    > diff -puN fs/aio.c~no-checkpointing-for-sockets fs/aio.c
    > --- linux-2.6.git/fs/aio.c~no-checkpointing-for-sockets 2008-10-09 11:56:58.000000000 -0700
    > +++ linux-2.6.git-dave/fs/aio.c 2008-10-09 11:56:58.000000000 -0700
    > @@ -19,6 +19,7 @@
    >
    > #define DEBUG 0
    >
    > +#include
    > #include
    > #include
    > #include
    > @@ -1663,6 +1664,8 @@ asmlinkage long sys_io_submit(aio_contex
    > if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(*iocbpp)))))
    > return -EFAULT;
    >
    > + process_deny_checkpointing(current);
    > +
    > ctx = lookup_ioctx(ctx_id);
    > if (unlikely(!ctx)) {
    > pr_debug("EINVAL: io_submit: invalid context id\n");
    > @@ -1742,6 +1745,8 @@ asmlinkage long sys_io_cancel(aio_contex
    > if (unlikely(!ctx))
    > return -EINVAL;
    >
    > + process_deny_checkpointing(current);
    > +
    > spin_lock_irq(&ctx->ctx_lock);
    > ret = -EAGAIN;
    > kiocb = lookup_kiocb(ctx, iocb, key);
    > @@ -1796,6 +1801,8 @@ asmlinkage long sys_io_getevents(aio_con
    > struct kioctx *ioctx = lookup_ioctx(ctx_id);
    > long ret = -EINVAL;
    >
    > + process_deny_checkpointing(current);


    Hmm, I don't know too much about aio, but is it possible to succeed with
    io_getevents if we didn't first do a submit? It looks like the contexts
    are looked up out of current->mm, so I don't think we need this call
    here.

    Otherwise, this is neat.

    -serge

    > +
    > if (likely(ioctx)) {
    > if (likely(min_nr <= nr && min_nr >= 0 && nr >= 0))
    > ret = read_events(ioctx, min_nr, nr, events, timeout);
    > diff -puN ipc/mqueue.c~no-checkpointing-for-sockets ipc/mqueue.c
    > --- linux-2.6.git/ipc/mqueue.c~no-checkpointing-for-sockets 2008-10-09 11:56:58.000000000 -0700
    > +++ linux-2.6.git-dave/ipc/mqueue.c 2008-10-09 11:56:58.000000000 -0700
    > @@ -14,6 +14,7 @@
    > */
    >
    > #include
    > +#include
    > #include
    > #include
    > #include
    > @@ -655,6 +656,8 @@ asmlinkage long sys_mq_open(const char _
    > char *name;
    > int fd, error;
    >
    > + process_deny_checkpointing(current);
    > +
    > error = audit_mq_open(oflag, mode, u_attr);
    > if (error != 0)
    > return error;
    > diff -puN net/socket.c~no-checkpointing-for-sockets net/socket.c
    > --- linux-2.6.git/net/socket.c~no-checkpointing-for-sockets 2008-10-09 11:56:58.000000000 -0700
    > +++ linux-2.6.git-dave/net/socket.c 2008-10-09 11:56:58.000000000 -0700
    > @@ -87,6 +87,7 @@
    > #include
    > #include
    > #include
    > +#include
    >
    > #include
    > #include
    > @@ -1236,6 +1237,8 @@ asmlinkage long sys_socket(int family, i
    > if (SOCK_NONBLOCK != O_NONBLOCK && (flags & SOCK_NONBLOCK))
    > flags = (flags & ~SOCK_NONBLOCK) | O_NONBLOCK;
    >
    > + process_deny_checkpointing(current);
    > +
    > retval = sock_create(family, type, protocol, &sock);
    > if (retval < 0)
    > goto out;
    > @@ -2130,6 +2133,8 @@ asmlinkage long sys_socketcall(int call,
    > a0 = a[0];
    > a1 = a[1];
    >
    > + process_deny_checkpointing(current);
    > +
    > switch (call) {
    > case SYS_SOCKET:
    > err = sys_socket(a0, a1, a[2]);
    > _
    > _______________________________________________
    > Containers mailing list
    > Containers@lists.linux-foundation.org
    > https://lists.linux-foundation.org/m...nfo/containers

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

    On Thu, 2008-10-09 at 14:43 -0500, Serge E. Hallyn wrote:
    > Hmm, I don't know too much about aio, but is it possible to succeed with
    > io_getevents if we didn't first do a submit? It looks like the contexts
    > are looked up out of current->mm, so I don't think we need this call
    > here.
    >
    > Otherwise, this is neat.


    Good question. I know nothing, either.

    My thought was that any process *trying* to do aio stuff of any kind is
    going to be really confused if it gets checkpointed. Or, it might try
    to submit an aio right after it checks the list of them. I thought it
    best to be cautious and say, if you screw with aio, no checkpointing for
    you!

    -- Dave

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work

    On Thu, 2008-10-09 at 12:04 -0700, Dave Hansen wrote:
    > Suggested by Ingo.
    >
    > Checkpoint/restart is going to be a long effort to get things working.
    > We're going to have a lot of things that we know just don't work for
    > a long time. That doesn't mean that it will be useless, it just means
    > that there's some complicated features that we are going to have to
    > work incrementally to fix.
    >
    > This patch introduces a new mechanism to help the checkpoint/restart
    > developers. A new function pair: task/process_deny_checkpoint() is
    > created. When called, these tell the kernel that we *know* that the
    > process has performed some activity that will keep it from being
    > properly checkpointed.
    >
    > The 'flag' is an atomic_t for now so that we can have some level
    > of atomicity and make sure to only warn once.
    >
    > For now, this is a one-way trip. Once a process is no longer
    > 'may_checkpoint' capable, neither it nor its children ever will be.
    > This can, of course, be fixed up in the future. We might want to
    > reset the flag when a new pid namespace is created, for instance.
    >


    Then this patch should be described as:

    Track in-kernel when we expect checkpoint/restart to fail.

    By the way, why don't you introduce the reverse operation ?

    >
    > Signed-off-by: Dave Hansen
    > ---
    >
    > linux-2.6.git-dave/include/linux/checkpoint.h | 31 +++++++++++++++++++++++++-
    > linux-2.6.git-dave/include/linux/sched.h | 3 ++
    > linux-2.6.git-dave/kernel/fork.c | 10 ++++++++
    > 3 files changed, 43 insertions(+), 1 deletion(-)
    >
    > diff -puN include/linux/checkpoint.h~flag include/linux/checkpoint.h
    > --- linux-2.6.git/include/linux/checkpoint.h~flag 2008-10-09 11:56:57.000000000 -0700
    > +++ linux-2.6.git-dave/include/linux/checkpoint.h 2008-10-09 11:56:57.000000000 -0700
    > @@ -10,8 +10,11 @@
    > * distribution for more details.
    > */
    >
    > -#include
    > #include
    > +#include
    > +#include
    > +
    > +#ifdef CONFIG_CHECKPOINT_RESTART
    >
    > #define CR_VERSION 2
    >
    > @@ -94,4 +97,30 @@ extern void file_pos_write(struct file *
    > #define cr_debug(fmt, args...) \
    > pr_debug("[CR:%s] " fmt, __func__, ## args)
    >
    > +static inline void __task_deny_checkpointing(struct task_struct *task,
    > + char *file, int line)
    > +{
    > + if(!atomic_dec_and_test(&task->may_checkpoint))
    > + return;
    > + printk(KERN_INFO "process performed an action that can not be "
    > + "checkpointed at: %s:%d\n", file, line);
    > + WARN_ON(1);
    > +}
    > +#define process_deny_checkpointing(p) __task_deny_checkpointing(p, __FILE__, __LINE__)
    > +
    > +/*
    > + * For now, we're not going to have a distinction between
    > + * tasks and processes for the purpose of c/r. But, allow
    > + * these two calls anyway to make new users at least think
    > + * about it.
    > + */
    > +#define task_deny_checkpointing(p) __task_deny_checkpointing(p, __FILE__, __LINE__)
    > +
    > +#else
    > +
    > +static inline void task_deny_checkpointing(struct task_struct *task) {}
    > +static inline void process_deny_checkpointing(struct task_struct *task) {}
    > +
    > +#endif
    > +
    > #endif /* _CHECKPOINT_CKPT_H_ */
    > diff -puN include/linux/sched.h~flag include/linux/sched.h
    > --- linux-2.6.git/include/linux/sched.h~flag 2008-10-09 11:56:57.000000000 -0700
    > +++ linux-2.6.git-dave/include/linux/sched.h 2008-10-09 11:56:57.000000000 -0700
    > @@ -1301,6 +1301,9 @@ struct task_struct {
    > int latency_record_count;
    > struct latency_record latency_record[LT_SAVECOUNT];
    > #endif
    > +#ifdef CONFIG_CHECKPOINT_RESTART
    > + atomic_t may_checkpoint;
    > +#endif
    > };
    >
    > /*
    > diff -puN kernel/fork.c~flag kernel/fork.c
    > --- linux-2.6.git/kernel/fork.c~flag 2008-10-09 11:56:57.000000000 -0700
    > +++ linux-2.6.git-dave/kernel/fork.c 2008-10-09 11:56:57.000000000 -0700
    > @@ -194,6 +194,13 @@ void __init fork_init(unsigned long memp
    > init_task.signal->rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
    > init_task.signal->rlim[RLIMIT_SIGPENDING] =
    > init_task.signal->rlim[RLIMIT_NPROC];
    > +
    > +#ifdef CONFIG_CHECKPOINT_RESTART
    > + /*
    > + * This probably won't stay set for long...
    > + */
    > + atomic_set(&init_task.may_checkpoint, 1);
    > +#endif
    > }
    >
    > int __attribute__((weak)) arch_dup_task_struct(struct task_struct *dst,
    > @@ -244,6 +251,9 @@ static struct task_struct *dup_task_stru
    > tsk->btrace_seq = 0;
    > #endif
    > tsk->splice_pipe = NULL;
    > +#ifdef CONFIG_CHECKPOINT_RESTART
    > + atomic_set(&tsk->may_checkpoint, atomic_read(&orig->may_checkpoint));
    > +#endif
    > return tsk;
    >
    > out:
    > _
    > _______________________________________________
    > Containers mailing list
    > Containers@lists.linux-foundation.org
    > https://lists.linux-foundation.org/m...nfo/containers

    --
    Gregory Kurz gkurz@fr.ibm.com
    Software Engineer @ IBM/Meiosys http://www.ibm.com
    Tel +33 (0)534 638 479 Fax +33 (0)561 400 420

    "Anarchy is about taking complete responsibility for yourself."
    Alan Moore.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  5. Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work

    Greg Kurz wrote:
    > On Thu, 2008-10-09 at 12:04 -0700, Dave Hansen wrote:
    >> Suggested by Ingo.
    >>
    >> Checkpoint/restart is going to be a long effort to get things working.
    >> We're going to have a lot of things that we know just don't work for
    >> a long time. That doesn't mean that it will be useless, it just means
    >> that there's some complicated features that we are going to have to
    >> work incrementally to fix.
    >>
    >> This patch introduces a new mechanism to help the checkpoint/restart
    >> developers. A new function pair: task/process_deny_checkpoint() is
    >> created. When called, these tell the kernel that we *know* that the
    >> process has performed some activity that will keep it from being
    >> properly checkpointed.
    >>
    >> The 'flag' is an atomic_t for now so that we can have some level
    >> of atomicity and make sure to only warn once.
    >>
    >> For now, this is a one-way trip. Once a process is no longer
    >> 'may_checkpoint' capable, neither it nor its children ever will be.
    >> This can, of course, be fixed up in the future. We might want to
    >> reset the flag when a new pid namespace is created, for instance.
    >>

    >
    > Then this patch should be described as:
    >
    > Track in-kernel when we expect checkpoint/restart to fail.
    >
    > By the way, why don't you introduce the reverse operation ?


    I think implementing the reverse operation will be a nightmare, IMHO it
    is safe to say we deny checkpointing for the process life-cycle either
    if the created resource was destroyed before we initiate the checkpoint.

    For example, you create a socket, the process becomes uncheckpointable,
    you close (via sys_close) the socket, you have to track this close to be
    related to the socket which made the process uncheckpointable in order
    to make the operation reversible.

    Let's imagine you implement this reverse operation anyway, you have a
    process which creates a TCP connection, writes data and close the socket
    (so you are again checkpointable), but in the namespace there is the
    orphan socket which is not checkpointable yet and you missed this case.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  6. Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

    Dave Hansen wrote:
    > These are just a few simple examples of things we know we can't
    > checkpoint now. There are plenty more, but this should give
    > everyone an idea how this will look in practice.
    >
    >
    > Signed-off-by: Dave Hansen
    > ---
    >
    > linux-2.6.git-dave/fs/aio.c | 7 +++++++
    > linux-2.6.git-dave/ipc/mqueue.c | 3 +++
    > linux-2.6.git-dave/net/socket.c | 5 +++++
    > 3 files changed, 15 insertions(+)
    >
    > diff -puN fs/aio.c~no-checkpointing-for-sockets fs/aio.c
    > --- linux-2.6.git/fs/aio.c~no-checkpointing-for-sockets 2008-10-09 11:56:58.000000000 -0700
    > +++ linux-2.6.git-dave/fs/aio.c 2008-10-09 11:56:58.000000000 -0700
    > @@ -19,6 +19,7 @@
    >
    > #define DEBUG 0
    >
    > +#include
    > #include
    > #include
    > #include
    > @@ -1663,6 +1664,8 @@ asmlinkage long sys_io_submit(aio_contex
    > if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(*iocbpp)))))
    > return -EFAULT;
    >
    > + process_deny_checkpointing(current);
    > +
    > ctx = lookup_ioctx(ctx_id);
    > if (unlikely(!ctx)) {
    > pr_debug("EINVAL: io_submit: invalid context id\n");
    > @@ -1742,6 +1745,8 @@ asmlinkage long sys_io_cancel(aio_contex
    > if (unlikely(!ctx))
    > return -EINVAL;
    >
    > + process_deny_checkpointing(current);
    > +
    > spin_lock_irq(&ctx->ctx_lock);
    > ret = -EAGAIN;
    > kiocb = lookup_kiocb(ctx, iocb, key);
    > @@ -1796,6 +1801,8 @@ asmlinkage long sys_io_getevents(aio_con
    > struct kioctx *ioctx = lookup_ioctx(ctx_id);
    > long ret = -EINVAL;
    >
    > + process_deny_checkpointing(current);
    > +
    > if (likely(ioctx)) {
    > if (likely(min_nr <= nr && min_nr >= 0 && nr >= 0))
    > ret = read_events(ioctx, min_nr, nr, events, timeout);
    > diff -puN ipc/mqueue.c~no-checkpointing-for-sockets ipc/mqueue.c
    > --- linux-2.6.git/ipc/mqueue.c~no-checkpointing-for-sockets 2008-10-09 11:56:58.000000000 -0700
    > +++ linux-2.6.git-dave/ipc/mqueue.c 2008-10-09 11:56:58.000000000 -0700
    > @@ -14,6 +14,7 @@
    > */
    >
    > #include
    > +#include
    > #include
    > #include
    > #include
    > @@ -655,6 +656,8 @@ asmlinkage long sys_mq_open(const char _
    > char *name;
    > int fd, error;
    >
    > + process_deny_checkpointing(current);
    > +
    > error = audit_mq_open(oflag, mode, u_attr);
    > if (error != 0)
    > return error;
    > diff -puN net/socket.c~no-checkpointing-for-sockets net/socket.c
    > --- linux-2.6.git/net/socket.c~no-checkpointing-for-sockets 2008-10-09 11:56:58.000000000 -0700
    > +++ linux-2.6.git-dave/net/socket.c 2008-10-09 11:56:58.000000000 -0700
    > @@ -87,6 +87,7 @@
    > #include
    > #include
    > #include
    > +#include
    >
    > #include
    > #include
    > @@ -1236,6 +1237,8 @@ asmlinkage long sys_socket(int family, i
    > if (SOCK_NONBLOCK != O_NONBLOCK && (flags & SOCK_NONBLOCK))
    > flags = (flags & ~SOCK_NONBLOCK) | O_NONBLOCK;
    >
    > + process_deny_checkpointing(current);
    > +
    > retval = sock_create(family, type, protocol, &sock);
    > if (retval < 0)
    > goto out;
    > @@ -2130,6 +2133,8 @@ asmlinkage long sys_socketcall(int call,
    > a0 = a[0];
    > a1 = a[1];
    >
    > + process_deny_checkpointing(current);
    > +
    > switch (call) {
    > case SYS_SOCKET:
    > err = sys_socket(a0, a1, a[2]);


    That seems to be a good idea.

    There isn't a risk of a big propagation of this function all around the
    kernel code ? Especially if there are partial support for a specific
    resource ? I mean we are able to checkpoint ipv4/tcp sockets, (yeah I
    like socket examples ) , but not other protocols so we have to remove
    from sys_socket the process_deny_checkpoint and add it to all others
    protocols.




    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  7. Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work

    On Fri, 2008-10-10 at 10:37 +0200, Daniel Lezcano wrote:
    > I think implementing the reverse operation will be a nightmare, IMHO it
    > is safe to say we deny checkpointing for the process life-cycle either
    > if the created resource was destroyed before we initiate the checkpoint.
    >
    > For example, you create a socket, the process becomes uncheckpointable,
    > you close (via sys_close) the socket, you have to track this close to be
    > related to the socket which made the process uncheckpointable in order
    > to make the operation reversible.
    >
    > Let's imagine you implement this reverse operation anyway, you have a
    > process which creates a TCP connection, writes data and close the socket
    > (so you are again checkpointable), but in the namespace there is the
    > orphan socket which is not checkpointable yet and you missed this case.


    That's exactly what I wanted to read... Tracking only is inherently
    flawed. The valid way IMHO implies checks at checkpoint time.

    --
    Gregory Kurz gkurz@fr.ibm.com
    Software Engineer @ IBM/Meiosys http://www.ibm.com
    Tel +33 (0)534 638 479 Fax +33 (0)561 400 420

    "Anarchy is about taking complete responsibility for yourself."
    Alan Moore.

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  8. Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()


    * Dave Hansen wrote:

    > On Thu, 2008-10-09 at 14:43 -0500, Serge E. Hallyn wrote:
    > > Hmm, I don't know too much about aio, but is it possible to succeed with
    > > io_getevents if we didn't first do a submit? It looks like the contexts
    > > are looked up out of current->mm, so I don't think we need this call
    > > here.
    > >
    > > Otherwise, this is neat.

    >
    > Good question. I know nothing, either.
    >
    > My thought was that any process *trying* to do aio stuff of any kind
    > is going to be really confused if it gets checkpointed. Or, it might
    > try to submit an aio right after it checks the list of them. I
    > thought it best to be cautious and say, if you screw with aio, no
    > checkpointing for you!


    as long as there's total transparency and the transition from CR-capable
    to CR-disabled state is absolutely safe and race-free, that should be
    fine.

    I expect users to quickly cause enough pressure to reduce the NOCR areas
    of the kernel significantly ;-)

    In the long run, could we expect a (experimental) version of hibernation
    that would just use this checkpointing facility to hibernate? That would
    be way cool for users and for testing: we could do transparent kernel
    upgrades/downgrades via this form of hibernation, between CR-compatible
    kernels (!).

    distros could mark certain kernels as 'safe fallback' kernels, and if
    say a WARN_ON() or app breakage hits that is suspected to be kernel
    related, the user could hit a 'switch back to safe kernel' button -
    which would switch back _without losing any app state_.

    People could even try new versions of the kernel which would just fall
    back to the known-workin safe kernel if the bootup fails for example.

    Pie in the sky for sure, but way cool: it could propel Linux kernel
    testing to completely new areas - new kernels could be tried
    non-intrusively. (as long as a new kernel does not corrupt the CR data
    structures - so some good consistency and redundancy checking would be
    nice in the format!)

    Ingo
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  9. Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

    > diff -puN ipc/mqueue.c~no-checkpointing-for-sockets ipc/mqueue.c
    > --- linux-2.6.git/ipc/mqueue.c~no-checkpointing-for-sockets 2008-10-09 11:56:58.000000000 -0700
    > +++ linux-2.6.git-dave/ipc/mqueue.c 2008-10-09 11:56:58.000000000 -0700
    > @@ -14,6 +14,7 @@
    > */
    >
    > #include
    > +#include
    > #include
    > #include
    > #include
    > @@ -655,6 +656,8 @@ asmlinkage long sys_mq_open(const char _
    > char *name;
    > int fd, error;
    >
    > + process_deny_checkpointing(current);
    > +


    mqueue being a file system, i would put the checks in the inode_operations.

    Also, you can't always deny ! I would expect some allow in sys_mq_unlink().

    C.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  10. Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work



    Daniel Lezcano wrote:
    > Greg Kurz wrote:
    >> On Thu, 2008-10-09 at 12:04 -0700, Dave Hansen wrote:
    >>> Suggested by Ingo.
    >>>
    >>> Checkpoint/restart is going to be a long effort to get things working.
    >>> We're going to have a lot of things that we know just don't work for
    >>> a long time. That doesn't mean that it will be useless, it just means
    >>> that there's some complicated features that we are going to have to
    >>> work incrementally to fix.
    >>>
    >>> This patch introduces a new mechanism to help the checkpoint/restart
    >>> developers. A new function pair: task/process_deny_checkpoint() is
    >>> created. When called, these tell the kernel that we *know* that the
    >>> process has performed some activity that will keep it from being
    >>> properly checkpointed.
    >>>
    >>> The 'flag' is an atomic_t for now so that we can have some level
    >>> of atomicity and make sure to only warn once.
    >>>
    >>> For now, this is a one-way trip. Once a process is no longer
    >>> 'may_checkpoint' capable, neither it nor its children ever will be.
    >>> This can, of course, be fixed up in the future. We might want to
    >>> reset the flag when a new pid namespace is created, for instance.
    >>>

    >> Then this patch should be described as:
    >>
    >> Track in-kernel when we expect checkpoint/restart to fail.
    >>
    >> By the way, why don't you introduce the reverse operation ?

    >
    > I think implementing the reverse operation will be a nightmare, IMHO it
    > is safe to say we deny checkpointing for the process life-cycle either
    > if the created resource was destroyed before we initiate the checkpoint.
    >
    > For example, you create a socket, the process becomes uncheckpointable,
    > you close (via sys_close) the socket, you have to track this close to be
    > related to the socket which made the process uncheckpointable in order
    > to make the operation reversible.


    I agree that it makes sense to only track transitions in one direction.
    Therefore at any given point in time all we'll know is that the process
    "may be non-checkpointable", instead of the clear-cut "uncheckpointable"
    (webster anyone ?).

    The distinction is important, because it may be that the process is,
    after all, checkpointable, so users/developers could still try to
    perform a checkpoint, should they wish too. The only thing is that
    it is not guaranteed to succeed.

    In fact, one way to transition back to the "checkpointable" state is
    by doing a dry-checkpoint, where no data is saved (/dev/null ?). No
    side effects will occur except for a short downtime due to the freeze
    period. If the dry-checkpoint completes successfully - we can reset
    the non-/un-/not-/a-/dis-checkpointable flag.

    >
    > Let's imagine you implement this reverse operation anyway, you have a
    > process which creates a TCP connection, writes data and close the socket
    > (so you are again checkpointable), but in the namespace there is the
    > orphan socket which is not checkpointable yet and you missed this case.
    > _______________________________________________
    > Containers mailing list
    > Containers@lists.linux-foundation.org
    > https://lists.linux-foundation.org/m...nfo/containers


    Oren.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  11. Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

    Dave Hansen wrote:
    > On Thu, 2008-10-09 at 14:43 -0500, Serge E. Hallyn wrote:
    >> Hmm, I don't know too much about aio, but is it possible to succeed with
    >> io_getevents if we didn't first do a submit? It looks like the contexts
    >> are looked up out of current->mm, so I don't think we need this call
    >> here.
    >>
    >> Otherwise, this is neat.

    >
    > Good question. I know nothing, either.
    >
    > My thought was that any process *trying* to do aio stuff of any kind is
    > going to be really confused if it gets checkpointed. Or, it might try
    > to submit an aio right after it checks the list of them. I thought it
    > best to be cautious and say, if you screw with aio, no checkpointing for
    > you!


    IMO, the right path would be to deny checkpoint if you have aio requests
    pending but if you have not, aio's are not an issue for checkpoint anymore.

    but that's one easy one to track ...

    C.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  12. Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

    On Friday, 10 of October 2008, Ingo Molnar wrote:
    >
    > * Dave Hansen wrote:
    >
    > > On Thu, 2008-10-09 at 14:43 -0500, Serge E. Hallyn wrote:
    > > > Hmm, I don't know too much about aio, but is it possible to succeed with
    > > > io_getevents if we didn't first do a submit? It looks like the contexts
    > > > are looked up out of current->mm, so I don't think we need this call
    > > > here.
    > > >
    > > > Otherwise, this is neat.

    > >
    > > Good question. I know nothing, either.
    > >
    > > My thought was that any process *trying* to do aio stuff of any kind
    > > is going to be really confused if it gets checkpointed. Or, it might
    > > try to submit an aio right after it checks the list of them. I
    > > thought it best to be cautious and say, if you screw with aio, no
    > > checkpointing for you!

    >
    > as long as there's total transparency and the transition from CR-capable
    > to CR-disabled state is absolutely safe and race-free, that should be
    > fine.
    >
    > I expect users to quickly cause enough pressure to reduce the NOCR areas
    > of the kernel significantly ;-)
    >
    > In the long run, could we expect a (experimental) version of hibernation
    > that would just use this checkpointing facility to hibernate?


    Surely not ACPI-compliant.

    Apart from this I don't see why not, but OTOH I'm not particularly interested
    in implementing that.

    Thanks,
    Rafael
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  13. Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()

    Quoting Cedric Le Goater (clg@fr.ibm.com):
    > > diff -puN ipc/mqueue.c~no-checkpointing-for-sockets ipc/mqueue.c
    > > --- linux-2.6.git/ipc/mqueue.c~no-checkpointing-for-sockets 2008-10-09 11:56:58.000000000 -0700
    > > +++ linux-2.6.git-dave/ipc/mqueue.c 2008-10-09 11:56:58.000000000 -0700
    > > @@ -14,6 +14,7 @@
    > > */
    > >
    > > #include
    > > +#include
    > > #include
    > > #include
    > > #include
    > > @@ -655,6 +656,8 @@ asmlinkage long sys_mq_open(const char _
    > > char *name;
    > > int fd, error;
    > >
    > > + process_deny_checkpointing(current);
    > > +

    >
    > mqueue being a file system, i would put the checks in the inode_operations.
    >
    > Also, you can't always deny ! I would expect some allow in sys_mq_unlink().


    Remember a part of Ingo's motivation is to push c/r developers to
    address the lacking features that users use most, earlier. So the
    warnings and subsequent email complaints are what we're after. Hence a
    single 'checkpointable or not' flag.

    Given the single flag, how do you know at sys_mq_unlink() whether the
    process also has an opensocket?

    Rather than make this tracking facility more complicated and intrusive,
    if people complain that they couldn't checkpoint bc of a warning about
    aio, then we implement aio c/r! We don't just try and reduce the amount
    of time that you can't checkpoint bc of lack of aio c/r support

    -serge
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  14. Re: [RFC][PATCH 2/2] first callers of process_deny_checkpoint()


    * Rafael J. Wysocki wrote:

    > > In the long run, could we expect a (experimental) version of
    > > hibernation that would just use this checkpointing facility to
    > > hibernate?

    >
    > Surely not ACPI-compliant.


    what do you mean?

    Ingo
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  15. Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work


    * Daniel Lezcano wrote:

    >> By the way, why don't you introduce the reverse operation ?

    >
    > I think implementing the reverse operation will be a nightmare, IMHO
    > it is safe to say we deny checkpointing for the process life-cycle
    > either if the created resource was destroyed before we initiate the
    > checkpoint.


    it's also a not too interesting case. The end goal is to just be able to
    checkpoint everything that matters - in the long run there simply wont
    be many places that are marked 'cannot checkpoint'.

    So the ability to deny a checkpoint is a transitional feature - a
    flexible CR todo list in essence - but also needed for
    applications/users that want to rely on CR being a dependable facility.

    It would be bad for most of the practical usecases of checkpointing to
    allow the checkpointing of an app, just to see it break on restore due
    to lost context.

    Ingo
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  16. Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work



    Ingo Molnar wrote:
    > * Daniel Lezcano wrote:
    >
    >>> By the way, why don't you introduce the reverse operation ?

    >> I think implementing the reverse operation will be a nightmare, IMHO
    >> it is safe to say we deny checkpointing for the process life-cycle
    >> either if the created resource was destroyed before we initiate the
    >> checkpoint.

    >
    > it's also a not too interesting case. The end goal is to just be able to
    > checkpoint everything that matters - in the long run there simply wont
    > be many places that are marked 'cannot checkpoint'.
    >
    > So the ability to deny a checkpoint is a transitional feature - a
    > flexible CR todo list in essence - but also needed for
    > applications/users that want to rely on CR being a dependable facility.
    >
    > It would be bad for most of the practical usecases of checkpointing to
    > allow the checkpointing of an app, just to see it break on restore due
    > to lost context.


    Actually it need not wait for restore to fail - it can fail during the
    checkpoint, as soon as the unsupported feature is encountered.

    Adding that flag of what you suggest will help make it more vocal and
    obvious that a feature isn't supported, even without the user actually
    trying to take a checkpoint. I like that I idea.

    Oren.

    >
    > Ingo
    > _______________________________________________
    > Containers mailing list
    > Containers@lists.linux-foundation.org
    > https://lists.linux-foundation.org/m...nfo/containers

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  17. Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work


    * Oren Laadan wrote:

    >
    >
    > Ingo Molnar wrote:
    > > * Daniel Lezcano wrote:
    > >
    > >>> By the way, why don't you introduce the reverse operation ?
    > >> I think implementing the reverse operation will be a nightmare, IMHO
    > >> it is safe to say we deny checkpointing for the process life-cycle
    > >> either if the created resource was destroyed before we initiate the
    > >> checkpoint.

    > >
    > > it's also a not too interesting case. The end goal is to just be able to
    > > checkpoint everything that matters - in the long run there simply wont
    > > be many places that are marked 'cannot checkpoint'.
    > >
    > > So the ability to deny a checkpoint is a transitional feature - a
    > > flexible CR todo list in essence - but also needed for
    > > applications/users that want to rely on CR being a dependable facility.
    > >
    > > It would be bad for most of the practical usecases of checkpointing to
    > > allow the checkpointing of an app, just to see it break on restore due
    > > to lost context.

    >
    > Actually it need not wait for restore to fail - it can fail during the
    > checkpoint, as soon as the unsupported feature is encountered.


    correct, that is the idea.

    Ingo
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  18. Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work

    On Fri, 2008-10-10 at 11:17 -0400, Oren Laadan wrote:
    >
    > Ingo Molnar wrote:
    > > * Daniel Lezcano wrote:
    > >
    > >>> By the way, why don't you introduce the reverse operation ?
    > >> I think implementing the reverse operation will be a nightmare, IMHO
    > >> it is safe to say we deny checkpointing for the process life-cycle
    > >> either if the created resource was destroyed before we initiate the
    > >> checkpoint.

    > >
    > > it's also a not too interesting case. The end goal is to just be able to
    > > checkpoint everything that matters - in the long run there simply wont
    > > be many places that are marked 'cannot checkpoint'.
    > >
    > > So the ability to deny a checkpoint is a transitional feature - a
    > > flexible CR todo list in essence - but also needed for
    > > applications/users that want to rely on CR being a dependable facility.
    > >
    > > It would be bad for most of the practical usecases of checkpointing to
    > > allow the checkpointing of an app, just to see it break on restore due
    > > to lost context.

    >
    > Actually it need not wait for restore to fail - it can fail during the
    > checkpoint, as soon as the unsupported feature is encountered.
    >


    Of course, bad things must be spotted at checkpoint time !

    > Adding that flag of what you suggest will help make it more vocal and
    > obvious that a feature isn't supported, even without the user actually
    > trying to take a checkpoint. I like that I idea.
    >


    This flag is weak... testing it gives absolutly no hint whether the
    checkpoint may succeed or not. As it is designed now, a user can only be
    aware that checkpoint is *forever* denied. I agree that it's only useful
    as a "flexible CR todo list".

    In the long run, if there are still things that can prevent checkpoint
    from being consistent, they will have to be checked at checkpoint time.

    Greg.


    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  19. Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work

    On Fri, 2008-10-10 at 18:34 +0200, Greg Kurz wrote:
    > This flag is weak... testing it gives absolutly no hint whether the
    > checkpoint may succeed or not. As it is designed now, a user can only be
    > aware that checkpoint is *forever* denied. I agree that it's only useful
    > as a "flexible CR todo list".


    Cool, so everyone agrees the patch is useful!

    -- Dave

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  20. Re: [RFC][PATCH 1/2] Track in-kernel when we expect checkpoint/restart to work

    On Fri, 2008-10-10 at 10:37 +0200, Daniel Lezcano wrote:
    > For example, you create a socket, the process becomes uncheckpointable,
    > you close (via sys_close) the socket, you have to track this close to be
    > related to the socket which made the process uncheckpointable in order
    > to make the operation reversible.


    Challenging, yes. Not quite a nightmare, though.

    It's basically the same problem we have with r/o bind mounts individual
    users in the kernel need to check some global state. There are
    temporary (like the duration of the syscall) and more chronic (like your
    sockets) users. The hard part is making sure that everything gets
    properly balanced with the chronic ones. But, we did manage to
    accomplish that with the r/o bind mounts. I haven't seen a bug in
    *hours*!

    In any case, we can worry about that later.

    -- Dave

    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread
Page 1 of 2 1 2 LastLast