migration thread and active_load_balance - Kernel



Thread: migration thread and active_load_balance

  1. migration thread and active_load_balance

    Back again with more questions about the scheduler, as I've spent two
    or three days trying to debug on my own and I'm just not getting
    anywhere.

    Basically, I'm trying to add a new active balancing mechanism. I made
    out a diagram of how migration_thread calls active_load_balance and
    so on, and I use a flag (set by writing to a file in sysfs) to
    determine whether to use the standard iterator for the CFS runqueue or
    a different iterator I wrote. The new iterator seems to work fine, as
    I've been using it (again, with a flag) to replace the regular
    iterator when it's called from schedule by idle_balance. I basically
    tried adding an extra conditional in migration_thread that sets up
    some state and then calls active_load_balance, but I was getting
    deadlocks. I'm not really sure why, since all I've really done is
    add a few variables to struct rq and struct cfs_rq.

    I tried only doing my state setup and restore in that conditional,
    without actually calling active_load_balance, which has given me an
    even more frustrating result--the kernel does not deadlock, but it
    does seem to crash in such a manner as to require a hard reset of the
    machine. (For instance, at one point I got an "invalid page state in
    process 'init'" message from the kernel; if I try to reboot from Gnome
    though it hangs.) I don't understand this at all, since as far as I
    can tell I'm using thread-local variables and really all I'm doing
    right now is assignments to them. Unless, of course, the struct rq
    (from rq = cpu_rq(cpu)) is being manipulated elsewhere, leading
    to some sort of race condition...

    Anyway, like I said, I've spent several days trying to understand this
    error by putting in printk()s galore and doing traces through the
    source code to figure out the call chain, but I'm really stuck here.
    Can anybody shed some light, or point me to some more thorough
    documentation on the scheduler and active load balancing?

    Thanks,
    -dan
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: migration thread and active_load_balance

    On 20/04/2008, Dan Upton wrote:
    > Back again with more questions about the scheduler, as I've spent two
    > or three days trying to debug on my own and I'm just not getting
    > anywhere.
    >
    > Basically, I'm trying to add a new active balancing mechanism. I made
    > out a diagram of how migration_thread calls active_load_balance and
    > so on, and I use a flag (set by writing to a file in sysfs) to
    > determine whether to use the standard iterator for the CFS runqueue or
    > a different iterator I wrote. The new iterator seems to work fine, as
    > I've been using it (again, with a flag) to replace the regular
    > iterator when it's called from schedule by idle_balance. I basically
    > tried adding an extra conditional in migration_thread that sets up
    > some state and then calls active_load_balance, but I was getting
    > deadlocks. I'm not really sure why, since all I've really done is
    > add a few variables to struct rq and struct cfs_rq.
    >
    > I tried only doing my state setup and restore in that conditional,
    > without actually calling active_load_balance, which has given me an
    > even more frustrating result--the kernel does not deadlock, but it
    > does seem to crash in such a manner as to require a hard reset of the
    > machine. (For instance, at one point I got an "invalid page state in
    > process 'init'" message from the kernel; if I try to reboot from Gnome
    > though it hangs.) I don't understand this at all, since as far as I
    > can tell I'm using thread-local variables and really all I'm doing
    > right now is assignments to them. Unless, of course, the struct rq
    > (from rq = cpu_rq(cpu)) is being manipulated elsewhere, leading
    > to some sort of race condition...
    >


    Can you post your modifications? It'd be much easier to see what you
    are doing...

    thanks in advance.


    >
    > Thanks,
    > -dan
    >


    --
    Best regards,
    Dmitry Adamushko

  3. Re: migration thread and active_load_balance

    "Dan Upton" writes:

    [not a scheduler expert; just some general comments]

    > I'm using thread-local variables and really all I'm doing
    > right now is assignments to them. Unless, of course, the struct rq
    > (from rq = cpu_rq(cpu)) is being manipulated elsewhere,


    Other CPUs can access it, yes, subject to the lock. You can test that
    theory by running with only one CPU.

    > leading
    > to some sort of race condition...
    >
    > Anyway, like I said, I've spent several days trying to understand this
    > error by putting in printk()s galore and doing traces through the


    That might be obvious, but are you aware that printks inside
    the scheduler can lead to deadlocks? printk when the buffer
    is full calls wake_up and that calls the scheduler. So for
    debugging the scheduler you need some other way to get
    the information out.
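    One common way to get that information out is to write events into a
    preallocated buffer while inside the scheduler and dump it later from a
    safe context. A minimal userspace sketch of that pattern follows;
    dbg_log()/dbg_dump() and the buffer sizes are hypothetical names for
    illustration, not kernel APIs:

```c
#include <stdio.h>
#include <assert.h>

#define DBG_ENTRIES 128
#define DBG_MSGLEN  64

static char dbg_buf[DBG_ENTRIES][DBG_MSGLEN];
static unsigned int dbg_head;   /* next slot to write */

/* Safe to call with runqueue locks held: no allocation, no wakeups,
 * no console output -- just a copy into a static buffer. */
static void dbg_log(const char *msg)
{
    snprintf(dbg_buf[dbg_head % DBG_ENTRIES], DBG_MSGLEN, "%s", msg);
    dbg_head++;
}

/* Called later, from a context where printing is safe, to drain the
 * buffer. Returns the number of entries printed. */
static unsigned int dbg_dump(void)
{
    unsigned int i, n = dbg_head < DBG_ENTRIES ? dbg_head : DBG_ENTRIES;

    for (i = 0; i < n; i++)
        printf("%s\n", dbg_buf[i]);
    return n;
}
```

    The point of the split is that the lock-held path does nothing that can
    re-enter the scheduler; all the printing happens afterwards.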

    -Andi

  4. Re: migration thread and active_load_balance

    > >
    > > Anyway, like I said, I've spent several days trying to understand this
    > > error by putting in printk()s galore and doing traces through the

    >
    > That might be obvious, but are you aware that printks inside
    > the scheduler can lead to deadlocks? printk when the buffer
    > is full calls wake_up and that calls the scheduler. So for
    > debugging the scheduler you need some other way to get
    > the information out.
    >
    > -Andi
    >


    Yes, I've read that printk can cause deadlocks, although I get the
    deadlocks without printks as well. Also, I found a modification on
    kerneltrap (http://kerneltrap.org/mailarchive/li...08/1/23/595569)
    that seems to do a decent job of preventing deadlocks due to printk, at
    least in my case (I see in the comments on that thread, though, that
    it's not perfect).

    -dan

  5. Re: migration thread and active_load_balance

    On Sun, Apr 20, 2008 at 5:26 PM, Dmitry Adamushko
    wrote:
    >
    > On 20/04/2008, Dan Upton wrote:
    > > [ ... ]

    >
    > Can you post your modifications? It'd be much easier to see what you
    > are doing...
    >
    > thanks in advance.
    >


    OK, here we go-- I've pretty much copied and pasted it over directly,
    which includes all of my comments and ruminations.

    The new iterator:

    static struct task_struct *__load_balance_therm_iterator(struct cfs_rq *cfs_rq,
                                                             struct rb_node *curr)
    {
        /* local info */
        int indexctr = 0, retindex = 0;
        struct task_struct *p_tmp, *p_ret = NULL; /* NULL if nothing cooler is found */
        struct rb_node *iter = curr;
        int lowest_temp = cfs_rq->last_therm_balance_temp;
        int last_index = cfs_rq->last_therm_balance_index;

        if (!curr)
            return NULL;

        /*
         * If last_therm_balance_index is -1, then this is being called from
         * load_balance_start_therm, so we can just look through the whole
         * runqueue to find something cooler without worrying about whether
         * we've already tried it.
         */
        if (last_index == -1) {
            while (iter) {
                p_tmp = rb_entry(iter, struct task_struct, se.run_node);
                if (p_tmp->lasttemp < lowest_temp) {
                    p_ret = p_tmp;
                    lowest_temp = p_tmp->lasttemp;
                    retindex = indexctr;
                }
                iter = rb_next(iter);
                indexctr++;
            }
        } else {
            /*
             * Otherwise, we want to look for
             *  - a process of equal temperature further down the queue
             *  - the next-lowest temperature
             */
            int second_lowest_temp = 100; /* see below */

            /*
             * So first we look through for the next entry with the same
             * temperature. The comments on __load_balance_iterator suggest
             * dequeues can happen despite the lock being held, but I'm
             * assuming queueing can't happen, so we don't have to worry about
             * new, lower-temperatured processes magically appearing. This
             * assumption simplifies the search for next-coolest tasks.
             */
            while (iter) {
                p_tmp = rb_entry(iter, struct task_struct, se.run_node);
                if (p_tmp->lasttemp <= lowest_temp && indexctr > last_index) {
                    /*
                     * We're just looking for the next one down the line, and
                     * it looks like we've found it, so we update cfs_rq stats
                     * and return from here.
                     */
                    cfs_rq->last_therm_balance_temp = p_tmp->lasttemp;
                    cfs_rq->last_therm_balance_index = indexctr;
                    return p_tmp;
                } else if (p_tmp->lasttemp > lowest_temp &&
                           p_tmp->lasttemp < second_lowest_temp) {
                    second_lowest_temp = p_tmp->lasttemp;
                }
                indexctr++;
                iter = rb_next(iter);
            }
            /*
             * If we get here, it means we wandered off the end of the runqueue
             * without finding anything else with the same lowest temperature.
             * However, we know now what the second lowest temperature of the
             * runqueue is (second_lowest_temp as calculated above), so we can
             * just look for the first task with that temp (or, again, lower,
             * in case something would change out from underneath us).
             *
             * This makes use of the above assumption that tasks can only be
             * dequeued but not enqueued.
             */
            iter = curr; /* reset the iterator */
            indexctr = 0;
            while (iter) {
                p_tmp = rb_entry(iter, struct task_struct, se.run_node);
                if (p_tmp->lasttemp == second_lowest_temp) {
                    /* found something: update the stats and return it */
                    cfs_rq->last_therm_balance_temp = p_tmp->lasttemp;
                    cfs_rq->last_therm_balance_index = indexctr;
                    return p_tmp;
                }
                indexctr++;
                iter = rb_next(iter);
            }
        }

        /* update stats in case we come back here */
        cfs_rq->last_therm_balance_temp = lowest_temp;
        cfs_rq->last_therm_balance_index = retindex;
        return p_ret;
    }

    static struct task_struct *load_balance_start_therm(void *arg)
    {
        struct cfs_rq *cfs_rq = arg;

        cfs_rq->last_therm_balance_index = -1;
        cfs_rq->last_therm_balance_temp = 100;
        return __load_balance_therm_iterator(cfs_rq, first_fair(cfs_rq));
    }

    static struct task_struct *load_balance_next_therm(void *arg)
    {
        struct cfs_rq *cfs_rq = arg;

        return __load_balance_therm_iterator(cfs_rq, first_fair(cfs_rq));
    }
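    For what it's worth, the start/next protocol the iterator implements can
    be modeled in plain C over an array, iterating entries in (temperature,
    queue position) order. This is a simplified sketch that sidesteps the
    rbtree and the concurrent-dequeue question entirely; therm_start(),
    therm_next(), and model_temps are hypothetical names, not the kernel
    code:

```c
/* Cursor: the (temperature, index) of the last entry returned. */
static int last_temp = -1;
static int last_index = -1;

static int model_temps[] = { 55, 48, 48, 71 };

/* Return the index of the next entry strictly after the cursor in
 * (temp, index) order, or -1 when the queue is exhausted. */
static int therm_next(const int *t, int n)
{
    int i, best = -1;

    for (i = 0; i < n; i++) {
        /* candidate iff strictly after the cursor */
        if (t[i] > last_temp || (t[i] == last_temp && i > last_index)) {
            if (best == -1 || t[i] < t[best] ||
                (t[i] == t[best] && i < best))
                best = i;
        }
    }
    if (best >= 0) {
        last_temp = t[best];
        last_index = best;
    }
    return best;
}

/* Reset the cursor to "before everything" and return the coolest entry. */
static int therm_start(const int *t, int n)
{
    last_temp = -1;
    last_index = -1;
    return therm_next(t, n);
}
```

    One thing the model makes visible: the kernel version restarts from
    first_fair() on every call and relies on the saved (temp, index) pair to
    make progress, so any concurrent dequeue shifts the indices and can make
    the iterator skip or repeat tasks.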


    The migration thread:

    static int migration_thread(void *data)
    {
        int cpu = (long)data;
        struct rq *rq;
        static int dbgflag = 1;
        int save_ab = 0, save_push = 0;
        int coolest = !cpu;
        int crash_likely = 0;

        rq = cpu_rq(cpu);
        BUG_ON(rq->migration_thread != current);

        set_current_state(TASK_INTERRUPTIBLE);
        while (!kthread_should_stop()) {
            struct migration_req *req;
            struct list_head *head;

            spin_lock_irq(&rq->lock);

            if (cpu_is_offline(cpu)) {
                spin_unlock_irq(&rq->lock);
                goto wait_to_die;
            }

            /* other stuff here too, like checking the cpu temp */
            if (sched_thermal == 3) {
                coolest = find_coolest_cpu(NULL);
                /* if there's somewhere cooler to push stuff */
                if (coolest != cpu) {
                    rq->from_active_balance = 1;
                    save_ab = rq->active_balance;
                    save_push = rq->push_cpu;
                    rq->push_cpu = coolest;

                    //active_load_balance(rq, cpu);
                    rq->from_active_balance = 0;
                    rq->active_balance = save_ab;
                    rq->push_cpu = save_push;
                    crash_likely = 1;
                }
            }
            /*
             * Is it possible this could undo any work we just did? Or maybe
             * we could cause a bug if this was going to be called because it
             * was the busiest proc, and now it isn't?
             */
            if (rq->active_balance) {
                active_load_balance(rq, cpu);
                rq->active_balance = 0;
            }

            head = &rq->migration_queue;

            if (list_empty(head)) {
                spin_unlock_irq(&rq->lock);
                schedule();
                set_current_state(TASK_INTERRUPTIBLE);
                continue;
            }
            req = list_entry(head->next, struct migration_req, list);
            list_del_init(head->next);

            spin_unlock(&rq->lock);
            __migrate_task(req->task, cpu, req->dest_cpu);
            local_irq_enable();

            complete(&req->done);
        }
        __set_current_state(TASK_RUNNING);
        return 0;

    wait_to_die:
        /* Wait for kthread_stop */
        set_current_state(TASK_INTERRUPTIBLE);
        while (!kthread_should_stop()) {
            schedule();
            set_current_state(TASK_INTERRUPTIBLE);
        }
        __set_current_state(TASK_RUNNING);
        return 0;
    }

    The function find_coolest_cpu(NULL) basically duplicates the
    functionality of the coretemp driver to poll all of the cores' thermal
    sensors to find which one is the coolest. The other change I made is
    in move_one_task_fair, which checks if(sched_thermal==3 &&
    busiest->from_active_balance==1) to decide whether to use the new
    iterator. When it crashes, this is what I get:

    kernel BUG at kernel/sched.c:2103
    invalid opcode: 0000 [1] SMP
    CPU 0
    Modules linked in:
    Pid: 3, comm: migration/0 Not tainted 2.6.24.3 #135
    RIP: 0010:[] double_rq_lock+0x14/0x4d

    stack:
    migration_thread+0x3db/0x3af
    migration_thread+0x0/0x3af
    kthread+0x3d/0x63
    child_rip+0xa/0x12
    kthread+0x0/0x63
    child_rip+0x0/0x12

    The basic functionality of this is supposed to be that if
    sched_thermal==3 (and eventually, if the core is above a certain
    temperature), we look for the coolest core and try to push stuff off
    onto it.
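    For reference, the selection that find_coolest_cpu() is supposed to
    perform (scan every core's last sampled temperature, return the coolest
    core's id) can be modeled in plain C. The array and names here are
    hypothetical stand-ins for the coretemp-style MSR reads, which in the
    kernel would go through rdmsr_on_cpu():

```c
#define NR_MODEL_CPUS 4

/* One last-sampled temperature per core, in degrees C. */
static int model_temps[NR_MODEL_CPUS] = { 62, 48, 55, 71 };

/* Return the index of the core with the lowest sampled temperature. */
static int find_coolest_cpu_model(const int *temps, int ncpus)
{
    int cpu, coolest = 0;

    for (cpu = 1; cpu < ncpus; cpu++)
        if (temps[cpu] < temps[coolest])
            coolest = cpu;
    return coolest;
}
```

    The model is trivial; the interesting part in the kernel is *where* the
    sensor reads happen, since (as the thread goes on to show) reading an
    MSR on the local CPU can change interrupt state.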

    If you can suggest what's causing my crash that'd be great; if it
    looks like I'm approaching the implementation all wrong, I'd be happy
    for any pointers along those lines too.

    -dan

  6. Re: migration thread and active_load_balance

    On 21/04/2008, Dan Upton wrote:
    > [ ... ]
    >
    > kernel BUG at kernel/sched.c:2103


    and what's this line in your patched sched.c?

    is it -- BUG_ON(!irqs_disabled()); ?

    anything in your unposted code (e.g. find_coolest_cpu()) that might
    re-enable the interrupts before __migrate_task() is called?

    If you post your modifications as a patch
    (Documentation/applying-patches.txt) that contains _all_ relevant
    modifications, it'd be easier to guess what's wrong.


    >
    > -dan
    >


    --
    Best regards,
    Dmitry Adamushko

  7. Re: migration thread and active_load_balance

    On 21/04/2008, Dan Upton wrote:
    > On Mon, Apr 21, 2008 at 7:03 AM, Dmitry Adamushko
    >
    > wrote:
    >
    > > On 21/04/2008, Dan Upton wrote:
    > > > [ ... ]

    > >
    > > >
    > > > kernel BUG at kernel/sched.c:2103

    > >
    > > and what's this line in your patched sched.c?
    > >
    > > is it -- BUG_ON(!irqs_disabled()); ?
    > >
    > > anything in your unposted code (e.g. find_coolest_cpu()) that might
    > > re-enable the interrupts before __migrate_task() is called?
    > >
    > > If you post your modifications as a patch
    > > (Documentation/applying-patches.txt) that contains _all_ relevant
    > > modifications, it'd be easier to guess what's wrong.

    >
    >
    > Yes, that's the line. I don't recall ever reenabling interrupts,


    migration_thread() -> find_coolest_cpu() -> get_temperature() ->
    rdmsr_on_cpu() -> [ if your configuration is SMP ] ->
    smp_call_function_single() ->

    (arch/x86/kernel/smpcommon.c)
    ....
        if (cpu == me) {
            local_irq_disable();
            func(info);
            local_irq_enable();    <----------- REENABLES the interrupts
            put_cpu();
            return 0;
        }
    ....

    as a result, __migrate_task() -> double_rq_lock() -> BUG_ON(!irqs_disabled())
    gives you an "oops".
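    The shape of the bug is a helper that unconditionally re-enables
    interrupts underneath a caller that assumed they stayed disabled. A
    minimal userspace model of both the buggy pattern and the save/restore
    discipline that avoids it; irqs_on and every name below are hypothetical
    stand-ins, not kernel APIs:

```c
/* Stand-in for the CPU's interrupt-enable flag. */
static int irqs_on = 1;

static void local_irq_disable_model(void) { irqs_on = 0; }
static void local_irq_enable_model(void)  { irqs_on = 1; }

static void nop(void) { }

/* Buggy pattern, as in the same-CPU path of smp_call_function_single():
 * interrupts come back on no matter what the caller had. */
static void helper_buggy(void (*func)(void))
{
    local_irq_disable_model();
    func();
    local_irq_enable_model();   /* unconditionally re-enables */
}

/* Safer pattern: save the caller's state and restore exactly that,
 * modeling local_irq_save()/local_irq_restore(). */
static void helper_save_restore(void (*func)(void))
{
    int saved = irqs_on;        /* models local_irq_save() */

    local_irq_disable_model();
    func();
    irqs_on = saved;            /* models local_irq_restore() */
}
```

    In the thread's context, the simplest cure may be to do the temperature
    sampling before taking rq->lock at all, so nothing in the lock-held
    section can touch interrupt state.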


    >
    > -dan
    >


    --
    Best regards,
    Dmitry Adamushko
