latest -git: kernel BUG at arch/x86/kernel/microcode.c:142! - Kernel

This is a discussion on latest -git: kernel BUG at arch/x86/kernel/microcode.c:142! - Kernel ; Hi, I just got this when doing CPU hotplug: ------------[ cut here ]------------ kernel BUG at arch/x86/kernel/microcode.c:142! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Pid: 4140, comm: bash Not tainted (2.6.26-06371-g338b9bb-dirty #14) EIP: 0060:[ ] EFLAGS: 00210202 CPU: 0 EIP ...

+ Reply to Thread
Results 1 to 4 of 4

Thread: latest -git: kernel BUG at arch/x86/kernel/microcode.c:142!

  1. latest -git: kernel BUG at arch/x86/kernel/microcode.c:142!

    Hi,

    I just got this when doing CPU hotplug:

    ------------[ cut here ]------------
    kernel BUG at arch/x86/kernel/microcode.c:142!
    invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC

    Pid: 4140, comm: bash Not tainted (2.6.26-06371-g338b9bb-dirty #14)
    EIP: 0060:[] EFLAGS: 00210202 CPU: 0
    EIP is at __mc_sysdev_add+0x1ee/0x200
    EAX: 00000000 EBX: c1f61028 ECX: 01798000 EDX: c081ac80
    ESI: 00000001 EDI: 00000001 EBP: f5bcbe24 ESP: f5bcbdcc
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    Process bash (pid: 4140, ti=f5bca000 task=f4066f90 task.ti=f5bca000)
    Stack: 00000000 f5bcbe24 c028300b 00000001 000000d0 c06d8dc3 f73f77d0 00000000
    00000000 00000014 00000000 00000000 c0829254 f4f0fa00 f6e950f0 00200282
    f6d5180c 00000002 00000003 00000002 00000001 c1f61028 f5bcbe2c c0117f3a
    Call Trace:
    [] ? kobject_uevent_env+0xdb/0x380
    [] ? mc_sysdev_add+0xa/0x10
    [] ? mc_cpu_callback+0x1ea/0x240
    [] ? notifier_call_chain+0x37/0x70
    [] ? __raw_notifier_call_chain+0x19/0x20
    [] ? raw_notifier_call_chain+0x1a/0x20
    [] ? _cpu_up+0xa7/0x100
    [] ? cpu_up+0x49/0x80
    [] ? store_online+0x58/0x80
    [] ? store_online+0x0/0x80
    [] ? sysdev_store+0x2c/0x40
    [] ? sysfs_write_file+0xa2/0x100
    [] ? vfs_write+0x96/0x130
    [] ? sysfs_write_file+0x0/0x100
    [] ? sys_write+0x3d/0x70
    [] ? sysenter_do_call+0x12/0x3f
    =======================
    Code: 4d d8 c7 01 00 00 00 00 b8 00 1a 6f c0 e8 fb 46 47 00 8d 55 f0
    64 a1 00 90 7c c0 e8 0d 75 01 00 8b 45 d4 83 c4 4c 5b 5e 5f 5d c3 <0f>
    0b eb fe 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 31 d2
    EIP: [] __mc_sysdev_add+0x1ee/0x200 SS:ESP 0068:f5bcbdcc
    ---[ end trace 8c86c730d90bf362 ]---

    It's this one:

    /* We should bind the task to the CPU */
    BUG_ON(raw_smp_processor_id() != cpu_num);

    Maybe related to recently merged per-cpu changes? (Yesterday's tests ran fine.)

    It seems 100% reproducible, so I'll start bisecting it.


    Vegard

    --
    "The animistic metaphor of the bug that maliciously sneaked in while
    the programmer was not looking is intellectually dishonest as it
    disguises that the error is the programmer's own creation."
    -- E. W. Dijkstra, EWD1036
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: latest -git: kernel BUG at arch/x86/kernel/microcode.c:142!

    On Thu, Jul 24, 2008 at 12:48 PM, Vegard Nossum wrote:
    > Hi,
    >
    > I just got this when doing CPU hotplug:
    >
    > ------------[ cut here ]------------
    > kernel BUG at arch/x86/kernel/microcode.c:142!
    > invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    >
    > Pid: 4140, comm: bash Not tainted (2.6.26-06371-g338b9bb-dirty #14)
    > EIP: 0060:[] EFLAGS: 00210202 CPU: 0
    > EIP is at __mc_sysdev_add+0x1ee/0x200
    > EAX: 00000000 EBX: c1f61028 ECX: 01798000 EDX: c081ac80
    > ESI: 00000001 EDI: 00000001 EBP: f5bcbe24 ESP: f5bcbdcc
    > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    > Process bash (pid: 4140, ti=f5bca000 task=f4066f90 task.ti=f5bca000)
    > Stack: 00000000 f5bcbe24 c028300b 00000001 000000d0 c06d8dc3 f73f77d0 00000000
    > 00000000 00000014 00000000 00000000 c0829254 f4f0fa00 f6e950f0 00200282
    > f6d5180c 00000002 00000003 00000002 00000001 c1f61028 f5bcbe2c c0117f3a
    > Call Trace:
    > [] ? kobject_uevent_env+0xdb/0x380
    > [] ? mc_sysdev_add+0xa/0x10
    > [] ? mc_cpu_callback+0x1ea/0x240
    > [] ? notifier_call_chain+0x37/0x70
    > [] ? __raw_notifier_call_chain+0x19/0x20
    > [] ? raw_notifier_call_chain+0x1a/0x20
    > [] ? _cpu_up+0xa7/0x100
    > [] ? cpu_up+0x49/0x80
    > [] ? store_online+0x58/0x80
    > [] ? store_online+0x0/0x80
    > [] ? sysdev_store+0x2c/0x40
    > [] ? sysfs_write_file+0xa2/0x100
    > [] ? vfs_write+0x96/0x130
    > [] ? sysfs_write_file+0x0/0x100
    > [] ? sys_write+0x3d/0x70
    > [] ? sysenter_do_call+0x12/0x3f
    > =======================
    > Code: 4d d8 c7 01 00 00 00 00 b8 00 1a 6f c0 e8 fb 46 47 00 8d 55 f0
    > 64 a1 00 90 7c c0 e8 0d 75 01 00 8b 45 d4 83 c4 4c 5b 5e 5f 5d c3 <0f>
    > 0b eb fe 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 31 d2
    > EIP: [] __mc_sysdev_add+0x1ee/0x200 SS:ESP 0068:f5bcbdcc
    > ---[ end trace 8c86c730d90bf362 ]---
    >
    > It's this one:
    >
    > /* We should bind the task to the CPU */
    > BUG_ON(raw_smp_processor_id() != cpu_num);
    >
    > Maybe related to recently merged per-cpu changes? (Yesterday's tests ran fine.)
    >
    > It seems 100% reproducible, so I'll start bisecting it.


    Ahha, after many hours of hitting various unrelated crashes,
    miscompiles, etc. I finally arrive at this commit:

    commit e761b7725234276a802322549cee5255305a0930
    Author: Max Krasnyansky
    Date: Tue Jul 15 04:43:49 2008 -0700

    cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment
    (take 2)

    This is based on Linus' idea of creating cpu_active_map that prevents
    scheduler load balancer from migrating tasks to the cpu that is going
    down.

    It allows us to simplify domain management code and avoid unecessary
    domain rebuilds during cpu hotplug event handling.

    Please ignore the cpusets part for now. It needs some more work in order
    to avoid crazy lock nesting. Although I did simplfy and unify domain
    reinitialization logic. We now simply call partition_sched_domains() in
    all the cases. This means that we're using exact same code paths as in
    cpusets case and hence the test below cover cpusets too.
    Cpuset changes to make rebuild_sched_domains() callable from various
    contexts are in the separate patch (right next after this one).

    This not only boots but also easily handles
    while true; do make clean; make -j 8; done
    and
    while true; do on-off-cpu 1; done
    at the same time.
    (on-off-cpu 1 simple does echo 0/1 > /sys/.../cpu1/online thing).

    Suprisingly the box (dual-core Core2) is quite usable. In fact I'm typing
    this on right now in gnome-terminal and things are moving just fine.

    Also this is running with most of the debug features enabled (lockdep,
    mutex, etc) no BUG_ONs or lockdep complaints so far.

    I believe I addressed all of the Dmitry's comments for original Linus'
    version. I changed both fair and rt balancer to mask out non-active cpus.
    And replaced cpu_is_offline() with !cpu_active() in the main scheduler
    code where it made sense (to me).

    Signed-off-by: Max Krasnyanskiy
    Acked-by: Linus Torvalds
    Acked-by: Peter Zijlstra
    Acked-by: Gregory Haskins
    Cc: dmitry.adamushko@gmail.com
    Cc: pj@sgi.com
    Signed-off-by: Ingo Molnar

    ....I added everybody to Cc.


    Vegard

    --
    "The animistic metaphor of the bug that maliciously sneaked in while
    the programmer was not looking is intellectually dishonest as it
    disguises that the error is the programmer's own creation."
    -- E. W. Dijkstra, EWD1036
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  3. Re: latest -git: kernel BUG at arch/x86/kernel/microcode.c:142!

    2008/7/24 Vegard Nossum :
    > On Thu, Jul 24, 2008 at 12:48 PM, Vegard Nossum wrote:
    >> Hi,
    >>
    >> I just got this when doing CPU hotplug:
    >>
    >> ------------[ cut here ]------------
    >> kernel BUG at arch/x86/kernel/microcode.c:142!
    >> invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    >>
    >> Pid: 4140, comm: bash Not tainted (2.6.26-06371-g338b9bb-dirty #14)
    >> EIP: 0060:[] EFLAGS: 00210202 CPU: 0
    >> EIP is at __mc_sysdev_add+0x1ee/0x200
    >> EAX: 00000000 EBX: c1f61028 ECX: 01798000 EDX: c081ac80
    >> ESI: 00000001 EDI: 00000001 EBP: f5bcbe24 ESP: f5bcbdcc
    >> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    >> Process bash (pid: 4140, ti=f5bca000 task=f4066f90 task.ti=f5bca000)
    >> Stack: 00000000 f5bcbe24 c028300b 00000001 000000d0 c06d8dc3 f73f77d0 00000000
    >> 00000000 00000014 00000000 00000000 c0829254 f4f0fa00 f6e950f0 00200282
    >> f6d5180c 00000002 00000003 00000002 00000001 c1f61028 f5bcbe2c c0117f3a
    >> Call Trace:
    >> [] ? kobject_uevent_env+0xdb/0x380
    >> [] ? mc_sysdev_add+0xa/0x10
    >> [] ? mc_cpu_callback+0x1ea/0x240
    >> [] ? notifier_call_chain+0x37/0x70
    >> [] ? __raw_notifier_call_chain+0x19/0x20
    >> [] ? raw_notifier_call_chain+0x1a/0x20
    >> [] ? _cpu_up+0xa7/0x100
    >> [] ? cpu_up+0x49/0x80
    >> [] ? store_online+0x58/0x80
    >> [] ? store_online+0x0/0x80
    >> [] ? sysdev_store+0x2c/0x40
    >> [] ? sysfs_write_file+0xa2/0x100
    >> [] ? vfs_write+0x96/0x130
    >> [] ? sysfs_write_file+0x0/0x100
    >> [] ? sys_write+0x3d/0x70
    >> [] ? sysenter_do_call+0x12/0x3f
    >> =======================
    >> Code: 4d d8 c7 01 00 00 00 00 b8 00 1a 6f c0 e8 fb 46 47 00 8d 55 f0
    >> 64 a1 00 90 7c c0 e8 0d 75 01 00 8b 45 d4 83 c4 4c 5b 5e 5f 5d c3 <0f>
    >> 0b eb fe 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 31 d2
    >> EIP: [] __mc_sysdev_add+0x1ee/0x200 SS:ESP 0068:f5bcbdcc
    >> ---[ end trace 8c86c730d90bf362 ]---
    >>
    >> It's this one:
    >>
    >> /* We should bind the task to the CPU */
    >> BUG_ON(raw_smp_processor_id() != cpu_num);
    >>
    >> Maybe related to recently merged per-cpu changes? (Yesterday's tests ran fine.)
    >>
    >> It seems 100% reproducible, so I'll start bisecting it.

    >
    > Ahha, after many hours of hitting various unrelated crashes,
    > miscompiles, etc. I finally arrive at this commit:
    >
    > commit e761b7725234276a802322549cee5255305a0930
    > Author: Max Krasnyansky
    > Date: Tue Jul 15 04:43:49 2008 -0700


    Yeah, there seems to be a funny situation here :-) I'd expect it to be
    100% reproduceable with CONFIG_MICROCODE=y.

    cpu_up() -> raw_notifier_call_chain(CPU_ONLINE, ...) ->

    (microcode's part)

    mc_cpu_callback() -> mc_sysdev_add() -> microcode_init_cpu()

    and here we have:

    set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
    mutex_lock(&microcode_mutex);
    collect_cpu_info(cpu);

    this code expects that after set_cpus_allowed_ptr() has been
    completed, it will continue running on "cpu"

    that's why BUG_ON(raw_smp_processor_id() != cpu_num);

    the funny thing is that (1) it doesn't check for an error (otherwise
    it would see an error)
    and (2) cpu_active_map does _not_ yet have a bit for 'cpu' at this moment.

    so migrate_task() will forward a migration request to migration_thread
    (because 'current' is on-the-queue/running at this point and we can't
    migrate it immediatelly -- current gets blocked inside migrate_task()
    waiting for request's completion)

    it all will end up in migration_thread() -> __migrate_task()
    which does a test for cpu_active(dest_cpu) and bails out.

    summary, with cpu_active_map as it's being used now this microcode's
    scheme (the fact that it expects to be migrated onto 'cpu' while its
    cpu_up(cpu) is not completely finished) doesn't work.

    note, I've only taken a quick look so I don't make any judgements,
    (good-bad)design-wise. But it's quite a funny use-case of
    cpu-hotplug-notifications and CPU_ONLINE in particular :-)


    --
    Best regards,
    Dmitry Adamushko
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  4. references:in-reply-to:content-type:

    Dmitry Adamushko wrote:
    > 2008/7/24 Dmitry Adamushko :
    >> 2008/7/24 Vegard Nossum :
    >>>> It's this one:
    >>>>
    >>>> /* We should bind the task to the CPU */
    >>>> BUG_ON(raw_smp_processor_id() != cpu_num);
    >>>>
    >>>> Maybe related to recently merged per-cpu changes? (Yesterday's tests ran fine.)
    >>>>
    >>>> It seems 100% reproducible, so I'll start bisecting it.
    >>> Ahha, after many hours of hitting various unrelated crashes,
    >>> miscompiles, etc. I finally arrive at this commit:
    >>>
    >>> commit e761b7725234276a802322549cee5255305a0930
    >>> Author: Max Krasnyansky
    >>> Date: Tue Jul 15 04:43:49 2008 -0700

    >> Yeah, there seems to be a funny situation here :-) I'd expect it to be
    >> 100% reproduceable with CONFIG_MICROCODE=y.
    >>
    >> cpu_up() -> raw_notifier_call_chain(CPU_ONLINE, ...) ->
    >>
    >> (microcode's part)
    >>
    >> mc_cpu_callback() -> mc_sysdev_add() -> microcode_init_cpu()
    >>
    >> and here we have:
    >>
    >> set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));

    >
    > btw., this is obviously bad behavior. This code plays with
    > "cpus_allowed" (changes and then restores it) of pretty arbitrary
    > tasks in context of which it happens to run. So it may race with
    > sched_setaffinity() and negate its effect.


    Agree. I came to the similar conclusion.
    The solution is to either convert it to schedule_delayed_work_on() or if it's
    important to update the microcode synchronously we can the whole thing to do
    something like
    smp_call_function_single(cpu, collect_cpu_info);
    if (needs_update) {
    request_firmware(...);
    smp_call_function_single(cpu, update_cpu_microcode);
    }

    Tigran, do we need sync update inside the hotplug handler or async update via
    workqueue is fine ?

    Max


    Max







    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

+ Reply to Thread