2.6.28-rc3-git6: Reported regressions from 2.6.27 - Kernel



Thread: 2.6.28-rc3-git6: Reported regressions from 2.6.27

  1. Re: [Bug #11989] Suspend failure on NForce4-based boards due to changes in stop_machine

    On Tue, Nov 11, 2008 at 05:14:01PM +0100, Heiko Carstens wrote:
    > > > Could you please apply the following debug patch (due to Jiangshan and
    > > > myself)? Then you should be able to build with CONFIG_RCU_TRACE,
    > > > then mount debugfs after boot, for example, on /debug. This will
    > > > create a /debug/rcu directory with three files, "rcucb", "rcu_data",
    > > > and "rcu_bh_data". Since you are still able to log in, could you
    > > > please send the contents of these three files?
    > > >
    > > > Thanx, Paul
    > >
    > > This time with the patch actually attached... Thanks to Peter Z.
    > > for alerting me to my omission.
    >
    > Well, your patch doesn't apply on git head. However I used preemptible
    > RCU instead and had tracing enabled.

    Were you using preemptible RCU earlier as well? Rafael was using
    classic RCU. Don't get me wrong, all problems need fixing, just trying
    to make sure I understand where the problems are occurring.

    > This is the output of the three files after it stalled (and continued,
    > because I caused an interrupt by sending a network packet) twice:
    >
    > [root@h0545001 rcu]# cat rcuctrs
    > CPU last cur F M
    > 1 0 0 1 1
    > 3 0 0 1 1
    > 4 0 0 0 0
    > 5 0 0 0 1
    > 6 0 0 0 0
    > ggp = 1640, state = waitack
    >
    > [root@h0545001 rcu]# cat rcugp
    > oldggp=1652 newggp=1655
    >
    > [root@h0545001 rcu]# cat rcustats
    > na=33948 nl=3 wa=33945 wl=0 da=33945 dl=0 dr=33945 di=0
    > 1=0 e1=0 i1=1674 ie1=4 g1=1670 a1=1920 ae1=251 a2=1669
    > z1=1669 ze1=0 z2=1669 m1=4411 me1=2742 m2=1669


    This hang also involved synchronize_sched()? Or synchronize_rcu()?

    The reason I ask is that the above stats are for the synchronize_rcu()
    rather than synchronize_sched().

    Thanx, Paul
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/
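The rcustats output quoted in the message above is a run of whitespace-separated key=value counters. As a minimal user-space sketch for comparing such dumps between runs, the helper below looks up one counter by name. The thread does not spell out what the individual counters mean, so the sketch treats them as opaque; the function name `stat_lookup` is hypothetical and not part of any kernel interface.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Look up `key` in a line of whitespace-separated key=value tokens,
 * such as the rcustats lines quoted above.
 * Returns 1 and stores the number in *value if the key is present,
 * 0 otherwise. Keys are matched only at the start of a token, so
 * looking up "e1" does not match the token "ie1=4".
 */
int stat_lookup(const char *line, const char *key, long *value)
{
    size_t klen = strlen(key);
    const char *p = line;

    while (*p) {
        /* Skip separators to reach the start of the next token. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        if (!*p)
            break;
        if (strncmp(p, key, klen) == 0 && p[klen] == '=') {
            *value = strtol(p + klen + 1, NULL, 10);
            return 1;
        }
        /* Not the requested key: skip to the end of this token. */
        while (*p && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
    }
    return 0;
}
```

Called on the first quoted stats line with the key "wa", the helper stores 33945; asking for a key that is not present returns 0 and leaves the output untouched.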

  2. Re: Q: force_quiescent_state && cpu_online_map

    On Tue, Nov 11, 2008 at 06:03:27PM +0100, Oleg Nesterov wrote:
    > I don't think this matters, but still...
    >
    > force_quiescent_state:
    >
    > * cpu_online_map is updated by the _cpu_down()
    > * using __stop_machine(). Since we're in irqs disabled
    > * section, __stop_machine() is not exectuting, hence
    > * the cpu_online_map is stable.
    > *
    > * However, a cpu might have been offlined _just_ before
    > * we disabled irqs while entering here.
    > * And rcu subsystem might not yet have handled the CPU_DEAD
    > * notification, leading to the offlined cpu's bit
    > * being set in the rcp->cpumask.
    > *
    > * Hence cpumask = (rcp->cpumask & cpu_online_map) to prevent
    > * sending smp_reschedule() to an offlined CPU.
    > */
    > cpus_and(cpumask, rcp->cpumask, cpu_online_map);
    > cpu_clear(rdp->cpu, cpumask);
    > for_each_cpu_mask_nr(cpu, cpumask)
    > smp_send_reschedule(cpu);
    >
    > However,
    >
    > // called by __stop_machine take_cpu_down()
    > arch/x86/kernel/smpboot.c:cpu_disable_common()
    >
    > /*
    > * HACK:
    > * Allow any queued timer interrupts to get serviced
    > * This is only a temporary solution until we cleanup
    > * fixup_irqs as we do for IA64.
    > */
    > local_irq_enable();
    > mdelay(1);
    > local_irq_disable();
    > ...
    > remove_cpu_from_maps(cpu);
    >
    > So it is possible to send the ipi to the dying CPU. I know nothing
    > about this low-level irq code, most probably this is harmless. We
    > already did clear_local_APIC(), but I don't understand what it does.


    Indeed, some of the things I am doing as part of the hierarchical RCU
    implementation need to be applied to preemptable RCU. :-/

    Thanx, Paul
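
The mask computation in the quoted force_quiescent_state() fragment can be modelled in user space. The sketch below is only an illustration under stated assumptions: a 64-bit integer stands in for the kernel's cpumask type, and the names `cpumask_sketch_t` and `resched_targets` are hypothetical. It shows why a CPU whose bit is still set in the online map (as during the window Oleg describes, before remove_cpu_from_maps() runs) remains an IPI target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit n set means CPU n. */
typedef uint64_t cpumask_sketch_t;

/*
 * Model of the target selection in the quoted fragment: CPUs that the
 * grace period is still waiting on AND that are marked online, with
 * the calling CPU removed.
 */
cpumask_sketch_t resched_targets(cpumask_sketch_t waiting,
                                 cpumask_sketch_t online,
                                 unsigned int self_cpu)
{
    assert(self_cpu < 64);

    cpumask_sketch_t mask = waiting & online;   /* models cpus_and()  */
    mask &= ~(UINT64_C(1) << self_cpu);         /* models cpu_clear() */
    return mask;
}
```

With CPUs 1 and 2 in the waiting mask, CPUs 0 to 2 online, and CPU 0 calling, both CPU 1 and CPU 2 are targets (mask 0x6). Once CPU 2 is cleared from the online map, only CPU 1 remains (mask 0x2).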

  3. Re: [Bug #11989] Suspend failure on NForce4-based boards due to changes in stop_machine

    On Tue, Nov 11, 2008 at 08:45:23AM -0800, Paul E. McKenney wrote:
    > On Tue, Nov 11, 2008 at 05:14:01PM +0100, Heiko Carstens wrote:
    > > > > Could you please apply the following debug patch (due to Jiangshan and
    > > > > myself)? Then you should be able to build with CONFIG_RCU_TRACE,
    > > > > then mount debugfs after boot, for example, on /debug. This will
    > > > > create a /debug/rcu directory with three files, "rcucb", "rcu_data",
    > > > > and "rcu_bh_data". Since you are still able to log in, could you
    > > > > please send the contents of these three files?
    > > > >
    > > > > Thanx, Paul
    > > >
    > > > This time with the patch actually attached... Thanks to Peter Z.
    > > > for alerting me to my omission.
    > >
    > > Well, your patch doesn't apply on git head. However I used preemptible
    > > RCU instead and had tracing enabled.
    >
    > Were you using preemptible RCU earlier as well? Rafael was using
    > classic RCU. Don't get me wrong, all problems need fixing, just trying
    > to make sure I understand where the problems are occurring.


    And here is a version of the patch rebased to linux-2.6 git head.

    This adds tracing to classic RCU.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Lai Jiangshan
    ---

    include/linux/rcuclassic.h | 4
    kernel/Kconfig.preempt | 1
    kernel/Makefile | 2
    kernel/rcuclassic.c | 5 -
    kernel/rcuclassic_trace.c | 198 +++++++++++++++++++++++++++++++++++++++++++++
    5 files changed, 207 insertions(+), 3 deletions(-)

    diff --git a/include/linux/rcuclassic.h b/include/linux/rcuclassic.h
    index 5f89b62..ce183a8 100644
    --- a/include/linux/rcuclassic.h
    +++ b/include/linux/rcuclassic.h
    @@ -63,6 +63,9 @@ struct rcu_ctrlblk {
    /* for current batch to proceed. */
    } ____cacheline_internodealigned_in_smp;

    +extern struct rcu_ctrlblk rcu_ctrlblk;
    +extern struct rcu_ctrlblk rcu_bh_ctrlblk;
    +
    /* Is batch a before batch b ? */
    static inline int rcu_batch_before(long a, long b)
    {
    @@ -81,6 +84,7 @@ struct rcu_data {
    long quiescbatch; /* Batch # for grace period */
    int passed_quiesc; /* User-mode/idle loop etc. */
    int qs_pending; /* core waits for quiesc state */
    + bool beenonline; /* CPU online at least once */

    /* 2) batch handling */
    /*
    diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
    index 9fdba03..ba32338 100644
    --- a/kernel/Kconfig.preempt
    +++ b/kernel/Kconfig.preempt
    @@ -68,7 +68,6 @@ config PREEMPT_RCU

    config RCU_TRACE
    bool "Enable tracing for RCU - currently stats in debugfs"
    - depends on PREEMPT_RCU
    select DEBUG_FS
    default y
    help
    diff --git a/kernel/Makefile b/kernel/Makefile
    index 9a3ec66..9771050 100644
    --- a/kernel/Makefile
    +++ b/kernel/Makefile
    @@ -79,6 +79,8 @@ obj-$(CONFIG_CLASSIC_RCU) += rcuclassic.o
    obj-$(CONFIG_PREEMPT_RCU) += rcupreempt.o
    ifeq ($(CONFIG_PREEMPT_RCU),y)
    obj-$(CONFIG_RCU_TRACE) += rcupreempt_trace.o
    +else
    +obj-$(CONFIG_RCU_TRACE) += rcuclassic_trace.o
    endif
    obj-$(CONFIG_RELAY) += relay.o
    obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
    diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
    index 37f72e5..54bd23b 100644
    --- a/kernel/rcuclassic.c
    +++ b/kernel/rcuclassic.c
    @@ -58,14 +58,14 @@ EXPORT_SYMBOL_GPL(rcu_lock_map);


    /* Definition for rcupdate control block. */
    -static struct rcu_ctrlblk rcu_ctrlblk = {
    +struct rcu_ctrlblk rcu_ctrlblk = {
    .cur = -300,
    .completed = -300,
    .pending = -300,
    .lock = __SPIN_LOCK_UNLOCKED(&rcu_ctrlblk.lock),
    .cpumask = CPU_MASK_NONE,
    };
    -static struct rcu_ctrlblk rcu_bh_ctrlblk = {
    +struct rcu_ctrlblk rcu_bh_ctrlblk = {
    .cur = -300,
    .completed = -300,
    .pending = -300,
    @@ -725,6 +725,7 @@ static void rcu_init_percpu_data(int cpu, struct rcu_ctrlblk *rcp,
    rdp->donetail = &rdp->donelist;
    rdp->quiescbatch = rcp->completed;
    rdp->qs_pending = 0;
    + rdp->beenonline = 1;
    rdp->cpu = cpu;
    rdp->blimit = blimit;
    spin_unlock_irqrestore(&rcp->lock, flags);
    diff --git a/kernel/rcuclassic_trace.c b/kernel/rcuclassic_trace.c
    new file mode 100644
    index 0000000..612170c
    --- /dev/null
    +++ b/kernel/rcuclassic_trace.c
    @@ -0,0 +1,198 @@
    +/*
    + * Read-Copy Update tracing for classic implementation
    + *
    + * This program is free software; you can redistribute it and/or modify
    + * it under the terms of the GNU General Public License as published by
    + * the Free Software Foundation; either version 2 of the License, or
    + * (at your option) any later version.
    + *
    + * This program is distributed in the hope that it will be useful,
    + * but WITHOUT ANY WARRANTY; without even the implied warranty of
    + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    + * GNU General Public License for more details.
    + *
    + * You should have received a copy of the GNU General Public License
    + * along with this program; if not, write to the Free Software
    + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
    + *
    + * Copyright IBM Corporation, 2008
    + *
    + * Updated to use seqfile by Lai Jiangshan.
    + *
    + * Papers: http://www.rdrop.com/users/paulmck/RCU
    + *
    + * For detailed explanation of Read-Copy Update mechanism see -
    + * Documentation/RCU
    + *
    + */
    +#include
    +#include
    +#include
    +#include
    +
    +/* Print out rcu_data structures using seqfile facility. */
    +
    +static struct rcu_data *get_rcu_data_bh(int cpu)
    +{
    + return &per_cpu(rcu_bh_data, cpu);
    +}
    +
    +static struct rcu_data *get_rcu_data(int cpu)
    +{
    + return &per_cpu(rcu_data, cpu);
    +}
    +
    +static int show_rcu_data(struct seq_file *m, void *v)
    +{
    + struct rcu_data *rdp = v;
    +
    + if (!rdp->beenonline)
    + return 0;
    +
    + seq_printf(m, "processor\t: %d", rdp->cpu);
    + if (cpu_is_offline(rdp->cpu))
    + seq_puts(m, "!\n");
    + else
    + seq_puts(m, "\n");
    + seq_printf(m, "quiescbatch\t: %ld\n", rdp->quiescbatch);
    + seq_printf(m, "batch\t\t: %ld\n", rdp->batch);
    + seq_printf(m, "passed_quiesc\t: %d\n", rdp->passed_quiesc);
    + seq_printf(m, "qs_pending\t: %d\n", rdp->qs_pending);
    + seq_printf(m, "qlen\t\t: %ld\n", rdp->qlen);
    + seq_printf(m, "blimit\t\t: %ld\n", rdp->blimit);
    + seq_puts(m, "\n");
    + return 0;
    +}
    +
    +static void *c_start(struct seq_file *m, loff_t *pos)
    +{
    + typedef struct rcu_data *(*get_data_func)(int);
    +
    + if (*pos == 0) /* just in case, cpu 0 is not the first */
    + *pos = first_cpu(cpu_possible_map);
    + else
    + *pos = next_cpu_nr(*pos - 1, cpu_possible_map);
    + if ((*pos) < nr_cpu_ids)
    + return ((get_data_func)m->private)(*pos);
    + return NULL;
    +}
    +
    +static void *c_next(struct seq_file *m, void *v, loff_t *pos)
    +{
    + (*pos)++;
    + return c_start(m, pos);
    +}
    +
    +static void c_stop(struct seq_file *m, void *v)
    +{
    +}
    +
    +const struct seq_operations rcu_data_seq_op = {
    + .start = c_start,
    + .next = c_next,
    + .stop = c_stop,
    + .show = show_rcu_data,
    +};
    +
    +static int rcu_data_open(struct inode *inode, struct file *file)
    +{
    + int ret = seq_open(file, &rcu_data_seq_op);
    +
    + if (ret)
    + return ret;
    + ((struct seq_file *)file->private_data)->private = inode->i_private;
    + return 0;
    +}
    +
    +static const struct file_operations rcu_data_fops = {
    + .owner = THIS_MODULE,
    + .open = rcu_data_open,
    + .read = seq_read,
    + .llseek = seq_lseek,
    + .release = seq_release,
    +};
    +
    +/* Print out rcu_ctrlblk structures using seqfile facility. */
    +
    +static void print_one_rcu_ctrlblk(struct seq_file *m, struct rcu_ctrlblk *rcp)
    +{
    + seq_printf(m, "cur=%ld completed=%ld pending=%d s=%d\n\t",
    + rcp->cur, rcp->completed, rcp->pending, rcp->signaled);
    + seq_cpumask(m, &rcp->cpumask);
    + seq_puts(m, "\n");
    +}
    +
    +static int show_rcucb(struct seq_file *m, void *unused)
    +{
    + seq_puts(m, "rcu: ");
    + print_one_rcu_ctrlblk(m, &rcu_ctrlblk);
    + seq_puts(m, "rcu_bh: ");
    + print_one_rcu_ctrlblk(m, &rcu_bh_ctrlblk);
    + seq_puts(m, "online: ");
    + seq_cpumask(m, &cpu_online_map);
    + seq_puts(m, "\n");
    + return 0;
    +}
    +
    +static int rcucb_open(struct inode *inode, struct file *file)
    +{
    + return single_open(file, show_rcucb, NULL);
    +}
    +
    +static struct file_operations rcucb_fops = {
    + .owner = THIS_MODULE,
    + .open = rcucb_open,
    + .read = seq_read,
    + .llseek = seq_lseek,
    + .release = single_release,
    +};
    +
    +static struct dentry *rcudir, *rcu_bh_data_file, *rcu_data_file, *rcucb_file;
    +
    +static int __init rcuclassic_trace_init(void)
    +{
    + rcudir = debugfs_create_dir("rcu", NULL);
    + if (!rcudir)
    + goto out;
    +
    + rcu_bh_data_file = debugfs_create_file("rcu_bh_data", 0444, rcudir,
    + get_rcu_data_bh, &rcu_data_fops);
    + if (!rcu_bh_data_file)
    + goto out_rcudir;
    +
    + rcu_data_file = debugfs_create_file("rcu_data", 0444, rcudir,
    + get_rcu_data, &rcu_data_fops);
    + if (!rcu_data_file)
    + goto out_rcudata_bh_file;
    +
    + rcucb_file = debugfs_create_file("rcucb", 0444, rcudir,
    + NULL, &rcucb_fops);
    + if (!rcucb_file)
    + goto out_rcudata_file;
    + return 0;
    +
    +out_rcudata_file:
    + debugfs_remove(rcu_data_file);
    +out_rcudata_bh_file:
    + debugfs_remove(rcu_bh_data_file);
    +out_rcudir:
    + debugfs_remove(rcudir);
    +out:
    + return 1;
    +}
    +
    +static void __exit rcuclassic_trace_cleanup(void)
    +{
    + debugfs_remove(rcucb_file);
    + debugfs_remove(rcu_data_file);
    + debugfs_remove(rcu_bh_data_file);
    + debugfs_remove(rcudir);
    +}
    +
    +module_init(rcuclassic_trace_init);
    +module_exit(rcuclassic_trace_cleanup);
    +
    +MODULE_AUTHOR("Paul E. McKenney");
    +MODULE_DESCRIPTION("Read-Copy Update tracing for classic implementation");
    +MODULE_LICENSE("GPL");
    +
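
The c_start()/c_next() pair in the patch above walks the possible-CPU map by storing a CPU number in *pos. The user-space sketch below models that convention under stated assumptions: a plain unsigned bitmask replaces cpu_possible_map, the limit `NR_CPU_IDS_SKETCH` and all function names are hypothetical, and the patch's special case for *pos == 0 is folded into the general "first CPU at or above *pos" search, which gives the same result.

```c
#include <assert.h>

#define NR_CPU_IDS_SKETCH 8   /* hypothetical CPU-id limit for this sketch */

/*
 * Return the lowest CPU number greater than n whose bit is set in
 * mask, or NR_CPU_IDS_SKETCH if there is none.
 */
static int next_cpu_sketch(int n, unsigned int mask)
{
    for (int cpu = n + 1; cpu < NR_CPU_IDS_SKETCH; cpu++)
        if (mask & (1u << cpu))
            return cpu;
    return NR_CPU_IDS_SKETCH;
}

/*
 * Model of c_start(): round *pos up to the next possible CPU.
 * Returns that CPU number, or -1 once the map is exhausted.
 */
int iter_start(long *pos, unsigned int possible)
{
    assert(*pos >= 0);
    *pos = next_cpu_sketch((int)*pos - 1, possible);
    return *pos < NR_CPU_IDS_SKETCH ? (int)*pos : -1;
}

/* Model of c_next(): step past the current CPU, then reuse iter_start(). */
int iter_next(long *pos, unsigned int possible)
{
    (*pos)++;
    return iter_start(pos, possible);
}
```

Because the position is a CPU number rather than a record index, holes in the possible map are skipped without any extra state: with CPUs 1, 2 and 4 possible, starting from position 0 visits 1, 2, 4 and then reports the end.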

  4. Re: [Bug #11989] Suspend failure on NForce4-based boards due to changes in stop_machine

    2008/11/10 Rafael J. Wysocki :
    > On Monday, 10 of November 2008, Rafael J. Wysocki wrote:
    >> On Monday, 10 of November 2008, Heiko Carstens wrote:
    >> > On Sun, Nov 09, 2008 at 06:59:16PM +0100, Rafael J. Wysocki wrote:
    >> > > This message has been generated automatically as a part of a report
    >> > > of recent regressions.
    >> > >
    >> > > The following bug entry is on the current list of known regressions
    >> > > from 2.6.27. Please verify if it still should be listed and let me know
    >> > > (either way).
    >> > >
    >> > >
    >> > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11989
    >> > > Subject : Suspend failure on NForce4-based boards due to chanes in stop_machine
    >> > > Submitter : Rafael J. Wysocki
    >> > > Date : 2008-11-03 0:28 (7 days old)
    >> > > First-Bad-Commit: http://git.kernel.org/?p=linux/kerne...3c0bde6c50d9cc
    >> > > References : http://marc.info/?l=linux-kernel&m=122567187604356&w=4
    >> >
    >> > Hi Rafael,

    >>
    >> Hi,
    >>
    >> > could you provide more informations for this, please?
    >> >
    >> > What is your kernel configuration?

    >>
    >> Available at: http://www.sisk.pl/kernel/debug/main...3/kitty-config
    >>
    >> > Do you have any binary only modules (nvidia?) loaded?

    >>
    >> No, I don't.
    >>
    >> > Is it possible to recreate the bug by e.g. just doing something like
    >> >
    >> > echo 0 > /sys/devices/system/cpu/cpu1/online

    >>
    >> I haven't checked (yet), I'll do that later today and let you know.
    >>
    >> > (or any other online cpu)? Or does it trigger any lockdep warnings?

    >
    > It cannot be reproduced with offlining CPU1 and it doesn't trigger any
    > warnings from lockdep.
    >
    > However, it is reproducible by doing
    >
    > # echo core > /sys/power/pm_test
    >
    > and repeating
    >
    > # echo disk > /sys/power/state
    >
    > for a couple of times, in which case the last two lines printed to the console
    > before a (solid) hang are:
    >
    > SMP alternatives: switching to SMP code
    > Booting processor 1 APIC 0x1 ip 0x6000
    >
    > So, it evidently fails while re-enabling the non-boot CPU and not during
    > disabling it as I thought before.


    Can you also provide the full log including the messages when a system
    goes down please?

    At first glance, "Booting processor..." as the last message looks
    strange in this context.
    So either wakeup_secondary_cpu()'s completion failed for some reason
    (say, due to some kind of a problem that took place while disabling
    non-boot cpus... I'm purely speculating here so far) or the printk's
    output was not complete.

    Perhaps, redoing the test with pr_debug() in arch/x86/kernel/smpboot.c
    enabled would shed more light...


    --
    Best regards,
    Dmitry Adamushko

  5. Re: [Bug #11989] Suspend failure on NForce4-based boards due to changes in stop_machine

    On Tuesday, 11 of November 2008, Dmitry Adamushko wrote:
    > 2008/11/10 Rafael J. Wysocki :
    > > On Monday, 10 of November 2008, Rafael J. Wysocki wrote:
    > >> On Monday, 10 of November 2008, Heiko Carstens wrote:
    > >> > On Sun, Nov 09, 2008 at 06:59:16PM +0100, Rafael J. Wysocki wrote:
    > >> > > This message has been generated automatically as a part of a report
    > >> > > of recent regressions.
    > >> > >
    > >> > > The following bug entry is on the current list of known regressions
    > >> > > from 2.6.27. Please verify if it still should be listed and let me know
    > >> > > (either way).
    > >> > >
    > >> > >
    > >> > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11989
    > >> > > Subject : Suspend failure on NForce4-based boards due to chanes in stop_machine
    > >> > > Submitter : Rafael J. Wysocki
    > >> > > Date : 2008-11-03 0:28 (7 days old)
    > >> > > First-Bad-Commit: http://git.kernel.org/?p=linux/kerne...3c0bde6c50d9cc
    > >> > > References : http://marc.info/?l=linux-kernel&m=122567187604356&w=4
    > >> >
    > >> > Hi Rafael,
    > >>
    > >> Hi,
    > >>
    > >> > could you provide more informations for this, please?
    > >> >
    > >> > What is your kernel configuration?
    > >>
    > >> Available at: http://www.sisk.pl/kernel/debug/main...3/kitty-config
    > >>
    > >> > Do you have any binary only modules (nvidia?) loaded?
    > >>
    > >> No, I don't.
    > >>
    > >> > Is it possible to recreate the bug by e.g. just doing something like
    > >> >
    > >> > echo 0 > /sys/devices/system/cpu/cpu1/online
    > >>
    > >> I haven't checked (yet), I'll do that later today and let you know.
    > >>
    > >> > (or any other online cpu)? Or does it trigger any lockdep warnings?

    > >
    > > It cannot be reproduced with offlining CPU1 and it doesn't trigger any
    > > warnings from lockdep.
    > >
    > > However, it is reproducible by doing
    > >
    > > # echo core > /sys/power/pm_test
    > >
    > > and repeating
    > >
    > > # echo disk > /sys/power/state
    > >
    > > for a couple of times, in which case the last two lines printed to the console
    > > before a (solid) hang are:
    > >
    > > SMP alternatives: switching to SMP code
    > > Booting processor 1 APIC 0x1 ip 0x6000
    > >
    > > So, it evidently fails while re-enabling the non-boot CPU and not during
    > > disabling it as I thought before.

    >
    > Can you also provide the full log including the messages when a system
    > goes down please?
    >
    > At first glance, "Booting processor..." as the last message looks
    > strange in this context.
    > So either wakeup_secondary_cpu()'s completion failed for some reason
    > (say, due to some kind of a problem that took place while disabling
    > non-boot cpus... I'm purely speculating here so far) or the printk's
    > output was not complete.
    >
    > Perhaps, redoing the test with pr_debug() in arch/x86/kernel/smpboot.c
    > enabled would shed more light...


    Will do tomorrow.

    Thanks,
    Rafael
