Thread: [PATCH 000 of 10] md: Various bug fixes and small improvements for md in 2.6.26-rc

  1. [PATCH 000 of 10] md: Various bug fixes and small improvements for md in 2.6.26-rc

    Following are a collection of 10 patches for md/raid that are suitable
    for 2.6.26-rc. They are ordered roughly from simple to more complex with
    serious bugfixes possibly getting elevated in the sort order.

    Thanks,
    NeilBrown

    [PATCH 000 of 10] md: Introduction EXPLAIN PATCH SET HERE
    [PATCH 001 of 10] md: Fix possible oops when removing a bitmap from an active array
    [PATCH 002 of 10] md: proper extern for mdp_major
    [PATCH 003 of 10] md: kill file_path wrapper
    [PATCH 004 of 10] md: md: raid5 rate limit error printk
    [PATCH 005 of 10] md: raid1: Fix restoration of bio between failed read and write.
    [PATCH 006 of 10] md: Notify userspace on 'write-pending' changes to array_state
    [PATCH 007 of 10] md: notify userspace on 'stop' events
    [PATCH 008 of 10] md: Improve setting of "events_cleared" for write-intent bitmaps.
    [PATCH 009 of 10] md: Allow parallel resync of md-devices.
    [PATCH 010 of 10] md: Restart recovery cleanly after device failure.
    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. [PATCH 006 of 10] md: Notify userspace on 'write-pending' changes to array_state


    When an array enters write pending, 'array_state' changes, so we
    must be sure to sysfs_notify.

    Also, when waiting for user-space to acknowledge 'write-pending' by
    marking the metadata as dirty, we don't want to wait for
    MD_CHANGE_DEVS to be cleared as that might not happen. So explicitly
    test for the bits that we are really interested in.

    Signed-off-by: Neil Brown

    ### Diffstat output
    ./drivers/md/md.c | 11 ++++++++++-
    1 file changed, 10 insertions(+), 1 deletion(-)

    diff .prev/drivers/md/md.c ./drivers/md/md.c
    --- .prev/drivers/md/md.c 2008-05-19 11:02:35.000000000 +1000
    +++ ./drivers/md/md.c 2008-05-19 11:03:43.000000000 +1000
    @@ -5435,8 +5435,11 @@ void md_write_start(mddev_t *mddev, stru
    md_wakeup_thread(mddev->thread);
    }
    spin_unlock_irq(&mddev->write_lock);
    + sysfs_notify(&mddev->kobj, NULL, "array_state");
    }
    - wait_event(mddev->sb_wait, mddev->flags==0);
    + wait_event(mddev->sb_wait,
    + !test_bit(MD_CHANGE_CLEAN, &mddev->flags) &&
    + !test_bit(MD_CHANGE_PENDING, &mddev->flags));
    }

    void md_write_end(mddev_t *mddev)
    @@ -5471,6 +5474,12 @@ void md_allow_write(mddev_t *mddev)
    mddev->safemode = 1;
    spin_unlock_irq(&mddev->write_lock);
    md_update_sb(mddev, 0);
    +
    + sysfs_notify(&mddev->kobj, NULL, "array_state");
    + /* wait for the dirty state to be recorded in the metadata */
    + wait_event(mddev->sb_wait,
    + !test_bit(MD_CHANGE_CLEAN, &mddev->flags) &&
    + !test_bit(MD_CHANGE_PENDING, &mddev->flags));
    } else
    spin_unlock_irq(&mddev->write_lock);
    }
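
The change to the wait condition can be illustrated outside the kernel. The following is a minimal user-space sketch, not the md code itself: the flag names are taken from the patch, but the bit positions are assumed here purely for illustration, and `old_done()` / `new_done()` are hypothetical helpers that mirror the two `wait_event()` conditions.

```c
#include <stdbool.h>

/* Bit positions are assumptions made for this sketch only. */
enum {
    MD_CHANGE_DEVS    = 0,  /* some device status has changed */
    MD_CHANGE_CLEAN   = 1,  /* clean/dirty state needs recording */
    MD_CHANGE_PENDING = 2   /* a metadata update is pending */
};

/* User-space stand-in for the kernel's test_bit(). */
static bool flag_is_set(unsigned long flags, int bit)
{
    return (flags >> bit) & 1UL;
}

/* Old condition: every flag, including MD_CHANGE_DEVS, must be clear. */
static bool old_done(unsigned long flags)
{
    return flags == 0;
}

/* New condition: only the two bits tied to the dirty state are tested. */
static bool new_done(unsigned long flags)
{
    return !flag_is_set(flags, MD_CHANGE_CLEAN) &&
           !flag_is_set(flags, MD_CHANGE_PENDING);
}
```

With only `MD_CHANGE_DEVS` set, `old_done()` stays false while `new_done()` is already true, which is the case the description says "might not happen" to resolve under the old test.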

  3. [PATCH 002 of 10] md: proper extern for mdp_major


    From: Adrian Bunk

    This patch adds a proper extern for mdp_major in include/linux/raid/md.h

    Signed-off-by: Adrian Bunk
    Signed-off-by: Neil Brown

    ### Diffstat output
    ./include/linux/raid/md.h | 2 ++
    ./init/do_mounts_md.c | 1 -
    2 files changed, 2 insertions(+), 1 deletion(-)

    diff .prev/include/linux/raid/md.h ./include/linux/raid/md.h
    --- .prev/include/linux/raid/md.h 2008-05-19 11:02:06.000000000 +1000
    +++ ./include/linux/raid/md.h 2008-05-19 11:02:24.000000000 +1000
    @@ -72,6 +72,8 @@
    */
    #define MD_PATCHLEVEL_VERSION 3

    +extern int mdp_major;
    +
    extern int register_md_personality (struct mdk_personality *p);
    extern int unregister_md_personality (struct mdk_personality *p);
    extern mdk_thread_t * md_register_thread (void (*run) (mddev_t *mddev),

    diff .prev/init/do_mounts_md.c ./init/do_mounts_md.c
    --- .prev/init/do_mounts_md.c 2008-05-19 11:02:06.000000000 +1000
    +++ ./init/do_mounts_md.c 2008-05-19 11:02:24.000000000 +1000
    @@ -24,7 +24,6 @@ static struct {

    static int md_setup_ents __initdata;

    -extern int mdp_major;
    /*
    * Parse the command-line parameters given our kernel, but do not
    * actually try to invoke the MD device now; that is handled by

  4. [PATCH 004 of 10] md: md: raid5 rate limit error printk


    From: Bernd Schubert

    Last night we had SCSI problems and a hardware RAID
    unit was offlined during heavy I/O. While this happened we got, for
    about 3 minutes, a huge number of messages like these:

    Apr 12 03:36:07 pfs1n14 kernel: [197510.696595] raid5:md7: read error not correctable (sector 2993096568 on sdj2).

    I guess the high error rate is responsible for not scheduling other
    events - during this time the system was not pingable, and in the end
    other devices also ran into SCSI command timeouts, causing problems on
    these unrelated devices as well.

    Signed-off-by: Bernd Schubert
    Signed-off-by: Dan Williams
    Signed-off-by: Neil Brown

    ### Diffstat output
    ./drivers/md/raid5.c | 34 ++++++++++++++++++++++------------
    1 file changed, 22 insertions(+), 12 deletions(-)

    diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
    --- .prev/drivers/md/raid5.c 2008-05-19 11:02:09.000000000 +1000
    +++ ./drivers/md/raid5.c 2008-05-19 11:02:44.000000000 +1000
    @@ -94,6 +94,8 @@
    #define __inline__
    #endif

    +#define printk_rl(args...) ((void) (printk_ratelimit() && printk(args)))
    +
    #if !RAID6_USE_EMPTY_ZERO_PAGE
    /* In .bss so it's zeroed */
    const char raid6_empty_zero_page[PAGE_SIZE] __attribute__((aligned(256)));
    @@ -1143,10 +1145,12 @@ static void raid5_end_read_request(struc
    set_bit(R5_UPTODATE, &sh->dev[i].flags);
    if (test_bit(R5_ReadError, &sh->dev[i].flags)) {
    rdev = conf->disks[i].rdev;
    - printk(KERN_INFO "raid5:%s: read error corrected (%lu sectors at %llu on %s)\n",
    - mdname(conf->mddev), STRIPE_SECTORS,
    - (unsigned long long)(sh->sector + rdev->data_offset),
    - bdevname(rdev->bdev, b));
    + printk_rl(KERN_INFO "raid5:%s: read error corrected"
    + " (%lu sectors at %llu on %s)\n",
    + mdname(conf->mddev), STRIPE_SECTORS,
    + (unsigned long long)(sh->sector
    + + rdev->data_offset),
    + bdevname(rdev->bdev, b));
    clear_bit(R5_ReadError, &sh->dev[i].flags);
    clear_bit(R5_ReWrite, &sh->dev[i].flags);
    }
    @@ -1160,16 +1164,22 @@ static void raid5_end_read_request(struc
    clear_bit(R5_UPTODATE, &sh->dev[i].flags);
    atomic_inc(&rdev->read_errors);
    if (conf->mddev->degraded)
    - printk(KERN_WARNING "raid5:%s: read error not correctable (sector %llu on %s).\n",
    - mdname(conf->mddev),
    - (unsigned long long)(sh->sector + rdev->data_offset),
    - bdn);
    + printk_rl(KERN_WARNING
    + "raid5:%s: read error not correctable "
    + "(sector %llu on %s).\n",
    + mdname(conf->mddev),
    + (unsigned long long)(sh->sector
    + + rdev->data_offset),
    + bdn);
    else if (test_bit(R5_ReWrite, &sh->dev[i].flags))
    /* Oh, no!!! */
    - printk(KERN_WARNING "raid5:%s: read error NOT corrected!! (sector %llu on %s).\n",
    - mdname(conf->mddev),
    - (unsigned long long)(sh->sector + rdev->data_offset),
    - bdn);
    + printk_rl(KERN_WARNING
    + "raid5:%s: read error NOT corrected!! "
    + "(sector %llu on %s).\n",
    + mdname(conf->mddev),
    + (unsigned long long)(sh->sector
    + + rdev->data_offset),
    + bdn);
    else if (atomic_read(&rdev->read_errors)
    > conf->max_nr_stripes)

    printk(KERN_WARNING
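
The `printk_rl()` macro relies on short-circuit evaluation: the message is only formatted and printed when the limiter allows it. Below is a small user-space sketch of the same shape. It uses a simple fixed-window limiter with an explicit clock argument so that it is easy to test; this is an assumption made for illustration and is not how `printk_ratelimit()` is implemented. `struct ratelimit`, `ratelimit_ok()` and `log_rl()` are hypothetical names.

```c
#include <stdio.h>

/* Fixed-window limiter: at most 'burst' messages per 'interval' ticks. */
struct ratelimit {
    long interval;      /* window length, in caller-defined ticks */
    int  burst;         /* messages allowed per window */
    long window_start;  /* tick at which the current window began */
    int  printed;       /* messages let through in this window */
    int  missed;        /* messages suppressed so far */
};

/* Returns 1 if a message may be emitted at time 'now', 0 otherwise. */
static int ratelimit_ok(struct ratelimit *rl, long now)
{
    if (now - rl->window_start >= rl->interval) {
        /* A new window begins: reset the per-window count. */
        rl->window_start = now;
        rl->printed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return 1;
    }
    rl->missed++;
    return 0;
}

/* Same structure as printk_rl(): printf() runs only if the check passes. */
#define log_rl(rl, now, ...) \
    ((void)(ratelimit_ok((rl), (now)) && printf(__VA_ARGS__)))
```

Calling `log_rl(&rl, now, "read error on %s\n", name)` in a tight loop then prints at most `burst` lines per window, so a failing device cannot flood the log in the way the report describes.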

  5. [PATCH 007 of 10] md: notify userspace on 'stop' events


    From: Dan Williams

    This additional notification to 'array_state' is needed to allow the monitor
    application to learn about stop events via sysfs. The
    sysfs_notify("sync_action") call that comes at the end of do_md_stop() (via
    md_new_event) is insufficient since the 'sync_action' attribute has been
    removed by this point.

    (Seems like a sysfs-notify-on-removal patch is a better fix. Currently removal
    updates the event count but does not wake up waiters)

    Signed-off-by: Dan Williams
    Signed-off-by: Neil Brown

    ### Diffstat output
    ./drivers/md/md.c | 2 ++
    1 file changed, 2 insertions(+)

    diff .prev/drivers/md/md.c ./drivers/md/md.c
    --- .prev/drivers/md/md.c 2008-05-19 11:03:43.000000000 +1000
    +++ ./drivers/md/md.c 2008-05-19 11:03:47.000000000 +1000
    @@ -3691,6 +3691,8 @@ static int do_md_stop(mddev_t * mddev, i

    module_put(mddev->pers->owner);
    mddev->pers = NULL;
    + /* tell userspace to handle 'inactive' */
    + sysfs_notify(&mddev->kobj, NULL, "array_state");

    set_capacity(disk, 0);
    mddev->changed = 1;
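
For context, a monitor application would typically pick up such notifications by polling the sysfs attribute. The sketch below shows the general pattern in user space, under the assumption that the attribute is exposed at a path such as `/sys/block/md0/md/array_state` (the device name is only an example); `read_attr()` and `wait_for_change()` are hypothetical helper names, not part of any md tool.

```c
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

/*
 * Re-read an attribute from offset 0 and strip trailing newlines.
 * Returns the length of the value, or -1 on error.
 */
static ssize_t read_attr(int fd, char *buf, size_t len)
{
    ssize_t n;

    if (len == 0 || lseek(fd, 0, SEEK_SET) < 0)
        return -1;
    n = read(fd, buf, len - 1);
    if (n < 0)
        return -1;
    while (n > 0 && buf[n - 1] == '\n')
        n--;
    buf[n] = '\0';
    return n;
}

/*
 * Block until the attribute is flagged as changed or the timeout expires.
 * sysfs_notify() wakes pollers with POLLPRI|POLLERR. The attribute should
 * be read once before the first poll(). Returns poll()'s result.
 */
static int wait_for_change(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };

    return poll(&pfd, 1, timeout_ms);
}
```

A monitor loop would open the attribute read-only, call `read_attr()` to get the initial state, then alternate `wait_for_change()` and `read_attr()`, treating a value of `inactive` as the stop event.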

  6. [PATCH 003 of 10] md: kill file_path wrapper


    From: Christoph Hellwig

    Kill the trivial and rather pointless file_path wrapper around d_path.


    Signed-off-by: Christoph Hellwig
    Signed-off-by: Neil Brown

    ### Diffstat output
    ./drivers/md/bitmap.c | 17 ++++-------------
    ./drivers/md/md.c | 4 ++--
    ./include/linux/raid/bitmap.h | 1 -
    3 files changed, 6 insertions(+), 16 deletions(-)

    diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
    --- .prev/drivers/md/bitmap.c 2008-05-19 11:02:05.000000000 +1000
    +++ ./drivers/md/bitmap.c 2008-05-19 11:02:35.000000000 +1000
    @@ -203,17 +203,6 @@ static void bitmap_checkfree(struct bitm
    * bitmap file handling - read and write the bitmap file and its superblock
    */

    -/* copy the pathname of a file to a buffer */
    -char *file_path(struct file *file, char *buf, int count)
    -{
    - if (!buf)
    - return NULL;
    -
    - buf = d_path(&file->f_path, buf, count);
    -
    - return IS_ERR(buf) ? NULL : buf;
    -}
    -
    /*
    * basic page I/O operations
    */
    @@ -721,11 +710,13 @@ static void bitmap_file_kick(struct bitm
    if (bitmap->file) {
    path = kmalloc(PAGE_SIZE, GFP_KERNEL);
    if (path)
    - ptr = file_path(bitmap->file, path, PAGE_SIZE);
    + ptr = d_path(&bitmap->file->f_path, path,
    + PAGE_SIZE);
    +

    printk(KERN_ALERT
    "%s: kicking failed bitmap file %s from array!\n",
    - bmname(bitmap), ptr ? ptr : "");
    + bmname(bitmap), IS_ERR(ptr) ? "" : ptr);

    kfree(path);
    } else

    diff .prev/drivers/md/md.c ./drivers/md/md.c
    --- .prev/drivers/md/md.c 2008-05-19 11:02:08.000000000 +1000
    +++ ./drivers/md/md.c 2008-05-19 11:02:35.000000000 +1000
    @@ -3987,8 +3987,8 @@ static int get_bitmap_file(mddev_t * mdd
    if (!buf)
    goto out;

    - ptr = file_path(mddev->bitmap->file, buf, sizeof(file->pathname));
    - if (!ptr)
    + ptr = d_path(&mddev->bitmap->file->f_path, buf, sizeof(file->pathname));
    + if (IS_ERR(ptr))
    goto out;

    strcpy(file->pathname, ptr);

    diff .prev/include/linux/raid/bitmap.h ./include/linux/raid/bitmap.h
    --- .prev/include/linux/raid/bitmap.h 2008-05-19 11:02:05.000000000 +1000
    +++ ./include/linux/raid/bitmap.h 2008-05-19 11:02:35.000000000 +1000
    @@ -262,7 +262,6 @@ int bitmap_create(mddev_t *mddev);
    void bitmap_flush(mddev_t *mddev);
    void bitmap_destroy(mddev_t *mddev);

    -char *file_path(struct file *file, char *buf, int count);
    void bitmap_print_sb(struct bitmap *bitmap);
    void bitmap_update_sb(struct bitmap *bitmap);


  7. [PATCH 009 of 10] md: Allow parallel resync of md-devices.


    From: Bernd Schubert

    In some configurations, a raid6 resync can be limited by CPU speed
    (calculating P and Q and moving data) rather than by device speed.
    In these cases there is nothing to be gained by serialising resync
    of arrays that share a device, and doing the resync in parallel can
    provide benefit.
    So add a sysfs tunable to flag an array as being allowed to
    resync in parallel with other arrays that use (a different part of)
    the same device.


    Signed-off-by: Bernd Schubert
    Signed-off-by: Neil Brown

    ### Diffstat output
    ./drivers/md/md.c | 40 ++++++++++++++++++++++++++++++++++++----
    ./include/linux/raid/md_k.h | 3 +++
    2 files changed, 39 insertions(+), 4 deletions(-)

    diff .prev/drivers/md/md.c ./drivers/md/md.c
    --- .prev/drivers/md/md.c 2008-05-19 11:03:47.000000000 +1000
    +++ ./drivers/md/md.c 2008-05-19 11:04:07.000000000 +1000
    @@ -74,6 +74,8 @@ static DEFINE_SPINLOCK(pers_lock);

    static void md_print_devices(void);

    +static DECLARE_WAIT_QUEUE_HEAD(resync_wait);
    +
    #define MD_BUG(x...) { printk("md: bug in file %s, line %d\n", __FILE__, __LINE__); md_print_devices(); }

    /*
    @@ -3013,6 +3015,36 @@ degraded_show(mddev_t *mddev, char *page
    static struct md_sysfs_entry md_degraded = __ATTR_RO(degraded);

    static ssize_t
    +sync_force_parallel_show(mddev_t *mddev, char *page)
    +{
    + return sprintf(page, "%d\n", mddev->parallel_resync);
    +}
    +
    +static ssize_t
    +sync_force_parallel_store(mddev_t *mddev, const char *buf, size_t len)
    +{
    + long n;
    +
    + if (strict_strtol(buf, 10, &n))
    + return -EINVAL;
    +
    + if (n != 0 && n != 1)
    + return -EINVAL;
    +
    + mddev->parallel_resync = n;
    +
    + if (mddev->sync_thread)
    + wake_up(&resync_wait);
    +
    + return len;
    +}
    +
    +/* force parallel resync, even with shared block devices */
    +static struct md_sysfs_entry md_sync_force_parallel =
    +__ATTR(sync_force_parallel, S_IRUGO|S_IWUSR,
    + sync_force_parallel_show, sync_force_parallel_store);
    +
    +static ssize_t
    sync_speed_show(mddev_t *mddev, char *page)
    {
    unsigned long resync, dt, db;
    @@ -3187,6 +3219,7 @@ static struct attribute *md_redundancy_a
    &md_sync_min.attr,
    &md_sync_max.attr,
    &md_sync_speed.attr,
    + &md_sync_force_parallel.attr,
    &md_sync_completed.attr,
    &md_max_sync.attr,
    &md_suspend_lo.attr,
    @@ -5487,8 +5520,6 @@ void md_allow_write(mddev_t *mddev)
    }
    EXPORT_SYMBOL_GPL(md_allow_write);

    -static DECLARE_WAIT_QUEUE_HEAD(resync_wait);
    -
    #define SYNC_MARKS 10
    #define SYNC_MARK_STEP (3*HZ)
    void md_do_sync(mddev_t *mddev)
    @@ -5552,8 +5583,9 @@ void md_do_sync(mddev_t *mddev)
    for_each_mddev(mddev2, tmp) {
    if (mddev2 == mddev)
    continue;
    - if (mddev2->curr_resync &&
    - match_mddev_units(mddev,mddev2)) {
    + if (!mddev->parallel_resync
    + && mddev2->curr_resync
    + && match_mddev_units(mddev, mddev2)) {
    DEFINE_WAIT(wq);
    if (mddev < mddev2 && mddev->curr_resync == 2) {
    /* arbitrarily yield */

    diff .prev/include/linux/raid/md_k.h ./include/linux/raid/md_k.h
    --- .prev/include/linux/raid/md_k.h 2008-05-19 11:02:06.000000000 +1000
    +++ ./include/linux/raid/md_k.h 2008-05-19 11:04:07.000000000 +1000
    @@ -180,6 +180,9 @@ struct mddev_s
    int sync_speed_min;
    int sync_speed_max;

    + /* resync even though the same disks are shared among md-devices */
    + int parallel_resync;
    +
    int ok_start_degraded;
    /* recovery/resync flags
    * NEEDED: we might need to start a resync/recover
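
The store handler accepts only the values 0 and 1 and rejects everything else with -EINVAL. The following user-space sketch shows equivalent validation using the standard `strtol()`, on the assumption that a single trailing newline (as produced by `echo`) should be tolerated. `parse_parallel_flag()` is a hypothetical name, and the code is an approximation of the behaviour, not the kernel's `strict_strtol()`.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a "0" or "1" flag, optionally followed by one newline.
 * Returns 0 and stores the value in *out on success, -EINVAL otherwise.
 */
static int parse_parallel_flag(const char *buf, int *out)
{
    char *end;
    long n;

    errno = 0;
    n = strtol(buf, &end, 10);
    if (end == buf || errno != 0)
        return -EINVAL;     /* no digits, or out of range */
    if (*end == '\n')
        end++;              /* tolerate one trailing newline */
    if (*end != '\0')
        return -EINVAL;     /* trailing garbage */
    if (n != 0 && n != 1)
        return -EINVAL;     /* only a boolean is accepted */

    *out = (int)n;
    return 0;
}
```

Writing `1` to the `sync_force_parallel` attribute of an array would then correspond to the successful path, while inputs such as `2` or `yes` are refused without changing the current setting.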
