Problems with Linux software RAID after OS upgrade (long) - Setup


Thread: Problems with Linux software RAID after OS upgrade (long)

  1. Problems with Linux software RAID after OS upgrade (long)


    Having some trouble with Linux software RAID after an OS update, and
    would be grateful for any insights.

    Machine is an AMD 64-bit PC running 32-bit Linux. The machine was
    previously running Fedora Core 4 with no problems. Two 500GB hard
    drives were added to the onboard Promise controller and the Promise
    section of the machine's BIOS configured for JBOD.

    On boot, as expected, two new SCSI disk devices could be seen - sda and
    sdb. These were partitioned using fdisk, a single partition occupying
    the entire disk created, and the partition type set to 0xfd (Linux RAID
    autodetect).
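
    (For reference, the non-interactive equivalent of that partitioning
    would have been something like the following; I actually did it
    interactively in fdisk:)

    echo ',,fd' | sfdisk /dev/sda    # one partition spanning the disk, type 0xfd
    echo ',,fd' | sfdisk /dev/sdb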

    mdadm was used to create a RAID1 (mirror) using /dev/sda and /dev/sdb.
    I can't remember for certain if I used the raw devices (/dev/sda) or the
    partitions (/dev/sda1) to create the array, and my notes aren't clear.
    The resulting RAID device, /dev/md0, had an ext3 filesystem created on
    it and was mounted on a mount point. /etc/fstab was edited to mount
    /dev/md0 on boot.
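
    From memory, the creation steps were roughly the following (the mount
    point here is just a placeholder, and as noted above, the members may
    have been the whole devices rather than the partitions):

    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    mkfs.ext3 /dev/md0
    mkdir -p /mnt/data               # placeholder mount point
    mount /dev/md0 /mnt/data
    # plus a line along these lines in /etc/fstab:
    # /dev/md0   /mnt/data   ext3   defaults   1 2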

    This arrangement worked well until recently, when the root partition on
    the (separate) boot drive was trashed and Fedora Core 6 installed by
    someone else, so I have only their version of events to go by. The
    array did not reappear after FC6 was installed. The /etc/raidtab and/or
    /etc/mdadm.conf files were not preserved, so I am working blind to
    reassemble and remount the array.

    Now things are confused. The way Linux software RAID works seems to
    have changed in FC6. On boot, dmraid is run by rc.sysinit and discovers
    the two members of the array OK and activates it as
    /dev/mapper/pdc_eejidjjag, where pdc_eejidjjag is the array's name:

    [root@linuxbox root]# dmraid -r
    /dev/sda: pdc, "pdc_eejidjjag", mirror, ok, 976562500 sectors, data@ 0
    /dev/sdb: pdc, "pdc_eejidjjag", mirror, ok, 976562500 sectors, data@ 0

    [root@linuxbox root]# dmraid -ay -v
    INFO: Activating mirror RAID set "pdc_eejidjjag"
    ERROR: dos: partition address past end of RAID device

    [root@linuxbox root]# ls -l /dev/mapper/
    total 0
    crw------- 1 root root 10, 63 Jul 5 16:59 control
    brw-rw---- 1 root disk 253, 0 Jul 6 03:11 pdc_eejidjjag

    [root@linuxbox root]# fdisk -l /dev/mapper/pdc_eejidjjag

    Disk /dev/mapper/pdc_eejidjjag: 500.0 GB, 500000000000 bytes
    255 heads, 63 sectors/track, 60788 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

    Device Boot Start End Blocks Id System
    /dev/mapper/pdc_eejidjjag1 1 60801 488384001 fd Linux raid autodetect

    I cannot mount /dev/mapper/pdc_eejidjjag1:

    [root@linuxbox root]# mount -v -t auto /dev/mapper/pdc_eejidjjag1
    /mnt/test
    mount: you didn't specify a filesystem type for
    /dev/mapper/pdc_eejidjjag1
    I will try all types mentioned in /etc/filesystems or
    /proc/filesystems
    Trying hfsplus
    mount: special device /dev/mapper/pdc_eejidjjag1 does not exist

    'fdisk -l /dev/mapper/pdc_eejidjjag' shows that one partition of type
    0xfd (Linux raid autodetect) is filling the disk. Surely this should be
    type 0x83, since the device is the RAIDed disk as presented to the user?
    And why does mount say the device /dev/mapper/pdc_eejidjjag1 does not
    exist?

    This may be due to my unfamiliarity with dmraid. I can find little
    about it on the internet. I'm uncertain if it is meant to be used in
    conjunction with mdadm, or whether it's either/or. In the past, Linux
    software RAID has Just Worked for me using mdadm.

    If I disregard dmraid, disabling the array with 'dmraid -an /dev/md0'
    and use the more familiar mdadm instead, first checking with fdisk that
    the disks have the correct RAID autodetect partitions:

    [root@linuxbox root]# fdisk -l /dev/sda

    Disk /dev/sda: 500.1 GB, 500107862016 bytes
    255 heads, 63 sectors/track, 60801 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

    Device Boot Start End Blocks Id System
    /dev/sda1 1 60801 488384001 fd Linux raid
    autodetect

    then try to assemble the RAID with those, it fails:

    [root@linuxbox root]# mdadm -v --assemble /dev/md0 /dev/sda1 /dev/sdb1
    mdadm: looking for devices for /dev/md0
    mdadm: cannot open device /dev/sda1: No such device or address
    mdadm: /dev/sda1 has no superblock - assembly aborted

    Perhaps I should be using the raw devices?

    [root@linuxbox root]# mdadm -v --assemble /dev/md0 /dev/sda /dev/sdb
    mdadm: looking for devices for /dev/md0
    mdadm: /dev/sda is identified as a member of /dev/md0, slot 0.
    mdadm: /dev/sdb is identified as a member of /dev/md0, slot 1.
    mdadm: added /dev/sdb to /dev/md0 as 1
    mdadm: added /dev/sda to /dev/md0 as 0
    mdadm: /dev/md0 has been started with 2 drives.

    [root@linuxbox root]# mdadm -E /dev/sda
    /dev/sda:
    Magic : a92b4efc
    Version : 00.90.01
    UUID : c4344083:a8d8cf32:3f00e0db:8765b21b
    Creation Time : Thu Mar 22 15:26:52 2007
    Raid Level : raid1
    Device Size : 488386496 (465.76 GiB 500.11 GB)
    Array Size : 488386496 (465.76 GiB 500.11 GB)
    Raid Devices : 2
    Total Devices : 2
    Preferred Minor : 0

    Update Time : Thu Jul 5 16:58:02 2007
    State : clean
    Active Devices : 2
    Working Devices : 2
    Failed Devices : 0
    Spare Devices : 0
    Checksum : 864ad759 - correct
    Events : 0.4


    Number Major Minor RaidDevice State
    this 0 8 0 0 active sync /dev/sda

    0 0 8 0 0 active sync /dev/sda
    1 1 8 16 1 active sync /dev/sdb

    [root@linuxbox root]# mdadm -E /dev/sdb
    /dev/sdb:
    Magic : a92b4efc
    Version : 00.90.01
    UUID : c4344083:a8d8cf32:3f00e0db:8765b21b
    Creation Time : Thu Mar 22 15:26:52 2007
    Raid Level : raid1
    Device Size : 488386496 (465.76 GiB 500.11 GB)
    Array Size : 488386496 (465.76 GiB 500.11 GB)
    Raid Devices : 2
    Total Devices : 2
    Preferred Minor : 0

    Update Time : Thu Jul 5 16:58:02 2007
    State : clean
    Active Devices : 2
    Working Devices : 2
    Failed Devices : 0
    Spare Devices : 0
    Checksum : 864ad76b - correct
    Events : 0.4


    Number Major Minor RaidDevice State
    this 1 8 16 1 active sync /dev/sdb

    0 0 8 0 0 active sync /dev/sda
    1 1 8 16 1 active sync /dev/sdb

    so that looks OK. Let's see what /dev/md0 looks like:

    [root@linuxbox root]# fdisk -l /dev/md0

    Disk /dev/md0: 500.1 GB, 500107771904 bytes
    255 heads, 63 sectors/track, 60801 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

    Device Boot Start End Blocks Id System
    /dev/md0p1 1 60801 488384001 fd Linux raid
    autodetect

    That doesn't look right; I would have expected to see a partition of
    type 0x83, since /dev/md0p1 is the RAID as presented to the user
    according to fdisk. Trying to mount it anyway:

    [root@linuxbox root]# mount -v -t auto /dev/md0 /mnt/test
    mount: you didn't specify a filesystem type for /dev/md0
    I will try all types mentioned in /etc/filesystems or
    /proc/filesystems
    Trying hfsplus
    mount: you must specify the filesystem type

    [root@linuxbox root]# mount -v -t auto /dev/md0p1 /mnt/test
    mount: you didn't specify a filesystem type for /dev/md0p1
    I will try all types mentioned in /etc/filesystems or
    /proc/filesystems
    Trying hfsplus
    mount: special device /dev/md0p1 does not exist

    mdadm --examine /dev/sd* shows both members of the array as correct,
    with the same serial number. "cat /proc/mdstat" shows the array as
    complete and OK with two members as expected.

    /proc/mdstat shows:

    [root@linuxbox root]# cat /proc/mdstat
    Personalities : [raid1]
    md0 : active raid1 sda[0] sdb[1]
    488386496 blocks [2/2] [UU]

    unused devices: <none>

    I'm confused. I can't find much information on dmraid; the man page
    seems to imply that it's for use with hardware RAID controllers, and I
    don't know if I should be using that or mdadm, or both. Previously I
    just used mdadm and everything Just Worked.

    I don't know why assembling and starting the array doesn't present the
    contents of the md device as expected, and why fdisk shows special
    devices in /dev which the mount command says don't exist.

    The user of the machine is getting worried as there's a lot of data on
    this array, and of course, he has no backup.

    I'm at the point of taking the disks out and trying them in a machine
    running FC4. Any ideas or suggestions please before I do that?

    --
    (\__/) Bunny says NO to Windows Vista!
    (='.'=) http://www.cs.auckland.ac.nz/~pgut00...ista_cost.html
    (")_(")


  2. Re: Problems with Linux software RAID after OS upgrade (long)

    In comp.sys.ibm.pc.hardware.storage Mike Tomlinson wrote:

    > Having some trouble with Linux software RAID after an OS update, and
    > would be grateful for any insights.


    > Machine is an AMD 64-bit PC running 32-bit Linux. The machine was
    > previously running Fedora Core 4 with no problems. Two 500GB hard
    > drives were added to the onboard Promise controller and the Promise
    > section of the machine's BIOS configured for JBOD.


    I assume that is individual disks, instead of the JBOD "RAID" mode?

    > On boot, as expected, two new SCSI disk devices could be seen - sda and
    > sdb. These were partitioned using fdisk, a single partition occupying
    > the entire disk created, and the partition type set to 0xfd (Linux RAID
    > autodetect).


    Ok.

    > mdadm was used to create a RAID1 (mirror) using /dev/sda and /dev/sdb.
    > I can't remember for certain if I used the raw devices (/dev/sda) or the
    > partitions (/dev/sda1) to create the array, and my notes aren't clear.


    That is important. With partitions the RAID would start automatically
    because of type 0xfd. With whole drives it would not, and would require
    some start script. Also, if you used the whole disks, the partition
    tables left on them will confuse RAID auto-detectors.
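
    For whole-device members, the usual way to get the array assembled at
    boot is an /etc/mdadm.conf entry, roughly like this (sketch only; the
    UUID is the one from the mdadm -E output you posted):

    DEVICE /dev/sda /dev/sdb
    ARRAY /dev/md0 level=raid1 num-devices=2 UUID=c4344083:a8d8cf32:3f00e0db:8765b21b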

    > The resulting RAID device, /dev/md0, had an ext3 filesystem created on
    > it and was mounted on a mount point. /etc/fstab was edited to mount
    > /dev/md0 on boot.


    ok.

    > This arrangement worked well until recently, when the root partition on
    > the (separate) boot drive was trashed and Fedora Core 6 installed by
    > someone else, so I have only their version of events to go by. The
    > array did not reappear after FC6 was installed. The /etc/raidtab and/or
    > /etc/mdadm.conf files were not preserved, so I am working blind to
    > reassemble and remount the array.


    Should not be a problem. If you try to reassemble, any part not having
    a valid RAID signature will be rejected.
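
    mdadm can also reconstruct the configuration from the superblocks for
    you, roughly like this (sketch):

    mdadm --examine --scan                      # prints ARRAY lines found on the disks
    mdadm --examine --scan >> /etc/mdadm.conf   # keep them for the next boot
    mdadm --assemble --scan                     # assemble everything listed there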

    > Now things are confused. The way Linux software RAID works seems to
    > have changed in FC6. On boot, dmraid is run by rc.sysinit and discovers
    > the two members of the array OK and activates it as
    > /dev/mapper/pdc_eejidjjag, where pdc_eejidjjag is the array's name:


    Hmmm. From what I can see dmraid is not intended for normal
    software RAID, but rather for fakeRAID controllers (software
    RAID done by BIOS code). It may also be able to handle normal
    software RAID, but I have never used it.

    > [root@linuxbox root]# dmraid -r
    > /dev/sda: pdc, "pdc_eejidjjag", mirror, ok, 976562500 sectors, data@ 0
    > /dev/sdb: pdc, "pdc_eejidjjag", mirror, ok, 976562500 sectors, data@ 0


    > [root@linuxbox root]# dmraid -ay -v
    > INFO: Activating mirror RAID set "pdc_eejidjjag"
    > ERROR: dos: partition address past end of RAID device


    > [root@linuxbox root]# ls -l /dev/mapper/
    > total 0
    > crw------- 1 root root 10, 63 Jul 5 16:59 control
    > brw-rw---- 1 root disk 253, 0 Jul 6 03:11 pdc_eejidjjag


    > [root@linuxbox root]# fdisk -l /dev/mapper/pdc_eejidjjag


    > Disk /dev/mapper/pdc_eejidjjag: 500.0 GB, 500000000000 bytes
    > 255 heads, 63 sectors/track, 60788 cylinders
    > Units = cylinders of 16065 * 512 = 8225280 bytes


    > Device Boot Start End Blocks Id System
    > /dev/mapper/pdc_eejidjjag1 1 60801 488384001 fd Linux raid autodetect
    >
    > I cannot mount /dev/mapper/pdc_eejidjjag1:


    > [root@linuxbox root]# mount -v -t auto /dev/mapper/pdc_eejidjjag1
    > /mnt/test
    > mount: you didn't specify a filesystem type for
    > /dev/mapper/pdc_eejidjjag1
    > I will try all types mentioned in /etc/filesystems or
    > /proc/filesystems
    > Trying hfsplus
    > mount: special device /dev/mapper/pdc_eejidjjag1 does not exist


    > 'fdisk -l /dev/mapper/pdc_eejidjjag' shows that one partition of type
    > 0xfd (Linux raid autodetect) is filling the disk. Surely this should be
    > type 0x83, since the device is the RAIDed disk as presented to the user?
    > And why does mount say the device /dev/mapper/pdc_eejidjjag1 does not
    > exist?


    Because this works differently. The problem is that the check for
    partitions is done by the kernel. It seems that it is done before
    assembly of the RAID array, and hence no partition discovery is done
    for it.
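
    Purely as an illustration (and assuming kpartx from the device-mapper
    tools is installed): if /dev/md0 really did carry a partition table you
    wanted to use, you could create the missing nodes by hand. In your case
    it is moot, since that table is bogus anyway:

    kpartx -l /dev/md0     # list the partitions found on the md device
    kpartx -a /dev/md0     # create /dev/mapper/md0p1 and friends from that table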

    > This may be due to my unfamiliarity with dmraid. I can find little
    > about it on the internet. I'm uncertain if it is meant to be used in
    > conjunction with mdadm, or whether it's either/or. In the past, Linux
    > software RAID has Just Worked for me using mdadm.


    By all means go back to mdadm. dmraid has no business being run
    automatically. The people that configured it that way screwed up IMO.
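
    If memory serves, the Fedora boot scripts leave dmraid alone when
    'nodmraid' is on the kernel command line; removing the package certainly
    stops it. A sketch (check before relying on it):

    # either remove the package entirely ...
    rpm -e dmraid
    # ... or append 'nodmraid' to the kernel line in /boot/grub/grub.conf, e.g.
    #   kernel /vmlinuz-<version> ro root=LABEL=/ nodmraid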

    > If I disregard dmraid, disabling the array with 'dmraid -an /dev/md0'
    > and use the more familiar mdadm instead, first checking with fdisk that
    > the disks have the correct RAID autodetect partitions:


    > [root@linuxbox root]# fdisk -l /dev/sda


    > Disk /dev/sda: 500.1 GB, 500107862016 bytes
    > 255 heads, 63 sectors/track, 60801 cylinders
    > Units = cylinders of 16065 * 512 = 8225280 bytes


    > Device Boot Start End Blocks Id System
    > /dev/sda1 1 60801 488384001 fd Linux raid
    > autodetect


    > then try to assemble the RAID with those, it fails:


    > [root@linuxbox root]# mdadm -v --assemble /dev/md0 /dev/sda1 /dev/sdb1
    > mdadm: looking for devices for /dev/md0
    > mdadm: cannot open device /dev/sda1: No such device or address
    > mdadm: /dev/sda1 has no superblock - assembly aborted


    > Perhaps I should be using the raw devices?


    > [root@linuxbox root]# mdadm -v --assemble /dev/md0 /dev/sda /dev/sdb
    > mdadm: looking for devices for /dev/md0
    > mdadm: /dev/sda is identified as a member of /dev/md0, slot 0.
    > mdadm: /dev/sdb is identified as a member of /dev/md0, slot 1.
    > mdadm: added /dev/sdb to /dev/md0 as 1
    > mdadm: added /dev/sda to /dev/md0 as 0
    > mdadm: /dev/md0 has been started with 2 drives.


    So you definitely used the whole devices (a mistake with software RAID
    IMO, but you can do it), and the partition tables are still there only
    because they have not been overwritten yet. They do confuse the
    autodetection script, though.
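
    You can confirm where the superblock lives; the whole device has one,
    the partition does not (which matches your failed --assemble attempt):

    mdadm --examine /dev/sda | grep -i uuid   # superblock on the whole device
    mdadm --examine /dev/sda1                 # no valid superblock here (mdadm may not
                                              # even be able to open it, as you saw)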

    > [root@linuxbox root]# mdadm -E /dev/sda
    > /dev/sda:
    > Magic : a92b4efc
    > Version : 00.90.01
    > UUID : c4344083:a8d8cf32:3f00e0db:8765b21b
    > Creation Time : Thu Mar 22 15:26:52 2007
    > Raid Level : raid1
    > Device Size : 488386496 (465.76 GiB 500.11 GB)
    > Array Size : 488386496 (465.76 GiB 500.11 GB)
    > Raid Devices : 2
    > Total Devices : 2
    > Preferred Minor : 0


    > Update Time : Thu Jul 5 16:58:02 2007
    > State : clean
    > Active Devices : 2
    > Working Devices : 2
    > Failed Devices : 0
    > Spare Devices : 0
    > Checksum : 864ad759 - correct
    > Events : 0.4



    > Number Major Minor RaidDevice State
    > this 0 8 0 0 active sync /dev/sda


    > 0 0 8 0 0 active sync /dev/sda
    > 1 1 8 16 1 active sync /dev/sdb


    > [root@linuxbox root]# mdadm -E /dev/sdb
    > /dev/sdb:
    > Magic : a92b4efc
    > Version : 00.90.01
    > UUID : c4344083:a8d8cf32:3f00e0db:8765b21b
    > Creation Time : Thu Mar 22 15:26:52 2007
    > Raid Level : raid1
    > Device Size : 488386496 (465.76 GiB 500.11 GB)
    > Array Size : 488386496 (465.76 GiB 500.11 GB)
    > Raid Devices : 2
    > Total Devices : 2
    > Preferred Minor : 0


    > Update Time : Thu Jul 5 16:58:02 2007
    > State : clean
    > Active Devices : 2
    > Working Devices : 2
    > Failed Devices : 0
    > Spare Devices : 0
    > Checksum : 864ad76b - correct
    > Events : 0.4



    > Number Major Minor RaidDevice State
    > this 1 8 16 1 active sync /dev/sdb


    > 0 0 8 0 0 active sync /dev/sda
    > 1 1 8 16 1 active sync /dev/sdb


    > so that looks OK. Let's see what /dev/md0 looks like:


    > [root@linuxbox root]# fdisk -l /dev/md0


    > Disk /dev/md0: 500.1 GB, 500107771904 bytes
    > 255 heads, 63 sectors/track, 60801 cylinders
    > Units = cylinders of 16065 * 512 = 8225280 bytes


    > Device Boot Start End Blocks Id System
    > /dev/md0p1 1 60801 488384001 fd Linux raid
    > autodetect


    You do not have that partition! Unless you did partition /dev/md0?
    If not, this is leftover junk from your first partitioning that you
    then did not use. It confuses dmraid and should be removed; see below.

    > That doesn't look right; I would have expected to see a partition of
    > type 0x83, since /dev/md0p1 is the RAID as presented to the user
    > according to fdisk. Trying to mount it anyway:


    > [root@linuxbox root]# mount -v -t auto /dev/md0 /mnt/test
    > mount: you didn't specify a filesystem type for /dev/md0
    > I will try all types mentioned in /etc/filesystems or
    > /proc/filesystems
    > Trying hfsplus
    > mount: you must specify the filesystem type


    > [root@linuxbox root]# mount -v -t auto /dev/md0p1 /mnt/test
    > mount: you didn't specify a filesystem type for /dev/md0p1
    > I will try all types mentioned in /etc/filesystems or
    > /proc/filesystems
    > Trying hfsplus
    > mount: special device /dev/md0p1 does not exist


    > mdadm --examine /dev/sd* shows both members of the array as correct,
    > with the same serial number. "cat /proc/mdstat" shows the array as
    > complete and OK with two members as expected.


    > /proc/mdstat shows:


    > [root@linuxbox root]# cat /proc/mdstat
    > Personalities : [raid1]
    > md0 : active raid1 sda[0] sdb[1]
    > 488386496 blocks [2/2] [UU]


    > unused devices: <none>


    > I'm confused. I can't find much information on dmraid; the man page
    > seems to imply that it's for use with hardware RAID controllers, and I
    > don't know if I should be using that or mdadm, or both. Previously I
    > just used mdadm and everything Just Worked.


    > I don't know why assembling and starting the array doesn't present the
    > contents of the md device as expected,


    But it does? You said that you created an ext3 on it, so why not
    just mount /dev/md0 directly? I think you have indeed gotten a bit
    confused (understandably, and maybe a bit panicked too...), and may
    have forgotten what you said at the top of this posting ;-)
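
    Before mounting, it might be reassuring to confirm that the ext3 is
    visible on the md device, for example (the mount point is just an
    example):

    blkid /dev/md0                 # should report TYPE="ext3"
    tune2fs -l /dev/md0 | head     # dumps the ext3 superblock, if one is there
    mount -t ext3 /dev/md0 /mnt/test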

    > and why fdisk shows special
    > devices in /dev which the mount command says don't exist.


    Actually, mount cannot find those partition devices because the kernel
    never created them (see above). And it would not matter anyway: there
    is no filesystem on them. The ext3 is on /dev/md0 itself.

    > The user of the machine is getting worried as there's a lot of data on
    > this array, and of course, he has no backup.


    Well, always the same story. There is no excuse for not having a
    backup...

    > I'm at the point of taking the disks out and trying them in a machine
    > running FC4. Any ideas or suggestions please before I do that?


    Mount /dev/md0 directly. It should have your ext3. However, it is
    important that you remove the bogus partition table. The easiest way to
    do that is as follows (a rough command sketch follows the list):

    0. (Optionally) disable the unhelpful dmraid boot script
    1. Get the thing to work again, then make a full backup.
    2. Degrade the array by setting sdb as faulty
    3. Remove sdb from the array
    4. Partition sdb with one large partition of type 0xfd
       (Reboot if fdisk could not get the kernel to reload the partition table.)
    5. Make a degraded RAID1 on /dev/sdb1 as md1 (specify the second disk
       as "missing" to mdadm)
    6. Make a filesystem on /dev/md1 and copy all data over from /dev/md0
    7. Stop /dev/md0, and create a partition on sda like the one on sdb
       (Reboot if fdisk told you it could not reload the partition table.)
    8. Add /dev/sda1 to /dev/md1
    9. Adjust /etc/fstab as needed

    You should now have a partition on sda and one on sdb, both set to be
    auto-started as /dev/md1 by the kernel.
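
    Roughly, in commands (an untested sketch; /mnt/old and /mnt/new are
    placeholder mount points, and double-check the device names before
    running anything destructive):

    mdadm /dev/md0 --fail /dev/sdb --remove /dev/sdb    # steps 2 and 3
    echo ',,fd' | sfdisk /dev/sdb                       # step 4: one 0xfd partition
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb1 missing
                                                        # step 5 (mdadm may warn that
                                                        # sdb looks like an old member)
    mkfs.ext3 /dev/md1                                  # step 6
    mkdir -p /mnt/old /mnt/new
    mount -o ro /dev/md0 /mnt/old
    mount /dev/md1 /mnt/new
    cp -a /mnt/old/. /mnt/new/                          # step 6: copy the data
    umount /mnt/old
    mdadm --stop /dev/md0                               # step 7
    echo ',,fd' | sfdisk /dev/sda                       # step 7: same partition on sda
    mdadm /dev/md1 --add /dev/sda1                      # step 8: resync starts
    cat /proc/mdstat                                    # watch the rebuild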

    BTW, you can do this whole operation from a Knoppix CD or memory stick;
    you just need to load the RAID kernel modules manually.
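
    For example, from the live system (assuming the md modules are on it):

    modprobe raid1                                  # md mirror personality
    mdadm --assemble /dev/md0 /dev/sda /dev/sdb
    mount -t ext3 /dev/md0 /mnt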

    Arno
