RAID1 rebuild time question - Storage

Thread: RAID1 rebuild time question

  1. RAID1 rebuild time question

    I have a RAID1 with a Silicon Image SATA controller sil3114.

    Often, my PC bluescreens (that caused data corruption on my previous HD so I
    switched to RAID1).
    In such cases the SATARAID utility always reports an "event" (exceeding
    S.M.A.R.T status) and starts rebuilding.
    Incidentally, the same happens when copying a few hundred thousand files:
    after about half an hour there seems to be so much overheating that the
    S.M.A.R.T status is exceeded.

    Anyway. The rebuild rate has been set to "fastest" but it still takes 24
    hours (or more!) to rebuild the mirror.
    Is that normal?

    And, more importantly, what happens when there is another crash that
    corrupts the *other* HDD during the rebuilding process?
    (This has already happened several times..)

    Some more questions: Can I add more disks to that card (which has 4 SATA
    connectors)? It doesn't seem so.

    And this rebuilding, why is the entire HD always rebuilt and not just the
    sector that is deemed corrupted?
    Is rebuilding so slow to avoid overheating or taking away too many system
    resources?
    Can the rebuilding be sped up by a registry hack or something?

    Could it be that mismatching RAM timing ratings are responsible for my many
    bluescreens that cause HD corruption?

    (Newly installed WinXP SP2, virus free machine).

    TIA,
    Frank




  2. Re: RAID1 rebuild time question

    On Thu, 10 Mar 2005 18:07:49 +0100, "Frank de Groot"
    wrote:

    >I have a RAID1 with a Silicon Image SATA controller sil3114.
    >
    >Often, my PC bluescreens (that caused data corruption on my previous HD so I
    >switched to RAID1).
    >The SATARAID utility reports in such cases always an "event" (exceeding
    >S.M.A.R.T status) and starts rebuilding.
    >Incidentally, the same happens when copying a few hundred thousand files,
    >after about half an hour there seems to be so much overheating that the
    >S.M.A.R.T status is exceeded.
    >
    >Anyway. The rebuild rate has been set to "fastest" but it still takes 24
    >hours (or more!) to rebuild the mirror.
    >Is that normal?
    >
    >And, more importantly, what happens when there is another crash that
    >corrupts the *other* HDD during the rebuilding process?
    >(This has already happened several times..)
    >
    >Some more questions: Can I add more disks to that card (which has 4 SATA
    >connectors). Doesn't seem so.
    >
    >And this rebuilding, why is always the entire HD rebuilt and not the sector
    >that is deemed corrupted?
    >Is rebuilding so slow to avoid overheating or taking away too many system
    >resources?
    >Can the rebuilding be sped up by a registry hack or something?
    >
    >Could it be that mismatching RAM timing ratings are responsible for my many
    >bluescreens that cause HD corruption?
    >
    >(Newly installed WinXP SP2, virus free machine).
    >
    >TIA,
    >Frank
    >
    >

    The event means your disk is dying; perhaps it is out of spare
    sectors for reallocation. Replace the disk.


  3. Re: RAID1 rebuild time question

    "David A.Lethe" wrote in message
    news:19d231d1sreoe9ouk9pda6vvp9cda6g1f2@4ax.com...
    >
    > The event means your disk is dying, perhaps it is out of spare
    > sectors for reallocation. Replace the disk.


    I forgot to mention that all disks are brand new, almost the same serial #s
    and verified OK.
    When it happened the first time I traded the disk for another new one. Same
    problem.

    I have 5 new disks, any combination of two in a RAID1 shows the same
    problem.

    The same problem occurs with any other disk, not in a RAID1, but that causes
    the disk to be corrupted so much that finally I lose data. So I gather it's
    the MB, as replacing the OS (2000 to XP) did not do the trick either.

    Still I wonder why the rebuild times are so slow with SATARAID (or the
    sil3114).





  4. Re: RAID1 rebuild time question

    > I have a RAID1 with a Silicon Image SATA controller sil3114.
    >
    > Often, my PC bluescreens (that caused data corruption on my previous HD so I


    Try using Windows software RAID instead. Maybe this is a bug in the
    controller's driver.

    --
    Maxim Shatskih, Windows DDK MVP
    StorageCraft Corporation
    maxim@storagecraft.com
    http://www.storagecraft.com



  5. Re: RAID1 rebuild time question

    "Frank de Groot" wrote in message
    news:HCmYd.1683$SL4.30777@news4.e.nsc.no...

    FYI, I now know what it is.
    One of my disks just DIED on me (meaning Windows could not do a delayed
    write any more and the disk was gone from the Admin tools).

    Then came an AV with a message from MS saying that a certain HP driver
    needed updating urgently or I could get a damaged system.
    I went to the HP site and there they said that the buggy driver could
    irreparably damage the boot sector and lead to loss of files, making it
    inevitable that the OS needed to be reinstalled etc. Some nice mess..

    The craziest part is that the app that wreaks all this havoc (and
    apparently has been doing so for the past 2 years, since I bought a
    scanner) is called "Memories to CD" or something.
    Damage suffered: many thousands of USD, many weeks of delays and many
    lost files over the years.
    I wish people wouldn't force-install that crap with scanners and printers
    nowadays.



  6. Re: RAID1 rebuild time question

    On Thu, 10 Mar 2005 18:07:49 +0100, "Frank de Groot"
    wrote:

    >I have a RAID1 with a Silicon Image SATA controller sil3114.
    >
    >Often, my PC bluescreens (that caused data corruption on my previous HD so I
    >switched to RAID1).
    >The SATARAID utility reports in such cases always an "event" (exceeding
    >S.M.A.R.T status) and starts rebuilding.
    >Incidentally, the same happens when copying a few hundred thousand files,
    >after about half an hour there seems to be so much overheating that the
    >S.M.A.R.T status is exceeded.


    First off, mirroring will not protect against write corruption.
    Whatever it writes to one disk it writes to the other. And a bluescreen
    would likely be a write corruption, since most of the OS runs out of
    memory, bypassing reads. Unless you get a BSOD on boot or shortly
    after login. Then maybe...
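
    To illustrate the point about write corruption, here is a minimal
    sketch of a two-disk mirror as a toy model. The Mirror class and the
    flip_bit helper are made up for illustration and have nothing to do
    with the actual SATARAID driver. The data is damaged in memory before
    the write, and the mirror then stores the damaged bytes on both disks.

```python
# Toy model of a two-disk mirror. All names here are illustrative only.

class Mirror:
    def __init__(self, num_blocks):
        self.disk_a = [b""] * num_blocks
        self.disk_b = [b""] * num_blocks

    def write(self, block_no, data):
        # The mirror duplicates whatever it is handed, good or bad.
        self.disk_a[block_no] = data
        self.disk_b[block_no] = data

    def copies_agree(self, block_no):
        return self.disk_a[block_no] == self.disk_b[block_no]


def flip_bit(data, bit_index=0):
    """Simulate a RAM error: return a copy of data with one bit flipped."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


intended = b"important data"
in_memory = flip_bit(intended)      # corruption happens before the write

mirror = Mirror(num_blocks=4)
mirror.write(0, in_memory)

print(mirror.copies_agree(0))        # True: both disks hold the same bytes
print(mirror.disk_a[0] == intended)  # False: neither holds what was intended
```

    Both copies agree with each other, so from the mirror's point of view
    nothing is wrong, yet the data is bad on both drives.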

    >
    >Anyway. The rebuild rate has been set to "fastest" but it still takes 24
    >hours (or more!) to rebuild the mirror.
    >Is that normal?


    You don't mention the size of the drives but in most cases yes. FC
    drives can take less, but most SATA drives take 24 hours, some as long
    as 36.

    >
    >And, more importantly, what happens when there is another crash that
    >corrupts the *other* HDD during the rebuilding process?
    >(This has already happened several times..)
    >
    >Some more questions: Can I add more disks to that card (which has 4 SATA
    >connectors). Doesn't seem so.


    If you only have 2 drives on there now then you can add 2 more.
    Hopefully "connectors" really means channels. If not, you're only
    going to halve your speed per drive, since a channel can only handle so
    much throughput.

    >
    >And this rebuilding, why is always the entire HD rebuilt and not the sector
    >that is deemed corrupted?
    >Is rebuilding so slow to avoid overheating or taking away too many system
    >resources?
    >Can the rebuilding be sped up by a registry hack or something?


    There is what's called "sick disk recovery" where valid data on a
    drive will be copied off to the spare, but I highly doubt your card
    has that.
    Rebuilding a mirror can seriously impact performance on a 2 drive
    system. The slow rebuild may be deliberate, or it may just be the limit
    of the drive. Hardware RAID rebuilds by blocks, not files. So if you
    only have 24 GB of data on the drive it's not rebuilding 24 GB, it's
    rebuilding all 120/260/430 GB (whatever) worth of blocks on the drive.
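
    A rough sketch of what rebuilding by blocks means. The rebuild_mirror
    function and the block counts are invented for illustration (one list
    entry stands in for 1 GB) and this is not how any particular
    controller is implemented. The loop walks the whole device and never
    asks whether a block belongs to a file.

```python
def rebuild_mirror(source, target):
    """Copy every block from source to target; return the number copied."""
    copied = 0
    for block_no, block in enumerate(source):
        target[block_no] = block    # copied whether or not a file uses it
        copied += 1
    return copied


TOTAL_BLOCKS = 120   # stands in for a 120 GB drive, one entry per GB
USED_BLOCKS = 24     # stands in for 24 GB of file data

good_disk = [b"data"] * USED_BLOCKS + [b""] * (TOTAL_BLOCKS - USED_BLOCKS)
new_disk = [None] * TOTAL_BLOCKS

copied = rebuild_mirror(good_disk, new_disk)
print(copied)   # 120, not 24
```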

    >
    >Could it be that mismatching RAM timing ratings are responsible for my many
    >bluescreens that cause HD corruption?


    I would not think timing mismatch would be an issue, more likely bad
    segments of memory if you suspect memory for some reason. It could
    also be the card. Wouldn't be the first time a raid controller went
    to crap slowly.

    ~F

  7. Re: RAID1 rebuild time question

    > First off, mirroring will not protect against write corruption.

    I was afraid of that.
    Thanks for your answer BTW.

    > You don't mention the size of the drives but in most cases yes. FC
    > drives can take less but most sata drives are 24 hours, some as long
    > as 36.


    Indeed it takes up to 36 hours for 120 GB drives.
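
    For scale, the numbers in this thread work out to an average rate of
    roughly 1 MB/s. A quick sketch of the arithmetic, assuming decimal
    units (1 GB = 1000 MB); the helper name is just something made up
    for this post:

```python
def rebuild_rate_mb_per_s(capacity_gb, hours):
    """Average rebuild rate in MB/s, using decimal units (1 GB = 1000 MB)."""
    return capacity_gb * 1000 / (hours * 3600)


for hours in (24, 36):
    print(f"{hours} h: {rebuild_rate_mb_per_s(120, hours):.2f} MB/s")
```

    That comes to about 1.39 MB/s for a 24 hour rebuild and 0.93 MB/s for
    a 36 hour one.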

    > If you only have 2 drives on there now then you can add 2 more.
    > Hopefully "connectors" really means channels.


    I meant to make a RAID1 that contains 3 drives + 1 hot spare instead of 2
    drives + 1 hot spare.

    > There is what's called "sick disk recovery" where valid data on a
    > drive will be copied off to the spare, but I highly doubt your card
    > has that.


    No, this card (Silicon Image) doesn't even speed up reading from a RAID1.
    It's software-based.

    > I would not think timing mismatch would be an issue, more likely bad
    > segments of memory if you suspect memory for some reason.


    Will do a thorough test, thanks.

    > It could
    > also be the card. Wouldn't be the first time a raid controller went
    > to crap slowly.


    I am so serious about this ongoing issue that I bought two RAID controller
    cards.
    It's not the card, it happens on other drives as well, and other mobos as
    well..
    It is either a driver, or the RAM.
    In a previous post I mentioned an error message after a disk crash I got (I
    got this a few times before and ignored it..)
    The message + HP explanation at their site, in my words, amounted to: "There
    is a Hewlett-Packard program on your system "Sweet Memories To Disk", that
    can totally corrupt your harddisk so that you would have to wipe it clean
    and re-install everything - please download this update". Well I have done
    it now..

    No kidding.





  8. Re: RAID1 rebuild time question

    (C) Copyright 2005 Frank A. de Groot - All rights reserved.

    > Will do a thorough test, thanks.



    I removed 2 of 3 DIMMs and now the disks run like a charm instead of
    immediately reporting errors, even under a severe stress test.
    Looks like the cause was a faulty DIMM.
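
    A stress test of this kind can be as simple as a write-and-verify
    loop. A minimal sketch, not the exact test used here: the function
    name, file names and sizes are arbitrary choices. It writes random
    files, reads them back and compares SHA-256 hashes. Note that the
    read-back may be served from the OS file cache rather than the disk,
    so a clean run is only a hint, not proof.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, num_files=20, size_bytes=1 << 20):
    """Write random files, read them back, return paths whose hash changed."""
    expected = {}
    for i in range(num_files):
        data = os.urandom(size_bytes)
        path = os.path.join(directory, f"stress_{i:04d}.bin")
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # ask the OS to push the data to the disk
        expected[path] = hashlib.sha256(data).hexdigest()

    mismatches = []
    for path, digest in expected.items():
        with open(path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        if actual != digest:
            mismatches.append(path)
    return mismatches


with tempfile.TemporaryDirectory() as scratch:
    bad = write_and_verify(scratch, num_files=5, size_bytes=64 * 1024)
    print(f"{len(bad)} corrupted file(s)")
```

    To exercise a particular drive, point the directory argument at a
    folder on that drive and raise the file count and size.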



  9. Re: RAID1 rebuild time question

    On Sat, 12 Mar 2005 12:48:49 +0100 in comp.arch.storage, "Frank de
    Groot" wrote:

    >I removed 2 of 3 DIMMS and now the disks run like a charm instead of
    >immediately reporting errors, even under a severe stresstest.
    >Looks like the cause was a faulty DIMM.


    May not be a faulty DIMM, the fault may be installing 3 DIMMs: most
    systems nowadays will only run with 2 normal spec DIMMs, unless all
    DIMMs are registered or tested: check your motherboard manufacturer's
    web site for DIMM specs, and what specific brands and models are
    allowed when you have more than 2 installed.

    --
    Thanks. Take care, Brian Inglis Calgary, Alberta, Canada

    Brian.Inglis@CSi.com (Brian[dot]Inglis{at}SystematicSW[dot]ab[dot]ca)
    fake address use address above to reply

  10. Re: RAID1 rebuild time question

    "Brian Inglis" wrote in message
    news:nk9631topovlrjfh5ghvsceiiqu7cehvmu@4ax.com...

    > May not be a faulty DIMM, the fault may be installing 3 DIMMs: most
    > systems nowadays will only run with 2 normal spec DIMMs, unless all
    > DIMMs are registered or tested:


    You have the answer!
    I had 2 "paired" DIMMs and one "rogue" with slightly different timings.
    I could never find anything really *wrong* with that "odd" one, but when I
    leave that one out, the system works like a charm.



  11. Re: RAID1 rebuild time question


    Frank de Groot:

    > "Brian Inglis" wrote in message
    > news:nk9631topovlrjfh5ghvsceiiqu7cehvmu@4ax.com...


    >> May not be a faulty DIMM, the fault may be installing 3 DIMMs: most
    >> systems nowadays will only run with 2 normal spec DIMMs, unless all
    >> DIMMs are registered or tested:


    > You have the answer!
    > I had 2 "paired" DIMMs and one "rogue" with slightly different timings.
    > I could never find anything really *wrong* with that "odd" one, but when I
    > leave that one out, the system works like a charm.


    If you want to make sure that your memory is OK now, go to
    http://www.memtest.org/ or to http://www.memtest86.com/ and use the
    memory testing tools there for further verification. They catch a lot
    of hardware problems.

    --

    Joerg Lenneis

    email: lenneis@wu-wien.ac.at

  12. Re: RAID1 rebuild time question

    Frank de Groot wrote:
    >"Brian Inglis" wrote in message
    >news:nk9631topovlrjfh5ghvsceiiqu7cehvmu@4ax.com...
    >
    >> May not be a faulty DIMM, the fault may be installing 3 DIMMs: most
    >> systems nowadays will only run with 2 normal spec DIMMs, unless all
    >> DIMMs are registered or tested:

    >
    >You have the answer!
    >I had 2 "paired" DIMMs and one "rogue" with slightly different timings.
    >I could never find anything really *wrong* with that "odd" one, but when I
    >leave that one out, the system works like a charm.


    There are a multitude of possible causes for this.

    It could be the differences between the DIMMs that cause this, the
    board might not handle that many "sides", or it might require slowing
    down memory accesses with that many "sides" (standard DIMMs can have
    one or two sides).

    Personally I'd suspect that the most likely cause is the difference
    between the DIMMs, and that the fix is either rearranging the DIMMs
    (so that the BIOS sees the "slower" one first, unless they have
    *different* slow parameters; which way it scans is totally
    undocumented, and it SHOULD query all of them, but in reality this
    does help surprisingly often) or manually setting the memory speed
    down slightly (CAS, speed or one of the other parameters).

    The second most likely cause is that you need to reduce memory speed
    due to too many sides. The reason I list this as less likely is that
    this is something that the BIOS almost always gets right without help.
    But check the manual and see what it says about memory.

    I usually use Memtest86+ to verify that it works. It takes time (the
    longer the better, let it run overnight), and isn't guaranteed to
    always catch all errors, but it's fairly good.

    It also shows what memory speed settings are used, so you can easily
    see if the setting changes as you rotate the three DIMMs (three
    tests).

    (The original Memtest86 is also pretty good and they now trade
    information back and forth, I find the + version still to be better).

    http://www.memtest.org/
