Raid level write verification - Storage



Thread: Raid level write verification

  1. Raid level write verification

    Is it true that Raid levels like 1, 1+0, 3, 4 verify on the fly that
    both the data write and the ecc write match while levels like raid 5
    don't (with few exceptions on the very high end)? Or is this rather
    dependent on make & model?

    TIA


  2. Re: Raid level write verification

    teckytim wrote:
    > Is it true that Raid levels like 1, 1+0, 3, 4 verify on the fly that
    > both the data write and the ecc write match while levels like raid 5
    > don't (with few exceptions on the very high end)?


    No. No RAID level *requires* any kind of read-after-write verification,
    though any RAID *implementation* could offer it as an additional feature.

    However, I think some RAID-3 implementations verify on the fly that the
    parity information matches the stripe being *read* (since that has no
    impact on performance, save for the CPU cycles required by the
    comparison and the bus cycles consumed by reading the parity). Though I
    don't recall that the accepted RAID-3 definition *requires* this.
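
    For illustration, here is roughly what such a read-time check amounts
    to - just the XOR comparison over one stripe, a sketch rather than any
    particular controller's code:

        #include <stddef.h>
        #include <stdint.h>

        /* Sketch only: check that the stored parity block matches the XOR of
         * the data blocks in one stripe, as a RAID-3-style read-time verify
         * might.  'data' holds 'ndrives' pointers to blocks of 'len' bytes;
         * 'parity' is the block read from the parity drive.  Returns 1 on
         * match, 0 on mismatch. */
        static int stripe_parity_ok(const uint8_t *const *data, size_t ndrives,
                                    const uint8_t *parity, size_t len)
        {
            for (size_t i = 0; i < len; i++) {
                uint8_t x = 0;
                for (size_t d = 0; d < ndrives; d++)
                    x ^= data[d][i];
                if (x != parity[i])
                    return 0;   /* data and parity disagree */
            }
            return 1;
        }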

    - bill

  3. Re: Raid level write verification


    Bill Todd wrote:
    > teckytim wrote:
    > > Is it true that Raid levels like 1, 1+0, 3, 4 verify on the fly that
    > > both the data write and the ecc write match while levels like raid 5
    > > don't (with few exceptions on the very high end)?
    >
    > No. No RAID level *requires* any kind of read-after-write verification,
    > though any RAID *implementation* could offer it as an additional feature.
    >
    > However, I think some RAID-3 implementations verify on the fly that the
    > parity information matches the stripe being *read* (since that has no
    > impact on performance, save for the CPU cycles required by the
    > comparison and the bus cycles consumed by reading the parity). Though I
    > don't recall that the accepted RAID-3 definition *requires* this.
    >
    > - bill



    Thanks. I was afraid that was the answer as so many raid details are
    nonstandard or rather manufacturer specific.

    So if I *require* a raid implementation that does this, where do I have
    to look? Are there PCI card products (SATA & SCSI), raid boxes, or is
    this only available in high-end non-DAS raid like EMC, etc.?

    Are background media scans sufficient protection against failing/flaky
    media so the verify feature discussed above is not necessary?

    Thanks again.


  4. Re: Raid level write verification

    teckytim wrote:

    ....

    > So if I *require* a raid implementation that does this, where do I have
    > to look? Are there PCI card products (SATA & SCSI), raid boxes, or is
    > this only available in high-end non-DAS raid like EMC, etc.?


    Someone else here might know, but I don't.

    >
    > Are background media scans sufficient protection against failing/flaky
    > media so the verify feature discussed above is not necessary?
    >
    > Thanks again.
    >


    I don't think the two are all that closely related. All
    read-after-write does is verify that the data written was what you
    intended to write: while this does guard against very low-probability
    errors like silently-failing null writes or 'wild' writes (though with
    the latter you have to worry about what got clobbered as well), it isn't
    likely to be any kind of substitute for background 'scrubbing' to catch
    deteriorating sectors (which I think are orders of magnitude more likely
    than unheralded write failures, but that's just my impression).
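
    For what it's worth, here is a minimal sketch of what host-level
    read-after-write amounts to. Note the caveat in the comments: without
    cache bypass (O_DIRECT or your platform's equivalent - an assumption on
    my part), the read-back may be served from a cache and never touch the
    media at all:

        #include <stdlib.h>
        #include <string.h>
        #include <unistd.h>

        /* Sketch only: naive host-level read-after-write verification.
         * Without cache bypass (e.g. O_DIRECT plus aligned buffers) the
         * read-back is likely satisfied from the page cache or the drive's
         * own cache, so this does NOT prove the data reached the media. */
        static int write_and_verify(int fd, const void *buf, size_t len, off_t off)
        {
            if (pwrite(fd, buf, len, off) != (ssize_t)len)
                return -1;
            if (fsync(fd) != 0)             /* flush host-side buffering */
                return -1;

            void *check = malloc(len);
            if (check == NULL)
                return -1;
            int ok = pread(fd, check, len, off) == (ssize_t)len &&
                     memcmp(buf, check, len) == 0;
            free(check);
            return ok ? 0 : -1;             /* -1: read-back did not match */
        }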

    Sun claims that its new ZFS file system for Solaris has supplementary
    checksum information that guards data from main-memory to disk and back
    again - you might find a look there interesting. But that's not
    specifically RAID-related.

    - bill

  5. Re: Raid level write verification


    Bill Todd wrote:
    > teckytim wrote:
    >
    > ...
    >
    > > So if I *require* a raid implementation that does this, where do I have
    > > to look? Are there PCI card products (SATA & SCSI), raid boxes, or is
    > > this only available in high-end non-DAS raid like EMC, etc.?
    >
    > Someone else here might know, but I don't.
    >
    > > Are background media scans sufficient protection against failing/flaky
    > > media so the verify feature discussed above is not necessary?
    > >
    > > Thanks again.
    >
    > I don't think the two are all that closely related. All
    > read-after-write does is verify that the data written was what you
    > intended to write: while this does guard against very low-probability
    > errors like silently-failing null writes or 'wild' writes (though with
    > the latter you have to worry about what got clobbered as well), it isn't
    > likely to be any kind of substitute for background 'scrubbing' to catch
    > deteriorating sectors (which I think are orders of magnitude more likely
    > than unheralded write failures, but that's just my impression).


    I didn't think they were related, at least not outside of the most
    general sense. Read-after-write just seems to me a reasonable extra
    failsafe where data integrity/security trumps all else. That
    perception could be wrong, though.

    I have occasionally read about transient write errors in raid 5
    implementations which writers/posters believe make raid 5 less reliable
    than other levels. Also I have read about some interesting data
    protection features in EMC & I think Netapp which I believe combat
    these fears.

    It seems to me the likelihood of a flaky drive causing problems
    increases with array size (drive #), especially in larger ATA arrays.
    In the event of, say, a weakening sector which causes a write to fail
    but is not quite weak enough to be marked bad, it would cause confusion
    on a defect scan. I have also seen a drive or two which was failing by
    corrupting data, but still spinning & not showing much or anything in
    the way of bad sectors. It's rare, but I've seen it, and I wouldn't want
    one such drive to take a crap all over an array's stripes.
    "Read-after-write" in addition to background defect scanning makes
    sense to me. I usually see only the latter. That makes me wonder.

    > Sun claims that its new ZFS file system for Solaris has supplementary
    > checksum information that guards data from main-memory to disk and back
    > again - you might find a look there interesting. But that's not
    > specifically RAID-related.
    >
    > - bill


    Very interesting. Will look. Thanks again for the response.


  6. Re: Raid level write verification

    teckytim (technotim@hotmail.com) wrote:
    : I have occasionally read about transient write errors in raid 5
    : implementations which writers/posters believe make raid 5 less reliable
    : than other levels. Also I have read about some interesting data
    : protection features in EMC & I think Netapp which I believe combat
    : these fears.

    A block protection scheme (aka DIF) has recently been standardized by T10.
    That protection scheme has been implemented by a few of the silicon
    suppliers (including my employer). Look for that scheme to become a
    pretty common feature in the next couple of years. It has been a
    proprietary feature of a few storage vendors for a number of years already.
    The recently announced SGI 4G FC array (OEMed from Engenio) is an example
    that has this new standardized feature built into it.

    Dave


  7. Re: Raid level write verification

    Bill Todd (billtodd@metrocast.net) wrote:
    : Dave Sheehy wrote:
    : > teckytim (technotim@hotmail.com) wrote:
    : > : I have occasionally read about transient write errors in raid 5
    : > : implementations which writers/posters believe make raid 5 less reliable
    : > : than other levels. Also I have read about some interesting data
    : > : protection features in EMC & I think Netapp which I believe combat
    : > : these fears.
    : >
    : > A block protection scheme (aka DIF) has recently been standardized by T10.
    : > That protection scheme has been implemented by a few of the silicon
    : > suppliers (including my employer). Look for that scheme to become a
    : > pretty common feature in the next couple of years. It has been a
    : > proprietary feature of a few storage vendors for a number of years already.
    : > The recently announced SGI 4G FC array (OEMed from Engenio) is an example
    : > that has this new standardized feature built into it.

    : If it is indeed now a standard I suspect that given sufficient effort I
    : could learn its details. But if you found it convenient to post them
    : (at least to the degree that one could understand the technology
    : involved - e.g., is it simply an additional checksum, does it live with
    : the data or separate from it, etc.), it would save me and other curious
    : individuals some time.

    The details can be found in the SBC-2 or -3 standard at t10.org. Look for
    the section on "Protection Information". Also, some new 32 bit extended SCSI
    commands are being proposed to support this functionality. There are some
    rumblings about adding this to T13 as well but I'm not familiar with the
    status of that.

    Briefly, 8 bytes of information are appended to each block. There are 3
    fields of information: a 2-byte CRC (of the data), a 4-byte LBA reference
    tag, and a 2-byte application tag. Theoretically, the information can be
    applied end to end (i.e. generated at the server and sent to and returned
    from the array) but that is not a typical deployment (although a few HBA
    manufacturers are incorporating the feature). The typical deployment is to
    generate the information in the protocol controller on the front end of
    the array as it's written to memory (i.e. data cache). It is written to
    disk by the back end. The information is validated by both back end and
    front end when the data is read by the protocol controller. When performed
    in this fashion the data is protected as it traverses the bus (e.g. PCI
    and PCI-X only have simple parity protection), while it resides in memory,
    and while it resides on the disk.
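
    If it helps, here is a rough sketch of the tuple and the guard
    calculation as I read the document. The field order, the CRC parameters
    (polynomial 0x8BB7, no bit reflection, zero initial value) and the
    big-endian wire format are my reading - check them against SBC-2/-3
    rather than taking my word for it:

        #include <stddef.h>
        #include <stdint.h>

        /* Sketch of the 8-byte protection information tuple: 2-byte guard
         * CRC, 2-byte application tag, 4-byte reference tag (typically the
         * low 32 bits of the LBA).  Stored here as host integers; a real
         * implementation puts the fields on the wire big-endian. */
        struct t10_pi_tuple {
            uint16_t guard;    /* CRC-16 of the 512 data bytes */
            uint16_t app_tag;  /* opaque to the standard */
            uint32_t ref_tag;  /* low 32 bits of the target LBA */
        };

        /* CRC-16 with polynomial 0x8BB7, bitwise, MSB first. */
        static uint16_t t10_crc16(const uint8_t *buf, size_t len)
        {
            uint16_t crc = 0;
            for (size_t i = 0; i < len; i++) {
                crc ^= (uint16_t)(buf[i] << 8);
                for (int b = 0; b < 8; b++)
                    crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
                                         : (uint16_t)(crc << 1);
            }
            return crc;
        }

        /* Verify one 512-byte block against its tuple; lba is the address
         * the reader expected to get. */
        static int t10_pi_check(const uint8_t block[512],
                                const struct t10_pi_tuple *pi, uint32_t lba)
        {
            return pi->guard == t10_crc16(block, 512) && pi->ref_tag == lba;
        }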

    Dave


  8. Re: Raid level write verification

    Dave Sheehy wrote:

    ....

    > Briefly, 8 bytes of information are appended to each block. There are 3
    > fields of information: a 2-byte CRC (of the data), a 4-byte LBA reference
    > tag, and a 2-byte application tag. Theoretically, the information can be
    > applied end to end (i.e. generated at the server and sent to and returned
    > from the array) but that is not a typical deployment (although a few HBA
    > manufacturers are incorporating the feature). The typical deployment is to
    > generate the information in the protocol controller on the front end of
    > the array as it's written to memory (i.e. data cache). It is written to
    > disk by the back end. The information is validated by both back end and
    > front end when the data is read by the protocol controller. When performed
    > in this fashion the data is protected as it traverses the bus (e.g. PCI
    > and PCI-X only have simple parity protection), while it resides in memory,
    > and while it resides on the disk.


    Thanks. That's the kind of thing I thought might be useful a decade
    ago, though seems a little stingy today - e.g., limiting the LBA address
    to 32 bits (common arrays below the level that the host system may be
    aware of already exceed this size, though when used only as a sanity
    check the low 32 bits of the LBA may be sufficient) and the
    application-specific area to 16 (if both fields were longer the
    application-specific area could be used, e.g., to hold a file identifier
    which would facilitate reconstruction of a file system - I have a vague
    recollection that IBM's i-series boxes and their ancestors may have done
    this).

    It should at least allow a host that cares enough to implement the
    functionality to generate the validation information before the data
    leaves main memory and check it after it returns. This will catch
    otherwise undetected bus errors and anything clobbered by a wild write,
    but unfortunately it still won't detect that the intended destination
    was never updated (or that a silent null write failure occurred).
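
    To make that concrete, a sketch of the host-side flow (with a simple
    stand-in checksum rather than the T10 CRC), including a comment on the
    one failure it cannot catch:

        #include <stddef.h>
        #include <stdint.h>

        /* Sketch of the host-side flow: attach validation info before the
         * buffer leaves main memory, re-check it when the data comes back.
         * The checksum is a stand-in (a byte sum), not the standard's CRC. */
        struct block_check {
            uint32_t sum;   /* content check */
            uint32_t lba;   /* address check */
        };

        static uint32_t simple_sum(const uint8_t *p, size_t n)
        {
            uint32_t s = 0;
            while (n--)
                s += *p++;
            return s;
        }

        /* Computed in host memory, before the write is issued. */
        static struct block_check make_check(const uint8_t *buf, size_t len,
                                             uint32_t lba)
        {
            struct block_check c = { simple_sum(buf, len), lba };
            return c;
        }

        /* Checked in host memory, after the read completes.  This catches
         * corruption in flight and blocks returned from the wrong LBA, but
         * NOT a silently dropped write: stale data that still matches its
         * old check looks perfectly valid here. */
        static int verify_check(const uint8_t *buf, size_t len, uint32_t lba,
                                const struct block_check *c)
        {
            return c->sum == simple_sum(buf, len) && c->lba == lba;
        }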

    And the largest single potential market for such a feature could turn
    out to be SATA based...

    - bill

  9. Re: Raid level write verification

    Thanks for the follow-up posts Bill & Dave. Very helpful. In addition
    I see a proposal for a "Write Read Verify" feature extension over
    at T13.org
    http://www.t13.org/docs2005/e04129r5...ead_verify.pdf

    I am specifically looking for SCSI & SATA DAS & controllers that
    utilize advanced protection mechanisms such as those mentioned here.
    Any product recommendations along those lines?


    Thanks again for your time.


  10. Re: Raid level write verification

    > Thanks. That's the kind of thing I thought might be useful a decade
    > ago, though seems a little stingy today - e.g., limiting the LBA address
    > to 32 bits (common arrays below the level that the host system may be
    > aware of already exceed this size, though when used only as a sanity
    > check the low 32 bits of the LBA may be sufficient)

    I think that this is the LBA address for a given drive inside the array
    (and we still have to wait for drives with 2^32 sectors) - because IMHO
    there is little sense in addressing linear sectors across different
    drives in RAID; it is the controller's job to specify the drive and the
    sector on that drive for a given data portion

    regards
    Jaroslaw Weglinski

  11. Re: Raid level write verification

    Jaroslaw Weglinski wrote:
    >> Thanks. That's the kind of thing I thought might be useful a decade
    >> ago, though seems a little stingy today - e.g., limiting the LBA
    >> address to 32 bits (common arrays below the level that the host system
    >> may be aware of already exceed this size, though when used only as a
    >> sanity check the low 32 bits of the LBA may be sufficient)
    >
    > I think that this is the LBA address for a given drive inside the array
    > (and we still have to wait for drives with 2^32 sectors)


    That may be how those who currently use it do so, but it's hardly the
    only useful way to use it.

    > - because IMHO there is little sense in addressing linear sectors
    > across different drives in RAID; it is the controller's job to specify
    > the drive and the sector on that drive for a given data portion


    The sense was implicit in my following description of how a
    seriously-interested host system could use the fields. A host system
    should (at least at the level where the code generating these validation
    checks would likely operate) know nothing about the nature of the
    apparent SCSI device it's talking to: it's just a device (and in the
    case of a hardware array, a 'device' which can already often be larger
    than 2 TB).

    Although this would raise issues about dynamically expanding or
    contracting the array, since its controller would have to understand how
    the host was using the fields to the extent that it could propagate them
    to the new block locations rather than create new ones there. Of
    course, this may already be an issue for its application-controlled
    portion: if the application actually is cognizant of the locations on
    the physical disks, then how it uses that portion might differ from how
    it would if it were considering the location as a logical block address
    in the context of the array.

    - bill

  12. Re: Raid level write verification

    >>> Thanks. That's the kind of thing I thought might be useful a decade
    >>> ago, though seems a little stingy today - e.g., limiting the LBA
    >>> address to 32 bits (common arrays below the level that the host
    >>> system may be aware of already exceed this size, though when used
    >>> only as a sanity check the low 32 bits of the LBA may be sufficient)

    >> I think that this is the LBA address for a given drive inside the array
    >> (and we still have to wait for drives with 2^32 sectors)

    > That may be how those who currently use it do so, but it's hardly the
    > only useful way to use it.


    yes - for example, HDS uses 520-byte sectors internally, and they
    contain a 32-bit sector number

    >> - because IMHO there is little sense in addressing linear sectors
    >> across different drives in RAID; it is the controller's job to specify
    >> the drive and the sector on that drive for a given data portion
    >
    > The sense was implicit in my following description of how a
    > seriously-interested host system could use the fields.

    OK - in the case of host<->array communication, 32 bits may be too short
    (but it is not used for error detection anyway, because of address
    translation on the controller [if we are storing physical sector numbers
    on the disks]).

    Another way is to store host-level sector numbers on the disks - but
    this would need a second field describing the association with a
    specific, host-visible LUN (if not, and if for example we have many
    host-visible disks with few sectors each, there is a probability of not
    detecting wild writes - if we write sector X for LUN A in place of
    sector X for LUN B) - and, in this case, we have a problem with the
    32-bit address space.

    > A host system
    > should (at least at the level where the code generating these validation
    > checks would likely operate) know nothing about the nature of the
    > apparent SCSI device it's talking to: it's just a device (and in the
    > case of a hardware array, a 'device' which can already often be larger
    > than 2 TB).
    >
    > Although this would raise issues about dynamically expanding or
    > contracting the array, since its controller would have to understand how
    > the host was using the fields to the extent that it could propagate them
    > to the new block locations rather than create new ones there. Of
    > course, this may already be an issue for its application-controlled
    > portion: if the application actually is cognizant of the locations on
    > the physical disks, then how it uses that portion might differ from how
    > it would if it were considering the location as a logical block address
    > in the context of the array.
    >
    > - bill


    I think that the host would be mostly interested in the CRC [and the
    app. field] - as you have written - and the sector number would be
    mainly interesting for the array controller (to verify wild writes - and
    restore from RAID in that case) [of course the controller can use the
    CRC too (to verify data - and restore from RAID if needed)]; so we may
    as well store physical sector numbers on the disks (in the case of an
    intelligent RAID controller) - and send rewritten, proper sector numbers
    to the host.

    Another question is whether the CRC covers only the data, or also the
    LBA (in the second case, the controller has to recalculate the CRC on
    transfer, as the sector number changes).


    regards,
    Jaroslaw Weglinski

  13. Re: Raid level write verification

    Jaroslaw Weglinski wrote:

    ....

    > OK - in the case of host<->array communication, 32 bits may be too short
    > (but it is not used for error detection anyway, because of address
    > translation on the controller [if we are storing physical sector numbers
    > on the disks]).


    I'm not quite sure what you meant to say there. It would be nice to
    have sufficient host-manageable space to store both a modest CRC (for
    end-to-end content-integrity validation) and, say, a file ID/file block
    number (both for end-to-end address-integrity validation and to
    facilitate rebuilding a file system after some catastrophe). I'd be
    willing to give up the latter, but if I let the controller handle the
    address validation it can't catch addressing errors between it and the
    host memory (though things like packet CRCs and bus parity may catch
    most of them, and something as simple as returning the input address as
    the operation ACK could allow the host to check what the controller
    thinks it wrote or read).

    ....

    > I think that the host would be mostly interested in the CRC [and the
    > app. field] - as you have written - and the sector number would be
    > mainly interesting for the array controller (to verify wild writes - and
    > restore from RAID in that case)


    While the controller could correct the effects of a
    previously-undetected wild write when it encountered the data that it
    clobbered (leaving aside the potential addressing errors between host
    and controller described above), if the host uses the *intended* target
    of that write before the unintended target is noticed neither the host
    nor the controller has any indication that the data being read should
    have been updated but wasn't.

    Gives one a good reason not to be too lazy about background
    validation-scan activity, I guess.

    > [of course the controller can use the CRC too (to verify data - and
    > restore from RAID if needed)]; so we may as well store physical sector
    > numbers on the disks (in the case of an intelligent RAID controller) -
    > and send rewritten, proper sector numbers to the host.
    >
    > Another question is whether the CRC covers only the data, or also the
    > LBA (in the second case, the controller has to recalculate the CRC on
    > transfer, as the sector number changes).


    If I didn't have additional space available to use for file ID/file
    block number information, I think I'd include them in my CRC (tell the
    controller that I was handling the CRC in the host so that it would
    leave it alone - which may already be pretty clear on writes by virtue
    of seeing that the host is providing a CRC, but is not obvious on
    reads). Then the only additional feature I'd need would be the ability
    to tell the controller to fetch the 'other' copy of the data if I didn't
    like the one I got back, and update the faulty one.
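
    As a sketch of what I mean (the file ID and file block number here are
    names of my own choosing for illustration, not anything defined by the
    standard):

        #include <stddef.h>
        #include <stdint.h>

        /* CRC-16, polynomial 0x8BB7, run incrementally so the identity
         * fields and the data can be folded into one checksum. */
        static uint16_t crc16_step(uint16_t crc, const uint8_t *buf, size_t len)
        {
            for (size_t i = 0; i < len; i++) {
                crc ^= (uint16_t)(buf[i] << 8);
                for (int b = 0; b < 8; b++)
                    crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
                                         : (uint16_t)(crc << 1);
            }
            return crc;
        }

        /* CRC over (file_id, file_block, data): changing any of the three
         * changes the CRC, so one 16-bit field gives both content and
         * address validation - at the cost that the controller must leave
         * the host-supplied CRC alone. */
        static uint16_t host_block_crc(uint64_t file_id, uint64_t file_block,
                                       const uint8_t *data, size_t len)
        {
            uint8_t hdr[16];
            for (int i = 0; i < 8; i++) {   /* serialize IDs, little-endian */
                hdr[i]     = (uint8_t)(file_id    >> (8 * i));
                hdr[8 + i] = (uint8_t)(file_block >> (8 * i));
            }
            return crc16_step(crc16_step(0, hdr, sizeof hdr), data, len);
        }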

    That still wouldn't normally protect against the undetected update
    failure above, though.

    - bill
