IOPS from RAID units - Storage



Thread: IOPS from RAID units

  1. IOPS from RAID units


    The IOPS the array can deliver is the aggregate of all the drives in the raid group less the hot spares and parity drives. It's more complicated than that of course, but it's not a bad rule of thumb. So 10 data drives that can each deliver 100 IOPS could (in theory) deliver 1000 IOPS when bound together into a raid group. If you want to know what performance you will get out of a SPECIFIC array with SPECIFIC applications, that can be calculated more accurately... at great expense of course.
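
    As a rough sketch of that arithmetic in Python (the drive counts and
    per-drive IOPS below are made-up example figures, not measurements from
    any particular array):

        # Rule-of-thumb aggregate IOPS for a RAID group.
        # All inputs are hypothetical examples.

        def raid_group_iops(total_drives, hot_spares, parity_drives, iops_per_drive):
            """Aggregate small-random-I/O rate: data drives times per-drive IOPS."""
            data_drives = total_drives - hot_spares - parity_drives
            return data_drives * iops_per_drive

        # e.g. 12 drives, 1 hot spare, 1 parity drive, ~100 IOPS each
        print(raid_group_iops(12, 1, 1, 100))   # -> 1000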





  2. Re: IOPS from RAID units

    On Thu, 20 Mar 2008 19:44:27 -0400, davidhoffer wrote:

    >
    >The IOPS the array can deliver is the aggregate of all the drives in the raid group less the hot spares and parity drives. Its more complicated than that of course but its a not bad rule of thumb. So 10 data drives that can each deliver 100 iops could (in theory) deliver 1000 iops when bound together into a raid group. If you want to know what performance you will get out of a SPECIFIC array with SPECIFIC applications, that can be calculated more accurately....at great expense of course.
    >


    Was there a question in there somewhere and I just missed it?

    ~F

  3. Re: IOPS from RAID units

    davidhoffer wrote:
    > The IOPS the array can deliver is the aggregate of all the drives in the raid
    > group less the hot spares and parity drives. Its more complicated than
    > that of course but its a not bad rule of thumb. So 10 data drives that can
    > each deliver 100 iops could (in theory) deliver 1000 iops when bound

    this is a bad assumption.

    let me ruin this example.

    I write 5 bytes to a raid5 array of 10 drives.

    that's one operation to the host.

    it's also at least 10 reads and 10 writes if you're using 10 drives, and
    probably more depending on the stripe size of the raid array.

    we're looking at negative performance gains here in addition to an
    incredible increase of operations inside that raid group.


    > together into a raid group. If you want to know what performance you will
    > get out of a SPECIFIC array with SPECIFIC applications, that can be
    > calculated more accurately....at great expense of course.

    calculating performance is just stupid. Run benchmarks and see what really
    happens as that's what counts.



  4. Re: IOPS from RAID units

    On Mar 21, 11:15 am, Cydrome Leader wrote:
    > davidhoffer wrote:
    >
    > The IOPS the array can deliver is the aggregate of all the drives in the raid
    > group less the hot spares and parity drives. Its more complicated than
    > that of course but its a not bad rule of thumb. So 10 data drives that can
    > each deliver 100 iops could (in theory) deliver 1000 iops when bound
    >
    > this is a bad assumption.
    >
    > let me ruin this example.
    >
    > I write 5 bytes to a raid5 array of 10 drives.
    >
    > that's one operation to the host.
    >
    > it's also at least 10 reads and 10 writes if you're using 10 drives, and
    > probably more depending on the stripe size of the raid array.



    No, it's not. Assuming your write doesn't span a stripe or block,
    it's two reads and two writes, assuming nothing is cached.

    Note that a much larger write (the size of a stripe and so aligned),
    can be done with a single set of writes.


    > we're looking at negative performance gains here in addition to an
    > incredible increase of operations inside that raid group.



    No, it's two reads and two writes no matter how many drives are in the
    RAID5 array.


    > together into a raid group. If you want to know what performance you will
    > get out of a SPECIFIC array with SPECIFIC applications, that can be
    > calculated more accurately....at great expense of course.
    >
    > calculating performance is just stupid. Run benchmarks and see what really
    > happens as that's what counts.



    While it's difficult to do with great precision, some of us do like to
    do a bit of planning before buying the expensive storage box.
    Seriously, "just buy the DS8000, and then we'll figure out if it's the
    right size" isn't going to fly at a lot of places. And while IBM will
    likely help me do some benchmarking of a DS8000 before I buy it, the
    process is neither easy nor cheap, so I really don't want to be doing
    it too much, and I want to use the result of that only for fine
    tuning.

    If I can get projected (uncached) read and write rates from my
    database, using W*4+R as my baseline IOPS requirement is a pretty
    reasonable first order approximation.
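
    In Python, that first-order sizing works out to something like this (the
    read/write rates below are placeholder numbers, and the 4x write penalty
    is the usual RAID-5 read-modify-write assumption):

        # First-order RAID-5 back-end IOPS estimate: each small random write
        # costs ~4 disk I/Os (read data, read parity, write data, write parity),
        # each uncached read costs ~1.  Workload numbers below are placeholders.

        def baseline_backend_iops(reads_per_sec, writes_per_sec, write_penalty=4):
            return writes_per_sec * write_penalty + reads_per_sec

        # e.g. a database projected at 800 uncached reads/s and 200 writes/s
        print(baseline_backend_iops(reads_per_sec=800, writes_per_sec=200))  # -> 1600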

  5. Re: IOPS from RAID units

    robertwessel2@yahoo.com wrote:
    > On Mar 21, 11:15 am, Cydrome Leader wrote:
    >> davidhoffer wrote:
    >>
    >> The IOPS the array can deliver is the aggregate of all the drives in the raid
    >> group less the hot spares and parity drives. Its more complicated than
    >> that of course but its a not bad rule of thumb. So 10 data drives that can
    >> each deliver 100 iops could (in theory) deliver 1000 iops when bound
    >>
    >> this is a bad assumption.
    >>
    >> let me ruin this example.
    >>
    >> I write 5 bytes to a raid5 array of 10 drives.
    >>
    >> that's one operation to the host.
    >>
    >> it's also at least 10 reads and 10 writes if you're using 10 drives, and
    >> probably more depending on the stripe size of the raid array.

    >
    >
    > Not it's not. Assuming your write doesn't span a stripe or block,


    It doesn't matter what it spans or how small it is. Everything gets
    rewritten. That's why raid5 is crap for write performance, especially for
    small writes.

    > it's two reads and two writes, assuming nothing is cached.
    >
    > Note that a much larger write (the size of a stripe and so aligned),
    > can be done with a single set of writes.


    As the write size approaches the stripe size, the ratio of overhead to
    actual writes (the data the user sees and cares about) starts to get
    closer to 1, but it still doesn't change that there's all sorts of busy
    work going on.

    >> we're looking at negative performance gains here in addition to an
    >> incredible increase of operations inside that raid group.

    >
    >
    > No, it's two reads and two writes no matter how may drives in the
    > RAID5 array.


    OK, so I change 513 bytes of data from my host to a raid 5 array with 64 kB
    stripe size and 512-byte blocks. Please explain how that array gets updated
    with only two writes.

    >> together into a raid group. If you want to know what performance you will
    >> get out of a SPECIFIC array with SPECIFIC applications, that can be
    >> calculated more accurately....at great expense of course.
    >>
    >> calculating performance is just stupid. Run benchmarks and see what really
    >> happens as that's what counts.

    >
    >
    > While it's difficult to do with great precision, some of us do like to
    > do a bit of planning before buying the expensive storage box.
    > Seriously, "just buy the DS8000, and then we'll figure out if it's the
    > right size" isn't going to fly at a lot of places. And while IBM will
    > likely help me do some benchmarking of a DS8000 before I buy it, the
    > process in neither easy of cheap, so I really don't want to be doing


    If you're too lazy and cheap to do testing of an expensive storage unit,
    maybe you shouldn't be the person testing and deciding what you buy in the
    first place.

    Considering the ds8000 series is so costly that you can't even buy it now
    off the IBM site, it's pretty reasonable to assume IBM will jump through
    some hoops to sell you one, even if that involves testing one out.

    > If I can get projected (uncached) read and write rates from my
    > database, using W*4+R as my baseline IOPS requirement is a pretty
    > reasonable first order approximation.


    Which still has little bearing on what's going to happen in the real
    world. That's why we test things out to cut through the sales-sheet
    bull****.

  6. Re: IOPS from RAID units

    Cydrome Leader wrote:
    > robertwessel2@yahoo.com wrote:
    >> On Mar 21, 11:15 am, Cydrome Leader wrote:
    >>> davidhoffer wrote:
    >>>
    >>> The IOPS the array can deliver is the aggregate of all the drives in the raid
    >>> group less the hot spares and parity drives. Its more complicated than
    >>> that of course but its a not bad rule of thumb. So 10 data drives that can
    >>> each deliver 100 iops could (in theory) deliver 1000 iops when bound
    >>>
    >>> this is a bad assumption.


    For many common situations (specifically, where reads constitute most of
    the workload and the request sizes are much smaller than the array
    stripe segment size) in which IOPS are important (i.e., workloads
    dominated by many small accesses: otherwise, bandwidth starts being the
    primary concern), it's actually a pretty good assumption.
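
    To put rough numbers on that, here is a sketch of deliverable host IOPS
    versus read/write mix (the per-drive figure and the mixes are
    hypothetical, and the 4x small-write cost assumes plain RAID-5
    read-modify-write):

        # Deliverable host IOPS from a RAID-5 group for a given read/write mix.
        # Small reads cost ~1 back-end I/O; small random writes cost ~4
        # (read-modify-write).  All figures below are hypothetical.

        def deliverable_iops(data_drives, iops_per_drive, read_fraction):
            backend_capacity = data_drives * iops_per_drive
            cost_per_host_io = read_fraction * 1 + (1 - read_fraction) * 4
            return backend_capacity / cost_per_host_io

        # 10 data drives at ~100 IOPS each:
        print(round(deliverable_iops(10, 100, 0.95)))  # 95% reads -> ~870, near the simple 1000 aggregate
        print(round(deliverable_iops(10, 100, 0.50)))  # 50% reads -> 400, the write penalty bites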

    >>>
    >>> let me ruin this example.
    >>>
    >>> I write 5 bytes to a raid5 array of 10 drives.
    >>>
    >>> that's one operation to the host.
    >>>
    >>> it's also at least 10 reads and 10 writes if you're using 10 drives, and
    >>> probably more depending on the stripe size of the raid array.

    >>
    >> Not it's not. Assuming your write doesn't span a stripe or block,

    >
    > It doesn't matter what it spans or how small it is. Everything gets
    > rewritten. that's why raid5 is crap for write performance and especially
    > small writes.


    Some people who don't have a clue at least have the sense to shut up
    when corrected. Obviously, though, you're not one of them.

    >
    >> it's two reads and two writes, assuming nothing is cached.


    100% correct, unless you're unlucky enough to find that the 5 bytes
    cross a stripe-segment boundary (with the 64 KB segment size used as an
    example later, the chances of this are a bit under 0.01%) in which case
    it'll be three reads and three writes (unless the array size is small
    enough to make another strategy more efficient).

    ....

    >> No, it's two reads and two writes no matter how may drives in the
    >> RAID5 array.

    >
    > Ok, so I change 513 bytes of data from my host to a raid 5 array with 64kB
    > stripe size and 512byte blocks. Please explain how that array gets updated
    > with only two writes.


    Leaving aside the slightly under 1% chance that the 513 bytes happen to
    span a stripe segment boundary, you read the two sectors affected by the
    update (if they're not already cached, which given that you're updating
    them is quite likely), you read the corresponding two parity sectors
    from the stripe's parity segment, you XOR the original 513 bytes with
    the new bytes, you XOR the result into the corresponding bytes in the
    parity segment in memory, you overwrite the original 513 bytes in the
    data segment in memory with the changed data, you write back the two
    sectors of the data segment, and you write back the two sectors of the
    parity segment.
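
    In code, the parity arithmetic in that sequence is just byte-wise XOR; a
    simplified in-memory sketch (toy buffers standing in for sectors, not a
    real controller path):

        # RAID-5 small-write (read-modify-write) on in-memory "sectors":
        # new_parity = old_parity XOR old_data XOR new_data, byte for byte.

        def xor_bytes(a, b):
            return bytes(x ^ y for x, y in zip(a, b))

        def raid5_small_write(old_data, old_parity, new_data):
            """Two small reads in (old data, old parity); two small writes out."""
            new_parity = xor_bytes(old_parity, xor_bytes(old_data, new_data))
            return new_data, new_parity          # written back to the two segments

        # two 1 KB spans: the affected pair of data sectors plus matching parity sectors
        old_data, old_parity = bytes(1024), bytes(1024)
        new_data = b"x" * 513 + bytes(1024 - 513)
        data_out, parity_out = raid5_small_write(old_data, old_parity, new_data)
        # parity invariant: parity XOR the data segment is unchanged by the update
        assert xor_bytes(parity_out, data_out) == xor_bytes(old_parity, old_data)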

    One or at most two small reads, plus two small writes. I suspect that
    your understanding of how RAID-5 actually functions is seriously flawed,
    but here's at least a start on correcting that situation (which of
    course you should have attempted before making a fool of yourself, but
    better late than never).

    >
    >>> together into a raid group. If you want to know what performance you will
    >>> get out of a SPECIFIC array with SPECIFIC applications, that can be
    >>> calculated more accurately....at great expense of course.
    >>>
    >>> calculating performance is just stupid.


    Actually, *not* making serious attempts to estimate performance is
    stupid: not only do benchmarks (even those run with your own
    application) often fail to uncover important corner cases, but comparing
    your estimates with the benchmarks validates your own understanding of
    your hardware and its relation to your workload (or, perhaps even more
    importantly, exposes gaps in it).

    ....

    >> While it's difficult to do with great precision, some of us do like to
    >> do a bit of planning before buying the expensive storage box.
    >> Seriously, "just buy the DS8000, and then we'll figure out if it's the
    >> right size" isn't going to fly at a lot of places. And while IBM will
    >> likely help me do some benchmarking of a DS8000 before I buy it, the
    >> process in neither easy of cheap, so I really don't want to be doing

    >
    > If you're too lazy and cheap to do testing of an expensive storage unit,
    > maybe you shouldn't be the person testing and deciding what you buy in the
    > first place.


    Hmmm - you hardly sound like someone competent to be making judgments in
    this area: let's hope that your employer doesn't drop in here often.

    - bill

  7. Re: IOPS from RAID units

    Bill Todd wrote:
    > Cydrome Leader wrote:
    >> robertwessel2@yahoo.com wrote:
    >>> On Mar 21, 11:15 am, Cydrome Leader wrote:
    >>>> davidhoffer wrote:
    >>>>
    >>>> The IOPS the array can deliver is the aggregate of all the drives in the raid
    >>>> group less the hot spares and parity drives. Its more complicated than
    >>>> that of course but its a not bad rule of thumb. So 10 data drives that can
    >>>> each deliver 100 iops could (in theory) deliver 1000 iops when bound
    >>>>
    >>>> this is a bad assumption.

    >
    > For many common situations (specifically, where reads constitute most of
    > the workload and the request sizes are much smaller than the array
    > stripe segment size) in which IOPS are important (i.e., workloads
    > dominated by many small accesses: otherwise, bandwidth starts being the
    > primary concern), it's actually a pretty good assumption.
    >
    >>>>
    >>>> let me ruin this example.
    >>>>
    >>>> I write 5 bytes to a raid5 array of 10 drives.
    >>>>
    >>>> that's one operation to the host.
    >>>>
    >>>> it's also at least 10 reads and 10 writes if you're using 10 drives, and
    >>>> probably more depending on the stripe size of the raid array.
    >>>
    >>> Not it's not. Assuming your write doesn't span a stripe or block,

    >>
    >> It doesn't matter what it spans or how small it is. Everything gets
    >> rewritten. that's why raid5 is crap for write performance and especially
    >> small writes.

    >
    > Some people who don't have a clue at least have the sense to shut up
    > when corrected. Obviously, though, you're not one of them.
    >
    >>
    >>> it's two reads and two writes, assuming nothing is cached.

    >
    > 100% correct, unless you're unlucky enough to find that the 5 bytes
    > cross a stripe-segment boundary (with the 64 KB segment size used as an
    > example later, the chances of this are a bit under 0.01%) in which case
    > it'll be three reads and three writes (unless the array size is small
    > enough to make another strategy more efficient).
    >
    > ...
    >
    >>> No, it's two reads and two writes no matter how may drives in the
    >>> RAID5 array.

    >>
    >> Ok, so I change 513 bytes of data from my host to a raid 5 array with 64kB
    >> stripe size and 512byte blocks. Please explain how that array gets updated
    >> with only two writes.

    >
    > Leaving aside the slightly under 1% chance that the 513 bytes happen to
    > span a stripe segment boundary, you read the two sectors affected by the
    > update (if they're not already cached, which given that you're updating
    > them is quite likely), you read the corresponding two parity sectors
    > from the stripe's parity segment, you XOR the original 513 bytes with
    > the new bytes, you XOR the result into the corresponding bytes in the
    > parity segment in memory, you overwrite the original 513 bytes in the
    > data segment in memory with the changed data, you write back the two
    > sectors of the data segment, and you write back the two sectors of the
    > parity segment.
    >
    > One or at most two small reads, plus two small writes. I suspect that
    > your understanding of how RAID-5 actually functions is seriously flawed,
    > but here's at least a start on correcting that situation (which of
    > course you should have attempted before making a fool of yourself, but
    > better late than never).


    Unless you're some raid salesperson who uses one definition of
    "operations" between the host and the raid controller and another between
    the raid controller and the disks themselves, I'm not following.

    So try again.

    I alter 513 bytes on a raid 5 array, overwriting existing data. That's
    two 512-byte writes for the host.

    How many writes take place between that raid controller and the disks
    themselves, in 512-byte writes to any disk in that array?





  8. Re: IOPS from RAID units

    Cydrome Leader wrote:

    ....

    > Unless you're some raid salesperson, who uses different definitions of
    > "operations" between the host and raid controller and then something else
    > between the raid controller and the disks themselves, I'm not following.


    That's your problem. I'd suggest finding a tutor, if you can't learn
    how a conventional RAID-5 array works from the description that I
    provided or from other easily-accessible on-line resources.

    - bill

  9. Re: IOPS from RAID units

    Bill Todd wrote:
    > Cydrome Leader wrote:
    >
    > ...
    >
    >> Unless you're some raid salesperson, who uses different definitions of
    >> "operations" between the host and raid controller and then something else
    >> between the raid controller and the disks themselves, I'm not following.

    >
    > That's your problem. I'd suggest finding a tutor, if you can't learn
    > how a conventional RAID-5 array works from the description that I
    > provided or from other easily-accessible on-line resources.
    >
    > - bill


    I noticed you cut out my question, probably on purpose.

    I'll ask again.

    I want to change two blocks on a raid5 array: the host is changing 513
    bytes, so it's really doing two writes of 512 bytes each.

    You and your buddy state that it only takes two writes to do this on raid5.

    So, can you explain how it only takes two 512-byte writes to update data on
    a raid5 array where the change in user data is also two 512-byte blocks?






  10. Re: IOPS from RAID units

    On Mar 22, 7:18 pm, Cydrome Leader wrote:
    > robertwess...@yahoo.com wrote:
    > > On Mar 21, 11:15 am, Cydrome Leader wrote:
    > >> davidhoffer wrote:

    >
    > >> The IOPS the array can deliver is the aggregate of all the drives in the raid
    > >> group less the hot spares and parity drives. Its more complicated than
    > >> that of course but its a not bad rule of thumb. So 10 data drives that can
    > >> each deliver 100 iops could (in theory) deliver 1000 iops when bound

    >
    > >> this is a bad assumption.

    >
    > >> let me ruin this example.

    >
    > >> I write 5 bytes to a raid5 array of 10 drives.

    >
    > >> that's one operation to the host.

    >
    > >> it's also at least 10 reads and 10 writes if you're using 10 drives, and
    > >> probably more depending on the stripe size of the raid array.

    >
    > > Not it's not. Assuming your write doesn't span a stripe or block,

    >
    > It doesn't matter what it spans or how small it is. Everything gets
    > rewritten. that's why raid5 is crap for write performance and especially
    > small writes.
    >
    > > it's two reads and two writes, assuming nothing is cached.

    >
    > > Note that a much larger write (the size of a stripe and so aligned),
    > > can be done with a single set of writes.

    >
    > As the write size approaches the stripe size, the ratio of overhead to
    > actual writes (the data the user sees and cares about) starts to get
    > closer to 1, but it still doesn't change that there's all sorts of busy
    > work going on.
    >
    > >> we're looking at negative performance gains here in addition to an
    > >> incredible increase of operations inside that raid group.

    >
    > > No, it's two reads and two writes no matter how may drives in the
    > > RAID5 array.

    >
    > Ok, so I change 513 bytes of data from my host to a raid 5 array with 64kB
    > stripe size and 512byte blocks. Please explain how that array gets updated
    > with only two writes.



    Your original example involved a five byte write, and my response was
    clearly in the context of small random writes like that.

    Let's see, 512 byte blocks, and a 64K stripe... OK, a RAID-5 array
    with 129 disks in it. An unusual configuration to be sure, but we'll
    run with it.

    Assuming this stays within the stripe (the array isn't always laid out
    that way), the update process then involves three reads, and three
    writes. The two old blocks and the old parity block are read, and
    then the updated blocks and recomputed parity block are written back.
    Which is actually better (3X) than the (roughly) 4X performance hit
    that a single block update incurs.

    A larger sequential write (which you've done by specifying a 513 byte
    write with 512 byte blocks) can be done with rather less overhead,
    approaching 2X as you near the stripe size, and falling to a little
    over 1X (writes only for the data plus a parity block) when the write
    covers an entire stripe.

    OTOH, assuming you mean something more reasonable like 4K blocks on a
    64K stripe and the usual 512-byte *sectors* (although that's still
    pretty darn small for both dimensions), your two sector update
    requires two reads of sequential pairs of sectors (the old data pair,
    and the matching old parity pair), and then a pair of two sector
    sequential writes. Assuming of course you don't span the block or
    stripe boundary. And given the almost non-existent overhead of
    reading or writing a sequential pair of sectors compared to reading a
    single sector, that's invariable counted as a single I/O, and not
    two. Not to mention that reads, at least, are almost always done in
    bigger units than a sector anyway.

    But in the end, your basic small random write requires two physical
    reads and two physical writes, assuming no caching, no matter how many
    disks in the RAID-5 array.
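
    A rough sketch of that scaling, counting back-end disk I/Os for an
    aligned write (assuming no cache, segment-aligned writes, plain
    read-modify-write for partial stripes, and a hypothetical 64 KB-segment,
    8+1 RAID-5 group):

        # Approximate RAID-5 back-end I/O count for an aligned write.
        # data_drives = data segments per stripe (array width minus the parity drive).

        def raid5_write_ios(write_bytes, segment_bytes, data_drives):
            stripe_bytes = segment_bytes * data_drives
            full_stripes, remainder = divmod(write_bytes, stripe_bytes)
            ios = full_stripes * (data_drives + 1)        # full stripe: data + parity writes, no reads
            if remainder:
                touched = -(-remainder // segment_bytes)  # ceil: data segments touched
                ios += 2 * touched + 2                    # read+write each data segment, read+write parity
            return ios

        seg, width = 64 * 1024, 8                         # hypothetical 64 KB segments, 8+1 RAID-5
        for size in (4 * 1024, 7 * 64 * 1024, 8 * 64 * 1024):
            print(size, raid5_write_ios(size, seg, width))
        # 4 KB -> 4 I/Os (~4x), 448 KB -> 16 (~2.3x), full 512 KB stripe -> 9 (~1.1x)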


    > > While it's difficult to do with great precision, some of us do like to
    > > do a bit of planning before buying the expensive storage box.
    > > Seriously, "just buy the DS8000, and then we'll figure out if it's the
    > > right size" isn't going to fly at a lot of places. *And while IBM will
    > > likely help me do some benchmarking of a DS8000 before I buy it, the
    > > process in neither easy of cheap, so I really don't want to be doing

    >
    > If you're too lazy and cheap to do testing of an expensive storage unit,
    > maybe you shouldn't be the person testing and deciding what you buy in the
    > first place.
    >
    > Considering the ds8000 series is so costly that you can't even buy it now
    > off the IBM site, it's pretty reasonable to assume IBM will jump through
    > some hoops to sell you one, even if that involves testing one out.



    Sure, IBM will jump through hoops, but do you know how much work it is
    to set up a realistic set of benchmarks for a complex system at an IBM
    facility (or even if they ship you one to play with for a month at
    your site)? And that's not work IBM can do. And as I stated in the
    part you chose not to quote, that's not something I want to do too
    much of, and I want to be pretty close before I start, and want to use
    the "real" benchmark results only for fine tuning the configuration.

  11. Re: IOPS from RAID units

    Cydrome Leader wrote:
    > Bill Todd wrote:
    >> Cydrome Leader wrote:


    ....

    > I noticed you cut out my question, probably on purpose.


    Indeed: I ignored it because it demonstrated the profoundness of your
    ignorance.

    As I suggested, when it comes to disk arrays (and to the way disks are
    themselves accessed, for that matter - at least in the two decades since
    the ST506 interleaved interface was superseded by contiguous
    multi-sector transfers), you obviously require some very basic tutoring
    - well below the level of discussion that this group is aimed at (and
    that I have any interest in pursuing).

    - bill

  12. Re: IOPS from RAID units

    robertwessel2@yahoo.com wrote:

    ....

    > Let's see, 512 byte blocks, and a 64K stripe... OK, a RAID-5 array
    > with 129 disks in it. An unusual configuration to be sure, but we'll
    > run with it.


    Sufficiently unusual that I assumed he merely worded his example
    sloppily, and that he really meant 64 KB stripe segments on each disk (a
    fairly common configuration, though segments in the MB and even larger
    range make a lot more sense in many situations).

    - bill
