Very High Rate Continuous Transfer - Storage



Thread: Very High Rate Continuous Transfer

  1. Very High Rate Continuous Transfer

    I am looking into what it will take to support continuous (not burst)
    160 to 200 MBytes/sec transfer to disk. What types of drives and how
    would they be configured as an array (or multiple arrays)?

    What type of processor and bus architecture would be appropriate? Data
    will be from a capture memory that is shared between the processor and
    the capture electronics. So memory bandwidth will be at least 320 to
    400 MBytes/sec.

    Thanks in advance,
    Jim


  2. Re: Very High Rate Continuous Transfer

    jim_nospam_beasley@yahoo.com wrote:
    > I am looking into what it will take to support continuous (not burst)
    > 160 to 200 MBytes/sec transfer to disk. What types of drives and how
    > would they be configured as an array (or multiple arrays)?
    >
    > What type of processor and bus architecture would be appropriate? Data
    > will be from a capture memory that is shared between the processor and
    > the capture electronics. So memory bandwidth will be at least 320 to
    > 400 MBytes/sec.


    Modern x86 processor/memory configurations should be able to handle that
    kind of memory bandwidth without even starting to break a sweat - if the
    capture electronics can use standard DMA mechanisms to provide the data.

    But in that case you'd need a double-speed, double-width (64/66) PCI bus
    at a minimum to handle the bi-directional bandwidth unless you used a
    chipset that bypassed the PCI for disk activity (I'm not even sure
    they're available: if they only support IDE, you'd have to split the
    load across two IDE ports - and where are you going to find an IDE RAID
    box?) or worked around the problem by using the AGP port for the input
    or output stream, in which case you'd still need double-width or
    double-speed PCI but probably not both. In other words, PCI-X or
    PCI-express might be a better option (and today probably much more
    available than double-speed, double-width PCI).

    RAID-3 sounds like what you want for disk storage (assuming that you
    want some redundancy in it - though I'm not sure where you'd find a
    non-redundant implementation comparable in performance to a good RAID-3
    box even if redundancy weren't necessary). I saw a single 1 Gbit/s
    fibre channel link deliver almost 90 MB/s of streaming write bandwidth
    to an 8 + 1 RAID-3 array back in 1998, so 2 Gbit/s fibre channel with at
    most 6 + 1 of today's disks - which should offer 30 - 40 MB/s/disk
    streaming bandwidth at an absolute minimum - should do the job with a
    single array (if not, perhaps the application can easily distribute the
    streaming data across two arrays or use software RAID-0 to do so).
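
    A rough back-of-the-envelope check of those numbers (assuming a 2.125
    Gbit/s FC signalling rate with 8b/10b encoding, and the 30 - 40
    MB/s/disk streaming figures above; these are assumptions, not
    measurements):

        # Sketch of the 6 + 1 RAID-3 over 2 Gbit FC arithmetic above.
        fc_line_rate_gbit = 2.125                 # 2 Gbit FC signalling rate (assumed)
        fc_payload_mb_s = fc_line_rate_gbit * 1e9 * 8 / 10 / 8 / 1e6  # ~212 MB/s after 8b/10b

        data_disks = 6                            # 6 + 1 RAID-3: parity disk carries no payload
        for per_disk in (30, 40):                 # conservative streaming MB/s per disk
            array_mb_s = data_disks * per_disk
            print(f"{data_disks} data disks @ {per_disk} MB/s -> {array_mb_s} MB/s "
                  f"(FC link ceiling ~{fc_payload_mb_s:.0f} MB/s)")
        # -> 180 MB/s at the pessimistic end, 240 MB/s (FC-limited to ~212) at the top,
        #    so a single 2 Gbit link and one array sit right around the 160 - 200 MB/s target.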

    Probably someone with more recent and detailed experience can answer
    this question better, but at least that's a start.

    - bill

  3. Re: Very High Rate Continuous Transfer

    In article <6PidnW5L1Ywq_0TfRVn-2w@metrocastcablevision.com>,
    Bill Todd wrote:
    >jim_nospam_beasley@yahoo.com wrote:
    >> I am looking into what it will take to support continuous (not burst)
    >> 160 to 200 MBytes/sec transfer to disk. What types of drives and how
    >> would they be configured as an array (or multiple arrays)?
    >>
    >> What type of processor and bus architecture would be appropriate? Data
    >> will be from a capture memory that is shared between the processor and
    >> the capture electronics. So memory bandwidth will be at least 320 to
    >> 400 MBytes/sec.

    >
    >Modern x86 processor/memory configurations should be able to handle that
    >kind of memory bandwidth without even starting to break a sweat - if the
    >capture electronics can use standard DMA mechanisms to provide the data.


    Agree. Although you have to be careful with memory <-> CPU bandwidth.
    If you need to do multiple passes over the data (for example, copy it
    between buffers, or run CRCs or TCP checksums over them) the memory
    bandwidth could become an issue.
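
    To put rough numbers on that (the 200 MB/s figure and the pass counts
    below are illustrative assumptions):

        # Each extra pass over the stream (copy, CRC, checksum) re-reads and
        # possibly re-writes the whole thing, multiplying DRAM traffic.
        stream_mb_s = 200                      # worst-case capture rate

        def memory_traffic(read_passes, write_passes):
            """Total DRAM traffic in MB/s for the given pass counts."""
            return stream_mb_s * (read_passes + write_passes)

        # capture DMA writes once, HBA DMA reads once: the OP's ~400 MB/s case
        print(memory_traffic(1, 1))            # 400
        # add one buffer copy (read + write) and one checksum pass (read only)
        print(memory_traffic(3, 2))            # 1000 - no longer trivial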

    >But in that case you'd need a double-speed, double-width (64/66) PCI bus
    >at a minimum to handle the bi-directional bandwidth unless you used a
    >chipset that bypassed the PCI for disk activity (I'm not even sure
    >they're available: if they only support IDE, you'd have to split the
    >load across two IDE ports - and where are you going to find an IDE RAID
    >box?) or worked around the problem by using the AGP port for the input
    >or output stream, in which case you'd still need double-width or
    >double-speed PCI but probably not both.


    You can get motherboards with multiple PCI channels. Somewhere in my
    lab, I have a few 2-U rackmounts with 3 open PCI busses (on 3 slots);
    if I remember right the Ethernet and SCSI/RAID chips on the
    motherboard are on a separate bus.

    Don't know whether such motherboards are sold in the white-box market;
    you may have to buy a server-class x86 box from one of the big vendors
    to get a system like this.

    Also, AFAIK the AGP bus is a superset of the PCI bus, just on a
    different connector. If you are custom-building your electronics, you
    might want to use the AGP connector.

    > In other words, PCI-X or
    >PCI-express might be a better option (and today probably much more
    >available than double-speed, double-width PCI).


    One other issue: For the outgoing link, I would split the bandwidth
    over two separate fibre channel ports. With 2Gbit FC, each port can
    theoretically handle >200 MB/sec, so each would be loaded <50%, which
    will make the whole thing run much more smoothly. You could either
    stripe the data yourself (if you control the data-writing software),
    or for example use an LVM on the host to stripe the data over the
    two FC ports. Whether to also stripe it over two disk arrays
    depends on what disk array you buy. As 2-port 2Gbit FC cards are
    easily available, this does not require extra PCI slots.

    For an inexpensive solution (no redundancy, do-it-yourself), get a PCI
    SCSI controller with two U320 ports. Connect a few (maybe a half
    dozen) 10K RPM SCSI disks to each port. This might be quite easy: Buy
    a rackmount JBOD (a.k.a. disk tray) with two SCSI ports; make sure you
    get a model with a splittable SCSI backplane (two half-backplanes,
    each with about a half dozen slots, instead of one long backplane with
    two SCSI connectors and with a dozen SCSI slots). Then use custom
    software or an LVM to stripe the data across the disks. Let's look at
    the numbers: 10K RPM SCSI drives can write data at about 50 MB/sec
    each; the reason I picked a half dozen drives per SCSI port is to
    match disk bandwidth with SCSI bandwidth. This hardware configuration
    can theoretically handle about 600 MB/sec, so it should have no
    problem running day-in, day-out at 1/3 of that. Problem is: No
    redundancy, and you have to roll your own striping.
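
    The arithmetic behind that, as a sketch (50 MB/s per drive and 320
    MB/s per U320 bus are round-number assumptions):

        # Do-it-yourself U320 configuration: two ports, ~6 drives each.
        u320_bus_mb_s = 320                   # nominal Ultra320 bus bandwidth
        drive_mb_s = 50                       # assumed streaming write rate, 10K RPM drive
        drives_per_port = 6
        ports = 2

        per_port = min(drives_per_port * drive_mb_s, u320_bus_mb_s)     # disk vs. bus limit
        total = per_port * ports
        print(f"per port: {per_port} MB/s, total: {total} MB/s")        # 300 and 600 MB/s
        print(f"a 200 MB/s target is {200 / total:.0%} of theoretical") # ~33%, i.e. 1/3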

    >RAID-3 sounds like what you want for disk storage (assuming that you
    >want some redundancy in it - though I'm not sure where you'd find a
    >non-redundant implementation comparable in performance to a good RAID-3
    >box even if redundancy weren't necessary). I saw a single 1 Gbit/s
    >fibre channel link deliver almost 90 MB/s of streaming write bandwidth
    >to an 8 + 1 RAID-3 array back in 1998, so 2 Gbit/s fiber channel with at
    >most 6 + 1 of today's disks - which should offer 30 - 40 MB/s/disk
    >streaming bandwidth at an absolute minimum - should do the job with a
    >single array (if not, perhaps the application can easily distribute the
    >streaming data across two arrays or use software RAID-0 to do so).


    To some extent I agree. RAID-3 has traditionally been used for large
    streaming IO (examples: multimedia, supercomputing). On the other
    hand, the bandwidth required here is so small that any modern
    mid-range disk array could do it, in any RAID-level. So if disk cost
    is an issue and you don't need redundancy (meaning you are willing to
    tolerate loss of data, and downtime), you might want to try RAID-0.
    If disk costs are irrelevant, and you want the best possible
    redundancy, you could try RAID-1 (in effect implemented as RAID-10).
    Or maybe RAID-5 might work better than RAID-3: few people use RAID-3
    today, so it is quite possible that RAID-5 implementations have been
    carefully tested and tuned, and are quite fast.

    Another warning: If you think RAID will give you redundancy, and at
    the same time you are relying on maxing out the performance of the
    disk array, you are cheating yourself. What I mean is this: If you
    buy a disk array that can barely handle 200 MB/sec (meaning your
    configuration is cost optimized), then it will not handle that
    bandwidth in degraded mode (with a dead disk). RAID redundancy is in
    some sense about preserving your data; while running with a dead disk,
    the speed will be quite low. If your data is lost for good if you
    can't write it to disk, you'll have to significantly overdesign your
    RAID arrays to make sure they can handle the traffic even in degraded
    mode. This should be less of an issue for RAID-1 (which is easier to
    write to in degraded mode) than for the parity-based RAIDs.

    Happy experimenting!

    --
    The address in the header is invalid for obvious reasons. Please
    reconstruct the address from the information below (look for _).
    Ralph Becker-Szendy _firstname_@lr_dot_los-gatos_dot_ca.us

  4. Re: Very High Rate Continuous Transfer

    _firstname_@lr_dot_los-gatos_dot_ca.us wrote:

    ....

    > Another warning: If you think RAID will give you redundancy, and at
    > the same time you are relying on maxing out the performance of the
    > disk array, you are cheating yourself. What I mean is this: If you
    > buy a disk array that can barely handle 200 MB/sec (meaning your
    > configuration is cost optimized), then it will not handle that
    > bandwidth in degraded mode (with a dead disk). RAID redundancy is in
    > some sense about preserving your data; while running with a dead disk,
    > the speed will be quite low. If your data is lost for good if you
    > can't write it to disk, you'll have to significantly overdesign your
    > RAID arrays to make sure they can handle the traffic even in degraded
    > mode.


    While I generally agree with the other points you made, my distinct
    impression is that good RAID-3 implementations (unlike, say, RAID-4, -5,
    or -6) suffer no noticeable performance degradation (for either reads or
    writes) while running with a dead disk. That was one of the reasons I
    suggested it.

    - bill

  5. Re: Very High Rate Continuous Transfer

    In article ,
    Bill Todd wrote:
    >_firstname_@lr_dot_los-gatos_dot_ca.us wrote:
    >
    >...
    >
    >> Another warning: If you think RAID will give you redundancy, and at
    >> the same time you are relying on maxing out the performance of the
    >> disk array, you are cheating yourself. What I mean is this: If you
    >> buy a disk array that can barely handle 200 MB/sec (meaning your
    >> configuration is cost optimized), then it will not handle that
    >> bandwidth in degraded mode (with a dead disk). RAID redundancy is in
    >> some sense about preserving your data; while running with a dead disk,
    >> the speed will be quite low. If your data is lost for good if you
    >> can't write it to disk, you'll have to significantly overdesign your
    >> RAID arrays to make sure they can handle the traffic even in degraded
    >> mode.

    >
    >While I generally agree with the other points you made, my distinct
    >impression is that good RAID-3 implementations (unlike, say, RAID-4, -5,
    >or -6) suffer no noticeable performance degradation (for either reads or
    >writes) while running with a dead disk. That was one of the reasons I
    >suggested it.


    I agree, and correction gladly accepted. With one minor fly in the
    ointment: While the disk is dead, things should run just fine. As
    soon as you put a spare disk in, the array might try to rebuild onto
    the new disk; that rebuild will compete with the foreground workload.
    With good systems management this could be circumvented: once a LUN
    has a dead disk, slowly drain the data from it, then get the disk
    array to accept a spare disk without rebuilding, for example by
    destroying the LUN and recreating it using the spare (doing all this
    on a live system without having to shut the software stack down
    might be tricky).

    Compared to all the other questions the original poster should think
    about (SCSI or FC? Commercial disk arrays or JBODs? PCI/AGP and
    memory bandwidth? Redundancy or not? Stripe by hand, by LVM, or not
    at all? and many others) the selection of RAID level is actually a
    minor point.

    --
    The address in the header is invalid for obvious reasons. Please
    reconstruct the address from the information below (look for _).
    Ralph Becker-Szendy _firstname_@lr_dot_los-gatos_dot_ca.us

  6. Re: Very High Rate Continuous Transfer

    > Also, AFAIK the AGP bus is a superset of the PCI bus, just on a
    > different connector. If you are custom-building your electronics, you
    > might want to use the AGP connector.


    AGP is obsolete, being replaced by PCI-X in all newer mobos, even for home
    gaming platforms.

    AGP is also rather narrowly 3D-oriented, while PCI-X is general-purpose.

    --
    Maxim Shatskih, Windows DDK MVP
    StorageCraft Corporation
    maxim@storagecraft.com
    http://www.storagecraft.com



  7. Re: Very High Rate Continuous Transfer

    Maxim S. Shatskih wrote:

    > AGP is obsolete, being replaced by PCI-X in all newer mobos, even for home
    > gaming platforms.


    PCI Express, that is. PCI-X is 100/133 MHz PCI in servers; PCI Express
    is a serial version.


    Thomas

  8. Re: Very High Rate Continuous Transfer

    I appreciate the advice. I am absorbing it as fast as I can.

    I'll be using this as a real-time recording device, and data can be
    transferred to more reliable long-term storage after the recording
    procedure is completed, which may be several hours later.

    I am inclined to think that RAID-0 for performance, or RAID-5 for some
    redundancy, is the way to go. I am also considering 2.5" vs 3.5"
    drives, and a review at Tom's Hardware suggests an investment in 2.5"
    will reduce my noise and power by a lot. Also, I expect the equivalent
    array of 2.5" drives to be cooler, lighter and inherently more rugged,
    which I need for this application.

    I will need at least 600 GB of hard drive storage.

    I may be willing to reduce the hard drive continuous bandwidth by half
    (at the expense of losing a main feature). All other requirements
    remain the same.

    Now I will need to identify RAID controllers and a processor board. I
    would prefer a 3U form factor.

    I'll need about 10 GB of fast memory, and will DMA data to it from an
    input device. I expect this to be pricey, so no need to beat me about
    that.

    I am going about this blindly at the moment, but any suggestions will
    be greatly appreciated.

    You are all very helpful, and I appreciate it very much.

    Jim


  9. Re: Very High Rate Continuous Transfer


    wrote in message
    news:1121528669.995816.282650@o13g2000cwo.googlegroups.com...
    > I am looking into what it will take to support continuous (not burst)
    > 160 to 200 MBytes/sec transfer to disk. What types of drives and how
    > would they be configured as an array (or multiple arrays)?
    >
    > What type of processor and bus architecture would be appropriate? Data
    > will be from a capture memory that is shared between the processor and
    > the capture electronics. So memory bandwidth will be at least 320 to
    > 400 MBytes/sec.


    Might be worth looking for articles and papers from CERN - they've been
    doing this sort of stuff for years and used to publish papers on the
    computing architectures.



  10. Re: Very High Rate Continuous Transfer

    You need to just go get a CX700... It will handle the I/O you are
    talking about even with a bad disk. Just make sure you have a global
    hot spare for every shelf. As long as the data is not random you will
    be fine with RAID-5 on that array.


  11. Re: Very High Rate Continuous Transfer

    In article ,
    Stephen Maudsley wrote:
    >
    > wrote in message
    >news:1121528669.995816.282650@o13g2000cwo.googlegroups.com...
    >> I am looking into what it will take to support continuous (not burst)
    >> 160 to 200 MBytes/sec transfer to disk. What types of drives and how
    >> would they be configured as an array (or multiple arrays)?
    >>
    >> What type of processor and bus architecture would be appropriate? Data
    >> will be from a capture memory that is shared between the processor and
    >> the capture electronics. So memory bandwidth will be at least 320 to
    >> 400 MBytes/sec.

    >
    >Might be worth looking for articles and papers from CERN - they've been
    >doing this sort of stuff for years and used to publish papers on the
    >computing architectures.


    Being a retired high-energy physicist (and former CERN collaborator)
    myself ...

    Yes, it would be a good idea to start there, and read their stuff. A
    good starting point is to look for the web presence of the "CERN
    OpenLab", and read what is posted there.

    But the original poster's situation and CERN are in different leagues.
    I've recently seen a ~1 GByte/sec test running at CERN, sustained for
    a whole week. But it required O(100) computers, massive networking
    gear, and many hundred disk drives, with some of the finer software
    and hardware products from industry thrown into the mix. It also
    consumed all told probably a dozen people (both from CERN and from
    industry) for a year to set up, and the hardware cost should be
    measured in units of M$.

    The other thing to remember is that to CERN, the data storage problem
    (even though it is massive) is a small part of their overall mission.
    Anyone who spends ~10 billion $ on building an accelerator, about the
    same on the physics experiments, and a few billion $ a year on
    operation and support, has a strong incentive to build a reliable and
    fast data storage system, because loss of data would have huge
    economic costs.

    I very much doubt that the original poster's system will reach this
    scale; still, stealing some good ideas there is a good plan.

    Another thing to remember from the CERN experience: Just because the
    system can do a certain speed (say 400 MB/sec) once, doesn't mean at
    all that it can do so sustained. Things go wrong all the time
    (guaranteed to happen in a large system, which typically even involves
    a few humans, which are about as unreliable as disk drives, and nobody
    has invented RAID for sys admins yet). The real test is not to do 400
    MB/sec for 10 seconds, but do so sustained 24x7 for a month. This is
    much much harder.

    --
    The address in the header is invalid for obvious reasons. Please
    reconstruct the address from the information below (look for _).
    Ralph Becker-Szendy _firstname_@lr_dot_los-gatos_dot_ca.us

  12. Re: Very High Rate Continuous Transfer

    Because of the nature of the data I am saving, I think I can simplify
    quite a bit. I'll still need to figure out what hardware I need to
    support this, but here is the direction I am heading:

    Most of the time the data will be in blocks that are about 1 ms in
    sample time duration. At 160 MB/s, that's only 160 KB per block. With a
    set of 7 or 8 separate physical volumes, I will write data blocks into
    separate files, sequentially writing in the next physical drive for
    each block. I can do this under software control.
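
    In rough terms (drive mount points and the block source below are just
    placeholders), the scheme would look something like this:

        # Round-robin each ~160 KB capture block onto the next physical drive.
        import itertools, os

        MOUNTPOINTS = [f"/mnt/disk{i}" for i in range(8)]   # 7 or 8 separate drives (hypothetical paths)
        BLOCK_SIZE = 160 * 1024                             # ~1 ms of data at 160 MB/s

        def record(blocks):
            """blocks: iterable yielding BLOCK_SIZE-byte buffers from capture memory."""
            files = [open(os.path.join(m, "capture.dat"), "wb", buffering=0)
                     for m in MOUNTPOINTS]
            try:
                for f, block in zip(itertools.cycle(files), blocks):
                    f.write(block)        # successive blocks land on successive drives
            finally:
                for f in files:
                    f.close()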

    (I am not worried about redundancy at the moment. This data will be
    stored for only a few hours before it is transferred to a server, where
    data security can be addressed.)

    I read a report at Tom's Hardware that shows the worst case write
    bandwidth for a 2.5" Toshiba MK1032GAX 100 GB drive asymptotically goes
    to about 27 MB/s as the drive becomes full. (Can someone verify my
    understanding of this? It's here:
    http://www.tomshardware.com/storage/...transfer_graph
    )

    Overhead
    --------
    I am not sure what the processor overhead will be to open and close
    files while doing this. The alternative is to stream the blocks into
    larger files, which only changes the data read process.

    With this process, I can probably rely on disk cache to absorb most
    remaining delays (like seek time).

    Drive reliability
    -----------------
    I am wondering if I can stream data into these blocks continuously
    without buffering it in main memory. As I said in an earlier post, I
    want to continuously capture into memory and decide when to offload the
    last 10 GB of recorded data. But I am now thinking I can stream this
    data directly to the disks (with a significant savings in main memory)
    and overwrite that which I don't want to keep. How hard is this on the
    drives, if I do this continuously for 12 hours straight, or 24/7?

    Drive Controller
    ----------------
    With the solution outline given above, I will need a controller,
    preferably in a 3U format (cPCI). Like I said, it will support 7 to 8
    independent physical drives, at a minimum. Does anyone have a
    suggestion?


    Regards,
    Jim


  13. Re: Very High Rate Continuous Transfer

    jim_nospam_beasley@yahoo.com wrote:
    > Because of the nature of the data I am saving, I think I can simplify
    > quite a bit. I'll still need to figure out what hardware I need to
    > support this, but here is the direction I am heading:
    >
    > Most of the time the data will be in blocks that are about 1 ms in
    > sample time duration. At 160 MB/s, that's only 160 KB per block. With a
    > set of 7 or 8 separate physical volumes, I will write data blocks into
    > separate files, sequentially writing in the next physical drive for
    > each block. I can do this under software control.


    It would be even easier using a single file spread across the disks
    under RAID-0 software control (you're effectively talking above about
    recreating RAID-0 in your application).

    >
    > (I am not worried about redundancy at the moment. This data will be
    > stored for only a few hours before it is transferred to a server, where
    > data security can be addressed.)


    Hmmm. 3 hrs. x 3600 sec/hr. x 160 MB/sec = 1.728 TB - considerably more
    space than you'll have using 7 or 8 100 GB drives even if you manage it
    optimally.

    >
    > I read a report at Tom's Hardware that shows the worst case write
    > bandwidth for a 2.5" Toshiba MK1032GAX 100 GB drive asymptotically goes
    > to about 27 MB/s as the drive becomes full. (Can someone verify my
    > understanding of this? It's here:
    > http://www.tomshardware.com/storage/...transfer_graph
    > )


    The number sounds reasonable, but you should still leave a bit of margin
    just in case (especially using a non-RAID-3 array where the disks won't
    be synchronized with each other, though your application may tend to be
    self-synchronizing). Of course, you should check the manufacturer's
    spec sheet too.
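
    As a quick margin check (taking the ~27 MB/s worst-case figure at face
    value, which is an assumption):

        worst_case_per_drive = 27          # MB/s near the full/inner end of the drive
        drives = 8
        aggregate = worst_case_per_drive * drives      # 216 MB/s
        for target in (160, 200):
            print(f"{target} MB/s target: {aggregate / target:.2f}x worst-case headroom")
        # ~1.35x at 160 MB/s but only ~1.08x at 200 MB/s - thin at the top end.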

    >
    > Overhead
    > --------
    > I am not sure what the processor overhead will be to open and close
    > files while doing this.


    You almost certainly don't want to be opening and closing files at all
    frequently: that could start to screw up your data rate to disk (even
    if the relevant file data is usually cached, it often gets updated on
    close). For that matter, you'll want to suppress any frequent on-disk
    updates to things like the file's last-accessed and last-modified times,
    reuse existing file space rather than allocate new space (to avoid
    on-disk allocation update activity), suppress end-of-file-mark
    updates, and so on.
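
    One way to get that effect, as a minimal sketch (file size, name and
    flag choices are assumptions, and O_DIRECT/O_SYNC handling is
    OS-specific):

        # Preallocate each capture file once, then overwrite it in place so the
        # recording path never triggers allocation or end-of-file updates.
        import os

        FILE_SIZE = 100 * 1024**3          # e.g. one 100 GB file per drive (hypothetical)
        BLOCK_SIZE = 160 * 1024

        def preallocate(path, size=FILE_SIZE):
            with open(path, "wb") as f:
                f.truncate(size)           # note: may create a sparse file; writing real
                                           # zeros up front avoids allocation during capture

        def record_into(path, blocks):
            fd = os.open(path, os.O_WRONLY)            # O_DIRECT/O_SYNC could be added per-OS
            offset = 0
            try:
                for block in blocks:
                    os.pwrite(fd, block, offset)       # reuse existing space, no allocation
                    offset = (offset + BLOCK_SIZE) % FILE_SIZE   # wrap round, ring-buffer style
            finally:
                os.close(fd)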

    > The alternative is to stream the blocks into
    > larger files, which only changes the data read process.
    >
    > With this process, I can probably rely on disk cache to absorb most
    > remaining delays (like seek time).


    Quite possibly not at the data rates you're talking about.

    >
    > Drive reliability
    > -----------------
    > I am wondering if I can stream data into these blocks continuously
    > without buffering it in main memory.


    Probably not - see previous comment. Besides, if you don't go through
    main memory you'd be completely by-passing the file system and writing
    driver code. But using asynchronous multi-buffering you can stay within
    the realm of normal application behavior without needing much memory.
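
    A minimal sketch of that multi-buffering idea (buffer size and count
    are arbitrary assumptions; fill_buffer stands in for however the
    capture data actually arrives):

        # Capture side fills buffers while a writer thread drains them to disk.
        import queue, threading

        BUF_SIZE = 4 * 1024 * 1024         # e.g. 4 MB per buffer
        NUM_BUFS = 4                       # a handful of buffers is all the memory needed

        empty = queue.Queue()
        full = queue.Queue()
        for _ in range(NUM_BUFS):
            empty.put(bytearray(BUF_SIZE))

        def writer(out_file):
            while True:
                buf = full.get()
                if buf is None:            # sentinel: capture finished
                    break
                out_file.write(buf)        # sequential streaming write
                empty.put(buf)             # recycle the buffer for the capture side

        def capture(fill_buffer, out_file, n_buffers):
            t = threading.Thread(target=writer, args=(out_file,))
            t.start()
            for _ in range(n_buffers):
                buf = empty.get()          # blocks only if the disk falls behind
                fill_buffer(buf)           # e.g. copy the next chunk out of capture memory
                full.put(buf)
            full.put(None)
            t.join()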

    > As I said in an earlier post, I
    > want to continuously capture into memory and decide when to offload the
    > last 10 GB of recorded data. But I am now thinking I can stream this
    > data directly to the disks (with a significant savings in main memory)
    > and overwrite that which I don't want to keep. How hard is this on the
    > drives, if I do this continuously for 12 hours straight, or 24/7?
    >
    > Drive Controller
    > ----------------
    > With the solution outline given above, I will need a controller,
    > preferrably in a 3U format (cPCI). Like I said, it will support 7 to 8
    > independent physical drives, at a minimum. Does anyone have a
    > suggestion?


    If you're as cost-conscious as you appear to be, consider 3.5" SATA
    drives - which will give you the temporary storage space you need and
    comparable or better bandwidth in numbers that should fit in a 3U
    enclosure. 3Ware makes controllers which may handle the bandwidth when
    used as a simple JBOD (I've heard varying reports of their capabilities
    at the higher RAID levels).

    - bill
