Dumping high-bandwidth data from network to disk. - Linux



  1. Dumping high-bandwidth data from network to disk.

    We want to dump high-bandwidth data (around 1 Gbit/second) from a
    network card to disks:
    - What performance can we expect from Linux and how does it depend on
    my choice of hardware?
    - What kind of harddisk should I use: SATA or SCSI? I have heard that
    SATA can do about 40 Mbytes/second.
    - We are assuming that UDP is the correct layer to use in the network
    programming.
    - Is it recommended to use a standard desktop distribution like
    Ubuntu, Fedora or SUSE?

    Thanks in advance,
    Nordlöw

  2. Re: Dumping high-bandwidth data from network to disk.

    On Apr 28, 6:52 am, Nordlöw wrote:
    > We want to dump high-bandwidth data (around 1 Gbit/second) from a
    > network card to disks:
    > - What performance can we expect from Linux and how does it depend on
    > my choice of hardware?
    > - What kind of harddisk should I use: SATA or SCSI? I have heard that
    > SATA can do about 40 Mbytes/second.
    > - We are assuming that UDP is the correct layer to use in the network
    > programming.
    > - Is it recommended to use a standard desktop distribution like
    > Ubuntu, Fedora or SUSE?


    Bursts or continuous? If continuous, for how long? Are you seriously
    planning to fill a disk at those speeds? If so, you will need *way*
    beyond commodity hardware.

    DS

  3. Re: Dumping high-bandwidth data from network to disk.

    > We want to dump high-bandwidth data (around 1 Gbit/second) from a
    > network card to disks:
    > - What performance can we expect from Linux and how does it depend on
    > my choice of hardware?


    Procure a test setup, and measure it yourself.

    If "network card" implies PCI bus, then the absolute limit is 133 MB/s,
    which barely handles 1 Gbit/s. PCI-E can handle 300 MB/s.
    A network interface connected directly to the main chipset controller
    might in theory handle the entire memory bandwidth of several GB/s,
    but remember the 1 Gbit/s limit for inexpensive network hardware.

    > - What kind of harddisk should I use: SATA or SCSI? I have heard that
    > SATA can do about 40 Mbytes/second.


    I have a commodity 320 GB SATA drive that reads 70 MiB/s and
    writes 45 MiB/s on a box with one lightly-loaded 2 GHz CPU.
    So writing 1 Gbit/s requires at least 3-way striping on such hardware.
    Note that 320 GB / 125 MB/s is only 40 minutes or so. Probably
    you should investigate enterprise-class network attached storage.
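
    The arithmetic behind the striping and fill-time estimates above can
    be sketched as follows. This is only a back-of-the-envelope check: the
    figures (45 MiB/s sustained write, 320 GB per drive, 1 Gbit/s input)
    are the ones quoted in this post, and the function names are made up
    for illustration.

```python
import math

MIB = 1024 ** 2  # bytes per MiB


def drives_needed(stream_bytes_per_s, drive_write_bytes_per_s):
    """Smallest number of striped drives whose combined write rate covers the stream."""
    return math.ceil(stream_bytes_per_s / drive_write_bytes_per_s)


def minutes_to_fill(capacity_bytes, stream_bytes_per_s):
    """Minutes until the given capacity is full at a constant input rate."""
    return capacity_bytes / stream_bytes_per_s / 60


stream = 1e9 / 8  # 1 Gbit/s expressed in bytes per second (125 MB/s)

print(drives_needed(stream, 45 * MIB))                # 3
print(round(minutes_to_fill(320e9, stream), 1))       # 42.7 (one drive)
print(round(minutes_to_fill(3 * 320e9, stream), 1))   # 128.0 (three striped drives)
```

    The estimate ignores filesystem and controller overhead, so a real
    array needs some margin on top of the computed minimum.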

    Pay attention to the filesystem type, too. A filesystem that uses a
    journal for data consistency must write everything twice.

    > - We are assuming that UDP is the correct layer to use in the network
    > programming.
    > - Is it recommended to use a standard desktop distribution like
    > Ubuntu, Fedora or SUSE?


    All of those distributions can handle the job for a few hours at a time.
    Then you must exchange the full physical drives for empty ones.

    --

  4. Re: Dumping high-bandwidth data from network to disk.

    On 28 Apr, 18:06, John Reiser wrote:
    > > We want to dump high-bandwidth data (around 1 Gbit/second) from a
    > > network card to disks:
    > > - What performance can we expect from Linux and how does it depend on
    > > my choice of hardware?

    >
    > Procure a test setup, and measure it yourself.
    >
    > If "network card" implies PCI bus, then the absolute limit is 133 MB/s,
    > which barely handles 1 Gbit/s. PCI-E can handle 300 MB/s.
    > A network interface connected directly to the main chipset controller
    > might in theory handle the entire memory bandwidth of several GB/s,
    > but remember the 1 Gbit/s limit for inexpensive network hardware.
    >
    > > - What kind of harddisk should I use: SATA or SCSI? I have heard that
    > > SATA can do about 40 Mbytes/second.

    >
    > I have a commodity 320 GB SATA drive that reads 70 MiB/s and
    > writes 45 MiB/s on a box with one lightly-loaded 2 GHz CPU.
    > So writing 1 Gbit/s requires at least 3-way striping on such hardware.
    > Note that 320 GB / 125 MB/s is only 40 minutes or so. Probably
    > you should investigate enterprise-class network attached storage.
    >
    > Pay attention to the filesystem type, too. A filesystem that uses a
    > journal for data consistency must write everything twice.
    >
    > > - We are assuming that UDP is the correct layer to use in the network
    > > programming.
    > > - Is it recommended to use a standard desktop distribution like
    > > Ubuntu, Fedora or SUSE?

    >
    > All of those distributions can handle the job for a few hours at a time.
    > Then you must exchange the full physical drives for empty ones.
    >
    > --


    What kind of filesystem supported by the recent linux kernel would you
    recommend if I want to optimize the bandwidth presumably without the
    journalling? Thereby of course sacrificing fault-tolerance on
    accidental reboot etc. Can I deactivate journalling on ext3, reiser3,
    etc.?

    /Nordlöw

  5. Re: Dumping high-bandwidth data from network to disk.

    > What kind of filesystem supported by the recent linux kernel would you
    > recommend if I want to optimize the bandwidth presumably without the
    > journalling? Thereby of course sacrificing fault-tolerance on
    > accidental reboot etc. Can I deactivate journalling on ext3, reiser3,
    > etc.?


    ext3 can be mounted with option "data=[ journal | ordered | writeback ]"
    to control how much is journaled. ext2 and ext3 have the same on-disk
    format, so mount an ext3 filesystem as "-t ext2" to get no journaling
    at all.

    To minimize filesystem overhead, look for extent-based allocation
    of space. The xfs filesystem is recommended by many who record
    live video to disk. The bandwidth requirements may be less than yours,
    but otherwise the usage is similar. The zfs filesystem has adherents,
    but the linux code is somewhat new. reiser3 has been around for a while.
    There are other extent-based file systems, also.

    It would be foolish to make decisions about a project of this
    magnitude without doing some experiments yourself. You can certainly
    afford ten days and up to a thousand euros to build a RAID array
    on a 2GHz box, then experiment and measure. If it works, then use it.
    If not, then sell the system or pieces.
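
    A first experiment of the kind suggested above could be as simple as
    timing a large sequential write. The sketch below is a rough probe,
    not a proper benchmark: the function name, the chunk size and the
    choice to include the final fsync in the timing are all assumptions
    made for illustration.

```python
import os
import time


def measure_write_rate(path, total_bytes, chunk_bytes=1 << 20):
    """Write total_bytes of zeros sequentially to path; return bytes per second.

    The elapsed time includes the final fsync, so data still sitting in the
    page cache does not inflate the result.
    """
    chunk = bytes(chunk_bytes)
    written = 0
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:
        while written < total_bytes:
            # Unbuffered write() may be partial, so advance by what was written.
            written += f.write(chunk[: total_bytes - written])
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    return written / elapsed
```

    Pointing this at a file on the candidate array, with a total size well
    beyond the machine's RAM, and comparing the result with the 125 MB/s
    the stream needs gives a first indication of whether the hardware is
    in the right range.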

    --

  6. Re: Dumping high-bandwidth data from network to disk.

    Nordlöw wrote:
    > We want to dump high-bandwidth data (around 1 Gbit/second) from a
    > network card to disks:
    > - What performance can we expect from Linux and how does it depend on
    > my choice of hardware?
    > - What kind of harddisk should I use: SATA or SCSI? I have heard that
    > SATA can do about 40 Mbytes/second.
    > - We are assuming that UDP is the correct layer to use in the network
    > programming.
    > - Is it recommended to use a standard desktop distribution like
    > Ubuntu, Fedora or SUSE?
    >
    > Thanks in advance,
    > Nordlöw


    The new 7200 RPM Seagate Barracuda ES drives claim a maximum sustained
    transfer rate of 105MB/s, but I haven't tested them myself. Past
    personal experience with Seagate in this area has been good. They
    usually do what they claim. And these drives are relatively inexpensive.

    There is also a SAS version of the same drive, which comes in higher
    capacities (and higher prices).

    Don't count on getting 105MB/s on the entire volume, and when (if) you
    put a filesystem on top of it, you will lose additional throughput as well.

    If your input rate is 1Gb/s (Ethernet?) one drive may be too slow. You
    may have to consider a RAID0 of two drives for performance margin. Many
    controllers (some motherboards) have RAID1 on board, and allow you to
    combine into RAID10 for high speed and redundancy. Depends on your HW.

    As far as filesystems, I haven't seen anything come close to XFS. The
    only exception is JFS, which as I recall does well when you use
    O_DIRECT. O_DIRECT makes your writes more deterministic, but you can't
    then take advantage of filesystem cache. I have found it's the only way
    to get consistent throughput.
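
    For illustration, an O_DIRECT write might look roughly like the sketch
    below. It assumes Linux (os.O_DIRECT is not available everywhere), a
    4096-byte alignment requirement (the real value depends on the device
    and filesystem), and it pads the data with zeros to a whole number of
    blocks. The function names are hypothetical, and the fallback to a
    buffered write is only there so the sketch still runs on filesystems
    that refuse O_DIRECT.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment for buffer address and length


def _write_once(path, buf, extra_flags):
    """Open path with the given extra flags, write buf in one call, close."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | extra_flags, 0o644)
    try:
        return os.write(fd, buf)
    finally:
        os.close(fd)


def write_direct(path, payload, block=BLOCK):
    """Write payload, zero-padded to a multiple of block, bypassing the page cache.

    Returns the number of bytes written (the padded length on success).
    """
    padded_len = max(block, -(-len(payload) // block) * block)
    buf = mmap.mmap(-1, padded_len)  # anonymous mapping: page-aligned, zero-filled
    try:
        buf.write(payload)
        try:
            return _write_once(path, buf, getattr(os, "O_DIRECT", 0))
        except OSError:
            # Filesystem rejected O_DIRECT: fall back to an ordinary write.
            return _write_once(path, buf, 0)
    finally:
        buf.close()
```

    A real recorder would keep the file open and reuse a few aligned
    buffers rather than reopening the file for each write.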

    Your data rate (100MB/s) is actually not very fast at all on a modern
    x86 server class motherboard (Intel or AMD). The preceding paragraph
    might be moot if you're using one of these.

    As far as network, TCP sockets can get close to 100% utilization if done
    right. Look at ttcp, for example.
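
    As a rough sketch of the receiving side, the loop below reads from a
    connected stream socket into one preallocated buffer and appends
    whatever arrives to an output file. The function name and the 1 MiB
    buffer size are assumptions; whether plain Python keeps up with a full
    1 Gbit/s link is something to measure, not something this sketch
    promises.

```python
import socket


def dump_stream(sock, out, bufsize=1 << 20):
    """Copy everything received on sock into the binary file object out.

    Reuses a single buffer via recv_into to avoid allocating per read.
    Returns the total number of bytes copied once the peer closes.
    """
    buf = bytearray(bufsize)
    view = memoryview(buf)
    total = 0
    while True:
        n = sock.recv_into(view)
        if n == 0:  # peer closed the connection
            break
        out.write(view[:n])
        total += n
    return total
```

    In use, sock would be the connection returned by accept() on a
    listening TCP socket and out a file opened with open(path, "wb") on
    the striped volume.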

    As far as Linux, yes it's up to the task. I wouldn't consider anything
    earlier than a 2.6 kernel though. My experience is entirely with x86
    server class motherboards, however.

    Hope that helps.

    John


  7. Re: Dumping high-bandwidth data from network to disk.

    John Reiser wrote:

    > If "network card" implies PCI bus, then the absolute limit is 133 MB/s,
    > which barely handles 1 Gbit/s.


    AFAIU, there may be more than one PCI bus in a system.

    > PCI-E can handle 300 MB/s.


    A *single* lane can handle 250 MB/s full duplex (PCIe 1.1)
    (250 MB/s up + 250 MB/s down at the same time).

    PCIe 2.0 doubled the bandwidth to 500 MB/s per lane.

    A typical system provides 16-32 such lanes.
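
    To put the bus figures side by side, here is a small calculation. The
    per-lane PCIe rates are the ones given above; the conventional PCI
    figure assumes a 32-bit bus clocked at 33.33 MHz, which is where the
    133 MB/s limit comes from. The names are made up for illustration.

```python
PCIE_LANE_MB_PER_S = {"1.1": 250, "2.0": 500}  # per lane, per direction


def pci_mb_per_s(width_bits=32, clock_mhz=33.33):
    """Peak rate of a conventional PCI bus, shared by every device on it."""
    return width_bits / 8 * clock_mhz


def pcie_mb_per_s(gen, lanes):
    """Peak one-direction rate of a PCIe link with the given lane count."""
    return PCIE_LANE_MB_PER_S[gen] * lanes


needed = 1000 / 8  # 1 Gbit/s in MB/s

print(round(pci_mb_per_s()))                # 133
print(pcie_mb_per_s("1.1", 1) / needed)     # 2.0
print(pcie_mb_per_s("2.0", 16))             # 8000
```

    Even a single PCIe 1.1 lane has twice the peak rate the stream needs,
    whereas a conventional PCI bus has almost no headroom, and the same
    bus may also have to carry the traffic to the disk controller.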

    > A network interface connected directly to the main chipset controller
    > might in theory handle the entire memory bandwidth of several GB/s,
    > but remember the 1 Gbit/s limit for inexpensive network hardware.


    What are you trying to say?

    Regards.
