Thread: Poor GB Ethernet performance - 2.6.18 kernel

  1. Poor GB Ethernet performance - 2.6.18 kernel

    Hello:

    If I've posted to the wrong group, I apologize in advance. Please
    refer me to the correct group.

    I'm having a performance issue with my GB Ethernet network (that we
    are setting up for an ISCSI SAN). The network consists of:

    1) Dell 6224 switch
    2) Dell 1950 (Dual Quad-Core 2.6GHz/4MB Cache/4GB RAM) with an Intel
    EtherExpress Pro/1000 Quad Port card running CentOS 5 latest patch
    level 2.6.18-8.1.8.el5
    3) Dell SC1425 (Dual 2.8GHz/1MB Cache/2GB RAM) with the on-board
    Broadcom NetExtreme ports running CentOS 5 latest patch level
    2.6.18-8.1.8.el5
    4) Dell 8250 (3GHz Pentium 4/1GB RAM) running Windows XP SP2 -
    Linksys EG1032 v1
    5) Dell XPS 600 (2.8GHz Pentium D/2 GB RAM) running Windows XP SP2 -
    Netgear GA311
    6) Equallogic PS3900XV

    All connected with Cat-6 cabling (brand new)

    I have Jumbo Frames enabled where applicable and my MTU is set to
    9000. I have Flow Control enabled everywhere, storm control turned
    off in the switch, and 9016 packet size enabled in the switch on the
    appropriate ports. The ports are set up in an un-tagged VLAN in the
    switch.
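    (On the Linux boxes that roughly amounts to something like the following to
    check and apply the settings; eth1 here just stands for whichever port faces
    the iSCSI VLAN, so this is a sketch rather than the exact commands I ran:)

    # Check the current MTU and pause (flow control) settings on a port
    ip link show eth1 | grep -o 'mtu [0-9]*'     # expect "mtu 9000"
    ethtool -a eth1                              # expect RX/TX pause on

    # Apply at runtime if needed (MTU=9000 in the ifcfg script makes it persistent)
    ip link set dev eth1 mtu 9000
    ethtool -A eth1 autoneg on rx on tx on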

    Performance numbers with Netperf v2.4.3 using the TCP STREAM test
    (netperf -f M -H <remote host>):

    1) Dell SC1425 to Dell 1950 - 65MB/sec
    2) Dell 1950 to Dell SC1425 - 65MB/sec

    I've also gotten similar numbers using IOMeter between the Linux
    systems.

    Performance numbers using IOMeter 2006_07_27 between the PCs, for
    comparison (1 network worker, 32K 100% reads, no disk workers):

    1) Dell 8250 to XPS 600 - 75MB/sec
    2) XPS 600 to Dell 8250 - 58MB/sec

    I think the theoretical maximum speed is 125MB/sec. We can't even get
    close. I would have expected 100MB/sec or more from the Linux
    machines. Those are the machines we are really worried about. At the
    very least, I would expect the Linux machines to pump data faster than
    my PC. :-(

    An Equallogic tech support specialist says that he routinely sees
    100MB/sec from Linux. He has spent hours on the systems with me
    remotely, but we can't seem to find anything mis-configured.

    Please help me figure out why we can't get the speeds we hoped for.
    What can we tune? Where might we look? What are we doing wrong? Are
    our expectations wrong?

    TIA

    Eric Raskin
    eraskin@paslists.com


  2. Re: Poor GB Ethernet performance - 2.6.18 kernel

    eraskin@paslists.com wrote:
    > If I've posted to the wrong group, I apologize in advance. Please
    > refer me to the correct group.


    Depending on where things go, you might want the
    comp.os.linux.networking (IIRC) group, or comp.dcom.lans.ethernet.

    > I'm having a performance issue with my GB Ethernet network (that we
    > are setting up for an ISCSI SAN). The network consists of:


    > 1) Dell 6224 switch
    > 2) Dell 1950 (Dual Quad-Core 2.6GHz/4MB Cache/4GB RAM) with an Intel
    > EtherExpress Pro/1000 Quad Port card running CentOS 5 latest patch
    > level 2.6.18-8.1.8.el5
    > 3) Dell SC1425 (Dual 2.8GHz/1MB Cache/2GB RAM) with the on-board
    > Broadcom NetExtreme ports running CentOS 5 latest patch level
    > 2.6.18-8.1.8.el5
    > 4) Dell 8250 (3GHz Pentium 4/1GB RAM) running Windows XP SP2 -
    > Linksys EG1032 v1
    > 5) Dell XPS 600 (2.8GHz Pentium D/2 GB RAM) running Windows XP SP2 -
    > Netgear GA311
    > 6) Equallogic PS3900XV


    > All connected with Cat-6 cabling (brand new)


    > I have Jumbo Frames enabled where applicable and my MTU is set to


    Exactly what do you mean by "where applicable?" JumboFrames is one of
    those "entire broadcast domain or nothing" sorts of things (well, if
    all one does is speak TCP perhaps not, but it is easier to explain
    that way...)
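    (One quick way to sanity-check that the whole path really passes 9000-byte
    frames is a do-not-fragment ping; a sketch, with the host as a placeholder:)

    # 8972 = 9000 - 20 (IP header) - 8 (ICMP header); -M do sets DF
    ping -M do -s 8972 -c 4 <remote host>
    # "Message too long" or missing replies would mean some hop is not
    # passing jumbo frames.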

    > 9000. I have Flow Control enabled everywhere, storm control turned
    > off in the switch, and 9016 packet size enabled in the switch on the
    > appropriate ports. The ports are set up in an un-tagged VLAN in the
    > switch.


    > Performance numbers with Netperf v2.4.3 using the TCP STREAM test
    > (netperf -f M -H <remote host>):


    > 1) Dell SC1425 to Dell 1950 - 65MB/sec
    > 2) Dell 1950 to Dell SC1425 - 65MB/sec


    Well, at least you know it wasn't stepping down to 100 Mbit/s.

    What happens if you add settings for socket buffer and send size?

    netperf -f M -H <remote host> -- -s <sockbuf> -S <sockbuf> -m <send size>

    where <sockbuf> is something like, say, "256K", and <send size> "64K" or
    perhaps "32K"?

    > I think the theoretical maximum speed is 125MB/sec.


    Well, you have 1 billion bits per second (power of 10). That is
    indeed 125 million bytes per second. In netperf parlance, though, "M"
    is a power of two, not a power of ten, so in power-of-two MBytes/s the
    raw rate on the wire is 119.21 MByte/s. Then consider the header
    overheads: 14 bytes of Ethernet header (I'm ignoring VLANs), probably
    32 bytes of TCP header (I'm assuming timestamps are enabled, which
    IIRC adds 12 bytes to the standard 20-byte TCP header), and 20 bytes
    of IP header. That gives you an "efficiency" on the link of:

    8948/9014, or about 0.99, and we arrive at something like 118 MByte/s
    in the units netperf emits for -f M.
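    (Spelling that arithmetic out, just as a sketch; bc is only there to do the
    division:)

    # 10^9 bits/s expressed in power-of-two MBytes/s
    echo 'scale=2; 1000000000 / 8 / 1048576' | bc                 # about 119.2

    # link "efficiency" with a 9000-byte MTU and TCP timestamps on
    echo 'scale=4; 8948 / 9014' | bc                              # about 0.9926

    # expected ceiling in the units netperf prints for -f M
    echo 'scale=2; (1000000000 / 8 / 1048576) * 8948 / 9014' | bc # about 118.3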

    > We can't even get close. I would have expected 100MB/sec or more
    > from the Linux machines. Those are the machines we are really
    > worried about. At the very least, I would expect the Linux machines
    > to pump data faster than my PC. :-(


    Well, we start to get into questions like:

    *) CPU util on _each_ CPU on either side. You can add -c and -C to
    netperf:

    netperf -c -C ...

    but keep in mind it reports overall CPU util from 0 to 100% regardless
    of CPU count, so 25% could mean all of one core of a four-core system
    was consumed.

    *) were there any packet losses - what does ethtool say about
    link-level stats? what does netstat say about TCP-level stats? take
    before and after snapshots when you run netperf and run them through
    "beforeafter" from ftp://ftp.cup.hp.com/dist/networking/tools/

    *) do the GbE NICs in your linux systems support ChecKsum Offload
    (CKO)? Do they support Tcp Segmentation Offload (TSO)? Variations on
    an ethtool command can show that.
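    (Concretely, for those last two checks, something like the following,
    assuming beforeafter has been built and is on the PATH, and with eth1
    standing in for the interface under test:)

    # link-level and TCP-level counters before and after a run
    ethtool -S eth1 > ethtool.before ; netstat -s > netstat.before
    netperf -f M -H <remote host>
    ethtool -S eth1 > ethtool.after  ; netstat -s > netstat.after

    # beforeafter subtracts the first snapshot from the second
    beforeafter ethtool.before ethtool.after
    beforeafter netstat.before netstat.after

    # offload capabilities (CKO / TSO)
    ethtool -k eth1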

    *) What manner of PCI slot is holding the NIC? The systems above
    sound like new-enough systems that they wouldn't have ancient/slow PCI
    slots but still good to make triply sure I suppose...particularly on
    the 8250 and the 600 I'm guessing.
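    (On the Linux boxes something like this shows what the slot/link looks
    like; the bus address is whatever lspci reports for the NIC:)

    lspci | grep -i ethernet                  # find the NIC's bus address
    lspci -vv -s <bus address> | grep -i -E 'LnkCap|LnkSta|66MHz|PCI-X'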

    *) what happens if you disable flow-control everywhere?

    rick jones
    --
    web2.0 n, the dot.com reunion tour...
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  3. Re: Poor GB Ethernet performance - 2.6.18 kernel

    Wow -- thanks for the quick response. (Sorry for the long post...)

    First of all, jumbo frames are set in the Dell 6224 switch by setting the
    MTU to 9016 on each port. On the Linux boxes they are set with MTU=9000 in
    the network config scripts for each interface. On the Dell 8250 they are
    set by choosing Packet Size "3) 9014 (Alteon)" in the adapter's Advanced
    Properties in Device Manager, and on the XPS 600 by setting
    Jumbo Frame=Enable on the Netgear GA311.
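    (On the CentOS side that boils down to one extra line in each interface's
    config script; roughly, using the SC1425's address as the example:)

    # /etc/sysconfig/network-scripts/ifcfg-eth1
    DEVICE=eth1
    ONBOOT=yes
    BOOTPROTO=static
    IPADDR=192.168.100.102
    NETMASK=255.255.255.0
    MTU=9000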

    As for disabling flow control, that could be a problem. Equallogic
    specifically asks that it be on so that iSCSI packets are not lost
    when the SAN disks can't keep up. We really need that to be kept
    enabled. However, I will test without it and post again. In the
    meantime, here's everything you asked about (I hope!) with flow
    control enabled.

    --------------------------------------------------------------------------------------------------------------------------------
    Dell SC1425 settings:
    -------------------------------

    Ethernet port on Motherboard (lspci -v output):

    02:04.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet Controller (rev 05)
            Subsystem: Dell PowerEdge SC1425
            Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 217
            Memory at fe9e0000 (32-bit, non-prefetchable) [size=128K]
            I/O ports at ecc0 [size=64]
            Capabilities: [dc] Power Management version 2
            Capabilities: [e4] PCI-X non-bridge device


    eth1      Link encap:Ethernet  HWaddr 00:11:43:FD:73:5D
              inet addr:192.168.100.102  Bcast:192.168.100.255  Mask:255.255.255.0
              inet6 addr: fe80::211:43ff:fefd:735d/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
              RX packets:12858539 errors:0 dropped:0 overruns:0 frame:0
              TX packets:11218140 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:86471109458 (80.5 GiB)  TX bytes:26362954975 (24.5 GiB)
              Base address:0xccc0 Memory:fe3e0000-fe400000

    # ethtool eth1
    Settings for eth1:
    Supported ports: [ TP ]
    Supported link modes: 10baseT/Half 10baseT/Full
    100baseT/Half 100baseT/Full
    1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes: 10baseT/Half 10baseT/Full
    100baseT/Half 100baseT/Full
    1000baseT/Full
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: umbg
    Wake-on: d
    Current message level: 0x00000007 (7)
    Link detected: yes

    # ethtool -a eth1
    Pause parameters for eth1:
    Autonegotiate: on
    RX: on
    TX: on

    # ethtool -k eth1
    Offload parameters for eth1:
    Cannot get device udp large send offload settings: Operation not supported
    rx-checksumming: on
    tx-checksumming: on
    scatter-gather: on
    tcp segmentation offload: on
    udp fragmentation offload: off
    generic segmentation offload: off

    ------------------------------------------------------------------------------------------------------------------
    Dell 1950 settings:
    ---------------------------

    Intel EtherExpress Pro/1000 Quad Port in PCI-Express slot (lspci -v
    output):

    10:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
            Subsystem: Intel Corporation PRO/1000 PT Quad Port LP Server Adapter
            Flags: bus master, fast devsel, latency 0, IRQ 138
            Memory at fc7e0000 (32-bit, non-prefetchable) [size=128K]
            Memory at fc7c0000 (32-bit, non-prefetchable) [size=128K]
            I/O ports at ece0 [size=32]
            Expansion ROM at fc800000 [disabled] [size=128K]
            Capabilities: [c8] Power Management version 2
            Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+
            Capabilities: [e0] Express Endpoint IRQ 0
            Capabilities: [100] Advanced Error Reporting
            Capabilities: [140] Device Serial Number 5c-35-34-ff-ff-17-15-00

    eth2      Link encap:Ethernet  HWaddr 00:15:17:34:35:5D
              inet addr:192.168.100.101  Bcast:192.168.100.255  Mask:255.255.255.0
              inet6 addr: fe80::215:17ff:fe34:355d/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
              RX packets:19011337 errors:0 dropped:0 overruns:0 frame:0
              TX packets:19488544 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:93258926725 (86.8 GiB)  TX bytes:92409469881 (86.0 GiB)
              Base address:0xece0 Memory:fc7e0000-fc800000

    # ethtool eth2
    Settings for eth2:
    Supported ports: [ TP ]
    Supported link modes: 10baseT/Half 10baseT/Full
    100baseT/Half 100baseT/Full
    1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes: 10baseT/Half 10baseT/Full
    100baseT/Half 100baseT/Full
    1000baseT/Full
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: umbg
    Wake-on: g
    Current message level: 0x00000007 (7)
    Link detected: yes

    # ethtool -a eth2
    Pause parameters for eth2:
    Autonegotiate: on
    RX: on
    TX: on

    # ethtool -k eth2
    Offload parameters for eth2:
    Cannot get device udp large send offload settings: Operation not supported
    rx-checksumming: on
    tx-checksumming: on
    scatter-gather: on
    tcp segmentation offload: on
    udp fragmentation offload: off
    generic segmentation offload: off

    ------------------------------------------------------------------------------------
    From SC1425 to 1950:

    --------------------------------

    # ./netperf -f M -H 192.168.100.101 -c -C -- -s 256k -S 256k -m 32k
    TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.101 (192.168.100.101) port 0 AF_INET
    Recv   Send    Send                          Utilization       Service Demand
    Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
    Size   Size    Size     Time     Throughput  local    remote   local   remote
    bytes  bytes   bytes    secs.    MBytes  /s  % S      % S      us/KB   us/KB

    512000 512000  32000    10.01       65.13    2.81     4.72     1.687   5.667

    beforeafter output
    -------------------------

    Ip:
    40904 total packets received
    0 forwarded
    0 incoming packets discarded
    40904 incoming packets delivered
    25392 requests sent out
    0 outgoing packets dropped
    0 fragments received ok
    0 fragments created
    Icmp:
    0 ICMP messages received
    0 input ICMP message failed.
    ICMP input histogram:
    destination unreachable: 0
    echo requests: 0
    echo replies: 0
    0 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
    destination unreachable: 0
    echo replies: 0
    Tcp:
    31 active connections openings
    25 passive connection openings
    4 failed connection attempts
    5 connection resets received
    0 connections established
    40840 segments received
    25331 segments send out
    0 segments retransmited
    0 bad segments received.
    11 resets sent
    Udp:
    63 packets received
    0 packets to unknown port received.
    0 packet receive errors
    61 packets sent
    TcpExt:
    0 invalid SYN cookies received
    0 resets received for embryonic SYN_RECV sockets
    0 packets pruned from receive queue because of socket buffer
    overrun
    22 TCP sockets finished time wait in fast timer
    0 time wait sockets recycled by time stamp
    49 delayed acks sent
    0 delayed acks further delayed because of locked socket
    Quick ack mode was activated 0 times
    278 packets directly queued to recvmsg prequeue.
    0 packets directly received from backlog
    27711 packets directly received from prequeue
    61 packets header predicted
    111 packets header predicted and directly queued to user
    132 acknowledgments not containing data received
    39975 predicted acknowledgments
    0 times recovered from packet loss due to SACK data
    0 congestion windows recovered after partial ack
    0 TCP data loss events
    0 fast retransmits
    0 forward retransmits
    0 retransmits in slow start
    0 other TCP timeouts
    0 sack retransmits failed
    0 packets collapsed in receive queue due to low socket buffer
    0 DSACKs sent for old packets
    0 DSACKs received
    3 connections reset due to unexpected data
    6 connections reset due to early user close
    0 connections aborted due to timeout

    -----------------------------------------------------------------------------------------------------------------------
    From 1950 to SC1425:

    ---------------------------------

    # ./netperf -f M -H 192.168.100.102 -c -C -- -s 256k -S 256k -m 32k
    TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.102 (192.168.100.102) port 0 AF_INET
    Recv   Send    Send                          Utilization       Service Demand
    Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
    Size   Size    Size     Time     Throughput  local    remote   local   remote
    bytes  bytes   bytes    secs.    MBytes  /s  % S      % S      us/KB   us/KB

    512000 512000  32000    10.01       64.86    4.55     4.66     5.482   2.808

    beforeafter output:
    --------------------------

    Ip:
    39994 total packets received
    0 with invalid headers
    0 forwarded
    0 incoming packets discarded
    39994 incoming packets delivered
    24571 requests sent out
    0 reassemblies required
    0 packets reassembled ok
    Icmp:
    0 ICMP messages received
    0 input ICMP message failed.
    ICMP input histogram:
    destination unreachable: 0
    echo requests: 0
    echo replies: 0
    0 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
    destination unreachable: 0
    echo replies: 0
    Tcp:
    2 active connections openings
    0 passive connection openings
    0 failed connection attempts
    0 connection resets received
    0 connections established
    39846 segments received
    24425 segments send out
    0 segments retransmited
    0 bad segments received.
    0 resets sent
    Udp:
    147 packets received
    0 packets to unknown port received.
    0 packet receive errors
    146 packets sent
    TcpExt:
    0 resets received for embryonic SYN_RECV sockets
    0 TCP sockets finished time wait in fast timer
    0 time wait sockets recycled by time stamp
    0 delayed acks sent
    0 delayed acks further delayed because of locked socket
    Quick ack mode was activated 0 times
    31 packets directly queued to recvmsg prequeue.
    0 packets directly received from backlog
    0 packets directly received from prequeue
    21 packets header predicted
    0 packets header predicted and directly queued to user
    55 acknowledgments not containing data received
    39680 predicted acknowledgments
    0 times recovered from packet loss due to SACK data
    Detected reordering 0 times using FACK
    Detected reordering 0 times using time stamp
    0 congestion windows fully recovered
    0 congestion windows partially recovered using Hoe heuristic
    TCPDSACKUndo: 0
    0 congestion windows recovered after partial ack
    0 TCP data loss events
    0 fast retransmits
    0 forward retransmits
    0 retransmits in slow start
    0 other TCP timeouts
    0 sack retransmits failed
    0 DSACKs sent for old packets
    0 DSACKs received
    0 connections reset due to unexpected data
    0 connections reset due to early user close

    Hopefully I've gotten all that you requested. Do you see anything
    obviously wrong?

    Eric



  4. Re: Poor GB Ethernet performance - 2.6.18 kernel

    eraskin@paslists.com wrote:
    > # ./netperf -f M -H 192.168.100.101 -c -C -- -s 256k -S 256k -m 32k
    > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.101 (192.168.100.101) port 0 AF_INET
    > Recv   Send    Send                          Utilization       Service Demand
    > Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
    > Size   Size    Size     Time     Throughput  local    remote   local   remote
    > bytes  bytes   bytes    secs.    MBytes  /s  % S      % S      us/KB   us/KB
    >
    > 512000 512000  32000    10.01       65.13    2.81     4.72     1.687   5.667


    The lowercase "k" caused the number to be multiplied by 1000; had it been
    "K" it would have been 1024. I doubt that really matters though.

    > Tcp:
    > 31 active connections openings
    > 25 passive connection openings
    > 4 failed connection attempts


    If this is beforeafter output it suggests that other stuff is
    happening on the systems at the same time? Doesn't seem to matter to
    the CPU util reported by netperf though...


    > Tcp:
    > 2 active connections openings


    That would be consistent with the netperf side of a test system which
    wasn't doing other connections - first the control connection is
    established, then the data connection.

    > Hopefully I've gotten all that you requested. Do you see anything
    > obviously wrong?


    Alas no. I don't suppose there is any way to go back-to-back with
    these systems?

    The CPU utilization figures support jumboframe being used, but just
    for grins, try a -v 2 option and see what netperf reports for the MSS.

    rick jones
    --
    web2.0 n, the dot.com reunion tour...
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  5. Re: Poor GB Ethernet performance - 2.6.18 kernel

    On Aug 10, 6:43 pm, Rick Jones wrote:
    > eras...@paslists.com wrote:
    > > # ./netperf -f M -H 192.168.100.101 -c -C -- -s 256k -S 256k -m 32k
    > > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.101 (192.168.100.101) port 0 AF_INET
    > > Recv   Send    Send                          Utilization       Service Demand
    > > Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
    > > Size   Size    Size     Time     Throughput  local    remote   local   remote
    > > bytes  bytes   bytes    secs.    MBytes  /s  % S      % S      us/KB   us/KB
    > >
    > > 512000 512000  32000    10.01       65.13    2.81     4.72     1.687   5.667

    >
    > The lowercase "k" caused the number to be multiplied by 1000; had it been
    > "K" it would have been 1024. I doubt that really matters though.


    Sorry about that. Typing too fast.

    >
    > > Tcp:
    > > 31 active connections openings
    > > 25 passive connection openings
    > > 4 failed connection attempts

    >
    > If this is beforeafter output it suggests that other stuff is
    > happening on the systems at the same time? Doesn't seem to matter to
    > the CPU util reported by netperf though...


    The only other thing happening is that I'm connected to the systems
    via X-windows. There is some network traffic, on a different subnet
    and through different controllers.

    >
    > > Tcp:
    > > 2 active connections openings

    >
    > That would be consistent with the netperf side of a test system which
    > wasn't doing other connections - first the control connection is
    > established, then the data connection.
    >
    > > Hopefully I've gotten all that you requested. Do you see anything
    > > obviously wrong?

    >
    > Alas no. I don't suppose there is any way to go back-to-back with
    > these systems?


    By "back to back", do you mean removing the switch and connecting a
    cable directly between them? I can do that on Monday when I get back
    into the office where these machines are located.

    >
    > The CPU utilization figures support jumboframe being used, but just
    > for grins, try a -v 2 option and see what netperf reports for the MSS.
    >

    [root@merge01 src]# ./netperf -v 2 -f M -H 192.168.100.102 -c -C -- -s 256k -S 256k -m 32k
    TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.102 (192.168.100.102) port 0 AF_INET
    Recv   Send    Send                          Utilization       Service Demand
    Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
    Size   Size    Size     Time     Throughput  local    remote   local   remote
    bytes  bytes   bytes    secs.    MBytes  /s  % S      % S      us/KB   us/KB

    512000 512000  32000    10.01       64.90    3.97     4.64     4.773   2.792

    Alignment      Offset         Bytes    Bytes       Sends   Bytes    Recvs
    Local  Remote  Local  Remote  Xfered   Per                 Per
    Send   Recv    Send   Recv             Send (avg)          Recv (avg)
        8       8      0      0 6.81e+08  32001.36   21280   9168.36   74276

    Maximum
    Segment
    Size (bytes)
      8948

    I guess that the maximum segment size of 8948 indicates that jumbo
    frames are in use?
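    (The arithmetic seems to line up with Rick's header accounting above:)

    # MSS = MTU - IP header - TCP header - TCP timestamp option
    echo '9000 - 20 - 20 - 12' | bc     # 8948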





  6. Re: Poor GB Ethernet performance - 2.6.18 kernel

    Sorry to follow up my own post, but I have some new info:

    1) I downloaded the latest e1000 drivers from Intel and installed them. No fix.
    2) I turned on IOATDMA acceleration. No fix.
    3) I modified the e1000 options: modprobe e1000 FlowControl=3,3,3,3 XsumRX=1,1,1,1 RxDescriptors=1024,1024,1024,1024 TxDescriptors=1024,1024,1024,1024 RxIntDelay=0,0,0,0 TxIntDelay=0,0,0,0 InterruptThrottleRate=0 copybreak=0. No fix.
    4) I turned off RX and TX flow control. FIXED! (ethtool command below.)
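    (For the record, item 4 amounts to something like this with ethtool; eth2 is
    the 1950's test port:)

    # disable pause-frame flow control on the test interface
    ethtool -A eth2 rx off tx off
    ethtool -a eth2      # confirm RX: off, TX: off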

    New results are:

    [root@merge01 src]# netperf -H 192.168.100.102 -f M -c -C -- -s 256K -S 256K -m 32K
    TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.102 (192.168.100.102) port 0 AF_INET
    Recv   Send    Send                          Utilization       Service Demand
    Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
    Size   Size    Size     Time     Throughput  local    remote   local   remote
    bytes  bytes   bytes    secs.    MBytes  /s  % S      % S      us/KB   us/KB

    524288 524288  32768    10.01      117.53    5.55     6.65     3.689   2.210

    As you can see, without flow control we have 117.53 MB/sec, which is
    better than I expected! :-)

    Turning on TX flow control only still works fine:

    [root@merge01 src]# ethtool -A eth2 rx off
    [root@merge01 src]# ethtool -a eth2
    Pause parameters for eth2:
    Autonegotiate: on
    RX: off
    TX: on

    [root@merge01 src]# netperf -H 192.168.100.102 -f M -c -C -- -s 256K -S 256K -m 32K
    TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.102 (192.168.100.102) port 0 AF_INET
    Recv   Send    Send                          Utilization       Service Demand
    Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
    Size   Size    Size     Time     Throughput  local    remote   local   remote
    bytes  bytes   bytes    secs.    MBytes  /s  % S      % S      us/KB   us/KB

    524288 524288  32768    10.00      117.50    5.91     6.79     3.933   2.257

    However, with RX flow control on, I get:

    [root@merge01 src]# ethtool -a eth2
    Pause parameters for eth2:
    Autonegotiate: on
    RX: off
    TX: off

    [root@merge01 src]# ethtool -A eth2 rx on
    [root@merge01 src]# ethtool -a eth2
    Pause parameters for eth2:
    Autonegotiate: on
    RX: on
    TX: off

    [root@merge01 src]# netperf -H 192.168.100.102 -f M -c -C -- -s 256K -S 256K -m 32K
    TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.102 (192.168.100.102) port 0 AF_INET
    Recv   Send    Send                          Utilization       Service Demand
    Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
    Size   Size    Size     Time     Throughput  local    remote   local   remote
    bytes  bytes   bytes    secs.    MBytes  /s  % S      % S      us/KB   us/KB

    524288 524288  32768    10.01       64.97    4.52     5.03     5.438   3.026

    Can you (or anyone else) explain this? Why does RX flow control cut
    my performance in half?

    Now I have to deal with the Equallogic people and see what they have
    to say about running without flow control.

    Thanks very much for all of your help. I'm still very interested in
    any other suggestions you might have.


  7. Re: Poor GB Ethernet performance - 2.6.18 kernel

    Forgot one last piece of info -- the beforeafter netstat results (with both
    TX and RX flow control enabled):

    Ip:
    71015 total packets received
    0 with invalid headers
    0 forwarded
    0 incoming packets discarded
    71015 incoming packets delivered
    43915 requests sent out
    0 dropped because of missing route
    0 reassemblies required
    0 packets reassembled ok
    Icmp:
    0 ICMP messages received
    0 input ICMP message failed.
    ICMP input histogram:
    destination unreachable: 0
    redirects: 0
    echo requests: 0
    echo replies: 0
    0 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
    destination unreachable: 0
    echo replies: 0
    Tcp:
    2 active connections openings
    0 passive connection openings
    0 failed connection attempts
    0 connection resets received
    0 connections established
    70857 segments received
    43760 segments send out
    1 segments retransmited
    0 bad segments received.
    0 resets sent
    Udp:
    157 packets received
    0 packets to unknown port received.
    0 packet receive errors
    154 packets sent
    TcpExt:
    0 resets received for embryonic SYN_RECV sockets
    0 ICMP packets dropped because they were out-of-window
    0 TCP sockets finished time wait in fast timer
    0 time wait sockets recycled by time stamp
    39 delayed acks sent
    0 delayed acks further delayed because of locked socket
    Quick ack mode was activated 0 times
    0 packets directly queued to recvmsg prequeue.
    0 packets directly received from backlog
    0 packets directly received from prequeue
    18 packets header predicted
    0 packets header predicted and directly queued to user
    106 acknowledgments not containing data received
    70634 predicted acknowledgments
    1 times recovered from packet loss due to SACK data
    Detected reordering 0 times using FACK
    Detected reordering 0 times using time stamp
    0 congestion windows fully recovered
    0 congestion windows partially recovered using Hoe heuristic
    TCPDSACKUndo: 0
    0 congestion windows recovered after partial ack
    0 TCP data loss events
    0 timeouts after SACK recovery
    0 timeouts in loss state
    1 fast retransmits
    0 forward retransmits
    0 retransmits in slow start
    0 other TCP timeouts
    0 sack retransmits failed
    0 DSACKs sent for old packets
    0 DSACKs received
    0 connections reset due to unexpected data
    0 connections reset due to early user close
    0 connections aborted due to timeout

    Does this show any reason for using RX flow control? I'm not really
    sure what all the statistics mean, but I only see 1 segment
    retransmitted. Would increasing the receive buffers on my e1000 card
    help? Current settings are:

    modprobe e1000 FlowControl=3,3,3,3 XsumRX=1,1,1,1
    RxDescriptors=1024,1024,1024,1024 TxDescriptors=1024,1024,1024,1024
    RxIntDelay=0,0,0,0 TxIntDelay=0,0,0,0 InterruptThrottleRate=0
    copybreak=0

    That means I've got 1024 receive buffers and 1024 transmit buffers per
    port. I think I can go up to 4096. I've turned off all the other
    throttling settings that I could find.
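    (If it would help, it looks like the rings can also be resized with ethtool
    rather than module options; a sketch:)

    ethtool -g eth2                     # show current and maximum RX/TX ring sizes
    ethtool -G eth2 rx 4096 tx 4096     # bump both to the reported maximum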

    Eric Raskin
    eraskin@paslists.com


  8. Re: Poor GB Ethernet performance - 2.6.18 kernel

    eraskin@paslists.com wrote:

    > Maximum
    > Segment
    > Size (bytes)
    > 8948


    > I guess that the maximum segment size of 8948 indicates that jumbo
    > frames are in use?


    Correct.

    rick jones
    --
    The computing industry isn't as much a game of "Follow The Leader" as
    it is one of "Ring Around the Rosy" or perhaps "Duck Duck Goose."
    - Rick Jones
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  9. Re: Poor GB Ethernet performance - 2.6.18 kernel

    eraskin@paslists.com wrote:
    > Can you (or anyone else) explain this? Why does RX flow control cut
    > my performance in half?


    I can only guess that it triggers some pause frames and takes "too
    long" to re-enable. The folks over in comp.dcom.lans.ethernet would
    be the better folks to ask.
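    (If the driver exposes them, the pause-frame counters in ethtool -S might
    show whether that is happening; counter names vary by driver, so this is
    just a sketch:)

    ethtool -S eth2 | grep -i -E 'pause|flow_control|xon|xoff'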

    rick jones
    --
    The computing industry isn't as much a game of "Follow The Leader" as
    it is one of "Ring Around the Rosy" or perhaps "Duck Duck Goose."
    - Rick Jones
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...

  10. Re: Poor GB Ethernet performance - 2.6.18 kernel

    eraskin@paslists.com wrote:

    [stats snipped]
    > Does this show any reason for using RX flow control? I'm not really
    > sure what all the statistics mean, but I only see 1 segment
    > retransmitted. Would increasing the receive buffers on my e1000
    > card help? Current settings are:


    > modprobe e1000 FlowControl=3,3,3,3 XsumRX=1,1,1,1
    > RxDescriptors=1024,1024,1024,1024 TxDescriptors=1024,1024,1024,1024
    > RxIntDelay=0,0,0,0 TxIntDelay=0,0,0,0 InterruptThrottleRate=0
    > copybreak=0


    Why are you disabling the interrupt coalescing by setting
    InterruptThrottleRate to 0 etc., and why are you doing that on only one
    interface? Generally, if one is concerned about bulk throughput, one
    tweaks the interrupt throttle rate up, not off. It is turned off when one
    is concerned about minimizing latency:

    ftp://ftp.cup.hp.com/dist/networking...cy_vs_tput.txt
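    (i.e. something along these lines, where 8000 is only an illustrative
    value, not a recommendation:)

    # reload e1000 with moderate interrupt coalescing rather than none
    # (do this from the console, since it takes the interfaces down)
    modprobe -r e1000
    modprobe e1000 InterruptThrottleRate=8000,8000,8000,8000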

    > That means I've got 1024 receive buffers and 1024 transmit buffers
    > per port. I think I can go up to 4096. I've turned off all the
    > other throttling settings that I could find.


    If the ethtool stats show that the loss which triggered the
    retransmission was on the card (perhaps the remote card) then
    increasing the queues could help. If it was lost elsewhere then it
    would not.
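    (Something like this on both sender and receiver should show whether the
    card itself dropped anything; counter names vary with the driver:)

    ethtool -S eth2 | grep -i -E 'drop|miss|no_buffer|error'
    netstat -s | grep -i retrans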

    rick jones
    --
    Process shall set you free from the need for rational thought.
    these opinions are mine, all mine; HP might not want them anyway...
    feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
