GBit speed problem - TG3 on HP DL380 G3 - Networking

  1. GBit speed problem - TG3 on HP DL380 G3

    Hi,

    First I will summarise the environment, then describe the problem.
    I apologize if this is the wrong place to ask; if so, please direct
    me to the correct one.

    == The environment ==

    I have an HP DL380 G3 server with dual Gbit Ethernet. Here is the
    output of lspci:

    root@koyoko:~# lspci
    00:00.0 Host bridge: Broadcom CMIC-WS Host Bridge (GC-LE chipset) (rev 13)
    00:00.1 Host bridge: Broadcom CMIC-WS Host Bridge (GC-LE chipset)
    00:00.2 Host bridge: Broadcom CMIC-LE
    00:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
    00:04.0 System peripheral: Compaq Computer Corporation Integrated Lights Out Controller (rev 01)
    00:04.2 System peripheral: Compaq Computer Corporation Integrated Lights Out Processor (rev 01)
    00:0f.0 ISA bridge: Broadcom CSB5 South Bridge (rev 93)
    00:0f.1 IDE interface: Broadcom CSB5 IDE Controller (rev 93)
    00:0f.2 USB Controller: Broadcom OSB4/CSB5 OHCI USB Controller (rev 05)
    00:0f.3 Host bridge: Broadcom CSB5 LPC bridge
    00:10.0 Host bridge: Broadcom CIOB-X2 PCI-X I/O Bridge (rev 05)
    00:10.2 Host bridge: Broadcom CIOB-X2 PCI-X I/O Bridge (rev 05)
    00:11.0 Host bridge: Broadcom CIOB-X2 PCI-X I/O Bridge (rev 03)
    00:11.2 Host bridge: Broadcom CIOB-X2 PCI-X I/O Bridge (rev 03)
    01:03.0 RAID bus controller: Compaq Computer Corporation Smart Array 5i/532 (rev 01)
    02:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
    02:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
    03:01.0 RAID bus controller: Silicon Image, Inc. SiI 3124 PCI-X Serial ATA Controller (rev 01)
    06:02.0 SCSI storage controller: Adaptec AHA-2940U2/U2W
    06:1e.0 PCI Hot-plug controller: Compaq Computer Corporation PCI Hotplug Controller (rev 14)

    It has two "Broadcom Corporation NetXtreme BCM5703X" as you can see.
    The two ports are identically set up in "interfaces" configuration
    file:

    root@koyoko:~# cat /etc/network/interfaces
    # This file describes the network interfaces available on your system
    # and how to activate them. For more information, see interfaces(5).

    # The loopback network interface
    auto lo
    iface lo inet loopback

    # The primary network interface
    auto eth0
    iface eth0 inet dhcp

    auto eth1
    iface eth1 inet dhcp

    We are connecting this to a Netgear GS108 8-port Gbit switch:

    http://www.netgear.com/Products/Swit...hes/GS108.aspx

    We have used a variety of cables.

    Versions:

    root@koyoko:~# uname -a
    Linux koyoko 2.6.22-14-server #1 SMP Sun Oct 14 23:34:23 GMT 2007 i686 GNU/Linux
    root@koyoko:~# ethtool -i eth0
    driver: tg3
    version: 3.77
    firmware-version: 5703-v2.21a
    bus-info: 0000:02:01.0


    == The problem ==

    At first, both ports would connect at Gbit speed almost
    instantaneously, and I was using only eth0. One day, however, a file
    transfer was going very slowly, and I noticed the port was running
    at 100 Mbit/s. I tried a lot of things but nothing worked. ethtool
    showed that the port was set to autonegotiate 10, 100 and
    1000 Mbit/s, but it would settle at 100 Mbit/s.
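
    A minimal sketch of how the negotiated settings can be pulled out of
    the ethtool output for a quick check. It assumes the usual "Speed:",
    "Duplex:" and "Auto-negotiation:" lines that ethtool prints; the
    helper name link_summary is made up for illustration.

```shell
#!/bin/sh
# Summarise the negotiated link settings from `ethtool <iface>` output on stdin.
# Assumption: the output contains "Speed:", "Duplex:" and "Auto-negotiation:" lines.
link_summary() {
    awk -F': *' '
        /^[ \t]*Speed:/            { speed = $2 }
        /^[ \t]*Duplex:/           { duplex = $2 }
        /^[ \t]*Auto-negotiation:/ { autoneg = $2 }
        END { printf "speed=%s duplex=%s autoneg=%s\n", speed, duplex, autoneg }
    '
}

# Usage on the real machine (needs root):
#   ethtool eth0 | link_summary
#   ethtool eth1 | link_summary
```

    Running it against both ports side by side makes it easy to see which
    one has fallen back to a lower speed.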

    After a couple of hours of mucking about, I reset the machine and
    pulled out all the cables: power, Ethernet, KVM and SCSI. After
    that, the machine negotiated correctly at 1000 Mbit/s. I couldn't
    understand why, but put it down to something erratic.

    Now recently it has occurred again, and this time, cold-starting the
    machine won't fix the problem.

    Throughout all these incidents, when eth0 is plugged in, it takes
    about 20-30 seconds to negotiate 100 Mbit/s, whereas eth1 takes a
    second at most to negotiate a 1 Gbit/s connection. This output from
    dmesg should show the time difference between eth0 and eth1; I just
    disconnected the cable and plugged it back in. The timing is
    consistent.

    [ 5001.617724] tg3: eth0: Link is down.
    [ 5019.558241] tg3: eth0: Link is up at 100 Mbps, full duplex.
    [ 5019.558249] tg3: eth0: Flow control is on for TX and on for RX.
    [ 5038.595038] tg3: eth1: Link is down.
    [ 5041.375736] tg3: eth1: Link is up at 1000 Mbps, full duplex.
    [ 5041.375744] tg3: eth1: Flow control is on for TX and on for RX.
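
    The gap between "Link is down" and "Link is up" can be measured
    directly from the log. The following is a rough sketch, assuming the
    dmesg line format shown above; link_delays is a hypothetical helper
    name, and the measured gap includes the time the cable was physically
    unplugged.

```shell
#!/bin/sh
# Report how long each tg3 link took to come back up, from dmesg text on stdin.
# Assumption: lines look like "[ seconds] tg3: ethN: Link is down/up ...".
link_delays() {
    awk '
        /tg3: eth[0-9]+: Link is (down|up)/ {
            # Timestamp: the text between "[" and "]".
            ts = $0
            sub(/^ *\[ */, "", ts)
            sub(/\].*$/, "", ts)

            # Interface name, e.g. "eth0".
            match($0, /eth[0-9]+/)
            iface = substr($0, RSTART, RLENGTH)

            if ($0 ~ /Link is down/) {
                # Remember when this interface went down.
                down[iface] = ts + 0
            } else if (iface in down) {
                # Link came back: print the elapsed time and negotiated speed.
                match($0, /[0-9]+ Mbps/)
                speed = substr($0, RSTART, RLENGTH)
                printf "%s: up after %.1f s at %s\n", iface, ts - down[iface], speed
                delete down[iface]
            }
        }
    '
}

# Usage on the real machine:
#   dmesg | link_delays
```

    For the six log lines above this gives about 17.9 s for eth0 at
    100 Mbps and about 2.8 s for eth1 at 1000 Mbps.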

    Using ethtool, I dumped the EEPROM data. The data is different for
    eth0 and eth1. Could this be an issue? Could the data for eth0 be
    corrupt or configured incorrectly? The register dumps were also
    different. I am not exactly sure what these values correspond to, so
    if someone could elaborate, that would be much appreciated.
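
    One way to quantify how much the two EEPROM images differ is to save
    raw dumps and compare them byte by byte. This is only a sketch: the
    file names and the helper eeprom_diff_count are made up for
    illustration. Some difference is to be expected, since each port's
    EEPROM normally holds its own MAC address, which is also a reason to
    be wary of copying eth1's image onto eth0.

```shell
#!/bin/sh
# Count the bytes that differ between two saved raw EEPROM dumps.
# cmp -l prints one line per differing byte; wc -l counts those lines.
eeprom_diff_count() {
    cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# Usage on the real machine (needs root; file names are hypothetical):
#   ethtool -e eth0 raw on > eth0.eeprom.bin
#   ethtool -e eth1 raw on > eth1.eeprom.bin
#   eeprom_diff_count eth0.eeprom.bin eth1.eeprom.bin
#   cmp -l eth0.eeprom.bin eth1.eeprom.bin | head   # offsets of the first differences
```

    A handful of differing bytes would be consistent with per-port
    identity data; large differing regions would be more suspicious.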

    Is there some way to reset this device? (Has anyone had experience
    with HP DL380 servers?) I have also emailed the guy I bought it from
    to ask for advice.

    I have changed the network settings to use eth1 for now. It is
    currently running at Gbit speed and appears to be working fine. I'll
    keep an eye on it to see if any problems develop.

    I wonder whether this is a hardware problem, or if there are some
    options I can change. Is it possible (or desirable) to load the eeprom
    data from eth1 to eth0?

    (Also, I must note, I am going away on holiday for about 1.5 weeks, so
    I might not reply immediately, although I will still try to get online
    during that time.)

    I'm really stuck at this point. Is there any way I can debug this
    issue? Increase log level or something like that?

    Thanks to any who can help,
    Samuel


  2. Re: GBit speed problem - TG3 on HP DL380 G3

    Gigabit is fussy about cables. Are you sure that your cabling is up to
    the standard? (cable quality, length of run, no kinks, proper
    terminations, and so on)

    I went through this when I upgraded my home to gigabit.

    i

  3. Re: GBit speed problem - TG3 on HP DL380 G3

    On Dec 29, 4:22 am, Ignoramus14384 wrote:
    > Gigabit is fussy about cables. Are you sure that your cabling is up to
    > the standard? (cable quality, length of run, no kinks, proper
    > terminations, and so on)
    >
    > I went through this when I upgraded my home to gigabit.
    >
    > i


    Exactly the same cable is going to eth1, which is running at Gbit
    speed, and the same cable was running eth0 at Gbit speed beforehand.

    I will test using some different cables today.

    Thanks,
    Samuel

  4. Re: GBit speed problem - TG3 on HP DL380 G3

    On 2007-12-28, space.ship.traveller@gmail.com wrote:
    > On Dec 29, 4:22 am, Ignoramus14384 wrote:
    >> Gigabit is fussy about cables. Are you sure that your cabling is up to
    >> the standard? (cable quality, length of run, no kinks, proper
    >> terminations, and so on)
    >>
    >> I went through this when I upgraded my home to gigabit.
    >>
    >> i
    >
    > Exactly the same cable is going to eth1, which is running at Gbit
    > speed, and the same cable was running eth0 at Gbit speed beforehand.
    >
    > I will test using some different cables today.

    Try logically eliminating possibilities.

    i

  5. Re: GBit speed problem - TG3 on HP DL380 G3

    On Dec 29 2007, 2:14 pm, Ignoramus14384 wrote:
    > On 2007-12-28, space.ship.travel...@gmail.com wrote:
    >
    > > On Dec 29, 4:22 am, Ignoramus14384 wrote:
    > >> Gigabit is fussy about cables. Are you sure that your cabling is up to
    > >> the standard? (cable quality, length of run, no kinks, proper
    > >> terminations, and so on)
    > >>
    > >> I went through this when I upgraded my home to gigabit.
    > >>
    > >> i
    > >
    > > Exactly the same cable is going to eth1, which is running at Gbit
    > > speed, and the same cable was running eth0 at Gbit speed beforehand.
    > >
    > > I will test using some different cables today.
    >
    > Try logically eliminating possibilities.
    >
    > i


    Hi

    I am back from holiday.

    I've tried a lot of different cables. Even the cable that works in
    eth1 does not work in eth0.

    Today, I tried cleaning the eth0 port with nail polish remover. I
    also used a high-density sandpaper to file the inside of the
    connector. Neither of these appeared to help.

    So the problem remains. The cable, the switch and the connector are
    not at fault, so it must be software, the driver, or internal
    hardware.

    So... I tried setting the following

    ethtool -s eth0 speed 10 autoneg off

    and it connected at 10 Mbit/s; then I tried

    ethtool -s eth0 speed 100 autoneg off

    and it connected at 100Mbits; then

    ethtool -s eth0 speed 1000 autoneg off

    and I got an "invalid argument!" error.

    I noticed that it stays on the speed previously set when autoneg is
    off, so I ran:

    ethtool -s eth0 speed 10 autoneg off
    ethtool -s eth0 speed 1000 autoneg on

    and plugged it into a 100 Mbit/s hub. It negotiated up to 100 Mbit/s
    correctly (from 10 Mbit/s, which is what "ethtool eth0" had shown as
    the current speed).

    Then I tried the same commands again

    ethtool -s eth0 speed 10 autoneg off
    ethtool -s eth0 speed 1000 autoneg on

    and plugged it into the Gbit switch, and it worked right away!

    I can only imagine that I have reset something internally in the
    network chip. I'm not exactly sure why this sequence of commands
    worked, but I suspect that having it set to 10 Mbit/s may have
    forced it to renegotiate.

    The output of ethtool looks exactly the same as it did before, except
    now it is running at 1000Mbits.

    I am not sure why this fixed it, and I'm not sure if it is a permanent
    fix.
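
    The sequence that worked can be wrapped in a small script so it is
    easy to repeat if the port falls back to 100 Mbit/s again. This is a
    sketch under assumptions, not a confirmed fix: the function names and
    the DRY_RUN variable are made up, and by default the script only
    prints the commands. The "invalid argument" error is probably
    expected, since as far as I know 1000BASE-T relies on autonegotiation
    and cannot be forced with autoneg off. ethtool also has a -r option
    to restart autonegotiation, which may be worth trying, although it
    was not tested here.

```shell
#!/bin/sh
# Print each command (DRY_RUN=1, the default) or actually run it (DRY_RUN=0).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Repeat the sequence from this post for one interface:
# force 10 Mbit/s with autonegotiation off, then turn autonegotiation back on.
renegotiate() {
    iface=$1
    run ethtool -s "$iface" speed 10 autoneg off
    run ethtool -s "$iface" speed 1000 autoneg on
}

# Usage:
#   renegotiate eth0              # dry run, prints the two commands
#   DRY_RUN=0 renegotiate eth0    # really run them (needs root)
```

    Checking dmesg or "ethtool eth0" afterwards shows whether the link
    came back at 1000 Mbit/s.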

    Regards,
    Samuel
