nettest fails, network works though, is nettest broken? - SUN

  1. nettest fails, network works though, is nettest broken?

    Our V880 locked up for no apparent reason a couple of days ago, where
    locked up means it would not respond on the network or the serial
    console and had to be reset. There was nothing in the system logs to
    indicate a problem, nor were there any power events logged on the UPS
    (or by other machines in the same room), prtdiag showed the hardware it
    tested to be ok, and SMART analysis of the disks showed no issues.

    So today the vts diagnostics are being run. Everything passed except
    nettest, which said this for eri0:

    05/29/08 10:27:36 gec SunVTS5.1ps13: VTSID 6002 nettest.
    ERROR eri0(/pci@9,700000/network@1,1): "No ICMP echo reply from
    131.215.XX.YY."
    Probable_Cause(s):
    (1)bad network board
    (2)system load too heavy
    (3)no cable connection
    (4)target machine too busy
    Recommended_Actions:
    (1)replace network board
    (2)reduce system load or increase timeout time
    (3)check cable connection
    (4)reduce target machine load

    We can rule out causes 2, 3, and 4. If this is a bad network board, it
    seems to fail only for nettest. Despite what nettest says, the network
    seems to be working just fine on the V880 (for instance, I ran vts
    remotely over that very network). Nettest chose XX.YY to be one less
    than the address of the V880, and that happened to be a printer. That
    printer pings just fine, not only from other machines but also from the
    V880:

    % ping -I 1 131.215.XX.YY

    This does not drop a single packet in 100 tries. Thinking that nettest
    might send the packets faster than 1 per second, I tried it from a
    Linux host using

    ping -i .01 131.215.XX.YY

    and the printer was able to return pings that fast. (Ping on Solaris
    did not allow this test.)

    netstat -i shows a few output errors (Oerrs) on eri0:

    Name Mtu  Net/Dest Address   Ipkts    Ierrs Opkts    Oerrs Collis Queue
    lo0  8232 loopback localhost 14562131 0     14562131 0     0      0
    eri0 1500 gec      gec       785446   0     723980   3     0      0
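
    To put those counters in proportion, here is a minimal sketch that
    turns `netstat -i` output into per-interface error percentages. The
    function name `error_rates` is made up for illustration, and the
    parser assumes each interface's counters sit on a single line in the
    column order Name, Mtu, Net/Dest, Address, Ipkts, Ierrs, Opkts, Oerrs,
    Collis, Queue.

```python
def error_rates(netstat_text):
    """Return {interface: (input_error_pct, output_error_pct)}.

    Hypothetical helper; assumes one interface per line with the
    Solaris `netstat -i` column order described above.
    """
    rates = {}
    for line in netstat_text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 10:
            continue  # wrapped or incomplete line, ignore it
        name = fields[0]
        ipkts, ierrs, opkts, oerrs = (int(f) for f in fields[4:8])
        in_pct = 100.0 * ierrs / ipkts if ipkts else 0.0
        out_pct = 100.0 * oerrs / opkts if opkts else 0.0
        rates[name] = (in_pct, out_pct)
    return rates


# Counters copied from the netstat -i output in this post.
SAMPLE = """\
Name Mtu  Net/Dest Address   Ipkts    Ierrs Opkts    Oerrs Collis Queue
lo0  8232 loopback localhost 14562131 0     14562131 0     0      0
eri0 1500 gec      gec       785446   0     723980   3     0      0
"""

for name, (in_pct, out_pct) in error_rates(SAMPLE).items():
    print(f"{name}: input errors {in_pct:.5f}%, output errors {out_pct:.5f}%")
```

    With these numbers, 3 output errors in 723980 packets on eri0 comes
    to roughly 0.0004%, which by itself is weak evidence of a failing
    interface.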

    There were also a couple of funky messages in /var/adm/messages after
    the reboot yesterday that I don't recall seeing before:

    May 28 11:27:31 gec inetd[187]: [ID 965992 daemon.error] 100229/1-2/tcp: unknown service
    May 28 11:27:31 gec inetd[187]: [ID 965992 daemon.error] 100230/1/tcp: unknown service
    May 28 11:28:33 gec inetd[187]: [ID 965992 daemon.error] 100229/1-2/tcp: unknown service
    May 28 11:28:33 gec inetd[187]: [ID 965992 daemon.error] 100230/1/tcp: unknown service

    Is nettest right and the network board is going, or is nettest itself
    messed up? I don't have a network loopback connector, so cannot run
    netlbtest.

    Opinions?

    Thanks,

    David Mathog





  2. Re: nettest fails, network works though, is nettest broken?

    David Mathog wrote:
    > Is nettest right and the network board is going, or is nettest itself
    > messed up? I don't have a network loopback connector, so cannot run
    > netlbtest.


    I made a network loopback connector (two actually, since I had to cut
    up a patch cable to do it) and then ran netlbtest for a long time,
    during which it logged no errors. So I went back to the regular nettest
    and configured it to target the local network switch instead of the
    printer it had picked by itself. Ran nettest 25 times, no errors. My
    best guess is that even though the printer can normally handle a high
    rate of pings, perhaps when the errors occurred it was accepting a print
    job, or performing some other task, and so could not keep up with
    nettest.
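
    The same comparison between targets can be repeated outside of
    SunVTS. The sketch below is only an illustration of the idea, not how
    nettest works internally: `ping_once` and `count_failures` are
    hypothetical names, and the ping command line assumes a Linux-style
    `ping -c 1 -W <seconds>` (Solaris ping takes different arguments).
    Each round probes every target once, so failures confined to one
    target point at that target, while failures across all targets point
    at the local interface or cabling.

```python
import subprocess


def ping_once(host, timeout_s=2):
    """Hypothetical probe: True if a single echo request is answered.

    Assumes a Linux-style `ping -c 1 -W <seconds>`; Solaris ping takes
    different arguments, so adjust the command for the host in use.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def count_failures(probe, targets, trials=25):
    """Probe every target once per round and tally failures per target."""
    failures = {target: 0 for target in targets}
    for _ in range(trials):
        for target in targets:
            if not probe(target):
                failures[target] += 1
    return failures


# Real use would look like this (addresses are placeholders):
#   count_failures(ping_once, ["<printer address>", "<switch address>"])
```

    Passing the probe in as a parameter keeps the tallying logic
    independent of the ping command, so a different probe can be swapped
    in on a host where these ping options are not available.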

    Also, for future reference, in order to get the serial console to work
    properly with "sunvts -t" through TeraTerm (or probably any other VT100
    emulator) it was necessary to first do:

    export TERM=vt100

    otherwise TERM defaulted to "sun" and "sunvts -t" resulted in an
    unusable mess of an interface.

    Regards,

    David Mathog

  3. Re: nettest fails, network works though, is nettest broken?

    "David Mathog" wrote in message
    news:g1mqp5$1ee$1@naig.caltech.edu...
    > Our V880 locked up for no apparent reason a couple of days ago, where
    > locked up means it would not respond on the network or the serial
    > console and had to be reset. There was nothing in the system logs to
    > indicate a problem, nor were there any power events logged on the UPS
    > (or by other machines in the same room), prtdiag showed the hardware it
    > tested to be ok, and SMART analysis of the disks showed no issues.


    Probably not what you want to hear, but it could be the motherboard is bad.
    Especially if when it locks up there is no way to send a break and collect a
    core file.

    All the CPU/Memory boards plug into this and the ethernet is onboard.

    Depending on the P/N or rev level of your motherboard there are some known
    issues with the onboard ethernet.
    The info is here:

    http://www.sunsolve.sun.com/search/d...assetkey=I1195

    although access requires a SunSolve account.

    Hopefully you have a Sun service contract.
    I have no idea what such a part is worth, but even on eBay people are asking
    a lot.

    Trinean



  4. Re: nettest fails, network works though, is nettest broken?

    Trinean wrote:
    > "David Mathog" wrote in message
    > news:g1mqp5$1ee$1@naig.caltech.edu...
    >> Our V880 locked up for no apparent reason a couple of days ago, where
    >> locked up means it would not respond on the network or the serial
    >> console and had to be reset. There was nothing in the system logs to
    >> indicate a problem, nor were there any power events logged on the UPS
    >> (or by other machines in the same room), prtdiag showed the hardware it
    >> tested to be ok, and SMART analysis of the disks showed no issues.

    >
    > Probably not what you want to hear, but it could be the motherboard is bad.
    > Especially if when it locks up there is no way to send a break and collect a
    > core file.


    Yeah, I wasn't happy that it wouldn't respond to a break. If the system
    does it again fairly soon I'll have to put in a service call. In the
    meantime the ethernet (re)tested clean.

    > Depending on the P/N or rev level of your motherboard there are some known
    > issues with the onboard ethernet.
    > The info is here:
    >
    > http://www.sunsolve.sun.com/search/d...assetkey=I1195


    Using an account linked to a service contract I still couldn't follow
    that link. Is the link current?

    Thanks,

    David Mathog
