Network interface trouble - BSD

Thread: Network interface trouble

  1. Network interface trouble

    I have a FreeBSD box which was running on 6.1. I just upgraded to 6.2 in
    hope of fixing this problem, but without any success.

    The problem is that the LAN interface randomly goes down, then up. This
    happens a few times, then the whole machine freezes.

    This is the output in /var/log/messages. The first UP is me bringing it
    up by hand - it stays up a whole six seconds before starting to flap.

    Jan 20 19:34:06 mariner kernel: xl0: link state changed to UP
    Jan 20 19:34:12 mariner kernel: xl0: link state changed to DOWN
    Jan 20 19:34:14 mariner kernel: xl0: link state changed to UP
    Jan 20 19:34:40 mariner kernel: xl0: link state changed to DOWN
    Jan 20 19:34:42 mariner kernel: xl0: link state changed to UP
    Jan 20 19:35:23 mariner kernel: xl0: link state changed to DOWN
    Jan 20 19:35:24 mariner kernel: xl0: link state changed to UP
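
    To put numbers on the flapping, the log lines above can be run through
    a small script. This is only a rough sketch: the year is an assumption
    (syslog does not record one), and the function names are made up for
    illustration.

```python
import re
from datetime import datetime

# The seven log lines quoted in this post.
LOG = """\
Jan 20 19:34:06 mariner kernel: xl0: link state changed to UP
Jan 20 19:34:12 mariner kernel: xl0: link state changed to DOWN
Jan 20 19:34:14 mariner kernel: xl0: link state changed to UP
Jan 20 19:34:40 mariner kernel: xl0: link state changed to DOWN
Jan 20 19:34:42 mariner kernel: xl0: link state changed to UP
Jan 20 19:35:23 mariner kernel: xl0: link state changed to DOWN
Jan 20 19:35:24 mariner kernel: xl0: link state changed to UP
"""

LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"(?P<ifname>\w+): link state changed to (?P<state>UP|DOWN)$"
)


def link_events(text, ifname, year=2007):
    """Return (timestamp, state) pairs for one interface.

    The year is an assumption, since syslog timestamps omit it.
    """
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("ifname") == ifname:
            when = datetime.strptime(
                f"{year} {match.group('stamp')}", "%Y %b %d %H:%M:%S"
            )
            events.append((when, match.group("state")))
    return events


def up_durations(events):
    """Seconds the link stayed up before each DOWN event."""
    durations = []
    last_up = None
    for when, state in events:
        if state == "UP":
            last_up = when
        elif last_up is not None:
            durations.append(int((when - last_up).total_seconds()))
            last_up = None
    return durations


durations = up_durations(link_events(LOG, "xl0"))
print(durations)
```

    For the excerpt above, the link stayed up for 6, 26 and 41 seconds
    before each drop, and came back within a second or two every time.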

    This goes on for a while, then the whole thing just freezes up. When I
    force a reboot, there is nothing in the log file to indicate what
    happened. There is not even access from the local console during the freeze.

    Note that bge0, the WAN interface, does not appear here - it's just xl0.
    I am suspecting hardware trouble at this point, but it seems odd that a
    NIC would just die on me like this.

    Everything is just where it was last week, when it was working fine.
    There have been no upgrades, no new cables, no nothing. The machine is
    on a UPS, so even power fluctuations shouldn't affect it, and it has no
    keyboard attached normally, so humans can't mess it up.

    Any suggestions? Or should I just shut up and go buy a new NIC?


    --
    It can be difficult to translate into iptables the artistic intent of a
    pf rule that says "pass out quick on $cheap_gin".

  2. Re: Network interface trouble

    double_dub wrote:

    [snip]


    I have an old gateway/firewall/server box that has two xl cards in it. It
    was recently updated from 6.1-Release to 6.2-Release and I've not noticed
    any problems with my xl's. I was using dialup for many years, so for the
    longest time I only had one xl in the box. A couple of months ago I got DSL
    service and added the second xl NIC. Been fine all this time.

    While it is very easy to suspect a defective NIC in this case, it is
    possible that even if you put a different xl in you may have the same
    problem. This would probably be in the area of something to do with a bad
    interaction between the bge and the xl. (I'm also left wondering why the
    bge is used on the WAN; some have had connectivity problems with their ISP
    when trying to use Gigabit cards. Me, I'd have it on the LAN side of things
    and put the xl on the WAN.) Look for something like sharing of interrupts,
    vmstat -i for interrupt storms, etc. Maybe even try putting the xl in a
    different PCI slot. A lot of motherboards these days share IRQs on one or
    two of the slots and rely on ACPI to sort things out. Look at your mobo
    manual and see if the xl is in a slot that's sharing an IRQ with the bge.
    If it is move it to a slot that does not share.
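
    If you want to check the vmstat -i output mechanically, something like
    the following could do it. Treat it as a rough sketch: the sample text
    is made up rather than taken from the machine in question, the
    1000/sec threshold is an arbitrary guess, and I am assuming that
    devices sharing an IRQ are listed together on one line (with a
    trailing "+" when the list is cut short).

```python
# Made-up sample in the general shape of `vmstat -i` output.
SAMPLE_VMSTAT_I = """\
interrupt                          total       rate
irq1: atkbd0                         310          0
irq14: ata0                       584392         12
irq16: bge0 xl0                 98234511       2046
cpu0: timer                     95532840       1990
Total                          194352053       4048
"""


def parse_irq_lines(text):
    """Return (irq, devices, total, rate) for each 'irqN:' line."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith("irq"):
            continue  # header, timer and Total lines are skipped
        if not (fields[-2].isdigit() and fields[-1].isdigit()):
            continue
        irq = fields[0].rstrip(":")
        devices = fields[1:-2]
        rows.append((irq, devices, int(fields[-2]), int(fields[-1])))
    return rows


def flag_suspects(rows, rate_limit=1000):
    """Flag IRQs that look shared or unusually busy.

    rate_limit is an arbitrary assumption, not a documented threshold.
    """
    suspects = []
    for irq, devices, total, rate in rows:
        reasons = []
        # Assumption: several names, or a trailing "+", means sharing.
        if len(devices) > 1 or any(d.endswith("+") for d in devices):
            reasons.append("shared")
        if rate > rate_limit:
            reasons.append("high rate")
        if reasons:
            suspects.append((irq, devices, reasons))
    return suspects


suspects = flag_suspects(parse_irq_lines(SAMPLE_VMSTAT_I))
for irq, devices, reasons in suspects:
    print(irq, " ".join(devices), ", ".join(reasons))
```

    On a real box you would paste in the actual vmstat -i output instead
    of the sample, and pick a threshold that makes sense for the traffic
    the machine normally carries.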

    Since mine is OK and the main difference between our systems is hardware I'd
    suspect an interaction or incompatibility of some sort. If you have another
    xl NIC to plug in, give it a try. If it magically becomes "fixed", chalk it
    up to the first one being defective. My "gut" feeling here is it's
    something else. Also consider whatever the xl is connected to in addition.

    -Jason





  3. Re: Network interface trouble

    [still me, but posting from Gmail account]

    OK, thanks for the tip. I will try swapping duties for the xl and the
    bge around, but this machine has been running for six months exactly
    as it stands now with no problem. I can't move the card as this machine
    only has the one PCI slot - the bge interface is on-board. I tried
    changing the cable, with no effect. I can't easily change the switch,
    but the other three machines on the switch (Win2k, WinXP, and Debian)
    have no problem with it, so I am assuming that's not the cause of the
    problem.

    The reason I thought it might be software is that the interface seems
    to be fine until I try to put data through it, then it starts flapping.
    I haven't knowingly changed anything, but freebsd-update might have
    changed something which I did not notice was related.

    I don't see any shared interrupts in the vmstat -i output, but again, I
    would have been surprised if one had suddenly appeared after six months
    of uninterrupted running and without a reboot being involved.

    Any other suggestions gratefully accepted...

    /me exits grumbling and muttering


  4. Re: Network interface trouble

    double_dub wrote:
    [snip]


    After reading all the existing posts, it looks like you've already done
    the sensible diagnostics. Recently (within the last 30 days), I've found
    two computers with bad RAM. After replacing the RAM, both started
    working properly again.

    Alternatively, although it's a long shot, the PnP subsystem of the PC's
    BIOS may be set to yes (Plug and Play OS installed = Yes). Setting this
    to NO when using any NON-Microsoft OS, plus MS NT v4.0, forces the BIOS
    to assign IRQs and all other resources for the devices in the computer,
    instead of letting the OS do it.

    Others have had success letting Unix variants assign the IRQs, but
    AFAIK, they aren't defined as a Plug and Play OS.

    So, those are my suggestions: test for bad memory and switch PnP OS off.

    HTH, let us know.

    TJ

  5. Re: Network interface trouble

    Tim Judd wrote:
    > So, those are my suggestions: test for bad memory and switch PnP OS off.


    Thanks for the suggestions, but no luck. I replaced the NIC and the
    switch, and it just crashed on me again. :-(

    This BIOS does not seem to have a PnP option. I can change IRQs by
    hand, but various different IRQs all change in sync, so if there is a
    conflict in there it will stay even if I change them.

    This leaves the RAM. Unfortunately I don't have any of the appropriate
    RAM to hand, but then again, this machine only has 256 MB, so another
    256 wouldn't hurt it. This will have to wait until Thursday though, as
    I am on the road between now and then.

    Dagnabbit.


  6. Re: Network interface trouble

    riotnrrd wrote:
    > This leaves the RAM.


    Also, the network interface is not flapping any more. Perhaps I had two
    unrelated failures... Grrr!


  7. Re: Network interface trouble

    Arrrgh. RAM replaced, no joy - BSD died before finishing the
    background FS checks.

    Annoyingly, I am posting this from the very same box, but booted with
    Knoppix. I am exercising the HD to see if that is the issue, but it
    seems to be fine with it.

    The one thing I would like to do would be to get some config files
    (pf, smbd, and the like) and other data off the thing before I wipe it
    and reinstall. However, Knoppix happily recognizes ad0s1a, where /
    lives, as hda1, but I want to get at /usr, which is on ad0s1e. When I
    try to mount that, it seems to go through, but what I get is a second
    mount of / instead of /usr.

    I know it's slightly off-topic, but can anyone suggest anything? Also,
    are there any forensics I can perform, as this is now looking more
    like a software problem with BSD?
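
    For the record, one thing that might be worth trying from Knoppix is
    an explicit read-only UFS2 mount. The sketch below only builds the
    command line and does not run anything. The device node and mount
    point are hypothetical: on Linux kernels built with BSD disklabel
    support, the partitions inside the slice usually show up as extra
    device nodes, so /proc/partitions would need checking to find the one
    that corresponds to ad0s1e. I am also assuming the filesystem is UFS2.

```python
def ufs_mount_command(device, mountpoint, ufstype="ufs2"):
    """Build (but do not run) a read-only Linux mount command for UFS.

    ufstype="ufs2" assumes the filesystem was created as UFS2.
    """
    options = f"ro,ufstype={ufstype}"
    return ["mount", "-t", "ufs", "-o", options, device, mountpoint]


# Hypothetical device node and mount point, for illustration only.
cmd = ufs_mount_command("/dev/hda8", "/mnt/usr")
print(" ".join(cmd))
```

    Mounting read-only keeps the Linux UFS driver from writing to a
    filesystem that may already be in a doubtful state.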


  8. Re: Network interface trouble

    Arrrgh. I replaced the RAM, and the dratted thing crashed before
    finishing its file system checks. Uptime of less than five minutes
    from a BSD box?

    Annoyingly, I am typing this from the very same device, but booted
    with a Knoppix CD. I have been exercising the HD as much as possible,
    as I was now wondering whether it might be the culprit, but it seems
    not. The machine has been up for two and a half hours and counting.

    I think a reinstall is in order. This will be made more complex by the
    machine with the CD burner having somehow eaten its MBR and needing
    keyboard input during POST. However, it refuses to recognize USB
    keyboards, and the PS/2 ports are dead, so this could be...
    /interesting/.

    Are there any useful forensics I can do while I have this magically
    unstable installation of BSD? Offer void where prohibited, limited
    time only, etc.


  9. Re: Network interface trouble

    riotnrrd wrote:

    > Arrrgh. RAM replaced, no joy - BSD died before finishing the
    > background FS checks.


    Might want to take a look at the hardware support list and try and figure
    out if FreeBSD just doesn't like what you are trying to run it on.

    > Annoyingly, I am posting this from the very same box, but booted with
    > Knoppix. I am exercising the HD to see if that is the issue, but it
    > seems to be fine with it.
    >
    > The one thing I would like to do would be to get some config files
    > (pf, smbd, and the like) and other data off the thing before I wipe it
    > and reinstall. However, Knoppix happily recognizes ad0s1a, where /
    > lives, as hda1, but I want to get at /usr, which is on ad0s1e. When I
    > try to mount that, it seems to go through, but what I get is a second
    > mount of / instead of /usr.
    >
    > I know it's slightly off-topic, but can anyone suggest anything? Also,
    > are there any forensics I can perform, as this is now looking more
    > like a software problem with BSD?


    There is a FreeBSD based LiveCD known as Frenzy available; it has loads of
    utilities on it. I've used the old version in the past. The 1.1 version
    will be based on FreeBSD 6.2 Release, so it can be used to see if FreeBSD
    likes the hardware, or not.

    http://frenzy.org.ua/eng/

    At least it should enable you to get files off the drive if you have
    somewhere else to put them. Also, if it were me, I'd only install the
    operating system and no packages from the CDs. Verify that the OS can run
    with no problems first. I also prefer to install the ports system, install
    the cvsup-without-gui package, update the ports tree, and then install my
    userland using the ports system. But that's just my preference, YMMV.

    -Jason


  10. Re: Network interface trouble

    No, definitely hardware - it died overnight while booted in Knoppix.
    The only replaceable component that I haven't messed with is the HDD,
    so the next step is to replace that with an old 10 GB HDD, install a
    minimum OS, and run a smoke test with that.

    Either way, not a BSD problem. I'm somewhat relieved, though annoyed -
    at least FreeBSD's reputation is untarnished!


  11. Re: Network interface trouble

    riotnrrd wrote:

    > No, definitely hardware - it died overnight while booted in Knoppix.
    > The only replaceable component that I haven't messed with is the HDD,
    > so the next step is to replace that with an old 10 GB HDD, install a
    > minimum OS, and run a smoke test with that.
    >
    > Either way, not a BSD problem. I'm somewhat relieved, though annoyed -
    > at least FreeBSD's reputation is untarnished!


    At least there is progress on problem resolution. Divining the split
    between "is it hardware, or software?" is often murky, even at the best of
    times. For me, when a machine behaves erratically, i.e., doesn't malfunction
    in any particular pattern, other than the interval between intermittent
    failures decreasing over time, I usually suspect hardware. A lot of times
    you'll know it's software if a defined sequence of press this, do that, do
    this, etc, produces the failure at will by just repeating the sequence.
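
    The "failures getting closer together" rule of thumb is easy to check
    if you keep a note of when each crash happened. A minimal sketch,
    assuming hypothetical crash times (these are invented for
    illustration, not taken from this thread):

```python
from datetime import datetime


def gaps_in_hours(failure_times):
    """Hours between consecutive failures, oldest first."""
    ordered = sorted(failure_times)
    return [
        (later - earlier).total_seconds() / 3600
        for earlier, later in zip(ordered, ordered[1:])
    ]


def gaps_are_shrinking(failure_times):
    """True if every gap is shorter than the one before it."""
    gaps = gaps_in_hours(failure_times)
    return len(gaps) >= 2 and all(b < a for a, b in zip(gaps, gaps[1:]))


# Hypothetical crash times, for illustration only.
crashes = [
    datetime(2007, 1, 1, 8, 0),
    datetime(2007, 1, 5, 8, 0),
    datetime(2007, 1, 7, 8, 0),
    datetime(2007, 1, 8, 2, 0),
]

print(gaps_in_hours(crashes))
print(gaps_are_shrinking(crashes))
```

    A steadily shrinking gap is only a hint, of course, not proof, but it
    fits the pattern of a part that is gradually dying.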

    Don't forget the power supply. I've seen marginal power supplies that seemed
    to work well for a time, and then fritz. It is usually the last item anyone
    suspects. In any event I wish you the best of luck.

    -Jason


  12. Re: Network interface trouble

    Jason Bourne wrote:
    > Don't forget the power supply. I've seen marginal power supplies that seemed
    > to work well for a time, and then fritz. It is usually the last item anyone
    > suspects. In any event I wish you the best of luck.
    >


    Besides power supply, you can check the fans, the processor fan, and if
    applicable, the chipset fan. A dead chipset fan produces exactly the
    same sort of errors mentioned, such as apparent disk errors.

    > -Jason
    >


    --

    Michel TALON


  13. Re: Network interface trouble

    Jason Bourne writes:

    > Don't forget the power supply. I've seen marginal power supplies that seemed
    > to work well for a time, and then fritz. It is usually the last item anyone
    > suspects.


    A friend of mine mentions anecdotally that a line conditioner placed
    between the power supply and the wall increases the MTBF of most
    components, particularly NICs, tremendously.

    --
    vsync
    http://quadium.net/~vsync/

    READ CAREFULLY. By receiving this email you agree, on behalf of your
    employer, to release me from all obligations and waivers arising from
    any and all NON-NEGOTIATED agreements, licenses, terms-of-service,
    shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure,
    non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I
    have entered into with your employer, its partners, licensors, agents
    and assigns, in perpetuity, without prejudice to my ongoing rights and
    privileges. You further represent that you have the authority to
    release me from any BOGUS AGREEMENTS on behalf of your employer.
    For further details please visit http://www.reasonableagreement.org/


  14. Re: Network interface trouble

    Tim Howe wrote:
    [snip]
    >
    > A friend of mine mentions anecdotally that a line conditioner placed
    > between the power supply and the wall increases the MTBF of most
    > components, particularly NICs, tremendously.
    >


    ROFL! Odd you should mention this: my line conditioner is from a mini-frame
    that was pieced out after a retail business a friend of mine worked at went
    defunct. It also has a CV transformer in it that can make up for small line
    sags. My UPS is then plugged up to the conditioner.

    I'm also a believer in the "steady state" theory of operating electronic
    gear. Most damage to electronic circuits occurs when cold started due to a
    higher inrush current than when quiescent. So I tend to do like I learned
    in the USAF: give it clean, pure 60Hz power, leave it on all the time and
    never turn it off, and keep everything as cool as my air conditioning will
    allow. Needless to say, my stuff *does* last a long time and I have very
    few failures, relatively speaking.

    I also try to warn people that static damage is a cumulative experience, but
    no one ever seems to listen. The more you handle your cards and parts the
    more you are damaging them, unless properly grounded via a wrist strap, or
    the like. Even something as simple as just leaving the computer plugged up
    to the mains and keeping some skin always in contact with the case will do
    the job. A few people I know at work who are constantly messing around with
    their stuff also seem to be those with the most hardware related problems.

    -Jason




  15. Re: Network interface trouble

    Jason Bourne writes:

    > So I tend to do like I learned in the USAF: give it clean, pure 60Hz
    > power, leave it on all the time and never turn it off, and keep
    > everything as cool as my air conditioning will allow. Needless to say,
    > my stuff *does* last a long time and I have very few failures,
    > relatively speaking.


    And, if possible, overspec the power supply. Even common consumer video
    cards and the like draw a lot of power nowadays, and even if yours
    doesn't need the full rating of the power supply, no reason not to
    ensure that it won't be surprised by an extra power draw.

    --
    vsync
    http://quadium.net/~vsync/


  16. Re: Network interface trouble

    Michel Talon wrote:

    [snip]
    >
    > Besides power supply, you can check the fans, the processor fan, and if
    > applicable, the chipset fan. A dead chipset fan produces exactly the
    > same sort of errors mentioned, such as apparent disk errors.
    >


    Yes - total agreement here. I've also seen a few instances where the cpu fan
    and heat sink were marginal in quality. Worked fine when new but after a
    year of clogging up with dust were unable to keep the cpu cool enough. Just
    blowing them out with a can of nitrogen made all the difference in the
    world.

    I think the fans run off the 12 volt rail of the power supply, and this is
    the rail that is rated for the smallest amount of current. So a fan that's
    burnt up and pulling way more than it should will make it look like the
    power supply is bad. I have seen this before.

    -Jason



  17. Re: Network interface trouble

    double_dub wrote:

    > Jan 20 19:34:06 mariner kernel: xl0: link state changed to UP
    > Jan 20 19:34:12 mariner kernel: xl0: link state changed to DOWN
    > Jan 20 19:34:14 mariner kernel: xl0: link state changed to UP
    > Jan 20 19:34:40 mariner kernel: xl0: link state changed to DOWN
    > Jan 20 19:34:42 mariner kernel: xl0: link state changed to UP
    > Jan 20 19:35:23 mariner kernel: xl0: link state changed to DOWN
    > Jan 20 19:35:24 mariner kernel: xl0: link state changed to UP


    What machine is this on? I have an HP Vectra at work that's doing the
    same thing (also a xl0).

    Mine cycles up or down about once a second. I came to work the next
    day, and the switch was blinking at me. I inspected the console, and it
    was flapping in the 'wind' inside!

    Just curious.
