Troubleshooting connection loss (continued) - Networking

Thread: Troubleshooting connection loss (continued)

  1. Re: Troubleshooting connection loss (continued)

    Allen Weiner wrote:

    > I've now got Fedora configured
    > for static IP of 150 and WinME configured for static IP of 140. The
    > dhclient-leases lists an expiration date of 11/7 (today is 11/11).


    Are you talking about 2 computers,
    or a single dual-boot computer?

    If the latter, how does your dhcp server
    know which OS is being used on the computer connected to it?


  2. Re: Troubleshooting connection loss (continued)

    On Sun, 11 Nov 2007 12:22:46 GMT, Allen Weiner wrote:
    > Bit Twister wrote:
    >>

    >
    > I don't understand *any* of the above.


    Most were actual commands. If you did not understand the command, do a
    man command

    When in doubt, try the command with junk names and check the results. Example:

    Given
    cp /dev/null /var/lib/dhclient/dhclient-eth0.leases

    do cp /var/lib/dhclient/dhclient-eth0.leases junk
    cp /dev/null junk
    cat junk

    Now you know what the "cp /dev/null fn" does.
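
    The same experiment can be run as one small script. This is only a sketch: it works on a scratch file made with mktemp, so nothing under /var/lib is touched, and the variable names (junk, before, after) are made up for the illustration.

```shell
#!/bin/bash
# Sketch: show that copying /dev/null over a file empties it.
junk=$(mktemp)                 # scratch file, stands in for the lease file
echo "lease data" > "$junk"    # give it some content
before=$(stat -c %s "$junk")   # size in bytes before the copy (GNU stat)
cp /dev/null "$junk"           # same operation as in the post above
after=$(stat -c %s "$junk")    # size in bytes after the copy
echo "before=$before after=$after"
rm -f "$junk"
```

    The file itself survives with the same name and permissions; only its contents are gone, which is why this is handy for a lease file that a daemon expects to exist.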



    > I've now got Fedora configured
    > for static IP of 150 and WinME configured for static IP of 140. The
    > dhclient-leases lists an expiration date of 11/7 (today is 11/11).


    One of those commands was to empty the lease file so it would be ruled
    out of the suspect list. It was
    cp /dev/null /var/lib/dhclient/dhclient-eth0.leases


    > I'll post my config-dump below. How does it look?
    >
    > So could you please clarify what is the next step in troubleshooting.


    My Karnack gene is defective, what problem?

    You have to tell me what is your problem /now/.
    I know what the original problem was.

    Are you saying both OSs set static and you are having connection
    problems, or what?



    > Some explanatory comments along with the procedural steps might make it
    > more understandable. Thanks.


    We have already covered the troubleshooting steps. The order of the
    steps logically tests hardware first, then software. When a test fails, that is the area
    to fix: look at the config files for that area and check for bad values.

    ethtool/mii-tool tells you the physical cable/path is good.

    Assuming static setup, pings tell you which part of the connection fails.
    pinging localhost indicates your system is working.
    pinging your node name proves name lookup reads /etc/hosts and local routing is working.
    pinging the next node in the path to the internet proves both nodes are working.
    when a ping fails, suspects are routing (route -n), firewalls, or the other node.

    node config files match hostname results
    /etc/sysconfig/network
    /etc/hosts

    dns files:
    /etc/nsswitch.conf defines which sources to check and in what order
    /etc/host.conf
    /etc/resolv.conf

    route -n shows a UG flag and the gateway address matches the gateway's ip addy.

    ifconfig ip results match
    /etc/hosts
    /etc/sysconfig/network-scripts/ifcfg-eth0
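
    The ping order and the route check above can be sketched as two small shell functions. Assumptions, loudly: ping_ladder and default_gateway are made-up helper names, the PING variable exists only so the ladder can be dry-run with a stand-in command, and -W is the reply timeout option of the Linux iputils ping.

```shell
#!/bin/bash
# Sketch only: walk outward from this machine and stop at the first ping that fails.
PING=${PING:-ping}   # override for a dry run, e.g. PING=true

ping_ladder() {
    # Arguments: targets in order, nearest first.
    local target
    for target in "$@"; do
        if $PING -c 1 -W 2 "$target" > /dev/null 2>&1; then
            echo "ok $target"
        else
            echo "FAIL $target"   # this hop is the area to investigate
            return 1
        fi
    done
}

# Read `route -n` output on stdin and print the gateway of the default route,
# i.e. the line whose destination is 0.0.0.0 and whose flags include G.
default_gateway() {
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2; exit }'
}

# Typical use with the addresses quoted in this thread:
#   ping_ladder localhost "$(hostname)" 192.168.1.1
#   route -n | default_gateway
```

    The first FAIL line names the hop to look at; everything printed before it has already been ruled out.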



    > ======== cat /etc/*version ==========
    > cat: /etc/subversion: Is a directory


    Attaching a new script to fix that error with code to display default runlevel.


    > ======== grep -v '^#' /etc/resolv.conf ==========
    > ; generated by /sbin/dhclient-script


    What the hell, That semi-colon should not be there. Look at mine
    $ cat /fc7/etc/resolv.conf
    nameserver 192.168.1.1

    > search myhome.westell.com
    > nameserver 192.168.1.1
    > nameserver 192.168.1.1


    Had you followed my instructions, /etc/resolv.conf would have had just
    nameserver 192.168.1.1

    At this point, I have no idea if your dhcp client is helping us into
    the ditch. Run these commands:

    echo "nameserver 192.168.1.1" > /etc/resolv.conf
    cat /etc/resolv.conf
    service network restart
    cat /etc/resolv.conf

    If resolv.conf reverts back to
    # generated by /sbin/dhclient-script
    search myhome.westell.com
    nameserver 192.168.1.1
    nameserver 192.168.1.1

    then the dhcp client is getting into your problem, but it will not stop connectivity.
    We will have to trap that alligator later.
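
    The before/after comparison can also be done without eyeballing two cat outputs. A sketch, assuming a made-up helper name changed_after; it checksums the file, runs whatever command follows, and checksums again.

```shell
#!/bin/bash
# Sketch: report whether a file's contents differ after running a command.
changed_after() {
    local file=$1; shift
    local before after
    before=$(cksum < "$file")      # checksum and size before
    "$@" >&2                       # run the command; its output goes to stderr
    after=$(cksum < "$file")       # checksum and size after
    if [ "$before" = "$after" ]; then
        echo "unchanged"
    else
        echo "rewritten"
    fi
}

# As root, on the system discussed in this thread:
#   changed_after /etc/resolv.conf service network restart
```

    If it prints "rewritten", something (here the suspect is the dhcp client script) regenerated the file during the restart.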

    > ======== hostname ==========
    > alweiner.nowhere.invalid


    Thank you, thank you, thank you
    I would not have picked your user name, but hey, it is your system.


    > ========== head -15 /etc/hosts ===========
    > 127.0.0.1 alweiner.nowhere.invalid alweiner localhost
    > 192.168.1.1 gateway
    > 192.168.1.150 alweiner.invalid alweiner


    Frap, gotta love those gui tools and you need to pay attention to details.
    If you will notice you did not get the .150 line correct.
    Pop test, what is wrong with the 192.168.1.150 line?

    You lucked out because of the gui help.
    Your nodename in /etc/sysconfig/network should match the one in /etc/hosts.


    READ MY LIPS, you are to delete contents of /etc/hosts
    cut the following and paste them into your hosts file.

    127.0.0.1 localhost.localdomain localhost
    192.168.1.150 alweiner.nowhere.invalid alweiner
    ::1 localhost6.localdomain6 localhost6

    When you modify a config file, you should always recheck your work by using
    cat fn_here and double check values.

    Except for the prompt, you should see something on your screen as follows:

    [root@alweiner ~]# cat /etc/hosts
    127.0.0.1 localhost.localdomain localhost
    192.168.1.150 alweiner.nowhere.invalid alweiner
    ::1 localhost6.localdomain6 localhost6
    [root@alweiner ~]#
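
    A quick way to double check which address a name is attached to in a hosts file, as a sketch. hosts_ips_for is a made-up helper name, not a standard tool; it only reads the file and prints the address of every line that lists the name.

```shell
#!/bin/bash
# Sketch: print every address in a hosts-format file whose line lists the given name.
hosts_ips_for() {
    local name=$1 file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }                                   # drop comments
        { for (i = 2; i <= NF; i++) if ($i == n) print $1 }  # names start at field 2
    ' "$file"
}

# Usage: hosts_ips_for "$(hostname)"
```

    With the hosts file shown above, the full node name should come back with the LAN address 192.168.1.150 and nothing else. If 127.0.0.1 shows up, the node name is still sitting on the loopback line.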

    I had expected the gui to add the ::1 line. So much for assuming.

    WARNING: Changing node/domain/ip addy for your node may cause you to
    lose gui display. Reboot is recommended.

    Node name changes can cause Mail Transport Agent (MTA), print server
    (cups) and whatnot to feel sad. You may have to fix their config files
    and/or restart their services.

    -------------------------------------------------------------------------------
    As promised, new-n-improved script follows:

    You can use diff to find changes. Example:

    diff -bBw my_script your_script



    ------------------ Script starts below this line ---------
    #!/bin/bash
    #**************************************************************
    #*
    #* xx - Dump network config files and network hardware status
    #*
    #* Output: a.txt    linux file
    #*         dosa.txt Windows file
    #*
    #**************************************************************

    _fn=a.txt
    _out_fn=$PWD/$_fn
    _dos_fn=$PWD/dos${_fn}
    _home=$PWD

    function cat_fn
    {
    _fn=$1
    if [ -f $_fn ] ; then
    _count=$(stat -c %s $_fn )
    if [ $_count -gt 0 ] ; then
    echo "======== cat $_fn ==========" >> $_out_fn
    cat $_fn >> $_out_fn
    fi
    fi
    } # end cat_fn

    function grep_fn
    {
    _fn=$1
    if [ -e $_fn ] ; then
    _count=$(stat -c %s $_fn )
    if [ $_count -gt 0 ] ; then
    _count=$(grep -v '^#' $_fn | wc -l)
    if [ $_count -gt 0 ] ; then
    echo "======== grep -v '^#' $_fn ==========" >> $_out_fn
    if [ "$_fn" != "shorewall.conf" ] ; then
    grep -v '^#' $_fn >> $_out_fn
    else
    awk 'empty{if (!/^#/) print; empty=0} /^$/{empty=1}' $_fn >> $_out_fn
    fi
    fi
    fi
    fi
    } # end grep_fn

    function ls_dir
    {
    _dr=$1
    if [ -d $_dr ] ; then
    echo "========= cd $_dr ; ls -al ========" >> $_out_fn
    cd $_dr
    ls -al >> $_out_fn
    fi
    } # end ls_dir

    function tail_fn
    {
    _fn=$1
    if [ -e $_fn ] ; then
    echo "======== tail -18 $_fn ==========" >> $_out_fn
    tail -18 $_fn >> $_out_fn
    fi
    } # end tail_fn

    #********************************
    # check if commands are in $PATH
    # and if not add them to PATH
    #********************************

    _path=""
    type ifconfig > /dev/null 2>&1
    if [ $? -ne 0 ] ; then
    _path="${_path}/sbin:"
    fi

    type cat > /dev/null 2>&1
    if [ $? -ne 0 ] ; then
    _path="${_path}/bin:"
    fi

    type id > /dev/null 2>&1
    if [ $? -ne 0 ] ; then
    _path="${_path}/usr/bin:"
    fi

    if [ -n "$_path" ] ; then
    PATH=${_path}$PATH
    export PATH
    fi

    #********************************
    # check if root and logged in correctly
    #********************************

    _uid=$(id --user)

    if [ $_uid -ne 0 ] ; then
    echo " "
    echo "You need to be root to run $0"
    echo "Click up a terminal and do the following:"
    echo " "
    echo "su - root"
    echo "$PWD/xx"
    echo " "
    echo "or "
    echo " "
    echo "sudo -i"
    echo "$PWD/xx"
    echo " "
    exit 1
    fi

    root_flg=1

    if [ -n "$LOGNAME" ] ; then
    if [ "$LOGNAME" != "root" ] ; then
    root_flg=0
    fi
    fi

    if [ -n "$USER" ] ; then
    if [ "$USER" != "root" ] ; then
    root_flg=0
    fi
    fi

    if [ $root_flg -eq 0 ] ; then
    echo " "
    echo "Guessing you did a su root"
    echo "instead of a su - root"
    echo "please exit/logout of this session and do the following:"
    echo " "
    echo "su - root"
    echo "$PWD/xx"
    echo " "
    echo "or "
    echo " "
    echo "sudo -i"
    echo "$PWD/xx"
    echo " "
    exit 1
    fi


    #********************************
    # main code starts here
    #********************************


    echo "Working, output will be in $_out_fn "

    date > $_out_fn
    chmod 666 $_out_fn

    if [ -n "$_path" ] ; then
    echo "======== echo \$PATH ==========" >> $_out_fn
    echo "$PATH" >> $_out_fn 2>&1
    fi

    cat_fn /etc/product.id

    for _d in /etc/*release ; do
    if [ ! -d $_d ] ; then
    echo "======== cat $_d ==========" >> $_out_fn
    cat $_d >> $_out_fn
    fi
    done


    echo "======== uname -rvi =============" >> $_out_fn
    uname -rvi >> $_out_fn

    for _d in /etc/*version ; do
    if [ ! -d $_d ] ; then
    echo "======== cat $_d ==========" >> $_out_fn
    cat $_d >> $_out_fn
    fi
    done

    cat_fn /proc/*version

    type lsb_release > /dev/null 2>&1
    if [ $? -eq 0 ] ; then
    echo "======== lsb_release -a ==========" >> $_out_fn
    lsb_release -a >> $_out_fn 2>&1
    fi

    echo " " >> $_out_fn
    if [ -n "$SECURE_LEVEL" ] ; then
    echo "msec security level is $SECURE_LEVEL" >> $_out_fn
    fi

    echo "======== free ==========" >> $_out_fn
    free >> $_out_fn 2>&1
    echo " " >> $_out_fn

    if [ -e /etc/inittab ] ; then
    _line=$(grep :initdefault /etc/inittab)
    set -- $(IFS=':'; echo $_line)
    echo " " >> $_out_fn
    echo "Default run level is $2" >> $_out_fn
    echo " " >> $_out_fn
    fi

    type chkconfig > /dev/null 2>&1
    if [ $? -eq 0 ] ; then
    echo "======== chkconfig --list ==========" >> $_out_fn
    for _serv in avahi named tmdns ; do
    chkconfig --list | grep -i $_serv > /dev/null 2>&1
    if [ $? -eq 0 ] ; then
    echo "Double check if /$_serv/ needs to be disabled on boot" >> $_out_fn
    chkconfig --list | grep -i $_serv >> $_out_fn
    fi
    done

    chkconfig --list >> $_out_fn

    else
    echo "======== ls -o /etc/rcS.d/ ==========" >> $_out_fn
    for _serv in avahi named tmdns ; do
    ls /etc/rcS.d/S* | grep $_serv > /dev/null 2>&1
    if [ $? -eq 0 ] ; then
    echo "Double check if /$_serv/ needs to be disabled on boot" >> $_out_fn
    fi
    done

    ls -o /etc/rcS.d >> $_out_fn
    fi

    _fn=/etc/nsswitch.conf
    if [ -e $_fn ] ; then
    echo "======== grep hosts: $_fn ==========" >> $_out_fn
    grep hosts: $_fn >> $_out_fn
    fi

    grep_fn /etc/resolv.conf

    grep_fn /etc/resolvconf/resolv.conf.d/head
    cat_fn /etc/resolvconf/resolv.conf.d/base
    cat_fn /etc/resolvconf/resolv.conf.d/tail


    echo "======== hostname ==========" >> $_out_fn
    hostname >> $_out_fn

    cat_fn /etc/netprofile/profiles/default/files/etc/hosts
    cat_fn /etc/hostname
    cat_fn /etc/HOSTNAME

    ls /etc/mod*.conf > /dev/null 2>&1
    if [ $? -eq 0 ] ; then
    echo "======== grep eth /etc/mod*.conf ==========" >> $_out_fn
    grep eth /etc/mod*.conf >> $_out_fn
    fi

    cat_fn /etc/dhclient-enter-hooks
    cat_fn /etc/dhclient-exit-hooks

    grep_fn /etc/host.conf

    echo "================ ifconfig -a ==============" >> $_out_fn
    ifconfig -a >> $_out_fn

    cat_fn /etc/iftab
    cat_fn /etc/udev/rules.d/61-net_config.rules

    echo "============== route -n =================" >> $_out_fn
    route -n >> $_out_fn

    cat_fn /etc/sysconfig/network/routes

    cat_fn /etc/sysconfig/network
    grep_fn /etc/mkinitramfs/initramfs.conf

    echo "========== head -15 /etc/hosts ===========" >> $_out_fn
    head -15 /etc/hosts >> $_out_fn

    cat_fn /etc/network/interfaces
    cat_fn /var/run/network/ifstate


    _cmd=""
    type ethtool > /dev/null 2>&1
    if [ $? -eq 0 ] ; then
    _cmd="ethtool"
    fi

    type mii-tool > /dev/null 2>&1
    if [ $? -eq 0 ] ; then
    _cmd="mii-tool -v"
    fi

    if [ -z "$_cmd" ] ; then
    echo "==== mii-tool/ethtool NOT INSTALLED ====" >> $_out_fn
    fi

    for nic in 0 1 2 ; do

    if [ -n "$_cmd" ] ; then
    $_cmd eth$nic > /dev/null 2>&1
    if [ $? -eq 0 ] ; then
    echo "======== $_cmd eth$nic ==========" >> $_out_fn
    $_cmd eth$nic >> $_out_fn
    fi
    fi

    echo "=== dmesg | grep eth$nic | grep -v SRC= ===" >> $_out_fn
    dmesg | grep eth$nic | grep -v SRC= >> $_out_fn

    echo "=== grep eth$nic /var/log/messages | tail -10 ===" >> $_out_fn
    grep eth$nic /var/log/messages | tail -10 >> $_out_fn

    cat_fn /etc/sysconfig/network-scripts/ifcfg-eth$nic

    ifconfig eth$nic > /dev/null 2>&1
    if [ $? -eq 0 ] ; then
    set -- $(ifconfig eth$nic | tr 'A-Z' 'a-z')
    cat_fn /etc/sysconfig/network/ifcfg-eth-id-$5
    fi

    tail_fn /var/lib/dhcp/dhclient-eth${nic}.leases
    tail_fn /var/lib/dhclient/dhclient-eth${nic}.leases
    tail_fn /etc/dhcpc/dhcpcd-eth${nic}.info

    done # end for nic in 0 1 2 ; do

    _dir=/etc/NetworkManager/dispatcher.d
    if [ -d $_dir ] ; then
    ls_dir $_dir

    for _d in "if-up.d" "if-down.d" "if-pre-up.d" "if-post-down.d" ; do
    if [ -e /etc/network/${_d} ] ; then
    echo "==== cd /etc/network/${_d} ; ls -al ===" >> $_out_fn
    cd /etc/network/${_d}
    ls -al >> $_out_fn
    fi
    done
    fi

    if [ -d /etc/sysconfig/network-scripts ] ; then
    for _d in "ifdown.d" "ifup.d" ; do
    if [ -e /etc/sysconfig/network-scripts/${_d} ] ; then
    _cmd="cd /etc/sysconfig/network-scripts/${_d} ; ls -al "
    echo "===== $_cmd ====" >> $_out_fn
    cd /etc/sysconfig/network-scripts/${_d}
    ls -al >> $_out_fn
    fi
    done
    fi

    ls_dir /etc/dhcp3/dhclient-exit-hooks.d
    ls_dir /etc/resolvconf/update.d


    if [ -d /etc/shorewall ] ; then
    _count=$(chkconfig --list shorewall | grep -c n )
    if [ $_count -gt 0 ] ; then
    echo "======= Shorewall settings =========" >> $_out_fn
    cd /etc/shorewall
    for _f in $(ls) ; do
    echo "======= $_f =========" >> $_out_fn
    grep_fn $_f
    done
    fi
    fi


    cd $_home

    grep_fn /etc/hosts.allow
    grep_fn /etc/hosts.deny
    echo "==== end of config/network data dump =======" >> $_out_fn

    awk '{print $0 "\r" }' $_out_fn > $_dos_fn
    chmod 666 $_dos_fn


    echo " "
    echo "If posting via linux, post contents of $_out_fn"
    echo "You might want to copy it to your account with the command"
    echo "cp $_out_fn ~your_login"
    echo " "
    echo "If posting via windows, post contents of $_dos_fn"
    echo " "
    echo "If using diskette,"
    echo "Copy $_dos_fn to diskette with the following commands:"
    echo " "
    echo "mkdir -p /floppy"
    echo "mount -t auto /dev/fd0 /floppy"
    echo "cp $_dos_fn /floppy"
    echo "umount /floppy "
    echo " "
    echo "and $_dos_fn is ready for windows from diskette"
    echo " "

    #*********** end of dump xx.txt script *********

  3. Re: Troubleshooting connection loss (continued)

    On Sun, 11 Nov 2007 15:04:51 +0000, Timothy Murphy wrote:

    > Are you talking about 2 computers,
    > or a single dual-boot computer?


    He has one computer connected to an ADSL router.


    > If the latter, how does your dhcp server
    > know which OS is being used on the computer connected to it?


    router's dhcp server looks at MAC value to know who is talking to it. :-D

    In my stupid opinion, the router should see the dhcp renew/rebind
    request from the same nic and should extend/issue the same lease
    regardless of what OSs created the initial connection.

    What I am not sure about is how the router software behaves if WinME
    gets a netbios lease and Allen then boots fedora.
    Maybe the router waits for a netbios lease renewal, times out, and blows away
    fedora's connection.

    Having finally gotten Allen to set both OSs static, he should have a
    stable connection, regardless of what system was running before boot.

    If so, we have solved Allen's connection problem, but do not have a
    working solution which Allen desires.

    I think I might have to poke him a little harder and ask him to read
    http://www.catb.org/~esr/faqs/smart-questions.html

  4. Re: Troubleshooting connection loss (continued)

    Bit Twister wrote:
    > On Sun, 11 Nov 2007 12:22:46 GMT, Allen Weiner wrote:
    >> Bit Twister wrote:
    >> I don't understand *any* of the above.

    >
    > Most were actual commands. If you did not understand the command, do a
    > man command
    >
    > when in doubt, try command with junk names and check results. Example:
    >
    > Given
    > cp /dev/null /var/lib/dhclient/dhclient-eth0.leases
    >
    > do cp /var/lib/dhclient/dhclient-eth0.leases junk
    > cp /dev/null junk
    > cat junk
    >
    > Now you know what the "cp /dev/null fn" does.
    >

    Thanks very much for your continuing help and patience. My primary lack
    of understanding is the purpose of the commands. I'm totally missing the
    strategy. I suspected that the copy of /dev/null into /var/lib/dhclient
    was a means of erasing /var/lib/dhclient. But to me, it's a puzzling and
    unconventional way of erasing a file. (Remember, I'm a refugee from
    Windows.) If there was a comment "clear the file", I would use Kedit and
    clear the file. Using /dev/null seems to me a "power user" trick.
    >
    >
    >> I've now got Fedora configured
    >> for static IP of 150 and WinME configured for static IP of 140. The
    >> dhclient-leases lists an expiration date of 11/7 (today is 11/11).

    >
    > One of those commands was to empty the lease file so it would be ruled
    > out of the suspect list. It was
    > cp /dev/null /var/lib/dhclient/dhclient-eth0.leases
    >
    >
    >> I'll post my config-dump below. How does it look?
    >>
    >> So could you please clarify what is the next step in troubleshooting.

    >
    > My Karnack gene is defective, what problem?
    >
    > You have to tell me what is your problem /now/.
    > I know what the original problem was.
    >
    > Are you saying both OSs set static and you are having connection
    > problems, or what?
    >

    What I meant by the question is, are there additional steps I need to do
    so that I can do effective troubleshooting if the original problem
    (connection loss) happens again. You appear to be assuming that you've
    diagnosed the connection loss problem, repaired it, and it will not
    happen again.


    >> ======== grep -v '^#' /etc/resolv.conf ==========
    >> ; generated by /sbin/dhclient-script

    >
    > What the hell, That semi-colon should not be there. Look at mine
    > $ cat /fc7/etc/resolv.conf
    > nameserver 192.168.1.1
    >
    >> search myhome.westell.com
    >> nameserver 192.168.1.1
    >> nameserver 192.168.1.1

    >
    > Had you followed my instructions, /etc/resolv.conf would have had just
    > nameserver 192.168.1.1
    >

    You explained that removing the "search westell" is a performance
    optimization. For the time being, I'm making only changes necessary for
    troubleshooting, unless I can see (from my novice knowledge base) that
    the change is not potentially harmful. BTW, thanks very much for
    mentioning "Rescue mode" if Fedora becomes unbootable. This thread is a
    real learning experience.

    > At this point, I have no idea if your dhcp client is helping us into
    > the ditch. Run these commands:
    >
    > echo "nameserver 192.168.1.1" > /etc/resolv.conf
    > cat /etc/resolv.conf
    > service network restart
    > cat /etc/resolv.conf


    How about if I use Kedit to just change the comment (and nothing else)
    to some garbage sentence? This would eliminate any chance of side-effects.


    >
    > If resolv.conf reverts back to
    > # generated by /sbin/dhclient-script
    > search myhome.westell.com
    > nameserver 192.168.1.1
    > nameserver 192.168.1.1
    >
    > dhcp client is getting into your problem but it will not stop connectivity.
    > We will have to trap that alligator later.
    >
    >> ======== hostname ==========
    >> alweiner.nowhere.invalid

    >
    > Thank, you thank you, thank you
    > I would not have picked your user name, but hey, it is your system.


    My user name is "aweiner".
    >
    >
    >> ========== head -15 /etc/hosts ===========
    >> 127.0.0.1 alweiner.nowhere.invalid alweiner localhost
    >> 192.168.1.1 gateway
    >> 192.168.1.150 alweiner.invalid alweiner

    >
    > Frap, gotta love those gui tools and you need to pay attention to details.


    What gui tool are you referring to? I edited the file with Kedit.

    > If you will notice you did not get the .150 line correct.
    > Pop test, what is wrong with the 192.168.1.150 line?
    >


    > You lucked out because of the gui help.
    > Your nodename in /etc/sysconfig/network should match the one in /etc/hosts.
    >
    >
    > READ MY LIPS, you are to delete contents of /etc/hosts
    > cut the following and paste them into your hosts file.
    >
    > 127.0.0.1 localhost.localdomain localhost
    > 192.168.1.150 alweiner.nowhere.invalid alweiner
    > ::1 localhost6.localdomain6 localhost6


    What's wrong with just fixing the FQDN of 192.168.1.150? I don't
    understand that third line.
    >
    > When you modify a config file, you should always recheck your work by using
    > cat fn_here and double check values.
    >

    what is "fn_here"?

    > Except for the prompt, you should see something on your screen as follows:
    >
    > [root@alweiner ~]# cat /etc/hosts
    > 127.0.0.1 localhost.localdomain localhost
    > 192.168.1.150 alweiner.nowhere.invalid alweiner
    > ::1 localhost6.localdomain6 localhost6
    > [root@alweiner ~]#
    >
    > I had expected the gui to add the ::1 line. So much for assuming.
    >

    Again, I used Kedit.



  5. Re: Troubleshooting connection loss (continued)

    Bit Twister wrote:

    <snip>
    >
    > In my stupid opinion, the router should see the dhcp renew/rebind
    > request from the same nic and should extend/issue the same lease
    > regardless of what OSs created the initial connection.
    >
    > What I am not sure about, in the router software, is if WinME gets a
    > netbios lease, Allen then boots fedora.
    > Router waits for a netbios lease renewal, times out, and blows away
    > fedora's connection.


    I don't know if this is relevant to your diagnosis of my connection-loss
    problem. Most of the time, I only run WinME on Saturdays. The Fedora
    connection-loss problem happens (apparently) randomly throughout the week.
    >


    >
    > I think I might have to poke him a little harder and ask him to read
    > http://www.catb.org/~esr/faqs/smart-questions.html


    Is this about the point you made about me not snipping enough when I
    reply, or something else?

  6. Re: Troubleshooting connection loss (continued)

    On Sun, 11 Nov 2007 20:57:13 GMT, Allen Weiner wrote:
    > Bit Twister wrote:
    >>


    Still need to start trimming a bit more please.

    > Thanks very much for your continuing help and patience. My primary lack
    > of understanding is the purpose of the commands.


    Now, that is a different story. :-)
    I am subscribed to 130 news groups, and I whip through those providing
    commands when I can.
    So far you have been one of the few who really want to know what
    is going on. So I have been adding bunches of information for you.
    The commands, and the order they are run in, set the system to a known state.
    Telling you why each command was needed gets me into typing the rest
    of the day.

    > I'm totally missing the strategy.
    > I suspected that the copy of /dev/null into /var/lib/dhclient
    > was a means of erasing /var/lib/dhclient.


    Hot dang. You are keeping up.

    > But to me, it's a puzzling


    Well, I have no problem with you asking the question of why not do it
    this way......
    Then I can give you the reason for not doing something, as you will see next.

    > and
    > unconventional way of erasing a file. (Remember, I'm a refugee from
    > Windows.) If there was a comment "clear the file",


    Oh, no, Nature is constantly improving the idiot.
    If they cannot cut/paste the command, then there is a good chance of
    Murphy being able to do his best. :-(


    > I would use Kedit and


    Yeah, but the downside to that is the editor can leave a backup file
    with the tilde on the end.

    While on that subject, I want you to do a

    ls /etc/sysconfig/network-scripts/ifcfg-eth0*

    If there is a /etc/sysconfig/network-scripts/ifcfg-eth0~
    I want you to "delete/remove it". See, much simpler to say

    rm /etc/sysconfig/network-scripts/ifcfg-eth0~
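
    To hunt for such editor leftovers in one go, here is a sketch. list_backups is a made-up helper name, and the default directory is simply the one from this thread; it only lists files, it does not delete anything.

```shell
#!/bin/bash
# Sketch: list editor backup copies (names ending in ~) of ifcfg files in a directory.
list_backups() {
    local dir=${1:-/etc/sysconfig/network-scripts}
    find "$dir" -maxdepth 1 -type f -name 'ifcfg-*~' -print
}

# Usage: list_backups
# Review the list, then remove each file by hand with rm.
```

    Listing first and removing by hand keeps a typo from taking out the real ifcfg-eth0.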

    > clear the file. Using /dev/null seems to me a "power user" trick.


    Hehe, the "power user trick" would be
    >/var/lib/dhclient/dhclient-eth0.leases


    But the idiot thinks the > is part of usenet quoting.



    Dang, had to snip 21 lines which you should have trimmed.
    That is a rudeness which I can get tired of pretty quick.


    >> Are you saying both OSs set static and you are having connection
    >> problems, or what?
    >>

    > What I meant by the question is, are there additional steps I need to do
    > so that I can do effective troubleshooting if the original problem
    > (connection loss) happens again. You appear to be assuming that you've
    > diagnosed the connection loss problem, repaired it, and it will not
    > happen again.


    No. If you now have both systems using static addresses, fedora no
    longer loses connectivity after doze used the connection.

    Now that both systems are static, we know that we have a dhcp issue:
    either the modem's dhcp server or the fedora dhcp client.

    You indicated second fedora reboot using dhcp ran ok.
    To me the router is the culprit.

    You have also indicated you wanted to run doze with dhcp.
    No problem, set it dhcp, fedora static, boot doze, boot fedora and see
    if connection drops. If not, there is the /working/ solution.

    My SWAG, on doze shutdown, no dhcp release is issued, modem is half
    smart and knows it was doze who should be using the .47 ip and refuses
    to allow fedora to use the lease.
    fedora shutdown does a dhcp release, you boot fedora again and router
    allows use of the .47 lease to work like it is supposed to.

    >
    >


    Yea, thanks. is good enough if you really want to add them.


    >>
    >>> search myhome.westell.com
    >>> nameserver 192.168.1.1


    > You explained that removing the "search westell" is a performance
    > optimization. For the time being, I'm making only changes necessary for
    > troubleshooting, unless I can see (from my novice knowledge base) that
    > the change is not potentially harmful.


    I hear where you are coming from, but why have myhome.westell.com
    looking up ip addresses for you?
    I consider that a security risk.
    Your dns resolver will try a search there, then ask the nameserver.


    > BTW, thanks very much for
    > mentioning "Rescue mode" if Fedora becomes unbootable. This thread is a
    > real learning experience.


    Yeah, as an oh by the way, you can make it a practice of
    copying the files into /root/hold or some such thing and copying them
    back in the rescue mode.

    As for the dhcp/static, just changing BOOTPROTO= back to dhcp value
    in /etc/sysconfig/network-scripts/ifcfg-eth0
    would have you booting dhcp :-)
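
    Both habits, keeping a copy in a hold directory and flipping BOOTPROTO, can be sketched in one function. Assumptions: set_bootproto is a made-up helper name, the hold directory defaults to the /root/hold mentioned above, and sed -i is the GNU sed in-place option.

```shell
#!/bin/bash
# Sketch: save a copy of an ifcfg file, then change its BOOTPROTO= line.
set_bootproto() {
    local value=$1 file=$2 hold=${3:-/root/hold}
    mkdir -p "$hold" || return 1
    cp -p "$file" "$hold/" || return 1    # original stays available for rescue mode
    sed -i "s/^BOOTPROTO=.*/BOOTPROTO=$value/" "$file"
}

# As root, on the system discussed in this thread:
#   set_bootproto dhcp /etc/sysconfig/network-scripts/ifcfg-eth0
#   cat /etc/sysconfig/network-scripts/ifcfg-eth0
```

    Copying the held file back over the edited one returns the interface to its previous setup.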

    >
    >> At this point, I have no idea if your dhcp client is helping us into
    >> the ditch. Run these commands:
    >>
    >> echo "nameserver 192.168.1.1" > /etc/resolv.conf
    >> cat /etc/resolv.conf
    >> service network restart
    >> cat /etc/resolv.conf

    >
    > How about if I use Kedit to just change the comment (and nothing else)
    > to some garbage sentence? This would eliminate any chance of side-effects.


    I run under general rules.
    You do not go adhoc'ing config files.
    You only change the data to be what it needs to be changed, and
    contents are as close to original as can be.
    You always make sure the last line has a carriage return.

    It depends on the code reading the config file as to what you can get
    away with. Example:
    nameserver 192.168.1.1 # router ip
    may not work

    cat /etc/resolv.conf
    # router ip
    nameserver 192.168.1.1
    might work.

    cat /etc/resolv.conf
    # router ip
    nameserver 192.168.1.1
    # verizon fallback dns server
    nameserver 68.238.96.12
    might not work.

    cat /etc/resolv.conf
    # 1'st is router ip
    # 2'nd is verizon fallback dns server
    nameserver 192.168.1.1
    nameserver 68.238.96.12
    would work.
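
    The first case, extra text after the address on a nameserver line, is easy to check for mechanically. A sketch, assuming a made-up helper name flag_nameserver_lines; it only reads the file and prints the line number and text of any nameserver line that has more than the keyword and one address.

```shell
#!/bin/bash
# Sketch: flag nameserver lines that carry anything besides the keyword and one address.
flag_nameserver_lines() {
    awk '$1 == "nameserver" && NF != 2 { print NR ": " $0 }' "${1:-/etc/resolv.conf}"
}

# Usage: flag_nameserver_lines
# No output means every nameserver line is in the plain two-field form.
```

    It does not judge comment lines of their own; those are left for the eye.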



    Windows newbies using editors tend to not remember to add the carriage return.

    Example: I wanted resolv.conf to have just
    nameserver 192.168.1.1

    Now the newbie will use the editor to delete everything, just paste
    nameserver 192.168.1.1, Save and quit.

    When you run the xx script you will see the mistake in a.txt as

    ======== grep -v '^#' /etc/resolv.conf ==========
    nameserver 192.168.1.1======== hostname ==========

    instead of
    ======== grep -v '^#' /etc/resolv.conf ==========
    nameserver 192.168.1.1
    ======== hostname ==========


    The echo command makes sure that I get the trailing carriage return
    and /etc/resolv.conf will have just what I wanted.
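
    Whether a file really ends with that trailing line ending can be tested directly. A sketch, assuming a made-up helper name ends_with_newline; it relies on the fact that command substitution strips a trailing newline, so an empty result from tail -c 1 means the last byte was a newline.

```shell
#!/bin/bash
# Sketch: succeed only if the file is non-empty and its last byte is a newline.
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Usage:
#   if ends_with_newline /etc/resolv.conf; then echo "ok"; else echo "missing final newline"; fi
```

    A file written with echo passes this check; one pasted into an editor and saved without pressing Enter on the last line may not.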

    Dang, Had to trim 12 more lines.

    >>> ======== hostname ==========
    >>> alweiner.nowhere.invalid

    >>
    >> Thank, you thank you, thank you
    >> I would not have picked your user name, but hey, it is your system.

    >
    > My user name is "aweiner".


    Hehe, Ok,


    >>
    >>
    >>> ========== head -15 /etc/hosts ===========
    >>> 127.0.0.1 alweiner.nowhere.invalid alweiner localhost
    >>> 192.168.1.1 gateway
    >>> 192.168.1.150 alweiner.invalid alweiner

    >>
    >> Frap, gotta love those gui tools and you need to pay attention to details.

    >
    > What gui tool are you referring to?


    Assumed you used the network gui which has a tab to manage host/domain.
    They have the bad habit of putting the node name in the 127.0.0.1 line.

    > I edited the file with Kedit.


    Then you did not follow the example given.

    >> If you will notice you did not get the .150 line correct.
    >> Pop test, what is wrong with the 192.168.1.150 line?
    >>

    >
    >> You lucked out because of the gui help.
    >> Your nodename in /etc/sysconfig/network should match the one in /etc/hosts.
    >>
    >>
    >> READ MY LIPS, you are to delete contents of /etc/hosts
    >> cut the following and paste them into your hosts file.
    >>
    >> 127.0.0.1 localhost.localdomain localhost
    >> 192.168.1.150 alweiner.nowhere.invalid alweiner
    >> ::1 localhost6.localdomain6 localhost6

    >
    > What's wrong with just fixing the FQDN of 192.168.1.150? I don't
    > understand that third line.


    It is there in case you enable IPv6; it is localhost (127.0.0.1) in
    IPv6 format.

    >> When you modify a config file, you should always recheck your work by using
    >> cat fn_here and double check values.
    >>

    > what is "fn_here"?


    Dang, there goes all your Gold stars and Atta Boys, :-(

    I want you to do a
    cat /whatever/file/you/just/modified/displayed_on_the_screen_so_you_can_check_it

    so you can make sure contents are correct and you have a trailing
    carriage return.

    >> Except for the prompt, you should see something on your screen as follows:
    >>
    >> [root@alweiner ~]# cat /etc/hosts
    >> 127.0.0.1 localhost.localdomain localhost
    >> 192.168.1.150 alweiner.nowhere.invalid alweiner
    >> ::1 localhost6.localdomain6 localhost6
    >> [root@alweiner ~]#
    >>
    >> I had expected the gui to add the ::1 line. So much for assuming.
    >>

    > Again, I used Kedit.


    And you missed the point. I wanted the cat /etc/hosts to look like

    [root@alweiner ~]# cat /etc/hosts
    127.0.0.1 localhost.localdomain localhost
    192.168.1.150 alweiner.nowhere.invalid alweiner
    ::1 localhost6.localdomain6 localhost6
    [root@alweiner ~]#


    not like
    [root@alweiner ~]# cat /etc/hosts
    127.0.0.1 localhost.localdomain localhost
    192.168.1.150 alweiner.invalid alweiner
    ::1 localhost6.localdomain6 localhost6
    [root@alweiner ~]#


    and not like
    [root@alweiner ~]# cat /etc/hosts
    127.0.0.1 localhost.localdomain localhost
    192.168.1.150 alweiner.nowhere.invalid alweiner
    ::1 localhost6.localdomain6 localhost6[root@alweiner ~]#
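
The point about the node name in /etc/sysconfig/network matching /etc/hosts can be checked mechanically. The following is only a sketch: the function name `hostname_listed` is hypothetical, and it assumes the `HOSTNAME=` line format shown in the config dump.

```shell
#!/bin/sh
# Hypothetical check: does the HOSTNAME= value from a sysconfig-style file
# appear as a name on some non-comment line of a hosts-style file?
# Usage: hostname_listed NETWORK_FILE HOSTS_FILE
hostname_listed() {
    net_file=$1
    hosts_file=$2
    # Take the value after HOSTNAME= (first match only).
    name=$(sed -n 's/^HOSTNAME=//p' "$net_file" | head -n 1)
    [ -n "$name" ] || return 2
    # Compare against every name field (field 2 onward) of each entry.
    awk -v want="$name" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$hosts_file"
}
```

Possible use: `hostname_listed /etc/sysconfig/network /etc/hosts || echo "hostname not in /etc/hosts"`. With the 192.168.1.150 alweiner.invalid line quoted above, the check would fail, because the FQDN is alweiner.nowhere.invalid.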

  7. Re: Troubleshooting connection loss (continued)

    On Sun, 11 Nov 2007 21:36:23 GMT, Allen Weiner wrote:
    >
    > I don't know if this is relevant to your diagnosis of my connection-loss
    > problem. Most of the time, I only run WinME on Saturdays. The Fedora
    > connection-loss problem happens (apparently) randomly throughout the week.


    Ah Frap. Nothing like spending hours troubleshooting the wrong problem.
    Ok, final SWAG. Your router loses its mind every once in a while.

    Why, you ask. If fedora runs with dhcp ip for more than a day, you know
    it was able to renew/rebind the lease and keep the connection.
    Your ifconfig showed no errors/dropped/overruns/frame tx/rx hardware problems.

    As a matter of fact, while editing that big long reply about 2 to 3
    replies back, my modem lost its mind and the post failed.
    LEDs normal.

    Tried pinging yahoo.com: failed. Did a service network restart: still failed.
    Pinged the modem: worked. What the F? First time I have had this problem since
    getting FiOS.
    Clicked the router web page: hangs. Dang. Power cycled the modem and it worked.

    Stupid, stupid, stupid. Should have pinged yahoo.com ip first to
    see if it was modem dns problem. Maybe dns server(s) in modem were AFU.
    Should have pinged them. Guess I'll write a little script to
    troubleshoot the problem.
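
A rough sketch of the "ping the IP first, then the name" idea from the paragraph above. The function name `classify_outage` and its output labels are hypothetical; the `ping -c 1 -w 3` form is the one used by the ck_connection script later in this thread, and the address and name are supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper: ping a known address first, then a name.
# Prints "no-route" when the address is unreachable, "dns" when only the
# name fails, and "ok" when both answer.
# Usage: classify_outage IP_ADDRESS HOST_NAME
classify_outage() {
    ip=$1
    name=$2
    if ! ping -c 1 -w 3 "$ip" > /dev/null 2>&1; then
        echo "no-route"
    elif ! ping -c 1 -w 3 "$name" > /dev/null 2>&1; then
        echo "dns"
    else
        echo "ok"
    fi
}
```

If the address answers but the name does not, the suspect list shrinks to the nameserver, which is the case a fallback nameserver line is meant to cover.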

    Adding fallback dns nameserver to my resolv.conf as I type.

    >> I think I might have to poke him a little harder and ask him to read
    >> http://www.catb.org/~esr/faqs/smart-questions.html

    >
    > Is this about the point you made about me not snipping enough when I
    > reply, or something else?


    Hehehe, well that is in there also, but mostly about you thinking about the
    question(s) you ask. :-)

    Every time you had to say "What I meant was,...." should tell you where
    your wheel ran off. :-D

  8. Re: Troubleshooting connection loss (continued)

    Bit Twister wrote:
    > On Sun, 11 Nov 2007 21:36:23 GMT, Allen Weiner wrote:
    >> I don't know if this is relevant to your diagnosis of my connection-loss
    >> problem. Most of the time, I only run WinME on Saturdays. The Fedora
    >> connection-loss problem happens (apparently) randomly throughout the week.

    >
    > Ah Frap. Nothing like spending hours troubleshooting the wrong problem.
    > Ok, final SWAG. Your router loses its mind every once in a while.
    >

    Given this new theory, what troubleshooting steps would you recommend
    the next time I get a connection loss and "service network restart" hangs?

    I had another connection loss this afternoon.

    Following is troubleshooting info plus recent online history.

    11/10 Booted WinME with dynamic IP at approx 3:00 PM. Modem was not
    powered on. Powered modem on at 8:00 PM, then rebooted into Fedora
    (static IP). Changed Fedora static IP address and hostname.

    11/11 Booted WinME with dynamic IP at 6:40 AM. Modem was not powered on.
    Configured WinME to static IP. Modem powered on around 7:00 AM. Rebooted
    into Fedora. System powered off at 8:40 AM.

    11/11 Booted Fedora at 3:00 PM.

    5:18 PM Connection loss (approx 2 hours into session, as with many
    other instances)
    5:22 PM Powered off modem and disconnected ethernet cable.
    5:29 PM Reconnected ethernet cable and powered up modem.
    5:35 PM ran Bit Twister script (dumps configuration info)
    5:38 PM Issued "service network restart", which hung.

    (I did not try ethtool -r eth0)

    Following is troubleshooting data (taken after connection loss but
    before "service network restart"):

    Sun Nov 11 17:35:59 EST 2007
    ======== cat /etc/fedora-release ==========
    Fedora release 7 (Moonshine)
    ======== cat /etc/redhat-release ==========
    Fedora release 7 (Moonshine)
    ======== uname -rvi =============
    2.6.23.1-21.fc7 #1 SMP Thu Nov 1 21:09:24 EDT 2007 i386
    ======== lsb_release -a ==========
    LSB Version:
    :core-3.1-ia32:core-3.1-noarch:graphics-3.1-ia32:graphics-3.1-noarch
    Distributor ID: Fedora
    Description: Fedora release 7 (Moonshine)
    Release: 7
    Codename: Moonshine

    ======== free ==========
                 total       used       free     shared    buffers     cached
    Mem:        125128     122408       2720          0       2464      37924
    -/+ buffers/cache:      82020      43108
    Swap:       771080     173584     597496


    Default run level is 5

    ======== chkconfig --list ==========
    Double check if /avahi/ needs to be disabled on boot
    avahi-daemon 0:off 1:off 2:off 3:on 4:on 5:off 6:off
    avahi-dnsconfd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    Double check if /named/ needs to be disabled on boot
    named 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    ConsoleKit 0:off 1:off 2:off 3:on 4:on 5:on 6:off
    NetworkManager 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    NetworkManagerDispatcher 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    acpid 0:off 1:off 2:off 3:on 4:on 5:on 6:off
    anacron 0:off 1:off 2:on 3:on 4:on 5:on 6:off
    apmd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
    atd 0:off 1:off 2:off 3:on 4:on 5:on 6:off
    autofs 0:off 1:off 2:off 3:on 4:on 5:on 6:off
    avahi-daemon 0:off 1:off 2:off 3:on 4:on 5:off 6:off
    avahi-dnsconfd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    bluetooth 0:off 1:off 2:on 3:on 4:on 5:off 6:off
    capi 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    cpuspeed 0:off 1:on 2:on 3:on 4:on 5:off 6:off
    crond 0:off 1:off 2:on 3:on 4:on 5:on 6:off
    cups 0:off 1:off 2:on 3:on 4:on 5:off 6:off
    dhcdbd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    dund 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    firestarter 0:off 1:off 2:on 3:on 4:on 5:on 6:off
    firstboot 0:off 1:off 2:off 3:on 4:off 5:off 6:off
    gkrellmd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    gpm 0:off 1:off 2:on 3:on 4:on 5:on 6:off
    haldaemon 0:off 1:off 2:off 3:on 4:on 5:on 6:off
    hddtemp 0:off 1:off 2:off 3:off 4:off 5:on 6:off
    hidd 0:off 1:off 2:on 3:on 4:on 5:off 6:off
    hplip 0:off 1:off 2:on 3:on 4:on 5:off 6:off
    httpd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    ip6tables 0:off 1:off 2:on 3:on 4:on 5:off 6:off
    iptables 0:off 1:off 2:off 3:off 4:off 5:on 6:off
    irda 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    irqbalance 0:off 1:off 2:on 3:on 4:on 5:off 6:off
    isdn 0:off 1:off 2:on 3:on 4:on 5:off 6:off
    kdump 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    kudzu 0:off 1:off 2:off 3:on 4:on 5:on 6:off
    lisa 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    lm_sensors 0:off 1:off 2:on 3:on 4:on 5:off 6:off
    mcstrans 0:off 1:off 2:on 3:on 4:on 5:on 6:off
    mdmonitor 0:off 1:off 2:on 3:on 4:on 5:off 6:off
    messagebus 0:off 1:off 2:off 3:on 4:on 5:on 6:off
    named 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    nasd 0:off 1:off 2:off 3:on 4:on 5:on 6:off
    netconsole 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    netfs 0:off 1:off 2:off 3:on 4:on 5:off 6:off
    netplugd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    network 0:off 1:off 2:on 3:on 4:on 5:on 6:off
    nfs 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    nfslock 0:off 1:off 2:off 3:on 4:on 5:off 6:off
    nscd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    ntpd 0:off 1:off 2:off 3:off 4:off 5:on 6:off
    pand 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    psacct 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    rdisc 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    readahead_early 0:off 1:off 2:on 3:on 4:on 5:on 6:off
    readahead_later 0:off 1:off 2:off 3:off 4:off 5:on 6:off
    restorecond 0:off 1:off 2:on 3:on 4:on 5:on 6:off
    rpcbind 0:off 1:off 2:off 3:on 4:on 5:off 6:off
    rpcgssd 0:off 1:off 2:off 3:on 4:on 5:off 6:off
    rpcidmapd 0:off 1:off 2:off 3:on 4:on 5:off 6:off
    rpcsvcgssd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    saslauthd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    sendmail 0:off 1:off 2:on 3:on 4:on 5:on 6:off
    smartd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
    spamassassin 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    sshd 0:off 1:off 2:on 3:on 4:on 5:off 6:off
    syslog 0:off 1:off 2:on 3:on 4:on 5:on 6:off
    tomcat5 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    vncserver 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    winbind 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    wpa_supplicant 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    xfs 0:off 1:off 2:on 3:on 4:on 5:on 6:off
    ypbind 0:off 1:off 2:off 3:off 4:off 5:off 6:off
    yum-updatesd 0:off 1:off 2:off 3:on 4:on 5:on 6:off
    ======== grep hosts: /etc/nsswitch.conf ==========
    #hosts: db files nisplus nis dns
    hosts: files dns
    ======== grep -v '^#' /etc/resolv.conf ==========
    ; generated by /sbin/dhclient-script
    search myhome.westell.com
    nameserver 192.168.1.1
    nameserver 192.168.1.1
    ======== hostname ==========
    alweiner.nowhere.invalid
    ======== grep eth /etc/mod*.conf ==========
    alias eth0 e100
    ======== grep -v '^#' /etc/host.conf ==========
    order hosts,bind
    ================ ifconfig -a ==============
    eth0 Link encap:Ethernet HWaddr 00:07:E9:01:B2:09
    inet addr:192.168.1.150 Bcast:192.168.1.255 Mask:255.255.255.0
    inet6 addr: fe80::207:e9ff:fe01:b209/64 Scope:Link
    UP BROADCAST MULTICAST MTU:1500 Metric:1
    RX packets:4014 errors:0 dropped:0 overruns:0 frame:0
    TX packets:1942 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:1000
    RX bytes:498352 (486.6 KiB) TX bytes:208165 (203.2 KiB)

    lo Link encap:Local Loopback
    inet addr:127.0.0.1 Mask:255.0.0.0
    inet6 addr: ::1/128 Scope:Host
    UP LOOPBACK RUNNING MTU:16436 Metric:1
    RX packets:4144 errors:0 dropped:0 overruns:0 frame:0
    TX packets:4144 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:0
    RX bytes:5700347 (5.4 MiB) TX bytes:5700347 (5.4 MiB)

    ============== route -n =================
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
    169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
    0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 eth0
    ======== cat /etc/sysconfig/network ==========
    NETWORKING=yes
    HOSTNAME=alweiner.nowhere.invalid
    ========== head -15 /etc/hosts ===========
    127.0.0.1 alweiner.nowhere.invalid alweiner localhost
    192.168.1.1 gateway
    192.168.1.150 alweiner.nowhere.invalid alweiner
    ======== ethtool eth0 ==========
    Settings for eth0:
    Supported ports: [ TP MII ]
    Supported link modes: 10baseT/Half 10baseT/Full
    100baseT/Half 100baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes: 10baseT/Half 10baseT/Full
    100baseT/Half 100baseT/Full
    Advertised auto-negotiation: Yes
    Speed: 100Mb/s
    Duplex: Full
    Port: MII
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: g
    Wake-on: g
    Current message level: 0x00000007 (7)
    Link detected: yes
    === dmesg | grep eth0 | grep -v SRC= ===
    e100: eth0: e100_probe: addr 0xfc9ff000, irq 11, MAC addr 00:07:E9:01:B2:09
    ADDRCONF(NETDEV_UP): eth0: link is not ready
    e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
    ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
    eth0: no IPv6 routers present
    NETDEV WATCHDOG: eth0: transmit timed out
    === grep eth0 /var/log/messages | tail -10 ===
    Nov 11 17:14:06 alweiner kernel: Inbound IN=eth0 OUT=
    MAC=00:07:e9:01:b2:09:00:18:3a:53:f7:fb:08:00 SRC=192.168.1.1
    DST=192.168.1.150 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3767 DF PROTO=TCP
    SPT=1197 DPT=80 WINDOW=8192 RES=0x00 SYN URGP=0
    Nov 11 17:14:11 alweiner kernel: Inbound IN=eth0 OUT=
    MAC=00:07:e9:01:b2:09:00:18:3a:53:f7:fb:08:00 SRC=192.168.1.1
    DST=192.168.1.150 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3768 DF PROTO=TCP
    SPT=1197 DPT=80 WINDOW=8192 RES=0x00 SYN URGP=0
    Nov 11 17:14:35 alweiner kernel: Inbound IN=eth0 OUT=
    MAC=00:07:e9:01:b2:09:00:18:3a:53:f7:fb:08:00 SRC=192.168.1.1
    DST=192.168.1.150 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3769 DF PROTO=TCP
    SPT=1197 DPT=80 WINDOW=8192 RES=0x00 SYN URGP=0
    Nov 11 17:14:54 alweiner kernel: Inbound IN=eth0 OUT=
    MAC=00:07:e9:01:b2:09:00:18:3a:53:f7:fb:08:00 SRC=192.168.1.1
    DST=192.168.1.150 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3786 DF PROTO=TCP
    SPT=1198 DPT=80 WINDOW=8192 RES=0x00 SYN URGP=0
    Nov 11 17:14:59 alweiner kernel: Inbound IN=eth0 OUT=
    MAC=00:07:e9:01:b2:09:00:18:3a:53:f7:fb:08:00 SRC=192.168.1.1
    DST=192.168.1.150 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3787 DF PROTO=TCP
    SPT=1198 DPT=80 WINDOW=8192 RES=0x00 SYN URGP=0
    Nov 11 17:15:23 alweiner kernel: Inbound IN=eth0 OUT=
    MAC=00:07:e9:01:b2:09:00:18:3a:53:f7:fb:08:00 SRC=192.168.1.1
    DST=192.168.1.150 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3788 DF PROTO=TCP
    SPT=1198 DPT=80 WINDOW=8192 RES=0x00 SYN URGP=0
    Nov 11 17:15:42 alweiner kernel: Inbound IN=eth0 OUT=
    MAC=00:07:e9:01:b2:09:00:18:3a:53:f7:fb:08:00 SRC=192.168.1.1
    DST=192.168.1.150 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3805 DF PROTO=TCP
    SPT=1199 DPT=80 WINDOW=8192 RES=0x00 SYN URGP=0
    Nov 11 17:15:47 alweiner kernel: Inbound IN=eth0 OUT=
    MAC=00:07:e9:01:b2:09:00:18:3a:53:f7:fb:08:00 SRC=192.168.1.1
    DST=192.168.1.150 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3806 DF PROTO=TCP
    SPT=1199 DPT=80 WINDOW=8192 RES=0x00 SYN URGP=0
    Nov 11 17:16:11 alweiner kernel: Inbound IN=eth0 OUT=
    MAC=00:07:e9:01:b2:09:00:18:3a:53:f7:fb:08:00 SRC=192.168.1.1
    DST=192.168.1.150 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3807 DF PROTO=TCP
    SPT=1199 DPT=80 WINDOW=8192 RES=0x00 SYN URGP=0
    Nov 11 17:32:05 alweiner kernel: NETDEV WATCHDOG: eth0: transmit timed out
    ======== cat /etc/sysconfig/network-scripts/ifcfg-eth0 ==========
    # Intel Corporation 82557/8/9 [Ethernet Pro 100]
    DEVICE=eth0
    ONBOOT=yes
    BOOTPROTO=none
    HWADDR=00:07:e9:01:b2:09
    TYPE=Ethernet
    USERCTL=yes
    IPV6INIT=no
    PEERDNS=yes
    NETMASK=255.255.255.0
    IPADDR=192.168.1.150
    GATEWAY=192.168.1.1
    ======== tail -18 /var/lib/dhclient/dhclient-eth0.leases ==========
    rebind 3 2007/11/7 12:23:43;
    expire 3 2007/11/7 15:23:43;
    }
    lease {
    interface "eth0";
    fixed-address 192.168.1.47;
    option subnet-mask 255.255.255.0;
    option routers 192.168.1.1;
    option dhcp-lease-time 86400;
    option dhcp-message-type 5;
    option domain-name-servers 192.168.1.1,192.168.1.1;
    option dhcp-server-identifier 192.168.1.1;
    option broadcast-address 255.255.255.255;
    option domain-name "myhome.westell.com";
    renew 3 2007/11/7 05:23:24;
    rebind 3 2007/11/7 15:31:25;
    expire 3 2007/11/7 18:31:25;
    }
    === dmesg | grep eth1 | grep -v SRC= ===
    === grep eth1 /var/log/messages | tail -10 ===
    === dmesg | grep eth2 | grep -v SRC= ===
    === grep eth2 /var/log/messages | tail -10 ===
    ======== grep -v '^#' /etc/hosts.allow ==========

    ======== grep -v '^#' /etc/hosts.deny ==========

    ==== end of config/network data dump =======




  9. Re: Troubleshooting connection loss (continued)

    On Mon, 12 Nov 2007 00:57:23 GMT, Allen Weiner wrote:

    Allen, take note, there are no smiley faces/emoticons in this post.

    Read this whole reply before doing any changes.
    Respond to all my questions.
    Make the change to /etc/hosts last, and reboot.

    > Bit Twister wrote:
    >>
    >> Ah Frap. Nothing like spending hours troubleshooting the wrong problem.
    >> Ok, final SWAG. Your router loses its mind every once in a while.
    >>

    > Given this new theory, what troubleshooting steps would you recommend
    > the next time I get a connection loss


    Reset the router.

    > and "service network restart" hangs.


    Fix /etc/resolv.conf as I will suggest again.
    Fix /etc/hosts as I suggest, yet again.
    Empty /var/lib/dhclient/dhclient-eth0.leases as I suggest.


    > 11/11 Booted WinME with dynamic IP at 6:40 AM. Modem was not powered on.
    > Configured WinME to static IP. Modem powered on around 7:00 AM. Rebooted
    > into Fedora. System powered off at 8:40 AM.


    Hmmm, new data, "Modem powered on" and "System powered off"

    > 11/11 Booted Fedora at 3:00 PM.
    > 5:18 PM Connection loss (approx 2 hours into session, as with many
    > other instances)


    Are the majority of the disconnects happening "approximately 2 hours"
    after the modem is powered up?

    > 5:22 PM Powered off modem and disconnected ethernet cable.
    > 5:29 PM Reconnected ethernet cable and powered up modem.


    You can discontinue that step. I was hoping the cable dis/connect
    would reset the modem's dhcp lease.

    > 5:35 PM ran Bit Twister script (dumps configuration info)
    > 5:38 PM Issued "service network restart", which hung.
    >
    > (I did not try ethtool -r eth0)


    That would indicate if modem and node did the handshake nic to nic.
    If link not OK that can cause a hang.
    Cannot rule that out yet because SOMEONE is not setting resolv.conf
    hosts, leases as requested.
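
A small sketch of the nic-to-nic link check mentioned above. It reads `ethtool`-style output on stdin, so it can be tried against saved output such as the dump earlier in this thread. The function name `link_state` is hypothetical; the "Link detected:" label is taken from that dump.

```shell
#!/bin/sh
# Hypothetical helper: read ethtool-style output on stdin and print the
# value of the "Link detected:" line, or "unknown" if no such line exists.
link_state() {
    awk -F': *' '
        /Link detected/ { state = $2 }
        END { print (state == "" ? "unknown" : state) }
    '
}
```

Possible use: `ethtool eth0 | link_state`. A result of "no" right after a connection loss would point at the cable, nic or modem port rather than at resolv.conf or hosts.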

    > Following is troubleshooting data (taken after connection loss but
    > before "service network restart":


    > ======== grep -v '^#' /etc/resolv.conf ==========
    > ; generated by /sbin/dhclient-script
    > search myhome.westell.com
    > nameserver 192.168.1.1
    > nameserver 192.168.1.1



    I really, really, really, really, really want you to do a
    echo "nameserver 192.168.1.1" > /etc/resolv.conf

    Hoping the ; is causing the restart hang and the SUGGESTION will fix
    your problem.

    Not doing the SUGGESTION, will force me to place you in my kill file.

    Do you know what a kill file is?


    > ========== head -15 /etc/hosts ===========
    > 127.0.0.1 alweiner.nowhere.invalid alweiner localhost
    > 192.168.1.1 gateway
    > 192.168.1.150 alweiner.nowhere.invalid alweiner



    For the last time. Change /etc/hosts to match the following:
    127.0.0.1 localhost
    192.168.1.1 gateway
    192.168.1.150 alweiner.nowhere.invalid alweiner


    That SUGGESTION may also clear up your restart hang.
    Having alweiner.nowhere.invalid resolve to both 127.0.0.1 and 192.168.1.150
    is not fair to the system and will make
    ping alweiner.nowhere.invalid hide where a problem exists when trying
    to debug connection problems.
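
To see the double mapping described above, a hosts file can be scanned for every address assigned to one name. This is only a sketch; `addresses_for` is a hypothetical helper that looks at the file itself, not at what the resolver actually returns.

```shell
#!/bin/sh
# Hypothetical helper: print every address a hosts-style file assigns
# to the given name (one address per matching line).
# Usage: addresses_for HOSTS_FILE NAME
addresses_for() {
    awk -v want="$2" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == want) { print $1; break } }
    ' "$1"
}
```

Possible use: `addresses_for /etc/hosts alweiner.nowhere.invalid`. More than one line of output means the name is listed under several addresses, as in the dump quoted above.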

    > ======== tail -18 /var/lib/dhclient/dhclient-eth0.leases ==========
    > rebind 3 2007/11/7 12:23:43;
    > expire 3 2007/11/7 15:23:43;
    > }
    > lease {
    > interface "eth0";
    > fixed-address 192.168.1.47;
    > option subnet-mask 255.255.255.0;
    > option routers 192.168.1.1;
    > option dhcp-lease-time 86400;
    > option dhcp-message-type 5;
    > option domain-name-servers 192.168.1.1,192.168.1.1;
    > option dhcp-server-identifier 192.168.1.1;
    > option broadcast-address 255.255.255.255;
    > option domain-name "myhome.westell.com";
    > renew 3 2007/11/7 05:23:24;
    > rebind 3 2007/11/7 15:31:25;
    > expire 3 2007/11/7 18:31:25;
    > }


    I would like for you to do a

    cp /dev/null /var/lib/dhclient/dhclient-eth0.leases

    I want to rule out that your dhcp server is no longer running.
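
If the old leases are worth keeping for later comparison, the file can be copied aside before it is emptied. A minimal sketch; `empty_with_backup` and the choice of backup directory are assumptions, not something requested in this thread.

```shell
#!/bin/sh
# Hypothetical helper: keep a dated copy of a file, then empty the original.
# Usage: empty_with_backup FILE BACKUP_DIR
empty_with_backup() {
    file=$1
    backup_dir=$2
    stamp=$(date '+%Y%m%d-%H%M%S')
    # cp -p preserves mode and timestamps on the saved copy.
    cp -p "$file" "$backup_dir/$(basename "$file").$stamp" || return 1
    # Truncate in place; same end result as cp /dev/null "$file".
    : > "$file"
}
```

Possible use: `empty_with_backup /var/lib/dhclient/dhclient-eth0.leases /root`.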

    The following is MANDATORY, Do a
    ls /etc/sysconfig/network-scripts/*~

    If you get any file names returned, you NEED to delete them.

    I DO NOT want any edit backup files (*~) in that directory.
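
The `ls` above prints an error when nothing matches. A sketch of a quieter variant using `find`; the function name `list_editor_backups` is hypothetical, and `-maxdepth` assumes GNU or BSD find.

```shell
#!/bin/sh
# Hypothetical helper: list editor backup files (names ending in ~)
# directly inside the given directory. Prints nothing when there are none.
list_editor_backups() {
    # -maxdepth 1 keeps the search to the directory itself.
    find "$1" -maxdepth 1 -type f -name '*~'
}
```

Possible use: `list_editor_backups /etc/sysconfig/network-scripts`. Empty output means the directory is clean.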

  10. Re: Troubleshooting connection loss (continued)

    Bit Twister wrote:

    > Are the majority of the disconnects happening "approximately 2 hours"
    > after the modem is powered up?


    It seems that way. But I haven't kept a log book.

    >
    >> ======== grep -v '^#' /etc/resolv.conf ==========
    >> ; generated by /sbin/dhclient-script
    >> search myhome.westell.com
    >> nameserver 192.168.1.1
    >> nameserver 192.168.1.1

    >
    >
    > I really, really, really, really, really want you to do a
    > echo "nameserver 192.168.1.1" > /etc/resolv.conf
    >
    > Hoping the ; is causing the restart hang and the SUGGESTION will fix
    > your problem.
    >
    > Not doing the SUGGESTION, will force me to place you in my kill file.
    >

    That's your choice. I did a Google search on resolv.conf & generated. I
    saw several examples similar to mine. Here's one:

    At 11:30 AM 12/30/2005, Jerry57 (GMail) wrote:
    >Hello Robert,
    >
    > What is listed in /etc/resolv.conf? You should have something like:
    > search my.domain
    > nameserver 10.0.0.1


    I got that:

    cat resolv.conf
    ; generated by /sbin/dhclient-script
    search htt-consult.com
    nameserver 65.84.78.211
    nameserver 65.84.78.209

    So, I doubt that that strange first line with the leading semicolon is
    causing a problem. If you choose to "plonk" me, let me take this
    opportunity to again thank you for all the help you've given me.
    >
    >



    >
    > The following is MANDATORY, Do a
    > ls /etc/sysconfig/network-scripts/*~
    >

    Result was "no such file or directory". There are no backup files in
    network-scripts.

  11. Re: Troubleshooting connection loss (continued)

    On Mon, 12 Nov 2007 04:46:20 GMT, Allen Weiner wrote:
    > Bit Twister wrote:
    >
    >> Are the majority of the disconnects happening "approximately 2 hours"
    >> after the modem is powered up?

    >
    > It seems that way. But I haven't kept a log book.


    My guess is there might be a loose connection inside the modem.
    You power up; about 2 hours later, heat causes the problem. A little while
    later the heat makes the connection go back together.
    Imagine a loose solder connection on a pin. Sorry for the bad graphics.

    cold connection    (*  ) works
    warm connection    ( * ) breaks
    warmer connection  (  *) working again

    connection in this context is physical connection.
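
Since no log book has been kept, the "about 2 hours after power up" pattern could be checked with a timestamped log. The following is only a sketch: `log_link`, the log file path and the target address are all assumptions, and it would need to be run regularly (for example from cron or a sleep loop).

```shell
#!/bin/sh
# Hypothetical helper: append one timestamped up/down line per call.
# Usage: log_link LOGFILE TARGET
log_link() {
    logfile=$1
    target=$2
    if ping -c 1 -w 3 "$target" > /dev/null 2>&1; then
        state=up
    else
        state=down
    fi
    # Line format: date, time, target, state.
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$target" "$state" >> "$logfile"
}
```

Possible use: `while :; do log_link /root/link.log 192.168.1.1; sleep 60; done`. Comparing the first "down" line against the time the modem was powered on would support or rule out the heat theory.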

    > So, I doubt that that strange first line with the leading semicolon is
    > causing a problem.


    Well I am happy, you have learned all you need to know.
    Guess we are done.

    Here is a present to play with.
    --------------- script starts below this line ----------------
    #!/bin/bash
    #*****************************************************************
    #*
    #* ck_connection - Check internet connection.
    #*
    #*
    #* Install procedure:
    #* Save into a file named ck_connection
    #* actual location should be somewhere in $PATH
    #* chmod +x ck_connection
    #*
    #*
    #* Code walks through the png array to test each point
    #* in the path to/through the internet. DNS servers are also tested.
    #*
    #* You will need to modify the script to use system's gateway
    #* and insert the ISP's gateway value.
    #*
    #* You may have to get into the modem's web page to find
    #* the modem's gateway (ISP's gateway) for the modem.
    #*
    #* Depending on your distribution, the $(hostname -s) and
    #* $(hostname) may need changing.
    #*
    #* On Mandriva linux hostname returns the FQDN and
    #* hostname -s returns the short name for the node.
    #*
    #*****************************************************************

    function net_info {
    cat <<EOF
    There are settings which define where and what for DNS search order.
    In the following, I'll give commands, results and maybe comments.
    The command line starts with a $ so you can tell it from results and
    my comments. You do not use the leading $ when you run the command.

    You can get more help about the command with
    man first_word_here
    Example: you would do a man grep to get grep command manual.

    The commands and example values follow:

    $ grep hosts: /etc/nsswitch.conf
    hosts: files dns nis

    For speed, mine has
    hosts: files dns

    $ grep -v '^#' /etc/host.conf
    order hosts,bind
    multi on
    nospoof on
    spoofalert on

    $ grep -v '^#' /etc/resolv.conf
    nameserver 192.168.0.0
    nameserver 0.238.0.12
    nameserver 0.203.0.86

    For speed improvements, I always remove any search or domain lines.
    Do not use the above numbers on your system. They are examples only.
    If a nameserver fails to return anything, the next server is tried.
    Because of that, I like to have the last server to be my ISP's public DNS

    For routing check, there is
    $ route -n
    Kernel IP routing table
    Destination Gateway Genmask Flags Metric Ref Use Iface
    192.168.1.0 0.0.0.0 255.255.255.0 U 10 0 0 eth0
    0.0.0.0 192.168.1.1 0.0.0.0 UG 10 0 0 eth0

    In the above, UG in the Flags column indicates that line will be used
    as the default Gateway route to ip addresses that can not be routed
    via the lines above it.

    The ip address in the Gateway column is where that traffic is sent.
    If you can ping that address, you know that device is alive and
    packets are leaving your node.

    $ ifconfig
    will allow you to see the ip address assigned to your nic and allow you
    to check if you are getting unreasonable counts for errors, dropped,
    overruns, frame, carrier and collisions.

    If you want to check internet speeds to somewhere, Example:
    $ traceroute -n yahoo.com

    Some nodes drop those trace packets, so you may want to use
    $ traceroute -In yahoo.com

    For dns testing there is something like
    $ dig google.com @isp_name_server1

    You will get information about how isp_name_server1 performed
    when looking up google.com.

    EOF
    } # end net_info


    #********************************************
    #*
    #* The following are not actual checks
    #* The comment box is about what the ping value
    #* will be used to make what check/verification.
    #*
    #* You will need to make changes to match your setup.
    #* If you want to skip a test you either put
    #* 127.0.0.1 in the png[x] test to skip.
    #*
    #* Or you delete the png[] and msg[] lines,
    #* and renumber them to keep the numbers continuous
    #* through the png[12]="done" line.
    #*
    #* NOTE:
    #* The png[12]="done" line has to remain and
    #* must be the last one in the png array.
    #*
    #* When renumbering, check the msg[] text to verify
    #* if there is a png[] value used in the text.
    #*
    #* You will also have to fix the code which
    #* uses png[9].
    #*
    #********************************************


    #********************************************
    #* check ping works on the node
    #********************************************

    png[1]="127.0.0.1"
    msg[1]="$(hostname -s) problem,
    No idea where to look, I never had the problem
    "
    #********************************************
    #* check dns on my node
    #********************************************

    png[2]="localhost"
    msg[2]="Check $(hostname -s) /etc/hosts localhost line.
    I assume you have a line like
    127.0.0.1 localhost.localdomain localhost
    man hosts for more info"

    #********************************************
    #* check pinging my ip address works
    #********************************************

    png[3]="192.168.1.130"
    msg[3]="Check $(hostname -s) /etc/hosts $(hostname) ip addy.
    I assume you have a line like
    192.168.1.130 $(hostname) $(hostname -s)
    man hosts for more info"

    #********************************************
    #* check dns reads my /etc/hosts by full name
    #********************************************

    png[4]="$(hostname)"
    msg[4]="Check $(hostname -s) /etc/hosts $(hostname) line.
    I assume you have a line like
    192.168.1.130 $(hostname) $(hostname -s)
    man hosts for more info"

    #********************************************
    #* check dns reads my /etc/hosts by alias
    #********************************************

    png[5]="$(hostname -s)"
    msg[5]="Check $(hostname -s) /etc/hosts $(hostname) line for an alias.
    I assume you have a line like
    192.168.1.130 $(hostname) $(hostname -s)
    man hosts for more info"

    #********************************************
    #* check my gateway device is alive
    #********************************************

    png[6]="192.168.1.1"
    msg[6]="Check physical connection to next device to internet (gateway).
    run mii-tool -v eth0
    or ethtool eth0
    You are looking for link ok line
    or Link detected: yes depending on which tool used
    run route -n to verify you have a UG Flags line
    $(net_info)"

    #********************************************
    #* check my gateway alias in /etc/hosts
    #********************************************

    png[7]="router"
    msg[7]="Check $(hostname -s) /etc/hosts router line
    I assume you have a
    192.168.1.1 router line
    man hosts for more info
    $(net_info)"


    #********************************************
    #* check my ISP's gateway connected to router
    #********************************************

    png[8]="71.252.137.1"
    msg[8]="Check leds on internet device.
    poweroff internet device (adsl/cable modem)
    wait 30 seconds by watch/clock to let capacitors discharge
    and reset device
    power up, wait for leds to settle down
    run service network restart
    Leds not right, check wiring out to telephone pole
    call your ISP
    $(net_info)"


    #********************************************
    #* check if DNS server is alive
    #********************************************

    _dns_ip=9
    png[$_dns_ip]="192.168.1.1"
    msg[$_dns_ip]="Check $(hostname -s) /etc/resolv.conf nameserver line
    You will have to check the device which has the name server running.
    Your internet device (adsl/cable modem your dns server)
    If none of the above, ${png[$_dns_ip]} is down
    Work around, change nameserver ip_here to a public nameserver
    in /etc/resolv.conf
    man resolv.conf for more info
    $(net_info)"


    #********************************************
    #* check ISP can route to yahoo.com
    #********************************************

    png[10]="66.94.234.13"
    msg[10]="cannot ping yahoo by ip address
    yahoo.com is down or ip address changed.
    check google.com with ping -c1 72.14.207.99
    If that fails, google.com is down or ip address changed
    or it is an ISP/internet problem
    $(net_info)"


    #********************************************
    #* check DNS can resolve yahoo.com
    #********************************************

    png[11]="yahoo.com"
    msg[11]="Cannot ping yahoo.com by name
    yahoo.com just went down, or dns is broke on your ISP or somewhere else.
    $(net_info)"


    png[12]="done"
    msg[12]="Last array element to tell while loop we are done pinging"

    #********************************************
    #* Actual testing starts here
    #********************************************

    #********************************************
    #* get the first dns server from /etc/resolv.conf
    #********************************************

    set -- $(grep nameserver /etc/resolv.conf | grep -v '^#' | head -1)
    _ip=$2
    if [ -z "$_ip" ] ; then
    echo "/etc/resolv.conf does not have a nameserver line.
    man resolv.conf
    for more information"
    exit 1
    else
    png[$_dns_ip]=$_ip
    fi

    #********************************************
    #* loop through all ip/name tests
    #********************************************


    i=1
    while [ "${png[$i]}" != "done" ] ; do
    echo "running ping -c 1 -w 3 ${png[$i]} "
    ping -c 1 -w 3 ${png[$i]} > /dev/null
    if [ $? -ne 0 ] ; then
    /bin/echo -e "\nFailure: ping -c 1 -w 3 ${png[$i]} "
    /bin/echo -e "${msg[$i]} "
    exit 1
    fi
    i=$(( i + 1 ))
    done

    #********************************************
    #* loop through all nameservers in /etc/resolv.conf
    #********************************************

    while read line
    do
    set -- $line
    _ip=$2
    if [ "$1" = "nameserver" ] ; then
    echo "running ping -c 1 -w 3 $_ip "
    ping -c 1 -w 3 $_ip > /dev/null
    if [ $? -ne 0 ] ; then
    /bin/echo -e "\nDNS nameserver Failure: ping -c 1 -w 3 $_ip "
    echo "nameserver $_ip in /etc/resolv.conf is not responding to pings."
    echo "$(net_info)"
    exit 1
    fi
    fi

    done < /etc/resolv.conf

    #********* end ck_connection **********************************
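    The ip/name loop above depends on the "done" sentinel and a hand-maintained counter. As a minimal sketch of an alternative, assuming bash indexed arrays, the loop can walk the indices that actually exist. The names run_checks, probe and PROBE_CMD are hypothetical and not part of ck_connection; PROBE_CMD exists only so the loop can be tried without touching the network.

```shell
#!/bin/bash
# Sketch: walk a table of ping targets without a "done" sentinel.
# run_checks, probe and PROBE_CMD are hypothetical names.
# The caller is expected to fill png[] and msg[] as in ck_connection.

# probe TARGET: return 0 if TARGET answers one ping within 3 seconds.
# If PROBE_CMD is set, that command is called instead (for dry runs).
probe() {
    if [ -n "$PROBE_CMD" ]; then
        "$PROBE_CMD" "$1"
    else
        ping -c 1 -w 3 "$1" > /dev/null 2>&1
    fi
}

# run_checks: test each png[] entry in ascending index order and stop at
# the first failure, printing the matching msg[] entry.
run_checks() {
    local i
    for i in "${!png[@]}"; do
        echo "running check $i: ${png[$i]}"
        if ! probe "${png[$i]}"; then
            echo "Failure: ${png[$i]}"
            echo "${msg[$i]}"
            return 1
        fi
    done
    return 0
}
```

    With this shape, deleting a test no longer requires renumbering the remaining entries, because gaps in the index sequence are simply skipped.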

  12. Re: Troubleshooting connection loss (continued)

    Allen Weiner wrote:
    >Bit Twister wrote:
    >>> ======== grep -v '^#' /etc/resolv.conf ==========
    >>> ; generated by /sbin/dhclient-script
    >>> search myhome.westell.com
    >>> nameserver 192.168.1.1
    >>> nameserver 192.168.1.1

    >> I really, really, really, really, really want you to
    >> do an echo "nameserver 192.168.1.1" > /etc/resolv.conf
    >> Hoping the ; is causing the restart hang and the
    >> SUGGESTION will fix
    >> your problem.
    >> Not doing the SUGGESTION, will force me to place you
    >> in my kill file.
    >>

    >That's your choice. I did a Google search on resolv.conf
    >& generated. I saw several examples similar to
    >mine. Here's one:


    Here's a better one... Download virtually any source code
    to libc, and look in the .../resolv/res_init.c file for
    this code:

    if ((fp = fopen(_PATH_RESCONF, "r")) != NULL) {
    /* read the config file */
    while (fgets_unlocked(buf, sizeof(buf), fp) != NULL) {
    /* skip comments */
    if (*buf == ';' || *buf == '#')
    continue;
    /* read default domain name */
    if (MATCH(buf, "domain")) {

    What that is doing is reading the /etc/resolv.conf file, and
    skipping any line that begins with either ';' or '#'.

    Personally, I would fault it for not initially removing all
    leading white space, but....
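    The same comment rule can be sketched in bash. This is a minimal illustration, not the resolver's actual behaviour: list_nameservers is a hypothetical helper name, and unlike the C code above, read strips leading white space before the check.

```shell
#!/bin/bash
# list_nameservers is a hypothetical helper name.
# Print the address from every "nameserver" line in a resolv.conf-style
# file, skipping lines whose first non-blank character is ';' or '#'.
list_nameservers() {
    local file=${1:-/etc/resolv.conf}
    local keyword addr rest
    while read -r keyword addr rest; do
        case $keyword in
            '#'*|';'*)
                continue ;;                 # comment line, ignore it
            nameserver)
                if [ -n "$addr" ]; then echo "$addr"; fi ;;
        esac
    done < "$file"
}
```

    Run against the resolv.conf quoted above, it would print 192.168.1.1 twice and ignore the "; generated by /sbin/dhclient-script" line.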

    --
    Floyd L. Davidson
    Ukpeagvik (Barrow, Alaska) floyd@apaflo.com

  13. Re: Troubleshooting connection loss (continued)

    Floyd L. Davidson wrote:
    > Allen Weiner wrote:
    >> Bit Twister wrote:
    >>>> ======== grep -v '^#' /etc/resolv.conf ==========
    >>>> ; generated by /sbin/dhclient-script
    >>>> search myhome.westell.com
    >>>> nameserver 192.168.1.1
    >>>> nameserver 192.168.1.1
    >>> I really, really, really, really, really want you to
    >>> do an echo "nameserver 192.168.1.1" > /etc/resolv.conf
    >>> Hoping the ; is causing the restart hang and the
    >>> SUGGESTION will fix
    >>> your problem.
    >>> Not doing the SUGGESTION, will force me to place you
    >>> in my kill file.
    >>>

    >> That's your choice. I did a Google search on resolv.conf
    >> & generated. I saw several examples similar to
    >> mine. Here's one:

    >
    > Here's a better one... Download virtually any source code
    > to libc, and look in the .../resolv/res_init.c file for
    > this code:
    >
    > if ((fp = fopen(_PATH_RESCONF, "r")) != NULL) {
    > /* read the config file */
    > while (fgets_unlocked(buf, sizeof(buf), fp) != NULL) {
    > /* skip comments */
    > if (*buf == ';' || *buf == '#')
    > continue;
    > /* read default domain name */
    > if (MATCH(buf, "domain")) {
    >
    > What that is doing is reading the /etc/resolv.conf file, and
    > skipping any line that begins with either ';' or '#'.
    >
    > Personally, I would fault it for not initially removing all
    > leading white space, but....
    >

    Thanks very much Floyd for your reply. I'm a Linux novice and am a long
    way from having the savvy to do what you did.

    By the way, for many years I subscribed to comp.dcom.modems. I always
    found your posts highly informative. I'm really astounded by how much
    more function my small Westell DSL modem/router has than my old USR
    dial-up modem.

  14. Re: Troubleshooting connection loss (continued)

    Bit Twister wrote:
    > On Mon, 12 Nov 2007 04:46:20 GMT, Allen Weiner wrote:
    >> Bit Twister wrote:
    >>


    >
    > My guess is there might be a loose connection inside the modem.
    > You power up, about 2hrs later, heat causes the problem. A little while
    > later the heat makes the connection go back together.
    > Imagine a loose solder connection on a pin. Sorry for the bad graphics.
    >
    > cold connection (* ) works
    > warm connection ( * ) breaks
    > warmer connection ( *) working again
    >
    > connection in this context is physical connection.


    If that is the problem, the broken connection must be short-lived,
    because without fail, the moment I reboot, my Internet connection is
    restored.

    So let's assume there is a momentary connection loss. The next time it
    occurs, what troubleshooting steps can I perform to determine why
    "service network restart" hangs?

    We're saying the problem is local, so there is no point in trying to
    verify DNS, or ping outside servers.
    >
    >> So, I doubt that that strange first line with the leading semicolon is
    >> causing a problem.

    >
    > Well I am happy, you have learned all you need to know.
    > Guess we are done.
    >

    The post in this thread by Floyd Davidson should close the issue. We
    ought to be done pursuing the angle that there is a DHCP problem. What
    would be worthwhile to me is a troubleshooting procedure for the
    "service network restart" hang that is not predicated on a DHCP problem.


    > Here is a present to play with.


    Thanks. But that isn't applicable to diagnosing the hang of "service
    network restart".

  15. Re: Troubleshooting connection loss (continued)

    On Mon, 12 Nov 2007 18:46:26 GMT, Allen Weiner wrote:
    >
    > If that is the problem, the broken connection must be short-lived,
    > because without fail, the moment I reboot, my Internet connection is
    > restored.


    Hehe, think about it, router chip connection opens, software goes
    insane and quits working for your internet, sometime later you notice
    connection drop, start process of restart. Plenty of time for the
    metal to keep expanding to the other side of the hole. Those holes are
    pretty tight. Not to mention the chips that are just laid on the
    board and soldered.

    >
    > So let's assume there is a momentary connection loss. The next time it
    > occurs, what troubleshooting steps can I perform to determine why
    > "service network restart" hangs?


    You already know how to work out which component is not working.
    You refuse to do the three things I want done to rule out possible
    causes and get more information.

    It was bad enough to have to work under the hood of your car through
    the tail pipe, now that you have tied my hands, I can not help you
    with that problem. :-P


    > The post in this thread by Floyd Davidson should close the issue.


    Saw that and your reply. Had to laugh, you just got your feet wet with
    scripting in bash. Floyd's post showed the C or C++ (I forget which) code,
    which is another programming language if you want to drill that far
    down to learn what is going on.


    > We ought to be done pursuing the angle that there is a DHCP problem.


    I THINK so, but you will not let me rule that out. :-(

    > What would be worthwhile to me is a troubleshooting procedure for the
    > "service network restart" hang that is not predicated on a DHCP problem.


    Make my 3 SUGGESTIONS, and see if the problem goes away while in a
    static ip setup.


    >> Here is a present to play with.

    >
    > Thanks. But that isn't applicable to diagnosing the hang of "service
    > network restart".


    True, just a nice script to know what is not working next time connection drops.

    By the way, here is the latest one with info on more network
    troubleshooting commands; it prints out what is being tested at each point.
    Save/run in your user account. Does not require root privs to run.
    Run as is and I think it should fail on testing ISP gateway to modem.
    It will give the number of the array to modify with your modem's value.

    #!/bin/bash
    #*****************************************************************
    #*
    #* ck_connection - Check internet connection.
    #*
    #* Install procedure:
    #* Save into a file named ck_connection
    #* actual location should be somewhere in $PATH
    #* chmod +x ck_connection
    #*
    #*
    #* Code walks through the png array to test each point
    #* in the path to/through the internet. DNS servers are also tested.
    #*
    #* You will need to modify the script to use node's gateway
    #* ip in png[$_gate_loc], Usually your modem's ip.
    #* and insert the ISP's gateway value at png[8]
    #*
    #* You may have to get into the modem's web page to find
    #* the modem's gateway (ISP's gateway) for the modem.
    #* If you cannot find it, just change png[8] to 127.0.0.1
    #*
    #* Depending on your distribution, the $(hostname -s) and
    #* $(hostname) may need changing.
    #*
    #* On Mandriva linux hostname returns the FQDN and
    #* hostname -s returns the short name for the node.
    #*
    #*****************************************************************

    if [ $# -gt 0 ] ; then
    _arg2=$1
    fi

    function net_info {

    if [ -z "$_arg2" ] ; then
    echo "$0 hints will give you more research tools/info"
    return
    fi

    cat <<EOF
    Note: just because you can ping a server does not mean
    it is serving up what it is supposed to be serving.

    There are settings which define where and what DNS search order.
    In the following, I'll give commands, results and maybe comments. The
    command line starts with a $ so you can tell command line from results
    and my comments. You do not use the leading $ when you run the command.

    You can get more help about the command with
    man first_word_here
    Example: you would do a man grep to get grep command manual.

    The commands and example values follow:

    $ grep hosts: /etc/nsswitch.conf
    hosts: files dns nis

    For speed, mine has
    hosts: files dns

    $ grep -v '^#' /etc/host.conf
    order hosts,bind
    multi on
    nospoof on
    spoofalert on

    $ grep -v '^#' /etc/resolv.conf
    nameserver 192.168.0.0
    nameserver 0.238.0.12
    nameserver 0.203.0.86

    For speed improvements, I always remove any search or domain lines.
    Do not use the above numbers on your system. They are examples only.
    If a nameserver fails to return anything, the next server is tried.
    Because of that, I like to have the last server to be my ISP's public DNS

    For routing check, there is
    $ route -n
    Kernel IP routing table
    Destination Gateway Genmask Flags Metric Ref Use Iface
    192.168.1.0 0.0.0.0 255.255.255.0 U 10 0 0 eth0
    0.0.0.0 192.168.1.1 0.0.0.0 UG 10 0 0 eth0

    In the above, UG in the Flags column indicates that line will be used
    as the default Gateway route to ip addresses that can not be routed
    via the lines above it.

    The ip address in the Gateway column is where that traffic is sent.
    If you can ping that address, you know that device is alive and
    packets are leaving your node.

    $ ifconfig
    will allow you to see the ip address assigned to your nic and allow you
    to check if you are getting unreasonable counts for errors, dropped,
    overruns, frame, carrier and collisions.

    If you want to check internet speeds to somewhere, Example:
    $ traceroute -n yahoo.com

    Some nodes drop those trace packets, so you may want to use
    $ traceroute -In yahoo.com

    For dns testing there is something like
    $ dig google.com @isp_name_server1

    You will get information about how isp_name_server1 performed
    researching google.com lookup .

    EOF
    } # end net_info


    #********************************************
    #*
    #* You will need to make changes to match your setup.
    #* Read script header for details
    #* If you want to skip a test you either put
    #* 127.0.0.1 in the png[x] test to skip.
    #*
    #* Or you delete the png[], tst[] and msg[] lines,
    #* and renumber them to keep the numbers continuous
    #* through the png[12]="done" line.
    #*
    #* NOTE:
    #* The png[12]="done" line has to remain and
    #* must be the last one in the png array.
    #*
    #* When renumbering, check the msg[] text to verify
    #* if there is a png[] value used in the text.
    #*
    #********************************************


    png[1]="127.0.0.1"
    tst[1]="that ping is working on $(hostname -s) "
    msg[1]="$(hostname -s) problem,
    No idea where to look, I never had the problem
    "

    png[2]="localhost"
    tst[2]="that resolver reads /etc/hosts "
    msg[2]="Check $(hostname -s) /etc/hosts localhost line.
    I assume you have a line like
    127.0.0.1 localhost.localdomain localhost
    man hosts for more info"


    png[3]="192.168.1.130"
    tst[3]="nic access by ip address"
    msg[3]="Check $(hostname -s) /etc/hosts $(hostname) ip addy.
    I assume you have a line like
    192.168.1.130 $(hostname) $(hostname -s)
    man hosts for more info"


    png[4]="$(hostname)"
    tst[4]="that resolver reads /etc/hosts by full name "
    msg[4]="Check $(hostname -s) /etc/hosts $(hostname) line.
    I assume you have a line like
    192.168.1.130 $(hostname) $(hostname -s)
    man hosts for more info"


    png[5]="$(hostname -s)"
    tst[5]="that resolver reads /etc/hosts by alias "
    msg[5]="Check $(hostname -s) /etc/hosts $(hostname) line for an alias.
    I assume you have a line like
    192.168.1.130 $(hostname) $(hostname -s)
    man hosts for more info"

    #********************************************
    #* Script fills in the real value later.
    #********************************************

    _gate_loc=6
    png[$_gate_loc]="192.168.1.1"
    tst[$_gate_loc]="that $(hostname -s) gateway is alive "
    msg[$_gate_loc]="Check connection to next device to internet (gateway).
    run mii-tool -v eth0
    or ethtool eth0
    You are looking for link ok
    or Link detected: yes
    depending on which tool used. run
    route -n
    to verify you have a UG in the Flags column of the last line
    $(net_info)"


    png[7]="gateway"
    tst[7]="if gateway alias works via /etc/hosts "
    msg[7]="Check $(hostname -s) /etc/hosts gateway line
    I assume you have added a
    192.168.1.1 gateway
    line to /etc/hosts

    That lets you do a quick test by doing a
    ping -c1 gateway
    at a terminal
    man hosts for more info
    $(net_info)"

    #********************************************
    #* Look in modem's web page or dhcp leases file.
    #********************************************

    png[8]="71.252.137.1"
    tst[8]="modem talks to ISP gateway "
    msg[8]="Check leds on internet device.
    poweroff internet device (adsl/cable modem)
    wait 30 seconds by watch/clock to let capacitors discharge
    and reset device
    power up, wait for leds to settle down
    run service network restart
    Leds not right, check wiring out to telephone pole
    call your ISP
    $(net_info)"


    #********************************************
    #* Script fills in the real value from /etc/resolv.conf
    #********************************************

    _dns_loc=9
    png[$_dns_loc]="127.0.0.1"
    tst[$_dns_loc]="if DNS server is alive "
    msg[$_dns_loc]="Check $(hostname -s) /etc/resolv.conf nameserver line
    You will have to check the device which has the name server running.
    Your internet device (adsl/cable modem) may be your dns server.
    If none of the above, ${png[$_dns_loc]} is down
    Work around, change nameserver ip_here to a public nameserver
    in /etc/resolv.conf
    man resolv.conf for more info
    $(net_info)"


    png[10]="66.94.234.13"
    tst[10]="that ISP can route to yahoo.com "
    msg[10]="cannot ping yahoo by ip address
    yahoo.com is down or ip address changed.
    check google.com with ping -c1 72.14.207.99
    If that fails, google.com is down or ip address changed
    or it is an ISP/internet problem
    $(net_info)"


    png[11]="yahoo.com"
    tst[11]="ISP can get a DNS resolve yahoo.com"
    msg[11]="Cannot ping yahoo.com by name
    yahoo.com just went down, or dns is broke on your ISP or somewhere else.
    $(net_info)"


    png[12]="done"
    tst[12]="We never use this because png done is "
    msg[12]="last array element to tell while loop we are done pinging"

    #********************************************
    #*
    #* Actual testing starts here
    #*
    #********************************************

    tput clear
    #********************************************
    #* get/save the first dns server from /etc/resolv.conf
    #********************************************

    set -- $(grep nameserver /etc/resolv.conf | grep -v '^#' | head -1)
    _ip=$2
    if [ -z "$_ip" ] ; then
    echo "/etc/resolv.conf does not have a nameserver line.
    man resolv.conf
    for more information
    If using dhcp, resolv.conf is updated by contents of leases file,
    depending on which dhcp client being used.
    locate leases | grep var/
    should find it.
    I assume you have mlocate or slocate installed so you can use the
    locate command.

    Going to use ${png[$_dns_loc]} to make test run farther to
    help find the failure.

    Press any key to continue
    "
    read -n 1
    else
    png[$_dns_loc]=$_ip
    fi

    #********************************************
    #* get/save the gateway ip address
    #********************************************

    set -- $(route -n | grep 'UG' | tail -1)
    _ip=$2
    if [ -z "$_ip" ] ; then
    echo "no default gateway line found in
    route -n
    results. Expected to see last line something like
    0.0.0.0 192.168.1.1 0.0.0.0 UG 10 0 0 eth0
    that UG line is missing which can be because the network did not
    come up correctly. Usually a dhcp access problem.
    using ${png[$_gate_loc]}

    Press any key to continue
    "
    read -n 1
    else
    png[$_gate_loc]=$_ip
    fi

    #********************************************
    #* loop through all ip/name tests
    #********************************************


    i=1
    while [ "${png[$i]}" != "done" ] ; do
    echo "$i Test ${tst[$i]}"
    ping -c 1 -w 3 ${png[$i]} > /dev/null
    if [ $? -ne 0 ] ; then
    /bin/echo -e "\nFailure: ping -c 1 -w 3 ${png[$i]} "
    /bin/echo -e "${msg[$i]} "
    exit 1
    fi
    i=$(( $i + 1 ))
    done

    #********************************************
    #* loop through all nameservers in /etc/resolv.conf
    #********************************************

    while read line
    do
    set -- $line
    _ip=$2
    if [ "$1" = "nameserver" ] ; then
    echo "Test /etc/resolv.conf nameserver $_ip is alive"
    ping -c 1 -w 3 $_ip > /dev/null
    if [ $? -ne 0 ] ; then
    /bin/echo -e "\nDNS nameserver Failure: ping -c 1 -w 3 $_ip "
    echo "nameserver $_ip in /etc/resolv.conf is not responding to pings."
    echo "$(net_info)"
    exit 1
    fi
    fi

    done < /etc/resolv.conf

    echo " "
    echo "Basic network connectivity is working to yahoo.com"
    echo " "

    #********* end ck_connection **********************************
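    The gateway lookup in the script takes the last line of route -n that contains UG, which would also match a static route through another gateway if one happened to be listed last. A minimal sketch of a stricter match, assuming the net-tools route -n column layout shown in net_info; default_gateway is a hypothetical helper name, not part of the script.

```shell
#!/bin/bash
# default_gateway is a hypothetical helper name.
# Read "route -n" style output on stdin and print the gateway of the
# default route: destination 0.0.0.0 with the G flag set.
default_gateway() {
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2; exit }'
}

# Typical use inside the script (prints nothing if no default route):
#   _ip=$(route -n | default_gateway)
```

    An empty result can then be handled exactly as the script already handles an empty $_ip.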

  16. Re: Troubleshooting connection loss (continued)

    Allen Weiner wrote:
    > Bit Twister wrote:



    >
    > So let's assume there is a momentary connection loss. The next time it
    > occurs, what troubleshooting steps can I perform to determine why
    > "service network restart" hangs?
    >


    The "service network restart" hangs after eth0 is closed down.

    It seems to me that an effective troubleshooting approach to isolate the
    hang would be to put hooks in the scripts that "service network restart"
    invokes. But being a Linux novice, I'd prefer not to play with the
    networking scripts (although I could make backups).

    Another possible approach to isolating the hang that avoids modifying
    networking scripts would be to turn on strace from the terminal before
    issuing "service network restart". To cut down on strace output, it
    would be even better to turn on strace after eth0 is closed down. I have
    no idea how to do this. Suggestions would be appreciated.

  17. Re: Troubleshooting connection loss (continued)

    On Thu, 15 Nov 2007 03:11:31 GMT, Allen Weiner wrote:
    >
    > The "service network restart" hangs after eth0 is closed down.


    Well, WE will not be working that problem, unless you take my
    suggestions as to what config files are to look like.


    > It seems to me that an effective troubleshooting approach to isolate the
    > hang would be to put hooks in the scripts that "service network restart"
    > invokes.


    Hehe, I spent a day in those one or two years ago.
    What I had to do was create 8 desktops, pretty near each desktop had 3
    or 4 terminals up, 1 term following the code, another to see config files,
    another to hunt down man pages and documents, ...

    When a script would call another script, I would open it in another desktop
    so I could keep drilling down reading code. When I finally hit the
    bottom of the script, I would go back to the desktop which called the script.

    > But being a Linux novice, I'd prefer not to play with the
    > networking scripts (although I could make backups).


    Sounds good in theory, takes a very methodical, conscientious person
    to make that work, and you better damn well know your backups are good.

    That is why a multi-boot system, with selection to boot a copy of your
    "Production Install" is handy for screwing with system scripts that
    could hurt you. :-D

    > Another possible approach to isolating the hang that avoids modifying
    > networking scripts would be to turn on strace from the terminal before
    > issuing "service network restart".


    Never tried it, but pretty sure trying to do a
    strace /etc/init.d/network restart is not going to work.

    > To cut down on strace output, it
    > would be even better to turn on strace after eth0 is closed down. I have
    > no idea how to do this. Suggestions would be appreciated.


    You would do a service network stop,
    enable your tracing, then do the service network start.

    Restart is just an easy call to stop/start.
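
    As a sketch of that stop, trace, start sequence, assuming strace is installed and the init script lives at /etc/init.d/network. trace_network_start and the DRY_RUN variable are hypothetical; by default the function only prints the commands, so they can be reviewed before running for real as root with DRY_RUN=0.

```shell
#!/bin/bash
# trace_network_start is a hypothetical helper name.
# With DRY_RUN=1 (the default here) it only prints the commands.
trace_network_start() {
    local log=${1:-/tmp/network-start.strace}
    local run=
    if [ "${DRY_RUN:-1}" = "1" ]; then run=echo; fi
    # Bring the network down first so the trace covers only the start phase.
    $run service network stop
    # -f follows child processes, -tt timestamps each call, -o writes to a file.
    $run strace -f -tt -o "$log" /etc/init.d/network start
}

# Usage:
#   trace_network_start                # print the commands only
#   DRY_RUN=0 trace_network_start      # actually run them (as root)
```

    If the start hangs, the tail of the log file shows the last system calls made before it stopped, which may or may not be enough to identify the cause.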

    FYI: I assume you are always logged into a user account, not root.
    When you need root privs, you click up a terminal and su - root
    as a security precaution.

    For debugging scripts, I find playing with the set command can help.
    I would like you to click up a terminal and add
    set -xv
    to the first line of .bash_profile, save exit.
    Now do the following command

    su - $USER

    exit
    Up Arrow
    and change set -xv to set -x, save exit
    Up Arrow

    exit

    Up Arrow
    and remove the set line.
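
    The difference between the two settings can also be seen without editing .bash_profile. A minimal sketch, assuming only bash itself; make_demo_script is a hypothetical helper that writes a throwaway two-line script to the path it is given.

```shell
#!/bin/bash
# make_demo_script is a hypothetical helper: it writes a two-line
# script to the path given as $1 so the tracing options can be compared.
make_demo_script() {
    cat > "$1" <<'EOF'
name=world
echo "hello $name"
EOF
}

# Usage (trace output goes to stderr):
#   make_demo_script /tmp/demo.sh
#   bash -v /tmp/demo.sh    # -v prints each line as it is read
#   bash -x /tmp/demo.sh    # -x prints each command after expansion
```

    With -v the echo line appears as written, variable and all; with -x it appears with $name already replaced, prefixed by "+ ".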


  18. Re: Troubleshooting connection loss (continued)

    Bit Twister wrote:
    > On Thu, 15 Nov 2007 03:11:31 GMT, Allen Weiner wrote:
    >> The "service network restart" hangs after eth0 is closed down.

    >
    > Well, WE will not be working that problem, unless you take my
    > suggestions as to what config files are to look like.
    >

    I did change the hosts file.

    My dhclient-eth0.leases has not changed in the past week. Lease expires
    on 11/7. DHCP isn't being invoked.

    Suppose either the leases file or the resolv.conf was causing the
    problem. Should that cause "service network restart" to hang?
    >


    >
    > Hehe, I spent a day in those one or two years ago.
    > What I had to do was create 8 desktops, pretty near each desktop had 3
    > or 4 terminals up, 1 term following the code, another to see config files,
    > another to hunt down man pages and documents, ...
    >

    It's interesting and discouraging to hear of your experience. It would
    be interesting to hear what troubleshooting technique you use for this
    situation.
    >


    >
    > Never tried it, but pretty sure trying to do a
    > strace /etc/init.d/network restart is not going to work.
    >

    Could you elaborate on why this won't work?


    > You would do a service network stop,
    > enable your tracing, then do the service network start.
    >

    Thanks very much for pointing that out.
    >

    You might find this interesting. My modem/router uses the AR7 ADSL chip.
    A leading ISP feels this chip provides unreliable connections.

    http://www.theregister.com/2007/10/2...neon_bt_fault/

  19. Re: Troubleshooting connection loss (continued)

    On Thu, 15 Nov 2007 15:51:09 GMT, Allen Weiner wrote:
    > Bit Twister wrote:
    >> On Thu, 15 Nov 2007 03:11:31 GMT, Allen Weiner wrote:
    >>> The "service network restart" hangs after eth0 is closed down.

    >>
    >> Well, WE will not be working that problem, unless you take my
    >> suggestions as to what config files are to look like.
    >>


    > I did change the hosts file.


    And I know this, how?

    And would you provide what you did.

    > My dhclient-eth0.leases has not changed in the past week. Lease expires
    > on 11/7. DHCP isn't being invoked.


    Does not matter, I was not troubleshooting a dhclient-eth0.leases file change.
    That information is only one aspect of your problem needing checking.
    Glad you picked up on that tidbit. Sorry you refused my suggestion on
    what it is to contain.

    > Suppose either the leases file or the resolv.conf was causing the
    > problem. Should that cause "service network restart" to hang?


    Told you, "WE will not be working that problem, unless you take my
    suggestions as to what config files are to look like"


    > It's interesting


    Dang, tip on how to follow a complex script gone to waste on the OP.

    > and discouraging to hear of your experience.


    Sorry to hear that. It was not hard, just lots of things to look at,
    man some_cmd_here to get a feel for what each cmd did. It gave me the
    experience of what to play with, when, and why, not to mention seeing
    tricks and what you can do with the bash scripting language.

    > It would be interesting to hear what troubleshooting technique you
    > use for this situation.


    I have been giving you basic troubleshooting techniques and a smart
    question link to read, and for all my trouble, all I was given was static
    about what you believe should not make a difference, that you were not
    going to change the file, and to go ahead and kill file you if I want.....

    Instead of reading the whole document, this is the section I have in mind.
    http://www.catb.org/~esr/faqs/smart-....html#symptoms
    for the above paragraph.

    >> Never tried it, but pretty sure trying to do a
    >> strace /etc/init.d/network restart is not going to work.


    > Could you elaborate on why this won't work?


    You have to use the proper tool for the job at hand.

    Do not get me wrong, on the whole, I applaud how well you are doing
    and what you have done.

    I want you to keep in mind, I try to keep the lurkers in mind when I
    post, and teach you how to fish. Not cut the pole, string the line,
    catch the fish, fry, cut it up and feed you.

    I do try to keep in mind the poster's skill, and knowledge when making
    my responses.

    service is basically a wrapper script which runs what is found in /etc/init.d
    If you were to look at the files in /etc/init.d, you would see that
    they are for the most part scripts.

    Generally speaking, in my mind, you have program/scripts which do the work.
    Scripts are what you can view with the cat command. Programs are
    compiled into a binary form.

    Easy way to tell, try less /bin/ls
    less ~/.bashrc
    See the difference.

    Now instead of less, use strace and see what you can see.
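
    A rough sketch of the same script-versus-program check without a pager, assuming a grep that supports -I (GNU grep treats a file containing NUL bytes as binary, and with -I reports no match for it). is_text_file is a hypothetical helper name.

```shell
#!/bin/bash
# is_text_file is a hypothetical helper name.
# Return 0 if the file looks like text (something cat can show sensibly),
# 1 if grep classifies it as binary or it has no non-empty line.
is_text_file() {
    grep -Iq . -- "$1"
}

# Usage:
#   is_text_file ~/.bashrc && echo "script/text"
#   is_text_file /bin/ls   || echo "compiled program"
```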

    Next time you go to ask about a command, you need to
    Read The Fine Manual (RTFM), try the command to see what it does.

    You never experiment when logged in as root, if possible.
    You boot and play in a hot backup partition.

    Always as a user, if possible. If afraid of hurting your account,
    create a junk account. I do not recommend calling it test.
    Log into junk and play around there. You can always delete/create it again.

    >> You would do a service network stop,
    >> enable your tracing, then do the service network start.
    >>

    > Thanks very much for pointing that out.


    That is a function of /etc/init.d/network, not service.

    So, doing a bit of reading in /etc/init.d/network, you would find
    stop, start, restart, reload, status were commands available for
    service network cmd_here.
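
    Many init scripts also print their accepted commands in a Usage: line. A small sketch that reads that line without running the script, assuming the script contains such a line; init_usage is a hypothetical helper name.

```shell
#!/bin/bash
# init_usage is a hypothetical helper name.
# Print the "Usage:" line(s) of an init script without running it.
init_usage() {
    grep -i 'usage:' -- "$1"
}

# Usage:
#   init_usage /etc/init.d/network
```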

    > http://www.theregister.com/2007/10/2...neon_bt_fault/


    Yep, saw that article on the site when they posted it.

