Troubleshooting connection loss (continued) - Networking
-
Re: Troubleshooting connection loss (continued)
Allen Weiner wrote:
> I've now got Fedora configured
> for static IP of 150 and WinME configured for static IP of 140. The
> dhclient-leases lists an expiration date of 11/7 (today is 11/11).
Are you talking about 2 computers,
or a single dual-boot computer?
If the latter, how does your dhcp server
know which OS is being used on the computer connected to it?
-
Re: Troubleshooting connection loss (continued)
On Sun, 11 Nov 2007 12:22:46 GMT, Allen Weiner wrote:
> Bit Twister wrote:
>>
>
> I don't understand *any* of the above.
Most were actual commands. If you do not understand a command, do a
man command
When in doubt, try the command with junk names and check the results. Example:
Given
cp /dev/null /var/lib/dhclient/dhclient-eth0.leases
do cp /var/lib/dhclient/dhclient-eth0.leases junk
cp /dev/null junk
cat junk
Now you know what the "cp /dev/null fn" does.
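A minimal sketch of the same experiment, assuming GNU stat and a scratch directory from mktemp so that no real lease file is touched. The sample lease text written into the junk file is made up for illustration.

```shell
#!/bin/bash
# Show, on a scratch file, that copying /dev/null over a file empties it.
scratch=$(mktemp -d)
junk="$scratch/junk"

# Hypothetical stand-in for lease data, only so the file is not empty.
printf 'lease {\n  fixed-address 192.168.1.47;\n}\n' > "$junk"
before=$(stat -c %s "$junk")     # size in bytes before truncation

cp /dev/null "$junk"             # same idiom as in the post
after=$(stat -c %s "$junk")      # size in bytes afterwards

echo "before=$before after=$after"
rm -r "$scratch"
```

The file still exists after the copy, with its owner and permissions unchanged; only its contents are gone.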
> I've now got Fedora configured
> for static IP of 150 and WinME configured for static IP of 140. The
> dhclient-leases lists an expiration date of 11/7 (today is 11/11).
One of those commands was to empty the lease file so it would be ruled
out of the suspect list. It was
cp /dev/null /var/lib/dhclient/dhclient-eth0.leases
> I'll post my config-dump below. How does it look?
>
> So could you please clarify what is the next step in troubleshooting.
My Carnac gene is defective. What problem?
You have to tell me what is your problem /now/.
I know what the original problem was.
Are you saying both OSs set static and you are having connection
problems, or what?
> Some explanatory comments along with the procedural steps might make it
> more understandable. Thanks.
We have already covered the troubleshooting steps. The order of the
steps logically tests hardware first, then software. When a test fails, that is the area
to fix: you look at the config files for that area for suspect values.
ethtool/mii-tool tells you the physical cable/path is good.
Assuming static setup, pings tell you which part of the connection fails.
pinging localhost indicates your system is working.
pinging your node name proves dns reads /etc/hosts and local routing is working
pinging next node in path to internet proves both nodes are working.
when ping fails, suspects are routing (route -n), firewalls, other node.
node config files match hostname results
/etc/sysconfig/network
/etc/hosts
dns files:
/etc/nsswitch.conf defines which sources to check and in what order
/etc/host.conf
/etc/resolv.conf
route has a UG flag and the gateway address matches gateway's ip addy.
ifconfig ip results match
/etc/hosts
/etc/sysconfig/network-scripts/ifcfg-eth0
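The ping ladder described above can be sketched as a pair of shell functions. This is an illustration, not part of the xx script: run_step and check_path are hypothetical names, and the ping options (-c for count, -W for timeout in seconds) assume the usual Linux iputils ping.

```shell
#!/bin/bash
# run_step LABEL CMD [ARGS...]
# Runs CMD quietly, prints OK or FAIL with the label, returns 1 on failure.
run_step() {
    local label=$1
    shift
    if "$@" > /dev/null 2>&1; then
        echo "OK   $label"
    else
        echo "FAIL $label"
        return 1
    fi
}

# check_path GATEWAY_IP
# Walks outward one hop at a time and stops at the first failure,
# so the last line printed names the area to look at.
check_path() {
    local gw=$1
    run_step "loopback (localhost)"        ping -c 1 -W 2 localhost     || return 1
    run_step "own node name ($(hostname))" ping -c 1 -W 2 "$(hostname)" || return 1
    run_step "gateway ($gw)"               ping -c 1 -W 2 "$gw"         || return 1
}
```

With the addresses used in this thread it would be called as: check_path 192.168.1.1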
> ======== cat /etc/*version ==========
> cat: /etc/subversion: Is a directory
Attaching a new script to fix that error with code to display default runlevel.
> ======== grep -v '^#' /etc/resolv.conf ==========
> ; generated by /sbin/dhclient-script
What the hell, That semi-colon should not be there. Look at mine
$ cat /fc7/etc/resolv.conf
nameserver 192.168.1.1
> search myhome.westell.com
> nameserver 192.168.1.1
> nameserver 192.168.1.1
Had you followed my instructions, /etc/resolv.conf would have had just
nameserver 192.168.1.1
At this point, I have no idea if your dhcp client is helping us into
the ditch. Run these commands:
echo "nameserver 192.168.1.1" > /etc/resolv.conf
cat /etc/resolv.conf
service network restart
cat /etc/resolv.conf
If resolv.conf reverts back to
# generated by /sbin/dhclient-script
search myhome.westell.com
nameserver 192.168.1.1
nameserver 192.168.1.1
then the dhcp client is getting into your problem, but it will not stop connectivity.
We will have to trap that alligator later.
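One way to make the before/after comparison less error-prone is to keep a snapshot and compare byte for byte. A minimal sketch, assuming cmp is available; was_rewritten is a hypothetical helper name and the snapshot path in the usage note is only an example.

```shell
#!/bin/bash
# was_rewritten SNAPSHOT LIVE
# Succeeds (status 0) when LIVE no longer matches SNAPSHOT byte for byte.
was_rewritten() {
    ! cmp -s "$1" "$2"
}
```

Usage, as root: cp /etc/resolv.conf /root/resolv.conf.snap, then service network restart, then was_rewritten /root/resolv.conf.snap /etc/resolv.conf && echo "something rewrote resolv.conf".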
> ======== hostname ==========
> alweiner.nowhere.invalid
Thank you, thank you, thank you.
I would not have picked your user name, but hey, it is your system.
> ========== head -15 /etc/hosts ===========
> 127.0.0.1 alweiner.nowhere.invalid alweiner localhost
> 192.168.1.1 gateway
> 192.168.1.150 alweiner.invalid alweiner
Frap, gotta love those gui tools and you need to pay attention to details.
If you will notice you did not get the .150 line correct.
Pop test, what is wrong with the 192.168.1.150 line?
You lucked out because of the gui help.
Your nodename in /etc/sysconfig/network should match the one in /etc/hosts.
READ MY LIPS, you are to delete contents of /etc/hosts
cut the following and paste them into your hosts file.
127.0.0.1 localhost.localdomain localhost
192.168.1.150 alweiner.nowhere.invalid alweiner
::1 localhost6.localdomain6 localhost6
When you modify a config file, you should always recheck your work by using
cat fn_here and double check values.
Except for the prompt, you should see something on your screen as follows:
[root@alweiner ~]# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.1.150 alweiner.nowhere.invalid alweiner
::1 localhost6.localdomain6 localhost6
[root@alweiner ~]#
I had expected the gui to add the ::1 line. So much for assuming.
WARNING: Changing node/domain/ip addy for your node may cause you to
lose gui display. Reboot is recommended.
Node name changes can cause Mail Transport Agent (MTA), print server
(cups) and whatnot to feel sad. You may have to fix their config files
and/or restart their services.
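The hosts file rule above (node name on its own IP line, not on the 127.0.0.1 line) can be checked mechanically. A minimal sketch, assuming awk is available; loopback_has_node is a hypothetical helper name, not something from the xx script.

```shell
#!/bin/bash
# loopback_has_node HOSTS_FILE NODE_NAME
# Succeeds when NODE_NAME appears as a name on a 127.0.0.1 line,
# which is the layout the post says to avoid.
loopback_has_node() {
    awk -v node="$2" '
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # rest of the line is a comment
                if ($i == node) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}
```

Usage: loopback_has_node /etc/hosts "$(hostname)" && echo "node name is on the loopback line, fix /etc/hosts"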
-------------------------------------------------------------------------------
As promised, new-n-improved script follows:
You can use diff to find changes. Example:
diff -bBw my_script your_script
------------------ Script starts below this line ---------
#!/bin/bash
#**************************************************************
#*
#* xx - Dump network config files and network hardware status
#*
#* Output: a.txt     linux file
#*         dosa.txt  Windows file
#*
#**************************************************************
_fn=a.txt
_out_fn=$PWD/$_fn
_dos_fn=$PWD/dos${_fn}
_home=$PWD
# cat_fn: dump a regular, non-empty file
function cat_fn
{
    _fn=$1
    if [ -f $_fn ] ; then
        _count=$(stat -c %s $_fn )
        if [ $_count -gt 0 ] ; then
            echo "======== cat $_fn ==========" >> $_out_fn
            cat $_fn >> $_out_fn
        fi
    fi
} # end cat_fn
# grep_fn: dump a file without its comment lines
function grep_fn
{
    _fn=$1
    if [ -e $_fn ] ; then
        _count=$(stat -c %s $_fn )
        if [ $_count -gt 0 ] ; then
            _count=$(grep -v '^#' $_fn | wc -l)
            if [ $_count -gt 0 ] ; then
                echo "======== grep -v '^#' $_fn ==========" >> $_out_fn
                if [ "$_fn" != "shorewall.conf" ] ; then
                    grep -v '^#' $_fn >> $_out_fn
                else
                    awk 'empty{if (!/^#/) print; empty=0} /^$/{empty=1}' $_fn >> $_out_fn
                fi
            fi
        fi
    fi
} # end grep_fn
# ls_dir: long listing of a directory (leaves the shell in that directory)
function ls_dir
{
    _dr=$1
    if [ -d $_dr ] ; then
        echo "========= cd $_dr ; ls -al ========" >> $_out_fn
        cd $_dr
        ls -al >> $_out_fn
    fi
} # end ls_dir
# tail_fn: last 18 lines of a file
function tail_fn
{
    _fn=$1
    if [ -e $_fn ] ; then
        echo "======== tail -18 $_fn ==========" >> $_out_fn
        tail -18 $_fn >> $_out_fn
    fi
} # end tail_fn
#********************************
# check if commands are in $PATH
# and if not add them to PATH
#********************************
_path=""
type ifconfig > /dev/null 2>&1
if [ $? -ne 0 ] ; then
    _path="${_path}/sbin:"
fi
type cat > /dev/null 2>&1
if [ $? -ne 0 ] ; then
    _path="${_path}/bin:"
fi
type id > /dev/null 2>&1
if [ $? -ne 0 ] ; then
    _path="${_path}/usr/bin:"
fi
if [ -n "$_path" ] ; then
    PATH=${_path}$PATH
    export PATH
fi
#********************************
# check if root and logged in correctly
#********************************
_uid=$(id --user)
if [ $_uid -ne 0 ] ; then
    echo " "
    echo "You need to be root to run $0"
    echo "Click up a terminal and do the following:"
    echo " "
    echo "su - root"
    echo "$PWD/xx"
    echo " "
    echo "or "
    echo " "
    echo "sudo -i"
    echo "$PWD/xx"
    echo " "
    exit 1
fi
# LOGNAME/USER still name the old user after a plain "su root"
root_flg=1
if [ -n "$LOGNAME" ] ; then
    if [ "$LOGNAME" != "root" ] ; then
        root_flg=0
    fi
fi
if [ -n "$USER" ] ; then
    if [ "$USER" != "root" ] ; then
        root_flg=0
    fi
fi
if [ $root_flg -eq 0 ] ; then
    echo " "
    echo "Guessing you did a su root"
    echo "instead of a su - root"
    echo "please exit/logout of this session and do the following:"
    echo " "
    echo "su - root"
    echo "$PWD/xx"
    echo " "
    echo "or "
    echo " "
    echo "sudo -i"
    echo "$PWD/xx"
    echo " "
    exit 1
fi
#********************************
# main code starts here
#********************************
echo "Working, output will be in $_out_fn "
date > $_out_fn
chmod 666 $_out_fn
if [ -n "$_path" ] ; then
    echo "======== echo \$PATH ==========" >> $_out_fn
    echo "$PATH" >> $_out_fn 2>&1
fi
cat_fn /etc/product.id
for _d in /etc/*release ; do
    if [ ! -d $_d ] ; then
        echo "======== cat $_d ==========" >> $_out_fn
        cat $_d >> $_out_fn
    fi
done
echo "======== uname -rvi =============" >> $_out_fn
uname -rvi >> $_out_fn
for _d in /etc/*version ; do
    if [ ! -d $_d ] ; then
        echo "======== cat $_d ==========" >> $_out_fn
        cat $_d >> $_out_fn
    fi
done
# /proc files report a size of 0, so cat_fn would skip this one
if [ -r /proc/version ] ; then
    echo "======== cat /proc/version ==========" >> $_out_fn
    cat /proc/version >> $_out_fn
fi
type lsb_release > /dev/null 2>&1
if [ $? -eq 0 ] ; then
    echo "======== lsb_release -a ==========" >> $_out_fn
    lsb_release -a >> $_out_fn 2>&1
fi
echo " " >> $_out_fn
if [ -n "$SECURE_LEVEL" ] ; then
    echo "msec security level is $SECURE_LEVEL" >> $_out_fn
fi
echo "======== free ==========" >> $_out_fn
free >> $_out_fn 2>&1
echo " " >> $_out_fn
if [ -e /etc/inittab ] ; then
    # split the id:N:initdefault: line on colons; N lands in $2
    _line=$(grep :initdefault /etc/inittab)
    set -- $(IFS=':'; echo $_line)
    echo " " >> $_out_fn
    echo "Default run level is $2" >> $_out_fn
    echo " " >> $_out_fn
fi
type chkconfig > /dev/null 2>&1
if [ $? -eq 0 ] ; then
    echo "======== chkconfig --list ==========" >> $_out_fn
    for _serv in avahi named tmdns ; do
        chkconfig --list | grep -i $_serv > /dev/null 2>&1
        if [ $? -eq 0 ] ; then
            echo "Double check if /$_serv/ needs to be disabled on boot" >> $_out_fn
            chkconfig --list | grep -i $_serv >> $_out_fn
        fi
    done
    chkconfig --list >> $_out_fn
else
    echo "======== ls -o /etc/rcS.d/ ==========" >> $_out_fn
    for _serv in avahi named tmdns ; do
        ls /etc/rcS.d/S* | grep $_serv > /dev/null 2>&1
        if [ $? -eq 0 ] ; then
            echo "Double check if /$_serv/ needs to be disabled on boot" >> $_out_fn
        fi
    done
    ls -o /etc/rcS.d >> $_out_fn
fi
_fn=/etc/nsswitch.conf
if [ -e $_fn ] ; then
    echo "======== grep hosts: $_fn ==========" >> $_out_fn
    grep hosts: $_fn >> $_out_fn
fi
grep_fn /etc/resolv.conf
grep_fn /etc/resolvconf/resolv.conf.d/head
cat_fn /etc/resolvconf/resolv.conf.d/base
cat_fn /etc/resolvconf/resolv.conf.d/tail
echo "======== hostname ==========" >> $_out_fn
hostname >> $_out_fn
cat_fn /etc/netprofile/profiles/default/files/etc/hosts
cat_fn /etc/hostname
cat_fn /etc/HOSTNAME
ls /etc/mod*.conf > /dev/null 2>&1
if [ $? -eq 0 ] ; then
    echo "======== grep eth /etc/mod*.conf ==========" >> $_out_fn
    grep eth /etc/mod*.conf >> $_out_fn
fi
cat_fn /etc/dhclient-enter-hooks
cat_fn /etc/dhclient-exit-hooks
grep_fn /etc/host.conf
echo "================ ifconfig -a ==============" >> $_out_fn
ifconfig -a >> $_out_fn
cat_fn /etc/iftab
cat_fn /etc/udev/rules.d/61-net_config.rules
echo "============== route -n =================" >> $_out_fn
route -n >> $_out_fn
cat_fn /etc/sysconfig/network/routes
cat_fn /etc/sysconfig/network
grep_fn /etc/mkinitramfs/initramfs.conf
echo "========== head -15 /etc/hosts ===========" >> $_out_fn
head -15 /etc/hosts >> $_out_fn
cat_fn /etc/network/interfaces
cat_fn /var/run/network/ifstate
_cmd=""
type ethtool > /dev/null 2>&1
if [ $? -eq 0 ] ; then
    _cmd="ethtool"
fi
type mii-tool > /dev/null 2>&1
if [ $? -eq 0 ] ; then
    _cmd="mii-tool -v"
fi
if [ -z "$_cmd" ] ; then
    echo "==== mii-tool/ethtool NOT INSTALLED ====" >> $_out_fn
fi
for nic in 0 1 2 ; do
    if [ -n "$_cmd" ] ; then
        $_cmd eth$nic > /dev/null 2>&1
        if [ $? -eq 0 ] ; then
            echo "======== $_cmd eth$nic ==========" >> $_out_fn
            $_cmd eth$nic >> $_out_fn
        fi
    fi
    echo "=== dmesg | grep eth$nic | grep -v SRC= ===" >> $_out_fn
    dmesg | grep eth$nic | grep -v SRC= >> $_out_fn
    echo "=== grep eth$nic /var/log/messages | tail -10 ===" >> $_out_fn
    grep eth$nic /var/log/messages | tail -10 >> $_out_fn
    cat_fn /etc/sysconfig/network-scripts/ifcfg-eth$nic
    ifconfig eth$nic > /dev/null 2>&1
    if [ $? -eq 0 ] ; then
        # lower-case the output; with classic ifconfig output, word 5 is the MAC address
        set $(ifconfig eth$nic | tr 'A-Z' 'a-z')
        cat_fn /etc/sysconfig/network/ifcfg-eth-id-$5
    fi
    tail_fn /var/lib/dhcp/dhclient-eth${nic}.leases
    tail_fn /var/lib/dhclient/dhclient-eth${nic}.leases
    tail_fn /etc/dhcpc/dhcpcd-eth${nic}.info
done # end for nic in 0 1 2 ; do
_dir=/etc/NetworkManager/dispatcher.d
if [ -d $_dir ] ; then
    ls_dir $_dir
    for _d in "if-up.d" "if-down.d" "if-pre-up.d" "if-post-down.d" ; do
        if [ -e /etc/network/${_d} ] ; then
            echo "==== cd /etc/network/${_d} ; ls -al ===" >> $_out_fn
            cd /etc/network/${_d}
            ls -al >> $_out_fn
        fi
    done
fi
if [ -d /etc/sysconfig/network-scripts ] ; then
    for _d in "ifdown.d" "ifup.d" ; do
        if [ -e /etc/sysconfig/network-scripts/${_d} ] ; then
            _cmd="cd /etc/sysconfig/network-scripts/${_d} ; ls -al "
            echo "===== $_cmd ====" >> $_out_fn
            cd /etc/sysconfig/network-scripts/${_d}
            ls -al >> $_out_fn
        fi
    done
fi
ls_dir /etc/dhcp3/dhclient-exit-hooks.d
ls_dir /etc/resolvconf/update.d
if [ -d /etc/shorewall ] ; then
    _count=$(chkconfig --list shorewall | grep -c on )
    if [ $_count -gt 0 ] ; then
        echo "======= Shorewall settings =========" >> $_out_fn
        cd /etc/shorewall
        for _f in $(ls) ; do
            echo "======= $_f =========" >> $_out_fn
            grep_fn $_f
        done
    fi
fi
cd $_home
grep_fn /etc/hosts.allow
grep_fn /etc/hosts.deny
echo "==== end of config/network data dump =======" >> $_out_fn
awk '{print $0 "\r" }' $_out_fn > $_dos_fn
chmod 666 $_dos_fn
echo " "
echo "If posting via linux, post contents of $_out_fn"
echo "You might want to copy it to your account with the command"
echo "cp $_out_fn ~your_login"
echo " "
echo "If posting via windows, post contents of $_dos_fn"
echo " "
echo "If using diskette,"
echo "Copy $_dos_fn to diskette with the following commands:"
echo " "
echo "mkdir -p /floppy"
echo "mount -t auto /dev/fd0 /floppy"
echo "cp $_dos_fn /floppy"
echo "umount /floppy "
echo " "
echo "and $_dos_fn is ready for windows from diskette"
echo " "
#*********** end of dump xx.txt script *********
-
Re: Troubleshooting connection loss (continued)
On Sun, 11 Nov 2007 15:04:51 +0000, Timothy Murphy wrote:
> Are you talking about 2 computers,
> or a single dual-boot computer?
He has one computer connected to an ADSL router.
> If the latter, how does your dhcp server
> know which OS is being used on the computer connected to it?
router's dhcp server looks at MAC value to know who is talking to it. :-D
In my stupid opinion, the router should see the dhcp renew/rebind
request from the same nic and should extend/issue the same lease
regardless of what OSs created the initial connection.
What I am not sure about is how the router software behaves if WinME gets a
netbios lease and Allen then boots fedora.
Maybe the router waits for a netbios lease renewal, times out, and blows away
fedora's connection.
Having finally gotten Allen to set both OSs static, he should have a
stable connection, regardless of what system was running before boot.
If so, we have solved Allen's connection problem, but do not have a
working solution which Allen desires.
I think I might have to poke him a little harder and ask him to read
http://www.catb.org/~esr/faqs/smart-questions.html
-
Re: Troubleshooting connection loss (continued)
Bit Twister wrote:
> On Sun, 11 Nov 2007 12:22:46 GMT, Allen Weiner wrote:
>> Bit Twister wrote:
>> I don't understand *any* of the above.
>
> Most were actual commands. If you did not understand the command, do a
> man command
>
> when in doubt, try command with junk names and check results. Example:
>
> Given
> cp /dev/null /var/lib/dhclient/dhclient-eth0.leases
>
> do cp /var/lib/dhclient/dhclient-eth0.leases junk
> cp /dev/null junk
> cat junk
>
> Now you know what the "cp /dev/null fn" does.
>
Thanks very much for your continuing help and patience. My primary lack
of understanding is the purpose of the commands. I'm totally missing the
strategy. I suspected that the copy of /dev/null into /var/lib/dhclient
was a means of erasing /var/lib/dhclient. But to me, it's a puzzling and
unconventional way of erasing a file. (Remember, I'm a refugee from
Windows.) If there was a comment "clear the file", I would use Kedit and
clear the file. Using /dev/null seems to me a "power user" trick.
>
>
>> I've now got Fedora configured
>> for static IP of 150 and WinME configured for static IP of 140. The
>> dhclient-leases lists an expiration date of 11/7 (today is 11/11).
>
> One of those commands was to empty the lease file so it would be ruled
> out of the suspect list. It was
> cp /dev/null /var/lib/dhclient/dhclient-eth0.leases
>
>
>> I'll post my config-dump below. How does it look?
>>
>> So could you please clarify what is the next step in troubleshooting.
>
> My Karnack gene is defective, what problem?
>
> You have to tell me what is your problem /now/.
> I know what the original problem was.
>
> Are you saying both OSs set static and you are having connection
> problems, or what?
>
What I meant by the question is, are there additional steps I need to do
so that I can do effective troubleshooting if the original problem
(connection loss) happens again. You appear to be assuming that you've
diagnosed the connection loss problem, repaired it, and it will not
happen again.
>
>
>
>
>
>> ======== grep -v '^#' /etc/resolv.conf ==========
>> ; generated by /sbin/dhclient-script
>
> What the hell, That semi-colon should not be there. Look at mine
> $ cat /fc7/etc/resolv.conf
> nameserver 192.168.1.1
>
>> search myhome.westell.com
>> nameserver 192.168.1.1
>> nameserver 192.168.1.1
>
> Had you followed my instructions, /etc/resolv.conf would have had just
> nameserver 192.168.1.1
>
You explained that removing the "search westell" is a performance
optimization. For the time being, I'm making only changes necessary for
troubleshooting, unless I can see (from my novice knowledge base) that
the change is not potentially harmful. BTW, thanks very much for
mentioning "Rescue mode" if Fedora becomes unbootable. This thread is a
real learning experience.
> At this point, I have no idea if your dhcp client is helping us into
> the ditch. Run these commands:
>
> echo "nameserver 192.168.1.1" > /etc/resolv.conf
> cat /etc/resolv.conf
> service network restart
> cat /etc/resolv.conf
How about if I use Kedit to just change the comment (and nothing else)
to some garbage sentence? This would eliminate any chance of side-effects.
>
> If resolv.conf reverts back to
> # generated by /sbin/dhclient-script
> search myhome.westell.com
> nameserver 192.168.1.1
> nameserver 192.168.1.1
>
> dhcp client is getting into your problem but it will not stop connectivity.
> We will have to trap that alligator later.
>
>> ======== hostname ==========
>> alweiner.nowhere.invalid
>
> Thank you, thank you, thank you.
> I would not have picked your user name, but hey, it is your system.
My user name is "aweiner".
>
>
>> ========== head -15 /etc/hosts ===========
>> 127.0.0.1 alweiner.nowhere.invalid alweiner localhost
>> 192.168.1.1 gateway
>> 192.168.1.150 alweiner.invalid alweiner
>
> Frap, gotta love those gui tools and you need to pay attention to details.
What gui tool are you referring to? I edited the file with Kedit.
> If you will notice you did not get the .150 line correct.
> Pop test, what is wrong with the 192.168.1.150 line?
>
> You lucked out because of the gui help.
> Your nodename in /etc/sysconfig/network should match the one in /etc/hosts.
>
>
> READ MY LIPS, you are to delete contents of /etc/hosts
> cut the following and paste them into your hosts file.
>
> 127.0.0.1 localhost.localdomain localhost
> 192.168.1.150 alweiner.nowhere.invalid alweiner
> ::1 localhost6.localdomain6 localhost6
What's wrong with just fixing the FQDN of 192.168.1.150? I don't
understand that third line.
>
> When you modify a config file, you should always recheck your work by using
> cat fn_here and double check values.
>
what is "fn_here"?
> Except for the prompt, you should see something on your screen as follows:
>
> [root@alweiner ~]# cat /etc/hosts
> 127.0.0.1 localhost.localdomain localhost
> 192.168.1.150 alweiner.nowhere.invalid alweiner
> ::1 localhost6.localdomain6 localhost6
> [root@alweiner ~]#
>
> I had expected the gui to add the ::1 line. So much for assuming.
>
Again, I used Kedit.
-
Re: Troubleshooting connection loss (continued)
Bit Twister wrote:
< snip>
>
> In my stupid opinion, the router should see the dhcp renew/rebind
> request from the same nic and should extend/issue the same lease
> regardless of what OSs created the initial connection.
>
> What I am not sure about, in the router software, is if WinME gets a
> netbios lease, Allen then boots fedora.
> Router waits for a netbios lease renewal, times out, and blows away
> fedora's connection.
I don't know if this is relevant to your diagnosis of my connection-loss
problem. Most of the time, I only run WinME on Saturdays. The Fedora
connection-loss problem happens (apparently) randomly throughout the week.
>
>
> I think I might have to poke him a little harder and ask him to read
> http://www.catb.org/~esr/faqs/smart-questions.html
Is this about the point you made about me not snipping enough when I
reply, or something else?
-
Re: Troubleshooting connection loss (continued)
On Sun, 11 Nov 2007 20:57:13 GMT, Allen Weiner wrote:
> Bit Twister wrote:
>>
Still need to start trimming a bit more please.
> Thanks very much for your continuing help and patience. My primary lack
> of understanding is the purpose of the commands.
Now, that is a different story. :-)
I am subscribed to 130 news groups, and I whip through those providing
commands when I can.
So far you have been one of the few who really want to know what
is going on. So I have been adding bunches of information for you.
The order of the commands, and the commands themselves, set the system to a known state.
Telling you why each command was needed gets me into typing the rest
of the day.
> I'm totally missing the strategy.
> I suspected that the copy of /dev/null into /var/lib/dhclient
> was a means of erasing /var/lib/dhclient.
Hot dang. You are keeping up.
> But to me, it's a puzzling
Well, I have no problem with you asking the question of why not do it
this way......
Then I can give you the reason for not doing something, as you will see next.
> and
> unconventional way of erasing a file. (Remember, I'm a refugee from
> Windows.) If there was a comment "clear the file",
Oh, no, Nature is constantly improving the idiot.
If they cannot cut/paste the command, then there is a good chance of
Murphy being able to do his best. :-(
> I would use Kedit and
Yeah, but the downside to that is the editor can leave a backup file with a tilde
on the end.
While on that subject, I want you to do a
ls /etc/sysconfig/network-scripts/ifcfg-eth0*
If there is a /etc/sysconfig/network-scripts/ifcfg-eth0~
I want you to "delete/remove it". See, much simpler to say
rm /etc/sysconfig/network-scripts/ifcfg-eth0~
> clear the file. Using /dev/null seems to me a "power user" trick.
Hehe, the "power user trick" would be
>/var/lib/dhclient/dhclient-eth0.leases
But the idiot thinks the > is part of usenet quoting.
Dang, had to snip 21 lines which you should have trimmed.
That is a rudeness which I can get tired of pretty quick.
>> Are you saying both OSs set static and you are having connection
>> problems, or what?
>>
> What I meant by the question is, are there additional steps I need to do
> so that I can do effective troubleshooting if the original problem
> (connection loss) happens again. You appear to be assuming that you've
> diagnosed the connection loss problem, repaired it, and it will not
> happen again.
No, if you now have both systems using static addresses, fedora should no
longer lose connectivity after doze used the connection.
If the connection stays up now that both systems are static, we know that we have a dhcp issue:
either the modem's dhcp server or fedora's dhcp client.
You indicated second fedora reboot using dhcp ran ok.
To me the router is the culprit.
You have also indicated you wanted to run doze with dhcp.
No problem, set it dhcp, fedora static, boot doze, boot fedora and see
if connection drops. If not, there is the /working/ solution.
My SWAG, on doze shutdown, no dhcp release is issued, modem is half
smart and knows it was doze who should be using the .47 ip and refuses
to allow fedora to use the lease.
fedora shutdown does a dhcp release, you boot fedora again and router
allows use of the .47 lease to work like it is supposed to.
>
>
Yea, thanks. is good enough if you really want to add them.
>>
>>> search myhome.westell.com
>>> nameserver 192.168.1.1
> You explained that removing the "search westell" is a performance
> optimization. For the time being, I'm making only changes necessary for
> troubleshooting, unless I can see (from my novice knowledge base) that
> the change is not potentially harmful.
I hear where you are coming from, but why have myhome.westell.com
looking up ip addresses for you?
I consider that a security risk.
Your dns resolver will try a search there, then ask the nameserver.
> BTW, thanks very much for
> mentioning "Rescue mode" if Fedora becomes unbootable. This thread is a
> real learning experience.
Yeah, as an oh by the way, you can make it a practice of
copying the files into /root/hold or some such thing before you change them, and copy them
back in rescue mode.
As for the dhcp/static, just changing BOOTPROTO= back to dhcp value
in /etc/sysconfig/network-scripts/ifcfg-eth0
would have you booting dhcp :-)
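A minimal sketch of that switch, assuming GNU sed and an ifcfg file that already has a BOOTPROTO= line. set_bootproto is a hypothetical helper name, and the file should be copied somewhere like /root/hold first, because sed -i edits in place without keeping a backup.

```shell
#!/bin/bash
# set_bootproto FILE VALUE
# Rewrites the BOOTPROTO= line of FILE in place; all other lines are left alone.
set_bootproto() {
    sed -i "s/^BOOTPROTO=.*/BOOTPROTO=$2/" "$1"
}
```

Usage, as root: set_bootproto /etc/sysconfig/network-scripts/ifcfg-eth0 dhcp, then cat the file to check the result, then service network restart.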
>
>> At this point, I have no idea if your dhcp client is helping us into
>> the ditch. Run these commands:
>>
>> echo "nameserver 192.168.1.1" > /etc/resolv.conf
>> cat /etc/resolv.conf
>> service network restart
>> cat /etc/resolv.conf
>
> How about if I use Kedit to just change the comment (and nothing else)
> to some garbage sentence? This would eliminate any chance of side-effects.
I run under general rules.
You do not go adhoc'ing config files.
You only change the data that needs to be changed, and keep the
contents as close to the original as can be.
You always make sure the last line has a carriage return.
It depends on the code reading the config file as to what you can get
away with. Example:
nameserver 192.168.1.1 # router ip
may not work
cat /etc/resolv.conf
# router ip
nameserver 192.168.1.1
might work.
cat /etc/resolv.conf
# router ip
nameserver 192.168.1.1
# verizon fallback dns server
nameserver 68.238.96.12
might not work.
cat /etc/resolv.conf
# 1'st is router ip
# 2'nd is verizon fallback dns server
nameserver 192.168.1.1
nameserver 68.238.96.12
would work.
Windows newbies using editors tend to forget to add the carriage return.
Example: I wanted resolv.conf to have just
nameserver 192.168.1.1
Now the newbie will use the editor to delete everything, just paste
nameserver 192.168.1.1, Save and quit.
When you run the xx script you will see the mistake in a.txt as
======== grep -v '^#' /etc/resolv.conf ==========
nameserver 192.168.1.1======== hostname ==========
instead of
======== grep -v '^#' /etc/resolv.conf ==========
nameserver 192.168.1.1
======== hostname ==========
The echo command makes sure that I get the trailing carriage return
and /etc/resolv.conf will have just what I wanted.
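The missing trailing line ending can also be tested directly instead of spotting it in a.txt. A minimal sketch; ends_with_newline is a hypothetical helper name. On Linux the character in question is a newline (line feed), which is what the post calls a carriage return.

```shell
#!/bin/bash
# ends_with_newline FILE
# Succeeds when FILE is non-empty and its last byte is a newline.
# Command substitution strips a trailing newline, so an empty result
# from tail -c 1 means the last byte was one.
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}
```

Usage: ends_with_newline /etc/resolv.conf || echo "last line is not terminated"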
Dang, Had to trim 12 more lines.
>>> ======== hostname ==========
>>> alweiner.nowhere.invalid
>>
>> Thank you, thank you, thank you.
>> I would not have picked your user name, but hey, it is your system.
>
> My user name is "aweiner".
Hehe, Ok,
>>
>>
>>> ========== head -15 /etc/hosts ===========
>>> 127.0.0.1 alweiner.nowhere.invalid alweiner localhost
>>> 192.168.1.1 gateway
>>> 192.168.1.150 alweiner.invalid alweiner
>>
>> Frap, gotta love those gui tools and you need to pay attention to details.
>
> What gui tool are you referring to?
Assumed you used the network gui which has a tab to manage host/domain.
They have the bad habit of putting the node name in the 127.0.0.1 line.
> I edited the file with Kedit.
Then you did not follow the example given. 
>> If you will notice you did not get the .150 line correct.
>> Pop test, what is wrong with the 192.168.1.150 line?
>>
>
>> You lucked out because of the gui help.
>> Your nodename in /etc/sysconfig/network should match the one in /etc/hosts.
>>
>>
>> READ MY LIPS, you are to delete contents of /etc/hosts
>> cut the following and paste them into your hosts file.
>>
>> 127.0.0.1 localhost.localdomain localhost
>> 192.168.1.150 alweiner.nowhere.invalid alweiner
>> ::1 localhost6.localdomain6 localhost6
>
> What's wrong with just fixing the FQDN of 192.168.1.150? I don't
> understand that third line.
It is there in case you enable IPv6. It is localhost (127.0.0.1) in
IPv6 format.
>> When you modify a config file, you should always recheck your work by using
>> cat fn_here and double check values.
>>
> what is "fn_here"?
Dang, there goes all your Gold stars and Atta Boys, :-(
I want you to do a
cat /whatever/file/you/just/modified/displayed_on_the_screen_so_you_can_check_it
so you can make sure contents are correct and you have a trailing
carriage return.
>> Except for the prompt, you should see something on your screen as follows:
>>
>> [root@alweiner ~]# cat /etc/hosts
>> 127.0.0.1 localhost.localdomain localhost
>> 192.168.1.150 alweiner.nowhere.invalid alweiner
>> ::1 localhost6.localdomain6 localhost6
>> [root@alweiner ~]#
>>
>> I had expected the gui to add the ::1 line. So much for assuming.
>>
> Again, I used Kedit.
And you missed the point. I wanted the cat /etc/hosts to look like
[root@alweiner ~]# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.1.150 alweiner.nowhere.invalid alweiner
::1 localhost6.localdomain6 localhost6
[root@alweiner ~]#
not like
[root@alweiner ~]# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.1.150 alweiner.invalid alweiner
::1 localhost6.localdomain6 localhost6
[root@alweiner ~]#
and not like
[root@alweiner ~]# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.1.150 alweiner.nowhere.invalid alweiner
::1 localhost6.localdomain6 localhost6[root@alweiner ~]#
-
Re: Troubleshooting connection loss (continued)
On Sun, 11 Nov 2007 21:36:23 GMT, Allen Weiner wrote:
>
> I don't know if this is relevant to your diagnosis of my connection-loss
> problem. Most of the time, I only run WinME on Saturdays. The Fedora
> connection-loss problem happens (apparently) randomly throughout the week.
Ah Frap. Nothing like spending hours troubleshooting the wrong problem.
Ok, final SWAG. Your router loses its mind every once in a while.
Why, you ask. If fedora runs with dhcp ip for more than a day, you know
it was able to renew/rebind the lease and keep the connection.
Your ifconfig showed no errors/dropped/overruns/frame tx/rx hardware problems.
As a matter of fact, while editing that big long reply about 2 to 3
replies back, my modem lost its mind and the post failed.
Leds normal.
Tried pinging yahoo.com: failed. Did a service network restart: still failed.
Pinged modem: worked. What the F? First time I had this problem since
getting FiOS.
Clicked router web page: hangs. Dang. Power cycled modem and it worked.
Stupid, stupid, stupid. Should have pinged yahoo.com's ip first to
see if it was a modem dns problem. Maybe dns server(s) in modem were AFU.
Should have pinged them. Guess I'll write a little script to
troubleshoot the problem.
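A rough sketch of what such a script could look like, splitting the failure into modem, outside link, or DNS. This is an illustration under assumptions, not the script the poster went on to write: reachable and diagnose are hypothetical names, and the ping options assume the usual Linux iputils ping.

```shell
#!/bin/bash
# reachable TARGET
# Succeeds when a single ping to TARGET gets a reply within 2 seconds.
reachable() {
    ping -c 1 -W 2 "$1" > /dev/null 2>&1
}

# diagnose GW_STATUS IP_STATUS NAME_STATUS
# Each argument is an exit status (0 = that ping worked).
# Prints the first layer that failed, checking from the inside out.
diagnose() {
    if   [ "$1" -ne 0 ]; then echo "cannot reach modem/router: check cable, NIC, local config"
    elif [ "$2" -ne 0 ]; then echo "modem answers but outside IP does not: modem or ISP link"
    elif [ "$3" -ne 0 ]; then echo "IP works but name lookup fails: DNS problem"
    else                      echo "all three checks passed"
    fi
}
```

Usage with addresses from this thread (the outside IP is just the fallback DNS server mentioned earlier): reachable 192.168.1.1; gw=$?; reachable 68.238.96.12; ip=$?; reachable yahoo.com; name=$?; diagnose "$gw" "$ip" "$name"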
Adding a fallback dns nameserver to my resolv.conf as I type.
>> I think I might have to poke him a little harder and ask him to read
>> http://www.catb.org/~esr/faqs/smart-questions.html
>
> Is this about the point you made about me not snipping enough when I
> reply, or something else?
Hehehe, well that is in there also, but mostly about you thinking about the
question(s) you ask. :-)
Every time you had to say "What I meant was...", that should tell you where
your wheel ran off. :-D
-
Re: Troubleshooting connection loss (continued)
Bit Twister wrote:
> On Sun, 11 Nov 2007 21:36:23 GMT, Allen Weiner wrote:
>> I don't know if this is relevant to your diagnosis of my connection-loss
>> problem. Most of the time, I only run WinME on Saturdays. The Fedora
>> connection-loss problem happens (apparently) randomly throughout the week.
>
> Ah Frap. Nothing like spending hours troubleshooting the wrong problem.
> Ok, final SWAG. Your router loses its mind every once in a while.
>
Given this new theory, what troubleshooting steps would you recommend
the next time I get a connection loss and "service network restart" hangs?
I had another connection loss this afternoon.
Following is troubleshooting info plus recent online history.
11/10 Booted WinME with dynamic IP at approx 3:00 PM. Modem was not
powered on. Powered modem on at 8:00 PM, then rebooted into Fedora
(static IP). Changed Fedora static IP address and hostname.
11/11 Booted WinME with dynamic IP at 6:40 AM. Modem was not powered on.
Configured WinME to static IP. Modem powered on around 7:00 AM. Rebooted
into Fedora. System powered off at 8:40 AM.
11/11 Booted Fedora at 3:00 PM.
5:18 PM Connection loss (approx 2 hours into session, as with many
other instances)
5:22 PM Powered off modem and disconnected ethernet cable.
5:29 PM Reconnected ethernet cable and powered up modem.
5:35 PM ran Bit Twister script (dumps configuration info)
5:38 PM Issued "service network restart", which hung.
(I did not try ethtool -r eth0)
Following is troubleshooting data (taken after connection loss but
before "service network restart"):
Sun Nov 11 17:35:59 EST 2007
======== cat /etc/fedora-release ==========
Fedora release 7 (Moonshine)
======== cat /etc/redhat-release ==========
Fedora release 7 (Moonshine)
======== uname -rvi =============
2.6.23.1-21.fc7 #1 SMP Thu Nov 1 21:09:24 EDT 2007 i386
======== lsb_release -a ==========
LSB Version: :core-3.1-ia32:core-3.1-noarch:graphics-3.1-ia32:graphics-3.1-noarch
Distributor ID: Fedora
Description: Fedora release 7 (Moonshine)
Release: 7
Codename: Moonshine
======== free ==========
total used free shared buffers cached
Mem: 125128 122408 2720 0 2464 37924
-/+ buffers/cache: 82020 43108
Swap: 771080 173584 597496
Default run level is 5
======== chkconfig --list ==========
Double check if /avahi/ needs to be disabled on boot
avahi-daemon              0:off  1:off  2:off  3:on   4:on   5:off  6:off
avahi-dnsconfd            0:off  1:off  2:off  3:off  4:off  5:off  6:off
Double check if /named/ needs to be disabled on boot
named                     0:off  1:off  2:off  3:off  4:off  5:off  6:off
ConsoleKit                0:off  1:off  2:off  3:on   4:on   5:on   6:off
NetworkManager            0:off  1:off  2:off  3:off  4:off  5:off  6:off
NetworkManagerDispatcher  0:off  1:off  2:off  3:off  4:off  5:off  6:off
acpid                     0:off  1:off  2:off  3:on   4:on   5:on   6:off
anacron                   0:off  1:off  2:on   3:on   4:on   5:on   6:off
apmd                      0:off  1:off  2:on   3:on   4:on   5:on   6:off
atd                       0:off  1:off  2:off  3:on   4:on   5:on   6:off
autofs                    0:off  1:off  2:off  3:on   4:on   5:on   6:off
avahi-daemon              0:off  1:off  2:off  3:on   4:on   5:off  6:off
avahi-dnsconfd            0:off  1:off  2:off  3:off  4:off  5:off  6:off
bluetooth                 0:off  1:off  2:on   3:on   4:on   5:off  6:off
capi                      0:off  1:off  2:off  3:off  4:off  5:off  6:off
cpuspeed                  0:off  1:on   2:on   3:on   4:on   5:off  6:off
crond                     0:off  1:off  2:on   3:on   4:on   5:on   6:off
cups                      0:off  1:off  2:on   3:on   4:on   5:off  6:off
dhcdbd                    0:off  1:off  2:off  3:off  4:off  5:off  6:off
dund                      0:off  1:off  2:off  3:off  4:off  5:off  6:off
firestarter               0:off  1:off  2:on   3:on   4:on   5:on   6:off
firstboot                 0:off  1:off  2:off  3:on   4:off  5:off  6:off
gkrellmd                  0:off  1:off  2:off  3:off  4:off  5:off  6:off
gpm                       0:off  1:off  2:on   3:on   4:on   5:on   6:off
haldaemon                 0:off  1:off  2:off  3:on   4:on   5:on   6:off
hddtemp                   0:off  1:off  2:off  3:off  4:off  5:on   6:off
hidd                      0:off  1:off  2:on   3:on   4:on   5:off  6:off
hplip                     0:off  1:off  2:on   3:on   4:on   5:off  6:off
httpd                     0:off  1:off  2:off  3:off  4:off  5:off  6:off
ip6tables                 0:off  1:off  2:on   3:on   4:on   5:off  6:off
iptables                  0:off  1:off  2:off  3:off  4:off  5:on   6:off
irda                      0:off  1:off  2:off  3:off  4:off  5:off  6:off
irqbalance                0:off  1:off  2:on   3:on   4:on   5:off  6:off
isdn                      0:off  1:off  2:on   3:on   4:on   5:off  6:off
kdump                     0:off  1:off  2:off  3:off  4:off  5:off  6:off
kudzu                     0:off  1:off  2:off  3:on   4:on   5:on   6:off
lisa                      0:off  1:off  2:off  3:off  4:off  5:off  6:off
lm_sensors                0:off  1:off  2:on   3:on   4:on   5:off  6:off
mcstrans                  0:off  1:off  2:on   3:on   4:on   5:on   6:off
mdmonitor                 0:off  1:off  2:on   3:on   4:on   5:off  6:off
messagebus                0:off  1:off  2:off  3:on   4:on   5:on   6:off
named                     0:off  1:off  2:off  3:off  4:off  5:off  6:off
nasd                      0:off  1:off  2:off  3:on   4:on   5:on   6:off
netconsole                0:off  1:off  2:off  3:off  4:off  5:off  6:off
netfs                     0:off  1:off  2:off  3:on   4:on   5:off  6:off
netplugd                  0:off  1:off  2:off  3:off  4:off  5:off  6:off
network                   0:off  1:off  2:on   3:on   4:on   5:on   6:off
nfs                       0:off  1:off  2:off  3:off  4:off  5:off  6:off
nfslock                   0:off  1:off  2:off  3:on   4:on   5:off  6:off
nscd                      0:off  1:off  2:off  3:off  4:off  5:off  6:off
ntpd                      0:off  1:off  2:off  3:off  4:off  5:on   6:off
pand                      0:off  1:off  2:off  3:off  4:off  5:off  6:off
psacct                    0:off  1:off  2:off  3:off  4:off  5:off  6:off
rdisc                     0:off  1:off  2:off  3:off  4:off  5:off  6:off
readahead_early           0:off  1:off  2:on   3:on   4:on   5:on   6:off
readahead_later           0:off  1:off  2:off  3:off  4:off  5:on   6:off
restorecond               0:off  1:off  2:on   3:on   4:on   5:on   6:off
rpcbind                   0:off  1:off  2:off  3:on   4:on   5:off  6:off
rpcgssd                   0:off  1:off  2:off  3:on   4:on   5:off  6:off
rpcidmapd                 0:off  1:off  2:off  3:on   4:on   5:off  6:off
rpcsvcgssd                0:off  1:off  2:off  3:off  4:off  5:off  6:off
saslauthd                 0:off  1:off  2:off  3:off  4:off  5:off  6:off
sendmail                  0:off  1:off  2:on   3:on   4:on   5:on   6:off
smartd                    0:off  1:off  2:on   3:on   4:on   5:on   6:off
spamassassin              0:off  1:off  2:off  3:off  4:off  5:off  6:off
sshd                      0:off  1:off  2:on   3:on   4:on   5:off  6:off
syslog                    0:off  1:off  2:on   3:on   4:on   5:on   6:off
tomcat5                   0:off  1:off  2:off  3:off  4:off  5:off  6:off
vncserver                 0:off  1:off  2:off  3:off  4:off  5:off  6:off
winbind                   0:off  1:off  2:off  3:off  4:off  5:off  6:off
wpa_supplicant            0:off  1:off  2:off  3:off  4:off  5:off  6:off
xfs                       0:off  1:off  2:on   3:on   4:on   5:on   6:off
ypbind                    0:off  1:off  2:off  3:off  4:off  5:off  6:off
yum-updatesd              0:off  1:off  2:off  3:on   4:on   5:on   6:off
======== grep hosts: /etc/nsswitch.conf ==========
#hosts: db files nisplus nis dns
hosts: files dns
======== grep -v '^#' /etc/resolv.conf ==========
; generated by /sbin/dhclient-script
search myhome.westell.com
nameserver 192.168.1.1
nameserver 192.168.1.1
======== hostname ==========
alweiner.nowhere.invalid
======== grep eth /etc/mod*.conf ==========
alias eth0 e100
======== grep -v '^#' /etc/host.conf ==========
order hosts,bind
================ ifconfig -a ==============
eth0 Link encap:Ethernet HWaddr 00:07:E9:01:B2:09
inet addr:192.168.1.150 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::207:e9ff:fe01:b209/64 Scope:Link
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:4014 errors:0 dropped:0 overruns:0 frame:0
TX packets:1942 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:498352 (486.6 KiB) TX bytes:208165 (203.2 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:4144 errors:0 dropped:0 overruns:0 frame:0
TX packets:4144 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:5700347 (5.4 MiB) TX bytes:5700347 (5.4 MiB)
============== route -n =================
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
0.0.0.0 192.168.1.1 0.0.0.0 UG 0 0 0 eth0
======== cat /etc/sysconfig/network ==========
NETWORKING=yes
HOSTNAME=alweiner.nowhere.invalid
========== head -15 /etc/hosts ===========
127.0.0.1 alweiner.nowhere.invalid alweiner localhost
192.168.1.1 gateway
192.168.1.150 alweiner.nowhere.invalid alweiner
======== ethtool eth0 ==========
Settings for eth0:
Supported ports: [ TP MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Advertised auto-negotiation: Yes
Speed: 100Mb/s
Duplex: Full
Port: MII
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: g
Wake-on: g
Current message level: 0x00000007 (7)
Link detected: yes
=== dmesg | grep eth0 | grep -v SRC= ===
e100: eth0: e100_probe: addr 0xfc9ff000, irq 11, MAC addr 00:07:E9:01:B2:09
ADDRCONF(NETDEV_UP): eth0: link is not ready
e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
eth0: no IPv6 routers present
NETDEV WATCHDOG: eth0: transmit timed out
=== grep eth0 /var/log/messages | tail -10 ===
Nov 11 17:14:06 alweiner kernel: Inbound IN=eth0 OUT=
MAC=00:07:e9:01:b2:09:00:18:3a:53:f7:fb:08:00 SRC=192.168.1.1
DST=192.168.1.150 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3767 DF PROTO=TCP
SPT=1197 DPT=80 WINDOW=8192 RES=0x00 SYN URGP=0
Nov 11 17:14:11 alweiner kernel: Inbound IN=eth0 OUT=
MAC=00:07:e9:01:b2:09:00:18:3a:53:f7:fb:08:00 SRC=192.168.1.1
DST=192.168.1.150 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3768 DF PROTO=TCP
SPT=1197 DPT=80 WINDOW=8192 RES=0x00 SYN URGP=0
Nov 11 17:14:35 alweiner kernel: Inbound IN=eth0 OUT=
MAC=00:07:e9:01:b2:09:00:18:3a:53:f7:fb:08:00 SRC=192.168.1.1
DST=192.168.1.150 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3769 DF PROTO=TCP
SPT=1197 DPT=80 WINDOW=8192 RES=0x00 SYN URGP=0
Nov 11 17:14:54 alweiner kernel: Inbound IN=eth0 OUT=
MAC=00:07:e9:01:b2:09:00:18:3a:53:f7:fb:08:00 SRC=192.168.1.1
DST=192.168.1.150 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3786 DF PROTO=TCP
SPT=1198 DPT=80 WINDOW=8192 RES=0x00 SYN URGP=0
Nov 11 17:14:59 alweiner kernel: Inbound IN=eth0 OUT=
MAC=00:07:e9:01:b2:09:00:18:3a:53:f7:fb:08:00 SRC=192.168.1.1
DST=192.168.1.150 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3787 DF PROTO=TCP
SPT=1198 DPT=80 WINDOW=8192 RES=0x00 SYN URGP=0
Nov 11 17:15:23 alweiner kernel: Inbound IN=eth0 OUT=
MAC=00:07:e9:01:b2:09:00:18:3a:53:f7:fb:08:00 SRC=192.168.1.1
DST=192.168.1.150 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3788 DF PROTO=TCP
SPT=1198 DPT=80 WINDOW=8192 RES=0x00 SYN URGP=0
Nov 11 17:15:42 alweiner kernel: Inbound IN=eth0 OUT=
MAC=00:07:e9:01:b2:09:00:18:3a:53:f7:fb:08:00 SRC=192.168.1.1
DST=192.168.1.150 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3805 DF PROTO=TCP
SPT=1199 DPT=80 WINDOW=8192 RES=0x00 SYN URGP=0
Nov 11 17:15:47 alweiner kernel: Inbound IN=eth0 OUT=
MAC=00:07:e9:01:b2:09:00:18:3a:53:f7:fb:08:00 SRC=192.168.1.1
DST=192.168.1.150 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3806 DF PROTO=TCP
SPT=1199 DPT=80 WINDOW=8192 RES=0x00 SYN URGP=0
Nov 11 17:16:11 alweiner kernel: Inbound IN=eth0 OUT=
MAC=00:07:e9:01:b2:09:00:18:3a:53:f7:fb:08:00 SRC=192.168.1.1
DST=192.168.1.150 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=3807 DF PROTO=TCP
SPT=1199 DPT=80 WINDOW=8192 RES=0x00 SYN URGP=0
Nov 11 17:32:05 alweiner kernel: NETDEV WATCHDOG: eth0: transmit timed out
======== cat /etc/sysconfig/network-scripts/ifcfg-eth0 ==========
# Intel Corporation 82557/8/9 [Ethernet Pro 100]
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none
HWADDR=00:07:e9:01:b2:09
TYPE=Ethernet
USERCTL=yes
IPV6INIT=no
PEERDNS=yes
NETMASK=255.255.255.0
IPADDR=192.168.1.150
GATEWAY=192.168.1.1
======== tail -18 /var/lib/dhclient/dhclient-eth0.leases ==========
rebind 3 2007/11/7 12:23:43;
expire 3 2007/11/7 15:23:43;
}
lease {
interface "eth0";
fixed-address 192.168.1.47;
option subnet-mask 255.255.255.0;
option routers 192.168.1.1;
option dhcp-lease-time 86400;
option dhcp-message-type 5;
option domain-name-servers 192.168.1.1,192.168.1.1;
option dhcp-server-identifier 192.168.1.1;
option broadcast-address 255.255.255.255;
option domain-name "myhome.westell.com";
renew 3 2007/11/7 05:23:24;
rebind 3 2007/11/7 15:31:25;
expire 3 2007/11/7 18:31:25;
}
=== dmesg | grep eth1 | grep -v SRC= ===
=== grep eth1 /var/log/messages | tail -10 ===
=== dmesg | grep eth2 | grep -v SRC= ===
=== grep eth2 /var/log/messages | tail -10 ===
======== grep -v '^#' /etc/hosts.allow ==========
======== grep -v '^#' /etc/hosts.deny ==========
==== end of config/network data dump =======
-
Re: Troubleshooting connection loss (continued)
On Mon, 12 Nov 2007 00:57:23 GMT, Allen Weiner wrote:
Allen, take note, there are no smiley faces/emoticons in this post.
Read this whole reply before doing any changes.
Respond to all my questions.
Make the change to /etc/hosts last, and reboot.
> Bit Twister wrote:
>>
>> Ah Frap. Nothing like spending hours troubleshooting the wrong problem.
>> Ok, final SWAG. Your router loses its mind every once in a while.
>>
> Given this new theory, what troubleshooting steps would you recommend
> the next time I get a connection loss
Reset the router.
> and "service network restart" hangs.
Fix /etc/resolv.conf as I will suggest again.
Fix /etc/hosts as I suggest, yet again.
Empty /var/lib/dhclient/dhclient-eth0.leases as I suggest.
> 11/11 Booted WinME with dynamic IP at 6:40 AM. Modem was not powered on.
> Configured WinME to static IP. Modem powered on around 7:00 AM. Rebooted
> into Fedora. System powered off at 8:40 AM.
Hmmm, new data, "Modem powered on" and "System powered off"
> 11/11 Booted Fedora at 3:00 PM.
> 5:18 PM Connection loss (approx 2 hours into session, as with many
> other instances)
Are the majority of the disconnects happening "approximately 2 hours"
after the modem is powered up?
> 5:22 PM Powered off modem and disconnected ethernet cable.
> 5:29 PM Reconnected ethernet cable and powered up modem.
You can discontinue that step, I was hoping cable dis/connect
would reset the modem's dhcp lease.
> 5:35 PM ran Bit Twister script (dumps configuration info)
> 5:38 PM Issued "service network restart", which hung.
>
> (I did not try ethtool -r eth0)
That would indicate if modem and node did the handshake nic to nic.
If the link is not OK, that can cause a hang.
Cannot rule that out yet because SOMEONE is not setting resolv.conf,
hosts, leases as requested.
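For reference, ethtool -r restarts auto-negotiation on the interface, and the plain "ethtool eth0" output in the dump above reports the result on its "Link detected:" line. A minimal sketch for pulling that one field out of the output; link_state is a made-up helper name, and it only reads text on stdin, so it does not need root:

```shell
#!/bin/bash
# Sketch only: read `ethtool <iface>` output on stdin and print the value
# of the "Link detected:" line, or "unknown" if that line is missing.
link_state() {
    awk -F': *' '
        /Link detected:/ { print $2; found = 1 }
        END { if (!found) print "unknown" }
    '
}

# Hypothetical usage:
#   ethtool eth0 | link_state
```

A "no" here points at cable, NIC or modem port rather than at DNS or dhcp settings.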
> Following is troubleshooting data (taken after connection loss but
> before "service network restart"):
> ======== grep -v '^#' /etc/resolv.conf ==========
> ; generated by /sbin/dhclient-script
> search myhome.westell.com
> nameserver 192.168.1.1
> nameserver 192.168.1.1
I really, really, really, really, really want you to do a
echo "nameserver 192.168.1.1" > /etc/resolv.conf
Hoping the ; is causing the restart hang and the SUGGESTION will fix
your problem.
Not doing the SUGGESTION will force me to place you in my kill file.
Do you know what a kill file is?
> ========== head -15 /etc/hosts ===========
> 127.0.0.1 alweiner.nowhere.invalid alweiner localhost
> 192.168.1.1 gateway
> 192.168.1.150 alweiner.nowhere.invalid alweiner
For the last time. Change /etc/hosts to match the following:
127.0.0.1 localhost
192.168.1.1 gateway
192.168.1.150 alweiner.nowhere.invalid alweiner
That SUGGESTION may also clear up your restart hang.
Having alweiner.nowhere.invalid resolve to both 127.0.0.1 and 192.168.1.150
is not fair to the system and will make
ping alweiner.nowhere.invalid hide where a problem exists when trying
to debug connection problems.
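A quick way to spot that kind of double mapping is to list every name a hosts file assigns to more than one address. A minimal sketch, assuming the usual "address name [alias...]" layout; dup_hosts is a made-up helper name:

```shell
#!/bin/bash
# Sketch only: print each host name that file $1 maps to more than one address.
dup_hosts() {
    awk '
        /^[ \t]*#/ || NF < 2 { next }      # skip comment and blank lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break       # rest of the line is a comment
                if (($i in addr) && addr[$i] != $1) dup[$i] = 1
                addr[$i] = $1
            }
        }
        END { for (n in dup) print n }
    ' "$1" | sort
}

# Hypothetical usage:
#   dup_hosts /etc/hosts
```

Run against the /etc/hosts quoted above, it should report both alweiner and alweiner.nowhere.invalid; after the suggested change it should print nothing.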
> ======== tail -18 /var/lib/dhclient/dhclient-eth0.leases ==========
> rebind 3 2007/11/7 12:23:43;
> expire 3 2007/11/7 15:23:43;
> }
> lease {
> interface "eth0";
> fixed-address 192.168.1.47;
> option subnet-mask 255.255.255.0;
> option routers 192.168.1.1;
> option dhcp-lease-time 86400;
> option dhcp-message-type 5;
> option domain-name-servers 192.168.1.1,192.168.1.1;
> option dhcp-server-identifier 192.168.1.1;
> option broadcast-address 255.255.255.255;
> option domain-name "myhome.westell.com";
> renew 3 2007/11/7 05:23:24;
> rebind 3 2007/11/7 15:31:25;
> expire 3 2007/11/7 18:31:25;
> }
I would like for you to do a
cp /dev/null /var/lib/dhclient/dhclient-eth0.leases
I want to rule out that your dhcp server is no longer running.
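Copying /dev/null over a file truncates it to zero bytes while leaving the file itself in place. A minimal sketch that saves the old contents somewhere else first, in case they are wanted later; empty_keep_copy is a made-up helper name and both paths are supplied by the caller:

```shell
#!/bin/bash
# Sketch only: save a copy of file $1 as $2, then empty $1 in place.
empty_keep_copy() {
    cp -p "$1" "$2" &&    # keep the old contents (and timestamps) in $2
    cp /dev/null "$1"     # truncate the original to zero length
}

# Hypothetical usage (as root; the copy goes outside /var/lib/dhclient):
#   empty_keep_copy /var/lib/dhclient/dhclient-eth0.leases /root/leases.old
```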
The following is MANDATORY, Do a
ls /etc/sysconfig/network-scripts/*~
If you get any file names returned, you NEED to delete them.
I DO NOT want any edit backup files (*~) in that directory.
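Note that when nothing matches *~, the shell hands the pattern to ls unchanged and ls answers "No such file or directory", which here means the directory is clean. A minimal sketch that prints only the backup files that really exist; list_backups is a made-up helper name:

```shell
#!/bin/bash
# Sketch only: print editor backup files (names ending in ~) found in directory $1.
list_backups() {
    for f in "$1"/*~; do
        [ -e "$f" ] && echo "$f"    # an unmatched pattern stays literal and fails -e
    done
    return 0
}

# Hypothetical usage:
#   list_backups /etc/sysconfig/network-scripts
```

No output means there is nothing to delete.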
-
Re: Troubleshooting connection loss (continued)
Bit Twister wrote:
> Are the majority of the disconnects happening "approximately 2 hours"
> after the modem is powered up?
It seems that way. But I haven't kept a log book.
>
>
>
>
>> ======== grep -v '^#' /etc/resolv.conf ==========
>> ; generated by /sbin/dhclient-script
>> search myhome.westell.com
>> nameserver 192.168.1.1
>> nameserver 192.168.1.1
>
>
> I really, really, really, really, really want you to do a
> echo "nameserver 192.168.1.1" > /etc/resolv.conf
>
> Hoping the ; is causing the restart hang and the SUGGESTION will fix
> your problem.
>
> Not doing the SUGGESTION will force me to place you in my kill file.
>
That's your choice. I did a Google search on resolv.conf & generated. I
saw several examples similar to mine. Here's one:
At 11:30 AM 12/30/2005, Jerry57 (GMail) wrote:
>Hello Robert,
>
> What is listed in /etc/resolv.conf? You should have something like:
> search my.domain
> nameserver 10.0.0.1
I got that:
cat resolv.conf
; generated by /sbin/dhclient-script
search htt-consult.com
nameserver 65.84.78.211
nameserver 65.84.78.209
So, I doubt that that strange first line with the leading semicolon is
causing a problem. If you choose to "plonk" me, let me take this
opportunity to again thank you for all the help you've given me.
>
>
>
> The following is MANDATORY, Do a
> ls /etc/sysconfig/network-scripts/*~
>
Result was "no such file or directory". There are no backup files in
network-scripts.
-
Re: Troubleshooting connection loss (continued)
On Mon, 12 Nov 2007 04:46:20 GMT, Allen Weiner wrote:
> Bit Twister wrote:
>
>> Are the majority of the disconnects happening "approximately 2 hours"
>> after the modem is powered up?
>
> It seems that way. But I haven't kept a log book.
My guess is there might be a loose connection inside the modem.
You power up, about 2hrs later, heat causes the problem. A little while
later the heat makes the connection go back together.
Imagine a loose solder connection on a pin. Sorry for the bad graphics.
cold connection (* ) works
warm connection ( * ) breaks
warmer connection ( *) working again
connection in this context is physical connection.
> So, I doubt that that strange first line with the leading semicolon is
> causing a problem.
Well I am happy, you have learned all you need to know.
Guess we are done.
Here is a present to play with.
--------------- script starts below this line ----------------
#!/bin/bash
#*****************************************************************
#*
#* ck_connection - Check internet connection.
#*
#*
#* Install procedure:
#* Save into a file named ck_connection
#* actual location should be somewhere in $PATH
#* chmod +x ck_connection
#*
#*
#* Code walks through the png array to test each point
#* in the path to/through the internet. DNS servers are also tested.
#*
#* You will need to modify the script to use system's gateway
#* and insert the ISP's gateway value.
#*
#* You may have to get into the modem's web page to find
#* the modem's gateway (ISP's gateway) for the modem.
#*
#* Depending on your distribution, the $(hostname -s) and
#* $(hostname) may need changing.
#*
#* On Mandriva linux hostname returns the FQDN and
#* hostname -s returns the short name for the node.
#*
#*****************************************************************
function net_info {
cat <<EOF
There are settings which define where and what for DNS search order.
In the following, I'll give commands, results and maybe comments.
The command line starts with a $ so you can tell it from results and
my comments. You do not use the leading $ when you run the command.
You can get more help about the command with
man first_word_here
Example: you would do a man grep to get grep command manual.
The commands and example values follow:
$ grep hosts: /etc/nsswitch.conf
hosts: files dns nis
For speed, mine has
hosts: files dns
$ grep -v '^#' /etc/host.conf
order hosts,bind
multi on
nospoof on
spoofalert on
$ grep -v '^#' /etc/resolv.conf
nameserver 192.168.0.0
nameserver 0.238.0.12
nameserver 0.203.0.86
For speed improvements, I always remove any search or domain lines.
Do not use the above numbers on your system. They are examples only.
If a nameserver fails to return anything, the next server is tried.
Because of that, I like to have the last server to be my ISP's public DNS
For routing check, there is
$ route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.1.0 0.0.0.0 255.255.255.0 U 10 0 0 eth0
0.0.0.0 192.168.1.1 0.0.0.0 UG 10 0 0 eth0
In the above, UG in the Flags column indicates that line will be used
as the default Gateway route to ip addresses that can not be routed
via the lines above it.
The ip address in the Gateway column is where that traffic is sent.
If you can ping that address, you know that device is alive and
packets are leaving your node.
$ ifconfig
will allow you to see the ip address assigned to your nic and allow you
to check if you are getting unreasonable counts for errors, dropped,
overruns, frame, carrier and collisions.
If you want to check internet speeds to somewhere, Example:
$ traceroute -n yahoo.com
Some nodes drop those trace packets, so you may want to use
$ traceroute -In yahoo.com
For dns testing there is something like
$ dig google.com @isp_name_server1
You will get information about how isp_name_server1 performed
researching google.com lookup .
EOF
} # end net_info
#********************************************
#*
#* The following are not actual checks
#* The comment box is about what the ping value
#* will be used to make what check/verification.
#*
#* You will need to make changes to match your setup.
#* If you want to skip a test you either put
#* 127.0.0.1 in the png[x] test to skip.
#*
#* Or you delete the png[] and msg[] lines,
#* and renumber them to keep the numbers continuous
#* through the png[12]="done" line.
#*
#* NOTE:
#* The png[12]="done" line has to remain and
#* must be the last one in the png array.
#*
#* When renumbering, check the msg[] text to verify
#* if there is a png[] value used in the text.
#*
#* You will also have to fix the code which
#* uses png[9].
#*
#********************************************
#********************************************
#* check ping works on the node
#********************************************
png[1]="127.0.0.1"
msg[1]="$(hostname -s) problem,
No idea where to look, I never had the problem
"
#********************************************
#* check dns on my node
#********************************************
png[2]="localhost"
msg[2]="Check $(hostname -s) /etc/hosts localhost line.
I assume you have a line like
127.0.0.1 localhost.localdomain localhost
man hosts for more info"
#********************************************
#* check pinging my ip address works
#********************************************
png[3]="192.168.1.130"
msg[3]="Check $(hostname -s) /etc/hosts $(hostname) ip addy.
I assume you have a line like
192.168.1.130 $(hostname) $(hostname -s)
man hosts for more info"
#********************************************
#* check dns reads my /etc/hosts by full name
#********************************************
png[4]="$(hostname)"
msg[4]="Check $(hostname -s) /etc/hosts $(hostname) line.
I assume you have a line like
192.168.1.130 $(hostname) $(hostname -s)
man hosts for more info"
#********************************************
#* check dns reads my /etc/hosts by alias
#********************************************
png[5]="$(hostname -s)"
msg[5]="Check $(hostname -s) /etc/hosts $(hostname) line for an alias.
I assume you have a line like
192.168.1.130 $(hostname) $(hostname -s)
man hosts for more info"
#********************************************
#* check my gateway device is alive
#********************************************
png[6]="192.168.1.1"
msg[6]="Check physical connection to next device to internet (gateway).
run mii-tool -v eth0
or ethtool eth0
You are looking for link ok line
or Link detected: yes depending on which tool used
run route -n to verify you have a UG Flags line
$(net_info)"
#********************************************
#* check my gateway alias in /etc/hosts
#********************************************
png[7]="router"
msg[7]="Check $(hostname -s) /etc/hosts router line
I assume you have a
192.168.1.1 router line
man hosts for more info
$(net_info)"
#********************************************
#* check my ISP's gateway connected to router
#********************************************
png[8]="71.252.137.1"
msg[8]="Check leds on internet device.
poweroff internet device (adsl/cable modem)
wait 30 seconds by watch/clock to let capacitors discharge
and reset device
power up, wait for leds to settle down
run service network restart
Leds not right, check wiring out to telephone pole
call your ISP
$(net_info)"
#********************************************
#* check if DNS server is alive
#********************************************
_dns_ip=9
png[$_dns_ip]="192.168.1.1"
msg[$_dns_ip]="Check $(hostname -s) /etc/resolv.conf nameserver line
You will have to check the device which has the name server running.
Your internet device (adsl/cable modem your dns server)
If none of the above, ${png[$_dns_ip]} is down
Workaround: change nameserver ip_here to a public nameserver
in /etc/resolv.conf
man resolv.conf for more info
$(net_info)"
#********************************************
#* check ISP can route to yahoo.com
#********************************************
png[10]="66.94.234.13"
msg[10]="cannot ping yahoo by ip address
yahoo.com is down or ip address changed.
check google.com with ping -c1 72.14.207.99
If that fails, google.com is down or ip address changed
or it is an ISP/internet problem
$(net_info)"
#********************************************
#* check DNS can resolve yahoo.com
#********************************************
png[11]="yahoo.com"
msg[11]="Cannot ping yahoo.com by name
yahoo.com just went down, or dns is broke on your ISP or somewhere else.
$(net_info)"
png[12]="done"
msg[12]="Last array element to tell while loop we are done pinging"
#********************************************
#* Actual testing starts here
#********************************************
#********************************************
#* get the first dns server from /etc/resolv.conf
#********************************************
set -- $(grep nameserver /etc/resolv.conf | grep -v '^#' | head -1)
_ip=$2
if [ -z "$_ip" ] ; then
echo "/etc/resolv.conf does not have a nameserver line.
man resolv.conf
for more information"
exit 1
else
png[$_dns_ip]=$_ip
fi
#********************************************
#* loop through all ip/name tests
#********************************************
i=1
while [ "${png[$i]}" != "done" ] ; do
echo "running ping -c 1 -w 3 ${png[$i]} "
ping -c 1 -w 3 ${png[$i]} > /dev/null
if [ $? -ne 0 ] ; then
/bin/echo -e "\nFailure: ping -c 1 -w 3 ${png[$i]} "
/bin/echo -e "${msg[$i]} "
exit 1
fi
i=$((i + 1))
done
#********************************************
#* loop through all nameservers in /etc/resolv.conf
#********************************************
while read line
do
set -- $line
_ip=$2
if [ "$1" = "nameserver" ] ; then
echo "running ping -c 1 -w 3 $_ip "
ping -c 1 -w 3 $_ip > /dev/null
if [ $? -ne 0 ] ; then
/bin/echo -e "\nDNS nameserver Failure: ping -c 1 -w 3 $_ip "
echo "nameserver $_ip in /etc/resolv.conf is not responding to pings."
echo "$(net_info)"
exit 1
fi
fi
done < /etc/resolv.conf
#********* end ck_connection **********************************
-
Re: Troubleshooting connection loss (continued)
Allen Weiner wrote:
>Bit Twister wrote:
>>> ======== grep -v '^#' /etc/resolv.conf ==========
>>> ; generated by /sbin/dhclient-script
>>> search myhome.westell.com
>>> nameserver 192.168.1.1
>>> nameserver 192.168.1.1
>> I really, really, really, really, really want you to
>> do a echo "nameserver 192.168.1.1" > /etc/resolv.conf
>> Hoping the ; is causing the restart hang and the
>> SUGGESTION will fix
>> your problem.
>> Not doing the SUGGESTION will force me to place you
>> in my kill file.
>>
>That's your choice. I did a Google search on resolv.conf
>& generated. I saw several examples similar to
>mine. Here's one:
Here's a better one... Download virtually any source code
to libc, and look in the .../resolv/res_init.c file for
this code:
    if ((fp = fopen(_PATH_RESCONF, "r")) != NULL) {
        /* read the config file */
        while (fgets_unlocked(buf, sizeof(buf), fp) != NULL) {
            /* skip comments */
            if (*buf == ';' || *buf == '#')
                continue;
            /* read default domain name */
            if (MATCH(buf, "domain")) {
What that is doing is reading the /etc/resolv.conf file, and
skipping any line that begins with either ';' or '#'.
Personally, I would fault it for not initially removing all
leading white space, but....
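The same comment-skipping rule can be tried from the shell. A minimal sketch that prints the address from each active nameserver line; nameservers is a made-up helper name, and unlike the C excerpt above, read strips leading white space before the first character is tested:

```shell
#!/bin/bash
# Sketch only: print the address from each "nameserver" line of file $1,
# skipping lines that start with ; or # the way the resolver code does.
nameservers() {
    while read -r key value rest; do
        case $key in
            ';'*|'#'*) continue ;;                          # comment line
            nameserver) [ -n "$value" ] && echo "$value" ;;
        esac
    done < "$1"
    return 0
}

# Hypothetical usage:
#   nameservers /etc/resolv.conf
```

Fed the resolv.conf quoted in this thread, it should print 192.168.1.1 twice and ignore the "; generated by" line.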
--
Floyd L. Davidson
Ukpeagvik (Barrow, Alaska) floyd@apaflo.com
-
Re: Troubleshooting connection loss (continued)
Floyd L. Davidson wrote:
> Allen Weiner wrote:
>> Bit Twister wrote:
>>>> ======== grep -v '^#' /etc/resolv.conf ==========
>>>> ; generated by /sbin/dhclient-script
>>>> search myhome.westell.com
>>>> nameserver 192.168.1.1
>>>> nameserver 192.168.1.1
>>> I really, really, really, really, really want you to
>>> do a echo "nameserver 192.168.1.1" > /etc/resolv.conf
>>> Hoping the ; is causing the restart hang and the
>>> SUGGESTION will fix
>>> your problem.
>>> Not doing the SUGGESTION will force me to place you
>>> in my kill file.
>>>
>> That's your choice. I did a Google search on resolv.conf
>> & generated. I saw several examples similar to
>> mine. Here's one:
>
> Here's a better one... Download virtually any source code
> to libc, and look in the .../resolv/res_init.c file for
> this code:
>
> if ((fp = fopen(_PATH_RESCONF, "r")) != NULL) {
> /* read the config file */
> while (fgets_unlocked(buf, sizeof(buf), fp) != NULL) {
> /* skip comments */
> if (*buf == ';' || *buf == '#')
> continue;
> /* read default domain name */
> if (MATCH(buf, "domain")) {
>
> What that is doing is reading the /etc/resolv.conf file, and
> skipping any line that begins with either ';' or '#'.
>
> Personally, I would fault it for not initially removing all
> leading white space, but....
>
Thanks very much Floyd for your reply. I'm a Linux novice and am a long
way from having the savvy to do what you did.
By the way, for many years I subscribed to comp.dcom.modems. I always
found your posts highly informative. I'm really astounded by how much
more function my small Westell DSL modem/router has than my old USR
dial-up modem.
-
Re: Troubleshooting connection loss (continued)
Bit Twister wrote:
> On Mon, 12 Nov 2007 04:46:20 GMT, Allen Weiner wrote:
>> Bit Twister wrote:
>>
>
> My guess is there might be a loose connection inside the modem.
> You power up, about 2hrs later, heat causes the problem. A little while
> later the heat makes the connection go back together.
> Imagine a loose solder connection on a pin. Sorry for the bad graphics.
>
> cold connection (* ) works
> warm connection ( * ) breaks
> warmer connection ( *) working again
>
> connection in this context is physical connection.
If that is the problem, the broken connection must be short-lived,
because without fail, the moment I reboot, My Internet connection is
restored.
So let's assume there is a momentary connection loss. The next time it
occurs, what troubleshooting steps can I perform to determine why
"service network restart" hangs?
We're saying the problem is local, so there is no point in trying to
verify DNS, or ping outside servers.
>
>> So, I doubt that that strange first line with the leading semicolon is
>> causing a problem.
>
> Well I am happy, you have learned all you need to know.
> Guess we are done.
>
The post in this thread by Floyd Davidson should close the issue. We
ought to be done pursuing the angle that there is a DHCP problem. What
would be worthwhile to me is a troubleshooting procedure for the
"service network restart" hang that is not predicated on a DHCP problem.
> Here is a present to play with.
Thanks. But that isn't applicable to diagnosing the hang of "service
network restart".
-
Re: Troubleshooting connection loss (continued)
On Mon, 12 Nov 2007 18:46:26 GMT, Allen Weiner wrote:
>
> If that is the problem, the broken connection must be short-lived,
> because without fail, the moment I reboot, My Internet connection is
> restored.
Hehe, think about it: router chip connection opens, software goes
insane and quits working for your internet, sometime later you notice
the connection drop and start the process of restarting. Plenty of time for
the metal to keep expanding to the other side of the hole. Those holes are
pretty tight. Not to mention the chips that are just laid on the
board and soldered.
>
> So let's assume there is a momentary connection loss. The next time it
> occurs, what troubleshooting steps can I perform to determine why
> "service network restart" hangs?
You already know how to troubleshoot which component is not working.
You refuse to do the three things I want done to rule out possible causes
and get more information.
It was bad enough to have to work under the hood of your car through
the tail pipe, now that you have tied my hands, I can not help you
with that problem. :-P
> The post in this thread by Floyd Davidson should close the issue.
Saw that and your reply. Had to laugh, you just got your feet wet with
scripting in bash. Floyd's post showed the C or C++ (I forget which) code,
which is another programming language if you want to drill that far
down to learn what is going on.
> We ought to be done pursuing the angle that there is a DHCP problem.
I THINK so, but you will not let me rule that out. :-(
> What would be worthwhile to me is a troubleshooting procedure for the
> "service network restart" hang that is not predicated on a DHCP problem.
Make my 3 SUGGESTIONS, and see if the problem goes away while in a
static ip setup.
>> Here is a present to play with.
>
> Thanks. But that isn't applicable to diagnosing the hang of "service
> network restart".
True, just a nice script to know what is not working next time the connection drops.
By the way, here is the latest one, with info on more network
troubleshooting commands. It prints out what is being tested at each point.
Save/run in your user account. Does not require root privs to run.
Run as is and I think it should fail on testing ISP gateway to modem.
It will give the number of the array to modify with your modem's value.
#!/bin/bash
#*****************************************************************
#*
#* ck_connection - Check internet connection.
#*
#* Install procedure:
#* Save into a file named ck_connection
#* actual location should be somewhere in $PATH
#* chmod +x ck_connection
#*
#*
#* Code walks through the png array to test each point
#* in the path to/through the internet. DNS servers are also tested.
#*
#* You will need to modify the script to use node's gateway
#* ip in png[$_gate_loc], Usually your modem's ip.
#* and insert the ISP's gateway value at png[8]
#*
#* You may have to get into the modem's web page to find
#* the modem's gateway (ISP's gateway) for the modem.
#* If you cannot find it, just change png[8] to 127.0.0.1
#*
#* Depending on your distribution, the $(hostname -s) and
#* $(hostname) may need changing.
#*
#* On Mandriva linux hostname returns the FQDN and
#* hostname -s returns the short name for the node.
#*
#*****************************************************************
if [ $# -gt 0 ] ; then
    _arg2=$1
fi
function net_info {
    if [ -z "$_arg2" ] ; then
        echo "$0 hints will give you more research tools/info"
        return
    fi
    cat <<EOF
Note: just because you can ping a server does not mean
it is serving up what it is supposed to be serving. 
There are settings which define where and what DNS search order.
In the following, I'll give commands, results and maybe comments. The
command line starts with a $ so you can tell command line from results
and my comments. You do not use the leading $ when you run the command.
You can get more help about the command with
man first_word_here
Example: you would do a man grep to get grep command manual.
The commands and example values follow:
$ grep hosts: /etc/nsswitch.conf
hosts: files dns nis
For speed, mine has
hosts: files dns
$ grep -v '^#' /etc/host.conf
order hosts,bind
multi on
nospoof on
spoofalert on
$ grep -v '^#' /etc/resolv.conf
nameserver 192.168.0.0
nameserver 0.238.0.12
nameserver 0.203.0.86
For speed improvements, I always remove any search or domain lines.
Do not use the above numbers on your system. They are examples only.
If a nameserver fails to return anything, the next server is tried.
Because of that, I like the last server to be my ISP's public DNS.
For routing check, there is
$ route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.1.0 0.0.0.0 255.255.255.0 U 10 0 0 eth0
0.0.0.0 192.168.1.1 0.0.0.0 UG 10 0 0 eth0
In the above, UG in the Flags column indicate that line will be used
as the default Gateway route to ip addresses that can not be routed
via the lines above it.
The ip address in the Gateway column is where that traffic is sent.
If you can ping that address, you know that device is alive and
packets are leaving your node.
$ ifconfig
will allow you to see the ip address assigned to your nic and allow you
to check if you are getting unreasonable counts for errors, dropped,
overruns, frame, carrier and collisions.
If you want to check internet speeds to somewhere, Example:
$ traceroute -n yahoo.com
Some nodes drop those trace packets, so you may want to use
$ traceroute -In yahoo.com
For dns testing there is something like
$ dig google.com @isp_name_server1
You will get information about how isp_name_server1 performed
the google.com lookup.
EOF
} # end net_info
#********************************************
#*
#* You will need to make changes to match your setup.
#* Read script header for details
#* If you want to skip a test you either put
#* 127.0.0.1 in the png[x] test to skip.
#*
#* Or you delete the png[], tst[] and msg[] lines,
#* and renumber them to keep the numbers continuous
#* through the png[12]="done" line.
#*
#* NOTE:
#* The png[12]="done" line has to remain and
#* must be the last one in the png array.
#*
#* When renumbering, check the msg[] text to verify
#* if there is a png[] value used in the text.
#*
#********************************************
png[1]="127.0.0.1"
tst[1]="that ping is working on $(hostname -s) "
msg[1]="$(hostname -s) problem,
No idea where to look, I never had the problem
"
png[2]="localhost"
tst[2]="that resolver reads /etc/hosts "
msg[2]="Check $(hostname -s) /etc/hosts localhost line.
I assume you have a line like
127.0.0.1 localhost.localdomain localhost
man hosts for more info"
png[3]="192.168.1.130"
tst[3]="nic access by ip address"
msg[3]="Check $(hostname -s) /etc/hosts $(hostname) ip addy.
I assume you have a line like
192.168.1.130 $(hostname) $(hostname -s)
man hosts for more info"
png[4]="$(hostname)"
tst[4]="that resolver reads /etc/hosts by full name "
msg[4]="Check $(hostname -s) /etc/hosts $(hostname) line.
I assume you have a line like
192.168.1.130 $(hostname) $(hostname -s)
man hosts for more info"
png[5]="$(hostname -s)"
tst[5]="that resolver reads /etc/hosts by alias "
msg[5]="Check $(hostname -s) /etc/hosts $(hostname) line for an alias.
I assume you have a line like
192.168.1.130 $(hostname) $(hostname -s)
man hosts for more info"
#********************************************
#* Script fills in the real value later.
#********************************************
_gate_loc=6
png[$_gate_loc]="192.168.1.1"
tst[$_gate_loc]="that $(hostname -s) gateway is alive "
msg[$_gate_loc]="Check connection to next device to internet (gateway).
run mii-tool -v eth0
or ethtool eth0
You are looking for link ok
or Link detected: yes
depending on which tool used. run
route -n
to verify you have a UG in the Flags column of the last line
$(net_info)"
png[7]="gateway"
tst[7]="if gateway alias works via /etc/hosts "
msg[7]="Check $(hostname -s) /etc/hosts gateway line
I assume you have added a
192.168.1.1 gateway
line to /etc/hosts
That lets you do a quick test by doing a
ping -c1 gateway
at a terminal
man hosts for more info
$(net_info)"
#********************************************
#* Look in modem's web page or dhcp leases file.
#********************************************
png[8]="71.252.137.1"
tst[8]="modem talks to ISP gateway "
msg[8]="Check leds on internet device.
poweroff internet device (adsl/cable modem)
wait 30 seconds by watch/clock to let capacitors discharge
and reset device
power up, wait for leds to settle down
run service network restart
Leds not right, check wiring out to telephone pole
call your ISP
$(net_info)"
#********************************************
#* Script fills in the real value from /etc/resolv.conf
#********************************************
_dns_loc=9
png[$_dns_loc]="127.0.0.1"
tst[$_dns_loc]="if DNS server is alive "
msg[$_dns_loc]="Check $(hostname -s) /etc/resolv.conf nameserver line
You will have to check the device which has the name server running.
Your internet device (adsl/cable modem your dns server)
If none of the above, ${png[$_dns_loc]} is down
Work around, change nameserver ip_here to a public nameserver
in /etc/resolv.conf
man resolv.conf for more info
$(net_info)"
png[10]="66.94.234.13"
tst[10]="that ISP can route to yahoo.com "
msg[10]="cannot ping yahoo by ip address
yahoo.com is down or ip address changed.
check google.com with ping -c1 72.14.207.99
If that fails, google.com is down or ip address changed
or it is an ISP/internet problem
$(net_info)"
png[11]="yahoo.com"
tst[11]="ISP can get a DNS resolve yahoo.com"
msg[11]="Cannot ping yahoo.com by name
yahoo.com just went down, or dns is broke on your ISP or somewhere else.
$(net_info)"
png[12]="done"
tst[12]="We never use this because png done is "
msg[12]="last array element to tell while loop we are done pinging"
#********************************************
#*
#* Actual testing starts here
#*
#********************************************
tput clear
#********************************************
#* get/save the first dns server from /etc/resolv.conf
#********************************************
set -- $(grep nameserver /etc/resolv.conf | grep -v '^#' | head -1)
_ip=$2
if [ -z "$_ip" ] ; then
echo "/etc/resolv.conf does not have a nameserver line.
man resolv.conf
for more information
If using dhcp, resolv.conf is updated by contents of leases file,
depending on which dhcp client being used.
locate leases | grep var/
should find it.
I assume you have mlocate or slocate installed so you can use the
locate command.
Going to use ${png[$_dns_loc]} to make test run farther to
help find the failure.
Press any key to continue
"
read -n 1
else
png[$_dns_loc]=$_ip
fi
#********************************************
#* get/save the gateway ip address
#********************************************
set -- $(route -n | grep 'UG' | tail -1)
_ip=$2
if [ -z "$_ip" ] ; then
echo "no default gateway line found in
route -n
results. Expected to see last line something like
0.0.0.0 192.168.1.1 0.0.0.0 UG 10 0 0 eth0
that UG line is missing which can be because the network did not
come up correctly. Usually a dhcp access problem.
using ${png[$_gate_loc]}
Press any key to continue
"
read -n 1
else
png[$_gate_loc]=$_ip
fi
#********************************************
#* loop through all ip/name tests
#********************************************
i=1
while [ "${png[$i]}" != "done" ] ; do
    echo "$i Test ${tst[$i]}"
    ping -c 1 -w 3 ${png[$i]} > /dev/null
    if [ $? -ne 0 ] ; then
        /bin/echo -e "\nFailure: ping -c 1 -w 3 ${png[$i]} "
        /bin/echo -e "${msg[$i]} "
        exit 1
    fi
    i=$(( $i + 1 ))
done
#********************************************
#* loop through all nameservers in /etc/resolv.conf
#********************************************
while read line
do
    set -- $line
    _ip=$2
    if [ "$1" = "nameserver" ] ; then
        echo "Test /etc/resolv.conf nameserver $_ip is alive"
        ping -c 1 -w 3 $_ip > /dev/null
        if [ $? -ne 0 ] ; then
            /bin/echo -e "\nDNS nameserver Failure: ping -c 1 -w 3 $_ip "
            echo "nameserver $_ip in /etc/resolv.conf is not responding to pings."
            echo "$(net_info)"
            exit 1
        fi
    fi
done < /etc/resolv.conf
echo " "
echo "Basic network connectivity is working to yahoo.com"
echo " "
#********* end ck_connection **********************************
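For anyone adapting the gateway lookup: the script above finds the default gateway with route -n | grep 'UG' | tail -1, which matches UG anywhere on the line. A slightly stricter sketch, assuming awk is available, looks only at the Flags column. The function name default_gateway is made up for this example, and the sample text is the route -n output quoted in the script's own hints.

```shell
#!/bin/bash
# default_gateway (hypothetical helper): read 'route -n' style output on
# stdin and print the Gateway column of the last line whose Flags column
# contains both U and G. Prints nothing if no such line exists.
default_gateway() {
    awk '$4 ~ /U/ && $4 ~ /G/ { gw = $2 } END { if (gw != "") print gw }'
}

# Sample text taken from the route -n example in the script's hints.
sample='Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.1.0 0.0.0.0 255.255.255.0 U 10 0 0 eth0
0.0.0.0 192.168.1.1 0.0.0.0 UG 10 0 0 eth0'

printf '%s\n' "$sample" | default_gateway
# On a live system you would feed it real data instead:
#   route -n | default_gateway
```

An empty result means the same thing as the script's "no default gateway line found" message: the network probably did not come up correctly.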
-
Re: Troubleshooting connection loss (continued)
Allen Weiner wrote:
> Bit Twister wrote:
>
> So let's assume there is a momentary connection loss. The next time it
> occurs, what troubleshooting steps can I perform to determine why
> "service network restart" hangs?
>
The "service network restart" hangs after eth0 is closed down.
It seems to me that an effective troubleshooting approach to isolate the
hang would be to put hooks in the scripts that "service network restart"
invokes. But being a Linux novice, I'd prefer not to play with the
networking scripts (although I could make backups).
Another possible approach to isolating the hang that avoids modifying
networking scripts would be to turn on strace from the terminal before
issuing "service network restart". To cut down on strace output, it
would be even better to turn on strace after eth0 is closed down. I have
no idea how to do this. Suggestions would be appreciated.
-
Re: Troubleshooting connection loss (continued)
On Thu, 15 Nov 2007 03:11:31 GMT, Allen Weiner wrote:
>
> The "service network restart" hangs after eth0 is closed down.
Well, WE will not be working that problem, unless you take my
suggestions as to what config files are to look like.
> It seems to me that an effective troubleshooting approach to isolate the
> hang would be to put hooks in the scripts that "service network restart"
> invokes.
Hehe, I spent a day in those one or two years ago.
What I had to do was create 8 desktops; pretty near each desktop had 3
or 4 terminals up: one term following the code, another to see config files,
another to hunt down man pages and documents, ...
When a script would call another script, I would open it in another desktop
so I could keep drilling down reading code. When I finally hit the
bottom of the script, I would go back to the desktop which called the script.
> But being a Linux novice, I'd prefer not to play with the
> networking scripts (although I could make backups).
Sounds good in theory, takes a very methodical, conscientious person
to make that work, and you better damn well know your backups are good.
That is why a multi-boot system, with selection to boot a copy of your
"Production Install" is handy for screwing with system scripts that
could hurt you. :-D
> Another possible approach to isolating the hang that avoids modifying
> networking scripts would be to turn on strace from the terminal before
> issuing "service network restart".
Never tried it, but pretty sure trying to do a
strace /etc/init.d/network restart is not going to work. 
> To cut down on strace output, it
> would be even better to turn on strace after eth0 is closed down. I have
> no idea how to do this. Suggestions would be appreciated.
You would do a service network stop,
enable your tracing, then do the service network start.
Restart is just an easy call to stop/start.
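A rough sketch of that stop, trace, start sequence follows. It assumes the init script is a bash script, and it uses bash -x for the tracing rather than strace. The function name trace_start_only and the log path in the usage comment are invented for illustration.

```shell
#!/bin/bash
# trace_start_only SCRIPT LOGFILE   (hypothetical helper)
# Run the "stop" half untraced, then run only the "start" half under
# bash -x, sending the trace (and anything else on stderr) to LOGFILE.
trace_start_only() {
    local script=$1 log=$2
    bash "$script" stop
    bash -x "$script" start 2> "$log"
}

# Assumed usage, as root, on a system with SysV-style init scripts:
#   trace_start_only /etc/init.d/network /tmp/network-start.trace
#   less /tmp/network-start.trace
```

Only the top-level script is traced this way. Scripts it launches as separate processes run without -x, so the last line in the log tells you which call to drill into next.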
FYI: I assume you are always logged into a user account, not root.
When you need root privs, you click up a terminal and su - root
as a security precaution.
For debugging scripts, I find playing with the set command can help.
I would like you to click up a terminal and add
set -xv
as the first line of .bash_profile, save, exit.
Now do the following commands
su - $USER
exit
Up Arrow
and change set -xv to set -x, save, exit
Up Arrow
exit
Up Arrow
and remove the set line.
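If you would rather not touch .bash_profile, the difference between the two flags can be seen on a throwaway script. This is only a sketch: the two-line demo script and the variable names are made up, and it passes -v and -x on the bash command line, which has the same effect as putting set -v or set -x at the top of the file.

```shell
#!/bin/bash
# Compare what -v and -x print for the same two-line script.
demo=$(mktemp)
cat > "$demo" <<'EOF'
name=world
echo "hello $name"
EOF

# -v echoes each source line as the shell reads it (before expansion).
verbose_out=$(bash -v "$demo" 2>&1 >/dev/null)
# -x echoes each command after expansion, prefixed with "+ ".
xtrace_out=$(bash -x "$demo" 2>&1 >/dev/null)
rm -f "$demo"

printf 'bash -v shows:\n%s\n\n' "$verbose_out"
printf 'bash -x shows:\n%s\n' "$xtrace_out"
```

The -x output is usually the more useful one when chasing a hang, because it shows the real values the script was working with at the point it stopped.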
-
Re: Troubleshooting connection loss (continued)
Bit Twister wrote:
> On Thu, 15 Nov 2007 03:11:31 GMT, Allen Weiner wrote:
>> The "service network restart" hangs after eth0 is closed down.
>
> Well, WE will not be working that problem, unless you take my
> suggestions as to what config files are to look like.
>
I did change the hosts file.
My dhclient-eth0.leases has not changed in the past week. Lease expires
on 11/7. DHCP isn't being invoked.
Suppose either the leases file or the resolv.conf was causing the
problem. Should that cause "service network restart" to hang?
>
>
> Hehe, I spent a day in those one or two years ago.
> What I had to do was create 8 desktops; pretty near each desktop had 3
> or 4 terminals up: one term following the code, another to see config files,
> another to hunt down man pages and documents, ...
>
It's interesting and discouraging to hear of your experience. It would
be interesting to hear what troubleshooting technique you use for this
situation.
>
>
> Never tried it, but pretty sure trying to do a
> strace /etc/init.d/network restart is not going to work. 
>
Could you elaborate on why this won't work?
> You would do a service network stop,
> enable your tracing, then do the service network start.
>
Thanks very much for pointing that out.
>
You might find this interesting. My modem/router uses the AR7 ADSL chip.
A leading ISP feels this chip provides unreliable connections.
http://www.theregister.com/2007/10/2...neon_bt_fault/
-
Re: Troubleshooting connection loss (continued)
On Thu, 15 Nov 2007 15:51:09 GMT, Allen Weiner wrote:
> Bit Twister wrote:
>> On Thu, 15 Nov 2007 03:11:31 GMT, Allen Weiner wrote:
>>> The "service network restart" hangs after eth0 is closed down.
>>
>> Well, WE will not be working that problem, unless you take my
>> suggestions as to what config files are to look like.
>>
> I did change the hosts file.
And I know this, how?
And would you provide what you did.
> My dhclient-eth0.leases has not changed in the past week. Lease expires
> on 11/7. DHCP isn't being invoked.
Does not matter, I was not troubleshooting a dhclient-eth0.leases file change.
That information is just one aspect of your problem needing checking.
Glad you picked up on that tidbit. Sorry you refused my suggestion on
what it is to contain.
> Suppose either the leases file or the resolv.conf was causing the
> problem. Should that cause "service network restart" to hang?
Told you, "WE will not be working that problem, unless you take my
suggestions as to what config files are to look like"
> It's interesting
Dang, tip on how to follow a complex script gone to waste on the OP. 
> and discouraging to hear of your experience.
Sorry to hear that. It was not hard, just lots of things to look at,
man some_cmd_here to get a feel for what the cmd did. It gave me the
experience of what to play with, when, and why, not to mention seeing
tricks and what you can do with the bash scripting language.
> It would be interesting to hear what troubleshooting technique you
> use for this situation.
I have been giving you basic troubleshooting techniques and a smart
question link to read, and for all my trouble, all I was given was static
about what you believe should not make a difference, and that you were not
going to change the file. Go ahead and kill file me if you want.....
Instead of reading the whole document, this is the section I have in mind.
http://www.catb.org/~esr/faqs/smart-....html#symptoms
for the above paragraph.
>> Never tried it, but pretty sure trying to do a
>> strace /etc/init.d/network restart is not going to work. 
> Could you elaborate on why this won't work?
You have to use the proper tool for the job at hand.
Do not get me wrong, on the whole, I applaud how well you are doing
and what you have done.
I want you to keep in mind, I try to keep the lurkers in mind when I
post, and teach you how to fish. Not cut the pole, string the line,
catch the fish, fry, cut it up and feed you.
I do try to keep in mind the poster's skill and knowledge when making
my responses.
service is basically a wrapper script which runs what is found in /etc/init.d
If you were to look at the files in /etc/init.d, you would see that
they are for the most part scripts.
Generally speaking, in my mind, you have program/scripts which do the work.
Scripts are what you can view with the cat command. Programs are
compiled into a binary form.
Easy way to tell, try less /bin/ls
less ~/.bashrc
See the difference.
Now instead of less, use strace and see what you can see.
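Another rough check, without opening the file at all, is to look at its first two bytes: an executable script starts with #!. This is a sketch under that assumption only. Files meant to be sourced, such as ~/.bashrc, usually have no #! line and will show up as "other". The function name kind_of is made up for the example.

```shell
#!/bin/bash
# kind_of FILE   (hypothetical helper)
# Print "script" if FILE starts with "#!", otherwise "other".
# A missing or unreadable file also prints "other".
kind_of() {
    local magic
    magic=$(head -c 2 "$1" 2>/dev/null)
    if [ "$magic" = '#!' ]; then
        echo "script"
    else
        echo "other"
    fi
}

kind_of "$(command -v ls)"     # ls is normally a compiled binary
# On a system with SysV-style init scripts you could also try:
#   kind_of /etc/init.d/network
```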
Next time you go to ask about a command, you need to
Read The Fine Manual (RTFM), try the command to see what it does.
You never experiment when logged in as root, if possible.
You boot and play in a hot backup partition.
Always as a user, if possible. If afraid of hurting your account,
create a junk account. I do not recommend calling it test.
Log into junk and play around there. You can always delete/create it again.
>> You would do a service network stop,
>> enable your tracing, then do the service network start.
>>
> Thanks very much for pointing that out.
That is a function of /etc/init.d/network, not service.
So, doing a bit of reading in /etc/init.d/network, you would find
stop, start, restart, reload, status were commands available for
service network cmd_here.
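To make the idea concrete, here is a toy version of that kind of command dispatch. It is not the real /etc/init.d/network, only a sketch of the usual case-statement shape. The function name toy_service and its messages are invented, and it does nothing but print.

```shell
#!/bin/bash
# toy_service: a stand-in for an init script's command dispatch.
# It only prints messages; it does not touch the network.
toy_service() {
    case "$1" in
        start)   echo "starting" ;;
        stop)    echo "stopping" ;;
        restart) toy_service stop
                 toy_service start ;;
        status)  echo "status: unknown (toy example)" ;;
        *)       echo "Usage: toy_service {start|stop|restart|status}" >&2
                 return 1 ;;
    esac
}

toy_service restart
```

Seen this way, a hang during restart has to be in either the stop branch or the start branch, which is why running the two halves separately narrows things down.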
> http://www.theregister.com/2007/10/2...neon_bt_fault/
Yep, saw that article on the site when they posted it.