Mysterious server lockups with Ubuntu Hardy - Ubuntu
-
Re: Mysterious server lockups with Ubuntu Hardy
Ignoramus9283 wrote:
> The server is almost new,
More reason to suspect hardware, I'd say. What is the longest uptime
you've had?
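For the uptime question, the time since the current boot is in /proc/uptime. Below is a minimal sketch, assuming Linux. It only covers the current boot, so the longest uptime between past lockups still has to come from your own notes or boot records.

```python
from pathlib import Path

def parse_uptime(text: str) -> float:
    """Return seconds since boot from the contents of /proc/uptime.

    The file holds two numbers: seconds since boot and the summed idle
    time of all CPUs. Only the first one matters here.
    """
    return float(text.split()[0])

def format_uptime(seconds: float) -> str:
    """Render a number of seconds as days, hours and minutes."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes = rem // 60
    return f"{days}d {hours}h {minutes}m"

if __name__ == "__main__":
    uptime_file = Path("/proc/uptime")
    if uptime_file.exists():
        print(format_uptime(parse_uptime(uptime_file.read_text())))
```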
-
Re: Mysterious server lockups with Ubuntu Hardy
On 2008-08-25, Matt wrote:
> Ignoramus9283 wrote:
>
>> The server is almost new,
>
>
> More reason to suspect hardware, I'd say. What is the longest uptime
> you've had?
10 days
--
Due to extreme spam originating from Google Groups, and their inattention
to spammers, I and many others block all articles originating
from Google Groups. If you want your postings to be seen by
more readers you will need to find a different means of
posting on Usenet.
http://improve-usenet.org/
-
Re: Mysterious server lockups with Ubuntu Hardy
"Ignoramus9283" wrote in message
news:PydnUtJ0qq4Ji_VnZ2dnUVZ_hudnZ2d@giganews.com...
> On 2008-08-25, Matt wrote:
>> Ignoramus9283 wrote:
>>
>>> The server is almost new,
>>
>>
>> More reason to suspect hardware, I'd say. What is the longest uptime
>> you've had?
>
> 10 days
We run Ubuntu servers here at work, rock solid. The only crap we had with
crashes seemed to be caused by 'electricity surges'; we installed UPSes,
which seemed to sort it out. It's a long shot, I admit, but perhaps something
to look at?
Regards
-
Re: Mysterious server lockups with Ubuntu Hardy
Matt wrote:
> Ignoramus9283 wrote:
>
>> The server is almost new,
>
>
> More reason to suspect hardware, I'd say. What is the longest uptime
> you've had?
Yep. With the new info, it's probably something on the edge of spec, or
slightly beyond..
I hate these sorts of problems: they can be terribly sensitive to
exactly what code is being run at what locations.
Standard memtest stuff doesn't always work either.. if you have a dodgy
peripheral card or controller, memtest won't show it, but it can still
corrupt memory when it's being accessed, or occasionally when ANY stuff
on the IO bus is being accessed.
-
Re: Mysterious server lockups with Ubuntu Hardy
Ignoramus9283 wrote:
> On 2008-08-25, Matt wrote:
>> Ignoramus9283 wrote:
>>
>>> The server is almost new,
>>
>> More reason to suspect hardware, I'd say. What is the longest uptime
>> you've had?
>
> 10 days
>
Bummer. That makes it really hard to reproduce the fault reliably enough
to be sure you have fixed it..
Short of systematically switching parts around till it works, I can't
think of an answer other than hooking a shed load of expensive gear onto
the bus, and hiring a quality engineer to drive it.
Get swapping parts.
One reason I stick to one local expensive PC supplier is that he doesn't get
shirty when I have an issue like this. Parts get swapped without
question till it all works..
-
Re: Mysterious server lockups with Ubuntu Hardy
Ignoramus17662 wrote:
> I was able to insert these noacpi options by editing the commented out
> kopt= parameter and rerunning update-grub. Booted with all ACPI
> disabled.
>
> I will see how it goes.
>
> # kopt=root=UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx ro noapic nolapic pci=noacpi acpi=off
>
> So far it seems that the server is running OK, but only time will
> tell.
>
> i
>
> On 2008-05-12, Ignoramus17662 wrote:
>> On 2008-05-12, Darren Salt wrote:
>>> I demand that Ignoramus4557 may or may not have written...
>>>
>>>> I was able to insmod netconsole.
>>>> root@server:~# modprobe netconsole \
>>>> netconsole="1941@10.1.xxx.xxx/eth0,1941@192.168.xxx.xxx/00:14:xx:xx:xx:xx"
>>>> mypc:myusername:~ ==>nc -u -l 1941
>>>> However, I did not receive any kernel messages on the target machine.
>>>> That's despite having generated some kernel messages by inserting a
>>>> superfluous module and verifying that with dmesg.
>>> The IP addresses say that you're using two separate logical networks. That's
>>> fine, so long as the destination IP address is routable and the destination
>>> MAC address is one hop away from the source machine (the server). Is this the
>>> case?
>>>
>>> [snip]
>> That could be it.
>>
>> I have modified the line in /boot/grub/menu.lst
>>
>> from
>>
>> defoptions=quiet splash
>>
>> to
>>
>> defoptions=quiet splash noapic nolapic pci=noacpi
>>
>> I will try this after hours tonight.
>>
>> I certainly do not need ACPI on this server.
>
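You can check whether the kopt= change quoted above actually took effect after the reboot. update-grub only rewrites menu.lst, and the running kernel's own command line is in /proc/cmdline. Below is a minimal sketch, assuming Python 3.9+. The option list is copied from the quoted kopt= line.

```python
from pathlib import Path

# Options taken from the quoted kopt= line.
WANTED = ["noapic", "nolapic", "pci=noacpi", "acpi=off"]

def missing_options(cmdline: str, wanted: list[str]) -> list[str]:
    """Return the entries of `wanted` that are not whole tokens of `cmdline`."""
    tokens = set(cmdline.split())
    return [opt for opt in wanted if opt not in tokens]

if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.exists():
        missing = missing_options(cmdline_file.read_text(), WANTED)
        print("all options active" if not missing else "missing: " + " ".join(missing))
```

The check works on whole tokens, so "acpi=off" is not mistaken for "pci=noacpi" and vice versa.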
Lost initial message... did you say this was on an AMD64X2? I just
bought one a couple of weeks ago (custom build) and was experiencing some
random system freezes (surfing, idle). I found 'powernowd' was not
running when it should be, and further inspection showed that CPU scaling
(AMD Cool'n'Quiet) was off in the BIOS. I turned it on and 'powernowd' is
running like it should (I can see the CPUs go up and down), and the
system seems stable, but I need some time to tell. Might be something to
check.
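The first thing to check is whether powernowd is running at all. Below is a minimal sketch of that check, assuming a Linux /proc. It reads the Name: field of each /proc/<pid>/status, which is roughly what `pgrep -x powernowd` would report.

```python
from pathlib import Path

def process_names(proc: Path = Path("/proc")) -> list[str]:
    """Collect the Name: field from /proc/<pid>/status for every process."""
    names = []
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/cpuinfo
        try:
            for line in (entry / "status").read_text().splitlines():
                if line.startswith("Name:"):
                    names.append(line.split(":", 1)[1].strip())
                    break
        except OSError:
            continue  # the process exited (or is unreadable) while scanning
    return names

def is_running(name: str, names: list[str]) -> bool:
    """True if a process with exactly this name was seen."""
    return name in names

if __name__ == "__main__":
    if Path("/proc").is_dir():
        print("powernowd running:", is_running("powernowd", process_names()))
```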
--
Norman
Registered Linux user #461062
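This note covers the netconsole question quoted in the post above. As I read the kernel's netconsole documentation, the target MAC (broadcast if omitted) is used as given, with no routing lookup. If the listener's IP is on a different subnet from the server, the MAC therefore has to be the gateway's, which is Darren's "one hop away" condition. Below is a minimal sketch that splits a netconsole= value and reports which MAC is needed. The addresses and MAC are hypothetical stand-ins, not the masked ones from the thread.

```python
import ipaddress

def parse_netconsole(param: str) -> dict:
    """Split a netconsole= value laid out as
    [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac].
    Missing optional fields come back as empty strings."""
    src, _, tgt = param.partition(",")
    src_port, _, src_rest = src.partition("@")
    src_ip, _, dev = src_rest.partition("/")
    tgt_port, _, tgt_rest = tgt.partition("@")
    tgt_ip, _, tgt_mac = tgt_rest.partition("/")
    return {"src_port": src_port, "src_ip": src_ip, "dev": dev,
            "tgt_port": tgt_port, "tgt_ip": tgt_ip, "tgt_mac": tgt_mac}

def next_hop_kind(server_addr: str, target_ip: str) -> str:
    """Return 'target' if target_ip is on the server's own subnet, else 'gateway'.

    server_addr is the server's interface address with prefix length,
    e.g. '10.0.0.5/24' (hypothetical). For an off-subnet target the
    MAC given to netconsole should be the gateway's.
    """
    network = ipaddress.ip_interface(server_addr).network
    return "target" if ipaddress.ip_address(target_ip) in network else "gateway"

if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cfg = parse_netconsole("1941@10.0.0.5/eth0,1941@192.168.0.9/02:00:00:00:00:01")
    print(cfg["tgt_ip"], "needs the MAC of the", next_hop_kind("10.0.0.5/24", cfg["tgt_ip"]))
```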
-
Re: Mysterious server lockups with Ubuntu Hardy
On 2008-08-25, The Natural Philosopher wrote:
> Ignoramus9283 wrote:
>> On 2008-08-25, Matt wrote:
>>> Ignoramus9283 wrote:
>>>
>>>> The server is almost new,
>>>
>>> More reason to suspect hardware, I'd say. What is the longest uptime
>>> you've had?
>>
>> 10 days
>>
> Bummer. That makes it really hard to reproduce the fault reliably enough
> to be sure you have fixed it..
Yep. I'm never sure whether "this helped for sure" or "I am lucky so
far". Right now I am on 2.6.25, going on 6 days, knocking on wood
vigorously, and I am not sure if 2.6.25 is really helping or it
is just a fluke.
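You can put rough numbers on "helped for sure" versus "lucky so far". Below is a sketch that assumes lockups arrive at random at a steady average rate, which the thread does not establish. The 4-day mean is a hypothetical stand-in for "once per several days".

```python
import math

def chance_of_lucky_streak(days_up: float, mean_days_between_lockups: float) -> float:
    """Probability of going `days_up` days without a lockup if nothing was fixed.

    Assumes lockups form a Poisson process, so the time between them is
    exponentially distributed with the given mean.
    """
    return math.exp(-days_up / mean_days_between_lockups)

def days_needed(mean_days_between_lockups: float, max_fluke_chance: float = 0.05) -> float:
    """Uptime after which a lucky streak is less likely than max_fluke_chance."""
    return mean_days_between_lockups * math.log(1 / max_fluke_chance)

if __name__ == "__main__":
    mean = 4.0  # hypothetical mean days between lockups
    print(f"P(6 clean days by luck) = {chance_of_lucky_streak(6, mean):.2f}")
    print(f"days needed for <5% fluke chance = {days_needed(mean):.1f}")
```

With the hypothetical 4-day mean, six clean days still happen by luck about 22% of the time. Reaching a 5% fluke chance takes about 12 days. A longer true mean (the best uptime so far was 10 days) pushes that further out in proportion.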
>
> Short of systematically switching parts around till it works, I can't
> think of an answer other than hooking a shed load of expensive gear onto
> the bus, and hiring a quality engineer to drive it.
>
> Get swapping parts.
>
> One reason I stick to one local expensive PC supplier is that he doesn't get
> shirty when I have an issue like this. Parts get swapped without
> question till it all works..
I am not convinced that it is "parts". The more likely answer is the
drivers or kernel. Bad memory usually gives different symptoms (went
through that once). Bad disks usually leave some trace in logfiles.
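You can check mechanically whether failing disks have left traces in the logs. Below is a minimal sketch, assuming the kernel log is at /var/log/kern.log (Ubuntu's usual location). The two message fragments it searches for are examples, not a complete catalogue. An empty result is evidence, not proof.

```python
import re
from pathlib import Path

# Example fragments of kernel disk/controller error messages (an assumption,
# not an exhaustive list; extend it for your controller's driver).
DISK_ERROR_PATTERN = re.compile(r"I/O error|exception Emask")

def disk_error_lines(lines):
    """Yield the log lines matching DISK_ERROR_PATTERN, without trailing newlines."""
    for line in lines:
        if DISK_ERROR_PATTERN.search(line):
            yield line.rstrip("\n")

if __name__ == "__main__":
    log = Path("/var/log/kern.log")  # assumed location
    try:
        with log.open(errors="replace") as fh:
            for hit in disk_error_lines(fh):
                print(hit)
    except OSError as exc:  # missing file or no read permission
        print(f"cannot read {log}: {exc}")
```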
-
Re: Mysterious server lockups with Ubuntu Hardy
On 2008-08-25, Norman Peelman wrote:
> Ignoramus17662 wrote:
>> I was able to insert these noacpi options by editing the commented out
>> kopt= parameter and rerunning update-grub. Booted with all ACPI
>> disabled.
>>
>> I will see how it goes.
>>
>> # kopt=root=UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx ro noapic nolapic pci=noacpi acpi=off
>>
>> So far it seems that the server is running OK, but only time will
>> tell.
>>
>> i
>>
>> On 2008-05-12, Ignoramus17662 wrote:
>>> On 2008-05-12, Darren Salt wrote:
>>>> I demand that Ignoramus4557 may or may not have written...
>>>>
>>>>> I was able to insmod netconsole.
>>>>> root@server:~# modprobe netconsole \
>>>>> netconsole="1941@10.1.xxx.xxx/eth0,1941@192.168.xxx.xxx/00:14:xx:xx:xx:xx"
>>>>> mypc:myusername:~ ==>nc -u -l 1941
>>>>> However, I did not receive any kernel messages on the target machine.
>>>>> That's despite having generated some kernel messages by inserting a
>>>>> superfluous module and verifying that with dmesg.
>>>> The IP addresses say that you're using two separate logical networks. That's
>>>> fine, so long as the destination IP address is routable and the destination
>>>> MAC address is one hop away from the source machine (the server). Is this the
>>>> case?
>>>>
>>>> [snip]
>>> That could be it.
>>>
>>> I have modified the line in /boot/grub/menu.lst
>>>
>>> from
>>>
>>> defoptions=quiet splash
>>>
>>> to
>>>
>>> defoptions=quiet splash noapic nolapic pci=noacpi
>>>
>>> I will try this after hours tonight.
>>>
>>> I certainly do not need ACPI on this server.
>>
>
> Lost initial message... did you say this was on an AMD64X2? I just
yes
> bought one a couple of weeks ago (custom build) and was experiencing some
> random system freezes (surfing, idle). I found 'powernowd' was not
> running when it should be, and further inspection showed that CPU scaling
> (AMD Cool'n'Quiet) was off in the BIOS. I turned it on and 'powernowd' is
> running like it should (I can see the CPUs go up and down), and the
> system seems stable, but I need some time to tell. Might be something to
> check.
>
This is just to save electricity, right? There is not much point in
CPU scaling on a server, besides that. I cannot see how this could be
the culprit, in any case.
-
Re: Mysterious server lockups with Ubuntu Hardy
Ignoramus9283 wrote:
> On 2008-08-25, The Natural Philosopher wrote:
>> Ignoramus9283 wrote:
>>> On 2008-08-25, Matt wrote:
>>>> Ignoramus9283 wrote:
>>>>
>>>>> The server is almost new,
>>>> More reason to suspect hardware, I'd say. What is the longest uptime
>>>> you've had?
>>> 10 days
>>>
>> Bummer. That makes it really hard to reproduce the fault reliably enough
>> to be sure you have fixed it..
>
> Yep. I'm never sure whether "this helped for sure" or "I am lucky so
> far". Right now I am on 2.6.25, going on 6 days, knocking on wood
> vigorously, and I am not sure if 2.6.25 is really helping or it
> is just a fluke.
>
>> Short of systematically switching parts around till it works, I can't
>> think of an answer other than hooking a shed load of expensive gear onto
>> the bus, and hiring a quality engineer to drive it.
>>
>> Get swapping parts.
>>
>> One reason I stick to one local expensive PC supplier is that he doesn't get
>> shirty when I have an issue like this. Parts get swapped without
>> question till it all works..
>
> I am not convinced that it is "parts". The more likely answer is the
> drivers or kernel. Bad memory usually gives different symptoms (went
> through that once). Bad disks usually leave some trace in logfiles.
>
Well, that is possible, I guess.. I built a non-Linux kernel that did that
meself. Crashed every couple of hours.. there were two bytes of code where
a timer interrupt would be guaranteed to screw it into a never-to-return
loop.
-
Re: Mysterious server lockups with Ubuntu Hardy
Ignoramus9283 wrote:
> On 2008-08-25, Norman Peelman wrote:
>> Ignoramus17662 wrote:
>>> I was able to insert these noacpi options by editing the commented out
>>> kopt= parameter and rerunning update-grub. Booted with all ACPI
>>> disabled.
>>>
>>> I will see how it goes.
>>>
>>> # kopt=root=UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx ro noapic nolapic pci=noacpi acpi=off
>>>
>>> So far it seems that the server is running OK, but only time will
>>> tell.
>>>
>>> i
>>>
>>> On 2008-05-12, Ignoramus17662 wrote:
>>>> On 2008-05-12, Darren Salt wrote:
>>>>> I demand that Ignoramus4557 may or may not have written...
>>>>>
>>>>>> I was able to insmod netconsole.
>>>>>> root@server:~# modprobe netconsole \
>>>>>> netconsole="1941@10.1.xxx.xxx/eth0,1941@192.168.xxx.xxx/00:14:xx:xx:xx:xx"
>>>>>> mypc:myusername:~ ==>nc -u -l 1941
>>>>>> However, I did not receive any kernel messages on the target machine.
>>>>>> That's despite having generated some kernel messages by inserting a
>>>>>> superfluous module and verifying that with dmesg.
>>>>> The IP addresses say that you're using two separate logical networks. That's
>>>>> fine, so long as the destination IP address is routable and the destination
>>>>> MAC address is one hop away from the source machine (the server). Is this the
>>>>> case?
>>>>>
>>>>> [snip]
>>>> That could be it.
>>>>
>>>> I have modified the line in /boot/grub/menu.lst
>>>>
>>>> from
>>>>
>>>> defoptions=quiet splash
>>>>
>>>> to
>>>>
>>>> defoptions=quiet splash noapic nolapic pci=noacpi
>>>>
>>>> I will try this after hours tonight.
>>>>
>>>> I certainly do not need ACPI on this server.
>> Lost initial message... did you say this was on an AMD64X2? I just
>
> yes
>
>> bought one a couple of weeks ago (custom build) and was experiencing some
>> random system freezes (surfing, idle). I found 'powernowd' was not
>> running when it should be, and further inspection showed that CPU scaling
>> (AMD Cool'n'Quiet) was off in the BIOS. I turned it on and 'powernowd' is
>> running like it should (I can see the CPUs go up and down), and the
>> system seems stable, but I need some time to tell. Might be something to
>> check.
>>
>
> This is just to save electricity, right? There is not much point in
> CPU scaling on a server, besides that. I cannot see how this could be
> the culprit, in any case.
It's just a suggestion... that's where my googling led me. If
powernowd can't run properly, Ubuntu (or the powernowd package) falls back
to a more generic method that evidently may or may not work right (especially
if the BIOS setting is OFF). Why it would freeze both while in use and
in the middle of the night while idle is beyond me. If turning on CPU
scaling in the BIOS makes it work right, then I'm happy.
As for there 'not being much point in CPU scaling on a server...', I
don't really understand why you would think that. If the server is so
busy that it runs at MAX all the time, great. If it finds itself not so
busy and can scale itself back, why would that matter?
If powernowd is installed and you don't see a need for it, try
removing it.
--
Norman
Registered Linux user #461062
-
Re: Mysterious server lockups with Ubuntu Hardy
Ignoramus23901 wrote:
> There is NOTHING in the logs when it locks up. It locks up once per
> several days.
>
> A few days ago, I said **** it, and installed a 2.6.25 kernel that is
> praised for its stability. It's been several days (not enough to tell)
> and it is still up and running. Time will tell if 2.6.25 will help.
I have never seen or heard of a Linux box locking up, and I have seen
servers that ran with dead memory, servers overheating at 75°C for months on
end, etc. The worst that ever happened was processes getting mysteriously
killed, and disk access not doing what it should when the HDD controller
vanished. The software side is very error-tolerant, so I (and everybody
else here) find it hard to believe a widely used out-of-the-box Linux
distro could have problems like that.
It is much more likely something is wrong with your hardware.
-
Re: Mysterious server lockups with Ubuntu Hardy
On 2008-08-26, Jure Sah wrote:
>
> Ignoramus23901 wrote:
>> There is NOTHING in the logs when it locks up. It locks up once per
>> several days.
>>
>> A few days ago, I said **** it, and installed a 2.6.25 kernel that is
>> praised for its stability. It's been several days (not enough to tell)
>> and it is still up and running. Time will tell if 2.6.25 will help.
>
> I have never seen or heard of a Linux box locking up, and I have seen
> servers that ran with dead memory, servers overheating at 75°C for months on
> end, etc. The worst that ever happened was processes getting mysteriously
> killed, and disk access not doing what it should when the HDD controller
> vanished. The software side is very error-tolerant, so I (and everybody
> else here) find it hard to believe a widely used out-of-the-box Linux
> distro could have problems like that.
>
> It is much more likely something is wrong with your hardware.
Which hardware?
-
Re: Mysterious server lockups with Ubuntu Hardy
On 2008-08-25, Dogma Discharge wrote:
>
> "Ignoramus9283" wrote in message
> news:PydnUtJ0qq4Ji_VnZ2dnUVZ_hudnZ2d@giganews.com...
>> On 2008-08-25, Matt wrote:
>>> Ignoramus9283 wrote:
>>>
>>>> The server is almost new,
>>>
>>>
>>> More reason to suspect hardware, I'd say. What is the longest uptime
>>> you've had?
>>
>> 10 days
>
> We run Ubuntu servers here at work, rock solid. The only crap we had with
> crashes seemed to be caused by 'electricity surges'; we installed UPSes,
> which seemed to sort it out. It's a long shot, I admit, but perhaps something
> to look at?
>
The whole server room is on one giant UPS. No other servers crash at
that time.
i
-
Re: Mysterious server lockups with Ubuntu Hardy
On 2008-08-26, Ignoramus9283 wrote:
>
> This is just to save electricity, right? There is not much point in
> CPU scaling on a server, besides that. I cannot see how this could be
> the culprit, in any case.
There's no point in wasting power when it's not needed... CPU
scaling changes the speed the CPUs run at as needed
(the ondemand governor).
Running the CPUs at a lower speed reduces heat production
along with energy usage. It is worthwhile to run it, and it will not
have a negative impact on your I/O speed, so long as you configure it
properly.
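You can read the kernel's cpufreq sysfs files directly to see whether the ondemand governor is in charge and the CPUs really are changing speed. Below is a minimal sketch, assuming a kernel with cpufreq support exposing /sys/devices/system/cpu/cpu*/cpufreq. The frequencies there are in kHz.

```python
from pathlib import Path

def read_cpufreq(cpu_dir: Path) -> dict:
    """Read the governor and current/max frequency (kHz) for one CPU."""
    freq = cpu_dir / "cpufreq"

    def read(name: str) -> str:
        return (freq / name).read_text().strip()

    return {
        "governor": read("scaling_governor"),
        "cur_khz": int(read("scaling_cur_freq")),
        "max_khz": int(read("scaling_max_freq")),
    }

if __name__ == "__main__":
    for cpu in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*")):
        if (cpu / "cpufreq").is_dir():
            info = read_cpufreq(cpu)
            print(f"{cpu.name}: {info['governor']}, "
                  f"{info['cur_khz'] // 1000} of {info['max_khz'] // 1000} MHz")
```

Running it a few times while the machine goes from idle to busy should show cur_khz moving if scaling works.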
--
Joe - Linux User #449481/Ubuntu User #19733
joe at hits - buffalo dot com
"Hate is baggage, life is too short to go around pissed off all the
time..." - Danny, American History X