Mysterious server lockups with Ubuntu Hardy - Ubuntu


Thread: Mysterious server lockups with Ubuntu Hardy

  1. Re: Mysterious server lockups with Ubuntu Hardy

    Ignoramus9283 wrote:

    > The server is almost new,



    More reason to suspect hardware, I'd say. What is the longest uptime
    you've had?

  2. Re: Mysterious server lockups with Ubuntu Hardy

    On 2008-08-25, Matt wrote:
    > Ignoramus9283 wrote:
    >
    >> The server is almost new,

    >
    >
    > More reason to suspect hardware, I'd say. What is the longest uptime
    > you've had?


    10 days

    --
    Due to extreme spam originating from Google Groups, and their inattention
    to spammers, I and many others block all articles originating
    from Google Groups. If you want your postings to be seen by
    more readers you will need to find a different means of
    posting on Usenet.
    http://improve-usenet.org/

  3. Re: Mysterious server lockups with Ubuntu Hardy


    "Ignoramus9283" wrote in message
    newsPydnUtJ0qq4Ji_VnZ2dnUVZ_hudnZ2d@giganews.com...
    > On 2008-08-25, Matt wrote:
    >> Ignoramus9283 wrote:
    >>
    >>> The server is almost new,

    >>
    >>
    >> More reason to suspect hardware, I'd say. What is the longest uptime
    >> you've had?

    >
    > 10 days


    We run Ubuntu servers here at work, rock solid. The only crap we had with
    crashes seemed to be caused by 'electricity surges'; we installed UPSes,
    which seemed to sort it out. It's a long shot, I admit, but perhaps something
    to look at?

    Regards



  4. Re: Mysterious server lockups with Ubuntu Hardy

    Matt wrote:
    > Ignoramus9283 wrote:
    >
    >> The server is almost new,

    >
    >
    > More reason to suspect hardware, I'd say. What is the longest uptime
    > you've had?


    Yep. With the new info, it's probably something on the edge of spec, or
    slightly beyond.

    I hate these sorts of problems: they can be terribly sensitive to
    exactly what code is being run at what locations.



    Standard memtest stuff doesn't always work either. If you have a dodgy
    peripheral card or controller, memtest won't show it, but it can still
    corrupt memory when it's being accessed, or occasionally when ANY stuff
    on the IO bus is being accessed.


  5. Re: Mysterious server lockups with Ubuntu Hardy

    Ignoramus9283 wrote:
    > On 2008-08-25, Matt wrote:
    >> Ignoramus9283 wrote:
    >>
    >>> The server is almost new,

    >>
    >> More reason to suspect hardware, I'd say. What is the longest uptime
    >> you've had?

    >
    > 10 days
    >

    Bummer. That makes it really hard to reproduce the fault reliably enough
    to be sure you have fixed it..


    Short of systematically switching parts around till it works, I can't
    think of an answer other than hooking a shedload of expensive gear onto
    the bus, and hiring a quality engineer to drive it.

    Get swapping parts.

    One reason I stick to one local expensive PC supplier, is he doesn't get
    shirty when I have an issue like this. Parts get swapped without
    question till it all works..

  6. Re: Mysterious server lockups with Ubuntu Hardy

    Ignoramus17662 wrote:
    > I was able to insert these noacpi options by editing the commented out
    > kopt= parameter and rerunning update-grub. Booted with all ACPI
    > disabled.
    >
    > I will see how it goes.
    >
    > # kopt=root=UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx ro noapic nolapic pci=noacpi acpi=off
    >
    > So far it seems that the server is running OK, but only time will
    > tell.
    >
    > i
    >
    > On 2008-05-12, Ignoramus17662 wrote:
    >> On 2008-05-12, Darren Salt wrote:
    >>> I demand that Ignoramus4557 may or may not have written...
    >>>
    >>>> I was able to insmod netconsole.
    >>>> root@server:~# modprobe netconsole \
    >>>> netconsole="1941@10.1.xxx.xxx/eth0,1941@192.168.xxx.xxx/00:14:xx:xx:xx:xx"
    >>>> mypc:myusername:~ ==>nc -u -l 1941
    >>>> However, I did not receive any kernel messages on the target machine.
    >>>> That's despite having generated some kernel messages by inserting a
    >>>> superfluous module and verifying that with dmesg.
    >>> The IP addresses say that you're using two separate logical networks. That's
    >>> fine, so long as the destination IP address is routable and the destination
    >>> MAC address is one hop away from the source machine (the server). Is this the
    >>> case?
    >>>
    >>> [snip]
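
For reference, the netconsole parameter quoted above has the form src-port@src-ip/device,dst-port@dst-ip/dst-mac. Below is a minimal sketch of assembling it; the function name and every address in it are made-up placeholders, not the poster's real values. The point raised in the quote is that when the receiver sits on a different subnet, the MAC address given should normally be that of the router the server sends through, not the receiver's own.

```shell
#!/bin/sh
# Build the value for the netconsole module's "netconsole=" parameter.
# Format: src-port@src-ip/dev,dst-port@dst-ip/dst-mac
# build_netconsole_param is a hypothetical helper, not part of any package.
build_netconsole_param() {
    src_port=$1 src_ip=$2 dev=$3 dst_port=$4 dst_ip=$5 dst_mac=$6
    printf '%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$dst_port" "$dst_ip" "$dst_mac"
}

# Placeholder addresses for illustration only. If the receiver is on another
# subnet, dst_mac would be the MAC of the next-hop router.
param=$(build_netconsole_param 1941 10.1.0.5 eth0 1941 192.168.0.7 00:14:22:aa:bb:cc)

# Print the command rather than running it, so it can be reviewed first.
echo "modprobe netconsole netconsole=\"$param\""
```

On the receiving machine, the `nc -u -l 1941` listener from the quoted post stays the same.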

    >> That could be it.
    >>
    >> I have modified the line in /boot/grub/menu.lst
    >>
    >> from
    >>
    >> defoptions=quiet splash
    >>
    >> to
    >>
    >> defoptions=quiet splash noapic nolapic pci=noacpi
    >>
    >> I will try this after hours tonight.
    >>
    >> I certainly do not need ACPI on this server.

    >


    Lost initial message... did you say this was on an AMD64X2? I just
    bought one a couple weeks ago (custom build) and was experiencing some
    random system freezes (surfing, idle). I found 'powernowd' was not
    running when it should be and further inspection showed that CPU scaling
    (AMD Cool & Quiet) was off in the bios. Turned it on and 'powernowd' is
    running like it should (I can see the cpus go up and down) and the
    system seems stable but I need some time to tell. Might be something to
    check.
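
A quick way to check the same things on another box, sketched under the assumption that the kernel exposes the usual cpufreq files under /sys/devices/system/cpu. The function name is made up, and the optional directory argument exists only so the function can be pointed at a test tree.

```shell
#!/bin/sh
# Print the cpufreq governor for each CPU found under a sysfs CPU directory.
# $1 defaults to the standard location. show_governors is a hypothetical name.
show_governors() {
    cpu_root=${1:-/sys/devices/system/cpu}
    found=0
    for gov in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$gov" ] || continue          # skip if the glob matched nothing
        cpu=${gov#"$cpu_root"/}            # strip the leading directory
        cpu=${cpu%%/*}                     # keep just "cpuN"
        echo "$cpu: $(cat "$gov")"
        found=1
    done
    [ "$found" -eq 1 ] || echo "no cpufreq interface found (scaling may be disabled in the BIOS)"
}

show_governors

# Is the powernowd daemon running? pgrep exits non-zero if nothing matches.
if pgrep -x powernowd >/dev/null 2>&1; then
    echo "powernowd is running"
else
    echo "powernowd is not running"
fi
```

A missing cpufreq directory can also mean the driver is simply not loaded, so the BIOS hint in the message is only one possibility.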

    --
    Norman
    Registered Linux user #461062

  7. Re: Mysterious server lockups with Ubuntu Hardy

    On 2008-08-25, The Natural Philosopher wrote:
    > Ignoramus9283 wrote:
    >> On 2008-08-25, Matt wrote:
    >>> Ignoramus9283 wrote:
    >>>
    >>>> The server is almost new,
    >>>
    >>> More reason to suspect hardware, I'd say. What is the longest uptime
    >>> you've had?

    >>
    >> 10 days
    >>

    > Bummer. That makes it really hard to reproduce the fault reliably enough
    > to be sure you have fixed it..


    Yep. I am never sure whether "this helped for sure" or "I am lucky so
    far". Right now I am on 2.6.25, going on 6 days, knocking on wood
    vigorously, and I am not sure whether 2.6.25 is really helping or it
    is just a fluke.
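
The "fix or fluke" question can be put into rough numbers. The sketch below assumes lockups arrive at random with a constant average rate (an exponential model) and a mean of 5 days between lockups. Both are assumptions for illustration; the thread only says "once per several days". Under that model the chance of t clean days by pure luck is exp(-t/mean).

```shell
#!/bin/sh
# Chance of seeing no lockup for $1 days purely by luck, if lockups arrive at
# random with a mean interval of $2 days (exponential model; an assumption).
p_lucky() {
    awk -v t="$1" -v m="$2" 'BEGIN { printf "%.2f\n", exp(-t / m) }'
}

# Days of clean uptime needed before luck alone has probability <= $1,
# again with an assumed mean interval of $2 days.
days_needed() {
    awk -v p="$1" -v m="$2" 'BEGIN { printf "%.1f\n", -m * log(p) }'
}

p_lucky 6 5          # 6 clean days, assumed 5-day mean between lockups
days_needed 0.05 5   # uptime needed to push the fluke chance below 5%
```

With these assumed numbers it prints 0.30 and 15.0: six clean days would still happen by luck about 30% of the time, and roughly two weeks of uptime would be needed before a fluke becomes unlikely.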

    >
    > Short of systematically switching parts around till it works, I can't
    > think of an answer other than hooking a shedload of expensive gear onto
    > the bus, and hiring a quality engineer to drive it.
    >
    > Get swapping parts.
    >
    > One reason I stick to one local expensive PC supplier, is he doesn't get
    > shirty when I have an issue like this. Parts get swapped without
    > question till it all works..


    I am not convinced that it is "parts". The more likely answer is the
    drivers or kernel. Bad memory usually gives different symptoms (went
    through that once). Bad disks usually leave some trace in logfiles.
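
For anyone wanting to check the "trace in logfiles" part on their own machine, here is a small sketch that greps a log for messages that usually accompany disk or controller trouble. The patterns are a few common kernel message fragments, not an exhaustive list, and the function name and log paths are assumptions (typical Ubuntu locations).

```shell
#!/bin/sh
# Print lines from a log file that look like kernel disk or controller errors.
# scan_disk_errors is a hypothetical helper; the patterns are examples only.
scan_disk_errors() {
    grep -E -i 'I/O error|ata[0-9]+(\.[0-9]+)?: (exception|failed command)|medium error' "$1"
}

# Typical Ubuntu log locations; adjust as needed.
for log in /var/log/kern.log /var/log/syslog; do
    if [ -r "$log" ]; then
        # grep exits non-zero when nothing matches, which is the good case here.
        scan_disk_errors "$log" || true
    fi
done
```

No output means none of these patterns were found, which fits the observation that the lockups leave nothing in the logs.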


  8. Re: Mysterious server lockups with Ubuntu Hardy

    On 2008-08-25, Norman Peelman wrote:
    > Ignoramus17662 wrote:
    >> I was able to insert these noacpi options by editing the commented out
    >> kopt= parameter and rerunning update-grub. Booted with all ACPI
    >> disabled.
    >>
    >> I will see how it goes.
    >>
    >> # kopt=root=UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx ro noapic nolapic pci=noacpi acpi=off
    >>
    >> So far it seems that the server is running OK, but only time will
    >> tell.
    >>
    >> i
    >>
    >> On 2008-05-12, Ignoramus17662 wrote:
    >>> On 2008-05-12, Darren Salt wrote:
    >>>> I demand that Ignoramus4557 may or may not have written...
    >>>>
    >>>>> I was able to insmod netconsole.
    >>>>> root@server:~# modprobe netconsole \
    >>>>> netconsole="1941@10.1.xxx.xxx/eth0,1941@192.168.xxx.xxx/00:14:xx:xx:xx:xx"
    >>>>> mypc:myusername:~ ==>nc -u -l 1941
    >>>>> However, I did not receive any kernel messages on the target machine.
    >>>>> That's despite having generated some kernel messages by inserting a
    >>>>> superfluous module and verifying that with dmesg.
    >>>> The IP addresses say that you're using two separate logical networks. That's
    >>>> fine, so long as the destination IP address is routable and the destination
    >>>> MAC address is one hop away from the source machine (the server). Is this the
    >>>> case?
    >>>>
    >>>> [snip]
    >>> That could be it.
    >>>
    >>> I have modified the line in /boot/grub/menu.lst
    >>>
    >>> from
    >>>
    >>> defoptions=quiet splash
    >>>
    >>> to
    >>>
    >>> defoptions=quiet splash noapic nolapic pci=noacpi
    >>>
    >>> I will try this after hours tonight.
    >>>
    >>> I certainly do not need ACPI on this server.

    >>

    >
    > Lost initial message... did you say this was on an AMD64X2? I just


    yes

    > bought one a couple weeks ago (custom build) and was experiencing some
    > random system freezes (surfing, idle). I found 'powernowd' was not
    > running when it should be and further inspection showed that CPU scaling
    > (AMD Cool & Quiet) was off in the bios. Turned it on and 'powernowd' is
    > running like it should (I can see the cpus go up and down) and the
    > system seems stable but I need some time to tell. Might be something to
    > check.
    >


    This is just to save electricity, right? There is not much point in
    CPU scaling on a server, besides that. I cannot see how this could be
    the culprit, in any case.

  9. Re: Mysterious server lockups with Ubuntu Hardy

    Ignoramus9283 wrote:
    > On 2008-08-25, The Natural Philosopher wrote:
    >> Ignoramus9283 wrote:
    >>> On 2008-08-25, Matt wrote:
    >>>> Ignoramus9283 wrote:
    >>>>
    >>>>> The server is almost new,
    >>>> More reason to suspect hardware, I'd say. What is the longest uptime
    >>>> you've had?
    >>> 10 days
    >>>

    >> Bummer. That makes it really hard to reproduce the fault reliably enough
    >> to be sure you have fixed it..

    >
    > Yep. I am never sure whether "this helped for sure" or "I am lucky so
    > far". Right now I am on 2.6.25, going on 6 days, knocking on wood
    > vigorously, and I am not sure whether 2.6.25 is really helping or it
    > is just a fluke.
    >
    >> Short of systematically switching parts around till it works, I can't
    >> think of an answer other than hooking a shedload of expensive gear onto
    >> the bus, and hiring a quality engineer to drive it.
    >>
    >> Get swapping parts.
    >>
    >> One reason I stick to one local expensive PC supplier, is he doesn't get
    >> shirty when I have an issue like this. Parts get swapped without
    >> question till it all works..

    >
    > I am not convinced that it is "parts". The more likely answer is the
    > drivers or kernel. Bad memory usually gives different symptoms (went
    > through that once). Bad disks usually leave some trace in logfiles.
    >


    Well, that is possible, I guess. I built a non-Linux kernel that did that
    myself. It crashed every couple of hours: there were two bytes of code where
    a timer interrupt would be guaranteed to screw it into a never-to-return
    loop.


  10. Re: Mysterious server lockups with Ubuntu Hardy

    Ignoramus9283 wrote:
    > On 2008-08-25, Norman Peelman wrote:
    >> Ignoramus17662 wrote:
    >>> I was able to insert these noacpi options by editing the commented out
    >>> kopt= parameter and rerunning update-grub. Booted with all ACPI
    >>> disabled.
    >>>
    >>> I will see how it goes.
    >>>
    >>> # kopt=root=UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx ro noapic nolapic pci=noacpi acpi=off
    >>>
    >>> So far it seems that the server is running OK, but only time will
    >>> tell.
    >>>
    >>> i
    >>>
    >>> On 2008-05-12, Ignoramus17662 wrote:
    >>>> On 2008-05-12, Darren Salt wrote:
    >>>>> I demand that Ignoramus4557 may or may not have written...
    >>>>>
    >>>>>> I was able to insmod netconsole.
    >>>>>> root@server:~# modprobe netconsole \
    >>>>>> netconsole="1941@10.1.xxx.xxx/eth0,1941@192.168.xxx.xxx/00:14:xx:xx:xx:xx"
    >>>>>> mypc:myusername:~ ==>nc -u -l 1941
    >>>>>> However, I did not receive any kernel messages on the target machine.
    >>>>>> That's despite having generated some kernel messages by inserting a
    >>>>>> superfluous module and verifying that with dmesg.
    >>>>> The IP addresses say that you're using two separate logical networks. That's
    >>>>> fine, so long as the destination IP address is routable and the destination
    >>>>> MAC address is one hop away from the source machine (the server). Is this the
    >>>>> case?
    >>>>>
    >>>>> [snip]
    >>>> That could be it.
    >>>>
    >>>> I have modified the line in /boot/grub/menu.lst
    >>>>
    >>>> from
    >>>>
    >>>> defoptions=quiet splash
    >>>>
    >>>> to
    >>>>
    >>>> defoptions=quiet splash noapic nolapic pci=noacpi
    >>>>
    >>>> I will try this after hours tonight.
    >>>>
    >>>> I certainly do not need ACPI on this server.

    >> Lost initial message... did you say this was on an AMD64X2? I just

    >
    > yes
    >
    >> bought one a couple weeks ago (custom build) and was experiencing some
    >> random system freezes (surfing, idle). I found 'powernowd' was not
    >> running when it should be and further inspection showed that CPU scaling
    >> (AMD Cool & Quiet) was off in the bios. Turned it on and 'powernowd' is
    >> running like it should (I can see the cpus go up and down) and the
    >> system seems stable but I need some time to tell. Might be something to
    >> check.
    >>

    >
    > This is just to save electricity, right? There is not much point in
    > CPU scaling on a server, besides that. I cannot see how this could be
    > the culprit, in any case.


    It's just a suggestion... that's where my googling led me. If
    powernowd can't run properly Ubuntu (or the powernowd pkg) defaults to a
    more generic method that evidently may or may not work right (especially
    if the bios setting is OFF). As to why it would freeze while in use or
    in the middle of the night while idle is beyond me. If turning on cpu
    scaling in the bios makes it work right then I'm happy.
    As for there 'not being much point in CPU scaling on a server...' I
    don't really understand why you would think that. If the server is so
    busy that it runs at MAX all the time, great. If it finds itself not so
    busy and it can scale itself back, why would it matter?

    If powernowd is installed then try removing it if you don't see a
    need for it.

    --
    Norman
    Registered Linux user #461062

  11. Re: Mysterious server lockups with Ubuntu Hardy

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1

    Ignoramus23901 wrote:
    > There is NOTHING in the logs when it locks up. It locks up once per
    > several days.
    >
    > A few days ago, I said **** it, and installed a 2.6.25 kernel that is
    > praised for its stability. It's been several days (not enough to tell)
    > and it is still up and running. Time will tell if 2.6.25 will help.


    I have never seen or heard of a Linux box locking up and I have seen
    servers that ran dead memory, servers overheating at 75°C for months on
    end, etc. The worst that ever happened is processes get mysteriously
    killed and disk access not doing what it should when the hdd controller
    vanished. The software bit is very error tolerant, so I (and everybody
    else here) find it hard to believe a widely used out-of-the-box Linux
    distro could have problems like that.

    It is much more likely something is wrong with your hardware.
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.6 (GNU/Linux)
    Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

    iD8DBQFItDiNB6mNZXe93qgRApPnAKC0ZNjtFP4NK7Sc1WpPpU Z8tfWAsACeLbLq
    0HcTSXyrerR0IhAW39/jDK0=
    =sqwm
    -----END PGP SIGNATURE-----

  12. Re: Mysterious server lockups with Ubuntu Hardy

    On 2008-08-26, Jure Sah wrote:
    >
    > Ignoramus23901 wrote:
    >> There is NOTHING in the logs when it locks up. It locks up once per
    >> several days.
    >>
    >> A few days ago, I said **** it, and installed a 2.6.25 kernel that is
    >> praised for its stability. It's been several days (not enough to tell)
    >> and it is still up and running. Time will tell if 2.6.25 will help.

    >
    > I have never seen or heard of a Linux box locking up and I have seen
    > servers that ran dead memory, servers overheating at 75°C for months on
    > end, etc. The worst that ever happened is processes get mysteriously
    > killed and disk access not doing what it should when the hdd controller
    > vanished. The software bit is very error tolerant, so I (and everybody
    > else here) find it hard to believe a widely used out-of-the-box Linux
    > distro could have problems like that.
    >
    > It is much more likely something is wrong with your hardware.


    Which hardware?

  13. Re: Mysterious server lockups with Ubuntu Hardy

    On 2008-08-25, Dogma Discharge wrote:
    >
    > "Ignoramus9283" wrote in message
    > newsPydnUtJ0qq4Ji_VnZ2dnUVZ_hudnZ2d@giganews.com...
    >> On 2008-08-25, Matt wrote:
    >>> Ignoramus9283 wrote:
    >>>
    >>>> The server is almost new,
    >>>
    >>>
    >>> More reason to suspect hardware, I'd say. What is the longest uptime
    >>> you've had?

    >>
    >> 10 days

    >
    > We run Ubuntu servers here at work, rock solid. The only crap we had with
    > crashes seemed to be caused by 'electricity surges'; we installed UPSes,
    > which seemed to sort it out. It's a long shot, I admit, but perhaps something
    > to look at?
    >


    The whole server room is on one giant UPS. No other servers crash at
    that time.

    i

  14. Re: Mysterious server lockups with Ubuntu Hardy

    On 2008-08-26, Ignoramus9283 wrote:
    >
    > This is just to save electricity, right? There is not much point in
    > CPU scaling on a server, besides that. I cannot see how this could be
    > the culprit, in any case.


    There's no point in wasting power when it's not needed... The CPU
    scaling will change the speed the CPUs are running at as needed
    (ondemand).

    Running the CPUs at a lower speed will reduce the heat production
    along with energy usage. It is worthwhile to run it, and it will not
    have a negative impact on your IO speed, so long as you properly
    configure it.


    --
    Joe - Linux User #449481/Ubuntu User #19733
    joe at hits - buffalo dot com
    "Hate is baggage, life is too short to go around pissed off all the
    time..." - Danny, American History X
