e450 *FULL* diag? - SUN

  1. e450 *FULL* diag?

    I know most of you prefer Solaris on Sparc architecture but I have a
    "play box" that I'm running Linux on.

    We had the discussion a few months ago about having an e450 as a "play
    box" so skip it please.

    What I'm interested in is running the fullest hardware diag possible on
    the box without the OS being loaded because there is one small problem
    that bothers me. As soon as the Linux kernel takes control the "GENERAL
    FAULT" LED lights up. Others running the same hardware and OS do not
    have this issue so I suspect there is actually a hardware problem on
    the machine that Linux is tagging when it loads its initial drivers.
    It seems to run fine but that light bothers me. It's been up about 48
    hours now since the last testing reboot.

    The basic "test all", "probe-scsi-all", and other OBP-level tests work
    fine before the Linux kernel is loaded. I can't run them after the
    kernel loads because at that point Linux has control of the hardware
    and I get all kinds of nifty, machine-freezing errors.

    Poking around in the documentation I can't figure out what is the
    deepest level of diag I can run from boot or the OBP. If Solaris 10 has
    ways to further probe hardware I'll even install it --- but, once
    again, I have no experience with Solaris and would be lost trying to
    actually find the commands to run.

    Any help appreciated.

    Thanks,

    Gerald

    --



  2. Re: e450 *FULL* diag?

    Gerald V. Livington II wrote:

    > Poking around in the documentation I can't figure out what is the
    > deepest level of diag I can run from boot or the OBP. If Solaris 10 has
    > ways to further probe hardware I'll even install it --- but, once
    > again, I have no experience with Solaris and would be lost trying to
    > actually find the commands to run.


    In Solaris:

    prtdiag - display system diagnostic information
    (it might be available if you boot single-user from CD "boot cdrom -s")

    # /usr/sbin/prtdiag -v

    System Configuration: Sun Microsystems sun4u Sun Enterprise 450 (2 X
    UltraSPARC-II 400MHz)
    System clock frequency: 100 MHz
    Memory size: 1024 Megabytes


    No failures found in System
    ===========================

    System Temperatures (Celsius):
    ------------------------------
    AMBIENT   19
    CPU 1     40
    CPU 3     40
    =================================

    Front Status Panel:
    -------------------
    Keyswitch position is in On mode.

    System LED Status:    POWER        GENERAL ERROR    ACTIVITY
                          [ ON]        [OFF]            [OFF]
                          DISK ERROR   THERMAL ERROR    POWER SUPPLY ERROR
                          [OFF]        [OFF]            [OFF]

    Fans:
    -----
    Fan Bank   Speed   Status
    --------   -----   ------
    CPU        49      OK
    PWR        31      OK


    Power Supplies:
    ---------------
    Supply   Rating   Temp   Status
    ------   ------   ----   ------
    0        550 W    33     OK
    1        550 W    29     OK
    2        550 W    31     OK


  3. Re: e450 *FULL* diag?

    Oscar del Rio wrote:

    > Gerald V. Livington II wrote:
    >
    > > Poking around in the documentation I can't figure out what is the
    > > deepest level of diag I can run from boot or the OBP. If Solaris 10
    > > has ways to further probe hardware I'll even install it --- but,
    > > once again, I have no experience with Solaris and would be lost
    > > trying to actually find the commands to run.

    >
    > In Solaris:
    >
    > prtdiag - display system diagnostic information
    > (it might be available if you boot single-user from CD "boot cdrom
    > -s")
    >
    > # /usr/sbin/prtdiag -v
    >
    > System Configuration: Sun Microsystems sun4u Sun Enterprise 450 (2
    > X UltraSPARC-II 400MHz) System clock frequency: 100 MHz
    > Memory size: 1024 Megabytes


    Thanks, grabbing the first CD now to try a single-user boot. I hate
    hammering my T-1 during the day but there don't seem to be any
    .torrents for the Sparc version DVD. The "Sun download manager" doesn't
    let me directly control the speed of the download like my BitTorrent
    client does (and I don't want to poke around the network to control it
    through the NIC). If that doesn't work I'll wait until traffic is down
    tonight and grab the DVD to do a full install.

    Thanks again,

    Gerald

    --



  4. Re: e450 *FULL* diag?

    Gerald V. Livington II wrote:
    > Thanks, grabbing the first CD now to try a single-user boot. I hate
    > hammering my T-1 during the day but there don't seem to be any
    > .torrents for the Sparc version DVD. The "Sun download manager" doesn't
    > let me directly control the speed of the download like my BitTorrent
    > client does (and I don't want to poke around the network to control it
    > through the NIC). If that doesn't work I'll wait until traffic is down
    > tonight and grab the DVD to do a full install.


    I just use 'wget -c --limit-rate=' to be the rate I want. With
    -c, it can be killed and restarted to change the rate.
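
    A rough sketch of that approach follows. The URL and the 100k rate are
    placeholders, not values from this thread, and the DRY_RUN switch is an
    illustrative addition so the command can be inspected without touching
    the network.

```shell
#!/bin/sh
# limited_fetch RATE URL
# Resumable, rate-limited download with wget.
#   -c            continue a partially downloaded file
#   --limit-rate  cap bandwidth; accepts k (kilobytes) and m (megabytes)
# DRY_RUN is a hypothetical switch for this sketch: when it is set to 1
# the command is printed instead of executed.
limited_fetch() {
    rate=$1
    url=$2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "wget -c --limit-rate=$rate $url"
    else
        wget -c --limit-rate="$rate" "$url"
    fi
}

# Example with a placeholder URL: print the command that would run.
( DRY_RUN=1; limited_fetch 100k "http://example.invalid/solaris.iso" )
```

    To change the rate mid-download, interrupt wget and call the function
    again with a different first argument; -c picks up where the partial
    file left off.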

    --
    Darren Dunham ddunham@taos.com
    Senior Technical Consultant TAOS http://www.taos.com/
    Got some Dr Pepper? San Francisco, CA bay area
    < This line left intentionally blank to confuse you. >

  5. Re: e450 *FULL* diag?

    Oscar del Rio wrote:

    > Gerald V. Livington II wrote:
    >
    > > Poking around in the documentation I can't figure out what is the
    > > deepest level of diag I can run from boot or the OBP. If Solaris 10
    > > has ways to further probe hardware I'll even install it --- but,
    > > once again, I have no experience with Solaris and would be lost
    > > trying to actually find the commands to run.

    >
    > In Solaris:
    >
    > prtdiag - display system diagnostic information
    > (it might be available if you boot single-user from CD "boot cdrom
    > -s")
    >
    > # /usr/sbin/prtdiag -v
    >


    OK, prtdiag must be spitting data out to stderr. I can't get it to page
    by piping it through 'more'.

    Ideas?

    Thanks,

    Gerald

    --



  6. Re: e450 *FULL* diag?

    On Aug 3, 4:16 pm, "Gerald V. Livington II"
    wrote:

    > OK, prtdiag must be spitting data out to stderr. I can't get it to page
    > by piping it through 'more'.
    >
    > Ideas?
    >


    bash / sh style shells: prtdiag 2>&1 | more
    csh, I think it's prtdiag |& more
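
    To see what the sh-style redirect does without needing prtdiag at
    hand, here is a small sketch. The report function is a made-up
    stand-in that writes one line to each stream; it says nothing about
    where prtdiag really sends its output.

```shell
#!/bin/sh
# Stand-in for a program that writes part of its report to stderr.
# (Illustrative only; this is not prtdiag.)
report() {
    echo "line on stdout"
    echo "line on stderr" >&2
}

# Only stdout travels down a plain pipe. stderr is discarded here just
# to keep the demo output tidy, so wc counts one line.
report 2>/dev/null | wc -l

# With 2>&1, stderr is duplicated onto stdout before the pipe, so both
# lines reach the next command (more, less, or wc as used here).
report 2>&1 | wc -l
```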


  7. Re: e450 *FULL* diag?

    kschendel wrote:

    > On Aug 3, 4:16 pm, "Gerald V. Livington II"
    > wrote:
    >
    > > OK, prtdiag must be spitting data out to stderr. I can't get it to
    > > page by piping it through 'more'.
    > >
    > > Ideas?
    > >

    >
    > bash / sh style shells: prtdiag 2>&1 | more
    > csh, I think it's prtdiag |& more


    Neither works. I'm pulling the DVD now. I'll grab a spare drive and
    install Solaris so I can try to get the info to redirect to a file.

    I'm thinking prtdiag may not be digging deep enough though. The Solaris
    install shell doesn't light up the "General Error" LED while ANY
    (2.6.x) version of the Linux kernel does. But others I've talked to who
    are running essentially the same hardware don't get the G/E LED.

    --



  8. Re: e450 *FULL* diag?

    Gerald V. Livington II wrote:

    >> # /usr/sbin/prtdiag -v
    >>

    >
    > OK, prtdiag must be spitting data out to stderr. I can't get it to page
    > by piping it through 'more'.


    TERM is probably not defined, or wrong. If you're on the console it
    should be TERM=sun

    TERM=sun; export TERM
    prtdiag -v | more
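
    The same fix can be wrapped in a small check, which may be handy if
    TERM has to be repaired more than once in an install shell. The
    function name ensure_term and the choice to treat "unknown" like an
    unset value are assumptions made for this sketch.

```shell
#!/bin/sh
# ensure_term FALLBACK
# If TERM is unset, empty, or "unknown", set it to FALLBACK; otherwise
# leave it alone. Either way, export it so child processes such as the
# pager inherit it.
ensure_term() {
    case "${TERM:-}" in
        ''|unknown) TERM=$1 ;;
    esac
    export TERM
}

# On the Sun console the value suggested above is "sun".
ensure_term sun
echo "TERM is now: $TERM"
```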


  9. Re: e450 *FULL* diag?

    Gerald V. Livington II wrote:


    > What I'm interested in is running the fullest hardware diag possible
    > on the box without the OS being loaded because there is one small
    > problem that bothers me. As soon as the Linux kernel takes control
    > the "GENERAL FAULT" LED lights up. Others running the same hardware
    > and OS do not have this issue so I suspect there is actually a
    > hardware problem on the machine that Linux is tagging when it loads
    > it's initial drivers. It seems to run fine but that light bothers
    > me. It's been up about 48 hours now since the last testing reboot.


    I'm burning the Solaris10 DVD as I type this but I now suspect the
    problem is a marginally failing CPU or RAM stick. The problem is
    exacerbated with system heat.

    After leaving the system running, booted to single-user mode on the
    Solaris10 install CD-#1, for several hours yesterday I decided to
    reboot into Linux so my little, non-critical, personal web pages would
    be back on the Internet.

    The reboot failed repeatedly with an "Illegal instruction" error when
    the Linux kernel began loading. I did a full power-cycle and also a
    "reset-all" (just in case the power cycle magically didn't reset
    something) but it continued to fail.

    So, I shut it down and went to bed. When I woke up an hour or so ago I
    hit the power switch and went to make coffee. When I came back into the
    server room later it was fired up, booted up, and running fine (except
    for that glowing orange General Error LED) again.

    So, when I get a base Solaris10 install done, what is available to
    stress the hardware and provide good reports? Linux is somehow tagging
    the hardware problem as soon as it boots. Solaris isn't noticing it at
    all so far as I can tell. So, I need something that will run under
    Solaris that will dig into all the deep, dark, corners of the machine
    in an attempt to make it really angry.

    Thanks,

    Gerald


    --



  10. Re: e450 *FULL* diag?

    Gerald V. Livington II wrote:
    > Gerald V. Livington II wrote:
    >
    >
    >> What I'm interested in is running the fullest hardware diag possible
    >> on the box without the OS being loaded because there is one small
    >> problem that bothers me. As soon as the Linux kernel takes control
    >> the "GENERAL FAULT" LED lights up. Others running the same hardware
    >> and OS do not have this issue so I suspect there is actually a
    >> hardware problem on the machine that Linux is tagging when it loads
    >> it's initial drivers. It seems to run fine but that light bothers
    >> me. It's been up about 48 hours now since the last testing reboot.

    >
    > I'm burning the Solaris10 DVD as I type this but I now suspect the
    > problem is a marginally failing CPU or RAM stick. The problem is
    > exacerbated with system heat.
    >
    > After leaving the system running, booted to single-user mode on the
    > Solaris10 install CD-#1, for several hours yesterday I decided to
    > reboot into Linux so my little, non-critical, personal web pages would
    > be back on the Internet.
    >
    > The reboot failed repeatedly with an "Illegal instruction" error when
    > the Linux kernel began loading. I did a full power-cycle and also a
    > "reset-all" (just in case the power cycle magically didn't reset
    > something ) but it continued to fail.
    >
    > So, I shut it down and went to bed. When I woke up an hour or so ago I
    > hit the power switch and went to make coffee. When I came back in the
    > server room later it is fired up, booted up, and running fine (except
    > for that glowing orange Genereal Error LED) again.
    >
    > So, when I get a base Solaris10 install done, what is available to
    > stress the hardware and provide good reports. Linux is somehow tagging
    > the hardware problem as soon as it boots. Solaris isn't noticing it at
    > all so far as I can tell. So, I need something that will run under
    > Solaris that will dig into all the deep, dark, corners of the machine
    > in an attempt to make it really angry.
    >
    > Thanks,
    >
    > Gerald
    >
    >

    SunVTS is delivered as extra software on the S10 DVD.
    See http://www.sun.com/oem/products/vts/features.html for details.

    Regards
    Andreas

  11. Re: e450 *FULL* diag?

    Andreas Wacknitz wrote:

    > Gerald V. Livington II wrote:
    > > So, when I get a base Solaris10 install done, what is available to
    > > stress the hardware and provide good reports. Linux is somehow
    > > tagging the hardware problem as soon as it boots. Solaris isn't
    > > noticing it at all so far as I can tell. So, I need something that
    > > will run under Solaris that will dig into all the deep, dark,
    > > corners of the machine in an attempt to make it really angry.
    > >
    > > Thanks,
    > >
    > > Gerald
    > >
    > >

    > SunVTS is delivered as extra software on the S10 DVD.
    > See http://www.sun.com/oem/products/vts/features.html for details.
    >
    > Regards
    > Andreas


    Thank you. I'll be digging for a spare drive and doing the Solaris
    install as soon as I get a free moment. VTS in offline mode looks like
    it will do what I need.

    hmph -- I need more 540-3024 spuds and screws. EBay here I come.

    --



  12. Re: e450 *FULL* diag?

    Gerald V. Livington II wrote:

    > Andreas Wacknitz wrote:
    >
    > > Gerald V. Livington II wrote:
    > > > So, when I get a base Solaris10 install done, what is available to
    > > > stress the hardware and provide good reports. Linux is somehow
    > > > tagging the hardware problem as soon as it boots. Solaris isn't
    > > > noticing it at all so far as I can tell. So, I need something that
    > > > will run under Solaris that will dig into all the deep, dark,
    > > > corners of the machine in an attempt to make it really angry.
    > > >
    > > > Thanks,
    > > >
    > > > Gerald
    > > >
    > > >

    > > SunVTS is delivered as extra software on the S10 DVD.
    > > See http://www.sun.com/oem/products/vts/features.html for details.
    > >
    > > Regards
    > > Andreas

    >
    > Thank you. I'll be digging for a spare drive and doing the Solaris
    > install as soon as I get a free moment. VTS in offline mode looks like
    > it will do what I need.
    >
    > hmph -- I need more 540-3024 spuds and screws. EBay here I come.


    Blech -- the joys of working across different platforms. The DVD burn
    gives me an "Invalid Magic Number" error. CD burns boot/read OK so I've
    started the install with CD1 while downloading the others.

    --



  13. Re: e450 *FULL* diag?

    Gerald V. Livington II wrote:

    > Gerald V. Livington II wrote:
    >
    > > Andreas Wacknitz wrote:
    > >
    > > > Gerald V. Livington II wrote:
    > > > > So, when I get a base Solaris10 install done, what is available
    > > > > to stress the hardware and provide good reports. Linux is
    > > > >
    > > > > Gerald
    > > > >
    > > > >
    > > > SunVTS is delivered as extra software on the S10 DVD.
    > > > See http://www.sun.com/oem/products/vts/features.html for details.
    > > >
    > > > Regards
    > > > Andreas

    > >
    > > Thank you. I'll be digging for a spare drive and doing the Solaris
    > > install as soon as I get a free moment. VTS in offline mode looks
    > > like it will do what I need.
    > >
    > > hmph -- I need more 540-3024 spuds and screws. EBay here I come.

    >
    > Blech -- the joys of working across different platforms. The DVD burn
    > gives me an "Invalid Magic Number" error. CD burns boot/read OK so
    > I've started the install with CD1 while downloading the others.


    Well, I installed Solaris 10. Fired up VTS. Ran it for over a day in a
    loop. No errors were showing up. So, I thought it might be a quirk of
    the Linux kernel that makes the front panel LEDs light or not light
    differently than Solaris.

    I reloaded Linux and let it run. No noticeable problems. Then I was
    poking around in syslog and ... ECC errors showing up. Turns out Linux
    DID find a hardware problem that Solaris wasn't noticing.

    Looked like the numbers reported indicated a problem in the first bank
    so I swapped the RAM around between Bank0 and Bank3. The errors went
    away. So, I assume I have a slowly failing memory stick sitting in the
    last bank (which I won't have enough load to hit for a while). I have a
    new set on order to repopulate that bank and see if the error light
    goes out.
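
    For anyone wanting to repeat the syslog check, a minimal sketch is
    below. The sample log lines are invented for illustration and are not
    the real messages from this machine; the search string "ECC" and the
    function name count_ecc are likewise assumptions, so adjust the
    pattern to whatever your kernel actually prints.

```shell
#!/bin/sh
# count_ecc FILE
# Print the number of lines in FILE that contain the string "ECC".
count_ecc() {
    grep -c 'ECC' "$1"
}

# Build a small sample log (made-up lines) and count the ECC entries.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Aug  5 10:00:01 e450 kernel: example: correctable ECC error (made-up line)
Aug  5 10:00:02 e450 sshd[123]: example: session opened (made-up line)
Aug  5 10:05:07 e450 kernel: example: correctable ECC error (made-up line)
EOF
count_ecc "$sample"
rm -f "$sample"
```

    On a Debian box the file to point it at would typically be
    /var/log/syslog. Running the count before and after a bank swap gives
    a quick way to tell whether the errors really stopped.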

    Current, WORKING, configuration:

    E450
    4GB RAM
    4 x 480MHz CPU
    Linux 2.6.18 kernel (Debian 4.0 distro)
    1 extra SCSI controller running an 8 port backplane
    Sil 3114 (Silicon Image) 4 port SATA 150 controller
    3 x 18G SCA drives formatted using LVM for the base system
    1 x 250G SATA drive as a test unit for the media storage drives
    Intel Pro 1000 MT dual port Gigabit ethernet controller
    Some random ATI graphics card I stuck in when I suspected the Creator3D
    might be the source of the error LED. I work on the machine via ssh 99%
    of the time anyway.

    Running very well. I'll see if the error LED goes out when I swap the
    RAM.

    I have a new question I'll post separately.

    Thanks all for various suggestions.

    Gerald

    --



  14. Re: e450 *FULL* diag?

    "Gerald V. Livington II" wrote in message
    news:1cydnT_p4ap1C07bnZ2dnUVZ_rmjnZ2d@sysmatrix.ne t...
    > Well, I installed Solaris 10. Fired up VTS. Ran it for over a day in a
    > loop. No errors were showing up. So, I thought it might be a quirk of
    > the Linux kernel that makes the front panel LEDs light or not light
    > differently than Solaris.
    >
    > I reloaded Linux and let it run. No noticeable problems. Then I was
    > poking around in syslog and ... ECC errors showing up. Turns out Linux
    > DID find a hardware problem that Solaris wasn't noticing.
    >
    > Looked like the numbers reported indicated a problem in the first bank
    > so I swapped the RAM around between Bank0 and Bank3. The errors went
    > away. So, I assume I have a slowly failing memory stick sitting in the
    > last bank (which I won't have enough load to hit for a while). I have a
    > new set on order to repopulate that bank and see if the error light
    > goes out.


    There are many options in SunVTS for memory testing.
    Most are disabled by default.

    Under Memory there is a sub menu for 'mem' and 'kmem'.
    Both have an ECC Error Monitor that is Disabled by default.
    You can Enable this to catch ECC errors during SunVTS testing.

    Under 'mem' you can set the amount of memory from 0% to 100%.
    You may as well test 100% of what you have.

    Under 'kmem' most of the optional tests are Disabled.
    You can Enable the tests:

    Random Test
    Page Striding Test
    March Test
    Block Copy Test

    Run SunVTS for at least a day.
    I usually run for 72 hours minimum for burning in a system.

    Trinean



  15. Re: e450 *FULL* diag?

    Trinean wrote:

    > "Gerald V. Livington II" wrote in message
    > news:1cydnT_p4ap1C07bnZ2dnUVZ_rmjnZ2d@sysmatrix.ne t...
    > > Well, I installed Solaris 10. Fired up VTS. Ran it for over a day
    > > in a loop. No errors were showing up. So, I thought it might be a
    > > quirk of the Linux kernel that makes the front panel LEDs light or
    > > not light differently than Solaris.
    > >
    > > I reloaded Linux and let it run. No noticeable problems. Then I was
    > > poking around in syslog and ... ECC errors showing up. Turns out
    > > Linux DID find a hardware problem that Solaris wasn't noticing.
    > >
    > > Looked like the numbers reported indicated a problem in the first
    > > bank so I swapped the RAM around between Bank0 and Bank3. The
    > > errors went away. So, I assume I have a slowly failing memory stick
    > > sitting in the last bank (which I won't have enough load to hit for
    > > a while). I have a new set on order to repopulate that bank and see
    > > if the error light goes out.

    >
    > There are many options in SunVTS for memory testing.
    > Most are disabled by default.
    >
    > Under Memory there is a sub menu for 'mem' and 'kmem'.
    > Both have an ECC Error Monitor that is Disabled by default.
    > You can Enable this to catch ECC errors during SunVTS testing.
    >
    > Under 'mem' you can set the amount of memory form 0% to 100%.
    > You may as well test 100% of what you have.
    >
    > Under 'kmem' most of the optional tests are Disabled.
    > You can Enable the tests:
    >
    > Random Test
    > Page Striding Test
    > March Test
    > Block Copy Test
    >
    > Run SunVTS for at least a day.
    > I usually run for 72 hours minimum for burning in a system.
    >
    > Trinean


    What gets me is that under Sol-10 the front panel error light never
    comes on. That may be because I never loaded it hard enough to touch
    the bad RAM even when it was in the first bank.

    As soon as the rest of my drives arrive I'll be doing the RAM swap and
    may fire up Solaris again to see if I can find the MEM test options
    that I missed. I'm still playing with how I want the drives arranged.

    I have four 147GB and one 73GB drive on the way. I have the built in
    controller running the 4 drive backplane and one extra controller
    running an 8 drive backplane.

    Thanks,

    Gerald

    --



  16. Re: e450... may I ask where you're getting the HDs from?

    Gerald V. Livington II wrote:
    > Trinean wrote:
    >
    >
    >>"Gerald V. Livington II" wrote in message
    >>news:1cydnT_p4ap1C07bnZ2dnUVZ_rmjnZ2d@sysmatrix.ne t...
    >>
    >>>Well, I installed Solaris 10. Fired up VTS. Ran it for over a day
    >>>in a loop. No errors were showing up. So, I thought it might be a
    >>>quirk of the Linux kernel that makes the front panel LEDs light or
    >>>not light differently than Solaris.
    >>>
    >>>I reloaded Linux and let it run. No noticeable problems. Then I was
    >>>poking around in syslog and ... ECC errors showing up. Turns out
    >>>Linux DID find a hardware problem that Solaris wasn't noticing.
    >>>
    >>>Looked like the numbers reported indicated a problem in the first
    >>>bank so I swapped the RAM around between Bank0 and Bank3. The
    >>>errors went away. So, I assume I have a slowly failing memory stick
    >>>sitting in the last bank (which I won't have enough load to hit for
    >>>a while). I have a new set on order to repopulate that bank and see
    >>>if the error light goes out.

    >>
    >>There are many options in SunVTS for memory testing.
    >>Most are disabled by default.
    >>
    >>Under Memory there is a sub menu for 'mem' and 'kmem'.
    >>Both have an ECC Error Monitor that is Disabled by default.
    >>You can Enable this to catch ECC errors during SunVTS testing.
    >>
    >>Under 'mem' you can set the amount of memory form 0% to 100%.
    >>You may as well test 100% of what you have.
    >>
    >>Under 'kmem' most of the optional tests are Disabled.
    >>You can Enable the tests:
    >>
    >>Random Test
    >>Page Striding Test
    >>March Test
    >>Block Copy Test
    >>
    >>Run SunVTS for at least a day.
    >>I usually run for 72 hours minimum for burning in a system.
    >>
    >>Trinean

    >
    >
    > What gets me is that under Sol-10 the front panel error light never
    > comes on. That may be because I never loaded it hard enough to touch
    > the bad RAM even when it was in the first bank.
    >
    > As soon as the rest of my drives arrive I'll be doing the RAM swap and
    > may fire up Solaris again to see if I can find the MEM test options
    > that I missed. I'm still playing with how I want the drives arranged.
    >
    > I have four 147GB and one 73GB drive on the way. I have the built in
    > controller running the 4 drive backplane and one extra controller
    > running an 8 drive backplane.
    >
    > Thanks,
    >
    > Gerald
    >


    Hi, I have an E250 running Sol 10. It is 100% functional, but I would
    like to buy some more large hard drives for it if I can get them cheap
    enough. May I inquire as to where and for how much $$ you got your 147gb
    drives? Four of those babies would be real nice in my server.

    Thanks, Max.

  17. Re: e450... may I ask where you're getting the HDs from?

    maxodyne wrote:

    > Gerald V. Livington II wrote:


    > > What gets me is that under Sol-10 the front panel error light never
    > > comes on. That may be because I never loaded it hard enough to touch
    > > the bad RAM even when it was in the first bank.
    > >
    > > As soon as the rest of my drives arrive I'll be doing the RAM swap
    > > and may fire up Solaris again to see if I can find the MEM test
    > > options that I missed. I'm still playing with how I want the drives
    > > arranged.
    > >
    > > I have four 147GB and one 73GB drive on the way. I have the built in
    > > controller running the 4 drive backplane and one extra controller
    > > running an 8 drive backplane.
    > >
    > > Thanks,
    > >
    > > Gerald
    > >

    >
    > Hi, I have an E250 running Sol 10. It is 100% functional, but I would
    > like to buy some more large hard drives for it if I can get them
    > cheap enough. May I inquire as to where and for how much $$ you got
    > your 147gb drives? Four of those babies would be real nice in my
    > server.
    >
    > Thanks, Max.


    Sorry for the delay. Lots going on so I'm not frequenting UseNet.

    Ebay. http://stores.ebay.com/The-Best-Deals-Anywhere <--- I've bought
    all mine from this dealer. Price depends on the other bidders. Shipping
    is always $15.99 with no combined shipping discount, though if you
    order multiples you can contact them by email to get the ORDERS
    combined into a single payment and they will box them in a single
    package. I paid under $60 each total with shipping.

    They are refurb Fujitsu drives that come back showing up as "ModusLnk"
    drives. Under Linux the 73GB drive shows as "ModusLnk MXJ3073SC800600X
    5704" and the 147GB drives show *ONLY* "ModusLnk" with no serial/model
    number. I need to play with that a bit as it's giving me fits
    identifying the drives. Hot-swap is twitchy because drive
    identification is VERY slot specific when they show no identifying
    serial numbers for udev to pick up on.
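
    One way to see which slot maps to which device node is to list the
    path-based symlinks that udev maintains. This is only a sketch: the
    list_links helper is a name invented here, and whether the udev
    shipped with this kernel populates /dev/disk/by-path on this box is
    an assumption.

```shell
#!/bin/sh
# list_links DIR
# Print "link-name -> target" for every symlink found in DIR.
list_links() {
    for link in "$1"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink "$link")"
    done
}

# by-path names are keyed to controller and target, not to drive serial.
dir=/dev/disk/by-path
if [ -d "$dir" ]; then
    list_links "$dir"
fi
```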

    I can't find any info on ModusLnk. Full drive refurb right down to the
    firmware though.


    Also just try this search
    http://search.ebay.com/search/search...m37&satitle=14
    7*+10K+scsi

    Gerald



    --

