pSeries machine "HealthCheck" - Aix


  1. pSeries machine "HealthCheck"

    Hello all,
    We are currently working on a generic "healthcheck" for our pSeries
    machines. So far we've come up with a checklist to go through, for
    any given pSeries machine, that will give us a general idea of the
    system's health and status: what resources the machine has and how
    they're performing. I've copied and pasted most of the checklist
    below. What I'm looking for now is general ideas and opinions about
    this checklist, anything you think is missing, and what sort of
    routine other people use for this sort of thing.
    Any feedback at all would be most appreciated.

    Thanks in advance, and have a great weekend!

    Sincerely,
    -p
    BTW, I know that I haven't included anything from the Performance
    Toolbox (perf.tools). That will be implemented in the next
    iteration, and I'm also looking for opinions on which of the tools
    in perf.tools are the most useful for this purpose.
    Thanks again!
    -p

    HEALTHCHECK CHECKLIST:
    Health Check workflow & script logic to capture:
    • Customer header information
    • Current system variables
    • Hardware configuration
    • Filesystems, VGs & LVs
    • Security logs, processes and network
    • Performance metrics
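One way the "script logic" for this outline could look is a small wrapper that runs each checklist command under a heading. This is only a sketch: the function name `run_check` and the heading format are made up for illustration, and it assumes a POSIX-style shell such as ksh.

```shell
#!/bin/sh
# Hypothetical helper: run one checklist command under a labelled heading.
# A command that is not installed on this host is reported and skipped
# instead of stopping the whole health check.
run_check() {
    label=$1
    shift
    printf '=== %s ===\n' "$label"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        printf 'SKIPPED: %s not found\n' "$1"
    fi
    printf '\n'
}
```

Each checklist item then becomes one line, for example `run_check "Host name" hostname` or `run_check "Paging space" lsps -a`, and the whole run can be redirected to a single report file.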

    Customer header capture
    Customer name
    # echo "C--- N---"
    Date
    # date +"%A, %T"
    Name of host
    # hostname
    Canonical, real name
    # uname -n
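The header capture could be wrapped in one function, roughly as below. Note that `date +"%A, %T"` prints only the weekday and the time, so this sketch also adds the calendar date; that addition, the function name `print_header` and the label layout are assumptions, not part of the checklist.

```shell
# Hypothetical report header. The customer name is passed as an argument.
print_header() {
    customer=$1
    printf 'Customer : %s\n' "$customer"
    # Weekday, calendar date and time (the date part is an addition)
    printf 'Date     : %s\n' "$(date +"%A %Y-%m-%d, %T")"
    printf 'Host     : %s\n' "$(hostname)"
    printf 'Node     : %s\n' "$(uname -n)"
}
```

Calling `print_header "C--- N---"` at the top of the script gives every saved report the same four-line header.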

    ////////////////// LET'S BEGIN WITH SOME 'DIAG' CMDS. THESE WILL
    CHECK ALL SYSTEM RESOURCES FOR PROBLEMS, RUNNING A BUNCH OF HW TESTS,
    AND IF ANYTHING TURNS UP, IT WILL TELL THE USER (STDOUT) AND GENERATE
    THE APPROPRIATE ERROR REPORT ////////////////

    # diag sys0
    # diag ent0
    etc…
    -- or --
    # smitty diag   (entire system)



    • Current system variables
    Machine model type & ID
    # lsattr -El sys0 -a modelname -a systemid

    Machine ID & model name
    # uname -m
    # uname -M

    AIX release & version
    # uname -r
    # uname -v
    # oslevel -s [newer AIX only]
    # oslevel -r [if needed, report to level]
    # oslevel -rl [5300-0x]
    Use IBM web tool SUMA to resolve & provide update files


    ///////////////LET'S CHECK THE MICROCODE LEVELS HERE////////////////

    // for system:
    # lscfg -vp | grep alterable

    // let's check microcode levels for critical HW devices too:
    // for Ethernet adapter 0:
    # lscfg -l ent0 -v

    // for CD-ROM:
    # lscfg -l cd0 -v

    // now repeat for all other devices:
    # lscfg -l rmt0 -v
    etc…
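The "repeat for all other devices" step can be scripted rather than typed per device. A minimal sketch, assuming device names arrive one per line on stdin; the helper name `for_each_device` is hypothetical.

```shell
# Hypothetical helper: run a command once per device name read from stdin,
# appending the device name as the last argument.
for_each_device() {
    while read -r dev; do
        [ -n "$dev" ] || continue          # ignore blank lines
        printf '=== %s ===\n' "$dev"
        # stdin is redirected so the command cannot consume the name list
        "$@" "$dev" </dev/null
    done
}
```

On AIX the name list could come from `lsdev -C -F name`, for example `lsdev -C -F name | for_each_device lscfg -v -l`, and the microcode lines can then be picked out of the combined output.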



    Number of licensed users
    # lslicense
    Max processes per user
    # lsattr -El sys0 -a maxuproc

    Paging space, size
    # lsps -a

    Shell and environment variables
    # env
    # set
    # export

    //////////// Let's check that all currently installed software is
    correctly entered in the Software Vital Product Database (SWVPD). To
    verify that all filesets have all required requisites and are
    completely installed:

    # lppchk -v




    • Hardware configuration (use a script for multi-processor systems)

    > List of processors

    # lsdev -Cc processor
    Processor speed
    # lsattr -El proc0 -a frequency   (per CPU)
    L2 cache
    # lsattr -El L2cache0
    Memory
    # lsattr -El mem0   (per DIMM)

    > Adapters: long list, look at SCSI bus structure

    # lsdev -Cc adapter

    > Disks

    # lsdev -Cc disk
    # lsattr -El hdisk0   (per disk)

    Check bootable disks
    # ipl_varyon -i | head

    Check bootlist(s)
    # bootlist -m normal -o
    # bootlist -m service -o

    RAID configuration (to test, run script)

    > Tape devices

    # lsdev -Cc tape
    # lsattr -El rmt0

    Full system configuration details - extracted

    // Let's look for virtual adapters, their names & status
    // (this syntax is for the VIOS padmin shell):
    # lsdev -type adapter -virtual -field name status

    // Let's look for disks, their names & locations:

    // Let's check out our installed memory:
    # lsdev -CHc memory

    // Check out installed Licensed Program Products (LPPs):
    # lslpp -lc
    # lslpp -La

    // Let's check parameters for AIX OS (stuff like autorestart, maxuproc):
    # lsattr -El sys0

    // Let's check for # of licensed users:
    # smitty chlicense


    # prtconf

    # lscfg -v

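    Check hardware error log, visual check first
    # errpt   (summary; add -s mmddhhmmyy to limit by start date)
    # errpt -a
    # errclear 0   (deletes ALL entries; run only after the log has been saved)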
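Before the visual check, the summary report can be reduced to a count per error class. This is a sketch that assumes the default `errpt` summary layout (IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION), where column 4 is the class (H hardware, S software, O operator, U undetermined); the function name is made up.

```shell
# Hypothetical helper: count errpt summary lines per error class.
# Reads the summary report on stdin and skips the header line.
errpt_class_counts() {
    awk 'NR > 1 && NF >= 5 { n[$4]++ }
         END { for (c in n) print c, n[c] }' | sort
}
```

Usage would be `errpt | errpt_class_counts`. A non-zero H count is the cue to read the matching `errpt -a` detail before anything is cleared.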

    Check root mail for hardware reports
    # mail

    Physical hardware & visual connection check

    • Filesystems, VGs & LVs
    > List all volume groups

    # lsvg
    # lsvg vg_name
    Physical view
    # lsvg -p vg_name
    Logical view
    # lsvg -l vg_name   (shows LV members)
    - Y/N rootvg mirrored

    Other VG mirroring config and sync
    Free and used PPs, by VG

    > List all file systems

    # lsfs
    # cat /etc/filesystems
    look for log volume location(s)

    Check each FS and report % of space used
    # df -k
    # du
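The "% of space used" report can be cut down to just the filesystems that need attention. A sketch only, assuming the AIX `df -k` column order (Filesystem, 1024-blocks, Free, %Used, Iused, %Iused, Mounted on); the function name and the default threshold of 90 are arbitrary choices for illustration.

```shell
# Hypothetical helper: list filesystems whose %Used is at or above a limit.
# Reads AIX-format `df -k` output on stdin; the limit defaults to 90.
fs_over_threshold() {
    limit=${1:-90}
    awk -v limit="$limit" '
        NR > 1 {
            used = $4
            sub(/%$/, "", used)
            # Pseudo-filesystems such as /proc report "-" and are skipped
            if (used ~ /^[0-9]+$/ && used + 0 >= limit + 0)
                print $7, used "%"
        }'
}
```

Usage: `df -k | fs_over_threshold 80` prints one line per mount point at or above 80% used. On systems whose `df` uses a different column order the field numbers would need adjusting.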

    Check disk(s) view of use (per physical volume):
    # lspv -l hdiskN
    # lspv -p hdiskN
    # lspv -M hdiskN

    Check future use and config of:
    # cd /var; ls -l
    # cd /tmp; ls -l

    Review user file /etc/passwd
    # cat /etc/passwd
    - Y/N Is password ageing implemented?

    Get file size limits from:
    # cat /etc/security/limits

    Get random write threshold
    # ioo -L maxrandwrt
    Security logs, processes and network
    # cat /etc/security/login.cfg

    In the /etc/security directory, view all other appropriate files:
    passwd, user, group, limits, environ, login.cfg

    > Log file check:

    • Last login log, from /var/adm/wtmp
    # last | head
    # who /var/adm/wtmp
    • Failed login attempts
    # who /etc/security/failedlogin
    • Users logged in
    # who   (reads /etc/utmp)
    • Root privilege su use
    # cat /var/adm/sulog
    • Root login access
    # cat /etc/security/user
    # cat /etc/security/lastlog
    • Check /etc/inetd.conf for unneeded BSD r-services
    # cat /etc/inetd.conf
    • Check for remote login password bypass
    # find / -name hosts.equiv -print
    # find / -name .rhosts -print
    • Check root mail for security reports
    # mail
    • Check & report root crontab entries
    # cd /var/spool/cron/crontabs; cat root
    - Y/N is /usr/sbin/skulker enabled
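The two `find` runs for the remote login bypass check above can be combined into a single pass over the tree. A minimal sketch; the function name is hypothetical and the starting directory defaults to `/`.

```shell
# Hypothetical helper: print every .rhosts or hosts.equiv file under a tree.
# Errors from unreadable directories are discarded.
find_trust_files() {
    root=${1:-/}
    find "$root" \( -name .rhosts -o -name hosts.equiv \) -type f -print 2>/dev/null
}
```

Usage: `find_trust_files /` walks the filesystem once instead of twice; any file it prints should be reviewed, since each one can allow password-less r-command access.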

    Processes & daemons
    > System Resource Controller, master daemon

    • Report all active subsystems
    # lssrc -a
    - Y/N Disable sendmail?

    > Process profile

    • Defunct processes
    # ps -ef | grep defunct
    # ps -elf
    # ps gv
    # ps aux
    # topas ?
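One catch with `ps -ef | grep defunct` is that the grep process itself usually shows up in the result. A sketch of a count that avoids this by matching only lines whose last field is the `<defunct>` marker; the function name is made up.

```shell
# Hypothetical helper: count zombie processes in `ps -ef` output on stdin.
# Only lines whose final field is exactly "<defunct>" are counted, so a
# "grep defunct" command line is not mistaken for a zombie.
count_defunct() {
    awk '$NF == "<defunct>" { n++ } END { print n + 0 }'
}
```

Usage: `ps -ef | count_defunct` prints a single number that a script can compare against zero.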

    Network access support files
    # cat /etc/hosts

    Host network interface(s) configuration
    # ifconfig -a

    Check host's known routes
    # netstat -r
    # arp -a

    • Performance metrics [expand]
    Reboot history
    # last reboot

    Virtual memory
    # vmstat 2 20

    Paging configuration (lsps); script to test during activity, save to file
    # lsps -a

    System dump configuration
    # sysdumpdev -l
    Y/N to dumpcheck script

    Disk use (interval in seconds x number of samples)
    # iostat 2 20
    - mirroring
    - load balancing

    Network metrics
    # netstat


  2. Re: pSeries machine "HealthCheck"

    Wouldn't it be better just to run the snap -gc command?
    Of course, the output includes some extra information that may not
    be needed.




  3. Re: pSeries machine "HealthCheck"

    You could use a utility like cfg2html for the non-performance data:
    http://tech.groups.yahoo.com/group/cfg2html/
    This is handy in that you can browse the info from a web browser if you
    install a web server (e.g. Apache). You could also run this from cron
    and create date-appended files so you can track changes.
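The date-appended files idea works for any command's output, not just cfg2html. A rough sketch, assuming a POSIX shell; the function name, the `snapshot.` file prefix and the timestamp format are all made up for illustration.

```shell
# Hypothetical helper: save a command's output to a date-stamped file and
# show what changed since the most recent earlier snapshot, if any.
snapshot_and_diff() {
    dir=$1
    shift
    mkdir -p "$dir" || return 1
    new="$dir/snapshot.$(date +%Y%m%d%H%M%S)"
    # Fixed-width timestamps sort correctly as plain text
    prev=$(ls "$dir"/snapshot.* 2>/dev/null | sort | tail -1)
    "$@" > "$new" 2>&1
    if [ -n "$prev" ] && [ "$prev" != "$new" ]; then
        # diff exits non-zero when the files differ; that is not an error here
        diff "$prev" "$new" || true
    fi
}
```

From cron this might be called as `snapshot_and_diff /var/adm/healthcheck lscfg -v` (the directory is only an example), so any hardware or configuration change shows up as diff output in the cron mail.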

    I would use NMON to capture performance metrics ...
    http://www-128.ibm.com/developerwork...u-analyze_aix/





  4. Re: pSeries machine "HealthCheck"

    Okay, snap & nmon... I was going to include nmon for sure anyway, but
    what about stuff from perf.tools?

    Anybody have any opinions about good stuff from the Performance Toolbox
    that would come in handy in a generic system healthcheck script?


    Thanks!
    -p







