CPU Looper Monitor Script - VMS



Thread: CPU Looper Monitor Script

  1. CPU Looper Monitor Script

    Hi,

    I need to write a script that will check for looping processes on a
    VMS system. I have no problem writing it, but was just checking
    whether anyone has something they could post that has already been
    written to save me the time. Basically just something that would
    check all processes over an interval and identify those with high CPU
    utilization.

    thanks

  2. Re: CPU Looper Monitor Script

    In article
    <5cdc7f43-93b9-4c0d-b6bb-5a4fb8b0f686@j64g2000hsj.googlegroups.com>,
    tcarterhb@gmail.com writes:

    > I need to write a script that will check for looping processes on a
    > VMS system. I have no problem writing it, but was just checking
    > whether anyone has something they could post that has already been
    > written to save me the time. Basically just something that would
    > check all process over an interval and identify those with high cpu
    > utilization.


    Depending on your needs, you might want to check high CPU usage coupled
    with no I/O.
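
    A minimal DCL sketch of that test, not taken from any posted script:
    sample one process twice and flag it only if its CPU time rose sharply
    while its I/O counters stayed flat. The P1 calling convention, the
    one-minute interval and the 90% threshold are assumptions made for
    illustration. CPUTIM, DIRIO and BUFIO are standard F$GETJPI items, and
    reading another user's process needs GROUP or WORLD privilege.

    ```dcl
    $! Sketch only: P1 = PID of the process to check (assumed convention).
    $ pid = p1
    $ interval = "00:01:00"           ! assumed sample interval: 1 minute
    $ cpu_limit = 5400                ! assumed: 90% of 60 s, in 10 ms ticks
    $ cpu1 = f$getjpi(pid, "CPUTIM")
    $ io1  = f$getjpi(pid, "DIRIO") + f$getjpi(pid, "BUFIO")
    $ wait 'interval'
    $ cpu2 = f$getjpi(pid, "CPUTIM")
    $ io2  = f$getjpi(pid, "DIRIO") + f$getjpi(pid, "BUFIO")
    $ if ((cpu2 - cpu1) .ge. cpu_limit) .and. ((io2 - io1) .eq. 0) then -
          write sys$output "''pid' is CPU-bound with no I/O: possible looper"
    ```

    A process that is legitimately compute-bound would trip this too, so
    treat the result as a hint rather than a verdict.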


  3. Re: CPU Looper Monitor Script

    On Dec 26, 11:35 am, hel...@astro.multiCLOTHESvax.de (Phillip Helbig---
    remove CLOTHES to reply) wrote:
    > In article
    > <5cdc7f43-93b9-4c0d-b6bb-5a4fb8b0f...@j64g2000hsj.googlegroups.com>,
    >
    > tcarte...@gmail.com writes:
    > > I need to write a script that will check for looping processes on a
    > > VMS system. I have no problem writing it, but was just checking
    > > whether anyone has something they could post that has already been
    > > written to save me the time. Basically just something that would
    > > check all process over an interval and identify those with high cpu
    > > utilization.

    >
    > Depending on your needs, you might want to check high CPU usage coupled
    > with no I/O.


    Yes, you are correct. I was just looking for a DCL script someone
    has already written, so I don't have to re-invent the wheel. I
    haven't found anything yet at openvms.org, or dcl.openvms.org.
    I guess it would be the opposite of a Watchdog-type procedure that
    monitors idle users.

    Anyone have anything?
    thanks.

  4. Re: CPU Looper Monitor Script

    I can't imagine writing this in DCL without suffering
    memory leaks. In fact I have a hard time imagining
    writing it in DCL at all, due to the limited array capabilities.
    I've written performance monitoring software for years,
    and trying to capture the running history of 'n' processes
    over a long period of time is not a job DCL is suited for.
    That said, I suppose it would look like this:

    $ limit = 300                ! CPU threshold per pass, in hundredths of a second
    $ OUTLOOP:
    $     context = ""
    $ PIDLOOP:
    $     runpid = f$pid(context)
    $     if runpid .nes. ""
    $     then
    $         curcpu = f$getjpi(runpid, "CPUTIM")
    $         if f$type(array_'runpid'_cpu) .eqs. ""
    $         then
    $             ! first time this process has been seen, nothing to compare
    $         else
    $             delta = curcpu - array_'runpid'_cpu
    $             if delta .gt. limit then -
                      write sys$output "process ''runpid' has exceeded the CPU threshold"
    $         endif
    $         array_'runpid'_cpu = curcpu    ! never deleted: one symbol per PID ever seen
    $         goto PIDLOOP
    $     endif
    $     wait 00:00:10            ! 10 seconds between passes
    $     goto OUTLOOP

    Notice the problems? For example, what happens when
    a pid is re-used? How meaningful is it that a process uses
    3 CPU seconds in 10 realtime seconds? How many intervals
    do you want to track before deciding a process is looping?

    ok
    dpm

  5. Re: CPU Looper Monitor Script

    On Dec 26, 1:16 pm, David_Mur...@murphyfamily.org wrote:
    > I can't imagine writing this in DCL without suffering
    > memory leaks. In fact I have a hard time imagining
    > writing it in DCL at all, due to the limited array capabilities.
    > I've written performance monitoring software for years,
    > and trying to capture the running history of 'n' processes
    > over a long period of time is not a job DCL is suited for.
    > That said, I suppose it would look like this:
    >
    > $ limit = 300 ! hundredths of a second
    > $ OUTLOOP:
    > $     context = ""
    > $ PIDLOOP:
    > $     runpid = f$pid(context)
    > $     if runpid .nes. ""
    > $     then
    > $         curcpu = f$getjpi(runpid, "CPUTIM")
    > $         if f$type(array_'runpid'_cpu) .eqs. ""
    > $         then
    > $             ! first time this process has been seen, nothing to compare
    > $         else
    > $             delta = curcpu - array_'runpid'_cpu
    > $             if delta .gt. limit then -
    >                   write sys$output "process ''runpid' has exceeded the CPU threshold"
    > $         endif
    > $         array_'runpid'_cpu = curcpu
    > $         goto PIDLOOP
    > $     endif
    > $     wait 00:10:00
    > $     goto OUTLOOP
    >
    > Notice the problems? For example, what happens when
    > a pid is re-used? How meaningful is it that a process uses
    > 3 CPU seconds in 10 realtime seconds? How many intervals
    > do you want to track before deciding a process is looping?
    >
    > ok
    > dpm


    Yes, I see the potential problems, but this will really only check
    every 5 minutes or so and compare the elapsed interval (5 min) with
    the CPU time of a process, so if I check every 5 minutes and a process
    has used almost 5 minutes of CPU time, it is a potential looper. Just
    a ballpark. Yes, I know that there will be plenty of times when the
    system has processes that utilize more CPU, but this is just to catch
    any problems. We had a problem the past few days where a user was
    connected via PuTTY over SSH, and he just disconnected his VPN
    connection without logging out, which sent the lingering SSH process
    looping away at 100% CPU. A bug in the SSH client for VMS (TCPIP v5.4
    eco 6, VMS v7.3-2), I would think. Anyway, the powers that be are
    looking for ways to prevent or detect this so it doesn't happen in
    the future. I don't want to be looking at a Monitor window all day
    long. I have caught more than a few SSH BG devices looping in the
    past week. I'll call HP and see if they have had this problem
    reported.
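
    As a worked example of that ballpark: F$GETJPI reports CPUTIM in
    10-millisecond ticks, so a 5-minute interval is 30000 ticks of
    wall-clock time. Taking "almost 5 minutes" to mean 90% (an assumed
    figure, not one from this thread) gives a limit of 27000 ticks. A
    small DCL sketch of the arithmetic, with made-up symbol names:

    ```dcl
    $ interval_secs = 300                  ! 5-minute sampling interval
    $ percent = 90                         ! assumed meaning of "almost"
    $ limit = interval_secs * 100 * percent / 100   ! 100 ticks per second
    $ write sys$output "Flag any process whose CPUTIM delta exceeds ''limit' ticks"
    ```

    The multiplication is done before the division because DCL arithmetic
    is integer-only.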

    There used to be an old DEC product called System Watchdog that did a
    nice job with this, but I think it got swallowed up by Computer
    Associates and their outrageous Unicenter family.

    thanks.

  6. Re: CPU Looper Monitor Script

    tcarterhb@gmail.com writes:
    >
    >
    >On Dec 26, 1:16{equal}A0pm, David_Mur...@murphyfamily.org wrote:
    >> I can't imagine writing this in DCL without suffering
    >> memory leaks. {equal}A0In fact I have a hard time imagining
    >> writing it in DCL at all, due to the limited array capabilities.
    >> I've written performance monitoring software for years,
    >> and trying to capture the running history of 'n' processes
    >> over a long period of time is not a job DCL is suited for.
    >> That said, I suppose it would look like this:
    >>
    >> $ {equal}A0 {equal}A0 limit {equal}3D 300 {equal}A0 {equal}A0! hundredths of a second
    >> $ OUTLOOP:
    >> $ {equal}A0 {equal}A0 {equal}A0 context {equal}3D ""
    >> $ PIDLOOP:
    >> $ {equal}A0 {equal}A0 {equal}A0 runpid {equal}3D f$pid(context)
    >> $ {equal}A0 {equal}A0 {equal}A0 if runpid .nes. ""
    >> $ {equal}A0 {equal}A0 {equal}A0 then
    >> $ {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 curcpu {equal}3D f$getjpi(runpid, "CPUTIM")
    >> $ {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 if f$type(array_'runpid'_cpu) .eqs. ""
    >> $ {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 then
    >> $ {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 ! first time this process ha{equal}

    >s been seen, nothing to compare
    >> $ {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 else
    >> $ {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 delta {equal}3D curcpu - array_'ru{equal}

    >npid'_cpu
    >> $ {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 if delta .gt. limit then -
    >> {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 write sys${equal}

    >output "process ''runpid' has exceeded the CPU
    >> threshold"
    >> $ {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 endif
    >> $ {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 array_'runpid'_cpu {equal}3D curcpu
    >> $ {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 {equal}A0 goto PIDLOOP
    >> $ {equal}A0 {equal}A0 {equal}A0 endif
    >> $ {equal}A0 {equal}A0 {equal}A0 wait 00:10:00
    >> $ {equal}A0 {equal}A0 {equal}A0 goto OUTLOOP
    >>
    >> Notice the problems? {equal}A0For example, what happens when
    >> a pid is re-used? {equal}A0How meaningful is it that a process uses
    >> 3 CPU seconds in 10 realtime seconds? {equal}A0How many intervals
    >> do you want to track before deciding a process is looping?
    >>
    >> ok
    >> dpm

    >
    >Yes, I see the potential problems, but this will really only check


    I see another problem because your news software sends quoted-printable.

    I've substituted every = with {equal} so you can see how UGLY this is for
    those of us who have news readers which send out and read plain text as
    usenet was intended. I'd suspect your reader would translate the = and
    hex code such that you couldn't see the problem if I didn't use {equal}.

    David's initial post was fine. It's your news reader that quoted David
    and horked up his code.



    >every 5 minutes or so, and compare the delta (5 min) with the cpu
    >time of a process, so if I check every 5 minutes, and a process has
    >almost 5 minutes of CPU time, it is a potential looper. Just a
    >ballpark. Yes , I know that there will be plenty of times when the
    >system has processes that utilize more cpu, but this is just to catch
    >any problems. We had a problem the past few days where a user was
    >connected via Putty and SSH protocol,and he just disconnected his vpn
    >connection without logging out, and it sent the lingering SSH process
    >looping away, 100% cpu. Bug in SSH client for VMS (TCPIP v5.4 eco
    >6, vms v7.3-2) I would think. Anyway, The powers that be are
    >looking for ways to prevent, or detect this so it doesn't happen in
    >the future. I don;t want to be looking at a Monitor window all day
    >long. I have caught more than a few SSH BG devices looping in the
    >past week. I'll call HP and see if they have had this problem
    >reported.
    >
    >There used to be an old DEC product called System Watchdog that did a
    >nice job with this, but I think it got swallowed up by Computer
    >Associates and their outrageous Unicenter family.


    Yup. That's what Google shows.

    --
    VAXman- A Bored Certified VMS Kernel Mode Hacker VAXman(at)TMESIS(dot)COM

    "Well my son, life is like a beanstalk, isn't it?"

    http://tmesis.com/drat.html

  7. Re: CPU Looper Monitor Script

    tcarterhb@gmail.com wrote:
    >
    > Hi,
    >
    > I need to write a script that will check for looping processes on a
    > VMS system. I have no problem writing it, but was just checking
    > whether anyone has something they could post that has already been
    > written to save me the time. Basically just something that would
    > check all process over an interval and identify those with high cpu
    > utilization.
    >
    > thanks


    Take a look at this:
    http://www.djesys.com/freeware/vms/sys_mon.zip

    This is a VMS .ZIP archive containing a piece of DCL code I'm currently
    running at work. Admittedly, it's rather a "q and d", and as such, needs
    to be stopped and restarted daily due to the "memory leak" issues noted
    by other posters.

    To make it "release quality", you'd want to add a list of PIDs being
    watched to see if any have gone away, and then clean the associated
    symbols out of the process environment. All that before scanning using
    F$CONTEXT() and F$PID() to find new processes or gather stats on
    existing processes for comparison to the previous samples.
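
    A rough sketch of that clean-up step, written for this thread rather
    than taken from SYS_MON: build a list of the PIDs that exist now,
    then delete the stored symbol for every PID that was in the previous
    list but is no longer live. The watch_list and live_list names are
    invented here, and the array_<pid>_cpu naming is borrowed from the
    DCL posted earlier in the thread. It is meant to be pasted into the
    monitoring loop itself, since DELETE/SYMBOL only removes local
    symbols at the current procedure level.

    ```dcl
    $! Prune CPU symbols for processes that have gone away (sketch only).
    $ if f$type(watch_list) .eqs. "" then watch_list = ","
    $ live_list = ","
    $ ctx = ""
    $SCAN:
    $ pid = f$pid(ctx)
    $ if pid .eqs. "" then goto PRUNE
    $ live_list = live_list + pid + ","
    $ goto SCAN
    $PRUNE:
    $ i = 0
    $NEXT:
    $ old = f$element(i, ",", watch_list)
    $ if old .eqs. "," then goto DONE          ! ran off the end of the list
    $ i = i + 1
    $ if old .eqs. "" then goto NEXT           ! empty element at either end
    $ if f$locate("," + old + ",", live_list) .lt. f$length(live_list) then goto NEXT
    $ if f$type(array_'old'_cpu) .nes. "" then delete/symbol array_'old'_cpu
    $ goto NEXT
    $DONE:
    $ watch_list = live_list                   ! becomes the "previous pass" next time
    ```

    Both lists live in single DCL strings, so the maximum symbol length
    caps how many PIDs can be tracked this way; a busy system may need
    the list split across several symbols.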

    SYS_MON tracks CPU utilization and I/O counts to determine whether
    action should be taken. It might make a good starting point for some
    considerably more sophisticated DCL code. It is intended to run as a daemon
    (detached process), and is designed to issue notification by both e-mail
    and pager; however, the paging interface will likely need to be
    customized for any given site. Ours uses the former archwireless.com and
    sends pages using POST method via WGET (Thanx, SMS!).

    SYS_MON was written in haste to handle a problem with Cerner code which
    can get its undies in a bunch. Dunno if the same issue exists in the AIX
    version of that same Cerner code. ("Server 200".)

    David J Dachtera
    DJE Systems

  8. Re: CPU Looper Monitor Script

    On Dec 26, 1:55 pm, tcarte...@gmail.com wrote:
    > Hi,
    >
    > I need to write a script that will check for looping processes on a
    > VMS system. I have no problem writing it, but was just checking
    > whether anyone has something they could post that has already been
    > written to save me the time. Basically just something that would
    > check all process over an interval and identify those with high cpu
    > utilization.
    >
    > thanks


    As others have already pointed out, while this could be done in DCL it
    is better to choose a high level language. Here is a link to my free
    watchdog program which could be modified to suit your purposes.

    http://www3.sympatico.ca/n.rieck/dem...c-watchdog.zip

    A variation of this program has been in production for almost 20 years
    and it works. The key to this utility is to create an array at run-
    time which is no larger than sysgen parameter "MaxProcessCnt" (which
    can be read by calling SYS$GETSYI). You then do a wild-card GETJPI at
    some specified interval (60 seconds ?) taking note of PID numbers,
    user names, imagenames etc. If any of these change then the PID has
    been reassigned to a new process so you only need to record the new
    information then move on to the next PID. Otherwise, you gather CPU
    and I/O stats since the last pass and either store them or take some
    sort of action.
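
    The same bounded-table idea can be approximated in DCL by naming
    symbols after the process index instead of the PID, so the number of
    symbols cannot grow past MAXPROCESSCNT. This is a sketch for
    illustration, not code from the watchdog program: the slot_ prefix is
    made up, it assumes the PROC_INDEX item is available to F$GETJPI on
    your VMS version, and it has no handling for a process that exits
    between the F$PID and F$GETJPI calls.

    ```dcl
    $ maxproc = f$getsyi("MAXPROCESSCNT")      ! upper bound on slots in use
    $ write sys$output "Tracking at most ''maxproc' process slots"
    $ ctx = ""
    $LOOP:
    $ pid = f$pid(ctx)
    $ if pid .eqs. "" then goto DONE
    $ idx = f$getjpi(pid, "PROC_INDEX")        ! assumed item: process index
    $ cpu = f$getjpi(pid, "CPUTIM")
    $ if f$type(slot_'idx'_pid) .eqs. "" then goto STORE    ! slot never used
    $ if slot_'idx'_pid .nes. pid then goto STORE           ! slot has a new owner
    $ delta = cpu - slot_'idx'_cpu
    $ write sys$output "''pid' used ''delta' ticks since the last pass"
    $STORE:
    $ slot_'idx'_pid = pid
    $ slot_'idx'_cpu = cpu
    $ goto LOOP
    $DONE:
    ```

    Because a slot is overwritten when a new process takes it over, stale
    entries are recycled rather than accumulating.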

    Way back when, we had to determine whether someone was consuming too
    many BIOs, because people would actually jam a paper clip into a key
    on the VT220 keyboard in order to defeat the watchdog at coffee time.

    Ah those crazy users....

    Neil Rieck
    Kitchener/Waterloo/Cambridge,
    Ontario, Canada.
    http://www3.sympatico.ca/n.rieck/lin...l_openvms.html
    http://www3.sympatico.ca/n.rieck/lin...vms_demos.html



  9. Re: CPU Looper Monitor Script

    On Dec 27, 12:47 pm, Neil Rieck wrote:

    > The key to this utility is to create an array at run-time
    > which is no larger than sysgen parameter "MaxProcessCnt"


    The maximum value for that parameter is 8192 on this VAX 4000-700A
    running VMS version V6.2 . . . you might want to watch out for it
    being much larger on more recent releases, depending upon how much
    data is being stored per process.

    > You then do a wild-card GETJPI at
    > some specified interval (60 seconds ?) taking note of PID numbers,
    > user names, imagenames etc. If any of these change then the PID has
    > been reassigned to a new process


    I would think that simply comparing the current value of JPI$_LOGINTIM
    to the stored value would suffice. In fact, a different JPI$_IMAGENAM
    is certainly not an indication in itself that a new process owns that
    PID.
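
    In DCL terms, that check might look like the sketch below. It samples
    the current process only so that it runs on its own; in a real
    monitor the same lines would sit inside the per-PID loop. The
    array_<pid>_login symbol name is an assumption that extends the
    array_<pid>_cpu naming used earlier in the thread.

    ```dcl
    $ pid = f$getjpi("", "PID")                ! demo: sample this process
    $ curcpu = f$getjpi(pid, "CPUTIM")
    $ login = f$getjpi(pid, "LOGINTIM")        ! process creation time
    $ if f$type(array_'pid'_login) .eqs. "" then goto FRESH
    $ if array_'pid'_login .nes. login then goto FRESH
    $ delta = curcpu - array_'pid'_cpu
    $ write sys$output "''pid' used ''delta' ticks since the last sample"
    $ goto SAVE
    $FRESH:
    $ write sys$output "''pid' is new or the PID was reused; baseline reset"
    $SAVE:
    $ array_'pid'_login = login
    $ array_'pid'_cpu = curcpu
    ```

    Resetting the baseline on a changed login time avoids computing a
    meaningless delta against the previous owner of the PID.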

    ok
    dpm


  10. Re: CPU Looper Monitor Script

    David_Murphy@murphyfamily.org writes:
    >
    >
    >On Dec 27, 12:47 pm, Neil Rieck wrote:
    >
    >> The key to this utility is to create an array at run-time
    >> which is no larger than sysgen parameter "MaxProcessCnt"

    >
    >The maximum value for that parameter is 8192 on this VAX 400-700A
    >running VMS version V6.2 . . . you might want to watch out for it
    >being
    >much larger on more recent releases, depending upon how much data
    >is being stored per process.
    >
    >> You then do a wild-card GETJPI at
    >> some specified interval (60 seconds ?) taking note of PID numbers,
    >> user names, imagenames etc. If any of these change then the PID has
    >> been reassigned to a new process

    >
    >I would think that simply comparing the current value of JPI$_LOGINTIM
    >to the stored value would suffice. In fact, a different JPI$_IMAGENAM
    >is certainly not an indication in itself that a new process owns that
    >PID.


    The OP posted that he was using TCPIP Services V5.4 eco 6 on a VMS V7.3-2
    system. I've tried to replicate the looping SSH processes to no avail.
    I'd like to know where the process is spending its time looping. That'd,
    perhaps, help find a solution to the looping problem instead of spending
    more CPU cycles simply to identify such looping processes.

    --
    VAXman- A Bored Certified VMS Kernel Mode Hacker VAXman(at)TMESIS(dot)COM

    "Well my son, life is like a beanstalk, isn't it?"

    http://tmesis.com/drat.html

  11. Re: CPU Looper Monitor Script

    On Dec 27, 1:30 pm, David_Mur...@murphyfamily.org wrote:
    > On Dec 27, 12:47 pm, Neil Rieck wrote:
    >
    > > The key to this utility is to create an array at run-time
    > > which is no larger than sysgen parameter "MaxProcessCnt"

    >
    > The maximum value for that parameter is 8192 on this VAX 400-700A
    > running VMS version V6.2 . . . you might want to watch out for it
    > being
    > much larger on more recent releases, depending upon how much data
    > is being stored per process.
    >
    > > You then do a wild-card GETJPI at
    > > some specified interval (60 seconds ?) taking note of PID numbers,
    > > user names, imagenames etc. If any of these change then the PID has
    > > been reassigned to a new process

    >
    > I would think that simply comparing the current value of JPI$_LOGINTIM
    > to the stored value would suffice. In fact, a different JPI$_IMAGENAM
    > is certainly not an indication in itself that a new process owns that
    > PID.
    >
    > ok
    > dpm


    Although 8192 is available as a "maximum value" on your VAX, I've
    never seen it higher than 500 on any of my VAX or Alpha platforms. Of
    course it all depends on how you are using the machine AND who tuned
    it.

    Here is what you see on my dual-CPU AS-DS20e (OpenVMS-8.3):

    Parameter Name    Current  Default     Min.     Max.  Unit       Dynamic
    --------------    -------  -------  -------  -------  ----       -------
    MAXPROCESSCNT         400       32       12    32767  Processes
    BALSETCNT             344       30        8    32765  Processes  D

    Nine months ago I doubled these values when we started forcing people
    to log in via SSH. This meant every interactive user now required a
    minimum of two process slots.

    But you are correct. If someone had set MAXPROCESSCNT to a large
    number then my monitor program would need much more memory to keep
    track of all the processes. But then again these are virtual memory
    systems :-)

    Neil Rieck
    Kitchener/Waterloo/Cambridge,
    Ontario, Canada.
    http://www3.sympatico.ca/n.rieck/lin...l_openvms.html
    http://www3.sympatico.ca/n.rieck/lin...vms_demos.html


  12. Re: CPU Looper Monitor Script

    In article <5cdc7f43-93b9-4c0d-b6bb-5a4fb8b0f686@j64g2000hsj.googlegroups.com>, tcarterhb@gmail.com writes:
    > Hi,
    >
    > I need to write a script that will check for looping processes on a
    > VMS system. I have no problem writing it, but was just checking
    > whether anyone has something they could post that has already been
    > written to save me the time. Basically just something that would
    > check all process over an interval and identify those with high cpu
    > utilization.


    Instead of rolling your own, you may want to check out the CPUTIME
    value in the UAF. I've only had to use this once, but it was quite
    effective.
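
    For anyone who has not set it before, the limit is changed with the
    AUTHORIZE utility. The fragment below is a sketch: the username SMITH
    and the one-hour limit are made-up examples, the value is a delta
    time, and the procedure assumes you have the privilege to modify
    SYSUAF and that AUTHORIZE can find it (for example, by running from
    SYS$SYSTEM or with the SYSUAF logical name defined).

    ```dcl
    $ set default sys$system
    $ run authorize
    MODIFY SMITH /CPUTIME=0-01:00:00
    EXIT
    ```

    The limit is picked up by processes created after the change, and it
    caps legitimate long-running work as well as loopers, so it suits
    accounts whose sessions should never need much CPU.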


  13. Re: CPU Looper Monitor Script

    This topic was talked about on the ITRC forum.

    http://forums11.itrc.hp.com/service/...readId=1161911
