What causes a STKOVF - VMS


Thread: What causes a STKOVF

  1. What causes a STKOVF

    The following involves OpenVMS V8.3A with various ECOs applied. It
    happens in a DECwindows app. I'd have to find out hardware & firmware
    details, if necessary.

    Okay, I've read the OpenVMS FAQ and the pertinent portions of Hoff's
    Wizard stuff on stack overflow exceptions, but most of them simply say
    "get a reproducer and talk with the support center."

    First off, I don't know how to reproduce the particular stack overflow
    we're seeing (it's in a large multiprocess system with external inputs
    out the wazoo); I'm not even sure I know if it was operator actions
    that caused it or the external inputs. The code for just the one
    process that got the SS$_STKOVF exception is huge.

    What I need is an idea of when the RTL or lower layers detect a "stack
    overflow" in a non-threaded situation (though it's a DECwindows app,
    just in case it's multithreading for some reason), so I might get some
    idea of where to look. (The place where the exception happened didn't
    reveal too much.)

    It sounds like the RTL actually pre-checks for a case where pushing n
    bytes onto the stack would cause the stack to overflow the thread's
    stated stack size (in a multithreaded app), and signal SS$_STKOVF
    before it goes off the deep end. What does it do for a single threaded
    app?

  2. Re: What causes a STKOVF

    Joe,

    Can you create a process dump? Use SET PROC/DUMP before running the
    image, or RUN/DUMP for a detached process. You then have the complete
    process address space (including the stack) available for analysis
    with ANAL/PROC.

    Volker.

  3. Re: What causes a STKOVF

    On Dec 7, 10:36 am, Joe Sewell wrote:
    > The following involves OpenVMS V8.3A with various ECOs applied. It
    > happens in a DECwindows app. I'd have to find out hardware & firmware
    > details, if necessary.
    >
    > Okay, I've read the OpenVMS FAQ and the pertinent portions of Hoff's
    > Wizard stuff on stack overflow exceptions, but most of them simply say
    > "get a reproducer and talk with the support center."
    >
    > First off, I don't know how to reproduce the particular stack overflow
    > we're seeing (it's in a large multiprocess system with external inputs
    > out the wazoo); I'm not even sure I know if it was operator actions
    > that caused it or the external inputs. The code for just the one
    > process that got the SS$_STKOVF exception is huge.
    >
    > What I need is an idea of when the RTL or lower layers detect a "stack
    > overflow" in a non-threaded situation (though it's a DECwindows app,
    > just in case it's multithreading for some reason), so I might get some
    > idea of where to look. (The place where the exception happened didn't
    > reveal too much.)
    >
    > It sounds like the RTL actually pre-checks for a case where pushing n
    > bytes onto the stack would cause the stack to overflow the thread's
    > stated stack size (in a multithreaded app), and signal SS$_STKOVF
    > before it goes off the deep end. What does it do for a single threaded
    > app?



    Joe,

    First, let me welcome you to posting in COMP.OS.VMS.

    The actual details of the stack overflow handling are likely (I do not
    have one of my copies handy) in the Internals and Data Structures
    manual. The gross details of this have not changed in a VERY long
    time.

    I would also be concerned about the possibility that someone has
    overwritten a saved stack pointer in a call frame, and as a result the
    stack is effectively corrupt when the RETURN is executed. These can be
    devilishly difficult to localize (been there, done that).

    There are a variety of strategies that can be used to localize this
    type of problem. Which is appropriate depends on many factors. The
    most central question is: What (if any) tracking/debugging code is
    already present in your application that can help reduce the size of
    the search.

    - Bob Gezelter, http://www.rlgsc.com


  4. Re: What causes a STKOVF

    In article <854de914-e03c-4f8b-bf33-dd0df67afae5@a39g2000pre.googlegroups.com>, Bob Gezelter writes:
    >
    > I would also be concerned about the possibility that someone has
    > overwritten a saved stack pointer in a call frame, and as a result the
    > stack is effectively corrupt when the RETURN is executed. These can be
    > devilishly difficult to localize (been there, done that).


    One of the first problems I had to debug on my first Alpha was a
    return to 0. I'd never seen one on a VAX and it didn't occur to me
    that a program running on VMS could do such a stupid thing until I
    saw it. What I got first was a last-chance exception handler dump
    of registers that didn't point anywhere useful. (At that point the
    process seems to have no stack, so no traceback handler).

    A process dump in that case didn't tell me anything I didn't already
    know; the return to 0 pretty much wiped out pointers to everything
    useful.

    I had to run with the debugger many times, doing a binary search for
    the line of code that caused the error, and then study the machine
    listing to figure out what was going on. (Reading through compiler
    generated prolog code is such fun! Made me really miss CALLx/RET!)



  5. Re: What causes a STKOVF

    On Dec 7, 11:34 am, Volker Halle wrote:
    > Joe,
    >
    > can you create a process dump ? SET PROC/DUMP before running the image
    > or use RUN/DUMP for a detached process. Then you have the complete
    > process address space (including the stack) available for analysis
    > with ANAL/PROC.
    >
    > Volker.


    In this case, we do have process dumps (an unusual occurrence). I
    haven't seen anything that leaps out at me yet.

  6. Re: What causes a STKOVF

    On Dec 7, 3:03 pm, Bob Gezelter wrote:
    > On Dec 7, 10:36 am, Joe Sewell wrote:
    >
    > > The following involves OpenVMS V8.3A with various ECOs applied. It
    > > happens in a DECwindows app. I'd have to find out hardware & firmware
    > > details, if necessary.

    >
    > > Okay, I've read the OpenVMS FAQ and the pertinent portions of Hoff's
    > > Wizard stuff on stack overflow exceptions, but most of them simply say
    > > "get a reproducer and talk with the support center."

    >
    > > First off, I don't know how to reproduce the particular stack overflow
    > > we're seeing (it's in a large multiprocess system with external inputs
    > > out the wazoo); I'm not even sure I know if it was operator actions
    > > that caused it or the external inputs. The code for just the one
    > > process that got the SS$_STKOVF exception is huge.

    >
    > > What I need is an idea of when the RTL or lower layers detect a "stack
    > > overflow" in a non-threaded situation (though it's a DECwindows app,
    > > just in case it's multithreading for some reason), so I might get some
    > > idea of where to look. (The place where the exception happened didn't
    > > reveal too much.)

    >
    > > It sounds like the RTL actually pre-checks for a case where pushing n
    > > bytes onto the stack would cause the stack to overflow the thread's
    > > stated stack size (in a multithreaded app), and signal SS$_STKOVF
    > > before it goes off the deep end. What does it do for a single threaded
    > > app?

    >
    > Joe,
    >
    > First, let me welcome you to posting in COMP.OS.VMS.
    >
    > The actual details of the stack overflow handling are likely (I do not
    > have one of my copies handy) in the Internals and Data Structures
    > manual. The gross details of this have not changed in a VERY long
    > time.
    >
    > I would also be concerned about the possibility that someone has
    > overwritten a saved stack pointer in a call frame, and as a result the
    > stack is effectively corrupt when the RETURN is executed. These can be
    > devilishly difficult to localize (been there, done that).
    >
    > There are a variety of strategies that can be used to localize this
    > type of problem. Which is appropriate depends on many factors. The
    > most central question is: What (if any) tracking/debugging code is
    > already present in your application that can help reduce the size of
    > the search.
    >
    > - Bob Gezelter,http://www.rlgsc.com


    I've got a 5.5 version handy; wish I had thought of that sooner.
    Thanks.

    It's possible that something smashed the stack, but all the call
    frames look correct otherwise, something that I've found to be rare
    when the stack gets puked upon.

  7. Re: What causes a STKOVF

    On Dec 7, 4:43 pm, koeh...@eisner.nospam.encompasserve.org (Bob
    Koehler) wrote:
    > In article <854de914-e03c-4f8b-bf33-dd0df67af...@a39g2000pre.googlegroups.com>, Bob Gezelter writes:
    >
    >
    >
    > > I would also be concerned about the possibility that someone has
    > > overwritten a saved stack pointer in a call frame, and as a result the
    > > stack is effectively corrupt when the RETURN is executed. These can be
    > > devilishly difficult to localize (been there, done that).

    >
    > One of the first problems I had to debug on my first Alpha was a
    > return to 0. I'd never seen one on a VAX and it didn't occur to me
    > that a program running on VMS could do such a stupid thing until I
    > saw it. What I got first was a last-chance exception handler dump
    > of registers that didn't point anywhere useful. (At that point the
    > process seems to have no stack, so no traceback handler).
    >
    > A process dump in that case didn't tell me anything I didn't already
    > know; the return to 0 pretty much wiped out pointers to everything
    > useful.
    >
    > I had to run with the debugger many times, doing a binary search for
    > the line of code that caused the error, and then study the machine
    > listing to figure out what was going on. (Reading through compiler
    > generated prolog code is such fun! Made me really miss CALLx/RET!)


    Been there, done that. The problem is we cannot seem to reproduce this
    reliably; all I've got is the afore-mentioned process dump.

  8. Re: What causes a STKOVF

    Joe Sewell wrote:

    > What I need is an idea of when the RTL or lower layers detect a "stack
    > overflow" in a non-threaded situation (though it's a DECwindows app,
    > just in case it's multithreading for some reason), so I might get some
    > idea of where to look. (The place where the exception happened didn't
    > reveal too much.)
    >
    > It sounds like the RTL actually pre-checks for a case where pushing n
    > bytes onto the stack would cause the stack to overflow the thread's
    > stated stack size (in a multithreaded app), and signal SS$_STKOVF
    > before it goes off the deep end. What does it do for a single threaded
    > app?


    To answer your question: In a multi-threaded/multi-stacked application,
    each stack is a fixed size (no automatic expansion) and there are
    'yellow zones' to help DECthreads know when you are near the edge of the
    stack. The Calling Standard has lots of details on how stack checking
    is implemented by the compilers.

    In a traditional single-stack application, there is no yellow zone since
    there is automatic stack expansion. The stack will expand and expand
    until you run out of page file quota. You'll eventually end up with an
    ACCVIO I believe.

    --
    John Reagan
    OpenVMS Pascal/Macro-32/COBOL Project Leader
    Hewlett-Packard Company

  9. Re: What causes a STKOVF

    Joe,

    If you have process dumps available, that's a starting point.

    Start with DBG> SHOW CALL

    Which routine/module is the first (top-most) in the call chain?
    Is it always the same in all the dumps?
    What does DBG> EXA/INS tell you?

    Volker.

  10. Re: What causes a STKOVF

    Joe,

    You'll get a STKOVF (instead of just an ACCVIO) if the process is
    running a Thread Manager (like PTHREADs) or you're not running on the
    process's initial kernel thread. Use SDA> SHOW PROC/IMA to see
    whether PTHREAD$RTL is in the image list. I'll bet it is for a
    DECwindows image.

    Check the stack pointer SP with DBG> EX SP

    then examine the stack addresses and limits

    DBG> SDA
    SDA> EXA ctl$aq_stack;20
    SDA> EXA ctl$aq_stacklim;20

    SDA will show one quadword for each stack (offset 0 = kernel, then
    exec, super, user).

    Try to figure out whether the current SP is near the limits of the
    stack (or outside it).
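
    A minimal sketch of that comparison in C, assuming the stack grows
    downward from the base quadword shown for ctl$aq_stack toward the
    limit quadword shown for ctl$aq_stacklim. The function name, the enum,
    and the margin parameter are hypothetical; the three addresses would be
    copied by hand from the DBG and SDA output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result codes for the SP sanity check. */
typedef enum { SP_OUTSIDE, SP_NEAR_LIMIT, SP_OK } sp_status;

/*
 * Classify a stack pointer against a stack's base and limit.
 * Assumption: the stack grows downward, so limit <= sp <= base
 * for a healthy stack. "margin" is how many bytes above the
 * limit still count as "near the edge".
 */
sp_status classify_sp(uint64_t sp, uint64_t base, uint64_t limit,
                      uint64_t margin)
{
    assert(limit <= base);          /* caller mixed up the two values */

    if (sp < limit || sp > base)
        return SP_OUTSIDE;          /* SP is not inside this stack at all */
    if (sp - limit < margin)
        return SP_NEAR_LIMIT;       /* little room left below SP */
    return SP_OK;
}
```

    An SP that comes back as SP_OUTSIDE for the user-mode stack suggests
    either a genuine overflow or a corrupted saved SP; SP_NEAR_LIMIT points
    more toward deep recursion or large automatic variables.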

    Volker.

  11. Re: What causes a STKOVF

    John Reagan writes:
    >
    >
    >Joe Sewell wrote:
    >
    >> What I need is an idea of when the RTL or lower layers detect a "stack
    >> overflow" in a non-threaded situation (though it's a DECwindows app,
    >> just in case it's multithreading for some reason), so I might get some
    >> idea of where to look. (The place where the exception happened didn't
    >> reveal too much.)
    >>
    >> It sounds like the RTL actually pre-checks for a case where pushing n
    >> bytes onto the stack would cause the stack to overflow the thread's
    >> stated stack size (in a multithreaded app), and signal SS$_STKOVF
    >> before it goes off the deep end. What does it do for a single threaded
    >> app?

    >
    >To answer your question: In a multi-threaded/multi-stacked application,
    >each stack is a fixed size (no automatic expansion) and there are
    >'yellow zones' to help DECthreads know when you are near the edge of the
    >stack. The Calling Standard has lots of details on how stack checking
    >is implemented by the compilers.
    >
    >In a traditional single-stack application, there is no yellow zone since
    >there is automatic stack expansion. The stack will expand and expand
    >until you run out of page file quota. You'll eventually end up with an
    >ACCVIO I believe.


    I thought he said it wasn't threaded in the initial post. I came across
    many of these when working on a DECthreaded application. I set up a
    file of configuration parameters, one of which was a stack size value
    to pass along to pthread_attr_setstacksize().
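
    A rough sketch of that approach, assuming the configured value has
    already been read into a size_t. The helper names, the 8 KB page size,
    and the 16 KB fallback minimum are hypothetical; only the pthread_*
    calls and PTHREAD_STACK_MIN are standard POSIX threads interfaces.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stddef.h>

#define HYPOTHETICAL_PAGE_BYTES 8192u   /* assumed page size */

/* Clamp a requested stack size to a minimum and round it up to a
 * whole number of pages. */
size_t normalize_stack_size(size_t requested, size_t min_bytes,
                            size_t page_bytes)
{
    assert(page_bytes > 0);
    if (requested < min_bytes)
        requested = min_bytes;
    return (requested + page_bytes - 1) / page_bytes * page_bytes;
}

/* Trivial thread body, used only to exercise start_worker(). */
static void *noop_worker(void *arg)
{
    return arg;
}

/* Create a thread whose stack size comes from a configuration value.
 * Returns 0 on success or the error number from the failing call. */
int start_worker(pthread_t *tid, size_t configured_stack_bytes,
                 void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    size_t min_bytes = 16384;           /* assumed fallback minimum */
    int rc = pthread_attr_init(&attr);

    if (rc != 0)
        return rc;
#ifdef PTHREAD_STACK_MIN
    min_bytes = (size_t)PTHREAD_STACK_MIN;
#endif
    rc = pthread_attr_setstacksize(
        &attr,
        normalize_stack_size(configured_stack_bytes, min_bytes,
                             HYPOTHETICAL_PAGE_BYTES));
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);

    pthread_attr_destroy(&attr);        /* attr is no longer needed */
    return rc;
}
```

    Keeping the size in a configuration file means a thread that hits
    STKOVF can be given more room without rebuilding the image.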

    --
    VAXman- A Bored Certified VMS Kernel Mode Hacker VAXman(at)TMESIS(dot)COM

    "Well my son, life is like a beanstalk, isn't it?"

    http://tmesis.com/drat.html

  12. Re: What causes a STKOVF

    On Dec 10, 12:00 pm, Volker Halle wrote:
    > Joe,
    >
    > you'll get a STKOVF (instead of just an ACCVIO), if the process is
    > running a Thread Manager (like PTHREADs) or you're not running on the
    > process's initial kernel thread. Use SDA> SHOW PROC/IMA to see,
    > whether PTHREAD$RTL is in the image list. I'll bet it is for a
    > DECwindows image.
    >
    > Check the stack pointer SP with DBG> EX SP
    >
    > then examine the stack addresses and limits
    >
    > DBG> SDA
    > SDA> EXA ctl$aq_stack;20
    > SDA> EXA ctl$aq_stacklim;20
    >
    > SDA will show 1 quadword for each stack (offset 0=kernel, then exec,
    > super, user)
    >
    > Try to figure out, if the current SP is near the limits (or outside)
    > the stack.
    >
    > Volker.


    Thanks for the info; I'll do this.

    You say that you wouldn't be surprised if a DECwindows image is
    multithreaded. I cannot speak for what DECwindows itself is doing, but
    *we* aren't multithreading it. On the other hand, I *do* see
    PTHREAD$RTL high up in the call stack.

    Assuming DECwindows is instigating multithreading (or perhaps the
    X11R6 update -- that makes Xt "thread safe" -- does just enough to
    kick this in), then much is explained.
