HP-UX 11.11 overwriting core file(s) - How to prevent? - HP UX


  1. HP-UX 11.11 overwriting core file(s) - How to prevent?

    I reported a core file from our product but it turns out there are 2
    core dumps. The second (which I reported) is overwriting the /core
    file. Rather than rename an existing core file before writing another
    one, or simply using a unique name when creating the core file, HP-UX
    instead just overwrites an existing file named "core" with the new
    one. The real problem was the one recorded in the first core file,
    which got stepped on by the second.

    There are several daemons running for the product which could produce
    core files. If more than one of them cores, they should get different
    filenames in which to save the dump. On Solaris, the core file gets
    named "core.." so they don't step on each other.
    After each test, we check for core files in several places and save
    them elsewhere. Then the core files are deleted so any new ones found
    after the next test are known to have been generated during that test.

    We wrote a script that scans for core files and will append a timestamp
    string onto the file name. So, for example, if "/core" is found then
    it gets renamed to "/core.,". It scans every 2 seconds (so
    the script doesn't consume the CPU). It seemed to work when I tested it,
    but it is possible that another core file gets produced within the
    2-second window (or a 1-second window if we had the script wait only
    that long). It's not yet a startup script so we just run "nohup
    &" to put it in background and allow exiting the telnet
    session.

    Seems there should be a more elegant scheme of saving core files than
    having them all use the same name and overwrite the last one. Is there
    a better way?


  2. Re: HP-UX 11.11 overwriting core file(s) - How to prevent?

    "Vanguard" writes:

    > I reported a core file from our product but it turns out there are 2
    > core dumps. The second (which I reported) is overwriting the /core
    > file.


    This is how all UNIX machines used to work.

    If you don't want that, create a unique directory for each daemon
    and chdir() into it.

    > On Solaris, the core file gets named "core.."


    Does not. On newer Solaris machines, the core filename can be configured
    (as you described above, or in many other ways; "man coreadm").

    > Seems there should be a more elegant scheme of saving core files than
    > having them all use the same name and overwrite the last one. Is there
    > a better way?


    See above for one possible (and portable) solution.

    Another solution (writing core into core.pid) can be found in this thread:
    http://forums1.itrc.hp.com/service/f...hreadId=898593

    I'll repeat it here (in case the above page goes away):

    By A. Clay Stephenson Jun 6, 2005 17:36:25 GMT

    The only option for HP-UX is to optionally append the PID to "core"
    but the file is always written to the current working directory.

    echo " core_addpid/W1" | adb -w /stand/vmunix /dev/kmem

    This will ONLY change the memory image of the running kernel
    and leave the object file, /stand/vmunix, untouched. It's a safe
    command and this is a common practice for changing kernel values
    that otherwise can't be modified. You could also force the write
    to the object file, but I prefer to change only the image in
    /dev/kmem. If you want the change to be "permanent", then rather
    than writing the object file I prefer to set up a startup script
    in /sbin/init.d that reruns the command at boot.

    Cheers,
    --
    In order to understand recursion you must first understand recursion.
    Remove /-nsp/ for email.

  3. Re: HP-UX 11.11 overwriting core file(s) - How to prevent?

    "Paul Pluzhnikov" wrote in message
    news:m3slh369g5.fsf@somewhere.in.california.localhost...
    > "Vanguard" writes:
    >
    >> I reported a core file from our product but it turns out there are 2
    >> core dumps. The second (which I reported) is overwriting the /core
    >> file.

    >
    > This is how all UNIX machines used to work.
    >
    > If you don't want that, create a unique directory for each daemon
    > and chdir() into it.


    Unfortunately for QA, I have to test the product as it is packaged.
    Also, many of the scripts have relative paths so it wouldn't work for
    me to rearrange the hierarchy of the directories. I'll have to look
    at the start script used to load all the daemons. Even if I changed
    it to cd to the and then to the bin, admin, or goodies
    subdirectories before loading each daemon, several reside in the same
    directory. They'd still step on each other's core file.

    Just to be sure, is the core file put in the current working directory
    at the time it gets produced? Or does it get saved in whatever was
    the current directory at the time the daemon was loaded? If it is the
    current directory at the time the daemon got loaded then maybe I can
    modify the startup script for the daemons to make and change to a
    subdirectory under /logs/ and then load the
    daemon. Then the core file would get put into the log subdirectory by
    that daemon's name. Of course, this won't help when the product or
    test scripts force a reload of the failed daemon for the next test in
    the long testlist and there is another core dump by the same daemon
    which would end up overwriting the previous one produced earlier by
    the same daemon. A separate log subdirectory for each daemon would
    stop daemons from stepping on each other's core files, but it would
    not stop a daemon from stepping on its own core file. The
    enterprise product will attempt to recover (i.e., daemons watch each
    other) and will restart a failed and required daemon but if it fails
    again then the previous core file gets stepped on.

    >> On Solaris, the core file gets named "core.."

    >
    > Does not. On newer Solaris machines, the core filename can be configured
    > (as you described above, or in many other ways; "man coreadm").


    On the 3 Solaris boxes that I get to use, after a test has completed
    and if a core file was produced, the filename is as I mentioned. I
    suppose it is possible that someone defined a rename script like we
    are now trying on HP-UX but I doubt it since it is not described in
    the procedures when setting up the host after reimaging it (i.e., if
    such a script is being used, it won't be after we reimage the host).
    I've looked in the Perl scripts that we used for the automated testing
    and haven't found that they do the rename.

    From what the developer said, and also from what I've Googled, there
    is no coreadm on HP-UX. I did see mention of savecrash and some
    config file (forgot its path but didn't see anything in the config
    file that would dictate the filenaming scheme for core files). I did
    try scanning the man page for coreadm just for background but got
    interrupted, so I didn't see how a scheme is specified for the
    filenaming of core files. I'll have to check tomorrow if I get time.


  4. Re: HP-UX 11.11 overwriting core file(s) - How to prevent?

    "Vanguard" writes:

    >> If you don't want that, create a unique directory for each daemon
    >> and chdir() into it.

    >
    > Unfortunately for QA, I have to test the product as it is
    > packaged. Also, many of the scripts have relative paths so it wouldn't


    What I meant is: have each daemon create a directory for itself:

    char buf[1024];
    sprintf(buf, "/tmp/%s.%d", DAEMON_NAME, getpid());
    if (-1 == mkdir(buf, 0700)) { ... error handling ... }
    if (-1 == chdir(buf)) { ... error handling ... }

    Do this after the daemon has read all of its config files, but
    before it starts "servicing requests".
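
    A slightly fuller version of the snippet above, as a minimal sketch
    only. The function name enter_core_dir and its two parameters are
    made up for illustration; the snippet above hard-codes /tmp and
    DAEMON_NAME instead. It uses snprintf() to bound the path length and
    tolerates a directory that already exists (for example, left behind
    by an earlier process with the same PID):

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a private directory "<base>/<name>.<pid>" and make it the
 * current working directory, so that a later core dump lands there.
 * Returns 0 on success, -1 on failure (errno describes the failure).
 * "base" and "name" are illustrative parameters, not from the thread.
 */
int enter_core_dir(const char *base, const char *name)
{
    char buf[1024];
    int n = snprintf(buf, sizeof buf, "%s/%s.%ld", base, name, (long)getpid());

    /* Reject an encoding error or a path that did not fit in buf. */
    if (n < 0 || (size_t)n >= sizeof buf) {
        errno = ENAMETOOLONG;
        return -1;
    }
    /* A directory left over from an earlier run is acceptable. */
    if (mkdir(buf, 0700) == -1 && errno != EEXIST)
        return -1;
    if (chdir(buf) == -1)
        return -1;
    return 0;
}
```

    A daemon would call something like enter_core_dir("/tmp", DAEMON_NAME)
    once, after reading its config files. Any relative paths the daemon
    opens afterwards resolve against the new directory, which is why the
    call belongs after configuration is loaded.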

    > Just to be sure, is the core file put in the current working directory
    > at the time it gets produced?


    Yes.

    >>> On Solaris, the core file gets named "core.."

    >>
    >> Does not. On newer Solaris machines, the core filename can be configured
    >> (as you described above, or in many other ways; "man coreadm").

    >
    > On the 3 Solaris boxes that I get to use, after a test has completed
    > and if a core file was produced, the filename is as I mentioned.


    Because *default* Solaris coreadm.conf does that.

    But if you count on that, your "enterprise" system will spectacularly
    fail to find cores on a machine that was reconfigured to save core
    files elsewhere.

    > From what the developer said, and also from what I've Googled, there
    > is no coreadm on HP-UX. I did see mention of savecrash and some
    > config file


    savecrash is *not* what you are interested in.
    It is used to save crash data when the system itself panics.
    It has nothing to do with user-level core files.

    Cheers,
    --
    In order to understand recursion you must first understand recursion.
    Remove /-nsp/ for email.

  5. Re: HP-UX 11.11 overwriting core file(s) - How to prevent?


    Paul Pluzhnikov wrote:
    > "Vanguard" writes:
    >
    > >> If you don't want that, create a unique directory for each daemon
    > >> and chdir() into it.

    > >
    > > Unfortunately for QA, I have to test the product as it is
    > > packaged. Also, many of the scripts have relative paths so it wouldn't

    >
    > What I meant is: have each daemon create a directory for itself:
    >
    > char buf[1024];
    > sprintf(buf, "/tmp/%s.%d", DAEMON_NAME, getpid());
    > if (-1 == mkdir(buf, 0700)) { ... error handling ... }
    > if (-1 == chdir(buf)) { ... error handling ... }
    >
    > Do this after the daemon has read all of its config files, but
    > before it starts "servicing requests".


    I'll pass this on to the developers so they can add code to each daemon
    that controls where its core file is saved and how it is named. They already
    have an /log directory so they could add
    /log/ for holding the core file(s) for each
    daemon. Thanks.
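
    For the fixed per-daemon log directory idea, a restarted daemon would
    still write "core" over its own earlier dump. One possible way around
    that, sketched here under assumptions, is to have the daemon move any
    existing "core" aside at startup. The function name prepare_core_dir,
    the "dir" parameter, and the "core.<time>.<pid>" naming are all
    hypothetical, not something the product or HP-UX provides:

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Make "dir" the working directory (creating its last path component
 * if needed). If a file named "core" is already there, rename it to
 * "core.<unix-time>.<pid>" so the next dump cannot overwrite it.
 * Returns 0 on success, -1 on failure.
 */
int prepare_core_dir(const char *dir)
{
    char saved[64];

    if (mkdir(dir, 0700) == -1 && errno != EEXIST)
        return -1;
    if (chdir(dir) == -1)
        return -1;
    /* No earlier core file: nothing to preserve. */
    if (access("core", F_OK) == -1)
        return errno == ENOENT ? 0 : -1;
    /* The PID here is that of the restarted daemon doing the rename. */
    snprintf(saved, sizeof saved, "core.%ld.%ld",
             (long)time(NULL), (long)getpid());
    return rename("core", saved);
}
```

    This only protects dumps across restarts of the same daemon. Two
    daemons sharing one directory could still overwrite each other
    between restarts, so each daemon would need its own "dir".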

