Re: Debugging : Hard Time.! ? - Unix


Thread: Re: Debugging : Hard Time.! ?

  1. Re: Debugging : Hard Time.! ?

    On 15 Mar, 19:40, Thomas Maier-Komor wrote:
    > Sheth Raxit wrote:
    >
    > > One more hard debugging session, one more crash, and I think I am stuck.
    > >
    > > Sun OS 10 - sparc
    > >
    > > Copying some info here.
    > >
    > > Please, I may not want an exact solution, but I want to know what
    > > approach experts take to debug this. (I can't attach gdb/dbx to the
    > > running process because of an administrative restriction.) I have
    > > the core and the binary (compiled with -g).
    > >
    > > dbx Mybin core :

    >
    > > For information about new features see `help changes'
    > > To remove this message, put `dbxenv suppress_startup_message 7.5' in
    > > your .dbxrc
    > > Reading
    > > core file header read successfully
    > > Reading ld.so.1
    > > Reading libcurses.so.1
    > > Reading libnsl.so.1
    > > Reading libsocket.so.1
    > > Reading libelf.so.1
    > > Reading libcrypt_i.so.1
    > > Reading libpthread.so.1
    > > Reading libdl.so.1
    > > Reading librt.so.1
    > > Reading libc.so.1
    > > Reading libgen.so.1
    > > Reading libaio.so.1
    > > Reading libmd5.so.1
    > > Reading libc_psr.so.1
    > > Reading liba
    > > Reading lib
    > > (dbx) where
    > > current thread: t@1
    > > [1] _adjtime(0x4, 0xffbff928, 0xffbff91c, 0x1, 0x0, 0x95e351), at
    > > 0xff14056c
    > > =>[2] Tcp_receive(port_no = 9900), line 259 in "b.c"
    > > [3] main(argc = 1, argv = 0xffbffa34), line 310 in "a.c"
    > > (dbx) print sockfd
    > > sockfd = 21

    > >
    > > Also, Tcp_receive calls some other functions, but I don't
    > > know why dbx is not showing them.
    > >
    > > _adjtime -- Google/dbx is not helpful here.
    > >
    > > On the same core:
    > > $bash pstack core
    > > ---some info about threads---
    > > ----------------- lwp# 2 / thread# 2 --------------------
    > > ff140478 ___sigtimedwait (c97e0, 0, 0, ff052400, 0, 0) + 8
    > > ff12c6a4 __posix_sigwait (c97e0, fef7bf8c, 11, 0, 0, 0) + 18
    > > 0003a5bc ???????? (c97e0, fef7bf8c, c9400, ff052400, ff16cbc0, 1)
    > > 0003d588 sig_thr (0, fef7c000, 0, 0, 3d530, 0) + 58
    > > ff13fff0 _lwp_start (0, 0, 0, 0, 0, 0)
    > > ---some info about threads---

    > >
    > > There is more thread info here, but I found the above entry doubtful. I
    > > am using pstack for the first time, so I may be wrong.
    > >
    > > --Raxit Sheth

    Tom/Others,

    Thanks for the suggestion. I still think the crash happens outside my
    function, and the arguments I am passing seem to be OK.

    > when loading a core into dbx it won't necessarily show the crashing
    > thread. So just issue a "threads" first to take a look at which threads
    > were running. Then you can use something like "thread t@2" to switch to
    > the thread you are interested in.

    Is this like "info threads" in gdb? (I think so. I need to re-read the
    dbx manual carefully!)

    > pstack will show you all threads and if you look for SIGSEGV, you will
    > also find the thread that caused the crash.

    pstack is not showing SIGSEGV, but dbx "threads" is showing something
    (partially copied below):



    (dbx)threads
    --some other threads info--
    o>t@262123 b l@262123 request_thread() signal SIGSEGV in
    _free_unlocked()
    --some more threads--


    (dbx) thread t@262123
    t@262123 (l@262123) stopped in _free_unlocked at 0xff0d4fcc
    0xff0d4fcc: _free_unlocked+0x00ac: add %i3, -128, %i0
    (dbx) where
    [1] _free_unlocked(0xfffffffc, 0x0, 0x932f4, 0xff16fad4, 0xff168284,
    0xfffffffc), at 0xff0d4fcc
    [2] _free_unlocked(0xfffffffc, 0xc9138, 0x93334, 0xff0a23f0,
    0xff168284, 0xc9138), at 0xff0d4f6c
    =>[3] fclose(0xffffffff, 0xc91b8, 0x0, 0xff124dfc, 0xff168284,
    0xff16c4d0), at 0xff124d94
    [4]
    [5]
    [6]

    I am calling fclose; _free_* is, I think, some internal function
    (in libc, I guess).



    pstack core
    (other pstack output deleted)

    ----------------- lwp# 262123 / thread# 262123 --------------------
    (complete output of thread 262123)
    ff0d4fcc _free_unlocked (fffffffc, 0, 932f4, ff16fad4, ff168284, fffffffc) + 4c
    ff0d4f6c free (fffffffc, c9138, 93334, ff0a23f0, ff168284, c9138) + 24
    ff124d94 fclose (ffffffff, c91b8, 0, ff124dfc, ff168284, ff16c4d0) + dc
    0005d92c My_func2 (fdd7b4e4, fdd7b932, fdd7a7c9, fdd7b967, fdd7a4d4, 1010101) + 5c8
    00027964 My_func1 (18, fdd7b1d8, fdd7a9d8, fdd7b1d8, fdd7a9d8, 2735c) + 608
    0001b0f0 request_thread (95e338, fdd7c000, 0, 0, 1ab18, 0) + 5d8
    ff13fff0 _lwp_start (0, 0, 0, 0, 0, 0)
    -----



    > HTH,
    > Tom

    I (always) think my code is OK!

    Any help on this?

    Thanks,
    --Raxit


  2. Re: Debugging : Hard Time.! ?


    If it is crashing in fclose, I am almost certain you have heap
    corruption going on. Try debugging with libumem enabled. Read 'man
    umem_debug' to find out how it works.

    Basically you start by executing your program (a.out) with
    $ env LD_PRELOAD=libumem.so UMEM_DEBUG=default \
        UMEM_LOGGING=transaction=1M,contents=1M,fail=1M ./a.out

    After the crash has occurred, load the core into mdb:
    $ mdb core
    Then use the ::uma* dcmds (for example ::umalog and ::umausers), along
    with ::umem_status and ::umem_verify, to debug the memory situation. If
    you are lucky, libumem will trigger the core dump and point you at the
    location where the corruption is going on.

    There are also some docs on this that you should probably read:
    http://access1.sun.com/techarticles/libumem.html

    Additionally, there is an interesting blog entry concerning libumem
    debugging:
    http://blogs.sun.com/ahl/date/200407...s_10_top_11_20

    HTH,
    Thomas

  3. Re: Debugging : Hard Time.! ?




    I found that I am calling fclose(fp) twice! (Some condition became
    true, so it is being called a second time.)

    After finding this, I wrote a simple program that opens a file and
    tries to close it (using fclose) twice or three times. It does not
    segfault on the same OS! (It is undefined behaviour, as I learned from
    a comp.std.c discussion.)

    I have a few questions.

    1.
    fclose (ffffffff, c91b8, 0, ff124dfc, ff168284, ff16c4d0) <--- I
    think the prototype is fclose(FILE *fp), so what are the other args?


    2.

    (dbx) print *fp
    *fp = {
    _cnt = 0
    _ptr = (nil)
    _base = (nil)
    _flag = ''
    _file = '\030'
    __orientation = 0
    __ionolock = 0
    __seekable = 1U
    __filler = 0
    }

    This is after the first fclose(fp). (I can't say it is the state after
    the second fclose(fp), because the second fclose dumps core and may
    have updated some values internally.)

    Is this defined/correct behaviour?




    --Raxit


  4. Re: Debugging : Hard Time.! ?


    "Sheth Raxit" wrote in message
    news:1174474885.954563.135380@n76g2000hsh.googlegroups.com...

    >I found that I am calling fclose(fp) twice! (Some condition became
    >true, so it is being called a second time.)


    >After finding this, I wrote a simple program that opens a file and
    >tries to close it (using fclose) twice or three times. It does not
    >segfault on the same OS! (It is undefined behaviour, as I learned from
    >a comp.std.c discussion.)


    Yes, this is correct. I would guess that on the second call the library
    is just calling free on some internal structure it has already freed,
    which is of course undefined. A good habit to get into is to set any
    pointers you have to NULL after you have finished using them, so that if
    you try to access them again you will know immediately (by crashing),
    instead of later when your program may have got into an inconsistent
    state.

    e.g.
    FILE* fp = fopen(..., "r");
    // read from file etc...
    fclose(fp);
    fp = NULL;


    >I have a few questions.
    >
    >1.
    > fclose (ffffffff, c91b8, 0, ff124dfc, ff168284, ff16c4d0) <--- I
    >think the prototype is fclose(FILE *fp), so what are the other args?


    When a function has no debug information, dbx will show the contents
    of the first 6 argument registers for that call (on SPARC, %i0 to %i5),
    whether or not the function really takes six arguments.
    So in other words those values could be anything. I would guess only
    one value is actually the value you want, but I'm not sure which. If I
    had to guess, probably "c91b8", unless you set the file pointer to NULL
    after you closed it the first time?

    >
    >2.
    >
    >(dbx) print *fp
    >*fp = {
    > _cnt = 0
    > _ptr = (nil)
    > _base = (nil)
    > _flag = ''
    > _file = '\030'
    > __orientation = 0
    > __ionolock = 0
    > __seekable = 1U
    > __filler = 0
    >}
    >
    >This is after the first fclose(fp). (I can't say it is the state after
    >the second fclose(fp), because the second fclose dumps core and may
    >have updated some values internally.)
    >
    >Is this defined/correct behaviour?


    The contents of a FILE structure are (I'm pretty sure) implementation
    dependent.
    So the fields inside it and the "normal" values for these will vary
    depending on your operating system and platform.
    Unless you have a reason to believe fclose is not closing the file
    correctly the *first* time you call it, I don't see any reason to
    look at this data. And to be honest, if there were ever a bug in that
    code, a lot of people would be very unhappy, so it is likely a patch
    would be released as fast as possible.

    HTH
    Mark


