threads suspend in posix_memalign - Linux

  1. threads suspend in posix_memalign

    I have a multithreaded app and I'm seeing a strange issue where
    (according to a gdb attached to the process) one of the threads will
    get into posix_memalign and never come out.

    (gdb) where
    #0 0x40e62f2d in posix_memalign () from /lib/tls/libc.so.6
    #1 0x40e60777 in mallopt () from /lib/tls/libc.so.6
    #2 0x40e5e4bf in malloc () from /lib/tls/libc.so.6
    #3 0x40d8448e in operator new () from /usr/lib/libstdc++.so.5
    #4 0x080818eb in Broadcast::createCommand (this=0x825a948,
    commandID=201, clientID=554934) at Broadcast.cpp:1062

    I've seen this several times. This thread doesn't return back up the
    chain and eventually my other threads stop because they need something
    from this one.

    Anybody seen this or have an idea what could cause it?

    Thanks!


  2. Re: threads suspend in posix_memalign

    On Jul 2, 2:07 pm, j...@riverstyx.net wrote:
    > I have a multithreaded app and I'm seeing a strange issue where
    > (according to a gdb attached to the process) one of the threads will
    > get into posix_memalign and never come out.
    >
    > (gdb) where
    > #0 0x40e62f2d in posix_memalign () from /lib/tls/libc.so.6
    > #1 0x40e60777 in mallopt () from /lib/tls/libc.so.6
    > #2 0x40e5e4bf in malloc () from /lib/tls/libc.so.6
    > #3 0x40d8448e in operator new () from /usr/lib/libstdc++.so.5
    > #4 0x080818eb in Broadcast::createCommand (this=0x825a948,
    > commandID=201, clientID=554934) at Broadcast.cpp:1062
    >
    > I've seen this several times. This thread doesn't return back up the
    > chain and eventually my other threads stop because they need something
    > from this one.
    >
    > Anybody seen this or have an idea what could cause it?
    >
    > Thanks!


    I have never seen this precise problem, but the short answer is that
    something is almost certainly wrong with your allocator. One thing
    that could do it is if you called 'malloc' from an improper context
    (such as a signal handler).
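
    A minimal sketch of one way to keep allocation out of a signal
    handler, assuming the handler's work can be deferred: the handler only
    sets a flag, and ordinary code checks the flag and allocates later.
    The names install_handler, handle_pending_signal, and got_signal are
    made up for illustration; they are not from the original program.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag: the only thing the handler touches. */
static volatile sig_atomic_t got_signal = 0;

/* Async-signal-safe handler: no malloc, no free, no stdio. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Install the handler for signo; returns 0 on success, -1 on error. */
int install_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Called from normal (non-handler) context, where malloc is allowed.
 * Returns a malloc'd message if a signal was pending, else NULL.
 * The caller frees the result. */
char *handle_pending_signal(void)
{
    if (!got_signal)
        return NULL;
    got_signal = 0;

    char *msg = malloc(32);
    if (msg != NULL)
        strcpy(msg, "signal handled");
    return msg;
}
```

    The main loop would call handle_pending_signal() at a convenient
    point. If a handler calls malloc directly, it can interrupt a malloc
    already in progress in the same thread and try to take an allocator
    lock that thread already holds, so the call never returns.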

    If you check all the threads running at the time, maybe you will find
    that there is always at least one thread that is in an obviously wrong
    situation. Check the stack for an allocator function that was
    interrupted somehow and where the interrupting code called an
    allocator function as well.

    It may help to link in a different allocator. Although it won't fix
    the problem, it may do a better job of reporting the problem to you.
    (If everything works fine with a different allocator, DO NOT assume
    the problem is not your problem. Most likely, the other allocator just
    tolerates the problem better.)

    DS


  3. Re: threads suspend in posix_memalign

    jeff@riverstyx.net writes:
    > I have a multithreaded app and I'm seeing a strange issue where
    > (according to a gdb attached to the process) one of the threads will
    > get into posix_memalign and never come out.
    >
    > (gdb) where
    > #0 0x40e62f2d in posix_memalign () from /lib/tls/libc.so.6
    > #1 0x40e60777 in mallopt () from /lib/tls/libc.so.6
    > #2 0x40e5e4bf in malloc () from /lib/tls/libc.so.6
    > #3 0x40d8448e in operator new () from /usr/lib/libstdc++.so.5
    > #4 0x080818eb in Broadcast::createCommand (this=0x825a948,
    > commandID=201, clientID=554934) at Broadcast.cpp:1062
    >
    > I've seen this several times. This thread doesn't return back up the
    > chain and eventually my other threads stop because they need something
    > from this one.
    >
    > Anybody seen this or have an idea what could cause it?


    The two most likely (IMO) guesses would be 'internal deadlock' and
    'looping because of heap corruption' (eg trying to find the end of a
    linked list which contains an item pointing to itself).
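
    To illustrate the second guess, here is a small sketch of how a
    self-referencing node turns a plain list walk into an endless loop.
    It is not glibc's allocator code. struct node and both functions are
    hypothetical stand-ins for an allocator's internal free list.

```c
#include <stddef.h>

/* Hypothetical stand-in for a free-list entry. */
struct node {
    struct node *next;
};

/* Walk the list looking for the terminating NULL. On a list whose
 * last node points to itself, an unbounded walk would never end, so
 * this version gives up after max_steps nodes and returns -1. */
long list_length_bounded(const struct node *head, long max_steps)
{
    long n = 0;
    for (const struct node *p = head; p != NULL; p = p->next) {
        if (n == max_steps)
            return -1;          /* looks like a cycle */
        n++;
    }
    return n;
}

/* Floyd's two-pointer check: returns 1 if the list loops back on
 * itself, 0 if it ends in NULL. */
int list_has_cycle(const struct node *head)
{
    const struct node *slow = head;
    const struct node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return 1;
    }
    return 0;
}
```

    A real allocator does no such checking on its hot path, which is why
    a stray write that makes an entry point to itself can leave malloc
    spinning forever.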

  4. Re: threads suspend in posix_memalign

    On Jul 3, 12:34 am, Rainer Weikusat wrote:

    > The two most likely (IMO) guesses would be 'internal deadlock'


    I agree.

    > and
    > 'looping because of heap corruption' (eg trying to find the end of a
    > linked list which contains an item pointing to itself).


    I've never seen this happen, but I suppose it's possible. If this
    were the case, the program would burn full CPU, whereas if it were a
    deadlock it would use little to no CPU. (Unless other threads spin
    waiting for these CPUs, I suppose.)

    See if the thread in posix_memalign is burning the CPU, if you can.
    Better yet, run with a debug version of the library and see *where* in
    posix_memalign it is.
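
    A rough sketch of the CPU test, assuming a Linux/glibc system where
    CLOCK_THREAD_CPUTIME_ID is available: a thread that loops accumulates
    CPU time, while a thread that is blocked accumulates almost none. The
    helper names are made up, and spin_for and block_for only simulate
    the two cases.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* CPU time consumed so far by the calling thread, in seconds.
 * Returns -1.0 if the clock cannot be read. */
double thread_cpu_seconds(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Wall-clock time in seconds from a monotonic clock. */
static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Busy-wait for the given wall time: stands in for a thread that is
 * looping (for example on a corrupted list). */
void spin_for(double seconds)
{
    double end = wall_seconds() + seconds;
    while (wall_seconds() < end) {
        /* burn CPU */
    }
}

/* Sleep for the given wall time: stands in for a thread that is
 * blocked on a lock it will never get. */
void block_for(double seconds)
{
    struct timespec nap;
    nap.tv_sec = (time_t)seconds;
    nap.tv_nsec = (long)((seconds - (double)nap.tv_sec) * 1e9);
    nanosleep(&nap, NULL);
}
```

    Sampling thread_cpu_seconds() before and after block_for(0.1) should
    show almost no change, while spin_for(0.1) should add close to 0.1
    seconds. A stuck thread cannot sample itself, so in practice the
    reading has to come from outside, for instance a per-thread view in
    top or another thread using pthread_getcpuclockid().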

    DS


  5. Re: threads suspend in posix_memalign

    David Schwartz writes:
    > On Jul 3, 12:34 am, Rainer Weikusat wrote:


    [...]

    >> and
    >> 'looping because of heap corruption' (eg trying to find the end of a
    >> linked list which contains an item pointing to itself).

    >
    > I've never seen this happen, but I suppose it's possible.


    At least the Opsec NG FP3 checkpoint firewall client library for Linux
    had the habit of causing this to happen. Application programming is
    much more interesting if one cannot call malloc or free because either
    one would occasionally 'never come back' ...
