threads suspend in posix_memalign - Linux
-
threads suspend in posix_memalign
I have a multithreaded app and I'm seeing a strange issue where
(according to a gdb attached to the process) one of the threads will
get into posix_memalign and never come out.
(gdb) where
#0 0x40e62f2d in posix_memalign () from /lib/tls/libc.so.6
#1 0x40e60777 in mallopt () from /lib/tls/libc.so.6
#2 0x40e5e4bf in malloc () from /lib/tls/libc.so.6
#3 0x40d8448e in operator new () from /usr/lib/libstdc++.so.5
#4 0x080818eb in Broadcast::createCommand (this=0x825a948,
commandID=201, clientID=554934) at Broadcast.cpp:1062
I've seen this several times. This thread doesn't return back up the
chain and eventually my other threads stop because they need something
from this one.
Anybody seen this or have an idea what could cause it?
Thanks!
-
Re: threads suspend in posix_memalign
On Jul 2, 2:07 pm, j...@riverstyx.net wrote:
> I have a multithreaded app and I'm seeing a strange issue where
> (according to a gdb attached to the process) one of the threads will
> get into posix_memalign and never come out.
>
> (gdb) where
> #0 0x40e62f2d in posix_memalign () from /lib/tls/libc.so.6
> #1 0x40e60777 in mallopt () from /lib/tls/libc.so.6
> #2 0x40e5e4bf in malloc () from /lib/tls/libc.so.6
> #3 0x40d8448e in operator new () from /usr/lib/libstdc++.so.5
> #4 0x080818eb in Broadcast::createCommand (this=0x825a948,
> commandID=201, clientID=554934) at Broadcast.cpp:1062
>
> I've seen this several times. This thread doesn't return back up the
> chain and eventually my other threads stop because they need something
> from this one.
>
> Anybody seen this or have an idea what could cause it?
>
> Thanks!
I have never seen this precise problem, but the short answer is that
something is almost certainly wrong with your allocator. One thing
that could do it is if you called 'malloc' from an improper context
(such as a signal handler).
If you check all the threads running at the time, maybe you will find
that there is always at least one thread that is in an obviously wrong
situation. Check the stack for an allocator function that was
interrupted somehow and where the interrupting code called an
allocator function as well.
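
To make the "improper context" case concrete, here is a minimal sketch of
the usual way to keep allocation out of a signal handler: the handler only
sets a flag, and the allocating work is done later from normal program
flow. All names here (on_signal, install_handler, drain_pending_signal) are
hypothetical and not taken from the application in question.

```cpp
#include <cassert>
#include <csignal>
#include <string>
#include <vector>

// Hypothetical names throughout; this is an illustration, not the poster's code.
// Setting a volatile sig_atomic_t flag is one of the few things a handler may do safely.
static volatile std::sig_atomic_t g_signal_pending = 0;

extern "C" void on_signal(int /*signo*/) {
    // No new, malloc, printf or std::string here: the signal may have
    // interrupted malloc while it was holding the allocator's lock.
    g_signal_pending = 1;
}

// Install the handler; returns true on success.
bool install_handler(int signo) {
    return std::signal(signo, on_signal) != SIG_ERR;
}

// Called from normal program flow (for example once per main-loop
// iteration), where allocating is safe. Returns true if a signal was handled.
bool drain_pending_signal(std::vector<std::string>& log) {
    if (!g_signal_pending) return false;
    g_signal_pending = 0;
    log.push_back("signal received");  // the allocation happens here, not in the handler
    return true;
}

// Usage sketch: raise a signal, then handle it outside the handler.
void signal_example() {
    std::vector<std::string> log;
    bool installed = install_handler(SIGUSR1);
    assert(installed);
    std::raise(SIGUSR1);               // handler runs and only sets the flag
    bool handled = drain_pending_signal(log);
    assert(handled && log.size() == 1);
}
```

If a handler in the application does anything more than this (logging,
building strings, creating objects), that would be a plausible candidate for
the kind of interrupted-allocator stack described above.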
It may help to link in a different allocator. Although it won't fix
the problem, it may do a better job of reporting the problem to you.
(If everything works fine with a different allocator, DO NOT assume
the problem is not your problem. Most likely, the other allocator just
tolerates the problem better.)
DS
-
Re: threads suspend in posix_memalign
jeff@riverstyx.net writes:
> I have a multithreaded app and I'm seeing a strange issue where
> (according to a gdb attached to the process) one of the threads will
> get into posix_memalign and never come out.
>
> (gdb) where
> #0 0x40e62f2d in posix_memalign () from /lib/tls/libc.so.6
> #1 0x40e60777 in mallopt () from /lib/tls/libc.so.6
> #2 0x40e5e4bf in malloc () from /lib/tls/libc.so.6
> #3 0x40d8448e in operator new () from /usr/lib/libstdc++.so.5
> #4 0x080818eb in Broadcast::createCommand (this=0x825a948,
> commandID=201, clientID=554934) at Broadcast.cpp:1062
>
> I've seen this several times. This thread doesn't return back up the
> chain and eventually my other threads stop because they need something
> from this one.
>
> Anybody seen this or have an idea what could cause it?
The two most likely (IMO) guesses would be 'internal deadlock' and
'looping because of heap corruption' (eg trying to find the end of a
linked list which contains an item pointing to itself).
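
As a rough illustration of the second guess, the sketch below uses a
hypothetical Chunk type as a stand-in for an allocator's internal list (it
is not glibc's actual data structure). A walk that looks for the end of the
list never finishes once an item points to itself, while a cycle check
(Floyd's two-pointer technique) detects the corruption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a node in an allocator's internal list.
struct Chunk {
    Chunk* next;
};

// Walk towards the end of the list, giving up after max_steps hops.
// Without the cap, this loop would spin forever on a list containing a cycle.
bool reaches_end(const Chunk* head, std::size_t max_steps) {
    for (std::size_t i = 0; i < max_steps && head != nullptr; ++i) {
        head = head->next;
    }
    return head == nullptr;
}

// Floyd's cycle detection: a fast pointer moving two hops per step
// meets a slow pointer moving one hop per step only if there is a cycle.
bool has_cycle(const Chunk* head) {
    const Chunk* slow = head;
    const Chunk* fast = head;
    while (fast != nullptr && fast->next != nullptr) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) return true;
    }
    return false;
}

// Usage sketch: a healthy three-item list, then the same list after
// "corruption" makes the last item point to itself.
void cycle_example() {
    Chunk c{nullptr};
    Chunk b{&c};
    Chunk a{&b};
    assert(reaches_end(&a, 100) && !has_cycle(&a));
    c.next = &c;  // simulated heap corruption
    assert(!reaches_end(&a, 100) && has_cycle(&a));
}
```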
-
Re: threads suspend in posix_memalign
On Jul 3, 12:34 am, Rainer Weikusat wrote:
> The two most likely (IMO) guesses would be 'internal deadlock'
I agree.
> and
> 'looping because of heap corruption' (eg trying to find the end of a
> linked list which contains an item pointing to itself).
I've never seen this happen, but I suppose it's possible. If this
were the case, the program would burn full CPU, whereas if it were a
deadlock it would use little to no CPU. (Unless other threads spin
waiting for these CPUs, I suppose.)
See if the thread in posix_memalign is burning the CPU, if you can.
Better yet, run with a debug version of the library and see *where* in
posix_memalign it is.
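
To show the difference being measured, here is a small sketch that reads
the calling thread's CPU time with the POSIX clock_gettime call and
CLOCK_THREAD_CPUTIME_ID. The helper names are hypothetical, and a sleep
stands in for a thread blocked on a lock. For a process that is already
stuck you would look at per-thread CPU usage from outside instead (for
example with top -H), but the idea is the same: a looping thread keeps
accumulating CPU time and a blocked one does not.

```cpp
#include <cassert>
#include <chrono>
#include <time.h>   // clock_gettime, nanosleep, CLOCK_THREAD_CPUTIME_ID (POSIX)

// CPU time consumed so far by the calling thread, in seconds.
double thread_cpu_seconds() {
    struct timespec ts{};
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    return static_cast<double>(ts.tv_sec) + static_cast<double>(ts.tv_nsec) / 1e9;
}

// CPU time used while busy-looping for roughly wall_ms milliseconds,
// which is what a thread stuck in an endless list walk would look like.
double cpu_used_while_spinning(int wall_ms) {
    const auto deadline =
        std::chrono::steady_clock::now() + std::chrono::milliseconds(wall_ms);
    const double before = thread_cpu_seconds();
    while (std::chrono::steady_clock::now() < deadline) {
        // spin
    }
    return thread_cpu_seconds() - before;
}

// CPU time used while sleeping for roughly wall_ms milliseconds;
// the sleep is a stand-in for a thread blocked in a deadlock.
double cpu_used_while_blocked(int wall_ms) {
    struct timespec req{};
    req.tv_sec = wall_ms / 1000;
    req.tv_nsec = static_cast<long>(wall_ms % 1000) * 1000000L;
    const double before = thread_cpu_seconds();
    nanosleep(&req, nullptr);
    return thread_cpu_seconds() - before;
}

// Usage sketch: the spinning case should accumulate more CPU time.
void cpu_example() {
    const double spin = cpu_used_while_spinning(100);
    const double blocked = cpu_used_while_blocked(100);
    assert(spin > blocked);
}
```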
DS
-
Re: threads suspend in posix_memalign
David Schwartz writes:
> On Jul 3, 12:34 am, Rainer Weikusat wrote:
[...]
>> and
>> 'looping because of heap corruption' (eg trying to find the end of a
>> linked list which contains an item pointing to itself).
>
> I've never seen this happen, but I suppose it's possible.
At least the Opsec NG FP3 checkpoint firewall client library for Linux
had the habit of causing this to happen. Application programming is
much more interesting if one cannot call malloc or free because either
of the two would occasionally 'never come back' ...