how to figure out where a process hung - Embedded
-
how to figure out where a process hung
I've got a problem...
I have a process run from cron that hangs in an uninterruptible sleep.
It reads data from a webcam, and writes to a tmpfs partition.
This process runs every 15 minutes; and most of the time it will run just
fine, but once in a while, it hangs. Then cron runs it again, and it
hangs again. Pretty soon I have a bunch of hung processes that consume
all resources, and for all practical purposes my little system is dead.
The frustrating thing is that it happens rarely; the process runs every 15
minutes and sometimes it will run for days just fine, and then it will
start hanging.
Is there some way to find out where the process is hung after it is hung
up?
The program is spcacat, a very simple snapshot util for webcams using the
spca driver: . Anyone have any suggestions? I
have 3 weeks to get this up and running, and that doesn't give me much
time....
--Yan
--
o__
,>/'_ o__
(_)\(_) ,>/'_ o__
Yan Seiner, PE (_)\(_) ,>/'_ o__
Certified Personal Trainer (_)\(_) ,>/'_ o__
Licensed Professional Engineer (_)\(_) ,>/'_
Who says engineers have to be pencil necked geeks? (_)\(_)
-
Re: how to figure out where a process hung
> hangs in an uninterruptible sleep.
What does this mean ? A sleep() call needs to specify a time, so it
can't "hang".
Moreover, AFAIK, a user land process only can do uninterruptible sleep
(a very short nanosleep() ), if it is assigned very special attributes.
Is it possible that the process waits for some hardware event that does
not occur due to defective hardware ?
-Michael
-
Re: how to figure out where a process hung
Captain Dondo wrote:
> The frustrating thing is that it happens rarely; the process runs every 15
> minutes and sometimes it will run for days just fine, and then it will
> start hanging.
>
> Is there some way to find out where the process is hung after it is hung
> up?
Try "strace". When it hangs, attach to it with "strace -p <pid>" and
you will see where it hangs (if it hangs in a system call). If this does
not help, try "ltrace" instead.
Hope it helps
Juergen
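Editorial sketch: on kernels that provide /proc/<pid>/syscall (it was added after this thread was written and needs the matching kernel configuration), the same question can be asked of a process that is already hung, without attaching a tracer. The function names below are hypothetical; the file format assumed is the one described in proc(5): either the word "running", or a syscall number followed by argument registers, stack pointer and program counter, or -1 followed by only stack pointer and program counter.

```python
def parse_syscall_line(line):
    """Parse the contents of /proc/<pid>/syscall.

    Returns None if the process is running, otherwise a dict holding the
    syscall number (-1 means blocked outside a system call) and raw args.
    """
    fields = line.split()
    if not fields or fields[0] == "running":
        return None
    number = int(fields[0])
    if number == -1:
        # Blocked, but not inside a system call: only sp and pc follow.
        return {"number": -1, "args": []}
    # Layout: number, argument registers, then stack pointer and program counter.
    return {"number": number, "args": fields[1:-2]}


def blocked_syscall(pid):
    """Read and parse /proc/<pid>/syscall (needs root or the same user)."""
    with open(f"/proc/{pid}/syscall") as f:
        return parse_syscall_line(f.read())
```

Calling blocked_syscall() with the PID of the hung process returns the raw syscall number, which is architecture-specific and has to be looked up in that architecture's syscall table.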
-
Re: how to figure out where a process hung
Hello,
> This process runs every 15 minutes; and most of the time it will run just
> fine, but once in a while, it hangs. Then cron runs it again, and it
> hangs again. Pretty soon I have a bunch of hung processes that consume
> all resources, and for all practical purposes my little system is dead.
Your cronjob could kill all running instances before starting a new one.
That way the resources would stay free and the system won't run into problems.
It is not a clean solution, but at least it keeps your resources free.
You could log if any instances are killed (instead of a clean exit) and
maybe find some event which causes the program to hang.
> have 3 weeks to get this up and running, and that doesn't give me much
> time....
That's an idea, but it will only help against the resource leak of hung
processes, not against the underlying problem.
I'd guess the program waits for something (a camera's event?), but never
gets it.
Regards,
Sebastian
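Editorial sketch: a variation on the suggestion above is to have cron start a small wrapper that enforces a time limit, kills the snapshot program when the limit is exceeded, and logs the event. This is only an illustration; run_with_timeout is a hypothetical name, and a process stuck in uninterruptible sleep will not react to the kill, which is why the second wait is also bounded.

```python
import logging
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd; kill it and log a warning if it exceeds timeout_s seconds.

    Returns the exit code, or None if the command had to be killed.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        logging.warning("killing %s (pid %d) after %ss",
                        cmd[0], proc.pid, timeout_s)
        proc.kill()
        try:
            # A process in uninterruptible sleep will not die on SIGKILL,
            # so do not wait for it forever.
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            logging.error("pid %d did not exit after SIGKILL", proc.pid)
        return None
```

From cron this would be called as run_with_timeout(["spcacat"], 60), with whatever arguments the existing cron entry already passes; the logged timestamps can then be compared with other system events.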
-
Re: how to figure out where a process hung
On Tue, 01 Aug 2006 10:19:57 +0200, Michael Schnell wrote:
>> hangs in an uninterruptible sleep.
>
> What does this mean ? A sleep() call needs to specify a time, so it
> can't "hang".
>
> Moreover, AFAIK, a user land process only can do uninterruptible sleep
> (a very short nanosleep() ), if it is assigned very special attributes.
>
> Is it possible that the process waits for some hardware event that does
> not occur due to defective hardware ?
>
> -Michael
From 'man ps':
PROCESS STATE CODES
Here are the different values that the s, stat and state output
specifiers (header "STAT" or "S") will display to describe the state of
a process.
D Uninterruptible sleep (usually IO)
These processes show up as 'D', which means they cannot be killed.
I am guessing that these processes are waiting for some camera event that
never occurs, but I have not figured out why it happens only sometimes....
The camera shares the USB bus with a GPS, which is being polled almost
continuously. I suspect there is some bus contention which triggers this,
but I have no idea where to start looking; all of the code I've looked at
looks OK so far.
--Yan
--
o__
,>/'_ o__
(_)\(_) ,>/'_ o__
Yan Seiner, PE (_)\(_) ,>/'_ o__
Certified Personal Trainer (_)\(_) ,>/'_ o__
Licensed Professional Engineer (_)\(_) ,>/'_
Who says engineers have to be pencil necked geeks? (_)\(_)
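Editorial sketch: the 'D' state quoted from the ps man page can also be read directly from /proc, together with the kernel wait channel, which often names the kernel function the process is sleeping in. The helper names are hypothetical, and the sketch assumes a Linux /proc with per-process stat and wchan files; on some kernels wchan only shows 0 or is unreadable without privileges.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    left, right = text.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    return int(pid), comm, right.split()[0]


def uninterruptible_processes(proc_root="/proc"):
    """Yield (pid, comm, wchan) for every process currently in state D."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
            if state != "D":
                continue
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            continue  # process exited, or the file is not readable
        yield pid, comm, wchan
```

Running list(uninterruptible_processes()) while the snapshots are hung would show whether they all sleep in the same kernel function, which would point at the driver or the USB layer rather than at the user-space code.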