Mike,

I don't know this program, but I suspect your guess with SIGPIPE is correct. As you know, in Perl you have to use $? >> 8 to get the exit code. This is not Perl being obscure; the C interface (wait/waitpid) works the same way. It is possible (although bad programming) that the program is just returning the full return value from waitpid without decoding it.

When a child dies, the parent is sent a SIGCHLD, which at worst will produce a zombie, not kill the parent. However, if a pipe or Unix domain socket writer loses its reader, then SIGPIPE is raised.

Might I suggest that you run the parent under strace? Try:

    strace -o strace.out -f x25dl1 2>&1 >> $log

The -o will write the trace of all system calls to strace.out (might be large!) and the -f means "follow", i.e. include child processes in the trace. You should then be able to see the SIGPIPE, or whatever is raised, in the strace.out file.

Clive

-----Original Message-----
From: mike@xxxxxxxxxxxxx [mailto:mike@xxxxxxxxxxxxx]
Sent: 23 November 2003 21:02
To: list@xxxxxxxxxxxx
Subject: Re: [LUG] Error message

On 23.11.2003 13:04 mike@xxxxxxxxxxxxx wrote:
> G'day all,
>
> I have a bit of code...
>
> log="/log/x25dl1.log";
> while true
> do
>    x25dl1 2>&1 >> $log
>    status=$?
>    dd=`date +"%d-%b-%Y %H:%M:%S"|tr [:lower:] [:upper:]`
>    echo "$dd [$$] FATAL: x25dl1 has died. Status = $status">>$log
>    sleep 5
> done
>
> In the log file I get...
> 22-NOV-2003 19:00:02 [688] FATAL: x25dl1 has died. Status = 141
>
> I can't find what the status of 141 is. Any ideas?
>
> x25dl1 is a c program.
> If it was perl I would have said it was a SIGPIPE error, which would
> make sense since it uses sockets to communicate.

OK, more on this.

x25dl1 is a forking process (in more ways than one!!). The process that dies is the parent, and when it dies it takes all the children with it. This could be bad, in that the children might actually be doing something at the time. I think there might be a problem with a child that is causing the parent to die. Does that help/make sense?
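On the status value itself: when a command is killed by a signal, the shell reports 128 plus the signal number in $?, so 141 would correspond to signal 13, which is SIGPIPE on Linux. The sketch below decodes a status that way. The function name decode_status is made up for illustration, and it assumes a POSIX shell whose kill -l accepts a signal number. A program can also call exit() with a value above 128 on its own, so treat the result as a strong hint rather than proof.

```shell
#!/bin/sh
# decode_status is a hypothetical helper, not part of x25dl1 or the loop above.
# It interprets a shell exit status: values above 128 are treated as
# "killed by signal (status - 128)", anything else as a normal exit code.
decode_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l <number> prints the signal name without the SIG prefix
        echo "killed by signal $sig ($(kill -l "$sig"))"
    else
        echo "exited with code $status"
    fi
}

decode_status 141
```

Called from the restart loop as decode_status "$status", this would put the signal name in the log next to the raw number.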
There is another problem. The children send and receive via sockets: each child sends data to another process and then waits for that process to reply. Sometimes the other process never answers, and the x25dl1 child sits there forever waiting for data that is never going to appear (why, I am not sure yet, but I am working on that). In the meantime I have a process that runs from cron every 15 minutes and kills off x25dl1 children that have been running for more than 15 minutes. This is just to reduce the clutter; if a child is still running after about 15 seconds, something has gone wrong.

Could this kill process be causing a problem, i.e. causing the parent to die? I don't see a link as yet, but I am beginning to wonder. The log file shows that the parent dies at 00, 15, 30 or 45 minutes past the hour, which is when the kill process runs. But it also dies with the same error outside these times.

--
'ooroo

Mike...(:)-)
---------------------------------------------------
Email: mike@xxxxxxxxxxxxx
      o          You need only two tools.
    o    /////   A hammer and duct tape. If it
       /@   `\ /) ~  doesn't move and it should use
      >  (O)   X< ~  Fish!!  the hammer. If it moves and
       `\___/' \) ~  shouldn't, use the tape.
         \\\
---------------------------------------------------

--
The Mailing List for the Devon & Cornwall LUG
Mail majordomo@xxxxxxxxxxxx with "unsubscribe list" in the
message body to unsubscribe.

________________________________________________________________________
This e-mail has been scanned for all viruses by Star Internet. The
service is powered by MessageLabs. For more information on a proactive
anti-virus service working around the clock, around the globe, visit:
http://www.star.net.uk
________________________________________________________________________