Request for new feature - silent crash handling

Sep 19, 2007 at 11:00 PM
It does not appear that the JOB_OBJECT_LIMIT_DIE_ON_UNHANDLED_EXCEPTION limit flag is supported. Can this be added so that we can prevent apps that crash with unhandled exceptions from hanging a job forever?

Also - what is the strategy for handling applications that sit doing nothing forever, not using any real CPU? In some cases you can just assume that they are hung in a dialog box of some kind and want them to time out and get killed.

-Bill
Coordinator
Sep 24, 2007 at 12:26 AM
Yes, it is implemented:

using (JobObject jo = new JobObject("DieOnUnhandledExceptionJobObject"))
{
    jo.Limits.IsDieOnUnhandledException = true;
}

Please check and see if this is what you needed.
I will add an example that uses this flag; check the project source code.

For the other question:
Vista has a new API, Wait Chain Traversal, that lets you check whether a process is deadlocked. We can use it to check whether any of the processes in the Job are in this state. Do you think we need to add this support?

Alon.




Sep 24, 2007 at 5:54 PM
I checked out the new test program and it works as expected. There is a long delay as the crash dump is created, but the operation is silent. This will be very useful. Thanks!

There is an article on MSDN that indicates that the Wait Chain Traversal functionality is only applicable to native code.

http://msdn.microsoft.com/msdnmag/issues/07/07/Bugslayer/default.aspx

What would be more interesting is support for simple "realtime" timeouts on applications. There is currently no way to say: please abort this app if it runs for more than an hour of clock time. You can (apparently) only abort on user CPU time used. I understand that the OS does not track this for you.
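
To make the request concrete, here is a minimal sketch of a wall-clock timeout written against the standard System.Diagnostics.Process class only. It is not part of JobObjectWrapper; the WallClockTimeout class and RunWithTimeout method are hypothetical names used for illustration.

```csharp
using System;
using System.Diagnostics;

static class WallClockTimeout
{
    // Starts a program and kills it if it is still running after 'limit'
    // of elapsed (wall-clock) time, regardless of how much CPU it used.
    // Returns true if the process exited on its own, false if it was killed.
    public static bool RunWithTimeout(string fileName, string arguments, TimeSpan limit)
    {
        ProcessStartInfo startInfo = new ProcessStartInfo(fileName, arguments);
        startInfo.UseShellExecute = false;

        using (Process process = Process.Start(startInfo))
        {
            // WaitForExit(int) returns false when the timeout elapses first.
            if (process.WaitForExit((int)limit.TotalMilliseconds))
                return true;

            try
            {
                process.Kill();
                process.WaitForExit();
            }
            catch (InvalidOperationException)
            {
                // The process exited between the timeout and the Kill call.
            }
            return false;
        }
    }
}
```

Note that Process.Kill only terminates the one process it was called on. Child processes keep running, which is one reason to want this kind of timeout at the job level instead.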

-Bill
Coordinator
Sep 25, 2007 at 1:47 AM
Hi

You are correct that Wait Chain Traversal is a native API, but so is CreateJobObject! We can call it from the C++/CLI implementation of the JobManagement project. However, it will only work on Vista and Windows Server 2008, and it will not flag a process that is doing nothing but is still able to receive Windows messages (no deadlock).
In the current JobObjectWrapper we have added an absolute timer mechanism, but this will kill the process even if it is doing a lot of work. How do you determine that a process is doing nothing? Look at its I/O counters? Its thread context-switch performance counter? Its memory working set size? What if there is a polling process that does nothing but spend CPU time?
We can set a timer for each process when we get a New Process event from the Job and capture those counters. When the timer goes off we can check if those counter values have changed, and decide whether to wait another timer interval or kill this process.
Do you think that this is a good pattern? Do you think that we should implement it in the JobObjectWrapper?
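
The pattern described above could be sketched roughly as follows. This is only an illustration, not the JobObjectWrapper implementation: ActivitySnapshot and IdleDetector are hypothetical names, and the sampling uses only Process.TotalProcessorTime from System.Diagnostics. The I/O count is left at zero here because the Process class has no I/O counter property; a real version would have to fill it from something like the Win32 GetProcessIoCounters function.

```csharp
using System;
using System.Diagnostics;

// One sample of a process's activity counters (hypothetical type).
struct ActivitySnapshot
{
    public TimeSpan CpuTime;    // total user + kernel CPU time so far
    public long IoOperations;   // total I/O operations so far

    public ActivitySnapshot(TimeSpan cpuTime, long ioOperations)
    {
        CpuTime = cpuTime;
        IoOperations = ioOperations;
    }
}

static class IdleDetector
{
    // True when neither counter advanced by more than its threshold
    // between the two samples.
    public static bool IsIdle(ActivitySnapshot previous, ActivitySnapshot current,
                              TimeSpan cpuThreshold, long ioThreshold)
    {
        TimeSpan cpuDelta = current.CpuTime - previous.CpuTime;
        long ioDelta = current.IoOperations - previous.IoOperations;
        return cpuDelta <= cpuThreshold && ioDelta <= ioThreshold;
    }

    // Samples CPU time only; the I/O count is an assumption left at 0.
    public static ActivitySnapshot Sample(Process process)
    {
        process.Refresh();
        return new ActivitySnapshot(process.TotalProcessorTime, 0);
    }

    // Checks the process once per interval. Kills it and returns true as soon
    // as one interval passes without activity; returns false if it exits first.
    public static bool KillWhenIdle(Process process, TimeSpan interval, TimeSpan cpuThreshold)
    {
        ActivitySnapshot previous = Sample(process);
        while (!process.WaitForExit((int)interval.TotalMilliseconds))
        {
            ActivitySnapshot current = Sample(process);
            if (IsIdle(previous, current, cpuThreshold, 0))
            {
                try { process.Kill(); }
                catch (InvalidOperationException) { /* already exited */ }
                return true;
            }
            previous = current;
        }
        return false;
    }
}
```

Nonzero thresholds matter for the polling case raised above: a process that wakes up periodically but does no useful work still accumulates a little CPU time, so comparing against exactly zero would never classify it as idle.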

Alon.
Sep 25, 2007 at 5:55 AM
I now see the SetAbsoluteTimer methods. I missed that. I also see the IOCounters class, which seems to be a starting point for what you are discussing.

In my particular case, I just want to detect an app that is "idle", which probably means minimal or no I/O and minimal CPU time.

Yes - I think this is a good pattern.