December 05, 2006

Handling Exceptions for Contingency

Quite often developers (myself included) will take notice of certain inconspicuous issues with code before they even cause any real problems during testing. Being the cautious developers that we are, it's not uncommon to take action ahead of time by adding try..catch blocks. For instance, adding local exception handling into methods in preparation for an exception that might occur at runtime is quite common. Sometimes there is good cause for such code to be in place, but local exception handling should always be implemented with caution.

When local try..catch blocks are added during development to catch possible, but unlikely exceptions, they may inadvertently catch exceptions that shouldn't be handled in the same way, causing another exception to be thrown or, even worse, suppression of an exception that should propagate to the AppDomain because it indicates an actual bug. This makes debugging very difficult, or even impossible if these unexpected exceptions are occurring in the production environment but being suppressed. Exceptions that you truly didn't expect shouldn't be caught in local try..catch blocks - that's the point of exceptions when they are unexpected!

Cautious But Fallible

Say, for example, that you're writing a logging function that relies on a System.IO.FileStream constructor overload, which can throw several different types of exceptions [1]. The example method below creates a new log file, the name of which is based on a static variable that is incremented each time the method is invoked (it doesn't matter how the variable is first initialized, so assume that it's initialized with an appropriate value). The code has a local catch block for a particular exception that you realized may be thrown at runtime, but not necessarily. You have added the catch block for contingency:

static string GetNextLogName()
{
    return "log" + ++lastLogSeqNo + ".txt";
}

static void CreateLog(string path, string data)
{
    string fullPath = Path.Combine(path, GetNextLogName());

    try
    {
        FileStream stream = new FileStream(fullPath, FileMode.CreateNew, FileAccess.Write);
        ...
    }
    catch (IOException)
    {
        // try again with the next log name in the sequence
        CreateLog(path, data);
    }
}

(File.Exists is not used here because it won't help at all.   There is an interprocess synchronization issue that needs to be dealt with - the same issue that you're attempting to solve through recursion.)

The actual name of the log file isn't important to you - only that it's unique.  While authoring this code you realized that a file with the current sequence number might already exist, probably resulting in some exception being thrown, so you pull open the docs to see what Type of exception may be thrown and you notice the following for the FileStream constructor that you're using [1]:

IOException

An I/O error occurs, such as specifying FileMode.CreateNew and the file specified by path already exists.

-or-

The stream has been closed.

So you added a catch block to catch that specific exception, as recommended by exception-handling standards [2]. You're not worried about the stream being closed since you're trying to create it, so you can safely assume that the latter cause of the exception doesn't apply.  You're aware that if some threading bug exists or some other process creates a file with a name that impedes on your internal naming convention, you'll want to try again with the next available log name, so you call the method recursively. We'll assume that you're not worried at all about a StackOverflowException since this code will be executed only once a week and the log files are cleaned periodically anyway by the same process, which will fail if the logs cannot be removed.

So, what's the problem?

DirectoryNotFoundException! Well, you didn't think of that when you wrote the method and tried to be slick by catching IOException, but there's an even bigger problem now.  You might think to yourself, "No there isn't! That exception won't be caught so it will be handled by my global exception handling routine".   But DirectoryNotFoundException derives from IOException, so it will be caught too.  This results in an infinitely recursive method call that will eventually eat up the entire stack, causing a StackOverflowException to be thrown even though you weren't expecting it. So you might think, "Well that's fine because an anomaly like this will be easy to debug later - just take the exception log written by the global error-handling code and examine the stack trace." Okay, what happens if there is no log and stack trace? Starting with the .NET Framework version 2.0, a StackOverflowException object cannot be caught by a try-catch block and the corresponding process is terminated by default [3]. StackOverflowException cannot be trusted by reliable applications and so should be avoided at all costs.  Now, imagine that the code above was running in production when a DirectoryNotFoundException occurs.  How could you possibly repro that error without any diagnostic information?

I admit that the code above might not be optimal, but surely you don't think that things like this are easily avoided, all of the time. I've seen some strange anomalies in developer code. Granted, most of them were due to issues with multi-threaded synchronization (actually, a lack thereof), however I'm quite certain that cases such as this are found throughout production code more often than expected.

The problem in this example is simply that you didn't realize DirectoryNotFoundException derives from IOException. Actually, there are a lot of exceptions in the framework that derive from base exceptions that are somewhat unintuitive.  For examples, take a look at the list of exceptions that derive from these commonly caught exceptions: IOException [4], ArgumentException [5] and ApplicationException [6]. I'm sure some will surprise you ;)

(Note: Microsoft no longer recommends that you derive custom exceptions from ApplicationException.  System.Exception is recommended as a base type, generally [2])

This particular situation can be resolved easily by catching DirectoryNotFoundException before IOException, and re-throwing it:

    catch (DirectoryNotFoundException)
    {
        throw;
    }
    catch (IOException)
    {
        // try again with the next log name in the sequence
        CreateLog(path, data);
    }

Unless you can absolutely guarantee that all of your exception-handling designs will be perfect, or that all unit tests will provide complete coverage before code is released, then there is no good excuse of which I'm aware for adding local try..catch blocks to handle exceptions for contingency without attempting to discover all of the possible exceptions and their hierarchies first, but even that takes time and may require you to redesign the actual structure of your outer code as well.  Discovering all of the exceptions that may be thrown and figuring out all of the possible ways that you might be able to recover without inadvertently causing other problems isn't always easy and recovery isn't always possible by simply adding a try..catch block.  For instance, it's quite possible that one method may throw ArgumentException while a method that it calls throws ArgumentNullException, which derives from ArgumentException [5].  If you are aware that the method your code will be calling throws ArgumentException, and you catch it, then you'd be catching ArgumentNullException as well.  It's not likely that you'll be able to write code that will handle this circumstance given that the null argument was the fault of the method into which you called.  This particular example might be considered a bug in the invoked method, but it's quite possible for something like this to occur and not be a bug (with exceptions that don't depend on arguments, for instance, but internal state instead).  In that case, you'd be catching an exception of which your code couldn't handle gracefully.  If it's being suppressed or handled inappropriately, you could easily end up with similar problems as I've illustrated above.

Solutions

One approach to designing exception handling for situations where you predict that an error might occur is to code without exception handling first, and then add it later only after the predicted exception actually occurs during testing. In cases where you may expect an exception to be unlikely, although still possible, it's best to architect a solution that will allow your code to gracefully handle the exception.  However, deferring local, contingent exception handling code until exceptions are thrown in unit testing, staging or production environments is a better way to produce reliable applications and helps to minimize the amount of time it takes to debug code (Yes, not catching exceptions helps you to debug your program!).  For those exceptions that are known to be possible, yet rare, you should take care in your local exception handling code to ensure that you aren't inadvertently causing other problems.

Another approach is to provide exception handling in the global exception handler for unlikely exceptions (a catch-all exception handler such as the AppDomain.UnhandledException event [7]). This way, you've ensured that the operation has backed-out entirely and the application can continue at some point before the failure, considering whether or not state has been corrupted, of course. Even in the case that the application cannot continue due to corruption, the error will be logged with details of the exception and the application can prompt the user so they are aware of the error. The user can choose whether to try the operation again (this may require you to code some sort of reset routine that can clean-up invalid state without having to restart the entire program). If it's an automated system, the system can automatically try the operation again.

A third approach, although subject to the same issues, is to add exception handling code in a higher level, but not quite in a global exception handler. Basically, if you can determine transactional points in your code where in the case of an exception you can assure that no state will be corrupt, or else it can be fixed before returning to the caller, then you can catch base exceptions such as IOException without a fear of causing any major problems. The key is to ensure that you don't do anything that depends on the semantics behind the exception. Instead, assume the exception was fatal and roll back the state to immediately before the point of failure.

The best approach is to avoid exceptions in the first place.  This doesn't mean that you should avoid throwing them in code, I just mean that you should look for other ways around having to use try..catch blocks when you expect that an exception may be thrown.  For example, instead of catching ArgumentException, verify the arguments before passing them to a method.  Another example would be to check CanSeek before attempting to Seek on a Stream.  However, when working with shared resources and multiple threads this approach isn't always possible due to its asynchronous nature.  For instance, calling File.Exists may return true but another process might slip in and delete the file before your next line of code gets a chance to open it. Your only option is to just try the operation again. Therefore, the second or third approach is better suited for this type of scenario.

--
References

[1] FileStream.FileStream(String, FileMode, FileAccess) Constructor
http://msdn2.microsoft.com/en-gb/library/tyhc0kft.aspx

[2] .NET Framework Developer's Guide, Best Practices for Handling Exceptions
http://msdn2.microsoft.com/en-us/library/seyhszts.aspx

[3] StackOverflowException Class
http://msdn2.microsoft.com/en-us/library/system.stackoverflowexception.aspx

[4] IOException Hierarchy
http://msdn2.microsoft.com/en-us/library/bb57z14b.aspx

[5] ArgumentException Hierarchy
http://msdn2.microsoft.com/en-us/library/hat7a83s.aspx

[6] ApplicationException Hierarchy
http://msdn2.microsoft.com/en-us/library/kw9wwk34.aspx

[7] AppDomain.UnhandledException Event
http://msdn2.microsoft.com/en-us/library/system.appdomain.unhandledexception.aspx

As always, I'd love to hear from anyone that has something to say about this post.  Drop me a comment.