Offensive programming: crash early, crash often.

You might have read that “defensive programming” is a way to write good software. Googling for defensive programming turns up gems like this:

  • Use error detection programming code (i.e., assertions) to check if a program’s functional behavior corresponds to its intended behavior.
  • Ensure that there are no program paths that bypass the error detection code.
  • Check the values of all module input parameters and all data input from external sources (e.g., from a file or from a user). Verify that all types of data are within the allowable range and size.
  • Check all function return values for errors.

There might be some valid points in that if you are writing in a low-level language. The problem is: there is very little code that can’t fail, and you are never done checking for errors.

Let’s take a random example in C++ that I found on the Internet:

01. #include <iostream>
02. #include <fstream>
03. #include <cstdlib>
04. using namespace std;
05.
06. int main() {
07.     int sum = 0;
08.     int x;
09.     ifstream inFile;
10.     inFile.open("test.txt");
11.     if (!inFile) {
12.         cout << "Unable to open file";
13.         exit(1); // terminate with error
14.     }
15.
16.     while (inFile >> x) {
17.         sum = sum + x;
18.     }
19.
20.     inFile.close();
21.     cout << "Sum = " << sum << endl;
22.     return 0;
23. }

This snippet has error handling code and I claim it is terribly misguided. What can fail here?

Obviously opening the file can fail. So the author tries to detect that and act on that. Unfortunately I count 6 different things which have a reasonable probability of failing in this snippet.

In line 12, the output of the error message can fail if stdout is blocked or the system is out of memory.

In line 16, reading from the file can fail if the data being read can’t be parsed into an integer (think alone of Unicode decoding logic which might get in the way somewhere). The basic process of reading might also fail due to timeouts, disk errors, network errors etc.

In line 17, the sum will overflow the integer once enough numbers have been added.

In line 20, closing the file may fail. You didn’t know that? Check the man page of close(2): “Upon successful completion, a value of 0 is returned. Otherwise, a value of -1 is returned.” Closing may fail, for example, on network filesystems, while file metadata is being updated and the like. And errors during file close have been the reason for several huge security vulnerabilities.

In line 21, output may fail again.

And finally, even exiting may fail, resulting in zombie processes.

There is no way to check for all these error conditions and still write code effectively. Even if you do so, you finally go mad if you are writing a library and have to communicate all the different errors you detected to the caller. It’s nearly impossible. The best Kernighan and Ritchie came up with for Unix/C was the global errno variable, and that doesn’t work well.

This is the reason smart people came up with so-called exceptions (actually, they came up with them even before C was invented).

When an error occurs, the runtime system halts your program, saves a bunch of interesting data (the traceback) and resumes execution somewhere higher in the call stack. If you haven’t specified a point for continuing, the runtime outputs the error information and quits the program.

This is an extremely elegant approach because of its implications:

1. You don’t have to check for errors to keep your program from causing damage by running on after an error (e.g. a failed malloc). The program is terminated cleanly unless you meddle with the exception mechanism.

2. Error handling does not have to happen in the same code where the action that caused an error happened.

This allows a much more relaxed approach to coding.
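
The second point can be illustrated with a minimal Python sketch. The names parse_port, load_config, run and DEFAULT_PORTS are made up for this example. The two inner functions contain no error handling at all; only the outermost caller, which knows a sensible fallback, deals with the failure.

```python
DEFAULT_PORTS = [8080]  # hypothetical fallback, chosen only for this sketch


def parse_port(text):
    # No error checking: int() raises ValueError on bad input.
    return int(text)


def load_config(lines):
    # No error checking here either; an exception simply passes through.
    return [parse_port(line) for line in lines]


def run(lines):
    # Two levels above the failing call, and the only place that knows
    # what to do about a bad value.
    try:
        return load_config(lines)
    except ValueError:
        return list(DEFAULT_PORTS)
```

If run() had no try block either, the ValueError would travel all the way up and the interpreter would print the traceback and exit.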

Why handle errors

There are several reasons why you might implement error handling code:

1. Keeping the program from causing additional problems/destruction.

2. Solving the problem.

3. Reporting the error.

4. Ignoring the error.

As explained, if you use a modern language supporting exceptions there is usually no need to handle 1. yourself. You might still have to ensure that cleanup work, submitting of database records etc. happens, and for that try: ... finally: ... usually is the right approach.
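
The cleanup case can be sketched with try: ... finally: ... as follows. The function checked_sum and its scratch file are invented for illustration. Note that nothing is caught: an exception still propagates to the caller, the finally block only guarantees that the scratch file is gone either way.

```python
import os


def checked_sum(numbers_text, scratch_path):
    # Hypothetical task: keep a scratch copy while summing the numbers.
    f = open(scratch_path, "w")
    try:
        f.write(numbers_text)
        f.flush()
        # int() raises ValueError on bad input; we do not catch it.
        return sum(int(token) for token in numbers_text.split())
    finally:
        # Runs on success and on failure alike.
        f.close()
        os.remove(scratch_path)
```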

Solving the problem (2) is a great thing. If your program is able to fix errors itself without user interaction more power to you!

Reporting an error (3) is usually handled by the runtime system. You might put ONE big try: ... except: ... block around your code to allow nice error reporting to the end user or error logging for administration, but for that you only need a single instance of error reporting code. Something along the lines of:

try:
    main()
except Exception as e:
    try:  # suppress errors during mailing
        # mail_admins() stands for whatever notification function you have
        mail_admins("some error occurred: %s" % e)
    except Exception:
        pass
    raise  # re-raise the first error; alternatively print a friendly message

This means that all other error reporting code usually is wrong:

import sys

try:
    fd = open('foobar.txt')
except:
    print("error opening file")
    sys.exit(1)

This code is plainly wrong (in most cases). It loses lots of information (line number, exact error cause) while adding nothing except that “the ugly traceback” doesn’t annoy the user. If your user is easily annoyed, use a try: except around main. If you are writing stuff for admins, teach them how to read tracebacks.

An unhandled exception is not some kind of coding error!

And finally: ignoring an error is occasionally a really good choice. If you try to download 10,000,000 URLs and get a 404 exception on one URL, the wisest approach is to just ignore the fact and move on to the next URL.
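
A minimal sketch of that download loop, assuming Python 3 and the standard urllib module. The function names fetch and download_all are made up for this example. Only the one error we decided to ignore is swallowed; every other failure still crashes.

```python
import urllib.error
import urllib.request


def fetch(url):
    # Plain download; urlopen raises urllib.error.HTTPError on 4xx/5xx.
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_all(urls, fetch=fetch):
    pages = {}
    for url in urls:
        try:
            pages[url] = fetch(url)
        except urllib.error.HTTPError as e:
            if e.code != 404:
                raise  # anything but "not found" is still a crash
            # 404: deliberately ignored, move on to the next URL
    return pages
```

The fetch parameter exists only so that the loop can be tried out without network access.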

When should exceptions be raised?

Your code should never bring the on-disk and in-database data you work with into a state where a crash in your code (or a power failure or whatever) destroys data or makes it inconsistent. This is hard, but possible most of the time. This means that if your program crashes there is no risk of losing anything.
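
For plain files, one common way to get there is to write the new content to a temporary file and then rename it over the old one. The following is a sketch under assumptions, not a complete recipe: atomic_write is a made-up name, the rename is atomic on POSIX systems, and syncing the containing directory is left out.

```python
import os
import tempfile


def atomic_write(path, data):
    # Write to a temporary file in the same directory, so that the final
    # rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # readers see the old or the new file
    except BaseException:
        os.remove(tmp_path)  # do not leave the temporary file behind
        raise
```

If the program crashes anywhere before os.replace, the old file is still intact; at worst a stray temporary file is left over.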

This results in a very simple strategy for a good program:

If you detect a problem, solve it. If you can’t solve it, crash and wait for somebody able to solve it.

This is a totally different approach from the one mentioned above. Because I know that I cannot know all the errors that might occur, I don’t even try to handle them.
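
As a closing illustration, here is how the C++ snippet from the beginning might look in Python when written this way. It is a sketch, and sum_file is a made-up name. It is not an exact translation: the C++ loop silently stops at the first token it cannot parse, while this version crashes with a ValueError, and Python integers do not overflow.

```python
def sum_file(path):
    # No checks at all: a missing file, a read error or a token that is
    # not an integer each raise an exception that names the exact cause.
    with open(path) as f:
        return sum(int(token) for token in f.read().split())
```

Calling sum_file("test.txt") without the file present ends the program with a traceback pointing at the open() call, which is all the error reporting the snippet needs.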