mikeash.com pyblog/friday-qa-2009-10-09-defensive-programming.html comments

mikeash - 2009-10-18 00:39:07

Sun, 18 Oct 2009 00:39:07 GMT

I agree with what you say, with the proviso that "other developers" includes yourself, six months into the future.

Some people will use assert macros which are only enabled for debug builds, and which get compiled away in release builds. I think that even Wade would find it hard to argue against those. However, I think it's better to leave them in for release builds as well, for the reasons I discussed previously.
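As a rough illustration of the "leave them in for release builds" option, here is a minimal C sketch. The macro name ALWAYS_ASSERT and the checked_divide function are made up for this example; the only real difference from the standard assert() is that this macro ignores NDEBUG.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical macro name. Unlike assert() from <assert.h>, it is not
   compiled out when NDEBUG is defined, so it also fires in release builds. */
#define ALWAYS_ASSERT(cond)                                          \
    do {                                                             \
        if (!(cond)) {                                               \
            fprintf(stderr, "%s:%d: assertion failed: %s\n",         \
                    __FILE__, __LINE__, #cond);                      \
            abort();                                                 \
        }                                                            \
    } while (0)

/* Example use (hypothetical function): the divisor must never be zero. */
int checked_divide(int numerator, int divisor)
{
    ALWAYS_ASSERT(divisor != 0);
    return numerator / divisor;
}
```

The failure message names the file, line, and broken condition, so a crash report from a release build still points at the constraint that was violated.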

Michael Long - 2009-10-17 19:57:51

Sat, 17 Oct 2009 19:57:51 GMT

Also regarding asserts, adding them is an additional safeguard to help protect against other developers who may at some point be working with or maintaining your code.

You know that a specific function expects a valid file descriptor, but someone else just added to the team might not be as knowledgeable. Or it may be a simple bug in his code. Or yours.

Worse, what if, due to various circumstances, the function fails silently when given a bad parameter? I had a developer come to me once, excited that his "improvements" to a sorting algorithm had decreased sort times by 300%.

Which, I found, is relatively easy to do if you never actually sort anything...
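For the file descriptor case mentioned above, a guard might look something like the following C sketch. The names fd_is_open and write_all are invented for illustration, and the sketch assumes a POSIX system. Note that the standard assert() used here disappears under NDEBUG, so whether it survives into release builds depends on how you compile.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX declarations under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if fd refers to an open descriptor, 0 otherwise.
   fcntl(F_GETFD) fails with EBADF for a descriptor that is not open. */
int fd_is_open(int fd)
{
    return fd >= 0 && fcntl(fd, F_GETFD) != -1;
}

/* Hypothetical helper: writes the whole buffer to fd.
   Returns 0 on success, -1 on an I/O error (errno is left set by write). */
int write_all(int fd, const char *buf, size_t len)
{
    /* A closed or bogus descriptor is a caller bug: stop here, loudly,
       instead of quietly writing nothing. */
    assert(fd_is_open(fd));
    assert(buf != NULL);

    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal, retry */
            return -1;           /* genuine I/O error: report it to the caller */
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```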

Steven Degutis - 2009-10-13 15:54:37

Tue, 13 Oct 2009 15:54:37 GMT

(Jeff: For what it's worth, killing people could roughly be considered "data-loss", so you'd still be right... :)

mikeash - 2009-10-13 03:24:52

Tue, 13 Oct 2009 03:24:52 GMT

How to deal with the NSData issues depends on exactly why you're reading the file.

If you're reading it because it's an application resource, I would just ensure that it will provide some kind of vaguely sensible log statement if something goes wrong and leave it at that. If your app bundle is hosed then you have big problems anyway.

If you're reading it because the user told you to read it, first, use the NSData methods that return an NSError, and present that error to the user upon failure. That takes care of problems where it can't be read. Next, validate the format extensively. You should be doing this anyway, of course, to ensure that your app can't be used as a security hole and such. Take advantage of system-provided APIs if you're dealing with a common format like XML or an image format. This takes care of the unexpected data problems. For enormous files, it can be good to have a sanity check on file sizes, but be careful not to make it too small. The user will not be pleased if he legitimately has a 10GB file that your application refuses to work on despite the fact that it's a 64-bit app and he has 15GB of RAM. For taking a long time to read, you could provide a progress indicator that pops up after a couple of seconds. However, showing a SPOD is acceptable even if it's not particularly good, because the user should understand that it's doing what he just told it to do. Users are probably trained to understand that opening a file can take a while.
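The same shape can be sketched in plain C rather than Cocoa: read the file, hand back a human-readable reason on failure, and apply a caller-chosen size limit. This is only an analogue of the NSData/NSError approach, not that API. The function name read_user_file and its parameters are hypothetical, and a POSIX system is assumed.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() and fstat() under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper. Reads the whole file at path into a malloc'd,
   NUL-terminated buffer. On success returns the buffer and stores its length
   in *out_len. On failure returns NULL and writes a reason into err, which
   the caller can present to the user. */
char *read_user_file(const char *path, unsigned long long max_bytes,
                     size_t *out_len, char *err, size_t err_size)
{
    assert(path != NULL && out_len != NULL && err != NULL && err_size > 0);

    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        snprintf(err, err_size, "cannot open %s: %s", path, strerror(errno));
        return NULL;
    }

    struct stat st;
    if (fstat(fileno(f), &st) != 0 || !S_ISREG(st.st_mode)) {
        snprintf(err, err_size, "%s is not a readable regular file", path);
        fclose(f);
        return NULL;
    }

    /* Sanity check on size; the caller should pick a generous limit. */
    unsigned long long size = (unsigned long long)st.st_size;
    if (size > max_bytes || size >= SIZE_MAX) {
        snprintf(err, err_size, "%s is %llu bytes, over the %llu byte limit",
                 path, size, max_bytes);
        fclose(f);
        return NULL;
    }

    size_t len = (size_t)size;
    char *buf = malloc(len + 1);   /* +1 for the terminating NUL */
    if (buf == NULL) {
        snprintf(err, err_size, "out of memory reading %zu bytes", len);
        fclose(f);
        return NULL;
    }
    if (fread(buf, 1, len, f) != len) {
        snprintf(err, err_size, "short read on %s", path);
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    buf[len] = '\0';
    *out_len = len;
    return buf;
}
```

Format validation would come after this step and is not shown here.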

The real torture case is opening arbitrary files that the user didn't explicitly tell you to open. This could be because you're indexing something, or it's some kind of external plugin data, or similar. Much of the previous advice holds, but you'll want to be even more paranoid about it. In particular, be sure that all IO of this nature is not performed on the main thread, or in any other way which could end up blocking user interaction. Assume that the IO could take forever, and make sure that your program tolerates that. Have sanity checks on anything you can think of. If you can tolerate not reading some files, then err on the side of caution. For example, if you're scanning for images, it may be better to skip over any image file bigger than, say, 200MB, even if it's potentially legitimate. Strategies will vary depending on exactly what you're doing, so you'll have to look at each situation individually, but that's the general idea.
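For the scanning case, the "err on the side of caution" rule could be sketched in C roughly as follows. The function name should_scan and the SCAN_MAX_BYTES constant are hypothetical; the 200MB figure is simply the one from the text. Since stat() can itself block on a slow volume, a check like this would belong on the worker thread along with the rest of the IO.

```c
#define _POSIX_C_SOURCE 200809L  /* for stat() and S_ISREG under strict -std modes */
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical limit, taken from the 200MB example in the text. */
#define SCAN_MAX_BYTES (200LL * 1024 * 1024)

/* Returns 1 if a background scanner should read the file, 0 to skip it.
   Anything that cannot be examined, is not a regular file, or is too big
   is skipped rather than risked. */
int should_scan(const char *path)
{
    assert(path != NULL);

    struct stat st;
    if (stat(path, &st) != 0)
        return 0;                      /* unreadable or vanished: skip */
    if (!S_ISREG(st.st_mode))
        return 0;                      /* directories, pipes, devices: skip */
    if ((long long)st.st_size > SCAN_MAX_BYTES) {
        fprintf(stderr, "skipping %s: %lld bytes is over the scan limit\n",
                path, (long long)st.st_size);
        return 0;
    }
    return 1;
}
```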

Don - 2009-10-12 19:33:44

Mon, 12 Oct 2009 19:33:44 GMT

Great advice and I agree wholeheartedly. Practically, though, I fall more often than I'd like into "if you don't know how to handle them, then don't try". Logs are fine after the fact, but constructing a graceful response in advance seems to depend on the application, the specific interaction, and the application domain. For a practicum, for example, how would you specifically deal with the issues brought up by your NSData allocation example?

Jeff Johnson - 2009-10-10 15:44:05

Sat, 10 Oct 2009 15:44:05 GMT

Wade, are you saying you check for and 'gracefully' handle nil return for every [NSMutableArray array], for example?

Data corruption is the *worst* thing you can do as a programmer. (Well, you could kill people, but I'm assuming consumer Mac apps here and not air traffic control systems.) If an app crashes without data corruption, you can simply relaunch and be back to work. Unsaved data will be lost, but that's why you implement some kind of autosave. ;-)

mikeash - 2009-10-10 14:28:54

Sat, 10 Oct 2009 14:28:54 GMT

Why do you say my advice is contrary? Where do I say that you shouldn't bother checking for errors if you don't have a way to handle it? The only place this idea exists is within a quote, which I immediately say is only half right. You might want to try going back and reading things again. What I actually say is that you should always check for errors, but you should not always try to handle them. In other words, you shouldn't try to recover from all error conditions. Sometimes an error just means you should log and abort. Trying to handle an error that you don't know how to handle is worse than not checking for it at all.
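To make "check, but don't try to handle" concrete, here is a minimal C sketch of the log-and-abort case. The wrapper name malloc_or_abort and its what parameter are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: checks the result of malloc but makes no attempt to
   recover. It logs what was requested and aborts, so the failure shows up at
   the point of the error instead of somewhere downstream. */
void *malloc_or_abort(size_t size, const char *what)
{
    assert(what != NULL);

    void *p = malloc(size);
    if (p == NULL && size != 0) {      /* malloc(0) may legitimately return NULL */
        fprintf(stderr, "out of memory: could not allocate %zu bytes for %s\n",
                size, what);
        abort();
    }
    return p;
}
```

Callers that do know how to recover from a failed allocation would call malloc directly and handle NULL themselves.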

Your comment about detecting and recovering from malloc failures with extremely large requests is insightful. Extremely large allocations are a case where it can fail without necessarily taking down your entire program or putting it into a bad state.

Regarding asserts, you deliberately add further potential crash points to your app because it's better than the alternative. An assert should only be for a constraint which, if broken, will lead to a crash or data corruption. Asserting early and causing a deliberate crash with concrete information about the reason is better than crashing later. You seem to think that data corruption is better than a crash. This is, simply put, insane. You think backups eliminate the need to worry about data integrity. What if your user doesn't discover the corruption until all of his pre-corruption backups are gone? If you honestly think that it's better to corrupt data than to crash, I'd appreciate it if you could post a list of applications you've worked on so that I can be sure to never, ever, ever run any of them on my system.

Wade Tregaskis - 2009-10-10 07:25:50

Sat, 10 Oct 2009 07:25:50 GMT

Too few people promote the use of actual error checking. Having spent many years maintaining other people's code, I'm about ready to beat to death with their own severed limbs the next person I catch writing ignorant code. It's stupid, it's unprofessional, and in a world with more wide-spread defect tracking, it'd get you fired.

But your advice is contrary. You say you should check for errors, only don't bother if you don't actually have a way to handle it. What? 99% of the errors that could possibly happen, you don't know how to handle, and the remaining 1% are usually such soft errors as "my prefs file might not open because this might be my first run", etc.

Myself, I *always* check for errors. Math errors, NULL pointers, anything. Even from malloc. You are not unjustified in saying malloc failure is very problematic, because indeed many many common frameworks and libraries just crash in such situations (*cough*CoreFoundation*cough*). But it's not a black or white thing. Some percentage of your program's mallocs are done in your own code. If you can simply not crash for that fraction of the time, then why accept any less?

And delving further into this specific example, "out of memory" is a funny thing. I can ask malloc for a gigabyte of memory, and it'll fail, "out of memory". Yet I can then allocate a thousand objects, keep calling functions and extending my stack, etc. A lot of apps can actually recover reasonably often from "out of memory" errors for this reason; detection of the error leads to failure in that code, which then unwinds all the way back, releasing whatever stuff you've allocated to that point, and putting you back into a place where you've got at least some free memory.
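The unwinding described here might look something like this in C. The struct document type and both function names are hypothetical, chosen only to give the sketch two separate allocations to clean up.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical document type with two separately allocated parts. */
struct document {
    char *text;
    int  *line_offsets;
};

/* Releases a document; safe to call on a partially built one or on NULL. */
void document_destroy(struct document *doc)
{
    if (doc == NULL)
        return;
    free(doc->line_offsets);
    free(doc->text);
    free(doc);
}

/* Returns a new document, or NULL if any allocation fails. On failure,
   everything allocated so far is released before returning, so the caller
   ends up with that memory free again. */
struct document *document_create(size_t text_bytes, size_t line_count)
{
    assert(text_bytes > 0 && line_count > 0);

    struct document *doc = calloc(1, sizeof *doc);
    if (doc == NULL)
        goto fail;

    doc->text = malloc(text_bytes);
    if (doc->text == NULL)
        goto fail;

    doc->line_offsets = calloc(line_count, sizeof *doc->line_offsets);
    if (doc->line_offsets == NULL)
        goto fail;

    return doc;

fail:
    fprintf(stderr, "document_create: allocation failed (%zu text bytes, %zu lines)\n",
            text_bytes, line_count);
    document_destroy(doc);
    return NULL;
}
```

After a failed oversized request, a later small request can still succeed, which is the recovery being described.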

And maybe you never anticipated asking for that much memory, because you shouldn't be - you might have an overflow or some other bug that causes you to ask malloc for the wrong thing. Having your app gracefully fail and log an error message stating exactly what it just tried to do (quoting the requested malloc size) will instantly lead you to the root problem. Otherwise, you'd probably assume your app just uses too much memory, and mistakenly fire up Object Alloc instead of fixing the real bug.
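A small C sketch of that idea: catch the overflow before it reaches malloc, and log the exact size of any request that fails. The helper name alloc_array_logged is made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocates room for count elements of elem_size bytes.
   Returns NULL instead of crashing, and logs the exact request so that a
   bogus size (from an overflow, say) is visible in the log. */
void *alloc_array_logged(size_t count, size_t elem_size)
{
    assert(elem_size > 0);             /* a zero element size is a caller bug */

    if (count > SIZE_MAX / elem_size) {
        fprintf(stderr, "alloc_array_logged: %zu * %zu bytes overflows size_t\n",
                count, elem_size);
        return NULL;
    }

    size_t total = count * elem_size;
    void *p = malloc(total != 0 ? total : 1);  /* so NULL always means failure */
    if (p == NULL)
        fprintf(stderr, "alloc_array_logged: malloc of %zu bytes failed\n", total);
    return p;
}
```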

Getting back to the larger picture... it doesn't hurt anything to do consistent, exhaustive error checking. And by error checking I don't mean asserts - why the heck would you deliberately add further potential crash points to your app?!?!? - I mean catching the error, logging it, and failing gracefully. The resulting logging will help you fix problems much faster than just crashing would - and heck, that's assuming you've got a bug which is nice enough to merely cause a crash. And for end users, it's infinitely better - okay, so I clicked a button and nothing happened. That's a bug. I'll probably complain to the developer about that, if it actually prevents me getting what I want done. But I'm sure going to complain in a lot nicer tone than for a bug where clicking a button makes it crash, losing an hour's work.

There is admittedly some argument worth broaching w.r.t. self-confidence within your app after an ambiguous or unqualified error, but then that tends to be handled incidentally by things you should already be doing anyway, for orthogonal reasons: i.e. automatically making backups when saving documents, saving files atomically, etc. And users should have Time Machine backups and so forth. And so much stuff is on the network these days (meaning as simple as emailed around, not necessarily some fancy-pants hipster cloud hype) that it's far better to risk data corruption - which could be recoverable after all, anyway - vs certain data "corruption" in the sense of just crashing and losing everything.
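"Saving files atomically" usually means writing to a temporary file and then renaming it over the original. A minimal C sketch, assuming a POSIX system, where rename() replaces the target atomically. The function name save_atomically and the fixed ".tmp" suffix are assumptions for illustration; real code would more likely use mkstemp for a unique temporary name.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() and fsync() under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: replaces the file at path with data. The bytes go to
   a temporary file next to the target first; rename() then swaps it in, so a
   crash mid-write leaves the old file untouched.
   Returns 0 on success, -1 on failure (the reason is logged). */
int save_atomically(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL);

    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp) {
        fprintf(stderr, "save_atomically: path too long\n");
        return -1;
    }

    FILE *f = fopen(tmp, "wb");
    if (f == NULL) {
        fprintf(stderr, "save_atomically: cannot create %s: %s\n",
                tmp, strerror(errno));
        return -1;
    }

    int ok = fwrite(data, 1, len, f) == len;
    ok = ok && fflush(f) == 0;
    ok = ok && fsync(fileno(f)) == 0;  /* push the bytes to disk before the swap */
    if (fclose(f) != 0)
        ok = 0;
    if (!ok) {
        fprintf(stderr, "save_atomically: writing %s failed\n", tmp);
        remove(tmp);
        return -1;
    }

    if (rename(tmp, path) != 0) {
        fprintf(stderr, "save_atomically: cannot replace %s: %s\n",
                path, strerror(errno));
        remove(tmp);
        return -1;
    }
    return 0;
}
```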

As an addendum, I'll concede that the practical issue with not crashing is that you don't get as much feedback. Users are less enraged when you don't crash - though that's the point, after all - and so are less likely to actually report problems. The real issue in my mind is that there's just no infrastructure to automatically send back soft error reports. I'd be perfectly happy to have the apps I use silently send back (anonymous) error reports (after I okay it the first time). Or throw up a CrashReporter-like dialog saying "hey, yeah, you just clicked a button and it failed, don't worry, I noticed, please click one more button so I can tell my evidently flawed creator about it". Or run an app which simply monitors the console log and spools the output from each app back to its respective developer. Things like that'd be a good project for someone to work on. ;)