Leaky Abstractions and the Value of Fast Failure

Joel Spolsky is the author of Joel on Software, a popular blog about computer programming. He wrote a great post in 2002 about what he called “leaky abstractions.” Programmers think about abstraction a lot because it’s indispensable for managing the complexity of modern computers. Abstraction is what allows us to think about “the Internet” rather than a collection of millions of routers, switches, and servers, and about “files” rather than strings of ones and zeros etched onto the magnetic platters of our hard drives. Abstraction allows both users and programmers to ignore the mind-boggling complexity under the hoods of our computers, and instead deal with sparse, clean, intuitive interfaces.

The problem, Spolsky says, is that abstractions are leaky. He uses TCP, one half of the Internet’s basic TCP/IP protocol suite, to illustrate the problem:

Earlier for the sake of simplicity I told a little fib, and some of you have steam coming out of your ears by now because this fib is driving you crazy. I said that TCP guarantees that your message will arrive. It doesn’t, actually. If your pet snake has chewed through the network cable leading to your computer, and no IP packets can get through, then TCP can’t do anything about it and your message doesn’t arrive. If you were curt with the system administrators in your company and they punished you by plugging you into an overloaded hub, only some of your IP packets will get through, and TCP will work, but everything will be really slow.

This is what I call a leaky abstraction. TCP attempts to provide a complete abstraction of an underlying unreliable network, but sometimes, the network leaks through the abstraction and you feel the things that the abstraction can’t quite protect you from.
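To make the leak concrete, here is a minimal Python sketch of my own (not code from Spolsky's post). The function name `send_message` and its parameters are hypothetical. It uses only the standard `socket` module, and it shows that code written against the "reliable" TCP abstraction still has to handle the network failing underneath it:

```python
import socket


def send_message(host, port, payload, timeout=2.0):
    """Try to deliver payload over TCP.

    Returns True if the bytes were handed to an open connection, and
    False if the underlying network leaked through the abstraction.
    (Hypothetical helper, for illustration only.)
    """
    try:
        # create_connection raises OSError subclasses such as
        # ConnectionRefusedError or a timeout when the network misbehaves.
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(payload)
        return True
    except OSError as exc:
        # TCP promised reliable delivery, but it cannot hide a dead link,
        # an unreachable host, or a port nobody is listening on.
        print(f"TCP could not hide the failure: {exc!r}")
        return False
```

Even a True result only means the bytes were accepted by the local TCP stack, not that the other side has read them, which is one more place the abstraction leaks.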

One of the most important differences between novice programmers and experienced ones is that seasoned programmers understand the leakiness of the abstractions they use. They understand what’s going on “under the hood,” and this gives them a sense for how to use the tools they’re given effectively, and how to recover gracefully when those tools break. Novice programmers are like novice drivers in a stickshift car: because they don’t understand how the transmission works, they grind the gears. And novice programmers are blindsided when the abstraction breaks and they’re suddenly exposed to the messy underpinnings of the software stack.

One of the ways programmers deal with leaky abstractions is with fail-fast design. Fail-fast design says that when a component in a complex system encounters an error, it should immediately report the error and shut down, rather than trying to recover from the error and continuing. Fail-fast design helps programmers manage leaky abstractions because it gives them clear and immediate feedback that they’re using the abstraction wrong. That feedback should prompt the programmer to read the documentation more thoroughly, change the way he’s using the abstraction, or choose a different abstraction more suitable to the task. It’s frustrating to develop software for a system that’s not built in a fail-fast fashion, because a programmer might sink a ton of time into a program built on a poorly chosen abstraction before problems crop up.
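A small Python sketch may help show the difference. The two functions below are hypothetical examples of my own, not taken from any particular library: one quietly papers over bad input, and the other fails fast by reporting the problem the moment it sees it.

```python
def parse_port_lenient(text):
    """Not fail-fast: bad input silently becomes a default value."""
    try:
        return int(text)
    except ValueError:
        # The caller never learns the input was wrong, so the mistake
        # surfaces later, far from where it was made.
        return 8080


def parse_port_fail_fast(text):
    """Fail-fast: reject bad input immediately with a clear error."""
    port = int(text)  # raises ValueError right away on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port
```

With the lenient version, a typo like "80a" produces a program that runs but listens on the wrong port. With the fail-fast version, the same typo stops the program at once and the error message points directly at the bad value.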

Fail-fast is a useful strategy in fields far removed from software development. Here is an essay about how important it is for entrepreneurs to fail fast. And, of course, failing fast is closely related to failing cheaply, a concept I’ve discussed here in the past.
