Digging up old book reviews, I discovered that for some reason I wrote two reviews of To Engineer Is Human. Here is the second one. Or maybe the first.
When I worked in Silicon Valley, it seemed everyone wanted the title Software Architect. It’s a sexy-sounding title, maybe because in the movies all the cool people work in advertising or architecture firms. But in the real building world, while the architects get the glory, someone has to put the thing together and make sure it doesn’t fall apart.
Looking at other engineering fields may sound dull (that is why I majored in software engineering, after all), but reading a few books by Henry Petroski fixes that. The Evolution of Useful Things and Small Things Considered: Why There Is No Perfect Design are reminiscent of Donald Norman’s The Design of Everyday Things, but with the focus less on why everything is so badly designed and more on how we got there. Wonder why there seems to be a new hot scripting language every year? Look at how forks evolved from sharp sticks to two-tine prongs, and then the tines kept coming, along with endless variations of curvature and finish. Much of the impetus is fashion (and the Emily Post comments on silverware foreshadow the code style critics of today), but the five hundred variations of hammer design found just in nineteenth-century England are likely due to the fact that when you have a hammer, everything really does look like a nail. This view of engineering as evolution is consistent with Eric Raymond’s observation that open source projects tend to start as itches that need to be scratched.
The belated advent of industrial design, exemplified by the career of Raymond Loewy, has its counterpart in today’s GUIs. Starting with the Mac desktop interface, form followed aesthetics along with function. But these are just fun analogies. The book that gives me pause is Petroski’s To Engineer Is Human: The Role of Failure in Successful Design. I can’t think of the software equivalent of a metal fatigue crack that just keeps growing, since software (at least once you stop coding) is static. But, as with the Liberty Bell, I guess there is software that you use gingerly, knowing its frailty (click slowly and carefully!).
More obvious lessons come from examples like the Tacoma Narrows Bridge, which oscillated wildly and collapsed under high winds. (Watch those boundary conditions!) These are not only lessons learned, but lessons that should have been learned already: it has long been the practice for soldiers to stop marching in sync when they cross a bridge. And making expedient ad hoc design changes and material substitutions is just asking for trouble, as with the Kansas City Hyatt Regency disaster.
When there’s no long-established precedent, sometimes there are more recent clues of something amiss, as in the damaged O-rings in missions preceding the Challenger disaster. (There are plenty of low-level examples in software. There’s a race condition? Put in a sleep call. You’re getting an exception? Wrap an “ignore-errors” handler around it.) Ignore the clues at your own (or others’) risk. Oh, yeah, and when the engineers you trusted to build the thing are concerned, listen to them.
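For the record, here is roughly what not papering over those two clues can look like. This is only a sketch in Python under my own assumptions: the names `load_config`, `read_with_join`, and `parse_port` are made up for illustration and do not come from Petroski or from any real project.

```python
import threading


def load_config(result):
    # Hypothetical background task; it stands in for any worker thread.
    result["value"] = 42


def read_with_join():
    """Wait for the worker explicitly instead of guessing with time.sleep()."""
    result = {}
    worker = threading.Thread(target=load_config, args=(result,))
    worker.start()
    worker.join()  # blocks until the worker has finished; no timing guess
    return result["value"]


def parse_port(text):
    """Handle only the failure we expect; anything else still propagates."""
    try:
        return int(text)
    except ValueError:
        return None  # expected case: the text is not a number
```

The sleep call hides the race until the machine is slower than you guessed, and a blanket handler hides the exception until it is somebody else’s problem. Here `join()` removes the timing guess, and catching only `ValueError` means a surprise such as `parse_port(None)` still fails loudly with a `TypeError`.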
But often you’re building something new, maybe not completely new, but new enough that, test as you might, it fails in some unanticipated manner, like the Comet jet that inexplicably disintegrated. Then you literally have to pick up the pieces (in this case, in the Mediterranean), perform forensic analysis exceeding anything you’ve seen on CSI, and run test after test (with the Comet, submerging it and simulating thousands of pressurizations and depressurizations). With software, fortunately, we usually have the software intact and the input data to replicate the problem, although there are a few cases, like spacecraft gone loco, that would drive me to something less stressful, like heart surgery.
Petroski ends the book with a plea for transparency in the analysis of engineering failure. In game development, we have the excellent tradition of “post-mortem” articles written for Game Developer Magazine and archived on Gamasutra, but those are mostly for games that have actually shipped with enough success that we want to read about them. Not much on outright failure. And in the rest of the industry, nada. How about a book on notable software failures: To Program Is Human?