Previously

This is the fifth in a series about TDD’s value to humans, following

  1. Keeping my job, in which I lost my manager’s trust,
  2. A last-minute feature, in which I regained some,
  3. Deadline pressure, in which I earned quite a bit more than I’d started with, and
  4. Continual delivery, in which the circle of trust grew to include our Operations team

More value, less risk

The discipline of Test-Driven Development let us share the reasons for our confidence in our code with Operations. Their confidence afforded us the option of the discipline of frequent, regular releases, which we immediately exercised to deliver working, valuable code into production every month. In so doing, we were taking aim at two main goals:

  1. Minimize delay in delivering features to customers
  2. Control operational risk by shipping in smaller increments

Smaller increments meant we were always deploying soon, so we’d try a little harder to make deploying safe. It meant that by release time we could still mostly remember the implications of our recent-enough changes. It meant that the aspects of release procedure that were too expensive to automate (or exhaustively document) were somewhat fresh in Operations’ and our minds. It meant that in case of trouble, we wouldn’t have to worry about un-delivering a pile of new features customers had long been waiting for: we could just roll back, figure out what we missed, and deliver again soon.

Just once

The rollback option was mostly hypothetical, because we could almost always get confident about how our code would behave in production. But for one release, we had no way to test well enough to get that confidence. So instead we thoroughly tested the rollback procedure, highlighted the more-likely-than-usual risk that we’d need to exercise it, and deployed. When the service became heavily degraded, and it was clear we needed to roll back, it wasn’t difficult and nobody was surprised.

Inconclusion

Our track record, all told: one safely failed monthly deployment, dozens of safe successes. What about that one failure, though? How could we, with all our discipline, ever have wound up running production code we couldn’t confidently change?