Previously

This is the sixth in a series about TDD’s value to humans, following the earlier installments.

Cosmic justice

TDD had been helping us build things right. By enabling our new monthly delivery cadence, it had just begun helping us make more impactful choices about what to build. So by some conservation law or other, that’s precisely when a large, non-optional project landed atop our backlog with a hard deadline that was possibly impossible.

Why us?

Our first reaction was to inquire why this project, which was not a straightforward and harmonious logical extension of our product’s capabilities, had been assigned to our team. The response, reasonably, was that the other options under consideration were less straightforward and harmonious. (Unstated, but probably also a factor, was that we had proven ourselves an effective and reliable team.)

Why not us?

Our second reaction was to quickly determine whether the deadline was obviously impossible. Delivering the project outcome required that our storage backend scale to several orders of magnitude more data, along with a probable two- or threefold increase in concurrent network access. That these were achievable was hardly a given: our storage backend was SQLite, which wasn’t designed for this usage pattern. If we couldn’t deliver the outcome without switching backends, then we’d know up front we couldn’t hit the deadline, and we’d be able to give the business almost all the time on the clock to pursue other options. So we spiked it. The result: it’d be mostly fine, and for several smaller bottlenecks there were effective actions we’d be able to take.

Test-driven planning

Our third reaction was to slowly and painstakingly make arrangements to find out ASAP along the way if we ever slipped into danger of missing the deadline. Our product planning and prioritizing had been haphazard and unstructured, but that hadn’t quite become our biggest problem. Now it clearly was. My teammate Nathan walked us through the reasoning:

  • We have a hard deadline in eight months.
  • We’ll be working on this project, and only this project, until it’s done.
  • We need early agreement from stakeholders that this is what they want, because we can’t afford to waste effort.
  • We need early agreement from the upstream DBA team to work to our schedule, because we can’t afford to be late.

“And that’s why we’re going to make a project backlog,” said Nathan — an ordered list of the fewest possible stories that’d get us to the project outcome, with the simplest possible estimates attached to each story, so that we could do the simplest possible math to get from the deadline to the velocity we needed. “Then we’ll start completing stories, observe the velocity we have, and decide whether that’s okay. It might be okay. We’ll find out soon enough.”
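
That math really was as simple as advertised. With invented numbers (the real point totals aren’t in this post), it amounted to:

$ # Say the backlog adds up to 170 points and there are 34 weeks on the clock:
$ echo $((170 / 34))    # prints 5: the velocity we'd need, in points per week
$ # A few weeks in, compare against what actually got done:
$ echo $((18 / 3))      # prints 6: 18 points in 3 weeks, comfortably above 5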

The first time we checked, a few weeks in, our velocity was fine. A couple months later, still fine.

Stick to the plan

Meanwhile, we’d been continuing with our usual monthly production deployments, even though the expected business value most months was low, because the expected risk mitigation was high. We were regularly making larger-than-usual internal changes, and it helped everyone involved in the project to see pretty often that we still hadn’t broken anything.

Extending our identity-management system to manage this archaic database system’s not-exactly-identities, with their very different behaviors and business rules, necessitated a significant set of refactorings (afforded by our previous commitment to TDD, of course) and a new network service. For that service, we designed the API we needed from the DBA team and provided them a matching set of red tests, then mocked the API and test-drove against the mock. When they came back rather later asking whether they could provide a different API, we said no, with two whole legs to stand on:

  1. See our test results here? With the API as previously defined, our system is ready.
  2. See our project burndown here? With the deadline as previously defined, our system had better be ready.

Our discussion turned promptly to the red API tests we’d provided and how we could help them get to green. Which they eventually, not a moment too soon, did.
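
For a concrete picture of what those handed-over red tests were doing (everything below is invented, since the post doesn’t describe the real API, protocol, or test harness), each one exercised a piece of the agreed-upon contract and failed until the DBA team’s service honored it. In the meantime, our side test-drove against a mock that satisfied the same assertions.

$ cat t/dba-api-contract.sh
#!/bin/sh
# Hypothetical contract test: creating a not-exactly-identity through the
# agreed-upon API should succeed and return an id. Stays red until the real
# service behaves as designed.
base='https://dba-api.example.internal'

if curl -sf -d 'name=contract-check' "${base}/identities" | grep -q 'id'; then
    echo 'ok - create returns an id'
else
    echo 'not ok - create returns an id'
    exit 1
fi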

Change the plan

The third time we checked our velocity, thanks in part to the API kerfuffle, it had dropped a bunch. Not fine. So in the month leading up to the deadline, we made two adjustments:

  1. We de-scoped the last few weeks of stories (which everyone had agreed early on would be okay to leave out).
  2. We gritted our teeth and temporarily tolerated the API having been implemented in production only.

If we ever wanted to change it or the code that consumed it, we’d want a parallel version in non-prod. But DBA was having enough trouble meeting their commitment to us. We came up with a plan to safely end-to-end-test in production during weekend work windows, with the new capabilities turned off for all users except us. It took two Friday-into-Saturday all-nighters, but we got it tested well enough.
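
As for keeping those capabilities off for everyone but us, a gate like that can be sketched in a few lines. Purely as an illustration (the path, tool name, and flag below are all invented; the post doesn’t say how the switch actually worked), the idea is an allowlist checked before any new code path runs:

$ cat gate-new-capability.sh
#!/bin/sh
# Hypothetical gate: only users on the allowlist exercise the new capability
# during the weekend test windows; everyone else gets the existing behavior.
allowlist=/etc/ourapp/new-capability-users

if grep -qx "${USER}" "${allowlist}" 2>/dev/null; then
    exec ourapp --enable-new-capability "$@"
else
    exec ourapp "$@"
fi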

Mission accomplished

On the Friday of the deadline, we completed the release into production for all users, fulfilling the purpose of the project, with several whole minutes to spare before midnight.

After midnight and a bit of Nathan’s favorite single malt, we wrote down a backlog of all the corners that had been cut along the way and needed repairing, starting with getting our hands on a non-production instance of DBA’s new API, pronto. We sent the list to our management asking for their backing. Despite repeated pleas, nothing ever came of it — possibly because they wound up leaving the company within the year. And that’s how it came to pass that we wound up running some production code we couldn’t confidently change.

While it was disappointing that our management was unwilling to support or match our level of discipline, we’re still awfully glad we did it our way — and had they cared to notice, they might have been glad too.

Because of our commitment to TDD…

Our domain knowledge was easily shared among new team members. Because of our shared knowledge, we thought of all the right stories to include and defined them well.

Our development flow had become fairly smooth by the time this project landed on us. Because our flow was generally smooth, point estimates led us to useful predictions.

Every completed story was a shippable increment. Because we had the freedom to ship every month, we exercised that freedom to buy production-quality certainty in our progress.

Our just-enough-process plan actually worked. We made meaningful hypotheses and drew informed conclusions that enabled us to adjust course, stay on schedule, and deliver an eight-month project with confidence.

Conclusion

We continued to expand the scope and impact of our disciplined way of working. Nathan had planted in our team the seed of Scrum. No standups, no iterations, no retros; nothing more than an ordered backlog with points. Just enough to solve the problem at hand.

Given how the project went, you can have exactly one guess whether that seed grew.

Posted Thu May 21 02:19:58 2015

Previously

This is the fifth in a series about TDD’s value to humans, following the earlier installments.

More value, less risk

The discipline of Test-Driven Development let us share the reasons for our confidence in our code with Operations. Their confidence afforded us the option of the discipline of frequent, regular releases, which we immediately exercised to deliver working, valuable code into production every month. In so doing, we were taking aim at two main goals:

  1. Minimize delay in delivering features to customers
  2. Control operational risk by shipping in smaller increments

Smaller increments meant we were always deploying soon, so we’d try a little harder to make deploying safe. It meant that by release time we could still mostly remember the implications of our recent-enough changes. It meant that the aspects of release procedure that were too expensive to automate (or exhaustively document) were somewhat fresh in Operations’ and our minds. It meant that in case of trouble, we wouldn’t have to worry about un-delivering a pile of new features customers had long been waiting for: we could just roll back, figure out what we missed, and deliver again soon.

Just once

The rollback option was mostly hypothetical, because we could almost always get confident about how our code would behave in production. But for one release, we had no way to test well enough to get that confidence. So instead we thoroughly tested the rollback procedure, highlighted the more-likely-than-usual risk that we’d need to exercise it, and deployed. When the service became heavily degraded, and it was clear we needed to roll back, it wasn’t difficult and nobody was surprised.

Inconclusion

Our track record, all told: one safely failed monthly deployment, dozens of safe successes. What about that one failure, though? How could we, with all our discipline, ever have wound up running production code we couldn’t confidently change?

Posted Wed May 13 22:04:14 2015

Previously

This is the fourth in a series about TDD’s value to humans, following the earlier installments.

Same idea, next level up

As a developer, I’d significantly reduced the waste and risk in my work by incrementally bringing our legacy code under test while test-driving new features (and bug fixes). As a newly minted product manager with a small budget and high customer expectations, if I wanted to maximize my tiny team’s business impact, I needed to find our next big waste-and-risk targets and bring them down.

Conveniently, they were the same target. We’d been releasing into production roughly every 3-6 months. That meant waste: customers had to wait a long time to get any value from our work. And it meant risk: we had to worry about a whole lot of haven’t-thought-about-those-in-a-while changes going into production all at once.

Doctor, it hurts when I do this

How had we settled on this release (in)frequency? Because originally, deciding when to release hadn’t been easy. On our end, having had no immediate feedback about the state of our code, we lacked ongoing confidence in it, so we had to steel ourselves and others for any change, so we released only when someone wanted the desired value badly enough. (Thereby perpetuating the cycle? Naturally.) On Operations’ end, supporting many products, their job was to review all proposed changes to production and reject unsafe ones, and they’d learned to treat even safe-looking changes as unsafe until proven otherwise. They’d set the bar high, with good reason.

If it hurts, do it more often

But by giving ourselves an immediate feedback mechanism, our part of the decision had become easy. As far as we were concerned, the meaningful and thorough automated tests we’d built put us well over that bar. At each desired release, our thousands of green tests (and zero red ones) covered not only the new functionality but also everything that had ever been tested before. Which, by now, was rather a lot. Of course we were confident to ship.

Therefore, in the interest of minimizing branch-management effort, lead time, and release-process forgetfulness, we asked Operations for special permission not to have to freeze our code a week in advance, but to ship everything that had been done up to and including release day. Releases were on Friday evenings, and Ops needed a few hours to prep what we gave them, so that meant we had till early afternoon to get in whatever we could get in. (Given the time constraint, not a good day to do something complex or risky, but perfectly fine to try to squeeze in something simple and valuable.)

Operations saw that we were confident, and they saw our test results, and they were tentatively approving our non-standard releases — until one release day, after a teammate had pulled an all-nighter to get one last piece in, when Ops decided at the last minute that the release was no longer acceptable. Not because of anything my teammate had or hadn’t done, but because I hadn’t added a couple lines of documentation about something minor to the Ops wiki in advance of the release procedure. I’d been planning to make those tiny edits during the several Friday-evening hours I’d be on the phone with Ops alternately guiding them through the release and idle-waiting for the next step in our procedure.

If it hurts a lot, maybe take a step back

There would’ve been plenty of time. But they chose to cancel the release. As was their right. Mine, in turn, was to react by huffing and puffing about the spurious reasoning for the last-minute change of plan and the evident disrespect for my teammate’s effort. Predictable lot of good that did!

When I’d gotten that out of my system at everyone else’s expense, I had the whole weekend to think quietly about my actions before, during, and after, and about why Operations might have felt the need to choose as they did. They did not, by default, trust engineering teams to deliver with safety. They could not. It was their job to be skeptical, and on the rare occasions they hadn’t been sufficiently so, they’d been hurt. And here I was asking them to trust us not only to deliver with safety, but to let us skip a very visible part of the review process. I was asking them to let us deliver whatever we — not they — decided was ready, right up to the last minute, but I hadn’t shown them why they ought to, certainly not in terms that made sense to them. So they were likely inclined to feel that I was asking them to be negligent in their roles, and they were likely inclined to perceive any small failing of documentation as a red flag indicating sloppiness elsewhere. Now that I saw where they might be coming from, I couldn’t blame them. I wouldn’t have wanted to be negligent either.

Working together for shared understanding

Because I had retrospected, by the time we got together to discuss what had happened, I knew what I wanted to try. I apologized for my outburst and asked Luke, the Operations manager, if he’d be willing to pair with me to develop a new feature sometime. To my relief, and his great credit, he agreed.

It wasn’t long before we had our chance. Not unlike in A last-minute feature, a small feature request came in on a Wednesday, hoping to be delivered as part of the release scheduled for that Friday. For any other Engineering team, the rule about freezing a week in advance would have made this a non-starter. But this was exactly the kind of (apparently somewhat common) situation we wanted to handle with agility. And it was a small, simple set of new behaviors, perfect for pairing with someone who didn’t know our code but knew our system very well.

Luke and I got together Thursday in a conference room. I had set him up with access to our source code repositories and we started by running the tests. We broke something obvious and saw that some tests went red. Then we reverted to green and talked through what the new feature would look like when it was working right. When we had a shared understanding, I talked him through writing the equivalent test assertions. He ran them; red as expected. I talked him through how to change the code. He ran the new tests again; green as expected.

“That’s the whole feature. Cool, right?” I asked.

“Yeah, sure, kinda,” Luke allowed.

An unexpected twist

“Okay, before you check it in, let’s run the whole test suite.”

Whoa. Some other tests, seemingly unrelated, were now red. Hadn’t seen that coming. A smile came over Luke’s face. “Now I get it,” he said. “This problem can never happen in production. We found the problem before we ever checked it in.” Exactly! Same for everything else we’ve got tests for. (And, boy howdy, have we got a lot of tests.)

After quickly fixing the fallout, Luke committed his new feature. It went into production on the following evening.
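
The shape of what Luke wrote was roughly the following. Every name here is invented (the post doesn’t say what the real feature, tool, or test framework looked like); the point is only the rhythm: an assertion that starts out red, goes green with the smallest code change, and then runs alongside every other test we’d ever written.

$ cat t/notify-on-create.sh
#!/bin/sh
# Hypothetical assertion for a hypothetical feature: "create" should accept
# a new --notify flag and exit zero. Red before the code change, green after.
if ourtool create --notify someuser >/dev/null 2>&1; then
    echo 'ok - create accepts --notify'
else
    echo 'not ok - create accepts --notify'
    exit 1
fi

Run by itself, a test like that proves only the new behavior; running the whole suite afterward is what catches the seemingly unrelated breakage before it ever gets checked in.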

Conclusion

For as long as Luke was in that role, we had no further difficulty delivering just-in-time releases. And because we knew the sailing would be smooth, we began releasing once a month, wasting less and risking less — just as I’d hoped. More often was too often (busy work, lost Friday nights); less often wasn’t often enough (Eng and Ops both out of practice at release procedure); once a month was just right.

While continuous delivery was prohibitively difficult in our tightly controlled environment, we managed to achieve continual delivery on a cadence that met everyone’s needs. For Operations, we controlled the inherent risk of change by doing it in smaller batches, giving us less to roll back and binary-search through in case of trouble. For customers, we could honestly offer the possibility of helping them as soon as next month, limited only by our own constraints (priorities, WIP, nature of the request). And for ourselves, since we’d always be delivering to production soon, we could see more clearly the value and relevance of our work, and hope for more meaningful customer feedback about it.

TDD gave me an opening. Luke’s willingness and ability to collaborate (and to forgive my outburst) allowed us to walk through it together, making a sizable impact on the business we both served — and on me.

Posted Wed May 6 16:10:06 2015

Do you write on your own site?

Are you involved in software development in any way? Are you already publishing your writing on a website you control? Me too. This post is based on my personal experience writing on this site (and administering it) since 1999. The word “blog” was apparently coined around the same time, but I hadn’t heard of it yet. I was calling this thing here an “online journal.” Quaint.

How’s that working for you?

When you’ve written a new piece, how do you publish it? Maybe by clicking a “Publish” button in the admin interface of the production instance of your CMS?

When you’ve written a final draft, how do you preview it? Do you have a non-production instance of your CMS? Probably not. Maybe you click “Publish” in a different mode that tries to avoid making the post visible, or maybe you edit in a WYSIWYG control that tries to render almost (but not exactly) the same as publishing would?

When you’ve written a first draft, how do you edit it? How do you see what you changed from one draft to the next? If you try some new wording, click “Save”, and decide it isn’t better, how do you go back? Can you go back?

When your network connection drops or your browser crashes, do you lose your writing in progress? Has either of these happened to you yet? How do you imagine you’ll feel when they do?

When your CMS frequently has security holes (such as WordPress has just had, again), how large is the attack surface? How easy is it to apply the latest patch? How often is it necessary to patch the latest bug? Until you patch, how much of your work might you lose?

Even if your CMS doesn’t have security holes frequently, how recently have you made an offline backup? How recently have you test-restored from it? Are both steps easy to perform, or are they extracurricular activities you might skip?

“If you don’t test restores from your backups, you do not have backups.” —@garybernhardt

Do you feel safe about the continued existence of the words you’ve worked so hard to write? I ask because of how I felt when I was still running a traditional dynamic CMS.

Another approach

To have more satisfying answers to some of these questions, one idea is Maciej Cegłowski’s: run your CMS on a non-public address, crawl it to create a static cache, and publish the cache. But this is hacky. It complicates publishing, generates some broken internal links (at least on Maciej’s own site), and doesn’t solve versioning or backups — which is to say, restores.

Since static content is such a good idea, surely there must be CMSes designed to generate it? Two popular choices, Jekyll and its cousin Octopress, are topics of this weekend’s JekyllConf. Static site generators typically take away the heavyweight database and the entire dynamic-web-app attack surface, in exchange for which they give you the safety of writing offline, the power of revision control, and the ease of git push to publish. If you value those things, an SSG is a bargain. But in that bargain you’re also necessarily giving up a web admin interface, a searchable local index, and reader comments being stored somewhere you control.

That’s why I prefer a CMS that’s mostly a static site generator. Most of my sites don’t enable the web admin interface, but many of them are searchable, and this one allows comments. If you like any of the following neat shell tricks I can do now that I’ve switched schmonz.com to ikiwiki, feel free to comment. It’ll turn into a git commit, of course; next time I git pull I’ll have a local backup; and I’m sure I’ll make that backup, because I won’t be able to git push another post until I do. ;-)

Neat ikiwiki tricks

Naturally, these can be encapsulated in shell functions or small standalone scripts.

List drafts committed after the ‘textpattern’ tag

$ git diff --stat textpattern unpublished | awk '{print $1}' | grep '\.mdwn'

Write a new post with my permalink style

$ _TODAY=$(date '+%Y/%m/%d') && mkdir -p ${_TODAY} && vi ${_TODAY}/post-name.mdwn

Write a new comment offline, using $EDITOR

$ ikiwiki-comment ${_TODAY}/post-name.mdwn

List the most recent 12 posts

$ find . -type f -name '*.mdwn' | egrep '^\./[0-9]+\/' | sort -rn | head -12

List the most recent 12 comments on any posts

$ find . -type f -name '*._comment' | xargs stat -f '%Y%m%d %N' | sort -rn | head -12 | awk '{print $2}'

List all the comments on a given post

$ find ${_TODAY} -name '*._comment' | sort -rn

List all posts containing podcast-style enclosures

$ find . -type f | xargs grep -l 'meta enclosure=' | sort -rn

More possibilities

  • List posts which have been edited since being posted, and show the changes
  • Replace all occurrences in all files of XXX with YYY, review the changes, commit, be able to revert if needed (I’ve done a few of these; see the sketch after this list)
  • Your ideas?
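
One way at the replace-XXX-with-YYY idea, using perl for the in-place edit (XXX and YYY are the placeholders from the list above; substitute real strings, and work on a clean tree so git supplies both the review and the undo):

$ git grep -lF 'XXX' | xargs perl -pi -e 's/XXX/YYY/g'
$ git diff                      # review every change
$ git commit -am 'Replace XXX with YYY'
$ # ...or, if the diff looks wrong, put everything back:
$ git checkout -- .
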
Posted Wed Apr 29 00:28:52 2015

Previously

This is the third in a series about TDD’s value to humans, following the earlier installments.

Learning the domain

The software I was hired to develop acted as user-facing glue to several non-user-facing infrastructural systems in a highly complex environment. On arrival, I didn’t see the whole picture, and knew I couldn’t understand the implications of changing any given bit of code. Whether or not I tried to consider the systems outside ours, I didn’t understand what our software was aiming to do. The Unix command-line tool’s “create” subcommand would have made sense, except that there was also an “insert” which was slightly different. I thought I grokked “update” until I noticed other subcommands for specific kinds of updates. And I was about to believe I had a handle on “assert” when I discovered that “accuse” was a way to speed it up. Yikes!

To (be able to) go fast, (be able to) go well

Making things worse, the code I and my lone teammate had inherited was freshly minted legacy code. If we’d been instructed to go fast, we’d have made dangerous mistakes, and we still wouldn’t be going anywhere near fast enough to make those mistakes tolerable. If we’d been instructed to go safely, we wouldn’t have been able to deliver much, plus some dangerous mistakes would still have slipped through. If we wanted to move quickly and safely, we needed to make a concerted effort to change the properties of our code. In A last-minute feature, we did, and it began to pay off.

Now please go fast

The big refactoring-etc. in that post had been driven in part by the anticipated business need for a graphical UI for non-Unix users, which in turn derived from the demonstrated business need to manage identities for non-Unix platforms in the same way our successful proof of concept was managing them for Unix. For reasons I may never have been privy to, and certainly don’t remember, the need to manage mainframe identities congealed into a hard deadline.

We had 1.5 weeks.

The code was built on Unix- and Kerberos-specific assumptions scattered throughout — some of them hard-coded, some of them invisible, and as yet relatively few of them documented by any kind of automated tests. Discovering and reacting to all those assumptions by building support for our first additional platform, while having it be an unfamiliar platform with its own weird yet-to-be-discovered rules, appeared rather unlikely to be doable at all, and exceedingly unlikely to be doable safely.

And don’t screw up

But maybe. And we had to try. So I notified my teammate and our manager that in order to avoid office distractions and enable strategic napping, I’d be working from home, around the clock, until stated otherwise. At midnight, I emailed them a summary of the day’s progress and my tasks for the following day that, if completed, would prevent the deadline from becoming obviously un-meetable. Any given day, things might not have gone according to plan, and that would’ve been the end of that; but every night, I kept being able to report that they had, along with what had to happen the next day to keep the string going. A week later, with a couple days to go, we were still — against all reasonable expectation for a schedule with zero slack — on track.

The next day, I went into the office to knowledge-share with my teammate, who had researched the important platform-specific behavior we’d need to implement. I took my understanding, turned it into unit tests, and asked Robert (as I’d asked Bill about that last-minute feature) to validate that they were testing for the desired behavior and only the desired behavior. He caught and fixed a few mistakes in the assertions. Then I went off and made the tests pass.

What size shoe, and when?

That night, our release scheduled for the following evening, I slept deeply. We’d done everything in our power. We would go into production with a credible effort that might prove good enough. Even so, I was worried that it might not. I knew many of my refactorings had outstripped our still limited test coverage, for which I attempted to compensate by carefully making one small mechanical change at a time, reasoning about what to manually check, and doing lots of manual checking. I was pretty sure the system as a whole wasn’t fundamentally broken, but I was also pretty sure it wouldn’t be thoroughly fine. When we got to production, there was no question whether the other shoe would drop. We hadn’t worked with sufficient discipline to avoid it.

A small shoe, immediately

During the course of the Operations team’s usual post-release system checks, they found a big regression: one of my less rigorous refactorings had broken a common usage in an important third-party client I’d forgotten to check. I immediately recalled a smell I’d ignored while doing that refactoring and knew an expedient way to fix the problem. A half hour later, our updated release indeed fixed the regression, and I’d made myself a note to add a few integration tests with that client.

A big shoe, later

I felt certain we hadn’t seen the last of the fallout. Some months later it hit us. I don’t remember how we found it, but there was one remaining invisible assumption in the code that hadn’t been flushed out and fixed, and it was a potentially very bad one: a SQL DELETE statement with an insufficiently constraining WHERE clause. Not only had we forced Ops to scour several months of application logs and database backups for signs of wrongly deleted data, but also we’d forced the business to operate for several months unknowingly without the data integrity they were relying on. Against all reasonable expectation, again, the damage was very low.
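
The shape of that mistake, in a contrived illustration (table, column, and data are invented, and even reaching the database through the sqlite3 shell is an assumption here):

$ # What the code should have issued: delete one identity's record for one
$ # platform only.
$ sqlite3 ourapp.db "DELETE FROM identities WHERE name = 'someuser' AND platform = 'mainframe'"
$ # What it effectively issued, with a WHERE clause that didn't constrain
$ # enough: every platform's record for that identity, gone.
$ sqlite3 ourapp.db "DELETE FROM identities WHERE name = 'someuser'"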

Conclusion

It proved to have been crucial that, by the time this deadline popped up, we’d already greatly improved the health of our code and the extent of our tests and knowledge. If either had been any worse, we’d have missed the deadline, made more costly mistakes, or both. Instead I was able to test-drive sometimes, manually test fairly effectively, and tolerate — as a time-boxed exception — working mostly alone at an unsustainable pace with an unsustainable lack of discipline. And because we had made ourselves able to do those things, we delivered what we understood the business to need, when they needed it.

But only because we lucked out. The unseen, unmanaged risk we incurred by working in this way, even for a short time, could have come back to bite us much harder than it did; since it happened not to, I earned more trust. When I was given responsibility for managing the product half a year later, I was told that this particular outcome had been the one that sealed the deal. In other words, the career opportunity to learn about software product development from an entirely different perspective came as a direct result of having chosen to pursue TDD. And when I landed in that role, I looked for ways to manage risk without having to be so lucky.

Test-driven ways.

Posted Thu Apr 23 11:41:30 2015