I’ve been running a public web server since 1999, when my employer
registered schmonz.com for me as a gag gift. Last week, I
learned via Twitter
that in German, “Schmonz” means something akin to “bullshit”. That’s not
what my employer had meant by it; nonetheless, I consider that my
incessant blogging has acquired a fine new patina of significance.
As I recall, when I was first looking for web server software, there was not a wide variety to choose from. Apache was popular and featureful, a safe default choice. As a novice programmer, I was very much taken with the idea of building dynamic sites, and Apache offered many ways to go about that. Done deal.
In the intervening years, my server machine has changed several times, from Macintosh IIci to Mini-ITX box to Mac Mini to Xen virtual private server. (I’m particularly fond of the present arrangement wherein hardware is someone else’s problem and I continue to have root access.) No matter the system architecture, the OS has always been NetBSD, which remains unobtrusively thrilling, and the web server has always been Apache, which has gradually become more noisome.
Between my own sites and those of friends I’ve hosted, I’ve needed many times to adapt my Apache configuration to accommodate changes in external modules (such as mod_php), to interfaces (such as PHP via FastCGI instead), and within Apache itself (such as basic access control). Each time I forcibly revisited my config, I found myself revisiting my discomfort with its complexity. I never felt sure that I understood exactly, in its entirety, what my Apache installation would and wouldn’t do. And as a result of years of entanglement and unclarity, I never saw a way to give my users full administrative control over their own sites.
I’ve been imagining moving off Apache for a while. But it always seemed like a project, so I never did anything about it. I can’t usually afford to start on something unless I know I’m going to be able to stop soon, and I won’t usually want to stop unless I know how I can easily start next time. That leaves me needing a sequence of small-enough steps in my desired direction. Or, more precisely, two expectations: that at least one such sequence exists, and that I’ll be able to discover one as I go.
Conveniently, I’ve had plenty of professional practice at incremental problem-solving, enough to identify my first few steps and start making progress. Here’s the rest of the sequence, naming the refactorings I’ve found along the way.
Step 1: Extract Virtual Host
I wanted to see what I’d learn by persuading one site to become its own
self-contained thing running its own Apache instance. I picked a
relatively basic site, told the system Apache to reverse-proxy that
virtual host, added just enough configuration to start a site-specific
localhost, verified that as far as I could discern the site
worked equally well, and cut over to the new configuration.
Inserting a proxy usually means, at the very least, server logs start
reporting requests coming from the proxy’s IP rather than the browser’s.
For this to be a refactoring, the system Apache needed to send an
X-Forwarded-For header (it does so automatically), and the site-specific
Apache needed to know to look for it (by enabling the appropriate
bundled module).
Manually starting an instance of a service usually means the system
won’t automatically know how to do the same next time it boots up. For
this to be a refactoring, I needed to add an entry to the site owner’s
crontab. To validate that the site would continue to be served by its
own Apache as well as it’d been served the old way, I rebooted the
system. The site stayed up.
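In Apache terms, the system instance's side of the extraction looks roughly like this. The hostname and port here are hypothetical, and this sketch assumes mod_proxy and mod_proxy_http are loaded:

```apache
# System Apache: hand one virtual host off to its own
# site-specific instance listening on localhost.
<VirtualHost *:80>
    ServerName example.schmonz.com
    ProxyPass        / http://127.0.0.1:8001/
    ProxyPassReverse / http://127.0.0.1:8001/
</VirtualHost>
```

mod_proxy_http adds the X-Forwarded-For header on its own; the site-specific instance just has to be told to trust it.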
Step 2: Extract More Virtual Hosts
Good, because there were 17 more sites to go. Each of them would also be
listening on its own non-standard port on
localhost. To identify them
at a glance in
netstat, I added the port to
/etc/services. Now I had
a pattern worth repeating.
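The per-site bookkeeping is small. Service names, ports, and paths below are hypothetical:

```
# /etc/services: one line per extracted site, so netstat output
# is self-describing
example-www     8001/tcp   # example.schmonz.com
another-www     8002/tcp   # another.example.org
```

And the site owner's crontab entry that survives a reboot (path to the per-site config is a guess):

```
@reboot /usr/pkg/sbin/httpd -f /home/example/etc/httpd.conf
```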
Some sites were more complex than others (PHP, language negotiation, other wrinkles), but I didn’t need to invent their configurations from scratch, merely uncover the tiny portions of the existing giant config that were relevant and copy them over.
Near the end, I couldn’t start new Apache instances without increasing
some kernel IPC parameters (
kern.ipc.msgmni from 40 to 80,
kern.ipc.semmni from 10 to 20). This felt like a small backward step.
I hoped to be able to undo it later.
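Concretely, the backward step was a couple of lines in the kernel's tunables, persisted so they survive reboot:

```
# /etc/sysctl.conf (NetBSD): raise SysV IPC limits the extra
# Apache instances were bumping into
kern.ipc.msgmni=80
kern.ipc.semmni=20
```

(The same values can be applied immediately with `sysctl -w`.)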
It also might have felt like a small step backward to suddenly have lots more instances of Apache. But it was a large step forward in my understanding.
Step 3: Remove Dependency (on Apache Modules)
En route to that understanding, I was fairly sure I’d reduced the system
Apache to a single responsibility: being a reverse HTTP proxy. To
validate that it was no longer serving any other purpose, I turned off
LoadModule directives — even the typical and enabled-by-default
ones — leaving only those that prevented Apache from running when I
tried turning them off.
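The end state of that trimming might look something like the sketch below. Module paths are hypothetical, and the exact survivor list depends on the build; the point is that only the reverse-proxy modules refused to be turned off:

```apache
# Survivors: Apache wouldn't start as a reverse proxy without these.
LoadModule proxy_module        lib/httpd/mod_proxy.so
LoadModule proxy_http_module   lib/httpd/mod_proxy_http.so
# Everything else, even the usual defaults, commented out:
#LoadModule autoindex_module   lib/httpd/mod_autoindex.so
#LoadModule userdir_module     lib/httpd/mod_userdir.so
```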
Step 4: Substitute Apache with Bozohttpd
I’d been hoping to replace Apache with
bozohttpd. Now that I had
small, explicit per-site configurations, I could try converting one. The
site worked, but the logs were missing lots of basic information. I
still think this is where I want to go, but since it’s not a
refactoring, I can’t go there yet.
Step 5: Substitute Apache with Lighttpd
I tried converting the same site from Apache to
lighttpd, which is a
little more featureful than
bozohttpd. The site worked, and with
access logging enabled, its server logs were indistinguishable from
Apache’s. I gzipped the now-retired Apache config to prevent it from
being used by mistake while keeping it for reference, updated the
crontab entry to start Lighttpd instead of Apache, and moved on.
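A site-specific lighttpd config in this shape might look like the following. Port, paths, and the trusted-proxy list are hypothetical; mod_accesslog and mod_extforward are the stock lighttpd modules for Apache-style logging and for honoring X-Forwarded-For from a named proxy:

```
server.modules       += ( "mod_accesslog", "mod_extforward" )
server.port           = 8001
server.bind           = "127.0.0.1"
server.document-root  = "/home/example/public_html"

accesslog.filename    = "/home/example/logs/access.log"
# Trust X-Forwarded-For only from the system reverse proxy
extforward.forwarder  = ( "127.0.0.1" => "trust" )
```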
Step 6: Substitute More Apaches with Lighttpd
I converted a bunch more sites. After doing a few, I figured out how to extract shared configuration. Simpler sites have extremely short config files (just a few lines). More complex sites only define what’s unusual about them.
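With the shared configuration extracted, a simple site's entire config can shrink to something like this sketch (variable name and paths are hypothetical; lighttpd's `var.*` and `include` do the work):

```
# The whole config for a simple site: shared defaults,
# plus only what's unusual here.
var.site    = "example.schmonz.com"
include "/usr/pkg/etc/lighttpd/shared.conf"
server.port = 8003
```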
Step 7: Remove Dependency (on Apache PHP FastCGI)
With a few Apache-powered sites left to convert, I was pretty sure none
of them was using PHP. To test this hypothesis, I stopped the PHP
FastCGI service. After a week, with nothing broken, I uninstalled it.
With only a few Apache-powered sites remaining, could I return kernel IPC parameters to their default values? Yes, all the Lighttpd and Apache sites ran just fine that way.
Step 8: Get Married
Getting married is the opposite of a refactoring. There’s no internal change, but many callers have new expectations.
Step 9: Substitute Remaining Apaches with Lighttpd
I expected three sites to be relatively tricky to convert:
- theschleiers.com needed language negotiation to provide English or German content. I didn’t want to futz with it until there was clearly no longer any urgent need for information about the wedding.
- agilein3minut.es needed SSL, which I wasn’t sure whether to proxy at all. Turned out to be easy to proxy because it’s the only HTTPS site I host at present, and it looks like it might continue to not be a big deal if and when I host more.
- schmonz.com needed fancy URL rewriting for compatibility with the site’s previous incarnation. I assumed it was going to, anyway. I wound up being able to translate most of its Apache mod_rewrite config to Lighttpd’s expressive conditional redirects, and needed hardly any special-snowflake cleverness.
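As a flavor of what that translation looks like (URL patterns here are invented for illustration, not schmonz.com's actual rules), an Apache rewrite and its lighttpd counterpart, using mod_redirect:

```
# Apache: RewriteRule ^/weblog/(.*)$ /blog/$1 [R=301,L]
$HTTP["host"] == "schmonz.com" {
    url.redirect = ( "^/weblog/(.*)$" => "/blog/$1" )
}
```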
Once they were converted, there were zero remaining Apache-powered sites.
Step 10: Substitute Apache with Pound
A single Apache instance remained: the system one that was nothing but a reverse proxy to a bunch of Lighttpd instances.
Had I known that’d be its only job, I’d have chosen software designed for the purpose. I knew that now, and chose Pound. On a non-standard port, I figured out how to express a few sites’ worth of reverse proxying in Pound’s configuration language, continued until I’d translated everything in the Apache config, stopped Apache, and started Pound.
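For one site's worth of Pound configuration, the shape is roughly this (hostname and backend port hypothetical):

```
ListenHTTP
    Address 0.0.0.0
    Port    80

    Service
        HeadRequire "Host:.*example\.schmonz\.com.*"
        BackEnd
            Address 127.0.0.1
            Port    8001
        End
    End
End
```

One Service block per virtual host, matched on the Host header, each pointing at its site's Lighttpd.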
Step 11: Remove Dependency (on Apache)
Not a single Apache instance remained. To my knowledge, all sites were operating as normal. After a week, I uninstalled Apache, deleted its corresponding Unix user and group, and gzipped all its config files for reference.
Apache had been serving multiple roles. I brought the number down to zero, then got rid of it. To do that, I…
- Decoupled Apache (the virtual-host multiplexer) from Apache (the web server)
- Gave each site its own Apache web server instance
- Found a suitable replacement web server and converted all instances
- Found a suitable replacement virtual-host multiplexer and switched to it
- Turned software off, and left it off for a while, before uninstalling
For human site visitors, all of these steps were genuine refactorings. (Atypical and automated visitors might notice the HTTP header reporting different server software.) For site owners, most of these steps were also genuine refactorings. (In a couple cases, using the shared Lighttpd config required changing the names of log files by a small nonzero amount.)
I replaced one big application with two small ones. Better. Still, could be more better.
Room for improvement
The replacement virtual-host multiplexer (Pound) feels simple, good, and
necessary, in the sense that nothing like it is included with the OS.
The replacement web server (Lighttpd) feels simpler and better, by far
— I understand what it’s doing, my users finally have full
administrative control over their own sites, and unlike Apache, this
configuration doesn’t require extra system resources — but NetBSD
does include a web server, the one I experimented with in Step 4. If
bozohttpd did a few more things, then “Replace Lighttpd with
Bozohttpd” would be a refactoring, one that could be followed
immediately by “Remove Dependency (on Lighttpd)”.
In some kind of cosmic coincidence, next week I’ll be joining a project
that’s being developed primarily in C. Hacking on
bozohttpd will be
good practice. Here’s the incremental sequence of features awaiting my
next increment of time and attention, perhaps starting tomorrow:
- Optionally log to a file (instead of the default destination)
- Optionally log more information (say, in Apache’s “combined” format)
- Optionally specify a proxy or proxies that can pass an X-Forwarded-For header whose contents we’ll use as the true client source address (for logs, access control decisions, etc.)
Since I believe I’ll be able to stop, I’ll be able to start. It might not be terribly long before I have more progress to share.
After a week and a half in Chicago, we’re home. I’m still not entirely sure what happened. All I know is, we weren’t married when we left, and photos like these offer some clues:
As part of the deal, we both got ourselves a new name. Our friend Henry explained:
They wanted it to represent both of them…. Rebekka suggested the idea to Germify the “Schlair” name, which is already a German word meaning “veil”, by giving it its standard German spelling. This was obviously the best idea. It was a short and satisfying conversation.
He also saw fit to share a bit of my bad poetry in the ceremony. So I’ll share a few of my better words here:
It’s because of you that I keep getting better at knowing what I need, at choosing what to do about it, and at being able to do it. And it’s because of who you are that what I need most is you, and what I choose to do about it is this.
These are supposed to be my vows, but I have only one: I promise that for us to continue having a fulfilling life together, there aren’t any new promises we need to make.
She said some nice things to me too. We’re very happy.
Lately I’ve been trying to learn C, starting with online training a month ago, and continuing with daily practice. Yesterday, inspired by a personal story I told on Developer on Fire 139, I wrote a C program to draw the Mandelbrot set. The first image it generated, intentionally, didn’t look like much:
You’ll see that the image gets progressively better as I learn more. Not even a metaphor. Or is it?
I’m still getting the hang of C. But I’m doing pretty well with the hang-acquiring. You can too. Here’s my advice.
Bring your knowledge
You’ll have a head start if you already feel:
- Productive in your preferred text editor
- Confident with your preferred revision control system
- Proficient in some other programming language (including writing, running, and listening to automated tests)
Be mildly relentless
Do a tiny bit of C each day. Do more than a tiny bit, if you can. Don’t do less. If you miss a day, since you’re aiming for mild relentlessness, no big deal. Just make sure to do a tiny bit the next day. The tactic here is to bring C to your conscious attention often enough to support the strategy. The strategy: keep C rolling around in the back of your brain all the time.
Get a running start
Learning a new language means learning to navigate in a new environment. How to enter new code? build? run? change? test? organize? The C language is merely one aspect of the world you’re entering. If you’re not careful, your focus will be divided by more than one of the things you don’t know well: a language, a build tool, a test library, an IDE, etc.
If your goal is to learn C, pay as little attention to the other stuff as you can get away with. When you’re getting started, don’t try to choose the best compiler, build tool, test library, or IDE, and definitely don’t try to choose the best way to be installing them. You’re not having those problems yet, so you don’t have enough context to choose and the differences don’t matter much.
Start with an environment prepared by someone else. Cyber-Dojo provides compilers, test frameworks, and a text editor, and runs in your browser. You can be learning C right now.
Cyber-Dojo also includes code katas. Pick one at random. Write a trivial failing test and make sure it really fails. Then read the instructions, think a little, and code a little.
You might have an idea of what to do and not know how to express it in C. With a failing test, you can guess and check. If it doesn’t compile, it’s not valid C; try again until it compiles. If the test still fails, it’s C that doesn’t match your intention; try again until it does. Voilà: you’re learning C.
If you think taking a little time to work through a C tutorial would help you move faster, do it. If you think having a C reference at hand would get you unstuck more quickly, find one.
When you’re satisfied with your solution, Cyber-Dojo has plenty more katas. If you’d rather solve a real-world problem, maybe you’ve got one in mind. Or maybe someone you know can ask you for help with something.
Invest in your environment
As you practice, you’re also learning what would help you be more productive. For instance, your preferred text editor would probably help! When you’re making good progress in Cyber-Dojo, wishing it could go a little faster, and your tests are green, it’s time to continue the kata offline.
On your machine, install the same test framework and a compiler (ideally also the same). You might have to twiddle the code to get it to build on your system. Once the tests are green again, return to the kata: think a little, test a little, code a little. You’re learning C again — and now that you’re in an environment you control, it’s worth (1) noticing what gets in your way and (2) taking action to improve it.
If you want to run the tests frequently, teach your text editor to run
them with a keystroke.
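In Vim, for example, that teaching can be a single mapping (this assumes the kata's tests run via `make test`, which may not match your setup):

```vim
" Save everything, then build and run the tests
nnoremap <Leader>t :wa<CR>:make test<CR>
```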
If you want it to be easier to undo back to the last green state,
git init and start committing whenever you’re green.
If you’re frustrated by how long it’s taking to figure out a particular improvement, it’s worth noticing that too. Leave it for later — if it keeps being annoying, you’ll get more chances to solve it — and get back to the cycle of learning: think a little, test a little, code a little, commit on green.
Even if it’s sometimes frustrating, incrementally removing bottlenecks to your productivity is incredibly worthwhile.
…Especially in tools that can support you
Your C compiler is willing to warn you about lots of potential mistakes, and to treat warnings as errors. You only have to ask.
Your editor might be able to show the same warnings and errors, in the context where you made them, before you try to invoke the tests. Seeing mistakes earlier helps you go faster. Try it.
Your memory-management logic is easy to get wrong.
Valgrind can tell you when and where you’ve made a mistake.
You’ll want to run it less often than your unit tests, because it’s
slower, but when you’re dealing with
free() you’ll want
to run it pretty often.
(The longer you go between Valgrind runs, the harder it’ll be to figure
out where you screwed up.
git bisect might help.)
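Running it is one command; the test binary's name here is hypothetical:

```sh
# Report leaks and invalid memory accesses, and fail the build on any
valgrind --leak-check=full --error-exitcode=1 ./run_tests
```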
…Especially in your fastest feedback loop
You need to be able to rely on your tests to tell you when and where something is wrong.
If it’s hard to spot when a test is failing, fix that.
I noticed myself having this problem, so I wrote a
tiny wrapper to run Unity tests from Perl’s prove,
which paints failing tests red in a way I’m used to.
If it’s hard to know what a red test is trying to tell you, fix that.
With Check, for instance, when
ck_assert() needs to tell you something, it
can’t say much.
Maybe there’s a
more precise assertion available.
ck_assert_msg() at least lets you tell it what to tell you,
so you’ll understand what to do faster the next time it’s red.
Invite your friends
Pairing accelerates learning. During my 3-day online training a month ago, I worked with a remote pair the entire time. Whenever you can find someone to pair with you, do it.
Experts accelerate learning. That’s one of the reasons I took that training course from James Grenning. More recently, I asked some NetBSD developers — who understand from experience how to see the costs and risks of development in C — to review my code and suggest what I might pay more attention to. I’ve gotten some very thoughtful, articulate comments.
Exercism.io has a nascent C track, and its exercises come with ready-made tests you can enable, one at a time, to simulate part of the thinking that goes into TDD. Since you’re not trying to learn a thinking technique, this helps you focus on learning the language.
Exercism is also a way to invite code review. When you post a solution, other Exercism users might notice, read, and comment. Even if they don’t, your code’s posted at a stable URL and you can send the link to anyone you like. Or anyone you don’t like, if you like.
Choose small steps
While trying to draw the Mandelbrot set in C, given that there were lots of things I wasn’t sure how to do, I made extra sure to organize my work into small steps:
- Since I’ve never used GD before, start by generating an image — any image.
- Since I’m not sure GD offers what I need, make sure it can color individual pixels.
- Since I keep opening the image to see whether it’s different, write an approval test to turn red whenever the image has changed.
- Since I want to try plotting an equation, test-drive mapping pixels to (x,y) coordinates.
- Since I think I got that right, try plotting the equation for a unit circle.
- Since I instead got a weird blocky thing, figure out which coordinate-mapping test I forgot to write, make it pass, and try again.
- Since I got a circle, refactor my (x,y) coordinates to a+bi (complex numbers).
- Since I don’t know C’s complex math functions, compute the Mandelbrot set using C’s ordinary math operations on a and b separately.
- Since that worked, refactor to C’s complex math functions.
- Add some color.
This post isn’t much about C after all, is it. It’s about how to apply what you know, and what you know about what you don’t know, to acquire what you want to know. When you’re trying to learn something, do the fastest thing that gets you a walking skeleton. From there, iterate in tiny steps. As soon as you’re satisfied, stop.
It so happens that the Mandelbrot algorithm itself relies on iteration. The more times you iterate over each point, the more accurate the picture. But there’s no number of iterations that guarantees perfect accuracy, and each iteration costs runtime. So you have to choose how many iterations is good enough, and stop. And that number depends on how fast your computer is, how far you’re zoomed in, and what you want — in other words, on context.
Not a bad metaphor after all.
Not a bad exercise, either, and I’m not satisfied with it yet. I’ve still got plenty of ideas for what I can learn.
This week I ran a workshop. Last week I was in one.
For as long as I can remember, and probably longer, I’ve had a self-directed drive to learn, strong enough to be salient to observers. Lately I’ve become aware that I also have a periodic need for tiny doses of structured learning. Last week made this the third consecutive year I’ve taken some kind of training course. I observed the new pattern when I added James Grenning’s TDD for Embedded C to my résumé and realized that if a recruiter or hiring manager saw nothing but the top three “Education” items, they’d get a remarkably accurate impression of me. I coach, I lead, I Agile, I solve problems, and I test-drive my code.
But not totally accurate. I don’t C.
Not much, anyway. I can think of a total of five bits of code I’ve ever written in C:
- A program to generate the Fibonacci sequence (iteratively and then recursively)
- A JNI library allowing Java programs to create a “link” (symlink on UNIX, shortcut on Windows)
- A case statement in the NetBSD/macppc bootloader to allow system administrators to configure kernel behaviors (just like they already could on NetBSD/i386)
- A bugfix for a load-balancing appliance’s web admin GUI that wouldn’t display a particular table in (and only in) Internet Explorer if (and only if) the appliance lacked hardware SSL acceleration, traced to some uninitialized automatically allocated strings in the C CGI program that emitted JSON for the Dojo Toolkit GUI
- A for loop that instructs ikiwiki, when run as a post-commit hook in a website repository, to do nothing whatsoever when the triggering event was a cvs add <directory> (because that acts immediately on the repo, does not constitute a commit, and justifiably confuses ikiwiki if not filtered out)
I’m pretty sure that’s everything.
What I hoped for
Last week’s course was designed for people who’ve been developing for embedded systems in C to become acquainted with Test-Driven Development in general and/or in that context. I, on the other hand, am very comfortable test-driving and I wanted to become better acquainted with C. I knew I wasn’t in the course’s target (ha!) audience — but I also knew that someone skilled with TDD can exploit its fast feedback to learn a programming language quite quickly.
With this in mind, I signed up for the course as an act of self-engineering designed to focus 3 days x 5 hours of my attention on the material I wanted to be learning. And I enlisted a pair partner to help me stay focused and un-stuck.
What I did
During exercises, my remote pair and I used Screenhero to share screen, mouse, keyboard, and voice. We didn’t use any particular pairing style, other than a little ping-pong to get rolling. My pair had written some C++ somewhat recently, and often had better guesses than mine about how to say our next idea in C. We both understood when and how to test our guesses as quickly as possible.
During the TDD lectures, we each muted our Screenhero to avoid echoing the incoming audio at each other. I generally continued working on the day’s exercise. Sometimes my pair would join me. Without sound, I’d wiggle my mouse cursor (Screenhero gives each user their own cursor) to indicate I wanted the keyboard.
Because my pair and I had similar levels of skill with TDD (good), C (meh), and pairing itself (good), I didn’t notice silence slowing our pace much. I did, however, notice it constraining the flow of humor.
Each evening, since we hadn’t gotten all the way down the exercise’s test list, I’d continue working (solo) until we had. The following morning, I’d review with my pair.
We’d been doing the training exercises online in Cyber-Dojo, which is very convenient for getting started, especially when the instructor provides ready-made exercises with test-driven supporting code and well-thought-out test lists, along with all the needed tools. Once the course was over, I knew that to sustain my momentum, I’d need to be able to edit, build, and run tests in my usual development machine — and pronto. The day after the course, I created a local git repository, added all three exercises as we’d done them, installed CppUTest, debugged the exercise builds on my particular host OS and compiler, and committed the fixes to
makefiles and code to make the tests green. Then I dumped the next few things I know I want to try, in a sensible order, to a to-do list.
Now that I can run the tests from my usual editor with my usual keystroke, and won’t forget my next goal, I’m confident that my learning will continue.
What I got
I wanted to learn some C. I got what I wanted.
I also got geek joy. When we figured out just enough about bitwise logical operators to test the right thing, make it pass, then refactor to more expressive symbols, that was a programmer’s high. I’ve always wanted to understand this stuff. Given a goal, a fast feedback loop, and a pair, I was able to start understanding it.
The joy was greater because it was shared. Usually I’d consider it risky to pair remotely for three days with someone I’d never met. But if that pair is someone who works for Pillar and thinks learning about TDD in C sounds awesome? I had no doubt in my mind we were gonna have a good time together, and we did.
Some of the joy was meta-joy. Since TDD is a great way to learn design and domain concepts, and I’ve used it to learn one programming language, I hypothesized that it might be a great way to learn another, and that sure feels true. I love it when a mental model holds up!
I’m also feeling joy in my self-mastery: knowing what I wanted to learn, knowing what I needed to start and continue learning it, and arranging to get what I needed. (I talked about when and how I learn, among other things, on this week’s Developer on Fire. I’d love to hear what you think.) And getting better in general at knowing what I want — like spending more of my time programming — and arranging to get it.
There’s a particular refactoring direction I want to take one of the exercises. If it works out the way I hope, I’ll learn some C. If it doesn’t, I’ll learn some C.
We didn’t get to deploy to hardware. I’ve got a Raspberry Pi and an Arduino, never used. I’d like to test-drive a “Hello World” and see it run, then test-drive the basic use of a device-specific hardware feature and see it work. That’ll teach me some embedded basics. Once I’ve done that, I’ll have a better idea what I need to learn next.
Many NetBSD kernel drivers can be recompiled, with no source changes, into standalone userland programs. (See also rump kernels.) This means test failures can crash the process, but never the kernel — so automated test suites can be run freely and frequently, and it might be possible and sensible to test-drive new functionality into the NetBSD kernel. I’d like to try. Maybe I’m closer than I think.