How to make backups using NetBSD's RAIDframe

Introduction

Data is precious. Backups are boring. Disks are cheap. I am lazy. RAIDframe is cool.

Benefits Of This Configuration

Drawbacks Of This Configuration

What You Need

An i386-compatible computer with two free 5.25” drive bays,
Two (2) 5.25” sleds-and-trays designed for 3.5” IDE drives (for example, Vantec MRK-102FD),
One (1) additional tray,
Everything you need to install NetBSD/i386 on a RAID1 root filesystem, and
A third hard drive just like the other two.

Setting It Up

Install NetBSD according to the RAID1 install guide. (Note: I have only root and swap on RAID1, with no separate /home. This document assumes that you do too.)
Run disklabel raid0 and note the fsize/bsize/cpg figures from the 4.2BSD partition.
Run disklabel wd0 and note the size and offset of the 'a' partition.
Run disklabel -e wd0 and add a partition 'e'. Its offset should be 64 sectors larger and its size 64 sectors smaller than those of the 'a' partition, and its fsize/bsize/cpg should match the 4.2BSD partition on raid0. (When you need to restore from a backup disk, you'll want to access the backup filesystem directly, not as part of the RAID set. You'll use this partition for that.)
Run disklabel -e wd1 and add the same 'e' partition to it.
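
The arithmetic for the 'e' partition can be sketched as a small shell function. The function name e_partition and the sample numbers are made up for illustration; the only figure taken from this document is the adjustment of 64. It takes the offset and size of the 'a' partition as shown by disklabel and prints the offset and size to enter for 'e'.

```shell
#!/bin/sh
# Hypothetical helper: derive the 'e' partition from the 'a' partition.
# Arguments: offset and size of 'a', in sectors, as printed by disklabel.
e_partition() {
    a_offset=$1
    a_size=$2
    delta=64    # adjustment from the step above, in sectors
    echo "$((a_offset + delta)) $((a_size - delta))"
}

# Example with made-up numbers: 'a' starts at sector 63 and is 1000000 long.
e_partition 63 1000000
```

With those sample inputs the function prints 127 999936. The fsize/bsize/cpg values are copied over unchanged, so they need no arithmetic.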

Making A Backup

Shut down and power off the system.
Slide out wd1. (Note: once you've mastered these procedures, slide out either disk to balance the load.)
Put it somewhere safe.

Inserting A Brand-New Drive

Slide in the new drive as wd1.
Boot the system.
Copy the MBR and disklabel from wd0 to wd1 following the example in the RAID1 install guide.
Add the new drive as a spare: raidctl -a /dev/wd1a raid0
Begin reconstruction onto the spare: raidctl -F component1 raid0
Monitor progress: raidctl -S raid0
Make the disk bootable following the example in the RAID1 install guide.
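
As a memory aid, the three raidctl steps in this procedure can be wrapped in a function that only prints them. This is a sketch, not a tested tool: the name rebuild_plan and its argument order are hypothetical, and nothing here runs raidctl or touches a device. The MBR, disklabel, and boot block steps are left to the RAID1 install guide.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the raidctl steps for rebuilding
# onto a freshly inserted disk.
rebuild_plan() {
    disk=$1        # e.g. wd1
    component=$2   # e.g. component1
    raidset=$3     # e.g. raid0
    echo "raidctl -a /dev/${disk}a ${raidset}"    # add as a spare
    echo "raidctl -F ${component} ${raidset}"     # reconstruct onto it
    echo "raidctl -S ${raidset}"                  # watch progress
}

rebuild_plan wd1 component1 raid0
```

Printing rather than executing keeps the sketch safe to try anywhere; review the output, then type the commands yourself.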

Inserting An Outdated Backup Drive

Slide in the outdated backup drive as wd1.
Boot the system.
If raidctl -s raid0 shows a failed component1, it's because the outdated backup drive was component0 when it was last used. Re-add it to the set:
1. Add the outdated backup drive as a spare: raidctl -a /dev/wd1a raid0
2. Begin reconstruction onto the spare: raidctl -F component1 raid0
Otherwise, both components should be present:
1. Begin in-place reconstruction: raidctl -R /dev/wd1a raid0
Monitor progress: raidctl -S raid0
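
The choice between the two branches above can be sketched as a filter over the output of raidctl -s raid0. The function name readd_plan is hypothetical, and the sketch assumes that a failed slot appears as a line containing "component1: failed"; check that against what your own system prints before relying on it. Like the procedure, it only covers the drive in the wd1 bay, and it prints commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: read `raidctl -s raid0` output on stdin and print
# the command(s) that the steps above call for.
# Assumption: a failed slot is reported as "component1: failed".
readd_plan() {
    if grep -q 'component1: failed'; then
        echo "raidctl -a /dev/wd1a raid0"
        echo "raidctl -F component1 raid0"
    else
        echo "raidctl -R /dev/wd1a raid0"
    fi
}

# Example with sample text standing in for real raidctl output:
printf '%s\n' 'Components:' '/dev/wd0a: optimal' 'component1: failed' | readd_plan
```

On the real system the equivalent would be raidctl -s raid0 | readd_plan, followed by raidctl -S raid0 to monitor progress.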

Recovering Some Files From Backup

Shut down and power off the system.
Slide out wd1.
Slide in the backup drive as wd1.
Boot the system into single-user mode.
mount -uw /
mount -r /dev/wd1e /mnt
Copy needed files from /mnt.
Shut down and power off the system.
Slide out the backup drive.
Slide in the now outdated drive as wd1.
Boot the system.
Begin in-place reconstruction: raidctl -R /dev/wd1a raid0
Monitor progress: raidctl -S raid0

Recovering Entire Filesystems From Backup

When you've irreparably hosed the filesystem or otherwise royally screwed up, it's possible to “roll back” to the state of your backup drive. Under normal circumstances, when reconstructing, RAIDframe notes the “modification count” of each component in order to reconstruct onto the outdated component. In this case, we want to reconstruct from the outdated component, so we have to artificially lower the modification count of the up-to-date drives. One simple way is to remove the component labels from those drives, effectively making them forget they were ever part of a RAID set. Then they can be re-added to the set as though they were outdated backup drives.
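
The reasoning about modification counts can be illustrated with a toy model. This is not how RAIDframe is implemented; the function name rebuild_target, the "name:count" notation, and all the numbers are invented, and treating a wiped label as count 0 is a simplification. It only shows why making the up-to-date drives look older flips the direction of the rebuild.

```shell
#!/bin/sh
# Toy model only: given two "name:count" pairs, print the name with the
# lower modification count, i.e. the drive that would be rebuilt.
rebuild_target() {
    name_a=${1%%:*}; count_a=${1##*:}
    name_b=${2%%:*}; count_b=${2##*:}
    if [ "$count_a" -lt "$count_b" ]; then
        echo "$name_a"
    else
        echo "$name_b"
    fi
}

# Normal case: the backup drive is behind, so it gets overwritten.
rebuild_target live:120 backup:87
# After wiping the live drive's label, it is the one that looks stale.
rebuild_target live:0 backup:87
```

The first call prints backup and the second prints live, which is the rollback behavior the steps below set up by hand.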

At the bootloader, boot -as.
At the prompt for a root device, select wd0e.
raidctl -u raid0
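for i in 0 1; do dd if=/dev/zero of=/dev/rwd${i}a seek=16 bs=1k count=1; done (This zeroes the 1 KB component label, which sits 16 KB into each component. Note seek=, which positions the output; skip= would only position the input and leave the label untouched.)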
Shut down and power off the system.
Slide out wd0.
Slide in the backup drive as wd0. (Greg notes: this drive does have a valid component label, and will get autoconfigured on boot.)
At the bootloader, boot -s.
Add the goofed-up drive as a spare: raidctl -a /dev/wd1a raid0
Begin reconstruction onto the spare: raidctl -F component1 raid0
Monitor progress: raidctl -S raid0
Shut down and power off the system.
Slide out the backup drive in wd0.
Slide in the other goofed-up drive as wd0.
At the bootloader, boot hd1a:netbsd.
Add the remaining goofed-up drive as a spare: raidctl -a /dev/wd0a raid0
Begin reconstruction onto the spare: raidctl -F component0 raid0
Monitor progress: raidctl -S raid0
Next time, try to screw up in a different way!
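
Because the dd step in this procedure writes to raw disks, it can be worth rehearsing the offsets on a scratch file first. A minimal sketch, assuming a temporary file and sizes chosen only for illustration: dd's seek= positions the output while skip= positions the input, so only seek= decides where the zeros land. The helper name x_bytes is made up.

```shell
#!/bin/sh
# Rehearse on a scratch file, never on a real component.
img=$(mktemp)
# Fill 32 KiB with the letter x so zeroed regions stand out.
head -c 32768 /dev/zero | tr '\0' 'x' > "$img"

# Zero exactly one 1 KiB block that starts 16 KiB into the file.
# conv=notrunc keeps dd from truncating the regular file after the write.
dd if=/dev/zero of="$img" bs=1k seek=16 count=1 conv=notrunc 2>/dev/null

# Count the bytes that are still 'x' in a given 1 KiB block of the file.
x_bytes() {
    dd if="$img" bs=1k skip="$1" count=1 2>/dev/null | tr -cd 'x' | wc -c | tr -d ' '
}

echo "block 0:  $(x_bytes 0) x bytes"
echo "block 16: $(x_bytes 16) x bytes"
```

Block 0 still holds 1024 x bytes and block 16 holds none, confirming that only the block at the 16 KiB mark was overwritten. Remove the scratch file afterwards with rm -f "$img".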

Ideas

Talk about drives as A/B/C.
In -current, put data on a large cgd-encrypted partition.
If you have true hot-swap drives, you needn't shut down to change disks. However, since you're not shutting down, you'll have to “checkpoint” the filesystem some other way (perhaps fss(4)) to ensure it's in a consistent state when you pull it. (Greg Oster)
RAIDframe doesn't mind if drives get renumbered. If you mind, use a kernel that hard-codes each wdN to its bus.
With a sufficiently large case, you could use a three-way mirror. (Dan Carosone)
The only way to get a three-way mirror with RAIDframe is to create a RAID1 consisting of a single component and another RAID1 set. (Greg Oster)
With more than one filesystem on RAID1, there'd be a proliferation of recovery partitions. Perhaps configure a different RAID set for recovery purposes. (Dan Carosone)
Use a backup drive several times the capacity of the onboard drives. newfs the backup drive. At backup time, mkfile a new image the size of your mirror, vnconfig, add the vnd to the RAID set, and reconstruct. Since it's a vnd, it could be over NFS. Beforehand, to improve compressibility, fill the free space with zeros and then delete the file: dd if=/dev/zero of=foo bs=32k; rm foo (Dan Carosone)
Modern IDE drives aren't designed to sit powered down for long periods of time. Take a backup approximately once a week. (Charles Hannum)
Scripts to handle drive swaps: detect new or outdated, detect same or different drive bay from last time, go. Maybe suggest which drive to pull on shutdown? As a NetBSD-only package of rc.d scripts? (Alistair Crooks)
Refer to Martti's guide at the end. Expand other references inline.
Expect netbsd-2-0 or newer. (“You can do this with 1.6.2 also; see…”) (Alistair Crooks)
Explicitly list all commands to run, every time. (Alistair Crooks)
Explain the cgd idea (lets you trade offsite backups with someone; would reduce throughput)
Explain every time how to “monitor” progress, explain how long it will take, and note that you don't need to supervise it all the time. (Alistair Crooks)
Note that you probably don't want to mount the 'e' partition at the same time that the RAID set is configured and mounted. You might be able to get away with doing it read-only, but that's a dangerous sort of thing to try, and why would you want to? (Alistair Crooks)
Note that the 'e' partition is a hack that works only because it's RAID1 and we have knowledge of the on-disk structures. (Alistair Crooks)

Thanks

The NetBSD Project for a terrific OS,
Martti Kuparinen for easy-to-follow RAID1 install docs,
Dan Carosone for comments and ideas, and
Greg Oster for suggestions, sanity checks, general assistance, and (of course) RAIDframe as found in NetBSD.
