My approach to backups
In this post I describe my approach to backing up personal data. My backup process is something I've refined over many years, striving to keep it simple while ensuring my backups are complete and redundant.
The 3-2-1 backup strategy
I try to follow the 3-2-1 backup strategy for all my backups. The 3-2-1 strategy states that you should:
- Keep 3 copies of your data
- on 2 different media
- with at least 1 copy being kept off-site
This strategy covers most of the likely restore scenarios you'll run into, and some less likely ones. The obvious one is simply restoring your working copy from backup. Keeping at least two additional copies ensures that you can still restore even if your main backup medium has become corrupt.
Keeping one of the copies off-site ensures that you can still restore your data in the event of a full disaster - such as a lightning strike frying all your electronics or a fire breaking out in your home.
When implementing my backup strategy I have a few additional considerations to the above strategy:
- All backup operations must happen automatically, i.e. I shouldn't spend any time on the backup process unless something breaks
- Two out of three backup copies should be kept on media under my control, not in the cloud. However, the cloud is a good choice for the off-site backup
- Backups must be encrypted
- Backups must be tested and verified periodically
- Backups must be pruned periodically
- Backups must be stored in an open and documented format
- I must be notified of any backup errors
In the 3-2-1 strategy I count the working copy of my data as one of those three copies.
Data overview
Before implementing a backup strategy, it's good to have an idea about what data you need to back up and where that data lives.
For example, I have three devices holding data I care about:
- Laptop
- File server
- iPhone
My laptop holds the working copy of all my important data. This includes:
- Code and configuration (in Git repositories)
- Documents
- Email, contacts and calendar
- Photos
- Data from cloud services I use
Since my laptop runs macOS, all my data is stored in a single location: $HOME. This makes it easy to back up everything, but I still add some exclude rules for cache directories and other cruft.
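As a sketch, my exclusion rules live in a file passed to restic via `--exclude-file`; the patterns below are illustrative examples of the kind of cruft I skip, not my exact list:

```
# excludes.txt - passed to restic backup via --exclude-file
# (example patterns; adjust to your own cruft)
Library/Caches
.cache
.Trash
node_modules
*.tmp
```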
My file server holds both a working copy of some less important data and my primary backups. My phone holds a subset of the data on my laptop, such as documents, email and photos.
Implementation
To implement my backup strategy I rely primarily on two programs: restic and rclone. I use restic to create backup snapshots periodically on my laptop and my file server. Both of these devices run restic backup of my home directory once a day, and store their data in a shared restic repository.
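A daily snapshot run then boils down to something like the following (the repository path, host, and exclude file location are placeholders for illustration, not my actual setup):

```sh
# One-time: create the shared repository
# (password prompted, or supplied via RESTIC_PASSWORD_FILE)
restic -r sftp:backup@fileserver:/tank/restic init

# Daily: snapshot the home directory, honouring the exclude rules
restic -r sftp:backup@fileserver:/tank/restic backup \
    --exclude-file "$HOME/excludes.txt" "$HOME"
```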
In the past I've used duply (a frontend for duplicity) to backup my file server, and Arq to backup my laptop. However, when redoing my setup I wanted to use a single program across all my devices.
My laptop backs up to the restic repository through SFTP, while my file server accesses the repository locally, as the repository resides on its redundant ZFS filesystem. This results in two copies of my data: one working copy and one backup copy in the restic repository. To ensure I also have a third copy, located off-site, my file server uses rclone sync to mirror the restic repository to Jottacloud nightly.
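The nightly off-site mirror is a single rclone invocation along these lines (the remote name and paths are assumptions for illustration; the remote itself is configured beforehand with `rclone config`):

```sh
# Mirror the local restic repository to the Jottacloud remote.
# sync makes the destination identical to the source.
rclone sync --quiet /tank/restic jottacloud:restic-backup
```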
Because the home directory on both my laptop and file server is always backed up, any other data I need to back up can simply be downloaded to my home directory. I do this periodically for data which is stored on remote services, such as my email.
All backup operations are automated with cron, which is installed by default on most UNIX-like systems. If an operation fails, cron sends me an email.
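A minimal crontab tying this together might read as follows (times, paths, and the address are illustrative). Note that cron mails any output a job produces, so quiet flags keep routine runs silent and a mail effectively means something failed:

```
MAILTO=me@example.com
# Daily snapshot at 03:00
0 3 * * * restic --quiet -r /tank/restic backup "$HOME"
# Nightly off-site mirror at 05:00
0 5 * * * rclone sync --quiet /tank/restic jottacloud:restic-backup
```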
Backup overview:

```
laptop, server (working copy)
              |
              V
restic repository on server (local copy)
              |
              V
jottacloud (off-site copy)
```
Maintenance
Creating daily backups of two machines quickly adds up, so to prune backups I have my file server (wopr) run restic forget --prune periodically. I tell restic to keep the past 30 daily and 24 weekly (6 months) snapshots. Since restic de-duplicates data that doesn't change between snapshots, this is relatively cheap in terms of space.
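With restic's keep policy flags, that retention maps to something like this (repository path assumed; run on the file server):

```sh
# Keep 30 daily and 24 weekly snapshots, drop the rest,
# and reclaim the space they used.
restic -r /tank/restic forget --keep-daily 30 --keep-weekly 24 --prune
```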
As both my laptop and file server share the same restic repository, the laptop doesn't need to run the pruning process itself, which can be expensive and thus not ideal to run on a laptop.
All data contained in the restic repository is verified by my file server once a month. This process is made easy by restic check --read-data, which verifies that the data contained in all backup snapshots can be restored and that it matches the stored checksums.
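The monthly verification is again a one-liner (repository path assumed):

```sh
# Re-read every pack file and verify it against its checksum.
# This is I/O-heavy, which is why it runs on the server, not the laptop.
restic -r /tank/restic check --read-data
```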
Data stored in cloud services
Some of my data lives inside the services that generate it, such as my Spotify listening history or check-ins on Untappd. Luckily, since the introduction of GDPR, most services now support exporting data to an open format.
For most services I've written scripts to export the data, but where this isn't possible (e.g. Spotify or Netflix) I do a manual export once per quarter.
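The export scripts follow a common pattern: fetch the data, then drop it date-stamped into my home directory so the regular restic run picks it up. A sketch of that pattern follows; the URL, token variable, and service name are hypothetical placeholders, not any real service's API:

```sh
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical endpoint and token - substitute the real service's export API.
out_dir="$HOME/exports/someservice"
mkdir -p "$out_dir"
curl -fsS -H "Authorization: Bearer $SOMESERVICE_TOKEN" \
    "https://example.com/api/export" \
    -o "$out_dir/$(date +%F).json"
```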
Running restic robustly on macOS
My laptop runs macOS, which includes cron. Adding entries to the crontab works fine for the most part, but when the laptop is asleep it naturally won't execute anything, and it won't run any crontab entries that were missed once it wakes up again.
This is problematic when using cron to run backups. After briefly looking into the mess that is launchd, I ended up writing once.sh - a shell script that runs a given command at most once per given time unit, e.g. once per day.
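The core idea can be sketched in a few lines: derive a stamp from the current time unit and only run the command when the stamp changes. This is a simplified sketch keyed by time unit alone; the full script in the appendix additionally hashes the command, takes a lock, and can list and remove state:

```shell
# run_once FMT COMMAND... - run COMMAND at most once per time unit FMT
# (Y, m, d, H, M or S, as understood by date(1)).
run_once() {
  fmt="$1"; shift
  stamp_dir="${ONCE_ROOT:-$HOME/.once-sketch}"
  mkdir -p "$stamp_dir"
  stamp_file="$stamp_dir/$fmt"
  now="$(date "+%$fmt")"
  # Skip if we already ran successfully during this period.
  if [ -f "$stamp_file" ] && [ "$(cat "$stamp_file")" = "$now" ]; then
    return 0
  fi
  # Record the period only if the command succeeds, so failed
  # runs are retried on the next invocation.
  "$@" && echo "$now" > "$stamp_file"
}
```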
By running restic through once frequently enough, I can be sure that a backup is created if my laptop is awake for at least 15 minutes. The following crontab entry runs restic backup once per day, checking every 15 minutes during the hours 15-23:

```
*/15 15-23 * * * once d restic --quiet backup $HOME
```
iCloud and sharing files between devices
Even though I'm generally skeptical of storing personal data in a cloud service, I still use iCloud on my Apple devices. I don't use it for backup purposes, but I've found it's the most convenient solution for sharing my documents and photos across all my mobile devices.
Conclusion
In this post I've shown how I use restic and rclone to implement a 3-2-1 backup strategy for my data, in an automated fashion.
I've been running this setup for the past six months and it has worked exceptionally well, with the exception of my cloud storage provider losing data. In my initial testing of restic, the pruning process was very slow, but this was fixed in version 0.12. Beyond that, both rclone and restic have been rock solid!
Appendix: once.sh
A simple script for running a command at most once per given time unit. For example, once d ls will run ls once on the current day; repeated executions of once d ls won't actually run ls again until the day changes.
```bash
#!/usr/bin/env bash
# Run a command successfully, at most once per given time unit. E.g. once daily
# or hourly.
set -euo pipefail

declare -r once_root="${ONCE_ROOT:-${HOME}/.once}"
declare -r once_cmd="${0##*/}"
declare -a once_locks

function fail {
  echo -e "$once_cmd: $*" 1>&2
  exit 1
}

function usage {
  echo -e "usage: $once_cmd Y|m|d|H|M|S COMMAND
       $once_cmd ls
       $once_cmd rm ID [ID]..."
  exit 1
}

function lock {
  trap 'rm -fd -- "${once_locks[@]}"' INT TERM EXIT
  local id
  local lock_dir
  for id in "$@"; do
    lock_dir="$once_root/$id/lock"
    if ! mkdir "$lock_dir" 2> /dev/null; then
      fail "could not acquire lock: $lock_dir"
    fi
    once_locks+=("$lock_dir")
  done
}

function list {
  {
    echo -e "ID\tLAST\tCOMMAND"
    sort -k 2,2 -k 3,3 "$once_root"/*/state 2> /dev/null
  } | column -t -s $'\t'
}

function once {
  local -r fmt="$1"
  [[ "$fmt" =~ ^[YmdHMS]$ ]] || fail "invalid format: $fmt"
  shift
  local -r cmd="$*"
  local -r hash="$(echo -n "$cmd" | openssl md5 -r)"
  local -r id="${hash%% *}" # Split on space and return first value
  local -r state_root="$once_root/$id"
  local -r state_file="$state_root/state"
  mkdir -p "$state_root"
  lock "$id"
  local -r now="$(date "+${fmt}=%${fmt}")"
  local -r last_run="$(cut -f 2 "$state_file" 2> /dev/null)"
  if [[ "$last_run" != "$now" ]]; then
    "$@"
    echo -e "$id\t$now\t$cmd" > "$state_file"
  fi
}

function remove {
  shift
  lock "$@"
  local id
  local cmd_dir
  for id in "$@"; do
    cmd_dir="${once_root}/$id"
    rm -f -- "${cmd_dir}/state"
    rm -fd -- "${cmd_dir}/lock" "${cmd_dir}"
  done
}

function main {
  local -r cmd="${1:-}"
  if [[ "$cmd" == "ls" ]]; then
    list
  elif [[ $# -ge 2 ]]; then
    if [[ "$cmd" == "rm" ]]; then
      remove "$@"
    else
      once "$@"
    fi
  else
    usage
  fi
}

main "$@"
```