CommunityData:Backups (nada)

Nada (our main Internet-connected research server at the University of Washington) has about 14TB of available disk space. The disks are in a RAID5 configuration, which means that a hardware failure on a single drive won't cause us to lose data. But if more than one drive fails through bad luck, if some sort of physical accident (e.g., a fire) destroys the machine, or if someone accidentally deletes files they need, we'd be out of luck. As a result, we have backups.

Although we would love to back up everything, backing up all 14TB would cost about $140/month! As a result, we are trying to be smart about what we back up and how. This page discusses the current backup setup and strategy.
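The $140/month figure can be reproduced with a little arithmetic. The sketch below assumes the rate of roughly $0.01/GB that this page quotes for Nearline storage, read as a monthly rate, and decimal units (1TB = 1000GB); the function name and the 2TB example are illustrative only, not part of the actual setup.

```python
GB_PER_TB = 1000  # assumption: decimal units


def monthly_storage_cost(size_tb, price_per_gb_month=0.01):
    """Estimate the monthly storage bill in dollars for size_tb terabytes."""
    return size_tb * GB_PER_TB * price_per_gb_month


# Backing up the whole 14TB disk versus a hypothetical 2TB subset.
print(f"all 14TB:   ${monthly_storage_cost(14):.2f}/month")
print(f"2TB subset: ${monthly_storage_cost(2):.2f}/month")
```

Every terabyte kept out of the backup saves about $10/month under these assumptions, which is why the exclusions described on this page matter.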

Backups on Nada

Nada backups cover the whole filesystem and are made with Duplicity. They are incremental backups (rdiff-style; think rsync) that run once at the beginning of each week, are encrypted using a GPG key under Mako's control, and are stored in Google Nearline storage, which costs about $0.01/GB per month.
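To make the pieces concrete, here is a minimal sketch of how a weekly job of this kind might assemble its Duplicity command line. It is not the actual script on Nada: the bucket name and GPG key ID are hypothetical placeholders, the gs:// target assumes Duplicity's Google Cloud Storage backend, and an existing full backup is assumed so that an incremental run can build on it. The exclude file path is the one named on this page. The code only builds and prints the command; it does not run it.

```python
import shlex


def build_duplicity_command(source, target_url, key_id, exclude_file):
    """Return an argument list for an encrypted incremental backup."""
    return [
        "duplicity", "incremental",
        "--encrypt-key", key_id,             # encrypt to this GPG key
        "--exclude-filelist", exclude_file,  # skip the paths listed in this file
        source,
        target_url,
    ]


cmd = build_duplicity_command(
    source="/",
    target_url="gs://example-backup-bucket",  # hypothetical bucket name
    key_id="0xDEADBEEF",                      # hypothetical key ID
    exclude_file="/root/duplicitity_exclude",
)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later passed to subprocess.run.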

Everything is backed up except for the directories listed in /root/duplicitity_exclude. That file is authoritative and this page may be out of date, but the following files/directories were excluded at the time this page was written:


Backing up Databases

MySQL Backups

Although /var/lib/mysql is excluded, some MySQL databases are backed up using a separate MySQL incremental backup script that calls Percona XtraBackup. These incremental MySQL backups are created once each week, before the duplicity backup script runs. To add a new MySQL database to the backup list, edit the following files:


Minimizing Backup Size

If you have large datasets that are unlikely to change or be replaced, store a copy of the data in the /com/ directory on Hyak, keep the working files in /home/<YOUR NAME>/nobackup, and symlink them from a more convenient location.
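The move-and-symlink step can be sketched as follows. This is an illustration, not an official tool: the helper name park_in_nobackup is made up, it assumes that nobackup directories are among the paths excluded from the backup, and it does not copy anything to Hyak (do that first). The demo runs in a temporary directory so that it touches nothing real.

```python
import shutil
import tempfile
from pathlib import Path


def park_in_nobackup(dataset: Path, nobackup_dir: Path) -> Path:
    """Move dataset into nobackup_dir and leave a symlink at its old path."""
    nobackup_dir.mkdir(parents=True, exist_ok=True)
    target = nobackup_dir / dataset.name
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    shutil.move(str(dataset), str(target))
    # The old, convenient path now points at the copy that is not backed up.
    dataset.symlink_to(target, target_is_directory=target.is_dir())
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp)  # stand-in for /home/<YOUR NAME>
        data = home / "project" / "big_dump.tsv"
        data.parent.mkdir(parents=True)
        data.write_text("id\tvalue\n1\t42\n")
        moved = park_in_nobackup(data, home / "nobackup")
        print(data, "->", moved)
```

Existing scripts that read the file through the original path keep working, because the symlink resolves to the new location.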