CommunityData:Backups (nada)

From CommunityData
Latest revision as of 20:38, 4 March 2021

nada.com.washington.edu (our main Internet-connected research server at the University of Washington) has about 14TB of available disk space. The disks are in a RAID5 configuration, so a hardware failure on a single drive will not cause data loss. RAID does not help if more than one drive fails, if a physical accident (e.g., a fire) destroys the machine, or if someone accidentally deletes files they need. That is why we have backups.

Although we would love to back up everything, backing up all 14TB would cost about $140/month! As a result, we are trying to be smart about what we back up and how. This page discusses the current backup setup and strategy.
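The $140/month figure follows from the per-GB price quoted later on this page. As a rough sketch of that arithmetic (assuming decimal units, 1 TB = 1000 GB, and a flat price with no retrieval or network fees; the function name is made up for illustration):

```python
# Rough storage-cost estimate. Decimal units (1 TB = 1000 GB) and a flat
# per-GB monthly price are assumptions made for this sketch.
PRICE_PER_GB_MONTH = 0.01  # USD, the approximate Nearline price quoted on this page
GB_PER_TB = 1000


def monthly_cost_usd(terabytes: float, price_per_gb: float = PRICE_PER_GB_MONTH) -> float:
    """Return the approximate monthly cost of storing `terabytes` of data."""
    return terabytes * GB_PER_TB * price_per_gb


if __name__ == "__main__":
    # Compare backing up everything with backing up a smaller subset.
    for size_tb in (14, 2):
        print(f"{size_tb} TB: ${monthly_cost_usd(size_tb):.2f}/month")
```

Because the cost scales linearly with the amount stored, every terabyte kept out of the backup set saves roughly $10/month at this price.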

Backups on Nada

Nada backups are full filesystem-wide backups made with Duplicity. They are incremental (rdiff-style deltas; think rsync), encrypted with a GPG key under Mako's control, and stored in Google Nearline storage, which costs about $0.01/GB per month. Backups run once at the beginning of each week.

Everything is backed up except for the files and directories listed in /root/duplicitity_exclude. The list below may be out of date, but the following were excluded at the time this page was written:

/mnt
/media
/mit
/nonexistent
/openafs_cache_fs
/tmp
/var/log
/var/lib/mysql
/var/lib/mongodb
/var/lib/redis
/var/lib/postgresql
/var/spool
/var/tmp
/var/cache
/lost+found
/lolo
/cdrom
/floppy
/proc
/sys
/root/.cache
/root/nobackup
/home/*/.cache
/home/*/nobackup
/home/awjordan
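To get a feel for how a list like this applies to a given file, here is a small sketch that checks whether a path falls under one of the excluded directories. It is only an approximation written for this page: it is not how Duplicity itself evaluates its exclude file, it uses a handful of the patterns above rather than the full list, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A few patterns copied from the exclusion list on this page (not the full list).
EXCLUDE_PATTERNS = [
    "/tmp",
    "/var/lib/mysql",
    "/root/nobackup",
    "/home/*/.cache",
    "/home/*/nobackup",
]


def _matches(candidate: PurePosixPath, pattern: str) -> bool:
    """Compare component by component so that `*` never spans a `/`."""
    pattern_parts = PurePosixPath(pattern).parts
    return len(candidate.parts) == len(pattern_parts) and all(
        fnmatchcase(part, pat) for part, pat in zip(candidate.parts, pattern_parts)
    )


def is_excluded(path: str, patterns=EXCLUDE_PATTERNS) -> bool:
    """Return True if `path` or any of its parent directories matches a pattern."""
    p = PurePosixPath(path)
    return any(_matches(c, pat) for c in (p, *p.parents) for pat in patterns)


if __name__ == "__main__":
    for example in ("/home/alice/nobackup/data.csv", "/home/alice/paper.tex"):
        print(example, "excluded" if is_excluded(example) else "backed up")
```

The file actually in use on nada remains the authority; this sketch only illustrates the idea that anything under a matching directory is skipped.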

Backing up Databases

MySQL Backups

Although /var/lib/mysql is excluded, some MySQL databases are backed up using a separate MySQL incremental backup script that calls Percona XtraBackup. These incremental MySQL backups are created once each week, before the Duplicity backup script runs. To add a new MySQL database to the backup list, edit the following files:

/usr/local/sbin/mysql_backup_full
/usr/local/sbin/mysql_backup_inc

Minimizing Backup Size

If you have large datasets that are unlikely to change or be replaced, store a copy of the data in the /com/ directory on Hyak, keep the working files in /home/<YOUR NAME>/nobackup (which is excluded from backups), and symlink them from a more convenient location.
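The symlink step might look something like the sketch below. The directory and file names are invented for illustration, and the demo runs in a temporary directory standing in for a home directory rather than touching any real paths.

```python
import os
import tempfile
from pathlib import Path


def link_from_nobackup(dataset: Path, link: Path) -> None:
    """Create `link` as a symlink pointing at `dataset` (kept under nobackup)."""
    link.parent.mkdir(parents=True, exist_ok=True)
    link.symlink_to(dataset, target_is_directory=dataset.is_dir())


if __name__ == "__main__":
    # A throwaway directory stands in for /home/<YOUR NAME> in this demo.
    with tempfile.TemporaryDirectory() as home:
        dataset = Path(home) / "nobackup" / "big_dump"  # hypothetical dataset
        dataset.mkdir(parents=True)
        (dataset / "data.tsv").write_text("id\tvalue\n")

        link = Path(home) / "projects" / "study" / "raw_data"  # hypothetical location
        link_from_nobackup(dataset, link)
        print(link, "->", os.readlink(link))
```

The data stays in the excluded nobackup directory while the project tree only holds a link to it. This assumes the backup tool records the link itself rather than following it into the dataset.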