Create an inexpensive hourly remote backup
There are two kinds of people: those who back up regularly, and those who have never had a hard drive fail.
As you can tell, the above is my favorite quote. It is so true, and I believe everyone should evaluate how much their data (emails, documents, files) is worth to them and, based on that value, create a backup strategy that suits them. I know for sure that if I ever lost the pictures and videos of my family I would be devastated, since those are irreplaceable.
So the question is: how can I have an inexpensive backup solution? All my documents and emails are stored with Google, since my domain is on Google Apps. What happens, though, to the live/development servers that host all my work? I program on a daily basis, and the code has to be backed up regularly so that a hard drive failure does not result in lost time and money.
So here is my solution. I have an old computer (an IBM ThinkCentre) which I decided to beef up a bit. I bought 4GB of RAM for it from eBay for less than $100. Although this was not necessary, since my solution would be based on Linux (Gentoo in particular), I wanted faster compilation times for packages.
I bought two external drives (750GB and 500GB respectively) and one 750GB internal drive. I already had a 120GB hard drive in the computer. The two external drives are connected to the computer using USB, while the internal ones are connected using SATA.
The external drives are formatted using NTFS, while everything on the computer itself uses ReiserFS.
Here is the approach:
- I have a working Gentoo installation on the machine
- I have an active Internet connection
- I have installed LVM and set up the core system on the 120GB drive, while the 500GB drive is on LVM
- I have 300GB active on the LVM (from the available 500GB)
- I have generated a public SSH key (I will need to exchange it with the target servers)
- I have mounted the internal 500GB drive to the /storage folder
- I have mounted the external USB 750GB drive to the /backup_hourly folder
- I have mounted the external USB 500GB drive to the /backup_daily folder (a sketch of the matching /etc/fstab entries is shown right after this list)
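For reference, here is a minimal sketch of what the relevant /etc/fstab entries could look like. The device names (/dev/vg0/storage for the LVM volume, /dev/sdc1 and /dev/sdd1 for the USB drives) are assumptions and will almost certainly differ on your system.
# /etc/fstab (sketch - device names are assumptions, adjust to your system)
/dev/vg0/storage   /storage         reiserfs   noatime    0 0
/dev/sdc1          /backup_hourly   ntfs-3g    defaults   0 0
/dev/sdd1          /backup_daily    ntfs-3g    defaults   0 0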
Here is how my backup works:
Every hour a script runs. The script uses rsync to synchronize files and folders from a remote server to the local machine. Those files and folders are kept in subfolders named after each server inside the /storage folder (remember, this is my LVM volume). So, for instance, my subfolders will be /storage/beryllium.niden.net, /storage/nitrogen.niden.net, /storage/argon.niden.net etc.
Once the rsync completes, the script continues by compressing the relevant 'server' folder, creating a compressed file with a date/time stamp in its name.
When all compressions have completed, if the script is running at midnight, the backups are moved from the /storage folder to the /backup_daily folder (which has the external USB 500GB drive mounted). At any other time, the files are moved to the /backup_hourly folder (which has the external USB 750GB drive mounted).
This way I ensure that I keep a lot of backups (daily and hourly ones). The backups are being recycled, so older ones get deleted. The amount of data that you need to archive as well as the storage space you have available dictate how far back you can go in your hourly and daily cycles.
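To give an idea of the scheduling, this is roughly what the crontab entry looks like; the path /root/scripts/backup.sh is just an assumption here, so adjust it to wherever you keep the script.
# Run the backup script at the top of every hour (assumed script path)
0 * * * * /root/scripts/backup.sh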
So let’s get down to business. The script itself:
#!/bin/bash

# Date/time stamps used in the backup file names
DATE=`date +%Y-%m-%d-%H-%M`
DATE2=`date +%Y-%m-%d`

# Cut-off dates for recycling old hourly and daily backups
DATEBACK_HOUR=`date --date='6 days ago' +%Y-%m-%d`
DATEBACK_DAY=`date --date='60 days ago' +%Y-%m-%d`

# rsync options: archive mode over ssh, deleting files that no longer exist on the source
FLAGS="--archive --verbose --numeric-ids --delete --rsh=ssh"

# Local storage and backup mount points
BACKUP_DRIVE="/storage"
DAY_USB_DRIVE="/backup_daily"
HOUR_USB_DRIVE="/backup_hourly"
These are some variables that I need for the script to work. DATE and DATE2 are used to date/time stamp the backups, while the DATEBACK_* variables are used to clear previous backups. On my system DATEBACK_HOUR is set to 6 days ago and DATEBACK_DAY to 60 days ago; they can be set to whatever you want, provided that you do not run out of space.
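To illustrate, if the script were to run on, say, 2010-05-10 at 14:00 (a made-up date purely for illustration), the variables would evaluate to:
DATE=2010-05-10-14-00      # hourly stamp used in the archive file names
DATE2=2010-05-10           # daily stamp used to pick out the midnight archives
DATEBACK_HOUR=2010-05-04   # hourly backups older than this get deleted
DATEBACK_DAY=2010-03-11    # daily backups older than this get deleted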
The FLAGS variable keeps the rsync command options, while BACKUP_DRIVE, DAY_USB_DRIVE and HOUR_USB_DRIVE hold the locations of the rsync folders and of the daily and hourly backup storage areas.
The script works with arrays. I have 4 arrays to do the work, and the first 3 of them must have exactly the same number of elements.
# RSync Information
rsync_info[1]="beryllium.niden.net html rsync"
rsync_info[2]="beryllium.niden.net db rsync"
rsync_info[3]="nitrogen.niden.net html rsync"
rsync_info[4]="nitrogen.niden.net db rsync"
rsync_info[5]="nitrogen.niden.net svn rsync"
rsync_info[6]="argon.niden.net html rsync"
This is the first array, which holds descriptions of what needs to be done as far as the source is concerned. These descriptions get appended to the log and help me identify which step the script is in.
# RSync Source Folders
rsync_source[1]="beryllium.niden.net:/var/www/localhost/htdocs/"
rsync_source[2]="beryllium.niden.net:/niden_backup/db/"
rsync_source[3]="nitrogen.niden.net:/var/www/localhost/htdocs/"
rsync_source[4]="nitrogen.niden.net:/niden_backup/db"
rsync_source[5]="nitrogen.niden.net:/niden_backup/svn"
rsync_source[6]="argon.niden.net:/var/www/localhost/htdocs/"
This array holds the source host and folder. Remember that I have already exchanged SSH keys with each server, therefore when the script runs it connects to the source server directly, without prompting for a password. If you connect as a different (non-root) user for extra security, you will need to alter the contents of the rsync_source array so that each entry also reflects the user you log in with (user@host).
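For completeness, the key exchange boils down to something like the following; the root@ accounts are just examples here, use whichever account you rsync as.
# Generate a key pair on the backup machine (no passphrase, so cron can run unattended)
ssh-keygen -t rsa

# Copy the public key to each source server
ssh-copy-id root@beryllium.niden.net
ssh-copy-id root@nitrogen.niden.net
ssh-copy-id root@argon.niden.net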
# RSync Target Folders
rsync_target[1]="beryllium.niden.net/html/"
rsync_target[2]="beryllium.niden.net/db/"
rsync_target[3]="nitrogen.niden.net/html/"
rsync_target[4]="nitrogen.niden.net/db/"
rsync_target[5]="nitrogen.niden.net/svn/"
rsync_target[6]="argon.niden.net/html/"
This array holds the target locations for the rsync. In my case these folders live under the /storage folder.
# GZip target files
servers[1]="beryllium.niden.net"
servers[2]="nitrogen.niden.net"
servers[3]="argon.niden.net"
This array holds the names of the folders to be archived. These are the folders directly under the /storage folder, and I am also using this array for the prefix of the compressed files. The suffix of the compressed files is a date/time stamp.
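As an example, the archives produced by a run at exactly 13:00 on May 10th (an arbitrary, illustrative time) would be named:
/storage/beryllium.niden.net-2010-05-10-13-00.tgz
/storage/nitrogen.niden.net-2010-05-10-13-00.tgz
/storage/argon.niden.net-2010-05-10-13-00.tgz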
Here is how the script evolves:
echo "BACKUP START" >> $BACKUP_DRIVE/logs/$DATE.log
date >> $BACKUP_DRIVE/logs/$DATE.log
echo "BACKUP START" >> $BACKUP_DRIVE/logs/$DATE.log
date >> $BACKUP_DRIVE/logs/$DATE.log
# Loop through the RSync process
element_count=${#rsync_info[@]}
let "element_count = $element_count + 1"
index=1
while [ "$index" -lt "$element_count" ]
do
# Log which source we are working on, then rsync it locally (append to the log)
echo ${rsync_info[$index]} >> $BACKUP_DRIVE/logs/$DATE.log
rsync $FLAGS ${rsync_source[$index]} $BACKUP_DRIVE/${rsync_target[$index]} >> $BACKUP_DRIVE/logs/$DATE.log
let "index = $index + 1"
done
The snippet above loops through the rsync_info array and prints the information into the log file. Right after that it uses the rsync_source and rsync_target arrays (as well as the FLAGS variable) to rsync the contents of the source server into the local folder. Remember that all three arrays (rsync_info, rsync_source, rsync_target) have to be identical in size.
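Nothing in the script enforces that size requirement, so a small guard like the one below (my own addition, not part of the original script) could be placed right after the array definitions to abort early if the arrays ever drift apart:
# Abort if the three rsync arrays do not have the same number of elements
if [ ${#rsync_info[@]} -ne ${#rsync_source[@]} ] || \
   [ ${#rsync_source[@]} -ne ${#rsync_target[@]} ]; then
    echo "rsync_info, rsync_source and rsync_target must be the same size" >&2
    exit 1
fi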
The next thing to do is gzip the data (looping through the servers array):
# Looping to GZip data
element_count=${#servers[@]}
let "element_count = $element_count + 1"
index=1
while [ "$index" -lt "$element_count" ]
do
# Log the server being archived, then tar/gzip its folder into a date/time stamped file
echo "GZip ${servers[$index]}" >> $BACKUP_DRIVE/logs/$DATE.log
tar cvfz $BACKUP_DRIVE/${servers[$index]}-$DATE.tgz $BACKUP_DRIVE/${servers[$index]} >> $BACKUP_DRIVE/logs/$DATE.log
let "index = $index + 1"
done
The compression method I use is tar/gzip. I found it to be fast with a good compression ratio. You can choose anything you like.
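For instance, swapping gzip for bzip2 (better ratio, slower) is just a matter of changing the tar flag and the extension; a sketch of the modified line:
# bzip2 instead of gzip - smaller archives at the cost of speed
tar cvfj $BACKUP_DRIVE/${servers[$index]}-$DATE.tar.bz2 $BACKUP_DRIVE/${servers[$index]} >> $BACKUP_DRIVE/logs/$DATE.log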
Now I need to delete old files from the backup drives and copy the freshly created archives onto them. I use the servers array again.
# Looping to copy the produced files (if applicable) to the daily drive
element_count=${#servers[@]}
let "element_count = $element_count + 1"
index=1
while [ "$index" -lt "$element_count" ]
do
# Copy the midnight files
echo "Removing old daily midnight files" > $BACKUP_DRIVE/logs/$DATE.log
rm -f $DAY_USB_DRIVE/${servers[$index]}/${servers[$index]}-$DATEBACK_DAY*.* > $BACKUP_DRIVE/logs/$DATE.log
echo "Copying daily midnight files" > $BACKUP_DRIVE/logs/$DATE.log
cp -v $BACKUP_DRIVE/${servers[$index]}-$DATE2-00-*.tgz $DAY_USB_DRIVE/${servers[$index]}  >>; $BACKUP_DRIVE/logs/$DATE.log
rm -f $BACKUP_DRIVE/${servers[$index]}-$DATE2-00-*.tgz > $BACKUP_DRIVE/logs/$DATE.log
# Now copy the files in the hourly
echo "Removing old hourly files" > $BACKUP_DRIVE/logs/$DATE.log
rm -f $HOUR_USB_DRIVE/${servers[$index]}/${servers[$index]}-$DATEBACK_HOUR*.* > $BACKUP_DRIVE/logs/$DATE.log
echo "Copying daily midnight files" > $BACKUP_DRIVE/logs/$DATE.log
cp -v $BACKUP_DRIVE/${servers[$index]}-$DATE.tgz $HOUR_USB_DRIVE/${servers[$index]} > $BACKUP_DRIVE/logs/$DATE.log
rm -f $HOUR_USB_DRIVE/${servers[$index]}/${servers[$index]}-$DATEBACK*.* > $BACKUP_DRIVE/logs/$DATE.log
let "index = $index + 1"
done
echo "BACKUP END" >> $BACKUP_DRIVE/logs/$DATE.log
The last part of the script loops through the servers array and:
- Deletes the old files (recycling space) from the daily backup drive (/backup_daily) according to the DATEBACK_DAY variable. If the files are not found, a warning will appear in the log.