Good Backup Practice Short Guide

Presentation

This short guide gathers important (and somewhat obvious) techniques for computer backups. It also explains the risks you take by not following these principles. I thought they were obvious and well known to everyone, until recently, when I started getting feedback from people complaining about data lost because of bad media or for other reasons. To the question "have you tested your archive?", I was surprised to get negative answers.

This guide is no more tied to Disk ARchive (aka dar) than to any other tool. You can therefore benefit from reading it if you are not sure of your backup procedure, whatever backup software you use.

Notions

In the following we will speak about backups and archives:

by backup, we mean a copy of some data that remains in place in an operational system
by archive, we mean a copy of data that is afterward removed from an operational system. It stays available but is no longer used frequently.

With this meaning of an archive, you can also make a backup of an archive (for example, a clone copy of your archive).

Backup

Backups act a bit like archives, except that they are a copy of a changing set of data, which is moreover expected to stay in its original location (the system). But, as with an archive, it is good practice to at least test the resulting backups and, once a year if possible, to test the overall backup process by restoring your system into a new virtual machine or onto a spare computer and checking that the recovered system is fully operational.

The fact that the data is changing introduces two problems:

A backup is almost never up to date, and you will probably lose data if you have to rely on it
A backup soon becomes obsolete.

Backups also have the role of keeping a recent history of changes. For example, you may have deleted some precious data from your system, and it is quite possible that you only notice the mistake long after the deletion. In that case an old backup stays useful, in spite of the many more recent backups.

Consequently, backups need to be made often, to keep the delta minimal in case of a disk crash. But having a new backup does not mean that older ones can be removed. A usual way of handling this is to have a set of media over which you rotate the backups: the new backup is written over the oldest backup of the set. This way you keep a certain history of your system's changes. It is up to you to decide how many backups you want to keep, and how often you will make a backup of your system.
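
The rotation described above can be sketched in a few lines of Python. This is only an illustration, not a tool: the `pick_medium` helper and the medium labels are hypothetical, and each medium is assumed to record the date of the backup it currently holds (or `None` if it is still blank).

```python
from datetime import date

def pick_medium(media):
    """Return the label of the medium the next backup should overwrite.

    `media` maps a medium label to the date of the backup it holds,
    or to None when the medium has never been used.
    """
    # Blank media are used first, so the history grows until the set is full.
    for label, written in media.items():
        if written is None:
            return label
    # Otherwise overwrite the medium holding the oldest backup.
    return min(media, key=media.get)

# Hypothetical set of three media, one of them still blank.
media = {
    "disk-A": date(2024, 1, 1),
    "disk-B": date(2024, 1, 8),
    "disk-C": None,
}
target = pick_medium(media)        # the blank medium is chosen first
media[target] = date(2024, 1, 15)  # record the new backup on it
next_target = pick_medium(media)   # now the medium with the oldest backup
```

The size of the set fixes the depth of the history: with three media and one backup a week, the oldest state you can go back to is at most three weeks old.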

Differential / incremental backup

One way to extend the history while reducing the media space required by each backup is the differential backup. A differential backup contains only what has changed since a previous backup (the "backup of reference"). The drawback is that it is not autonomous and cannot be used alone to restore a full system. Since it is useless without its backup of reference anyway, there is no harm in keeping the differential backup on the same medium as the backup of reference.
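
To make the idea concrete, here is a minimal Python sketch of what a differential backup records. It assumes a snapshot is simply a mapping from file path to file content; the function names and the `saved`/`deleted` layout are hypothetical and do not describe any real tool's format (a real tool would compare dates, sizes or checksums rather than whole contents).

```python
def make_differential(reference, current):
    """Return what changed in `current` since the `reference` snapshot."""
    # New or modified files are saved in full.
    saved = {path: data for path, data in current.items()
             if path not in reference or reference[path] != data}
    # Files that disappeared are only recorded by name.
    deleted = sorted(path for path in reference if path not in current)
    return {"saved": saved, "deleted": deleted}

def restore(reference, differential):
    """Rebuild the later state; the backup of reference is required."""
    state = dict(reference)
    state.update(differential["saved"])
    for path in differential["deleted"]:
        del state[path]
    return state

full = {"a.txt": "alpha", "b.txt": "beta", "c.txt": "gamma"}
now = {"a.txt": "alpha", "b.txt": "beta v2", "d.txt": "delta"}

diff = make_differential(full, now)  # holds b.txt and d.txt, notes c.txt is gone
rebuilt = restore(full, diff)        # needs `full`: a.txt is not in `diff`
```

The unchanged file `a.txt` appears nowhere in `diff`, which is why the differential backup cannot restore the system on its own.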

Making a lot of consecutive differential backups (taking the last backup as reference for the next differential backup, which some call "incremental" backups) will reduce your storage requirement, but will cost extra time at restoration in case of a computer accident. You will have to restore the full backup (of reference), then restore in order each of the backups made since, up to the last one. This implies that you must keep all the differential backups made since the backup of reference if you wish to restore the exact state of the filesystem at the time of the last differential backup.
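
The restoration order can be worked out mechanically. The following Python sketch assumes a small catalog that maps each backup name to the name of its backup of reference (`None` for a full backup); the catalog layout and the `restore_chain` helper are hypothetical.

```python
def restore_chain(backups, target):
    """List the backups to restore, oldest first, to reach `target`.

    `backups` maps a backup name to the name of its backup of
    reference, or to None for a full backup.
    """
    chain = []
    name = target
    while name is not None:
        if name not in backups:
            # A single missing link makes every later backup unusable.
            raise KeyError(f"missing backup in the chain: {name}")
        if name in chain:
            raise ValueError(f"circular reference at: {name}")
        chain.append(name)
        name = backups[name]
    chain.reverse()  # the full backup comes first
    return chain

# Hypothetical week: one full backup followed by three incremental ones.
backups = {"full": None, "mon": "full", "tue": "mon", "wed": "tue"}
chain = restore_chain(backups, "wed")  # full, then mon, tue and wed in order
```

If `mon` were lost, `restore_chain` would fail for both `tue` and `wed`, which illustrates why every backup made since the backup of reference has to be kept.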

It is thus up to you to decide how many differential backups you make and how often you make a full backup. A common scheme is to make a full backup once a week and a differential backup on each other day of the week. The backups made in a week are kept together. You could then have ten sets of full+differential backups, where a new full backup would erase the oldest full backup as well as its associated differential backups. This way you keep a ten-week history with a backup every day. But this is just an example.
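
As an illustration of this example scheme only, the Python sketch below simulates it day by day. The choice of Sunday for the full backup, the constant names and the `run_schedule` helper are assumptions made for the sketch.

```python
from datetime import date, timedelta

FULL_WEEKDAY = 6    # assumption: full backup on Sunday (Monday is 0)
SETS_TO_KEEP = 10   # ten weekly sets, as in the example above

def backup_kind(day):
    """Decide which kind of backup is made on `day`."""
    return "full" if day.weekday() == FULL_WEEKDAY else "differential"

def run_schedule(start, days):
    """Simulate the scheme and return the sets still kept, oldest first.

    Each set is a list of (date, kind) pairs starting with a full backup.
    """
    sets = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        kind = backup_kind(day)
        if kind == "full":
            sets.append([(day, kind)])
            # The new full backup pushes out the oldest set as a whole.
            del sets[:-SETS_TO_KEEP]
        elif sets:
            sets[-1].append((day, kind))
        # A differential day before the first full backup is skipped:
        # there is no backup of reference yet.
    return sets

# Twelve weeks starting on a Sunday: only the last ten sets remain.
sets = run_schedule(date(2024, 1, 7), 84)
```

Changing `SETS_TO_KEEP` or the frequency of full backups trades media space against the depth of the history.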

An interesting protection was suggested by George Foot on the dar-support mailing list: once you have made a new full backup, make an additional differential backup based on the previous full backup (the one just older than the one just built). It would act as a substitute for the new full backup in case something goes wrong with it later on.

Decremental Backup

Based on a feature request for dar made by "Yuraukar" on the dar-support mailing list, the decremental backup provides an interesting approach where the disk requirement is optimized as for the incremental backup, while the latest backup is always a full backup (whereas in the incremental approach it is the oldest backup that is full). The drawback is that each new backup creation requires some extra work to transform the former most recent backup from a full backup into a so-called "decremental" backup.

The decremental backup only contains the difference between the current state of the system and the state the system had at an earlier date (the date at which the full backup corresponding to this decremental backup was made).

In other words, decremental backups are built as follows:

Each time (each day for example), a new full backup is made
The full backup is tested, parity data is optionally built, and so on.
From the previous full backup and the new full backup, a decremental backup is made
The decremental backup is tested, parity data is optionally built, and so on.
The oldest full backup can then be removed

This way you always have a full backup as the latest backup, and decremental backups as the older ones.
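
The steps above can be sketched in Python under simplifying assumptions: a snapshot is a mapping from file path to content, the history is an in-memory list, and the testing and parity steps are left out. The names `make_decremental` and `add_backup` and the `restore`/`remove` layout are hypothetical and do not describe dar's actual implementation.

```python
def make_decremental(old_full, new_full):
    """Record what is needed to go from `new_full` back to `old_full`."""
    # Files that were different or have since been deleted: keep the old copy.
    restore = {path: data for path, data in old_full.items()
               if path not in new_full or new_full[path] != data}
    # Files that did not exist yet at the older date.
    remove = sorted(path for path in new_full if path not in old_full)
    return {"restore": restore, "remove": remove}

def add_backup(history, snapshot):
    """Append a new full backup and turn the previous one decremental.

    `history` is ordered oldest first; its last entry is always full.
    """
    if history:
        previous_full = history[-1]["data"]
        # Replacing the entry also discards the previous full backup.
        history[-1] = {"kind": "decremental",
                       "data": make_decremental(previous_full, snapshot)}
    history.append({"kind": "full", "data": dict(snapshot)})

history = []
add_backup(history, {"a": "1"})
add_backup(history, {"a": "2", "b": "x"})
add_backup(history, {"b": "x", "c": "new"})
kinds = [entry["kind"] for entry in history]  # two decremental, then one full
```

Only the last entry holds a complete copy; each older entry holds just enough to step back one backup.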

You may still have several sets of backups (one for each week, for example, each containing at the end of the week a full backup and 6 decremental backups), but you may also keep just one set (a full backup and a lot of decremental backups). When you need more space, you just have to delete the oldest decremental backups, something you cannot do with the incremental approach, where deleting the oldest backup means deleting the full backup on which all the following incremental backups are based.

Unlike with the incremental approach, it is very easy to restore a whole system: just restore the latest backup (as opposed to restoring the most recent full backup, then each of the incremental backups that follow). If you need to recover a file that has been erased by mistake, just use the adequate decremental backup. And it is still possible to restore a whole system to the state it had long before the latest backup was made: restore the full backup (the latest backup), then in turn each decremental backup, from the most recent back to the one that corresponds to the date you wish. The probability that you have to use all the decremental backups is small compared to the probability that you have to use all the incremental backups: you are indeed much more likely to restore a system to a recent state than to a very old one.
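
Here is a minimal Python sketch of such a restoration, going back in time from the latest full backup. It assumes that a snapshot maps file paths to contents and that each decremental backup is a dictionary with a `restore` part (older copies to put back) and a `remove` part (files that did not exist yet); this layout and the function names are hypothetical.

```python
def apply_decremental(state, decremental):
    """Step a restored state back to the older state the backup describes."""
    older = dict(state)
    older.update(decremental["restore"])
    for path in decremental["remove"]:
        older.pop(path, None)
    return older

def restore_state(latest_full, decrementals, steps_back):
    """Restore the latest full backup, then go back `steps_back` backups.

    `decrementals` is ordered from the most recent to the oldest.
    """
    if not 0 <= steps_back <= len(decrementals):
        raise ValueError("not enough decremental backups kept")
    state = dict(latest_full)
    for decremental in decrementals[:steps_back]:
        state = apply_decremental(state, decremental)
    return state

latest_full = {"notes.txt": "v3", "todo.txt": "buy disks"}
decrementals = [
    # One backup ago: older notes.txt, report.txt still existed, no todo.txt.
    {"restore": {"notes.txt": "v2", "report.txt": "draft"},
     "remove": ["todo.txt"]},
    # Two backups ago: notes.txt was older still.
    {"restore": {"notes.txt": "v1"}, "remove": []},
]

today = restore_state(latest_full, decrementals, 0)         # latest backup only
two_days_ago = restore_state(latest_full, decrementals, 2)  # both decrementals
```

The common case (`steps_back` equal to 0) reads a single backup, and the erased `report.txt` can be fetched directly from the first decremental backup without restoring anything else.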

There are however several drawbacks:

time: making a full backup each time is time consuming, and creating a decremental backup from two full backups is even more time consuming...
temporary disk space: each time you create a new backup, you temporarily need more space than with the incremental approach. For a short period you need to keep two full backups plus a decremental backup (usually much smaller than a full backup), even if at the end you remove the oldest full backup.

In conclusion, I would not say that the decremental backup is a panacea; however, it exists and may be of interest to some of you. More information about dar's implementation of decremental backup can be found here.

Any other tricks, ideas, improvements, corrections or evidence are welcome!