DAR's Documentation

Benchmarking backup tools

Introduction

The objective of this document is to compare the most commonly available backup tools for Unix systems (Linux, FreeBSD, macOS...) today.

Depending on the targets we may need compression and/or ciphering inside the backup, and, depending on the context (public cloud storage, removable media, ...), we may also have to cope with limited storage space.

Backup software that requires servers already running on the local network (for example Bacula, Amanda, Bareos, UrBackup, Burp...) cannot address our second target, as we would first have to rebuild such a server in case of disaster (from what, then?) in order to be able to restore our system and its data. It is overly complex for the first target and not suitable for the third.

Partition cloning systems (Clonezilla, MondoRescue, RescueZilla, partclone, dump and the like) are targeted at block copy and as such cannot back up a live system: you have to shut down and boot from a CD/USB key, or run in single-user mode, in order to "back up". This cannot be automated and has a strong impact on users, who have to interrupt their work during the whole backup operation.

Looking at the remaining backup tools, with or without a Graphical User Interface, most of them rely on one of three backend tools: tar, rsync or dar.

We will thus compare these three tools across the different test families described below.

Test Families

Several aspects are to be considered:

  • completeness of the restoration: file permissions, timestamp precision, hardlinks, file attributes, Extended Attributes, sparse files...
  • main features around backup: differential backup, snapshot, deduplication, compression, encryption, file history...
  • robustness of the backup: how data corruption impacts the backup, how it is reported...
  • execution performance: execution time, memory consumption, multi-threading support...

Benchmark Results

    The results presented here are a synthesis of the test logs. This synthesis is in turn summarized one step further in the conclusion of this document.

    Completeness of backup and restoration

    Attribute                 Dar      Rsync     Tar
    plain file                yes      yes       yes
    symlink                   yes      yes       yes
    hardlinked files          yes      yes       yes
    hardlinked sockets        yes      yes       - (2)
    hardlinked pipes          yes      yes       -
    user                      yes      yes       yes
    group                     yes      yes       yes
    perm.                     yes      yes       yes
    ACL                       yes      yes(4)    yes(7)
    Extended Attributes       yes      yes(5)    yes(8)
    FS Attributes             yes      -         -
    atime                     yes      -         -
    mtime                     yes      yes       yes(3)
    ctime                     -        -         -
    btime                     yes(1)   yes(1)    yes(1)
    Sparse File               yes      yes(6)    yes(6)
    Disk usage optimization   yes      yes(6)    -

    See the test logs for all the details.

    Feature set

    In addition to the completeness of the restored data (seen above), several features are must-haves when creating backups. Their description and what they bring to a backup process are given below, followed by a table showing how they are supported by the different tools under test:

    Historization
    Historization is the ability to restore a deleted file even long after the mistake has been made, by rotating backups over an arbitrarily large number of backup sets. Having associated tools to quickly locate the backup holding a particular version of a file becomes important as the history grows. Historization can be done with only full backups, but of course it better leverages differential and incremental backups.
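    As an illustration of such a locating tool, here is a possible workflow with dar_manager from the dar suite (database name and paths are illustrative):

        # register the rotated backups in a catalogue database
        dar_manager -C backups.dmd
        dar_manager -B backups.dmd -A /backup/full_2023-01-01
        dar_manager -B backups.dmd -A /backup/diff_2023-01-08

        # list which registered backup holds a given file
        dar_manager -B backups.dmd -f home/joe/report.odt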

    Data filtering
    Not all files need to be saved:
    • some directories (like /tmp, /proc, /sys, /dev, /home/*/.cache) are useless to save
    • some files need not be saved either, based on their name or part of it (their extension for example): emacs backup files *~, music files *.mp3 you already have archived somewhere, and so on.
    • You may wish to ignore files located on one or more particular mounted filesystems, or on the contrary only consider certain volumes/disks/mounted filesystems and ignore all others, and have different backup rotation cycles for those.
    • You may also find it better to tag files one by one (manually or by means of an automated process of your own) to be excluded from or included in the backup.
    • Instead of tagging, you could also let a process build a file listing to back up and/or to ignore.
    • Last, you may well need a mix of several of these mechanisms at the same time (a short filtering sketch follows this list).
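    A short sketch of such filtering with the three tools (paths and patterns are illustrative):

        # dar: prune directories with -P and exclude filename patterns with -X
        dar -c /backup/full -R / -P tmp -P proc -P sys -P dev -X "*~" -X "*.mp3" -z

        # tar: similar exclusions
        tar -czpf /backup/full.tar.gz --exclude='*~' --exclude='*.mp3' \
            --exclude=/proc --exclude=/sys --exclude=/dev --exclude=/tmp /

        # rsync: exclusion patterns while copying
        rsync -aHAXS --exclude='*~' --exclude='*.mp3' /home/ /backup/home/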

    Slicing (or multi-volume)
    Having a backup split into several files of given max size can address several needs:
    • hold the backup on several removable media (CD, DVD, USB keys...) smaller than the backup itself
    • transfer the backup from a large storage space to another by means of a smaller removable medium
    • transfer the backup over the network and resume at the last transmitted slice, rather than restarting the whole transfer, in case of a network issue
    • store the backup in the cloud when the provider limits the file size
    • be able to restore a backup on a system where storage space cannot hold both the backup and the restored system
    • transfer back from the cloud only a few slices to restore some files, when the cloud provider does not provide ad hoc protocols (sftp, ftp, ...) but only a web-based user interface
    Of course, multi-volume is really interesting if you don't have to concatenate all the slices to be able to have a usable backup.

    Last, the use cases identified above for backup slicing revolve around limited storage space, so having compression available when multi-volume is used is a key point here.
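    A minimal sketch of slicing (slice sizes and paths are illustrative):

        # dar: 4 GiB gzip-compressed slices
        dar -c /mnt/usb/backup -R / -s 4G -z

        # GNU tar: multi-volume archive with 4 GiB volumes (--tape-length counts
        # units of 1024 bytes); note that -M cannot be combined with compression
        tar -cpf /mnt/usb/backup.tar -M --tape-length=4194304 -C / .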

    Symmetric strong encryption
    Symmetric strong encryption is the ability to cipher a backup with a password or passphrase and use that same key to decipher it. Some well known algorithms in this area are AES, blowfish, camellia...
    Symmetric strong encryption is interesting for the following cases:
    • if your disk is ciphered, would you store your backup in clear form in the cloud?
    • you do not trust your cloud provider not to inspect your data and build a marketing profile of you with it.
    • you want to prevent your patented data or industrial secret recipes from falling into the hands of competitors or government agencies that could clone them without fear of being prosecuted. This use case applies whether your backup is stored on local disk, removable media or public cloud.
    • simply because in your country you have the right and the freedom to have privacy.
    • because your country, democratic today, could turn into a dictatorship tomorrow, and based on some arbitrary criteria (belief, political opinion, sexual orientation...) you could then suffer from this information having been accessible to the authorities, or even publicly released, while you still need backups on arbitrary storage media.
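    A minimal sketch of symmetric ciphering at backup creation time (cipher choice, password handling and paths are illustrative):

        # dar: AES-ciphered, compressed backup; dar prompts for the passphrase
        dar -c /backup/full -R / -z -K aes:

        # tar: the archive format itself is not ciphered, an external tool must be piped in
        tar -czpf - -C / . | openssl enc -aes-256-cbc -salt -pbkdf2 -out /backup/full.tar.gz.enc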

    Asymmetric strong encryption
    Asymmetrical strong encryption is the ability to cipher a backup with a public key and having the corresponding private key for deciphering it (PGP, GnuPG...).
    Asymmetric encryption is mainly interesting when exchanging data over the Internet between different persons, or possibly for archiving data in the public cloud. Having it for backups seems less appropriate and is more complex than symmetric strong encryption, as restoration requires the private key, which must thus be stored outside the backup itself and still be protected from unauthorized access. The use of the private key can itself be protected with a password or a passphrase, but this gives the same feature level as symmetric encryption with a more complex process and not much more security.
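    For illustration, a sketch of asymmetric ciphering of a tar archive with GnuPG (recipient and paths are illustrative); dar provides the equivalent natively with public keys, as shown in the feature table below:

        # cipher with the recipient's public key, decipher with the matching private key
        tar -czpf - -C /home/joe . | gpg --encrypt --recipient joe@example.org \
            --output /backup/joe.tar.gz.gpg
        gpg --decrypt /backup/joe.tar.gz.gpg | tar -xzpf - -C /tmp/restore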

    Protection against plain-text attack
    Ciphering data must be done with a minimum level of security, in particular when the ciphered data has a well-defined structure and patterns, as a backup file format is expected to have. Knowing the expected structure of the clear data may let an attacker uncover the whole ciphered data. This is known as a plain-text attack.

    Key derivation function
    • Using the same password/passphrase for different backups is convenient but not secure. A key derivation function using a salt lets you keep the same password/passphrase while the data gets encrypted with a different key each time; this is the role of the Key Derivation Function (KDF) (PKCS5/PBKDF2, Argon2...).
    • Another need for a KDF is that human-provided passwords/passphrases are usually weak: even when we use letters, digits and some special characters, passwords and passphrases still sit in a small area of the possible key space that a dictionary attack can leverage. As the KDF is by design CPU intensive, it costs an attacker a lot of effort and time to derive each word of a dictionary into its resulting key. The time required to perform a dictionary attack can thus be multiplied by several hundred thousand, leading to an effective time of tens of years or even centuries rather than hours or days.
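    As an illustration with tar, the openssl step shown earlier can be told to use PBKDF2 with a high iteration count (the count is illustrative); dar applies its own KDF (PBKDF2 or Argon2, see the feature table below) when symmetric encryption is used:

        tar -czpf - -C / . | openssl enc -aes-256-cbc -salt -pbkdf2 -iter 500000 \
            -out /backup/full.tar.gz.enc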

    File change detection
    When backing up a live system, it is important to detect, retry saving, or at least flag files that changed while they were being read for backup. In such a situation, the backed-up file could be recorded in a state it never had: as the backup process reads sequentially from beginning to end, if a modification A is made at the end of the file and then a modification B at its beginning during that file's backup, the backup may contain B but not A, while at no time did the file contain B without A. Given the short time during which a file is read, a time accuracy of microseconds or nanoseconds is necessary to detect such a change during the backup; otherwise you may end up with corrupted data in the backup and nothing to rely on in the event of a file deleted by mistake, a disk crash or a disaster.
    At restoration time, if the file has been saved anyway, it is good to know that it was not saved properly: restoring an older but sane version may be better. This is something the user/sysadmin cannot guess if the backup does not hold that type of information.

    Multi-level backup
    Multi-level backup is the ability to make use of full backups, differential backups and/or incremental backups.
    The advantage of differential and incremental backups compared to full ones is the much shorter time they require to complete, and the reduced storage space and/or bandwidth they imply when transferred over the network.
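    A minimal sketch of a full backup followed by a differential/incremental one (backup names and paths are illustrative):

        # dar: the differential backup takes the full backup as reference with -A
        dar -c /backup/full -R / -z
        dar -c /backup/diff_monday -R / -z -A /backup/full

        # GNU tar: incremental backups driven by a snapshot file
        # (tar updates the snapshot file in place; copy it to keep several levels)
        tar -czpf /backup/full.tar.gz --listed-incremental=/backup/snapshot -C / .
        tar -czpf /backup/incr_monday.tar.gz --listed-incremental=/backup/snapshot -C / .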

    Binary delta
    Without binary delta, when performing a differential or incremental backup, a file that changed since the previous backup is saved again entirely. Some huge files produced by well-known applications (mailboxes for example) would then consume a lot of storage space and lead to a long backup time even for incremental or differential backups. Binary delta is the ability to only store the part of a file that changed since a reference state, which leads to significant space savings and a shorter backup duration.
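    A hedged sketch with dar, assuming a build with librsync support (option spelling as per dar's documentation on delta signatures; adjust to your version):

        # store delta signatures in the full backup, then the differential backup
        # only records the changed portions of modified files
        dar -c /backup/full -R / -z --delta sig
        dar -c /backup/diff -R / -z -A /backup/full --delta sig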

    Detecting suspicious modifications
    When performing a backup based on a previous one (differential, incremental or decremental backup), it is possible to check how the metadata of saved files has changed since then and warn the user when uncommon patterns are met. Those may be the trace of a rootkit, virus, ransomware or trojan trying to hide its presence and activities.

    Snapshot
    A snapshot is like a differential backup made right after the full backup (no file has changed): it is a minimal set of information that can be used to:
    • create an incremental or differential backup without having the full backup, or more generally the backup of reference, at hand: when backups are stored remotely, a snapshot is a must.
    • compare the current live filesystem with the state it had at the time the snapshot was made
    • bring some metadata redundancy and a means of repair to face a corrupted backup
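    With dar, such a snapshot can take the form of an isolated catalogue; a minimal sketch (names are illustrative):

        # extract a small catalogue from the full backup...
        dar -C /backup/catalogue_full -A /backup/full

        # ...and later use it as reference for a differential backup,
        # without needing the full backup itself to be at hand
        dar -c /backup/diff -R / -z -A /backup/catalogue_full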

    On-fly hashing
    On-fly hashing is the ability to generate a hash of the backup at the same time it is generated, before it is written to storage. Such a hash can be used to:
    • validate that a backup has been properly transferred to a public storage cloud, the hash computation being done in parallel
    • check that no data corruption has occurred (doubt about disk or memory) even when the backup is written to a local disk
    Hash validation is usually faster than backup testing or backup comparison, though it does not validate your ability to rely on the backup as deeply as these latter operations do. Hashing can also be done after the backup has completed, but that requires re-reading the whole backup and waiting for the corresponding storage I/O. On-fly hashing leverages the fact that the data is still in memory, saving that disk I/O and its latency, so it is much faster. As it is done in memory, it can also help detect file corruption on the backup destination media (like USB keys or poor quality hardware).
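    A minimal sketch with dar, which writes one hash file per slice that the sha512sum command should be able to verify (paths are illustrative):

        # produce full.1.dar together with full.1.dar.sha512 while writing the backup
        dar -c /backup/full -R / -z --hash sha512

        # later, or after uploading, verify the slice against its hash file
        sha512sum -c /backup/full.1.dar.sha512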

    Run custom command during operation
    For an automated backup process, it is often necessary to run commands before and after the backup operation itself, but also during the backup process. For example, when entering a directory, one could need to run an arbitrary command generating a file that will be included in the backup, or, when leaving that directory, perform some cleanup in it. Another use case shows up when slicing the backup: the ability to perform a custom operation after each slice is generated, like uploading the slice to the cloud, burning it to DVD-/+RW, or loading a tape from a tape library...
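    A sketch of a per-slice command with dar, using its documented substitution macros (%p: slice path, %b: basename, %N: slice number, %e: extension); the upload command itself is illustrative:

        dar -c /backup/full -R / -z -s 1G \
            -E "curl -T %p/%b.%N.%e sftp://user@cloud.example.org/backups/"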

    Dry-run execution
    When tuning a backup process, it is often necessary to verify quickly that everything will work flawlessly, without having to wait for a backup to complete or consume storage resources and network bandwidth.
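    For example (dar's dry-run option is -e/--empty and rsync's is -n/--dry-run; tar has no equivalent, as shown in the feature table below):

        # dar: go through the whole backup logic without writing anything
        dar -c /backup/full -R / -z --empty

        # rsync: show what would be transferred without doing it
        rsync -aHn --delete /home/ backup@nas:/backups/home/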

    User message within backup
    Allowing the user to add an arbitrary message inside the backup may be useful when the filename is too short to hold the needed information (like the context in which the backup or archive was made, a hint for the passphrase, and so on).

    Backup sanity test
    It is crucial in a backup process to validate that the generated backup is usable. There are many reasons why it might not be: data corruption in memory, on disk or over the network; disk space saturation leading to a truncated backup; or simply a software bug.
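    For example (with tar, only the archive structure and the compression stream can be checked this way):

        # dar: test the backup's internal coherence
        dar -t /backup/full

        # tar: at best, read through the whole archive and its compression layer
        tar -tzf /backup/full.tar.gz > /dev/null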

    Comparing with original data
    One step further in backup and archive validation is comparing the file contents and metadata in the backup with what the live system holds.
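    For example (paths are illustrative):

        # dar: compare the backup against the filesystem rooted at /
        dar -d /backup/full -R /

        # GNU tar: report differences between the archive and the filesystem
        (cd / && tar -dzf /backup/full.tar.gz)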

    Tunable verbosity
    When a backup process is in production and works nicely, it is usually desirable to have as little output as possible, while still being able to log any error. On the other hand, when setting up a backup process, more detailed output is required to understand and validate that the process follows the expected path.

    Modify the backup's content
    Once a backup has been completed, you might notice that you have saved extra files you ought not to save. Being able to drop them from the backup to save some space without having to restart the whole backup may lead to a huge time saving.

    You might also need to add some extra files that were outside the backup scope; being able to add them without restarting the whole backup process may also lead to a huge time saving.
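    With dar this can be done by merging into a new archive; a minimal sketch (the excluded pattern is illustrative):

        # build cleaned_backup from full, dropping the *.mp3 files saved by mistake
        dar -+ /backup/cleaned_backup -A /backup/full -X "*.mp3" -z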

    Stdin/stdout backup read/write
    Having the ability to read or write the backup through a pipe to an arbitrary command is one of the ultimate keys to backup software flexibility.
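    For example, streaming a backup to a remote host over ssh (host and paths are illustrative):

        # dar: '-' as backup name means the backup is written to stdout
        dar -c - -R / -z | ssh backup@nas.example.org "cat > /backups/full.1.dar"

        # tar: '-' as archive name also means stdout
        tar -czpf - -C / . | ssh backup@nas.example.org "cat > /backups/full.tar.gz"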

    Remote network storage
    This is the ability to write a backup directly to network storage without using the local disk, and to restore by reading the backup directly from that remote storage, still without using local storage. Network/remote storage is to be understood as remote storage reachable over the network, like a public cloud, a private cloud or a personal NAS, by means of file transfer protocols (scp, sftp, ftp, rcp, http, https...).

    Feature Dar Rsync Tar
    Historization Yes - Yes
    Data filtering by directory Yes Yes Yes
    Data filtering by filename Yes Yes limited
    Data filtering by filesystem Yes limited limited
    Data filtering by tag limited - -
    Data filtering by files listing Yes yes limited
    Slicing/multi-volume Yes - limited
    Symmetric encryption Yes - Yes
    Asymmetric encryption Yes - Yes
    Plain-text attack protection Yes - -
    PBKDF2 Key Derivation Function Yes - -
    ARGON2 Key Derivation Function Yes - -
    File change detection Yes - limited
    Multi-level backup Yes - Yes
    Binary delta Yes Yes -
    Detecting suspicious modifications Yes - -
    Snapshot for diff/incr. backup Yes - Yes
    Snapshot for comparing Yes - -
    Snapshot for redundancy Yes - -
    On-fly hashing Yes - -
    Run custom command during operation Yes - limited
    Dry-run execution Yes Yes -
    User message within backup Yes - -
    Backup sanity test Yes - Yes
    Comparing with original data Yes - Yes
    Tunable verbosity Yes Yes limited
    Modify the backup's content Yes Yes limited
    Stdin/stdout backup read/write Yes - Yes
    Remote network storage Yes limited Yes

    The results presented above are a synthesis of the test logs.

    Robustness

    The objective here is to see how a minor data corruption impacts the backup. This type of corruption (a single bit inversion) can be caused by a network transfer, a cosmic particle hitting a memory bank, or simply the passage of time on a given storage medium. In real life, data corruption may affect more than a single bit, of course. But while the ability to work around a single corrupted bit tells nothing about the ability to recover from a larger amount of corruption, the inability to recover from a single corrupted bit is enough to know that the same software will behave even worse when a larger portion of corrupted data is met.

    Behavior Dar Rsync Tar alone Tar + gzip
    Detects backup corruption Yes - - Yes
    Warn or avoid restoring corrupted data Yes - - Yes
    Able to restore all files not concerned by the corruption Yes Yes Yes -

    To protect your data, you can go one step further and compute redundancy data with Parchive on top of your backups or archives. This will allow you to repair them in case of corruption.
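    A minimal sketch with par2 (redundancy level and paths are illustrative):

        # create 10% redundancy data alongside the backup slices
        par2 create -r10 /backup/full.par2 /backup/full.*.dar

        # after a detected corruption, attempt the repair
        par2 repair /backup/full.par2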

    The results presented above are a synthesis of the test logs.

    Performance

    In the following, we have distinguished two purposes of backup tools: the "identical" copy of a set of files and directories (short term operation) and the usual backup operation (long term storage and historization).

    Performance of file copy operation

    The only performance aspect to consider for this target is execution speed; data reduction on the wire matters only if the bandwidth is low enough that the added compression time does not ruin the gain in transfer time. Compression time does not depend on the backup tool but on the data, and we will see in the backup performance tests how the different backup tools reduce data on the wire. For the execution time we get the following results:

    Single huge file

    The copied data was a Linux distro installation ISO file

    cp: 2.58 s
    Dar: 9.18 s
    Rsync: 15.28 s
    Tar: 6.51 s
    Linux system

    The copied data was a freshly installed, fully featured Linux system

    cp: 5.15 s
    Dar: 16.78 s
    Rsync: 16.59 s
    Tar: 8.04 s
    Conclusion

    For local copy, cp is the fastest but totally unusable for remote copy. At first sight one could think tar would be the best alternative for remote copy, but that would not take into account the fact that you will probably want to use a secured connection (unless all segments of the underlying network are physically yours, end to end). Thus, once the backup has been generated, using tar requires an extra user operation, extra computing time to cipher/decipher, and time to transfer the data, while both alternatives, rsync and dar, have this integrated: they can copy and transfer at the same time, with both a gain of time and no added operation for the user.

    In consequence, for remote copy, if this is a unique/one-time remote copy, dar will be faster than rsync most of the time (even when using compression to cope with low bandwidth, see the backup test results below). But for recurring remote copies, even if rsync is not faster than dar, it has the advantage of being designed specifically for this task, as in that context we do not need to store the data compressed or ciphered. This can be summarized as follows:

    Operation Best Choice Alternative
    Local copy cp tar
    One-time remote copy dar rsync
    Recurrent remote copy rsync dar

    See the corresponding test logs for more details

    Performance of backup operation

    For backup we consider the following criteria by order of importance:

    1. data reduction on backup storage
    2. data reduction when transmitted over the network
    3. execution time to restore a few files
    4. execution time to restore a full and differential backups
    5. execution time to create a full and differential backups

    Why this order?

    Note that the following results do not take into account the performance penalty implied by network latency. There are several reasons for that:

    For all the backup performance tests that follow (but not for the file copy performance tests seen above), compression has been activated using the same, most commonly supported algorithm: gzip at level 6. Other algorithms may complete faster or provide a better compression ratio, but this depends on the chosen compression algorithm and on the data to compress, not on the backup tools tested here.
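    For reference, this corresponds to invocations of the following kind (gzip's default level is 6; the exact commands used in the test logs may differ):

        dar -c /backup/full -R / -zgzip:6
        rsync -aHAXSz --compress-level=6 /source/ backup@nas:/backups/
        tar -cpf - -C / . | gzip -6 > /backup/full.tar.gz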

    Data reduction on backup storage

    Full backup
    Dar: 1580562224 bytes
    Dar+sparse: 1578428790 bytes
    Dar+sparse+binary delta: 1602481058 bytes
    Rsync: 4136318307 bytes
    Rsync+sparse: 4136318307 bytes
    tar: 1549799048 bytes
    tar+sparse: 1549577862 bytes
    Differential backup
    Dar: 49498524 bytes
    Dar+sparse: 49505251 bytes
    Dar+sparse+binary delta: 23883368 bytes
    Rsync: not supported
    Rsync+sparse: not supported
    tar: 44607904 bytes
    tar+sparse: 44604194 bytes
    Full + Differential backup

    This is an extrapolation of the required backup volume after one week of daily backups of the Linux system under test, assuming the daily activity is as minimal as it was here between the day of the full backup and the day of the first differential backup (a few package upgrades and no user activity).

    Dar: 1927051892 bytes
    Dar+sparse: 1924965547 bytes
    Dar+sparse+binary delta: 1769664634 bytes
    Rsync: not supported
    Rsync+sparse: not supported
    tar: 1862054376 bytes
    tar+sparse: 1861807220 bytes

    The previous results concern the backup of a steady Linux system; the relative difference in data reduction might favor both rsync and dar+binary delta when the proportion of large files being slightly modified increases (like mailbox files).

    Data reduction over network

    Full backup
    Dar: 1580562224 bytes
    Dar+sparse: 1578428790 bytes
    Dar+sparse+binary delta: 1602481058 bytes
    Rsync: 1587714486 bytes
    Rsync+sparse: 1587714474 bytes
    tar: 1549799048 bytes
    tar+sparse: 1549577862 bytes
    Differential backup
    Dar: 49498524 bytes
    Dar+sparse: 49505251 bytes
    Dar+sparse+binary delta: 23883368 bytes
    Rsync: 29293958 bytes
    Rsync+sparse: 29293958 bytes
    tar: 44607904 bytes
    tar+sparse: 44604194 bytes
    Full + Differential backup

    This is the same extrapolation done above (one week of daily backup), but for the volume of data transmitted over the network instead of the backup volume on storage.

    Dar: 1927051892 bytes
    Dar+sparse: 1924965547 bytes
    Dar+sparse+binary delta: 1769664634 bytes
    Rsync: 1792772192 bytes
    Rsync+sparse: 1792772180 bytes
    tar: 1862054376 bytes
    tar+sparse: 1861807220 bytes

    Execution time to restore a few files

    Dar: 0.98 s
    Dar+sparse: 1.13 s
    Dar+sparse+binary delta: 1.27 s
    Rsync: 3 ms
    Rsync+sparse: 3 ms
    tar: 25.15 s
    tar+sparse: 25 s

    Here the phenomenon is even more pronounced when the file to restore is located near the end of the tar backup, as tar sequentially reads the whole backup up to the requested file.

    Execution time to restore a whole system - full backup

    Dar: 22.94 s
    Dar+sparse: 30.36 s
    Dar+sparse+binary delta: 30.35 s
    Rsync: 157.81 s
    Rsync+sparse: 158.39 s
    tar: 26.72 s
    tar+sparse: 26.27 s

    Execution time to restore a single differential backup

    Dar: 3.48 s
    Dar+sparse: 3.48 s
    Dar+sparse+binary delta: 3.44 s
    Rsync: not supported
    Rsync+sparse: not supported
    tar: 1.48 s
    tar+sparse: 1.5 s

    Execution time to restore a whole system - full + differential backup

    We use here the same extrapolation of a week of daily backup done above: the first backup being a full backup and differential/incremental backups done the next days.

    Clarifying the terms used: a differential backup saves only what has changed since the full backup was made. The consequence is that the differential backup is slightly bigger to process each day, depending on how the data changed: if the same files change every day (mailboxes, user files, ...), each new differential backup will have the same size and take the same processing time to complete. At the opposite, if new data is added each day, the differential backup size will each day be the sum of the incremental backups that could have been made instead since the full backup.

    Unlike the differential backup, the incremental backup saves only what has changed since the last backup (full or incremental). For a constant activity, like the steady Linux system used here, the incremental backup size should stay the same over time (and be equivalent to the size of the first differential backup), so the extrapolation is easy and not questionable: the restoration time is the time to restore the full backup plus the time to restore the first differential backup multiplied by the number of days that have passed.

    Execution time to restore a whole system - lower bound

    The lower bound is the sum of the execution times of the restoration of the full backup and of one differential backup, as seen just above. It corresponds to the minimum execution time for restoring a whole system from a full+differential backup.

    Dar: 26.42 s
    Dar+sparse: 33.84 s
    Dar+sparse+binary delta: 33.79 s
    Rsync: full backup only 157.81 s
    Rsync+sparse: full backup only 158.39 s
    tar: 28.2 s
    tar+sparse: 27.77 s

    Execution time to restore a whole system - higher bound

    The higher bound is the sum of the execution time of the full restoration plus seven times the execution time of the differential restoration. It corresponds to the worst-case scenario where new data is added each day (still on a steady Linux system with constant activity). It also corresponds to the scenario of restoring a whole system from full+incremental backups (7 incremental backups have to be restored in that one-week scenario):

    Dar: 47.3 s
    Dar+sparse: 54.72 s
    Dar+sparse+binary delta: 54.43 s
    Rsync: full backup only 157.81 s
    Rsync+sparse: full backup only 158.39 s
    tar: 37.08 s
    tar+sparse: 36.77 s

    Execution time to create a backup

    Dar: 149.73 s
    Dar+sparse: 157.99 s
    Dar+sparse+binary delta: 162.62 s
    Rsync: 156.98 s
    Rsync+sparse: 183.44 s
    tar: 148.59 s
    tar+sparse: 149.38 s

    Ciphering/deciphering performance

    There are several reasons that imply the need for ciphering data:

    The ciphering execution time is independent of the nature of the backup: full or differential, compressed or not. To evaluate ciphering performance we use the same data sets as previously, both compressed and uncompressed. However, not all software under test is able to cipher the resulting backup: rsync cannot do so.

    Full backup+restoration execution time
    Dar: 9.13 s
    Rsync: N/A
    Tar (openssl): 7.39 s
    Execution time for the restoration of a single file
    Dar: 0.42 s
    Rsync: N/A
    Tar (openssl): 1.79 s
    Storage requirement ciphered without compression
    Dar: 1.46 GiB
    Rsync: N/A
    Tar (openssl): 1.49 GiB

    See the corresponding test logs for more details.

    Conclusion

    So far we have measured different performance aspects, evaluated the available features, tested backup robustness and observed the backup completeness of the different backup tools under test. This gives a lot of information, already summarized above. But it would still not be of great use to anyone reading this document (especially to those jumping straight to its conclusion ;^) ), so we have to get back to the use cases and their respective requirements to obtain the essential oil drop anyone can use immediately:

    Criteria for the different use cases

    Use Cases Key Point Optional interesting features
    Local directory copy
    • execution speed
    • completeness of copied data and metadata
    remote directory copy - wide network
    • execution speed
    • completeness of copied data and metadata
    • on wire ciphering
    remote directory copy - narrow network
    • execution speed
    • data reduction on wire
    • completeness of copied data and metadata
    • on wire ciphering
    Full backups only
    • completeness of backed up data and metadata
    • data reduction on storage
    • fast restoration of a few files
    • fast restoration of a whole backup
    full+diff/incr. backup
    • completeness of backed up data and metadata
    • data reduction on storage
    • fast restoration of a few files
    • fast restoration of a whole backup
    • managing tool of backups rotation
    Archiving of private data
    • data reduction on storage
    • robustness of the archive
    • ciphering
    • redundancy data
    Archiving of public data
    • data reduction on storage
    • robustness of the archive
    • signing
    • fast decompression algorithm
    Private data exchange over Internet
    • data reduction over the network
    • asymmetric encryption and signing
    • redundancy data
    • multi-volume backup/archive
    • integrated network protocols in backup tool
    Public data exchange over Internet
    • data reduction over the network
    • hashing
    • signing
    • integrated network protocols in backup tool

    Complementary criteria depending on the storage type

    And depending on the target storage, the following adds on top:

    Use Cases Key Point Optional interesting features
    Local disk
    • execution speed
    • hashing
    Data stored on private NAS
    • data reduction on storage
    • multi-volume backup
    • integrated network protocols in backup tool
    • ciphering
    Data stored on public cloud
    • data reduction on storage and on wire
    • ciphering
    • multi-volume backup
    • integrated network protocols in backup tool
    Data stored on removable media (incl. tapes)
    • multi-volume backup
    • data reduction on storage
    • on-fly hashing
    • ciphering
    • redundancy data

    Essential oil drop

    In summary, putting the different measurements we made in front of these requirements, we can identify the best software for each particular use case:

    Use Cases, by storage type: Local disk storage, Private NAS, Public Cloud, Removable media

    Local directory copy
      Local disk storage:
        cp
        dar: not the fastest
        rsync: not the fastest
        tar: not the fastest
      Private NAS: -
      Public Cloud: -
      Removable media: -

    One-time remote directory copy
      Local disk storage: -
      Private NAS:
        dar
        rsync: not the fastest
        tar: no network protocol embedded
      Public Cloud:
        dar
        rsync: not the fastest
        tar: no network protocol embedded
      Removable media:
        dar
        rsync: not the fastest
        tar: no network protocol embedded

    Recurrent remote directory copy
      Local disk storage: -
      Private NAS:
        dar: fastest, but automation is a bit less straightforward than with rsync
        rsync
        tar: no network protocol embedded
      Public Cloud:
        dar: fastest, but automation is a bit less straightforward than with rsync
        rsync
        tar: no network protocol embedded
      Removable media:
        dar: fastest, but automation is a bit less straightforward than with rsync
        rsync
        tar: no network protocol embedded

    Full backups only (private data)
      Local disk storage:
        dar: has the advantage of providing long historization of backups
        rsync: no data reduction on storage, slow to restore a whole filesystem
        tar: not saving all file attributes and inode types, slow to restore a few files
      Private NAS:
        dar
        rsync: no data reduction on storage
        tar: not saving all file attributes and inode types, slow to restore a few files, no network protocol embedded
      Public Cloud:
        dar
        rsync: no data ciphering and no reduction on storage
        tar: no embedded ciphering, not the strongest data encryption, not saving all file attributes and inode types, slow to restore a few files, no network protocol embedded
      Removable media:
        dar
        rsync: no multi-volume support, no data ciphering and no reduction on storage
        tar: compression and multi-volume are not supported at the same time, not saving all file attributes and inode types, no embedded ciphering, not the strongest data encryption

    Full+diff/incr. backups (private data)
      Local disk storage:
        dar
        rsync: differential backup not supported, full backup is overwritten
        tar: not saving all file attributes and inode types, slow to restore a few files
      Private NAS:
        dar
        rsync: differential backup not supported, full backup is overwritten
        tar: not saving all file attributes and inode types, slow to restore a few files, no network protocol embedded
      Public Cloud:
        dar
        rsync: differential backup not supported, full backup is overwritten
        tar: no embedded ciphering, not the strongest data encryption, not saving all file attributes and inode types, slow to restore a few files, no network protocol embedded
      Removable media:
        dar
        rsync: differential backup not supported, full backup is overwritten, no multi-volume support, no data reduction, no ciphering
        tar: compression and multi-volume are not supported at the same time, not saving all file attributes and inode types, no embedded ciphering, not the strongest data encryption

    Archiving of private data
      Local disk storage:
        dar
        rsync: no data reduction on storage, no detection of data corruption, complex parity data addition
        tar: no detection of data corruption, or loss of all data after the first corruption met
      Private NAS:
        dar
        rsync: no data reduction, no detection of data corruption, complex parity data addition
        tar: no detection of data corruption, or loss of all data after the first corruption met
      Public Cloud:
        dar
        rsync: no ciphering, no data reduction, no detection of data corruption, complex parity data addition
        tar: no detection of data corruption, or loss of all data after the first corruption met, no embedded ciphering, no protection against plain-text attack
      Removable media:
        dar
        rsync: no data reduction, no multi-volume, no ciphering, no detection of data corruption, complex parity data addition
        tar: compression and multi-volume are not supported at the same time, no detection of data corruption, or loss of all data after the first corruption met, no ciphering

    Archiving of public data
      Local disk storage:
        dar: most robust format but not as standard as tar's
        rsync: no reduction on storage
        tar
      Private NAS:
        dar: most robust archive format but not as standard as tar's
        rsync: no reduction on storage, complicated to download a directory tree and files with protocols other than rsync
        tar
      Public Cloud:
        dar: most robust archive format but not as standard as tar's
        rsync: no reduction on storage, complicated to download a directory tree and files with protocols other than rsync
        tar
      Removable media:
        dar
        rsync: no reduction on storage, no multi-volume, no detection of data corruption, complex parity data addition
        tar: compression and multi-volume are not supported at the same time

    Private data exchange over Internet
      Local disk storage:
        dar
        rsync: not the best data reduction over the network
        tar: best data reduction on the network but no embedded ciphering, no integrated network protocols
      Private NAS:
        dar
        rsync: no data reduction on storage, not the best data reduction over the network
        tar: best data reduction on the network, but lack of embedded ciphering, lack of integrated network protocols
      Public Cloud:
        dar
        rsync: no ciphering and no data reduction on storage
        tar: no embedded ciphering, no integrated network protocols, no protection against plain-text attack, only old KDF functions supported, complex and error-prone use of openssl to cipher the archive
      Removable media: -

    Public data exchange over Internet
      Local disk storage:
        dar: not the best data reduction over the network
        rsync: not the best data reduction over the network
        tar
      Private NAS:
        dar: not the best data reduction over the network
        rsync: no data reduction on storage, not the best data reduction over the network
        tar
      Public Cloud:
        dar: not the best data reduction over the network
        rsync: no data reduction on storage, not the best data reduction over the network
        tar
      Removable media: -

    In each cell of the previous table, the different tools are listed in alphabetical order. In the original colorized table they are rated as best solution, good solution, not optimal or not adapted; here, the note following a tool's name gives the reason it was not selected as the best solution for that particular need.