DAR's Documentation

Benchmarking backup tools

Introduction

The objective of this document is to compare the most commonly available backup tools for Unix systems (Linux, FreeBSD, macOS...) today.

Depending on the targets we may need compression and/or ciphering inside the backup, and, depending on the context (public cloud storage, removable media, ...), we may also have to cope with limited storage space.

Backup software that requires servers already running on the local network (for example Bacula, Amanda, Bareos, UrBackup, Burp...) cannot address our second target, as we would first have to rebuild such a server in case of disaster (from what, then?) in order to be able to restore our system and its data. It is overly complex for the first target and not suitable for the third.

Partition cloning systems (Clonezilla, MondoRescue, RescueZilla, partclone, dump and the like) are targeted at block copy and as such cannot back up a live system: you have to shut down and boot from a CD/USB key, or run in single-user mode, in order to "back up". This cannot be automated and has a strong impact on users, who have to interrupt their work during the whole backup operation.

Looking at the remaining backup tools, with or without a Graphical User Interface, most of them rely on one of three backend tools: tar, rsync or dar.

We will thus compare these three tools across the different test families described below.

Test Families

Several aspects are to be considered:

  • completeness of the restoration: file permissions, timestamp precision, hardlinks, file attributes, Extended Attributes, sparse files...
  • main features around backup: differential backup, snapshot, deduplication, compression, encryption, file history...
  • robustness of the backup: how data corruption impacts the backup, how it is reported...
  • execution performance: execution time, memory consumption, multi-threading support...

Benchmark Results

    The results presented here are a synthesis of the test logs. This synthesis is in turn summarized one step further in the conclusion of this document.

    Completeness of backup and restoration

    Attribute                 Dar      Rsync     Tar
    plain file                yes      yes       yes
    symlink                   yes      yes       yes
    hardlinked files          yes      yes       yes
    hardlinked sockets        yes      yes       - (2)
    hardlinked pipes          yes      yes       -
    user                      yes      yes       yes
    group                     yes      yes       yes
    perm.                     yes      yes       yes
    ACL                       yes      yes(4)    yes(7)
    Extended Attributes       yes      yes(5)    yes(8)
    FS Attributes             yes      -         -
    atime                     yes      -         -
    mtime                     yes      yes       yes(3)
    ctime                     -        -         -
    btime                     yes(1)   yes(1)    yes(1)
    Sparse File               yes      yes(6)    yes(6)
    Disk usage optimization   yes      yes(6)    -

    See the test logs for all the details.

    Feature set

    In addition to the completeness of the restored data (seen above), several features are must-haves when creating backups. Their description and what they bring to a backup process are given below, followed by a table showing how they are supported by the different tools under test:

    Historization
    Historization is the ability to restore a deleted file even long after the mistake has been made, by rotating backups over an arbitrarily large number of backup sets. Having associated tools to quickly locate the backup holding a particular version of a file becomes important as the history grows. Historization can be done with only full backups, but of course it better leverages differential and incremental backups.
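    As an illustration of such a locating tool, here is a possible workflow with dar_manager from the dar suite (database name and paths are illustrative):

        # register the rotated backups in a catalogue database
        dar_manager -C backups.dmd
        dar_manager -B backups.dmd -A /backup/full_2023-01-01
        dar_manager -B backups.dmd -A /backup/diff_2023-01-08

        # list which registered backup holds a given file
        dar_manager -B backups.dmd -f home/joe/report.odt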

    Data filtering
    Not all files need to be saved:
    • some directories (like /tmp, /proc, /sys, /dev, /home/*/.cache) are useless to save
    • some files need not be saved either, based on their name or part of it (their extension for example): emacs backup files *~, music files *.mp3 you already have archived somewhere, and so on.
    • You may wish to ignore files located on one or more particular mounted filesystems, or on the contrary only consider certain volumes/disks/mounted filesystems and ignore all others, and have different backup rotation cycles for those.
    • You may also find it better to tag files one by one (manually or by means of an automated process of your own) to be excluded from or included in the backup.
    • Instead of tagging, you could also let a process build a file listing to back up and/or to ignore.
    • Last, you may well need a mix of several of these mechanisms at the same time (a short filtering sketch follows this list).
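    A short sketch of such filtering with the three tools (paths and patterns are illustrative):

        # dar: prune directories with -P and exclude filename patterns with -X
        dar -c /backup/full -R / -P tmp -P proc -P sys -P dev -X "*~" -X "*.mp3" -z

        # tar: similar exclusions
        tar -czpf /backup/full.tar.gz --exclude='*~' --exclude='*.mp3' \
            --exclude=/proc --exclude=/sys --exclude=/dev --exclude=/tmp /

        # rsync: exclusion patterns while copying
        rsync -aHAXS --exclude='*~' --exclude='*.mp3' /home/ /backup/home/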

    Slicing (or multi-volume)
    Having a backup split into several files of given max size can address several needs:
    • hold the backup on several removable media (CD, DVD, USB keys...) smaller than the backup itself
    • transfer the backup from a large storage space to another by means of a smaller removable medium
    • transfer the backup over the network and resume at the last transmitted slice, rather than restarting the whole transfer, in case of a network issue
    • store the backup in the cloud when the provider limits the file size
    • be able to restore a backup on a system where storage space cannot hold both the backup and the restored system
    • transfer back from the cloud only a few slices to restore some files, when the cloud provider does not provide ad hoc protocols (sftp, ftp, ...) but only a web-based user interface
    Of course, multi-volume is really interesting if you don't have to concatenate all the slices to be able to have a usable backup.

    Last, the use cases identified above for backup slicing revolve around limited storage space, so having compression available when multi-volume is used is a key point here.
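    A minimal sketch of slicing (slice sizes and paths are illustrative):

        # dar: 4 GiB gzip-compressed slices
        dar -c /mnt/usb/backup -R / -s 4G -z

        # GNU tar: multi-volume archive with 4 GiB volumes (--tape-length counts
        # units of 1024 bytes); note that -M cannot be combined with compression
        tar -cpf /mnt/usb/backup.tar -M --tape-length=4194304 -C / .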

    Symmetric strong encryption
    Symmetric strong encryption is the ability to cipher a backup with a password or passphrase and use that same key to decipher it. Some well known algorithms in this area are AES, blowfish, camellia...
    Symmetric strong encryption is interesting for the following cases:
    • if your disk is ciphered, would you store your backup in clear form in the cloud?
    • you do not trust your cloud provider not to inspect your data and build a marketing profile of you with it.
    • you want to prevent your patented data or industrial secret recipes from falling into the hands of competitors or government agencies that could clone them without fear of being prosecuted. This use case applies whether your backup is stored on local disk, removable media or public cloud.
    • simply because in your country you have the right and the freedom to have privacy.
    • because your country, democratic today, could turn into a dictatorship tomorrow, and based on some arbitrary criteria (belief, political opinion, sexual orientation...) you could then suffer from this information having been accessible to the authorities, or even publicly released, while you still need backups on arbitrary storage media.
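    A minimal sketch of symmetric ciphering at backup creation time (cipher choice, password handling and paths are illustrative):

        # dar: AES-ciphered, compressed backup; dar prompts for the passphrase
        dar -c /backup/full -R / -z -K aes:

        # tar: the archive format itself is not ciphered, an external tool must be piped in
        tar -czpf - -C / . | openssl enc -aes-256-cbc -salt -pbkdf2 -out /backup/full.tar.gz.enc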

    Asymmetric strong encryption
    Asymmetrical strong encryption is the ability to cipher a backup with a public key and having the corresponding private key for deciphering it (PGP, GnuPG...).
    Asymmetric encryption is mainly interesting when exchanging data over the Internet between different persons, or possibly for archiving data in the public cloud. Having it for backups seems less appropriate and is more complex than symmetric strong encryption, as restoration requires the private key, which must thus be stored outside the backup itself and still be protected from unauthorized access. The use of the private key can itself be protected with a password or a passphrase, but this gives the same feature level as symmetric encryption with a more complex process and not much more security.
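    For illustration, a sketch of asymmetric ciphering of a tar archive with GnuPG (recipient and paths are illustrative); dar provides the equivalent natively with public keys, as shown in the feature table below:

        # cipher with the recipient's public key, decipher with the matching private key
        tar -czpf - -C /home/joe . | gpg --encrypt --recipient joe@example.org \
            --output /backup/joe.tar.gz.gpg
        gpg --decrypt /backup/joe.tar.gz.gpg | tar -xzpf - -C /tmp/restore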

    Protection against plain-text attack
    Ciphering data must be done with a minimum level of security, in particular when the ciphered data has a well-defined structure and patterns, as a backup file format is expected to have. Knowing the expected structure of the clear data may let an attacker uncover the whole ciphered data. This is known as a plain-text attack.

    Key derivation function
    • Using the same password/passphrase for different backups is convenient but not secure. A key derivation function using a salt lets you keep the same password/passphrase while the data gets encrypted with a different key each time; this is the role of the Key Derivation Function (KDF) (PKCS5/PBKDF2, Argon2...).
    • Another need for a KDF is that human-provided passwords/passphrases are usually weak: even when we use letters, digits and some special characters, passwords and passphrases still sit in a small area of the possible key space that a dictionary attack can leverage. As the KDF is by design CPU intensive, it costs an attacker a lot of effort and time to derive each word of a dictionary into its resulting key. The time required to perform a dictionary attack can thus be multiplied by several hundred thousand, leading to an effective time of tens of years or even centuries rather than hours or days.
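    As an illustration with tar, the openssl step shown earlier can be told to use PBKDF2 with a high iteration count (the count is illustrative); dar applies its own KDF (PBKDF2 or Argon2, see the feature table below) when symmetric encryption is used:

        tar -czpf - -C / . | openssl enc -aes-256-cbc -salt -pbkdf2 -iter 500000 \
            -out /backup/full.tar.gz.enc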

    File change detection
    When backing up a live system, it is important to detect, retry saving, or at least flag files that changed while they were being read for backup. In such a situation, the backed-up file could be recorded in a state it never had: as the backup process reads sequentially from beginning to end, if a modification A is made at the end of the file and then a modification B at its beginning during that file's backup, the backup may contain B but not A, while at no time did the file contain B without A. Given the short time during which a file is read, a time accuracy of microseconds or nanoseconds is necessary to detect such a change during the backup; otherwise you may end up with corrupted data in the backup and nothing to rely on in the event of a file deleted by mistake, a disk crash or a disaster.
    At restoration time, if the file has been saved anyway, it is good to know that it was not saved properly: restoring an older but sane version may be better. This is something the user/sysadmin cannot guess if the backup does not hold that type of information.

    Multi-level backup
    Multi-level backup is the ability to make use of full backups, differential backups and/or incremental backups.
    The advantage of differential and incremental backups compared to full ones is the much shorter time they require to complete, and the reduced storage space and/or bandwidth they imply when transferred over the network.
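    A minimal sketch of a full backup followed by a differential/incremental one (backup names and paths are illustrative):

        # dar: the differential backup takes the full backup as reference with -A
        dar -c /backup/full -R / -z
        dar -c /backup/diff_monday -R / -z -A /backup/full

        # GNU tar: incremental backups driven by a snapshot file
        # (tar updates the snapshot file in place; copy it to keep several levels)
        tar -czpf /backup/full.tar.gz --listed-incremental=/backup/snapshot -C / .
        tar -czpf /backup/incr_monday.tar.gz --listed-incremental=/backup/snapshot -C / .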

    Binary delta
    Without binary delta, when performing a differential or incremental backup, a file that changed since the previous backup is saved again entirely. Some huge files produced by well-known applications (mailboxes for example) would then consume a lot of storage space and lead to a long backup time even for incremental or differential backups. Binary delta is the ability to only store the part of a file that changed since a reference state, which leads to significant space savings and a shorter backup duration.
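    A hedged sketch with dar, assuming a build with librsync support (option spelling as per dar's documentation on delta signatures; adjust to your version):

        # store delta signatures in the full backup, then the differential backup
        # only records the changed portions of modified files
        dar -c /backup/full -R / -z --delta sig
        dar -c /backup/diff -R / -z -A /backup/full --delta sig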

    Detecting suspicious modifications
    When performing a backup based on a previous one (differential, incremental or decremental backup), it is possible to check how the metadata of saved files has changed since then and warn the user when uncommon patterns are met. Those may be the trace of a rootkit, virus, ransomware or trojan trying to hide its presence and activities.

    Snapshot
    A snapshot is like a differential backup made right after the full backup (no file has changed): it is a minimal set of information that can be used to:
    • create an incremental or differential backup without having the full backup, or more generally the backup of reference, at hand: when backups are stored remotely, a snapshot is a must.
    • compare the current live filesystem with the state it had at the time the snapshot was made
    • bring some metadata redundancy and a means of repair to face a corrupted backup
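    With dar, such a snapshot can take the form of an isolated catalogue; a minimal sketch (names are illustrative):

        # extract a small catalogue from the full backup...
        dar -C /backup/catalogue_full -A /backup/full

        # ...and later use it as reference for a differential backup,
        # without needing the full backup itself to be at hand
        dar -c /backup/diff -R / -z -A /backup/catalogue_full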

    On-fly hashing
    On-fly hashing is the ability to generate a hash of the backup at the same time it is generated, before it is written to storage. Such a hash can be used to:
    • validate that a backup has been properly transferred to a public storage cloud, the hash computation being done in parallel
    • check that no data corruption has occurred (doubt about disk or memory) even when the backup is written to a local disk
    Hash validation is usually faster than backup testing or backup comparison, though it does not validate your ability to rely on the backup as deeply as these latter operations do. Hashing can also be done after the backup has completed, but that requires re-reading the whole backup and waiting for the corresponding storage I/O. On-fly hashing leverages the fact that the data is still in memory, saving that disk I/O and its latency, so it is much faster. As it is done in memory, it can also help detect file corruption on the backup destination media (like USB keys or poor quality hardware).
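    A minimal sketch with dar, which writes one hash file per slice that the sha512sum command should be able to verify (paths are illustrative):

        # produce full.1.dar together with full.1.dar.sha512 while writing the backup
        dar -c /backup/full -R / -z --hash sha512

        # later, or after uploading, verify the slice against its hash file
        sha512sum -c /backup/full.1.dar.sha512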

    Run custom command during operation
    For an automated backup process, it is often necessary to run commands before and after the backup operation itself, but also during the backup process. For example, when entering a directory, one could need to run an arbitrary command generating a file that will be included in the backup, or, when leaving that directory, perform some cleanup in it. Another use case shows up when slicing the backup: the ability to perform a custom operation after each slice is generated, like uploading the slice to the cloud, burning it to DVD-/+RW, or loading a tape from a tape library...
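    A sketch of a per-slice command with dar, using its documented substitution macros (%p: slice path, %b: basename, %N: slice number, %e: extension); the upload command itself is illustrative:

        dar -c /backup/full -R / -z -s 1G \
            -E "curl -T %p/%b.%N.%e sftp://user@cloud.example.org/backups/"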

    Dry-run execution
    When tuning a backup process, it is often necessary to verify quickly that everything will work flawlessly, without having to wait for a backup to complete or consume storage resources and network bandwidth.
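    For example (dar's dry-run option is -e/--empty and rsync's is -n/--dry-run; tar has no equivalent, as shown in the feature table below):

        # dar: go through the whole backup logic without writing anything
        dar -c /backup/full -R / -z --empty

        # rsync: show what would be transferred without doing it
        rsync -aHn --delete /home/ backup@nas:/backups/home/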

    User message within backup
    Allowing the user to add an arbitrary message inside the backup may be useful when the filename is too short to hold the needed information (like the context in which the backup or archive was made, a hint for the passphrase, and so on).

    Backup sanity test
    It is crucial in a backup process to validate that the generated backup is usable. There are many reasons why it might not be: data corruption in memory, on disk or over the network; disk space saturation leading to a truncated backup; or simply a software bug.
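    For example (with tar, only the archive structure and the compression stream can be checked this way):

        # dar: test the backup's internal coherence
        dar -t /backup/full

        # tar: at best, read through the whole archive and its compression layer
        tar -tzf /backup/full.tar.gz > /dev/null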

    Comparing with original data
    One step further in backup and archive validation is comparing the file contents and metadata in the backup with what the live system holds.
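    For example (paths are illustrative):

        # dar: compare the backup against the filesystem rooted at /
        dar -d /backup/full -R /

        # GNU tar: report differences between the archive and the filesystem
        (cd / && tar -dzf /backup/full.tar.gz)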

    Tunable verbosity
    When a backup process is in production and works nicely, it is usually desirable to have as little output as possible, while still being able to log any error. On the other hand, when setting up a backup process, more detailed output is required to understand and validate that the process follows the expected path.

    Modify the backup's content
    Once a backup has been completed, you might notice that you have saved extra files you ought not to save. Being able to drop them from the backup to save some space without having to restart the whole backup may lead to a huge time saving.

    You might also need to add some extra files that were outside the backup scope; being able to add them without restarting the whole backup process may also lead to a huge time saving.
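    With dar this can be done by merging into a new archive; a minimal sketch (the excluded pattern is illustrative):

        # build cleaned_backup from full, dropping the *.mp3 files saved by mistake
        dar -+ /backup/cleaned_backup -A /backup/full -X "*.mp3" -z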

    Stdin/stdout backup read/write
    Having the ability to read or write the backup through a pipe to an arbitrary command is one of the ultimate keys to backup software flexibility.
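    For example, streaming a backup to a remote host over ssh (host and paths are illustrative):

        # dar: '-' as backup name means the backup is written to stdout
        dar -c - -R / -z | ssh backup@nas.example.org "cat > /backups/full.1.dar"

        # tar: '-' as archive name also means stdout
        tar -czpf - -C / . | ssh backup@nas.example.org "cat > /backups/full.tar.gz"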

    Remote network storage
    This is the ability to write a backup directly to network storage without using the local disk, and to restore by reading the backup directly from that remote storage, still without using local storage. Network/remote storage is to be understood as remote storage reachable over the network, like a public cloud, a private cloud or a personal NAS, by means of file transfer protocols (scp, sftp, ftp, rcp, http, https...).

    Feature Dar Rsync Tar
    Historization Yes - Yes
    Data filtering by directory Yes Yes Yes
    Data filtering by filename Yes Yes limited
    Data filtering by filesystem Yes limited limited
    Data filtering by tag limited - -
    Data filtering by files listing Yes yes limited
    Slicing/multi-volume Yes - limited
    Symmetric encryption Yes - Yes
    Asymmetric encryption Yes - Yes
    Plain-text attack protection Yes - -
    PBKDF2 Key Derivation Function Yes - -
    ARGON2 Key Derivation Function Yes - -
    File change detection Yes - limited
    Multi-level backup Yes - Yes
    Binary delta Yes Yes -
    Detecting suspicious modifications Yes - -
    Snapshot for diff/incr. backup Yes - Yes
    Snapshot for comparing Yes - -
    Snapshot for redundancy Yes - -
    On-fly hashing Yes - -
    Run custom command during operation Yes - limited
    Dry-run execution Yes Yes -
    User message within backup Yes - -
    Backup sanity test Yes - Yes
    Comparing with original data Yes - Yes
    Tunable verbosity Yes Yes limited
    Modify the backup's content Yes Yes limited
    Stdin/stdout backup read/write Yes - Yes
    Remote network storage Yes limited Yes

    The results presented above are a synthesis of the test logs.

    Robustness

    The objective here is to see how a minor data corruption impacts the backup. This type of corruption (a single bit inversion) can be caused by a network transfer, a cosmic particle hitting a memory bank, or simply the passage of time on a given storage medium. In real life, data corruption may affect more than a single bit, of course. But while the ability to work around a single corrupted bit tells nothing about the ability to recover from a larger amount of corruption, the inability to recover from a single corrupted bit is enough to know that the same software will behave even worse when a larger portion of corrupted data is met.

    Behavior Dar Rsync Tar alone Tar + gzip
    Detects backup corruption Yes - - Yes
    Warn or avoid restoring corrupted data Yes - - Yes
    Able to restore all files not concerned by the corruption Yes Yes Yes -

    To protect your data, you can go one step further and compute redundancy data with Parchive on top of your backups or archives. This will allow you to repair them in case of corruption.
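    A minimal sketch with par2 (redundancy level and paths are illustrative):

        # create 10% redundancy data alongside the backup slices
        par2 create -r10 /backup/full.par2 /backup/full.*.dar

        # after a detected corruption, attempt the repair
        par2 repair /backup/full.par2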

    The results presented above are a synthesis of the test logs.

    Performance

    In the following, we have distinguished two purposes of backup tools: the "identical" copy of a set of files and directories (short term operation) and the usual backup operation (long term storage and historization).

    Performance of file copy operation

    The only performance aspect to consider for this target is execution speed; data reduction on the wire matters only if the bandwidth is low enough that the added compression time does not ruin the gain in transfer time. Compression time does not depend on the backup tool but on the data, and we will see in the backup performance tests how the different backup tools reduce data on the wire. For the execution time we get the following results:

    Single huge file

    The copied data was a Linux distro installation ISO file

    cp: 2.58 s
    Dar: 9.18 s
    Rsync: 15.28 s
    Tar: 6.51 s
    Linux system

    The copied data was a freshly installed, fully featured Linux system

    cp: 5.15 s
    Dar: 16.78 s
    Rsync: 16.59 s
    Tar: 8.04 s
    Conclusion

    For local copy, cp is the fastest but totally unusable for remote copy. At first sight one could think tar would be the best alternative for remote copy, but that would not take into account the fact that you will probably want to use a secured connection (unless all segments of the underlying network are physically yours, end to end). Thus, once the backup has been generated, using tar requires an extra user operation, extra computing time to cipher/decipher, and time to transfer the data, while both alternatives, rsync and dar, have this integrated: they can copy and transfer at the same time, with both a gain of time and no added operation for the user.

    In consequence, for remote copy, if this is a unique/one-time remote copy, dar will be faster than rsync most of the time (even when using compression to cope with low bandwidth, see the backup test results below). But for recurring remote copies, even if rsync is not faster than dar, it has the advantage of being designed specifically for this task, as in that context we do not need to store the data compressed or ciphered. This can be summarized as follows:

    Operation Best Choice Alternative
    Local copy cp tar
    One-time remote copy dar rsync
    Recurrent remote copy rsync dar

    See the corresponding test logs for more details

    Performance of backup operation

    For backup we consider the following criteria by order of importance:

    1. data reduction on backup storage
    2. data reduction when transmitted over the network
    3. execution time to restore a few files
    4. execution time to restore a full and differential backups
    5. execution time to create a full and differential backups

    Why this order?

    Note that the following results do not take into account the performance penalty implied by network latency. There are several reasons for that:

    For all the backup performance tests that follow (but not for the file copy performance tests seen above), compression has been activated using the same, most commonly supported algorithm: gzip at level 6. Other algorithms may complete faster or provide a better compression ratio, but this depends on the chosen compression algorithm and on the data to compress, not on the backup tools tested here.
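    For reference, this corresponds to invocations of the following kind (gzip's default level is 6; the exact commands used in the test logs may differ):

        dar -c /backup/full -R / -zgzip:6
        rsync -aHAXSz --compress-level=6 /source/ backup@nas:/backups/
        tar -cpf - -C / . | gzip -6 > /backup/full.tar.gz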

    Data reduction on backup storage

    Full backup
    Dar: 1580562224 bytes
    Dar+sparse: 1578428790 bytes
    Dar+sparse+binary delta: 1602481058 bytes
    Rsync: 4136318307 bytes
    Rsync+sparse: 4136318307 bytes
    tar: 1549799048 bytes
    tar+sparse: 1549577862 bytes
    Differential backup
    Dar: 49498524 bytes
    Dar+sparse: 49505251 bytes
    Dar+sparse+binary delta: 23883368 bytes
    Rsync: not supported
    Rsync+sparse: not supported
    tar: 44607904 bytes
    tar+sparse: 44604194 bytes
    Full + Differential backup

    This is an extrapolation of the required backup volume after one week of daily backups of the Linux system under test, assuming the daily activity is as minimal as it was here between the day of the full backup and the day of the first differential backup (a few package upgrades and no user activity).

    Dar: 1927051892 bytes
    Dar+sparse: 1924965547 bytes
    Dar+sparse+binary delta: 1769664634 bytes
    Rsync: not supported
    Rsync+sparse: not supported
    tar: 1862054376 bytes
    tar+sparse: 1861807220 bytes

    The previous results concern the backup of a steady Linux system; the relative difference in data reduction might favor both rsync and dar+binary delta when the proportion of large files being slightly modified increases (like mailbox files).

    Data reduction over network

    Full backup
    Dar: 1580562224 bytes
    Dar+sparse: 1578428790 bytes
    Dar+sparse+binary delta: 1602481058 bytes
    Rsync: 1587714486 bytes
    Rsync+sparse: 1587714474 bytes
    tar: 1549799048 bytes
    tar+sparse: 1549577862 bytes
    Differential backup
    Dar: 49498524 bytes
    Dar+sparse: 49505251 bytes
    Dar+sparse+binary delta: 23883368 bytes
    Rsync: 29293958 bytes
    Rsync+sparse: 29293958 bytes
    tar: 44607904 bytes
    tar+sparse: 44604194 bytes
    Full + Differential backup

    This is the same extrapolation done above (one week of daily backup), but for the volume of data transmitted over the network instead of the backup volume on storage.

    Dar: 1927051892 bytes
    Dar+sparse: 1924965547 bytes
    Dar+sparse+binary delta: 1769664634 bytes
    Rsync: 1792772192 bytes
    Rsync+sparse: 1792772180 bytes
    tar: 1862054376 bytes
    tar+sparse: 1861807220 bytes

    Execution time to restore a few files

    Dar: 0.98 s
    Dar+sparse: 1.13 s
    Dar+sparse+binary delta: 1.27 s
    Rsync: 3 ms
    Rsync+sparse: 3 ms
    tar: 25.15 s
    tar+sparse: 25 s

    Here the phenomenon is even more pronounced when the file to restore is located near the end of the tar backup, as tar sequentially reads the whole backup up to the requested file.

    Execution time to restore a whole system - full backup

    Dar: 22.94 s
    Dar+sparse: 30.36 s
    Dar+sparse+binary delta: 30.35 s
    Rsync: 157.81 s
    Rsync+sparse: 158.39 s
    tar: 26.72 s
    tar+sparse: 26.27 s

    Execution time to restore a single differential backup

    Dar: 3.48 s
    Dar+sparse: 3.48 s
    Dar+sparse+binary delta: 3.44 s
    Rsync: not supported
    Rsync+sparse: not supported
    tar: 1.48 s
    tar+sparse: 1.5 s

    Execution time to restore a whole system - full + differential backup

    We use here the same extrapolation of a week of daily backup done above: the first backup being a full backup and differential/incremental backups done the next days.

    Clarifying the terms used: a differential backup saves only what has changed since the full backup was made. The consequence is that the differential backup is slightly bigger to process each day, depending on how the data changed: if the same files change every day (mailboxes, user files, ...), each new differential backup will have the same size and take the same processing time to complete. At the opposite, if new data is added each day, the differential backup size will each day be the sum of the incremental backups that could have been made instead since the full backup.

    Unlike the differential backup, the incremental backup saves only what has changed since the last backup (full or incremental). For a constant activity, like the steady Linux system used here, the incremental backup size should stay the same over time (and be equivalent to the size of the first differential backup), so the extrapolation is easy and not questionable: the restoration time is the time to restore the full backup plus the time to restore the first differential backup multiplied by the number of days that have passed.

    Execution time to restore a whole system - lower bound

    The lower bound is the sum of the execution times of the restoration of the full backup and of one differential backup, as seen just above. It corresponds to the minimum execution time for restoring a whole system from a full+differential backup.

    Dar: 26.42 s
    Dar+sparse: 33.84 s
    Dar+sparse+binary delta: 33.79 s
    Rsync: full backup only 157.81 s
    Rsync+sparse: full backup only 158.39 s
    tar: 28.2 s
    tar+sparse: 27.77 s

    Execution time to restore a whole system - higher bound

    The higher bound is the sum of the execution time of the full restoration plus seven times the execution time of the differential restoration. It corresponds to the worst-case scenario where new data is added each day (still on a steady Linux system with constant activity). It also corresponds to the scenario of restoring a whole system from full+incremental backups (7 incremental backups have to be restored in that one-week scenario):

    Dar: 47.3 s
    Dar+sparse: 54.72 s
    Dar+sparse+binary delta: 54.43 s
    Rsync: full backup only 157.81 s
    Rsync+sparse: full backup only 158.39 s
    tar: 37.08 s
    tar+sparse: 36.77 s

    Execution time to create a backup

    Dar: 149.73 s
    Dar+sparse: 157.99 s
    Dar+sparse+binary delta: 162.62 s
    Rsync: 156.98 s
    Rsync+sparse: 183.44 s
    tar: 148.59 s
    tar+sparse: 149.38 s

    Ciphering/deciphering performance

    There are several reasons that imply the need for ciphering data:

    The ciphering execution time is independent of the nature of the backup: full or differential, compressed or not. To evaluate ciphering performance we use the same data sets as previously, both compressed and uncompressed. However, not all software under test is able to cipher the resulting backup: rsync cannot do so.

    Full backup+restoration execution time
    Dar: 9.13 s
    Rsync: N/A
    Tar (openssl): 7.39 s
    Execution time for the restoration of a single file
    Dar: 0.42 s
    Rsync: N/A
    Tar (openssl): 1.79 s
    Storage requirement ciphered without compression
    Dar: 1.46 GiB
    Rsync: N/A
    Tar (openssl): 1.49 GiB

    See the corresponding test logs for more details.

    Conclusion

    So far we have measured different performance aspects, evaluated the available features, tested backup robustness and observed the backup completeness of the different backup tools under test. This gives a lot of information, already summarized above. But it would still not be of great use to anyone reading this document (especially to those jumping straight to its conclusion ;^) ), so we have to get back to the use cases and their respective requirements to obtain the essential oil drop anyone can use immediately:

    Criteria for the different use cases

    Use Cases Key Point Optional interesting features
    Local directory copy
    • execution speed
    • completeness of copied data and metadata
    remote directory copy - wide network
    • execution speed
    • completeness of copied data and metadata
    • on wire ciphering
    remote directory copy - narrow network
    • execution speed
    • data reduction on wire
    • completeness of copied data and metadata
    • on wire ciphering
    Full backups only
    • completeness of backed up data and metadata
    • data reduction on storage
    • fast restoration of a few files
    • fast restoration of a whole backup
    full+diff/incr. backup
    • completeness of backed up data and metadata
    • data reduction on storage
    • fast restoration of a few files
    • fast restoration of a whole backup
    • managing tool of backups rotation
    Archiving of private data
    • data reduction on storage
    • robustness of the archive
    • ciphering
    • redundancy data
    Archiving of public data
    • data reduction on storage
    • robustness of the archive
    • signing
    • fast decompression algorithm
    Private data exchange over Internet
    • data reduction over the network
    • asymmetric encryption and signing
    • redundancy data
    • multi-volume backup/archive
    • integrated network protocols in backup tool
    Public data exchange over Internet
    • data reduction over the network
    • hashing
    • signing
    • integrated network protocols in backup tool

    Complementary criteria depending on the storage type

    And depending on the target storage, the following adds on top:

    Use Cases Key Point Optional interesting features
    Local disk
    • execution speed
    • hashing
    Data stored on private NAS
    • data reduction on storage
    • multi-volume backup
    • integrated network protocols in backup tool
    • ciphering
    Data stored on public cloud
    • data reduction on storage and on wire
    • ciphering
    • multi-volume backup
    • integrated network protocols in backup tool
    Data stored on removable media (incl. tapes)
    • multi-volume backup
    • data reduction on storage
    • on-fly hashing
    • ciphering
    • redundancy data

    Essential oil drop

    In summary, putting the different measurements we made in front of these requirements, we can identify the best software for each particular use case:

    Use Cases, by storage type: Local disk storage, Private NAS, Public Cloud, Removable media

    Local directory copy
      Local disk storage:
        cp
        dar: not the fastest
        rsync: not the fastest
        tar: not the fastest
      Private NAS: -
      Public Cloud: -
      Removable media: -

    One-time remote directory copy
      Local disk storage: -
      Private NAS:
        dar
        rsync: not the fastest
        tar: no network protocol embedded
      Public Cloud:
        dar
        rsync: not the fastest
        tar: no network protocol embedded
      Removable media:
        dar
        rsync: not the fastest
        tar: no network protocol embedded

    Recurrent remote directory copy
      Local disk storage: -
      Private NAS:
        dar: fastest, but automation is a bit less straightforward than with rsync
        rsync
        tar: no network protocol embedded
      Public Cloud:
        dar: fastest, but automation is a bit less straightforward than with rsync
        rsync
        tar: no network protocol embedded
      Removable media:
        dar: fastest, but automation is a bit less straightforward than with rsync
        rsync
        tar: no network protocol embedded

    Full backups only (private data)
      Local disk storage:
        dar: has the advantage of providing long historization of backups
        rsync: no data reduction on storage, slow to restore a whole filesystem
        tar: not saving all file attributes and inode types, slow to restore a few files
      Private NAS:
        dar
        rsync: no data reduction on storage
        tar: not saving all file attributes and inode types, slow to restore a few files, no network protocol embedded
      Public Cloud:
        dar
        rsync: no data ciphering and no reduction on storage
        tar: no embedded ciphering, not the strongest data encryption, not saving all file attributes and inode types, slow to restore a few files, no network protocol embedded
      Removable media:
        dar
        rsync: no multi-volume support, no data ciphering and no reduction on storage
        tar: compression and multi-volume are not supported at the same time, not saving all file attributes and inode types, no embedded ciphering, not the strongest data encryption

    Full+diff/incr. backups (private data)
      Local disk storage:
        dar
        rsync: differential backup not supported, full backup is overwritten
        tar: not saving all file attributes and inode types, slow to restore a few files
      Private NAS:
        dar
        rsync: differential backup not supported, full backup is overwritten
        tar: not saving all file attributes and inode types, slow to restore a few files, no network protocol embedded
      Public Cloud:
        dar
        rsync: differential backup not supported, full backup is overwritten
        tar: no embedded ciphering, not the strongest data encryption, not saving all file attributes and inode types, slow to restore a few files, no network protocol embedded
      Removable media:
        dar
        rsync: differential backup not supported, full backup is overwritten, no multi-volume support, no data reduction, no ciphering
        tar: compression and multi-volume are not supported at the same time, not saving all file attributes and inode types, no embedded ciphering, not the strongest data encryption

    Archiving of private data
      Local disk storage:
        dar
        rsync: no data reduction on storage, no detection of data corruption, complex parity data addition
        tar: no detection of data corruption, or loss of all data after the first corruption met
      Private NAS:
        dar
        rsync: no data reduction, no detection of data corruption, complex parity data addition
        tar: no detection of data corruption, or loss of all data after the first corruption met
      Public Cloud:
        dar
        rsync: no ciphering, no data reduction, no detection of data corruption, complex parity data addition
        tar: no detection of data corruption, or loss of all data after the first corruption met, no embedded ciphering, no protection against plain-text attack
      Removable media:
        dar
        rsync: no data reduction, no multi-volume, no ciphering, no detection of data corruption, complex parity data addition
        tar: compression and multi-volume are not supported at the same time, no detection of data corruption, or loss of all data after the first corruption met, no ciphering

    Archiving of public data
      Local disk storage:
        dar: most robust format but not as standard as tar's
        rsync: no reduction on storage
        tar
      Private NAS:
        dar: most robust archive format but not as standard as tar's
        rsync: no reduction on storage, complicated to download a directory tree and files with protocols other than rsync
        tar
      Public Cloud:
        dar: most robust archive format but not as standard as tar's
        rsync: no reduction on storage, complicated to download a directory tree and files with protocols other than rsync
        tar
      Removable media:
        dar
        rsync: no reduction on storage, no multi-volume, no detection of data corruption, complex parity data addition
        tar: compression and multi-volume are not supported at the same time

    Private data exchange over Internet
      Local disk storage:
        dar
        rsync: not the best data reduction over the network
        tar: best data reduction on the network but no embedded ciphering, no integrated network protocols
      Private NAS:
        dar
        rsync: no data reduction on storage, not the best data reduction over the network
        tar: best data reduction on the network, but lack of embedded ciphering, lack of integrated network protocols
      Public Cloud:
        dar
        rsync: no ciphering and no data reduction on storage
        tar: no embedded ciphering, no integrated network protocols, no protection against plain-text attack, only old KDF functions supported, complex and error-prone use of openssl to cipher the archive
      Removable media: -

    Public data exchange over Internet
      Local disk storage:
        dar: not the best data reduction over the network
        rsync: not the best data reduction over the network
        tar
      Private NAS:
        dar: not the best data reduction over the network
        rsync: no data reduction on storage, not the best data reduction over the network
        tar
      Public Cloud:
        dar: not the best data reduction over the network
        rsync: no data reduction on storage, not the best data reduction over the network
        tar
      Removable media: -

    In each cell of the previous table, the different tools are listed in alphabetical order. In the original colorized table they are rated as best solution, good solution, not optimal or not adapted; here, the note following a tool's name gives the reason it was not selected as the best solution for that particular need.