Dar Documentation

DAR's FEATURES

This table lists the main features of the dar/libdar tool. Each feature is presented with an overview and some pointers you are welcome to follow for more detailed information.

HARD LINK CONSIDERATION

Hard links are properly saved in all cases and properly restored when possible. For example, when restoring across a mounted file system, hard linking will fail; dar then duplicates the inode and file contents and issues a warning. Hard link support covers the following inode types: plain files, char devices, block devices, symlinks (Yes, you can hard link symbolic links! Thanks to Wesley Leggette for the info ;-) )



SPARSE FILES references: man dar

keywords: --sparse-file-min-size, -ah

By default Dar takes care of sparse files, even if the underlying filesystem does not support sparse files(!).

When a long sequence of zeroed bytes is met in a file during backup, those bytes are not stored in the backup file; only their number is stored instead (a structure known as a "hole"). At restoration time, dar restores the normal data, but when a hole is met in the backup, dar skips directly to the position of the data following that hole. If the underlying filesystem supports sparse files, this (re)creates a hole in the restored file, making it a sparse file.

Sparse files can report a size of several hundred gigabytes while needing only a few bytes of disk space. Not being able to save and restore them properly wastes the storage that holds the backups, and can also make it impossible to restore your data on a disk of the same size.
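
The hole mechanism described above can be illustrated with a minimal Python sketch. It is not dar's on-disk format: the segment representation, the function names and the min_hole threshold are assumptions made for the example only.

```python
from typing import List, Tuple, Union

# Each segment is either ("data", bytes) or ("hole", number_of_zeroed_bytes).
Segment = Tuple[str, Union[bytes, int]]


def encode_holes(content: bytes, min_hole: int = 15) -> List[Segment]:
    """Replace every run of at least `min_hole` zeroed bytes by a hole record."""
    segments: List[Segment] = []
    data_start = 0  # start of the data not yet emitted
    i = 0
    n = len(content)
    while i < n:
        if content[i] == 0:
            j = i
            while j < n and content[j] == 0:
                j += 1
            if j - i >= min_hole:
                # Emit the pending data, then only the length of the zero run.
                if i > data_start:
                    segments.append(("data", content[data_start:i]))
                segments.append(("hole", j - i))
                data_start = j
            i = j
        else:
            i += 1
    if data_start < n:
        segments.append(("data", content[data_start:]))
    return segments


def decode_holes(segments: List[Segment]) -> bytes:
    """Rebuild the original content (in memory, so holes become real zeros)."""
    out = bytearray()
    for kind, value in segments:
        if kind == "hole":
            out.extend(bytes(value))  # bytes(n) yields n zeroed bytes
        else:
            out.extend(value)
    return bytes(out)
```

When restoring to a real file, a tool would seek past the hole instead of writing zeros, which is what lets a filesystem with sparse file support leave the hole unallocated.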



EXTENDED ATTRIBUTES (EA) references: man dar
keywords: -u -U -am -ae --alter=list-ea

Dar is able to save and restore EA, all or just those matching a given pattern.

File Forks (MacOS X) as well as Linux's ACL are implemented over EA; they are thus transparently saved, tested, compared and restored by dar. Note that ACL under MacOS seem not to rely on EA; since they are only marginally used, they are ignored by dar.



FILESYSTEM SPECIFIC ATTRIBUTES (FSA) references: man dar
keyword: --fsa-family

Since release 2.5.0 dar is able to take care of filesystem specific attributes. Those are grouped by family, each family being strongly linked to the filesystem the attributes have been read from, but orthogonally each FSA is also designated by a function. This way it is possible to translate an FSA from one filesystem to another when there is an equivalence in role.

Currently two families are present:

  • HFS+ family contains only one function: the birthtime. In addition to ctime, mtime and atime, dar can back up, compare and restore all four dates of a given inode (well, ctime is not possible to restore).
  • extX family contains 12 functions (append_only, compressed, no_dump, immutable, journaling, secure_deletion, no_tail_merging, undeletable, noatime_update, synchronous_directory, synchronous_update, top_of_dir_hierarchy) found on ext2/3/4 and some other Linux filesystems. Dar can thus save and restore all of those for each file depending on the capabilities or permissions dar has at restoration time.


DIRTY FILES references: man dar

keywords: --dirty-behavior , --retry-on-change

At backup time, dar checks that each saved file did not change while it was being read. If a file did change, dar retries saving it up to three times (by default); if it is still changing, it is flagged as "dirty" in the backup and handled differently from other files at restoration time. Dirty file handling is either to warn the user before restoring them, to ignore them and avoid restoring them, or to ignore the dirty flag and restore them normally.
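
The retry logic described above can be sketched as follows. This is only an illustration: the function name is hypothetical, and detecting a change by comparing mtime and size before and after the read is an assumption of the example, not a description of what dar checks internally.

```python
import os
import tempfile
from typing import Tuple


def read_stable(path: str, max_retries: int = 3) -> Tuple[bytes, bool]:
    """Return (content, dirty); dirty is True if the file kept changing."""
    content = b""
    # One initial attempt, then up to `max_retries` further attempts.
    for _ in range(1 + max_retries):
        before = os.stat(path)
        with open(path, "rb") as f:
            content = f.read()
        after = os.stat(path)
        unchanged = (before.st_mtime_ns, before.st_size) == (
            after.st_mtime_ns,
            after.st_size,
        )
        if unchanged:
            return content, False
    # Still changing after all attempts: keep the last read, flag it dirty.
    return content, True


# Demo on a file nobody is modifying: the first read is already stable.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "wb") as f:
        f.write(b"hello")
    demo_content, demo_dirty = read_stable(sample)
```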



FILTERS references: man dar command line usage notes

keywords: -I -X -P -g -[ -] -am --exclude-by-ea

Dar is able to back up anything from a whole file system down to a single file, thanks to its filter mechanism. This mechanism is two-headed: the first head lets one decide which part of a directory tree to consider for the operation (backup, restoration, etc.) while the second head defines which type of file to consider (a filter based only on the filename, for example the extension of the file).

For backup operation, files and directories can also be filtered out if they have been set with a given user defined EA.



NODUMP FLAG references: man dar

keywords: --nodump

Many filesystems, like ext2/3/4, provide for each inode a set of flags, among which is the "nodump" flag. You can instruct dar to avoid saving files that have this flag set, as the so-called dump backup program does.



ONE FILESYSTEM references: man dar

keywords: -M

By default dar does not stop at filesystem boundaries unless the filtering mechanism described above excludes a mount point. But you can also ask dar to avoid recursing into a given filesystem or, conversely, give it a list of filesystems to exclusively recurse into, without the burden of finding and listing the directories to be excluded from the backup, which can be even more complicated when bind mounts are used (i.e. a given filesystem mounted several times).



CACHE DIRECTORY TAGGING STANDARD references: man dar

keywords: --cache-directory-tagging

Many software applications use cache directories (the Mozilla web browser for example), directories that store temporary data not worth backing up. The Cache Directory Tagging Standard provides a standard way for software applications to identify this type of data, which lets dar (like some other backup software) ignore cache data designated as such by other applications.
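
As an illustration of what a backup tool has to check, here is a minimal Python sketch of the tag detection. The tag file name and signature are those defined by the Cache Directory Tagging Standard; the function name is hypothetical and the sketch omits the standard's finer requirements (for instance, it does not reject a tag that is a symbolic link).

```python
import os
import tempfile

TAG_NAME = "CACHEDIR.TAG"
# The tag file must start with this exact 43-byte signature.
TAG_SIGNATURE = b"Signature: 8a477f597d28d172789f06886806bc55"


def is_cache_directory(directory: str) -> bool:
    """True if `directory` holds a tag file starting with the signature."""
    tag_path = os.path.join(directory, TAG_NAME)
    try:
        with open(tag_path, "rb") as f:
            return f.read(len(TAG_SIGNATURE)) == TAG_SIGNATURE
    except OSError:
        # No tag file, or tag not readable: treat as a normal directory.
        return False


# Demo: one tagged directory, one ordinary directory.
with tempfile.TemporaryDirectory() as tmp:
    cache_dir = os.path.join(tmp, "cache")
    plain_dir = os.path.join(tmp, "documents")
    os.mkdir(cache_dir)
    os.mkdir(plain_dir)
    with open(os.path.join(cache_dir, TAG_NAME), "wb") as f:
        f.write(TAG_SIGNATURE + b"\n# This file is a cache directory tag.\n")
    cache_detected = is_cache_directory(cache_dir)
    plain_detected = is_cache_directory(plain_dir)
```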



DIFFERENTIAL BACKUP references: man dar/TUTORIAL

keywords: -A

When making a backup with dar, you have the possibility to make a full backup or a differential backup.

A full backup, as expected, makes a backup of all files as specified with the optional filtering mechanisms.

A differential backup, instead, saves only the files that have changed since a given reference backup. Additionally, files that existed in the reference backup and no longer exist at the time of the differential backup are recorded in the backup as "been removed". At recovery time (unless you deactivate it), restoring a differential backup will update changed files and add new files, but also remove files that have been recorded as "been removed".

Note that the reference backup can be a full backup or another differential backup (this second method is usually called incremental backup). This way you can, for example, make a first full backup, then many incremental backups, each taking the last backup made as its reference.
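
The selection made by a differential backup can be sketched in Python as a comparison of two catalogues. The Entry fields (mtime and size) and all names below are assumptions chosen for the example; they do not describe the criteria dar actually applies.

```python
from typing import Dict, NamedTuple, Set


class Entry(NamedTuple):
    mtime: int
    size: int


class Delta(NamedTuple):
    to_save: Set[str]   # new or changed files, stored with their data
    removed: Set[str]   # files only recorded as removed


def differential(reference: Dict[str, Entry], current: Dict[str, Entry]) -> Delta:
    """Compare the current tree against the reference catalogue."""
    # A file is saved if it is absent from the reference or differs from it.
    to_save = {path for path, entry in current.items() if reference.get(path) != entry}
    # A file is recorded as removed if it only exists in the reference.
    removed = set(reference) - set(current)
    return Delta(to_save, removed)


reference = {"a": Entry(1, 10), "b": Entry(1, 20), "c": Entry(1, 30)}
current = {"a": Entry(1, 10), "b": Entry(2, 25), "d": Entry(3, 5)}
delta = differential(reference, current)
```

Here "b" (changed) and "d" (new) end up in to_save, "c" ends up in removed, and "a" is left out because it is unchanged. For an incremental chain, the catalogue of the previous backup simply becomes the reference of the next one.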



DECREMENTAL BACKUP references: man dar / Decremental backup

keywords: -+ -ad

As opposed to incremental backups, where the oldest backup is a full backup and each subsequent backup contains only the changes from the previous backup, decremental backups let the full backup be the most recent one, while the older ones only contain the changes compared to the next more recent one.

This has the advantage of providing a single backup to use to restore a whole system in its latest known state, while reducing the overall amount of data needed to retain older versions of files (the same amount as required with differential backup). It also has the advantage of not having to keep several sets of backups, as you just need to delete the oldest backup when you need storage space. However, it has the drawback of requiring at each new cycle the creation of a full backup, then the transformation of the previous full backup into a so-called decremental backup. Yes, everything has a cost!



DELTA BINARY references: man dar

keywords: --delta sig, --include-delta-sig, --exclude-delta-sig, --delta-sig-min-size, --delta no-patch

Since release 2.6.0, for incremental and differential backups only, instead of saving a whole file when it has changed, dar/libdar can save only the part of it that has changed. This feature, called binary delta, relies on the librsync library. It is not activated by default, considering the non-zero probability of collision between two different versions of a file. This is also the choice of the dar user community.

However, it takes you one step further than differential backup in terms of backup space optimization and network data transfer reduction.



PREVENTING ROOTKITS AND OTHER MALWARES references: man dar

keywords: -asecu

At backup time, when a differential, incremental or decremental backup is done, dar compares the status of each inode on the filesystem to the status it had at the time of the last backup. If the ctime of a file has changed while no other inode field has changed, dar issues a warning, considering that file as suspicious. This does not mean that your system has been compromised, but you are strongly advised to check whether the concerned file has recently been updated (some package managers may lead to that situation) or has had its Extended Attributes changed since the last backup was made. In a normal situation this type of warning does not show up often (false positives are rare but possible). However, if your system has been infected by a virus or compromised by a rootkit or a trojan, dar will signal the problem if the intruder tried to hide the tampering.
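
The rule stated above (ctime changed while nothing else did) can be written as a small Python predicate. The set of inode fields in InodeStatus is an assumption made for the example; the fields dar really compares may differ.

```python
from typing import NamedTuple


class InodeStatus(NamedTuple):
    mtime: int
    size: int
    mode: int
    uid: int
    gid: int
    ctime: int


def is_suspicious(old: InodeStatus, new: InodeStatus) -> bool:
    """True when ctime is the only field that differs between two backups."""
    # Neutralize ctime on both sides to compare all the other fields at once.
    same_otherwise = old._replace(ctime=0) == new._replace(ctime=0)
    return same_otherwise and old.ctime != new.ctime


at_last_backup = InodeStatus(mtime=100, size=4096, mode=0o755, uid=0, gid=0, ctime=100)
only_ctime_moved = at_last_backup._replace(ctime=250)
normally_updated = at_last_backup._replace(mtime=250, ctime=250, size=5000)
```

A file rewritten in place with its mtime put back to the old value still gets a new ctime, which is why this combination is worth a warning.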



DIRECTORY TREE SNAPSHOT references: man dar

keywords: -A +

Dar can make a snapshot of a directory tree and files or even of a whole system, recording the inode status of each file. This may be used to detect changes in a filesystem, by "diffing" the resulting snapshot with the filesystem at a later time. The resulting snapshot can also be used as reference to save files that have changed since the snapshot was made.

A snapshot is just a special dar backup that is very small compared to the corresponding full backup, but of course it cannot be used to restore any data. As a dar backup, it can be created using compression, slices, encryption...



SLICES references: man dar/ TUTORIAL

keywords: -s -S -p -aSI -abinary

Dar stands for Disk ARchive. From the beginning it was designed to be able to split an archive (or backup) over several removable media, whatever their number and whatever their size. To restore from such a split archive, dar directly fetches the requested data in the correct slice(s). Dar is suitable for backup over old floppy disks, CD-R, DVD-R, CD-RW, DVD-RW, Zip, Jaz, but also cloud computing, when some providers restrict the maximum size a file can have.

Given the size, dar will split the archive/backup into several files (called SLICES), optionally pausing before creating the next one, and/or allowing the user to automate any action (like mounting/unmounting a medium, burning the file on CD-R, sending it to the cloud, and so on).

Additionally, the size of the first slice can be specified separately, for example if you first want to fill up a partially filled disk before starting to use empty ones. Last, at restoration time, dar will pause and prompt the user for a slice only if it is missing, here too allowing the user to automate any particular action (downloading the slice from the cloud, mounting/unmounting a removable medium and so on).

You can choose to have more than one slice per medium without penalty from dar (no extra user interaction other than asking the user to change the removable medium once it has been read), just one slice per medium, or even a backup without slices, which is a single file, depending on your needs.
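
The effect of a separately sized first slice on the number of slices can be estimated with simple arithmetic. The Python sketch below is a rough model under stated assumptions: the function name is hypothetical and the per-slice header overhead of a real archive is ignored.

```python
from typing import Optional


def slice_count(total: int, slice_size: int, first_slice_size: Optional[int] = None) -> int:
    """Number of slices needed for `total` bytes (slice headers ignored)."""
    if total <= 0:
        return 0
    first = slice_size if first_slice_size is None else first_slice_size
    if total <= first:
        return 1
    remaining = total - first
    # Ceiling division for the slices that follow the first one.
    return 1 + -(-remaining // slice_size)


same_size = slice_count(1000, 300)          # 300 + 300 + 300 + 100
small_first = slice_count(1000, 300, 50)    # 50 + 300 + 300 + 300 + 50
```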



COMPRESSION references: man dar

keywords: -z

Dar can use compression. Currently the gzip, bzip2, lzo, xz/lzma, zstd and lz4 algorithms are available, and there is still room for any other compression algorithm. Note that compression is done before slicing, which means that using compression together with slices will not make slices smaller, but will probably result in fewer slices in the backup.
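
The order of operations (compress first, slice afterwards) can be illustrated with a short Python sketch. It uses zlib from the standard library as a stand-in compressor, and the make_slices helper and the sample data are assumptions made for the example.

```python
import zlib
from typing import List


def make_slices(payload: bytes, slice_size: int) -> List[bytes]:
    """Cut a byte stream into consecutive slices of at most `slice_size` bytes."""
    return [payload[i:i + slice_size] for i in range(0, len(payload), slice_size)]


original = b"backup data " * 10_000        # 120000 highly compressible bytes
compressed = zlib.compress(original)

raw_slices = make_slices(original, 4096)       # slicing without compression
packed_slices = make_slices(compressed, 4096)  # compression, then slicing
```

Every slice except the last one still has the requested size in both cases; compression only reduces how many slices are needed.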



SELECTIVE COMPRESSION references: man dar/ samples

keywords: -Y -Z -m -am

dar can be given a special filter that determines which files will be compressed or not. This way you can speed up the backup operation by not trying to compress *.mp3, *.mpg, *.zip, *.gz and other already compressed files, for example. Moreover another mechanism allows you to say that files below a given size (whatever their name is) will not be compressed.



STRONG ENCRYPTION references: man dar

keywords: -K -J -# -* blowfish, twofish, aes256, serpent256, camellia256, --kdf-param

Dar can use blowfish, twofish, aes256, serpent256 and camellia256 algorithms to encrypt the whole backup. Two "elastic buffers" are inserted and encrypted with the rest of the data, one at the beginning and one at the end of the archive to prevent a clear text attack or codebook attack.

For symmetric key encryption several Key Derivation Functions are available, from the legacy PBKDF2 (PKCS#5 v2) to the modern Argon2 algorithm. The user can set the hash algorithm for the former and the iteration count for both algorithms.



PUBLIC KEY ENCRYPTION references: man dar

keywords: -K, --key-length

Encryption based on GPG public keys is available. A given backup can be encrypted for a recipient (or several recipients without visible overhead) using their public key. Only the recipient(s) will be able to read such an encrypted backup.

The advantage over ciphering the backup as a whole is that you don't have to decipher it all to extract a particular file or set of files, which brings a huge gain in CPU usage and execution time.



PRIVATE KEY SIGNATURE references: man dar

keywords: --sign

When using encryption with a public key, it is also possible to sign an archive with your own private key(s). Your recipients can then be sure the archive has been generated by you: dar checks the signature validity against the corresponding public key(s) each time the archive is used (restoration, testing, etc.) and issues a warning if the signature does not match or if a key needed to verify the signature is missing. You can also get the list of signatories of the archive while listing the archive content.



SLICE HASHING references: man dar

keywords: --hash, md5, sha1, sha512

When creating a backup, dar can compute an md5, sha1 or sha512 hash before the backup is even written to disk and produce a small file compatible with md5sum, sha1sum or sha512sum, which lets you verify that the medium has not corrupted the slices of the backup.
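
The idea of hashing a slice while it is being produced, then emitting a line that md5sum, sha1sum or sha512sum can check, can be sketched in Python. The function names are hypothetical and the layout shown (hex digest, two spaces, file name) is the usual text-mode format of these tools, not a statement about the exact file dar writes.

```python
import hashlib
from typing import Iterable


def hash_while_writing(chunks: Iterable[bytes], algo: str = "sha512") -> str:
    """Feed each chunk to the hash as it is produced, before it reaches the disk."""
    h = hashlib.new(algo)
    for chunk in chunks:
        h.update(chunk)
        # A real writer would also write `chunk` to the slice file here.
    return h.hexdigest()


def checksum_line(slice_name: str, digest: str) -> str:
    """Format one line in the layout read by `md5sum -c` and similar tools."""
    return f"{digest}  {slice_name}"


digest = hash_while_writing([b"first part of the slice, ", b"second part"], "md5")
line = checksum_line("archive.1.dar", digest)
```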



DATA PROTECTION references: man dar / Parchive integration

keywords: -al

Dar is able to detect corruption in any part of a dar backup, but it cannot fix it.

Dar relies on the Parchive program for data protection against media errors. Thanks to dar's ability to run a user command or script, and thanks to the ad hoc scripts provided, using Parchive with dar is as simple as adding a word (par2) on the command-line. Depending on the context (backup, restoration, testing, ...), dar will by this means create parity data for each slice, verify and, if necessary, repair the archive slices.

Without Parchive, dar can work around a corruption by skipping the concerned file and restoring all the others. For some more vital parts of the backup, like the "catalog", which is the table of contents, dar is able to use an isolated catalog as a rescue for the internal catalog of the corrupted backup. It can also make use of the tape marks that are used inside the backup for sequential reading as a way to overcome catalog corruption. The other vital information is the slice layout, which is replicated in each slice and lets dar overcome data corruption of that part too. As a last resort, dar also proposes a "lax" mode in which the user is asked questions (like the compression algorithm used, ...) to help dar recover very corrupted archives, and in which many sanity checks are turned into warnings instead of aborting the operation. However, this does not replace using Parchive. This "lax" mode has to be considered as the last resort option.



TRUNCATED ARCHIVE/BACKUP REPARATION reference: man dar

keyword: -y

Since version 2.6.0 a truncated archive (due to lack of disk space, power outage, or any other reason) can be repaired. A truncated archive lacks the table of contents, which is located at the end of the archive; without it you cannot know which files are saved and where to fetch their data from, unless you use the sequential reading mode, which is slow as it implies reading the whole archive even to restore just one file. To allow sequential reading of an archive, which is suitable for tape media, some metadata is by default inserted all along the archive. This metadata is globally the same information that the missing table of contents should contain, but spread in pieces all along the archive. Repairing an archive consists of gathering this inlined metadata and adding it at the end of the repaired archive to allow the direct access mode (the default mode), which is fast and efficient.



DIRECT ACCESS

Even when using compression and/or encryption, dar does not have to read the whole backup to extract one file. This way, if you just want to restore one file from a huge backup, the process will be very quick. Dar first reads the catalogue (i.e. the contents of the backup), then it goes directly to the location of the saved file(s) you want to restore and proceeds to restoration. In particular, when using slices, dar will ask only for the slice(s) containing the file(s) to restore.

Since version 2.6.0 dar can also read a backup from a remote host by means of FTP or SFTP. Here too dar can leverage its direct access ability to download only what is necessary to restore some files from a large backup, to list the backup content, or even to compare a set of files with the live filesystem.
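
Why direct access only needs some of the slices can be shown with a small calculation. In the Python sketch below, the catalogue is assumed to give, for each file, an offset and a length within the archive seen as one continuous byte stream, and every slice is assumed to have the same size with no header; these are simplifications for illustration, and the function name is hypothetical.

```python
from typing import List


def slices_needed(offset: int, length: int, slice_size: int) -> List[int]:
    """Slice numbers (starting at 1) holding `length` bytes from `offset`."""
    if length <= 0:
        return []
    first = offset // slice_size + 1
    last = (offset + length - 1) // slice_size + 1
    return list(range(first, last + 1))


within_one_slice = slices_needed(250, 10, 100)   # bytes 250..259
across_two_slices = slices_needed(95, 10, 100)   # bytes 95..104
```

With a remote backup the same reasoning applies to byte ranges: only the parts holding the requested files have to be transferred.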



SEQUENTIAL ACCESS (suitable for tapes) references: man dar

keywords: --sequential-read, -at

The direct access feature seen above is well adapted to random access media like disks, but not to tapes. Since release 2.4.0, dar provides a sequential mode in which dar reads and writes archives sequentially. It has the advantage of being efficient with tapes but suffers from the same drawback as tar archives: it is slow to restore a single file from a huge archive. Its second advantage is the ability to repair a truncated archive (lack of disk space, power outage, ...) as described above.



MULTI-VOLUME TAPES references: man dar_split

keywords: --sequential-read

The independent dar_split program provides a means to output dar but also tar archives to several tapes. It takes care of splitting the archive when writing to tapes and of gathering the pieces of an archive from several tapes, so that dar/tar can work as if it were a single-piece archive.



ARCHIVE/BACKUP TESTING references: man dar / TUTORIAL / Good Backup Practice

keywords: -t

Thanks to CRC (cyclic redundancy checks), dar is able to detect data corruption in a backup. Only the file where data corruption occurred cannot be restored; dar will restore the others even when compression or encryption (or both) is used.
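
The principle of a per-file CRC can be sketched in Python with zlib.crc32. The in-memory "archive" below and the function names are assumptions made for the example; the CRC width and placement used by dar are not shown here.

```python
import zlib
from typing import Dict, List, Tuple

Stored = Dict[str, Tuple[bytes, int]]   # file name -> (data, CRC at backup time)


def archive(files: Dict[str, bytes]) -> Stored:
    """Store each file together with the CRC of its data."""
    return {name: (data, zlib.crc32(data)) for name, data in files.items()}


def restore(stored: Stored) -> Tuple[Dict[str, bytes], List[str]]:
    """Restore every file whose CRC still matches; report the others."""
    restored: Dict[str, bytes] = {}
    corrupted: List[str] = []
    for name, (data, crc) in stored.items():
        if zlib.crc32(data) == crc:
            restored[name] = data
        else:
            corrupted.append(name)
    return restored, corrupted


stored = archive({"a.txt": b"alpha", "b.txt": b"beta"})
# Simulate a medium error altering one byte of b.txt after the backup.
stored["b.txt"] = (b"bet4", stored["b.txt"][1])
restored, corrupted = restore(stored)
```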



ISOLATION references: man dar

keywords: -C -A -@

The catalogue (i.e.: the contents of a backup), can be extracted as a copy (this operation is called isolation) to a small file, that can in turn be used as reference for differential backup and as rescue of the internal catalogue (in case of backup corruption).

There is then no need to provide a backup to be able to create a differential backup based on it, just its isolated catalogue can be used instead. Such an isolated catalogue



FLAT RESTORATION references: man dar

keywords: -f

It is possible to restore any file without restoring the directories and subdirectories it was in at the time of the backup. If this option is activated, all files will be restored in the (-R) root directory whatever their real position is recorded inside the backup.



USER COMMAND BETWEEN SLICES references: man dar dar_slave dar_xform / command line usage notes

keywords: -E -F -~

Several hooks are provided for dar to call a given command once a slice has been written or before reading a slice. Several macros allow the user command or script to know the requested slice number, path and backup basename.



USER COMMAND BEFORE AND AFTER SAVING A DIRECTORY OR A FILE references: man dar / command line usage notes

keywords: -< -> -=

It is possible to define a set of files for which a command will be executed before dar starts saving them and once dar has finished saving them. Before entering a directory dar will call the specified user command, then it will proceed to the backup of that directory. Once the whole directory has been saved, dar will call the same user command again (with slightly different arguments) and then continue the backup process. Such a user command may for example run a particular command whose output is redirected to a file of that directory, suitable for backup. Another purpose is to force the auto-mounting of filesystems that would otherwise not be visible and thus not saved.



CONFIGURATION FILE references: man dar / conditional syntax and user targets

keywords: -B

Dar can read parameters from a file. This is a way to extend the limited length of the command-line input. A configuration file can ask dar to read (or to include) other configuration files. A simple but efficient mechanism forbids a file from including itself, directly or indirectly, and there is no limitation on the degree of recursion for the inclusion of configuration files.

Two special configuration files $HOME/.darrc and /etc/darrc are read if they exist. They share the same syntax as any configuration file, which is the syntax used on the command-line, optionally completed by newlines and comments.

Any configuration file can also receive conditional statements, which describe which options are to be used in different conditions. Conditions are: "extract", "listing", "test", "diff", "create", "isolate", "merge", "reference", "auxiliary", "all", "default" (which may be useful in case of recursive inclusion of files) ... more about their meaning and use cases in the dar man page.



REMOTE OPERATIONS references: command line usage notes / man dar/dar_slave/dar_xform

keywords: -i -o - -afile-auth

dar is able to read and write a backup to a remote server in three different ways:

  1. dar is able to produce a backup to its standard output or to a named pipe and is able to read a backup from its standard input or from a named pipe
  2. while the previous approach is fine to write a backup over the network (through an ssh session for example), reading from a remote server that way (using a single pipe) requires dar to read the whole backup, which may be inefficient to restore just a single file. For that reason, dar is also able to read a backup through a pair of pipes (or named pipes) using dar_slave at the other side of the pipes. Of the pair of pipes, one lets dar ask dar_slave which portion of the backup file it has to send through the other pipe. This makes a remote restoration much more efficient and still allows these bidirectional exchanges to be encrypted over the network, simply by running dar_slave through an ssh session.
  3. last, since release 2.6.0 dar can make use of the FTP or SFTP protocols to read or write a backup from or to a remote server. This method does not rely on anonymous or named pipes, is as efficient as option 2 for reading a remote backup and is compatible with slicing and slice hashing. However, this option is restricted to these two network protocols: FTP (low CPU usage but insecure) and SFTP (secure)


DAR MANAGER references: man dar_manager


The advantage of differential backup is that it takes much less space to store and less time to complete than always making full backups. But, on the other hand, it may lead you to have a lot of them due to the reduced space requirements. Then, if you want to restore a particular file, you may spend time figuring out which backup holds the most recent version. To solve this, dar_manager gathers the content information of all your backups into a database (a Dar Manager Database, which ends up as a single file). At restoration time, it will call dar for you to restore the requested file(s) from the proper backup.



RE-SHAPE SLICES OF AN EXISTING ARCHIVE/BACKUP references: man dar_xform


The provided program named dar_xform is able to change the size of the slices of a given backup. The resulting backup is totally identical to one directly created by dar. The source backup can be taken from a set of slices, from standard input or even from a named pipe. Note that dar_xform can work on encrypted and/or compressed data without having to decompress or even decrypt it.



ARCHIVE/BACKUP MERGING references: man dar

keywords: -+ -ak -A -@

From version 2.3.0, dar supports the merging of two existing archives into a single one. This merging operation comes with the same filtering mechanism as the one used for archive creation. This lets the user define which files will be part of the resulting archive.

By extension, archive merging can also take a single source archive as input. This may sound a bit strange at first, but this lets you make a subset of a given archive without having to extract any file to disk. In particular, if your filesystem does not support Extended Attributes (EA), thanks to this feature you can still clean up an archive from files you do not want to keep anymore without losing any EA or performing any change to the standard file attributes (like modification dates for example) of the files that will stay in the resulting archive.

Last, this merging feature also gives you the opportunity to change the compression level or algorithm used as well as the encryption algorithm and passphrase. Of course, from a pair of source archives you can use all these sub-features at the same time: filtering out files you do not want in the resulting archive, using a different compression level and algorithm or encryption password and algorithm than the source archive(s); you may also have a different archive slicing or no slicing at all (well, dar_xform is more efficient for this feature only, see above "RE-SHAPE SLICES OF AN EXISTING ARCHIVE/BACKUP" for details).



ARCHIVE SUBSETTING references: man dar

keywords: -+ -ak

As seen above under the "archive merging" feature description, it is possible to define a subset of files from an archive and put them into a new archive without having to really extract these files to disk. To speed up the process, it is also possible to avoid uncompressing/recompressing files that are kept in the resulting archive, or to change their compression, as well as to change the encryption scheme used. Last, you may manipulate files and their EA this way even when you don't have EA support available on your system.



DRY-RUN EXECUTION references: man dar

keywords: -e

You can run any feature without effectively performing the action. Dar will report any problem but will not create, remove or modify any file.



ARCHIVE/BACKUP USER COMMENTS references: man dar

keywords: --user-comment, -l -v, -l -q

The backup header can hold a message from the user. This message is never ciphered nor compressed and is always available to anyone listing the archive summary (-l and -q options). Several macros are available to make this option more convenient, like the current date, uid and gid, hostname, and the command-line used at backup creation.



PADDED ZEROS TO SLICE NUMBER references: man dar

keywords: --min-digits

Dar slices are numbered by integers starting at 1, which gives filenames of the following form: archive.1.dar, archive.2.dar, ..., archive.10.dar, etc. However, the lexicographical order used by many directory listing tools is not suited to showing the slices in order. For that reason, dar lets the user define how many zeros to add in the slice numbers so that usual file browsers list slices as expected. For example, with 3 as the minimum number of digits, the slice names become: archive.001.dar, archive.002.dar, ... archive.010.dar.
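
The ordering problem and its fix can be reproduced with a few lines of Python. The slice_name helper is hypothetical; it only mimics the naming pattern shown above.

```python
def slice_name(basename: str, number: int, min_digits: int = 1) -> str:
    """Build a slice file name, left-padding the number with zeros."""
    return f"{basename}.{number:0{min_digits}d}.dar"


numbers = [1, 2, 10]

# Without padding, lexicographical sorting puts slice 10 before slice 2.
unpadded_sorted = sorted(slice_name("archive", n) for n in numbers)

# With a minimum of 3 digits, lexicographical and numerical orders agree.
padded_sorted = sorted(slice_name("archive", n, 3) for n in numbers)
```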



MULTI-THREADING references: man dar

keywords: --multi-thread

Since release 2.7.0, compression can use several threads when the new per-block compression is used (as opposed to the streaming compression used so far, which is still available). Encryption can also be processed with multiple threads, even for old backups (no change at encryption level). The user defines the number of threads wanted for each process, compression/decompression as well as ciphering/deciphering.