Dar Documentation


DAR's FEATURES






Here are the main features of the dar/libdar tools. Each feature is presented with an overview and some pointers you can follow for more detailed information.



FILTERS
references: man dar/command line usage notes

keywords: -I -X -P -g -[ -] -am
dar can back up anything from a whole filesystem down to a single file, thanks to its filter mechanism. This mechanism has two heads: the first lets you decide which part of a directory tree to consider for the operation (backup, restoration, etc.), while the second defines which files to consider, based on the filename only (for example, the file's extension).
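The two heads can be pictured as two independent tests applied to each candidate path. The Python sketch below is only a conceptual model, not dar's implementation; the function is_selected and its parameters are hypothetical names chosen for illustration.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def is_selected(path, included_dirs, excluded_masks):
    """Return True when `path` passes both filter heads (illustrative model)."""
    p = PurePosixPath(path)
    # Head 1: the path must lie inside one of the selected subtrees.
    in_tree = any(
        p == PurePosixPath(d) or PurePosixPath(d) in p.parents
        for d in included_dirs
    )
    # Head 2: the filename alone must not match any excluded mask.
    name_ok = not any(fnmatch(p.name, mask) for mask in excluded_masks)
    return in_tree and name_ok


kept = is_selected("home/alice/notes.txt", ["home/alice"], ["*.tmp"])
masked = is_selected("home/alice/cache.tmp", ["home/alice"], ["*.tmp"])
outside = is_selected("var/log/syslog", ["home/alice"], ["*.tmp"])
print(kept, masked, outside)
```

A file is kept only when both tests agree, which is why a path outside the selected subtree is skipped even if its name matches no exclusion mask.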



DIFFERENTIAL BACKUP references: man dar/TUTORIAL

keywords: -A
When making a backup with dar, you can make either a full backup or a differential backup. A full backup, as expected, saves all files specified on the command line (with or without filters). A differential backup, on top of the filter mechanism, saves only the files that have changed since a given reference backup. Additionally, files that existed in the reference backup and no longer exist at the time of the differential backup are recorded in the backup as "removed". At recovery time (unless you deactivate it), restoring a differential backup will update changed files and add new files, but also remove the files that have been recorded as "removed". Note that the reference backup can be a full backup or another differential backup. This way you can, for example, make a first full backup and then many differential backups, each taking the last backup made as its reference.
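The comparison against a reference backup can be sketched as follows. This is a simplified model that assumes a catalogue is a plain mapping from path to modification time; the criteria dar really uses to decide that a file has changed are not described here, and the name differential is hypothetical.

```python
def differential(reference, current):
    """Compare two {path: mtime} catalogues (illustrative model).

    Returns (to_save, removed): paths that are new or changed, and paths
    that existed in the reference but no longer exist.
    """
    to_save = sorted(
        path for path, mtime in current.items() if reference.get(path) != mtime
    )
    removed = sorted(path for path in reference if path not in current)
    return to_save, removed


reference = {"a.txt": 100, "b.txt": 100, "c.txt": 100}
current = {"a.txt": 100, "b.txt": 250, "d.txt": 300}
to_save, removed = differential(reference, current)
print(to_save)   # ['b.txt', 'd.txt']
print(removed)   # ['c.txt']
```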



SLICES references: man dar/TUTORIAL

keywords: -s -S -p -aSI -abinary
Dar stands for Disk ARchive. From the beginning it was designed to split an archive over several removable media, whatever their number and size. To restore from such a split archive, dar directly fetches the requested data from the correct slice(s). Thus dar is able to save and restore using old floppy disks, CD-R, DVD-R, CD-RW, DVD-RW, Zip, Jaz, etc. However, dar does not mount or unmount removable media itself; it is independent of hardware. Given a size, it will split the archive into several files (called SLICES), optionally pausing before creating the next one, which lets the user mount or unmount a medium, burn the file on a CD-R, or send it by email (if your mail system does not allow huge files in emails, dar can help you here too). By default (no size specified), dar makes a single slice, whatever its size. Additionally, the size of the first slice can be specified separately, for example if you want to fill up a partially filled disk before starting to use empty ones. Last, at restoration time, dar will pause and prompt the user for a slice only if it is missing. Note that all these operations can be automated using the "user command between slices" feature (presented below), which lets dar do whatever you want once a slice is created or before a slice is read.
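The effect of a separate first-slice size can be illustrated with a little arithmetic. The sketch below ignores any per-slice header or other overhead a real archive may have, and the function name slice_sizes is hypothetical.

```python
def slice_sizes(total, slice_size, first_slice_size=None):
    """Return the sizes of the slices needed to hold `total` bytes."""
    if slice_size <= 0 or (first_slice_size is not None and first_slice_size <= 0):
        raise ValueError("slice sizes must be positive")
    sizes = []
    remaining = total
    limit = first_slice_size if first_slice_size is not None else slice_size
    while remaining > 0:
        taken = min(limit, remaining)
        sizes.append(taken)
        remaining -= taken
        limit = slice_size  # every slice after the first uses the regular size
    return sizes


print(slice_sizes(2500, 1000))       # [1000, 1000, 500]
print(slice_sizes(2500, 1000, 300))  # [300, 1000, 1000, 200]
```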



DIRECTORY TREE SNAPSHOT references: man dar

keywords: -A +
Dar can make a snapshot of a directory tree, recording the inode status of its files. This may be used to detect changes in the filesystem by "diffing" the resulting archive against the filesystem at a later time. The resulting archive can also be used as a reference to save the files that have changed since the snapshot was made. A snapshot archive is very small compared to the corresponding full backup, but it cannot be used to restore any data.



COMPRESSION references: man dar

keywords: -z
dar can use compression. By default no compression is used. Currently the gzip, bzip2 and lzo algorithms are implemented, and there is still room for other compression algorithms. Note that compression is performed before slicing, which means that using compression together with slices will not make the slices smaller, but will probably result in fewer slices in the backup.



DIRECT ACCESS


Even when using compression and/or encryption, dar does not have to read the whole backup to extract one file. So if you just want to restore one file from a huge backup, the process will be much faster than with tar. Dar first reads the catalogue (i.e. the contents of the backup), then goes directly to the location of the saved file(s) you want to restore and proceeds with the restoration. In particular, when using slices, dar will ask only for the slice(s) containing the file(s) to restore.
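To see why only some slices are needed, assume (purely for illustration) that the catalogue maps each path to an offset and a length in the unsliced archive and that all slices have the same size. The real catalogue layout is not described here, and slices_needed is a hypothetical name.

```python
def slices_needed(catalogue, wanted, slice_size):
    """Return the sorted slice numbers (starting at 1) that hold `wanted`.

    `catalogue` maps a path to (offset, length), with length > 0.
    """
    needed = set()
    for path in wanted:
        offset, length = catalogue[path]
        first = offset // slice_size + 1
        last = (offset + length - 1) // slice_size + 1
        needed.update(range(first, last + 1))
    return sorted(needed)


catalogue = {"a": (0, 500), "b": (900, 300), "c": (2500, 100)}
print(slices_needed(catalogue, ["b"], 1000))  # [1, 2]
print(slices_needed(catalogue, ["c"], 1000))  # [3]
```

In this model, restoring "c" touches the third slice only, whatever the number of slices that precede it.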



SEQUENTIAL ACCESS (suitable for tapes)
references: man dar

keywords: --sequential-read, -at
The direct access feature seen above is well suited to random access media like disks, but not to tapes. Since release 2.4.0, dar provides a sequential mode in which it reads and writes archives sequentially. This mode is efficient with tapes but suffers from the same drawback as tar archives: restoring a single file from a huge archive is slow.


HARD LINK CONSIDERATION


Hard links are properly saved in all cases and properly restored whenever possible. For example, when restoring across a filesystem mount point, hard linking will fail; dar will then duplicate the inode and file contents and issue a warning. Hard link support covers the following inode types: plain files, char devices, block devices and symlinks (yes, you can hard link symbolic links! Thanks to Wesley Leggette for the info ;-) ).



SPARSE FILES
references: man dar

keywords: --sparse-file-min-size, -ah
By default dar takes care of sparse files, even if the underlying filesystem does not support them(!). When a long sequence of zeroed bytes is met in a file during backup, these bytes are not stored in the archive; only their number is stored instead (a structure known as a "hole"). When the time comes to restore that file, dar restores the normal data, but when a hole is met in the archive, dar skips directly to the position of the data following that hole. If the underlying filesystem supports sparse files, this (re)creates a hole in the restored file, making it a sparse file. Sparse files can report a size of several hundred gigabytes while needing only a few bytes of disk space, so being able to save and restore them properly avoids wasting disk space both in archives and at restoration time.
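The idea of storing a hole as a byte count can be sketched as below. This is not dar's archive format: the record layout, the names encode_holes and decode_holes, and the min_hole threshold (which plays a role loosely comparable to a minimum hole size) are assumptions made for the example. The decoder writes zeros back for simplicity, whereas a real restoration would seek past the hole.

```python
def encode_holes(data, min_hole=4):
    """Split `data` into ('data', bytes) and ('hole', length) records."""
    records = []
    i = 0
    start = 0  # start of the pending run of ordinary data
    n = len(data)
    while i < n:
        if data[i] == 0:
            j = i
            while j < n and data[j] == 0:
                j += 1
            if j - i >= min_hole:  # long enough to be stored as a hole
                if i > start:
                    records.append(("data", data[start:i]))
                records.append(("hole", j - i))
                start = j
            i = j
        else:
            i += 1
    if start < n:
        records.append(("data", data[start:]))
    return records


def decode_holes(records):
    """Rebuild the original bytes from the records."""
    return b"".join(
        b"\0" * payload if kind == "hole" else payload for kind, payload in records
    )


sample = b"AB" + b"\0" * 10 + b"C\0\0D"
records = encode_holes(sample)
print(records)
```

Short runs of zeros (here the two bytes between "C" and "D") stay in the ordinary data, since recording them as a hole would save nothing.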



EXTENDED ATTRIBUTES (EA) / MacOS X FILE FORKS / ACL
references: man dar
keywords: -u -U -am -ae --alter=list-ea
Dar is able to save and restore EA, all or just those matching a given pattern.

File forks (MacOS X) are implemented over EA, as are Linux's ACL; both are thus transparently saved, tested, compared and restored by dar. Note that ACL under MacOS do not seem to rely on EA, so they are ignored by dar, though they are only marginally used.



ARCHIVE TESTING references: man dar/TUTORIAL/ Good Backup Practice

keywords: -t
Thanks to CRC (cyclic redundancy checks), dar is able to detect data corruption in the archive. Only the files where data corruption occurred cannot be restored; dar will restore the others, even when compression or encryption (or both) is used.
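Per-file checksums are what make it possible to single out the damaged files. The sketch below uses Python's zlib.crc32 as a stand-in; the CRC width and algorithm dar actually uses are not stated here, and the names store and corrupted are hypothetical.

```python
import zlib


def store(files):
    """Return {name: (data, crc)} as recorded at backup time."""
    return {name: (data, zlib.crc32(data)) for name, data in files.items()}


def corrupted(archive):
    """Return the names whose data no longer matches the recorded CRC."""
    return sorted(
        name for name, (data, crc) in archive.items() if zlib.crc32(data) != crc
    )


archive = store({"a.txt": b"hello", "b.txt": b"world"})
# Simulate a media error hitting one file only.
_, recorded_crc = archive["b.txt"]
archive["b.txt"] = (b"w0rld", recorded_crc)
damaged = corrupted(archive)
print(damaged)  # ['b.txt']
```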



DATA PROTECTION references: man dar/Parchive integration

keywords: -al
dar relies on the Parchive program for data protection against media errors. Thanks to dar's ability to run a user command or script, and thanks to the ad hoc scripts provided, using Parchive with dar is as simple as adding a word (par2) on the command line. Depending on the context (archive creation, archive testing, ...), dar will by this means create parity data for each slice, verify the archive slices and, if necessary, repair them.

However, even without Parchive, dar can use an isolated catalogue as a backup of the internal catalogue of an archive, whose corruption could otherwise make the whole archive unreadable. The other vital information (like the slice layout) is replicated in each slice, which lets dar overcome data corruption of that part too and restore at least some data in case of a major problem. As a last resort, dar also offers a "lax" mode in which the user is asked questions (like the compression algorithm used, ...) to help dar recover very corrupted archives. However, this does not replace using Parchive and has to be considered a last-resort option.



REMOTE OPERATIONS USING PIPES references: command line usage notes, man dar/dar_slave/dar_xform

keywords: -i -o -
dar is able to write an archive to its standard output or to a named pipe, and to read an archive from its standard input or from a named pipe, which makes remote backups easy.

However, this requires reading the archive in sequential mode, which means transferring a whole archive just to restore a single file. For that reason, dar is also able to read an archive through a pair of pipes, with dar_slave on one side and dar on the other. Through one pipe, dar tells dar_slave which portion of the archive to send; dar_slave sends it through the other pipe. This makes remote restoration much more efficient, and it can still be protected: simply running dar_slave remotely through an ssh session, for example, encrypts all exchanges.



ISOLATION references: man dar

keywords: -C -A -@
The catalogue (i.e. the contents of an archive) can be extracted to a small file (this operation is called isolation), which can in turn be used as the reference for a differential archive. There is then no need to provide the archive itself to create a differential backup based on it; its catalogue alone is enough. Such an isolated catalogue can also be used to rescue the archive it was isolated from if the archive's internal catalogue gets corrupted. An isolated catalogue can be created at the same time as the archive (an operation called on-fly isolation) or as a separate operation (called isolation).



RE-SHAPE SLICES OF AN EXISTING ARCHIVE references: man dar_xform


The provided program named "dar_xform" can change the slice size of a given archive. The resulting archive is totally identical to archives created directly by dar. The source archive can be taken from a set of slices, from standard input or even from a named pipe. Note that dar_xform works on encrypted and/or compressed data without having to decompress or decrypt it.



USER COMMAND BETWEEN SLICES references: man dar dar_slave dar_xform/command line usage notes

keywords: -E -F -~
Several hooks are provided for dar to call a given command once a slice has been written or before a slice is read. Several macros let the user command or script know the requested slice number, path and archive basename.



USER COMMAND BEFORE AND AFTER SAVING A DIRECTORY OR A FILE
references: man dar/command line usage notes

keywords: -< -> -=
It is possible to define a set of files for which a command is executed before dar starts saving them and once dar has finished saving them. This is especially intended for backing up live databases. Before entering a directory, dar calls the specified user command, then proceeds with the backup of that directory. Once the whole directory has been saved, dar calls the same user command again (with slightly different arguments) and then continues the backup process. Such a user command may, for example, stop the database and restart it afterwards.




STRONG ENCRYPTION references: man dar

keywords: -K -J -# -* blowfish, twofish, aes256, serpent256, camellia256
Dar can use the blowfish, twofish, aes256, serpent256 and camellia256 algorithms to encrypt the whole archive. Two "elastic buffers" are inserted and encrypted with the rest of the data, one at the beginning and one at the end of the archive, to prevent clear text attacks and codebook attacks.




SLICE HASHING
references: man dar

keywords: --hash, md5, sha1
When creating an archive, dar can compute an md5 or sha1 hash before the archive is written to disk and produce a small file compatible with md5sum or sha1sum, which lets you verify that each slice of the archive is not corrupted.
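For reference, a line that md5sum or sha1sum can check has the form "digest, two spaces, filename" (text mode). The sketch below builds such a line with Python's hashlib; the exact name and contents of the file dar writes are not described here, and checksum_line is a hypothetical helper.

```python
import hashlib


def checksum_line(slice_name, slice_bytes, algo="md5"):
    """Return one line in the text-mode format read by md5sum/sha1sum -c."""
    digest = hashlib.new(algo, slice_bytes).hexdigest()
    return f"{digest}  {slice_name}"


md5_line = checksum_line("archive.1.dar", b"")
sha1_line = checksum_line("archive.1.dar", b"", algo="sha1")
print(md5_line)
print(sha1_line)
```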




CONFIGURATION FILE references: man dar, conditional syntax and user targets

keywords: -B
dar can read parameters from a file. This is a way to extend the limited length of command-line input. A configuration file can ask dar to read (or include) other configuration files. A simple but efficient mechanism forbids a file from including itself, directly or indirectly, and there is no limit on the depth of recursion when including configuration files.
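One way to forbid self-inclusion while allowing unlimited depth is to track the chain of files currently being expanded. The sketch below is a guess at the general technique, not dar's code: it assumes that a configuration file includes another one with the -B keyword, keeps configuration files in a dictionary instead of on disk, and ignores comments and conditional statements. The name expand is hypothetical.

```python
def expand(name, files, _stack=()):
    """Return the options of `name` with every '-B other' include expanded."""
    if name in _stack:
        raise ValueError("inclusion loop: " + " -> ".join(_stack + (name,)))
    tokens = files[name].split()
    options = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "-B" and i + 1 < len(tokens):
            # Recurse with this file added to the inclusion chain.
            options.extend(expand(tokens[i + 1], files, _stack + (name,)))
            i += 2
        else:
            options.append(tokens[i])
            i += 1
    return options


files = {
    "main": "-z -B extra",
    "extra": "-M",
    "a": "-B b",   # a and b include each other
    "b": "-B a",
}
print(expand("main", files))  # ['-z', '-M']
```

Because only the current chain is tracked, the same file may still be included several times from different places, and the depth of nesting is unbounded.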

Two special configuration files, $HOME/.darrc and /etc/darrc, are read if they exist. They share the same syntax as any other configuration file, which is the syntax used on the command line, optionally completed with newlines and comments.

Any configuration file can also contain conditional statements, which describe which options are to be used under which conditions. Conditions are: "restoration", "listing", "testing", "difference", "saving", "isolation", "any operation", "none yet defined" (which may be useful in case of recursive inclusion of files) ...



SELECTIVE COMPRESSION references: man dar/samples

keywords: -Y -Z -m -am
dar can be given a special filter that determines which files will be compressed and which will not. This way you can speed up the backup operation by not trying to compress *.mp3, *.mpg, *.zip, *.gz and other already compressed files, for example. Moreover, another mechanism lets you specify that files under a given size (whatever their name) will not be compressed.
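The two mechanisms combine naturally into a single decision per file. The following sketch is illustrative only; should_compress, its default masks (taken from the examples above) and the 100-byte threshold are assumptions, not dar defaults.

```python
from fnmatch import fnmatch


def should_compress(name, size,
                    skip_masks=("*.mp3", "*.mpg", "*.zip", "*.gz"),
                    min_size=100):
    """Decide whether a file is worth compressing (illustrative model)."""
    if size < min_size:
        return False  # too small: compression is not worth the effort
    # Already compressed formats are stored as they are.
    return not any(fnmatch(name, mask) for mask in skip_masks)


print(should_compress("song.mp3", 5_000_000))  # False
print(should_compress("report.txt", 5_000))    # True
print(should_compress("tiny.txt", 10))         # False
```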



DAR MANAGER references: man dar_manager


The advantage of differential backups is that they take much less space to store and much less time to complete than always making full backups. On the other hand, since the reduced space requirement lets you keep a lot of them, you may spend time finding which backup holds the most recent version of a particular file you want to restore. This is solved by dar_manager. This command-line program gathers the contents information of all your backups. At restoration time, it calls dar for you to restore the requested file(s) from the proper backup.
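The lookup that dar_manager performs can be pictured as a search from the newest backup to the oldest. This is a simplified model with hypothetical names (locate, and a list of backups given oldest first); how dar_manager really stores its information is not described here.

```python
def locate(backups, path):
    """Return the name of the most recent backup that saved `path`.

    `backups` is a list, oldest first, of (name, set_of_saved_paths).
    """
    for name, saved in reversed(backups):
        if path in saved:
            return name
    return None


backups = [
    ("full", {"a.txt", "b.txt"}),
    ("diff1", {"b.txt"}),
    ("diff2", {"c.txt"}),
]
print(locate(backups, "b.txt"))  # diff1
print(locate(backups, "a.txt"))  # full
```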



FLAT RESTORATION references: man dar

keywords: -f
It is possible to restore any file without restoring the directories and subdirectories it was in at the time of the backup. If this option is activated, all files will be restored in the (-R) root directory, whatever their position recorded inside the archive.



NODUMP FLAG references: man dar

keywords: --nodump
Linux ext2/3/4 filesystems provide a set of flags for each inode, among which is the "nodump" flag, which in substance says "don't save this file for backup". This flag is used by the so-called dump backup program. Dar can take care not to save the files that have this flag set.



ONE FILESYSTEM references: man dar

keywords: -M
dar can back up the files of a given filesystem only: even if some subdirectories in the scope are mount points for other filesystems, dar will not recurse into these directories.



ARCHIVE MERGING references: man dar

keywords: -+ -ak -A -@
Since version 2.3.0, dar supports merging two existing archives into a single one. This merging operation comes with the same filtering mechanism used for archive creation, which lets the user define which files will be part of the resulting archive.

By extension, archive merging can also take a single source archive as input. This may sound a bit strange at first, but it lets you make a subset of a given archive without having to extract any file to disk. In particular, even if your filesystem does not support Extended Attributes (EA), this feature lets you clean an archive of files you no longer want to keep without losing any EA or changing any standard file attribute (like modification dates) of the files that stay in the resulting archive.

Last, this merging feature also gives you the opportunity to change the compression level or algorithm used, as well as the encryption algorithm and passphrase. Of course, from a pair of source archives you can use all these sub-features at the same time: filter out files you do not want in the resulting archive, use a different compression level and algorithm or a different encryption password and algorithm than the source archive(s), and have a different archive slicing or no slicing at all (although dar_xform is more efficient if this is all you need, see "RE-SHAPE SLICES OF AN EXISTING ARCHIVE" above for details).



ARCHIVE SUBSETTING
references: man dar

keywords: -+ -ak
As seen above under the "archive merging" feature description, it is possible to define a subset of files from an archive and put them into a new archive without having to actually extract these files to disk. To speed up the process, it is also possible to avoid uncompressing and recompressing the files that are kept in the resulting archive; alternatively, you can change their compression as well as the encryption scheme used. Last, this way you can manipulate files and their EA even when EA support is not available on your system.


DECREMENTAL BACKUP references: man dar / Decremental backup

keywords: -+ -ad
As opposed to incremental backups, where the oldest one is a full backup and each subsequent backup contains only the changes from the previous backup, decremental backups let the full backup be the most recent one, while the older ones contain only the changes compared to the next more recent one. This has the advantage of requiring a single archive to restore a whole system (dar_manager is not necessary) while reducing the overall amount of data needed to retain older versions of files (the same amount as with differential backups). Another advantage is that you do not have to keep several sets of backups, as you just need to delete the oldest backup when you need storage space. However, it has the drawback of requiring, at each new backup, the creation of a full backup and then the transformation of the previous full backup into a so-called decremental backup. Everything has a cost! ;-)


DRY-RUN EXECUTION
references: man dar

keywords: -e
You can run any feature without actually performing the action. Dar will report any problem but will not create, remove or modify any file.


DIRTY FILES
references: man dar

keywords: --dirty-behavior , --retry-on-change
At backup time, dar checks that each saved file has not changed while it was being read. If a file has changed in that situation, it is flagged as "dirty" in the archive and handled differently from other files at restoration time. Dirty files can be handled in three ways: warn the user before restoring them, ignore them and not restore them, or ignore the dirty flag and restore them normally. Dar can also retry saving a file that has been found dirty before actually setting the "dirty" flag for that file in the archive. This retry option is limited to a maximum number of tries per file, after which the file is definitively marked as dirty and the backup process continues with the next file.


ARCHIVE USER COMMENTS
references: man dar

keywords: --user-comment, -l -v, -l -q
The archive header can contain a message from the user. This message is never ciphered nor compressed and is always available to anyone listing the archive summary (-l and -q options). Several macros are available to make this option more convenient, such as the current date, the uid and gid used for archive creation, the hostname, and the command line used for archive creation.


PADDED ZEROS TO SLICE NUMBER
references: man dar

keywords: --min-digits
Dar slices are numbered with integers starting from 1, which gives filenames of the following form: archive.1.dar, archive.2.dar, ..., archive.10.dar, etc. However, the lexicographical order used by many directory listing tools does not show the slices in numerical order. For that reason, dar lets the user pad slice numbers with leading zeros up to a minimum number of digits, so that usual file browsers list the slices as expected. For example, with 3 as the minimum number of digits, the slice names become: archive.001.dar, archive.002.dar, ... archive.010.dar.
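The ordering problem is easy to reproduce with a plain lexicographical sort. The helper slice_name below is a hypothetical function that only mimics the naming scheme described above.

```python
def slice_name(basename, number, min_digits=1):
    """Build a slice filename, zero-padding the number to `min_digits`."""
    return f"{basename}.{number:0{min_digits}d}.dar"


plain = sorted(slice_name("archive", n) for n in (1, 2, 10))
padded = sorted(slice_name("archive", n, 3) for n in (1, 2, 10))
print(plain)   # ['archive.1.dar', 'archive.10.dar', 'archive.2.dar']
print(padded)  # ['archive.001.dar', 'archive.002.dar', 'archive.010.dar']
```

Without padding, slice 10 sorts before slice 2; with three digits the lexicographical and numerical orders agree.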



CACHE DIRECTORY TAGGING STANDARD
references: man dar

keywords: --cache-directory-tagging
Many programs use cache directories (the mozilla web browser for example), where temporary data that is not worth backing up is stored. The Cache Directory Tagging Standard provides a standard way for software applications to identify this type of data, which lets dar take it into account and avoid saving it.
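Under that standard, a cache directory is marked by a file named CACHEDIR.TAG whose first bytes are a fixed signature. The check can be sketched as follows; this shows the test defined by the standard, not dar's own code, and is_cache_dir is a hypothetical name.

```python
import os
import tempfile

# Fixed header required by the Cache Directory Tagging Standard.
SIGNATURE = b"Signature: 8a477f597d28d172789f06886806bc55"


def is_cache_dir(directory):
    """Return True if `directory` holds a valid CACHEDIR.TAG file."""
    tag = os.path.join(directory, "CACHEDIR.TAG")
    try:
        with open(tag, "rb") as f:
            return f.read(len(SIGNATURE)) == SIGNATURE
    except OSError:
        return False  # no tag file, or it cannot be read


with tempfile.TemporaryDirectory() as root:
    cache = os.path.join(root, "cache")
    docs = os.path.join(root, "docs")
    os.mkdir(cache)
    os.mkdir(docs)
    with open(os.path.join(cache, "CACHEDIR.TAG"), "wb") as f:
        f.write(SIGNATURE + b"\n# tag created for this demonstration\n")
    tagged = is_cache_dir(cache)
    untagged = is_cache_dir(docs)

print(tagged, untagged)  # True False
```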