Dar Documentation


DAR - Frequently Asked Questions


Questions:

I restore/save all files but dar reports that some files have been ignored, what are those ignored files?
Dar hangs when using it with pipes, why?
Why, when I restore one file, does dar report that 3 files have been restored?
While compiling dar I get the following message: "g++: /lib/libattr.a: No such file or directory", what can I do?
I cannot find the binary package for my distro, where should I look?
Can I use different filters between a full backup and a differential backup? Won't dar consider files not included in the filter to be deleted?
Once in action dar makes the whole system slower and slower, then it stops with the message "killed"! How to overcome this problem?
I have a backup, how can I change the size of its slices?
I have a backup in one slice, how can I split it into several slices?
I have a backup in several slices, how can I concatenate them into a single file?
I have a backup, how can I change its encryption scheme?
I have a backup, how can I change its compression algorithm?
Which options can I use with which commands?
Why does dar report corruption for the archive I have transferred with FTP?
Why does DAR save UID/GID instead of plain user and group names?
Dar_manager does not accept encrypted archives, how to work around this?
How to overcome the lack of static linking on MacOS X?
Why can't I test, extract files from, or list the contents of a single slice of an archive?
Why can't I merge two isolated catalogues?
Why can't dar use the full power of my multi-processor computer?
Is libdar thread-safe, and in which way?
How to solve "configure: error: Cannot find size_t type"?
Why has dar become much slower since release 2.4.0?
How to search for questions (and their answers) about known problems similar to mine?
Why does dar tell me it failed to open a directory I have excluded?
Dar reports a "SECURITY WARNING! SUSPICIOUS FILE", what does that mean?
Can dar help copy a large directory tree?
Does dar compress per file or the whole archive?
What slice size can I use with dar?
Is there a dar fuse filesystem?
How does dar compare to tar?



Answers:

I restore/save all files but dar reports that some files have been ignored, what are those ignored files?
By default, all files are considered during a backup or a restoration. But if you restrict the operation to certain files (with -P, -X, -I or -g), all other files are reported as "ignored".
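For example, with the following backup sketch (paths and archive name are hypothetical), everything outside the two listed directories is counted as "ignored":

dar -c backup -R / -g etc -g home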

Dar hangs when using it with pipes, why?
Dar can produce an archive on its standard output if you give '-' as basename, but it cannot read an archive from its standard input in direct access mode. To feed an archive to dar through pipes, you either need dar_slave and two pipes, or the sequential mode (--sequential-read option, which is very slow compared to the default direct access mode). To use dar with dar_slave over pipes in direct access mode (the most efficient way to proceed), see the detailed notes, more precisely the dar and ssh note.
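Here is a minimal sketch of the dar/dar_slave setup with two named pipes, assuming an existing archive with basename "backup" (names and paths are hypothetical):

mkfifo /tmp/todar /tmp/toslave
dar_slave -i /tmp/toslave -o /tmp/todar backup &
dar -x - -i /tmp/todar -o /tmp/toslave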

Why, when I restore one file, does dar report that 3 files have been restored?
If you restore, for example, the file usr/bin/emacs, dar will first restore usr (if the directory already exists, its date and ownership get restored and all existing files are preserved), then usr/bin, and last usr/bin/emacs. Thus 3 inodes have been restored or modified, while only one file was asked for restoration.
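As a sketch (the archive name is hypothetical):

dar -x backup -g usr/bin/emacs
# dar restores/updates usr, then usr/bin, then usr/bin/emacs: 3 inodes in total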

While compiling dar I get the following message: "g++: /lib/libattr.a: No such file or directory", what can I do?
The problem comes from an inconsistency in your distro (Redhat and Slackware seem(ed) concerned at least): dar (libtool) finds the /usr/lib/gcc-lib/i386-redhat-linux/3.3.3/../../../libattr.la file to link with. This file defines where the static and dynamic libattr libraries are located, and both are expected to be found under /lib. While the dynamic libattr is there, the static version has been moved to /usr/lib. A workaround is to make a symbolic link:

ln -s /usr/lib/libattr.a /lib/libattr.a


I cannot find the binary package for my distro, where should I look?
For any binary package, ask your distro maintainer to include dar (if not already done), and check the web site of your preferred distro for a dar package.

Can I use different filters between a full backup and a differential backup? Won't dar consider files not included in the filter to be deleted?
Yes, you can. And no, there is no risk of dar deleting the files that were not selected for the differential backup. Here is the way dar works:

During a backup, when a file is ignored due to filter exclusion, an "ignored" entry is added to the catalogue. At the end of the backup, dar compares both catalogues, the one of reference and the new one built during the backup, and adds a "detruit" entry ("detruit" is French for "destroyed") when an entry of the reference is not present in the new catalogue. Thus, if an "ignored" entry is present, no "detruit" entry will be added for that name. Then all "ignored" entries are removed and the catalogue is dumped into the archive.
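As a sketch (paths and archive names are hypothetical), the following differential backup restricted to home records files outside home as "ignored" rather than as deleted:

dar -c full -R /
dar -c diff -A full -R / -g home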


Once in action dar makes the whole system slower and slower, then it stops with the message "killed"! How to overcome this problem?
Dar needs virtual memory to work. Virtual memory is RAM + swap space. Dar's memory requirement grows with the number of files saved, not with the amount of data saved. If you have a few huge files, you have little chance of meeting any memory limitation; at the opposite, saving a plethora of files (either big or small) will make dar request a lot of virtual memory. Dar needs this memory to build the catalogue (the table of contents) of the archive it creates. The same goes for a differential backup, except that dar also needs to load in memory the catalogue of the archive of reference, which most of the time makes dar use twice as much memory for a differential backup as for a full backup.

Anyway, the solution is:
  1. Read the limitations file to understand the problem and be aware of the limitations you will bring at step 3, below.
  2. If you can, add swap space to your system (under Linux, you can either add a swap partition or a swap file, which is less constraining but also a bit less efficient). Bob Barry provided a script that can give you a raw estimation of the required virtual memory (doc/samples/dar_rqck.bash).
  3. If this is not enough, or if you cannot or do not want to add swap space, recompile dar giving the --enable-mode=64 argument to the configure script.
  4. If this is not enough and you have some money, you can add some RAM to your system.
  5. If all that fails, ask for support on the dar-support mailing-list.
There is still a workaround, which is to make several smaller archives of the files to backup: for example, make one backup for everything in /usr/local, another for everything in /var, and so on. These backups can be full or differential. The drawback is small, as you can store these archives side by side and use them at will. Moreover, you can feed a unique dar_manager database with all these different archives, which will hide from you the fact that there are several full and differential archives covering different sets of files, as sketched below.
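A sketch of this workaround (archive and database names are hypothetical):

dar -c usr_local -R / -g usr/local
dar -c var -R / -g var
dar_manager -C base
dar_manager -B base -A usr_local
dar_manager -B base -A var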


I have a backup, how can I change the size of its slices?
dar_xform is your friend!

dar_xform -s <size> original_archive new_archive

dar_xform will create a new archive with slices of the requested size (you can also use the -S option for the size of the first slice). Note that dar_xform neither decrypts nor uncompresses the archive; this is thus a very fast processing. See dar_xform's man page for more.
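For example, a sketch producing 1 GiB slices with a smaller 100 MiB first slice:

dar_xform -s 1G -S 100M original_archive new_archive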


I have a backup in one slice, how can I split it into several slices?
dar_xform is your friend!

dar_xform -s <size> original_archive new_archive

see above for more.

I have a backup in several slices, how can I concatenate them into a single file?
dar_xform is your friend!

dar_xform original_archive new_archive

dar_xform without the -s option creates a single-sliced archive. See dar_xform's man page for more.


I have a backup, how can I change its encryption scheme?
The merging feature lets you do that. Merging has two roles: putting into one archive the contents of two archives, and at the same time filtering file contents to avoid copying certain files into the resulting archive. The merging feature can take two, but also only one archive as input, so we will use it in that special way here:
  • a single input (our original archive)
  • no file filtering (so we keep all the files)
  • keep files compressed (no decompression/re-compression) to speed up the process
dar -+ new_archive -A original_archive -K "<new_algo>:new pass" -ak

If the original archive was encrypted, you need to add the -J option to provide its encryption key. And if you don't want passwords in clear on the command-line (a command-line can be seen with top or ps by other users), simply provide "<algo>:"; dar will then ask you for the password on the fly. If using blowfish you can then just provide ":" for the keys:

dar -+ new_archive -A original_archive -K ":" -J ":" -ak

Note that you can also change the slicing of the archive at the same time, thanks to the -s and -S options:

dar -+ new_archive -A original_archive -K ":" -J ":" -ak -s 1G

I have a backup, how can I change its compression algorithm?
Same thing as above: we will use the merging feature.

to use bzip2 compression:

dar -+ new_archive -A original_archive -zbzip2

to use gzip compression:

dar -+ new_archive -A original_archive -zgzip

to use lzo compression:

dar -+ new_archive -A original_archive -zlzo

to use no compression at all:

dar -+ new_archive -A original_archive

Note that you can also change the encryption scheme and the slicing at the same time you change compression:

dar -+ new_archive -A original_archive -zbzip2 -K ":" -J ":" -s 1G

Which options can I use with which commands?
DAR provides seven commands:

-c   to create a new archive
-x   to extract files from a given archive
-l   to list the contents of a given archive
-d   to compare the contents of an archive with the filesystem
-t   to test the coherence of a given archive
-C   to isolate an archive (extract its internal catalogue to a usually small file)
-+   to merge two archives into one, or create a sub-archive from one or two other archives

The following table shows, for each command, the available options (those marked OK):


short option   long option                           -c   -x   -l   -d   -t   -C   -+
-v             --verbose                             OK   OK   OK   OK   OK   OK   OK
-vs            --verbose=s                           OK   OK   --   OK   OK   --   OK
-b             --beep                                OK   OK   OK   OK   OK   OK   OK
-n             --no-overwrite                        OK   OK   --   --   --   OK   OK
-w             --no-warn                             OK   OK   --   --   --   OK   OK
-wa            --no-warn=all                         --   OK   --   --   --   --   --
-A             --ref                                 OK   OK   --   OK   OK   OK   OK
-R             --fs-root                             OK   OK   --   OK   --   --   --
-X             --exclude                             OK   OK   OK   OK   OK   --   OK
-I             --include                             OK   OK   OK   OK   OK   --   OK
-P             --prune                               OK   OK   OK   OK   OK   --   OK
-g             --go-into                             OK   OK   OK   OK   OK   --   OK
-]             --exclude-from-file                   OK   OK   OK   OK   OK   --   OK
-[             --include-from-file                   OK   OK   OK   OK   OK   --   OK
-u             --exclude-ea                          OK   OK   --   --   --   --   OK
-U             --include-ea                          OK   OK   --   --   --   --   OK
-i             --input                               OK   OK   OK   OK   OK   OK   OK
-o             --output                              OK   OK   OK   OK   OK   OK   OK
-O             --comparison-field                    OK   OK   --   OK   --   --   --
-H             --hour                                OK   OK   --   --   --   --   --
-E             --execute                             OK   OK   OK   OK   OK   OK   OK
-F             --ref-execute                         OK   --   --   --   --   OK   OK
-K             --key                                 OK   OK   OK   OK   OK   OK   OK
-J             --ref-key                             OK   --   --   --   --   OK   OK
-#             --crypto-block                        OK   OK   OK   OK   OK   OK   OK
-*             --ref-crypto-block                    OK   --   --   --   --   OK   OK
-B             --batch                               OK   OK   OK   OK   OK   OK   OK
-N             --noconf                              OK   OK   OK   OK   OK   OK   OK
-e             --empty                               OK   --   --   --   --   OK   OK
-aSI           --alter=SI                            OK   OK   OK   OK   OK   OK   OK
-abinary       --alter=binary                        OK   OK   OK   OK   OK   OK   OK
-Q             (no long option)                      OK   OK   OK   OK   OK   OK   OK
-aa            --alter=atime                         OK   --   --   OK   --   --   --
-ac            --alter=ctime                         OK   --   --   OK   --   --   --
-am            --alter=mask                          OK   OK   OK   OK   OK   OK   OK
-an            --alter=no-case                       OK   OK   OK   OK   OK   OK   OK
-acase         --alter=case                          OK   OK   OK   OK   OK   OK   OK
-ar            --alter=regex                         OK   OK   OK   OK   OK   OK   OK
-ag            --alter=glob                          OK   OK   OK   OK   OK   OK   OK
-j             --jog                                 OK   OK   OK   OK   OK   OK   OK
-z             --compression                         OK   --   --   --   --   OK   OK
-y             --bzip2                               dep  --   --   --   --   dep  dep
-s             --slice                               OK   --   --   --   --   OK   OK
-S             --first-slice                         OK   --   --   --   --   OK   OK
-p             --pause                               OK   --   --   --   --   OK   OK
-@             --aux                                 OK   --   --   --   --   --   OK
-$             --aux-key                             --   --   --   --   --   --   OK
-~             --aux-execute                         --   --   --   --   --   --   OK
-%             --aux-crypto-block                    --   --   --   --   --   --   OK
-D             --empty-dir                           OK   OK   --   --   --   --   OK
-Z             --exclude-compression                 OK   --   --   --   --   --   OK
-Y             --include-compression                 OK   --   --   --   --   --   OK
-m             --mincompr                            OK   --   --   --   --   --   OK
-ak            --alter=keep-compressed               --   --   --   --   --   --   OK
-af            --alter=fixed-date                    OK   --   --   --   --   --   --
(none)         --nodump                              OK   --   --   --   --   --   --
-M             --no-mount-points                     OK   --   --   --   --   --   --
-,             --cache-directory-tagging             OK   --   --   --   --   --   --
-k             --deleted                             --   OK   --   --   --   --   --
-r             --recent                              --   OK   --   --   --   --   --
-f             --flat                                --   OK   --   --   --   --   --
-ae            --alter=erase_ea                      --   OK   --   --   --   --   --
-T             --list-format                         --   --   OK   --   --   --   --
-as            --alter=saved                         --   --   OK   --   --   --   --
-ad            --alter=decremental                   --   --   --   --   --   --   OK
-q             --quiet                               OK   OK   OK   OK   OK   OK   OK
-/             --overwriting-policy                  --   OK   --   --   --   --   OK
-<             --backup-hook-include                 OK   --   --   --   --   --   --
->             --backup-hook-exclude                 OK   --   --   --   --   --   --
-=             --backup-hook-execute                 OK   --   --   --   --   --   --
-ai            --alter=ignore-unknown-inode-type     OK   --   --   --   --   --   --
-at            --alter=tape-marks                    OK   --   --   --   --   --   OK
-0             --sequential-read                     OK   OK   OK   OK   OK   OK   --
-;             --min-digits                          OK   OK   OK   OK   OK   OK   OK
-1             --sparse-file-min-size                OK   --   --   --   --   --   OK
-ah            --alter=hole-recheck                  --   --   --   --   --   --   OK
-^             --slice-mode                          OK   --   --   --   --   OK   OK
-_             --retry-on-change                     OK   --   --   --   --   --   --
-asecu         --alter=secu                          OK   --   --   --   --   --   --
-.             --user-comment                        OK   --   --   --   --   OK   OK
-3             --hash                                OK   --   --   --   --   OK   OK
-2             --dirty-behavior                      --   OK   --   --   --   --   --
-al            --alter=lax                           --   OK   --   --   --   --   --
-alist-ea      --alter=list-ea                       --   --   OK   --   --   --   --

(OK = usable with this command, -- = not usable, dep = deprecated)



Why does dar report corruption for the archive I have transferred with FTP?

Dar archives are binary files; they must be transferred in binary mode when using FTP. With the command-line ftp client this is done the following way:

ftp <somewhere>
<login>
<password>
bin
put <file>
get <file>
bye

If you transfer an archive (or any other binary file) in ascii mode (the opposite of binary mode), the 8th bit of each byte will be lost and the archive will become impossible to recover (due to the destruction of this information). Be very careful to test your archive after transferring it back to your host, to be sure you can delete the original files.


Why does DAR save UID/GID instead of plain user and group names?

A file's properties do not contain the name of its owner nor the name of its group; instead, two numbers are present: the user ID and the group ID (UID and GID for short). In the /etc/passwd file these numbers are associated with names and other properties, like the login shell, the home directory, the password (see also /etc/shadow). Thus, when you list a directory (with the 'ls' command for example, or with any GUI program for another example), the listing application opens each directory, finds there a list of names with an associated inode number, then fetches the inode attributes of each file, among which the UID and the GID. To be able to display the real user and group names, the listing application uses a well-defined standard C library call that does the lookup in /etc/passwd, in NIS if configured, or in any other additional system [this way, applications need not bother with the many possible system configurations, the same API is used whatever the system is]. The lookup returns the name if it exists, and the listing application displays, for each file found in a directory, its attributes together with the user and group names as returned by the system.

As you can see, the user and group names are not part of any file attribute, but UID and GID *are*. Dar is mainly a backup tool: it preserves the file properties as much as possible, to be able to restore them as close as possible to their original state. Thus a file saved with UID 3 will be restored with UID 3. Whether a name corresponding to UID 3 exists or not, whether it is the same name or a different one, the file will anyway be restored with UID 3.

Scenario with dar's way of restoring

Thus, when doing backup and restoration of a crashed system, you can be confident that the restoration will not interfere with the bootable system you have used to launch dar to restore your disk. Assume UID 1 was labeled 'bin' on your real crashed system, but UID 1 is labeled 'admin' on the boot system, while UID 2 is labeled 'bin' on this boot system: files owned by bin in the system to restore will be restored under UID 1, not under UID 2 which is used by the temporary boot system. Just after restoration, still running from the boot system, an 'ls' will show that the original files owned by 'bin' are now owned by user 'admin'.

This is really a mirage: in your restoration you will also have restored the /etc/passwd file and the other system configuration files (like the NIS configuration files if they were used). At reboot time, on the newly restored real system, UID 1 will again be associated with user 'bin' as expected, and files originally owned by user bin will again be listed as owned by bin, as expected.

Scenario with plain name way of restoring

If dar had done otherwise, restoring the files owned by 'bin' to the UID corresponding to 'bin', these files would have been given UID 2 (the one used by the temporary bootable system used to launch dar). But once the real restored system is launched, this UID 2 would map to some other user, not to 'bin' which is mapped to UID 1 in the restored /etc/passwd.

Now, if you want to change some UID/GID when moving a set of files from one live system to another, there is no problem as long as you are not running dar under the 'root' account. Accounts other than 'root' are usually not allowed to modify UID/GID, thus files restored by dar will get the user and group ownership of the dar process, that is, of the user that launched dar.

But if you really need to move a directory tree containing files with different ownerships, and you want to preserve these ownerships from one live system to another while the corresponding UID/GID do not match between the two systems, dar can still help you:

  • Save your directory tree on the source live system
  • From the root account on the destination live system, do the following:
  • restore the archive in an empty directory
  • change the UID/GID of the files according to the ones used by the destination filesystem, with the commands:
find /path/to/restored/archive -uid <old UID>  -print -exec chown <new name> {} \;

find /path/to/restored/archive -gid <old GID> -print -exec chgrp <new name> {} \;

The first command lets you remap one UID to another for all files under the /path/to/restored/archive directory.
The second command lets you remap one GID to another for all files under the /path/to/restored/archive directory.

Example of how to globally modify the ownership of a directory tree, user by user

For example, you have on the source system three users: Pierre (UID 101), Paul (UID 102) and Jacques (UID 100); but on the destination system these same users are mapped to different UIDs: Pierre has UID 100, Paul has UID 101 and Jacques has UID 102.

We temporarily need an unused UID on the destination system; we will assume UID 680 is not used. Then, after restoring the archive in the directory /tmp/A, we will do the following:

find /tmp/A -uid 100 -print -exec chown 680 {} \;
find /tmp/A -uid 101 -print -exec chown pierre {} \;
find /tmp/A -uid 102 -print -exec chown paul {} \;
find /tmp/A -uid 680 -print -exec chown jacques  {} \;

which means:
change files of UID 100 to UID 680 (the files of Jacques are now under the temporary UID 680 and UID 100 is now free),
change files of UID 101 to UID 100 (the files of Pierre get their UID of the destination live system, UID 101 is now free),
change files of UID 102 to UID 101 (the files of Paul get their UID of the destination live system, UID 102 is now free),
change files of UID 680 to UID 102 (the files of Jacques, which had been temporarily moved to UID 680, are now set to their UID on the destination live system; UID 680 is no longer used).

You can then move the modified files to their appropriate destination, or make a new dar archive to be restored in the appropriate place, if you want to use some of dar's features, like for example restoring only the files that are more recent than those present on the filesystem.



Dar_manager does not accept encrypted archives, how to work around this?

Yes, that's true, dar_manager does not accept encrypted archives. The first reason is that, while a dar_manager database cannot be encrypted, it is not very fair to add encrypted archives to it. The second reason is that the dar_manager database would have to hold the key of each encrypted archive, making this database the weakest point of your data security: breaking the database encryption would provide access to every encryption key and, with access to the original archives, to the data of any archive added to the database.

OK, there is however a feature in the pipeline to let dar_manager encrypt its databases, then next another feature to let dar_manager store the different archive keys, then another feature to pass keys from dar_manager to dar outside of the command-line (which would otherwise expose the keys to the sight of other users on a multi-user system), then yet another feature to feed the database with the archive keys, also without using the command-line... well, there are a lot of features to add and test before you can expect to find this in a released version of dar.

In the meanwhile, you can proceed as follows:
  • isolate your encrypted archive into an unencrypted 'extracted catalogue': do not use the -K option while isolating; you will however need the -J option to let dar read the encrypted archive. Note that, still for key protection, you are encouraged to put the '-J <key>' option in a DCF file (Dar Command File, a plain file holding a list of options to be passed to dar) with restricted permissions, and give this filename to dar's -B option; this will avoid giving other users of your system a chance to read the key you have used for your archives,
  • add these extracted catalogues to the dar_manager database of your choice,
  • change the name and path of each added catalogue to point to your real encrypted archive (-b and -p options of dar_manager).
Note that as the database is not encrypted, this will expose the file listing (not the files' contents) of your encrypted archives to anyone able to read the database; it is thus recommended to set restrictive permissions on this database file.

When the time comes to use dar_manager to restore some files, you will have to make dar_manager pass the key to dar, for it to be able to restore the needed files from the archive. This can be done in several ways: dar_manager's command-line, the dar_manager database, or a DCF file.
  1. dar_manager's command-line: simply pass -e "-K <key>" to dar_manager. Note that this will expose the key twice: on dar_manager's command-line and on dar's command-line.
  2. the dar_manager database: the database can store a constant set of options to be passed to dar. This is done using the -o option or the -i option. The -o option exposes the arguments you want passed to dar, because they show up on dar_manager's command-line, while the -i option lets you do the same thing interactively, which is a better choice. However, even if the -i option is a safe way to feed the dar_manager database with the '-K <key>' option, dar will still receive this option on its command-line, thus the key will still be visible to other users of your system.
  3. a better way is to use a DCF file with restrictive permissions. This file will hold the '-K <key>' option so that dar can read the encrypted archives, and dar_manager will ask dar to read this file thanks to the '-B <filename>' option, which you give either on dar_manager's command-line (-e "-B <filename>" ...) or from the stored options in the database (-o -B <filename>).
  4. the best way is to let dar_manager pass the -K option to dar, but without a password: simply pass the -e "-K :" option to dar_manager. When dar gets the -K option with ":" as argument, it dynamically asks for the password and stores it in secured memory.
Note that you must prevent other users from reading any file holding the archive key, which covers the dar_manager database as well as the DCF files you could temporarily use. Second note: in this workaround we have assumed that all encrypted archives share the same key.
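Put together, a sketch of this workaround (the archive basename "backup", the database "base" and the file "secret.dcf" are hypothetical; secret.dcf holds the '-J <key>' option and has restricted permissions):

dar -C backup_cat -A backup -B secret.dcf
dar_manager -C base
dar_manager -B base -A backup_cat
dar_manager -B base -b 1 backup
dar_manager -B base -p 1 /path/to/encrypted/archives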


How to overcome the lack of static linking on MacOS X?

The answer comes from Dave Vasilevsky in an email to the dar-support mailing-list. I let him explain how to do it:

Pure-static executables aren't used on OS X. However, Mac OS X does have other ways to build portable binaries. Here is how to build portable binaries on OS X.

First, you have to make sure that dar only uses operating-system libraries that exist on the oldest version of OS X that you care about.
You do this by specifying one of Apple's SDKs, for example:

export CPPFLAGS="-isysroot /Developer/SDKs/MacOSX10.2.8.sdk"
export LDFLAGS="-Wl,-syslibroot,/Developer/SDKs/MacOSX10.2.8.sdk"


Second, you have to make sure that any non-system libraries that dar links to are linked in statically. To do this edit dar/src/dar_suite/Makefile, changing LDADD to '../libdar/.libs/libdar.a'. If any other non-system libs are used (such as gettext), change the makefiles so they are also linked in statically. Apple should really give us a way to force the linker to do this automatically!

Some caveats:

* If you build for 10.3 or lower, you will not get EA support, and therefore you will not be able to save special Mac information like
resource forks.
* To work on both ppc and x86 Macs, you need to build a universal binary. For instructions, use Google :-)
* To make a 10.2-compatible binary, you must build with GCC 3.3.
* These instructions won't work for the 10.1 SDK, that one is harder to use.


Why can't I test, extract files from, or list the contents of a single slice of an archive?

Well, this is due to dar's design. However, since release 2.4.0 two features can help you get close to that point, namely --sequential-read, which asks dar to read the archive sequentially, and the -al option, which asks dar to relax its sanity and coherence checks. Put the single slice into a given directory, and create as many empty files as necessary to simulate the slices with lower numbers than the real slice(s) that remain of the partially lost archive. Then, using sequential reading (--sequential-read option) and the laxist mode (-al option), you will get the requested information:

mkdir tempo
cd tempo
ln -s ../somewhere/backup.3.dar
touch backup.1.dar
touch backup.2.dar
dar -l backup --sequential-read -al

Note however that the laxist mode skips a lot of sanity checks. This method is to be used as a last resort, upon heavy archive corruption. It is still good practice to test your archive once on its destination medium and, if possible, to add redundancy data using Parchive, to be able to repair an archive corrupted by a media problem.

Alternative:
Once the missing slices have been replaced by empty files (using the touch command for example), if you have the last slice of the archive, you can avoid --sequential-read mode and only use the lax mode (-al option). You can then use the testing operation to know which files can be retrieved from the archive. If you do not have the last slice, you must use --sequential-read mode in addition to the lax mode (-al option).

If you want to know which particular files a slice contains, you can add the following option:
-E "echo '************* Opening slice %N **********'"

All in one:
touch <archive>.<missing slice>.dar
dar -t <archive> -al -E "echo '****** Opening slice %N ******'" -v > result.txt
less result.txt

Why can't I merge two isolated catalogues?

Since version 2.4.0, an isolated catalogue can also be used to rescue the corrupted internal catalogue of the archive it has been isolated from. For that feature to be possible, a mechanism lets dar know whether a given isolated catalogue and a given archive correspond to the same contents. Merging two isolated catalogues would break this feature, as the resulting catalogue would not match any real archive and could only be used as reference for a differential backup.

Why can't dar use the full power of my multi-processor computer?

Parallel computing is a science in itself. Having done a specialization in that area during my studies, I can briefly explain the constraints here. A program can use several processors if the algorithm it relies on can be parallelized. Such an algorithm can either statically (at programming time) or dynamically (at execution time) be cut into several independent execution threads. These execution threads must be as autonomous as possible from each other, if you don't want to have one thread waiting for another (which is not what we want). The constraint is this: if you cannot have different threads with no or very little communication and dependency between them, then parallelization is not worth it.

Back to dar. From a very abstract point of view, dar works by fetching files from the filesystem and appending their data to a single file (the archive). For each file, dar records in memory the location of the file's data inside the archive, and once all files have been treated, this location information (contained in the so-called "catalogue") is appended at the end of the archive.

One could say: to parallelize file treatment, instead of proceeding file by file, let's do all files at the same time (or rather, N files at the same time). OK, but first you would get an important loss of performance at disk level, as the disk heads would spend most of their time seeking from one of the N files' data to another's. The second point is that to add a file to the archive you must know the position of the end of the last added file, which is not possible to know in advance because of compression and/or encryption. Thus a given thread would have to wait for another to finish, to be able to drop in turn the data of the file it owns... As you can guess, parallelizing this way would bring worse performance than the sequential algorithm.

Another possibility is to have several threads doing:
  • file lookup (report which files are present on the filesystem)
  • file filtering (determine which files to save, which files to compress, and so on)
  • file compression
  • file encryption
This would be a bit better, but: file lookup is very fast and does not consume much CPU, and the same holds for file filtering. Instead, file compression and file encryption are very CPU intensive. Thus, first, if you only use compression OR encryption, parallelizing this way will not bring you much extra power, as neither compression nor encryption can be parallelized by itself (compressing a file is done sequentially, same thing when encrypting it); roughly, you get the same execution time as the sequential execution. Second, if you use no compression and no encryption, your CPU will stay idle most of the time and dar's execution time will only depend on the speed of your hard disk, so you will not get any improvement here either. Last, only if you use both encryption and compression could you gain some performance from parallelization, but dar could use at most two CPUs, no more! And the gain will be less than a factor of 2 (it will not be twice as fast, but much less), as for a given amount of data compression needs much more time than encryption; the encryption thread would thus spend most of its time waiting for compressed data.

OK, you have maybe found yet another possibility: having N threads for compression and M threads for encryption. Assuming encryption is faster than compression, we could choose N > M. We could also have a fixed value for N and a dynamic value for M, depending on how fast compression is running. Well, this would let dar compress and encrypt several files at the same time and, assuming the time to read and write data is negligible compared to compression time (which remains to be demonstrated, as several files would potentially have to be read at the same time), we could maybe get a real performance gain. But... while several files can now be compressed at the same time, only one can be written to the archive at a given time. Thus, between the time the compression of a file has started and the time it has finished, all other threads have to keep their compressed data in memory; then the next thread can drop its data to the archive while all others keep compressing to memory (RAM). We will quickly run out of RAM! Either your computer will start to swap, or you have to store the data back to disk in a temporary file, which will have to be read again and written back to the archive. Doing so brings huge disk performance degradation, as the disk will serve to read the files' data, write their compressed data to temporary files, read back this compressed data, and write it to the archive.

Last, when using parallelization there is always a cost due to inter-thread communication and concurrent I/O operations on the hardware (here, the hard disk is used at the same time to read the files to backup and to write them into the archive). This cost becomes negligible when the number of parallel threads increases, assuming all threads are kept busy... and here lies the bottleneck: the archive creation itself, which seems to preclude a really impressive parallelization.

Conclusion: unless you can find another way to parallelize dar, a parallelized version of dar would not bring noticeable improvement. Parallelization is strongly related to the algorithm used; some algorithms are well adapted to it, some others are not.

Is libdar thread-safe, and in which way?

Libdar is the part of dar's source code that has been rewritten to be usable by external programs (like kdar). It has been modified to be usable in a multi-threaded environment, thus, *yes*, libdar is thread-safe. However, thread-safe does not mean that you do not have to take some precautions in your programs while using libdar (or any other library).

Let's take an example, considering a simple library that provides two functions, both receiving the address of an integer as argument. The first increments the given integer until a specific key is pressed by the user, while the second decrements the given integer until another key is pressed. This library is thread-safe in the sense that it has no static variable nor any given state at a particular time: it is just a set of two functions.

Now, your multi-threaded program is the following: at a given time, one thread runs the first library function while another runs the second. All will work fine unless you provide both threads the same integer. One thread would then increment it while the other decrements it, and you would not get the behavior you would expect in a single-threaded environment. The problem would be the same if, instead of using an external library, you were accessing this same integer from two different threads at the same time.

Care must thus be taken that two different threads do not act on the same variables at the same time. This is made possible by POSIX mutexes, which delimit a portion of code (known as a critical section) that cannot be entered by a thread while another one is inside it (such a thread is suspended until the other thread exits the critical section).

For libdar this is the same: you must pay attention not to have two or more threads acting on the same data. Libdar provides a set of classes, which can be seen as a set of types (like a C struct) with associated functions (known as methods in the object-oriented world). From these classes your program creates objects: each object *is* a variable. Technically, invoking a method on an object is exactly the same as invoking a function giving it a hidden pointer to the object as argument; semantically, invoking a method is a way to read or modify this variable (= the object). Thus, if you plan to act on a given object from several threads at the same time, you must use POSIX mutexes or any other means to mutually exclude access to this object between your threads; this way only one thread may read or modify this variable (= this object) at a given time.

Note that internally libdar uses some static variables, by which I mean variables that exist even when no thread is running a libdar function or method. These variables are enclosed in critical sections so that libdar's users may use the library normally; in other words, this is transparent to you. For example, the call-cancellation mechanism uses an array holding the tid (thread ID) of each thread whose libdar call must be canceled: if you wish to cancel a libdar call run by thread 10, another thread adds the tid 10 to this list. At regular checkpoints, every libdar call checks whether this list contains the tid it is running from; if so, the call aborts/returns and the thread can continue its execution outside of libdar's code. As you see, several threads may read or write this array of tids at the same time; thanks to a set of mutexes this is transparent to you, and for this reason libdar can be said to be thread-safe.

How to solve "configure: error: Cannot find size_t type"?

This error shows when you lack support for C++ compilation. Check the gcc compiler has been compiled with C++ support activated, or if you are using gcc binary from a distro, double check you have installed the C++ support for gcc.
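A quick sanity check before re-running ./configure, assuming the GNU toolchain:

g++ --version || echo "g++ is missing, install your distro's C++ compiler package first"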

Why has dar become much slower since release 2.4.0?

This is the drawback of new features!
  • Especially, to be able to read a dar archive through pipes in sequential mode, dar inserts so-called "escape sequences" to know, for example, where a new file starts. This way dar can skip to the next mark upon archive corruption, or when a given file does not have to be restored. However, if such a sequence of bytes is found inside a file's data, it must be modified so as not to collide with real escape sequences. This requires dar to inspect all the data added to an archive for such sequences of bytes, instead of just copying the data to the archive (possibly compressing and ciphering it).
  • The other feature that brings an important overhead is the sparse file detection mechanism. To be able to detect a hole in a file and store it as such in the archive, here too dar needs to inspect each file's data.
You can disable both of these features, using respectively the -at option, which suppresses the "tape marks" (just another name for escape sequences) but forbids using the generated archive in sequential read mode, and the -1 0 option, which completely disables sparse file detection. The execution time then becomes the same as with dar 2.3.x releases.
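As a sketch (archive name and path are hypothetical):

dar -c backup -R /home/me -at -1 0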

How to search for questions (and their answers) about known problems similar to mine?

Before sending an email to the dar-support mailing-list, you are welcome to first look through the already-exchanged emails to see whether your problem has already been exposed and solved. For you, this is the fastest way to get an answer to your problem, and for me a way to preserve time for development.

But yes, there are now tons of email subjects to read through before having a chance to find the answer to your problem. The simplest way is to use the search engine at gmane.

The dar-support mailing-list is archived at sourceforge *and* at gmane.org; only this second archive provides a search engine (look for the green box at the bottom of the page).

This search engine is available for all the dar-related mailing-lists archived at gmane.

Why does dar tell me it failed to open a directory I have excluded?

Reading the contents of a directory is done using the usual system calls (opendir/readdir/closedir). The first call (opendir) lets dar designate which directory to inspect, then dar calls readdir to get the next entry of the opened directory. Once nothing remains to be read, closedir is called. The problem here is that this interface does not let dar start reading a directory, do some treatment, then start reading another directory: the opendir/readdir/closedir system calls are not re-entrant.

This is particularly critical for dar, as it does a depth-first lookup of the directory tree. In other words, from the root, if we have two directories A and B, dar reads A's contents, then the contents of A's subdirectories; once finished, it reads the next entry of the root directory (which is B), then reads B's contents and those of each of its subdirectories; once finished with B, it must go back to the root again and read the next entry. In the meanwhile, dar has had to open many directories to get their contents.

For this reason, dar caches directory contents (when it first meets a directory, it reads its whole contents and stores it in RAM). Only after that does dar decide whether or not to include a given directory. But at that point its contents have already been read, thus you may get the message that dar failed to read a given directory's contents even though you explicitly specified not to include that particular directory in the backup.


Dar reports a "SECURITY WARNING! SUSPICIOUS FILE" what does that mean!?

When dar reports the following message:

SECURITY WARNING! SUSPICIOUS FILE <filepath>: ctime changed since archive of reference was done, while no inode or data changed

You should be concerned with finding an explanation for the root cause that triggered dar to ring this alarm. As you probably know, a unix file has three dates:
  1. atime is changed anytime you read the file's contents or write to it (this is the last access time)
  2. mtime is changed anytime you write to the file's data (this is the last modification time)
  3. ctime is changed anytime you modify the file's attributes (this is the last change time)
In other words:
  • if you only read the data of a file, only its atime will be updated
  • if you write some data to a file, its atime and mtime will change
  • if you change ownership, permissions, extended attributes, etc., only ctime will change
  • if you write to a file and then modify its atime or mtime to make it look as if the file had not been read or modified, ctime will change in any case.
Yes, that's the point: on most (if not all) unix systems, beside the kernel itself, user programs can set the atime and mtime to any arbitrary value (see the "touch" command for example), but to my knowledge no system provides a means to manually set the ctime of a file. This value thus cannot be faked.

However, some rootkits and other nasty programs that tend to hide themselves from the system administrator use this trick and modify the mtime to be more difficult to detect; the ctime, however, keeps track of the date and time of their infamy. But ctime may also change while neither mtime nor atime do, in several rare but normal situations. Thus, if you are faced with this message, you should first verify the following points before concluding that your system has been infected by a rootkit:
  • have you added or removed a hard link pointing to that file, while the file's data has not been modified since the last backup?
  • have you changed this file's extended attributes (including Linux ACL and MacOS file forks) while the file's data has not been modified since the last backup?
  • have you recently restored your data and are now performing a differential backup taking as reference the archive used to restore that same data? In other words, has that particular file just been restored from a backup (it was removed by accident, for example)?
  • have you just moved from a dar version older than release 2.4.0 to dar version 2.4.0 or more recent?
How to know the atime/mtime/ctime of a file?
  • mtime is provided by the command: ls -l
  • atime is provided by the command: ls -l --time=atime
  • ctime is provided by the command: ls -l --time=ctime
Note: with dar versions older than 2.4.0 (by default, unless the -aa option is used), once a file had been read for backup, dar set its atime back to the value it had before dar read it. This trick was used to accommodate programs like leafnode (an NNTP caching program) that base their cache-purging scheme on the atime of files. When you do a backup using dar 2.3.11 for example, files that had their mtime modified are saved as expected and their atime is set back to its original value (the value it had just before dar read them), which has the side effect of modifying the ctime. If you then upgrade to dar 2.4.0 or more recent and do a differential backup, and that same file has not been modified since, dar will see that the ctime has changed while no other metadata did (user, group, permissions, mtime), thus this alarm message will show for all files saved in the last archive made with 2.3.11. At the next differential backup made with dar 2.4.0 (or more recent), the problem will not show anymore.

Well, if you cannot find a valid explanation among the ones presented above, you'd better consider that your system has been infected by a rootkit or a virus, and use all the necessary tools (see below for examples) to find some evidence of it.

Unhide
clam anti-virus
and others...

Last point: if you can explain the cause of the alarm and are annoyed by it (you have hundreds of files concerned, for example), you can disable this feature by adding the "-asecu" switch to the command-line.


Can dar help copy a large directory tree?

The answer is "yes" and even for more than one reason:
  1. Many backup/copy tools do not take care of hard linked inode (hard linked plain files, named pipes, char devices, block devices, symlinks)... dar does,
  2. Many backup/copy tools do not take care of sparse files... dar does,
  3. Many backup/copy tools do not take care of Extended Attributes... dar does,
  4. Many backup/copy tools do not take care of Posix ACL (Linux)... dar does,
  5. Many backup/copy tools do not take care of file forks (MacOS X)... dar does,
  6. Many backup/copy tools do not take any precautions while working on a live system... dar does.
Using the following command will do the trick:

dar -c - -R <srcdir> --retry-on-change 3 | dar -x - --sequential-read -R <dstdir>

<srcdir>'s contents will be copied to <dstdir>; both directories must exist before running this command, and <dstdir> should be empty.

Here is an example: we will copy the contents of /home/my to /home2/my.

First we create the destination directory:
mkdir /home2/my

Then we run dar:
dar -c - -R /home/my --retry-on-change 3 | dar -x - --sequential-read -R /home2/my

The "--retry-on-change" let dar retry the copy of a file up to three times if that file has changed at the time dar was reading it. You can increase this number at will. If a file fails to be copied correctly after more than the allowed retry, a warning is issued about that file and it is flagged as dirty in the data flow, the second dar command will then ask you whether you want it to be restored (here copied) on not.

"piping" ('|' shell syntax) the first dar's output to the second dar's input makes the operation not requiering any temporary storage, only virtual memory is used to perform this copy. Compression is thus not requested as it would only slow down the whole process.

Last point: you should compare the copied data to the original before removing the original, as no backup file has been dropped to the filesystem. This can simply be done using:

    diff -r <srcdir> <dstdir>

But no, diff will not check Extended Attributes, file forks, Posix ACL, hard-linked inodes, etc. If you want a more controllable way of copying a large directory, simply use dar with a real archive file: compare the archive against the original filesystem, restore the archive contents to their new place, and compare the restored filesystem against the original archive.

Any better idea? Feel free to contact dar's author for an update of this documentation!


Does dar compress per file or the whole archive?


Dar applies compression (gzip, lzo, bzip2), with different compression levels (1 for quick but low compression, up to 9 for best but slower compression), on a file-by-file basis. In other words, the compression engine is reset for each new file added to the archive. When a corruption occurs in a globally compressed file, like a compressed tar archive, it is not possible to decompress the data past that corruption: with tar you lose all files stored after such a data corruption. Compressing per file instead limits the impact to a single file inside the archive; all files stored before or after the corruption can still be restored from the corrupted archive. The drawback is that the overall compression ratio is slightly worse. But note that compressing per file opens the possibility of not compressing all files in the archive, in particular already-compressed files (like *.jpeg, *.mpeg, some *.avi files and of course *.gz, *.bz2 or *.lzo files). Avoiding the compression of already-compressed files saves CPU cycles (in other words, it speeds up the backup). Compressing an already compressed file not only takes time for nothing, it usually also requires more storage space than if that same file had not been compressed a second time.

In brief, beside the possibility of not compressing already-compressed files, compressing file by file gives an overall compression ratio quite equivalent to compressing the archive globally, while it may be faster (depending on the data) and allows recovering any file located before or after a file impacted by a data corruption within the archive.

How to activate compression with dar? Use the --compression option (or -z for short), telling the algorithm to use and the compression level (--compression=bzip2:9 or -zgzip:7 for example). You may omit the compression level (which defaults to 9) and even the compression algorithm (which defaults to gzip); thus -z or -zlzo are correct.

To select which files to compress or not, several options are available: --exclude-compression (or -Z for short, the uppercase Z here) and --include-compression (or -Y for short). Both take as argument a mask that defines, based on their names, the files to compress or not to compress. For example, -Z "*.avi" -Z "*.mp?" -Z "*.mpeg" will avoid compressing MPEG, MP3, MP2 and AVI files. Note that dar provides, in its /etc/darrc default configuration file, a long list of -Z options to avoid compressing the most common compressed file formats; you can activate it by simply adding compress-exclusion on dar's command-line.

In addition to excluding/including files from compression based on their names, you can also exclude small files (for which the compression ratio is usually poor) using the --mincompr option, which takes a size as argument: --mincompr 1k will avoid compressing files whose size is less than or equal to 1024 bytes. You will find all the details about these options in dar's man page. Check also the -am and -ar options to understand how --exclude-compression and --include-compression interact with each other, or how to use regular expressions in place of glob expressions in masks.
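Putting this together, here is a sketch of a backup compressing with bzip2 level 9 while skipping some already-compressed formats and files of 1 kiB or less (archive name and path are hypothetical):

dar -c backup -R /home/me -zbzip2:9 -Z "*.avi" -Z "*.mp?" -Z "*.mpeg" --mincompr 1k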


What slice size can I use with dar?

The minimum slice size is around 20 bytes, but such a slice can only store 3 to 4 bytes of information, due to the slice header which needs around 15 bytes in each slice. But there is no maximum slice size! In other words, you can give to the -s and -S options an arbitrarily large positive integer: thanks to its own internal integer type named "infinint", dar is able to handle arbitrarily large integers. This has a slight memory and CPU penalty compared to using the computer's native 32- or 64-bit integers, but has the advantage of providing a long-term implementation in dar.

You can make use of suffixes like 'k' for kilo, M for mega, G for giga, etc. (all suffixes are listed in dar's man page) to simplify your work. See also the -aSI and -abinary options to swap the meaning between ko (= 1000 octets) and kio (= 1024 octets).

Last point: dar/libdar can be compiled with the --enable-mode=64 option given to ./configure while building dar. This replaces the "infinint" type by 64-bit integers, for better performance and reduced memory usage. However, this has some drawbacks on archive size and dates; see the limitations for more details.
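For example, a sketch creating 2 GiB slices with a smaller 100 MiB first slice (archive name and path are hypothetical):

dar -c backup -R /home/me -s 2G -S 100M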

Is there a dar fuse filesystem?

You can find several applications relying on dar or directly on libdar to manage dar archives; these are referred to here as external software because they are neither maintained nor created by the author of dar and libdar. AVFS is one such external software: it provides a virtual file system layer for transparently accessing the contents of archives and remote directories just like local files.


How does dar compare to tar?

Here follows a table comparing tar and dar on their main points. If you find errors or inconsistencies, please report them to dar's maintainer.

feature                                                  | tar (aka Tape ARchive)                             | dar (aka Disk ARchive)
---------------------------------------------------------+----------------------------------------------------+-----------------------------------------------------
Licensing                                                | GPL v2 or more recent (GNU tar)                    | GPL v2 or more recent
initial release                                          | 1979                                               | 2002
language                                                 | C                                                  | C++
can sequentially read an archive                         | yes                                                | yes, since release 2.4.0
can sequentially write an archive                        | yes                                                | yes
can directly and quickly restore a file                  | no                                                 | yes
  from a large archive                                   |                                                    |
can compress data                                        | not alone, the resulting archive can be            | yes
                                                         | compressed with gzip and other compression tools   |
can avoid compressing already compressed files           | no                                                 | yes
can extract files after data corruption                  | no (yes, if the corruption is not noticed by tar)  | yes
provides a reparation mechanism                          | no (yes, with external software like Parchive)     | no (yes, with external software like Parchive)
provides a mode that helps using a strongly corrupted    | no                                                 | yes (known as "lax" mode)
  archive without an external reparation mechanism       |                                                    |
can make full backups                                    | yes                                                | yes
can make differential backups                            | yes (using GNU tar)                                | yes
can make incremental backups                             | yes (using GNU tar)                                | yes
can make decremental backups                             | no                                                 | yes
can cipher the archive with strong encryption            | no (yes, with external tools, which require        | yes (blowfish, aes, twofish, camellia, serpent, etc.)
                                                         | decrypting the whole archive in order to use it)   |
can split an archive into multiple volumes               | yes, in multiples of 1024 octets                   | yes, at 1 byte resolution
can give the initial volume a different size than        | no                                                 | yes
  the following ones                                     |                                                    |
lets the user define a command to run between volumes    | no                                                 | yes
can save and restore Extended Attributes                 | GNU tar: no? star: yes                             | yes
maximum file size in archive                             | 8 Gio                                              | no limit (a limit might come from the underlying OS)
can detect corruption                                    | only in file headers                               | yes, in any part of the archive (slice header,
                                                         |                                                    | archive header, saved file's data and EA,
                                                         |                                                    | table of contents)
portable software (exists for Unix, Windows, Mac, etc.)  | yes                                                | yes
can properly backup a live filesystem                    | no                                                 | yes
provides a means to run a command before and after       | no                                                 | yes
  saving a particular set of files or directories        |                                                    |
can properly save hard-linked inodes                     | only plain files                                   | yes
can detect sparse files and restore them as such         | yes                                                | yes
can merge two archives into a new one                    | no                                                 | yes
can update an archive with new files                     | yes                                                | no
can provide an on-fly calculated hash file of volumes    | no                                                 | yes
  (aka slices)                                           |                                                    |
can use configuration files                              | no                                                 | yes
can take care of the nodump flag                         | GNU tar: no? star: yes                             | yes
can understand the cache directory tagging standard      | yes                                                | yes
provides a dry-run execution mode                        | no                                                 | yes
can provide a summary of an archive                      | no                                                 | yes