Dar Documentation


TUTORIAL




Introduction

This tutorial shows you how to back up your file system (partially or totally) on ZIP disks (you can replace ZIP disks with floppies or a USB key if you prefer) and, most importantly, how to restore your system from scratch in case of hard disk failure (or other cataclysms).

GUIs built on libdar are nice for a one-off backup. Scripting dar is better when the backup must run in the background and be scheduled every day; this is a matter of taste. But you need a robust solution that also works within a minimal environment to be able to restore your system. Dar provides a lot of features for backup (defining compression, etc.), so a GUI or a script is welcome in this process. However, only very few of them are necessary to restore, and in that situation dar_static (the statically linked version of dar) is, from my point of view, the best solution to use.

STEP 1: THE FULL BACKUP

We first need to make a full backup, let's go:

The size of a ZIP disk is 100 MB,
*** here comes the option: -s 100M
This tells dar not to create a single backup file but to split it into several files of at most 100 megabytes each.

On the first ZIP disk we want to copy the dar binary, outside the backup, to be able to restore in case of hard disk failure, for example. IMPORTANT: the dar binary relies on several libraries, which must also be available in the rescue system or copied with the dar binary. But if you don't want to worry about the needed libraries, there is a static version of dar whose only difference is that it has all required libraries included in it (thus it is a slightly larger binary). Its name is "dar_static", and its main reason for existing is to be placed beside backups in case something goes wrong on your system. Note that dar_static is useless on Windows: you will always need the Cygwin DLL.

(You could also add the man pages or a copy of this tutorial, if you are afraid of not remembering all the many features of dar ;-) and find the -h option too sparse.) Note that all the dar documentation is available on the web. OK, you need Internet access to read it.

This makes the free space on the first ZIP disk a bit smaller: 95 MB.
*** Here comes the option: -S 95M
(Note that '-s' is lowercase and applies to all slices, while '-S' is UPPERCASE and applies to the initial slice only).
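
To get a rough idea of how many disks to prepare, the relation between -S, -s and the number of slices can be sketched as follows. This is only an illustration: the function name is made up for this tutorial, and the archive size (450 MB here) is a guess you have to supply yourself, since it depends on compression.

```shell
#!/bin/sh
# slices_needed TOTAL_MB FIRST_MB SLICE_MB   (hypothetical helper)
# Prints how many slices an archive of TOTAL_MB would need when the first
# slice holds FIRST_MB (-S) and each following slice holds SLICE_MB (-s).
slices_needed() {
    total=$1 first=$2 slice=$3
    if [ "$total" -le "$first" ]; then
        echo 1
    else
        rest=$((total - first))
        # integer ceiling of rest / slice, plus one for the first slice
        echo $(( (rest + slice - 1) / slice + 1 ))
    fi
}

slices_needed 450 95 100   # prints 5
```

With the values of this tutorial, an archive estimated at 450 MB would need five disks: 95 MB on the first one and up to 100 MB on each of the four others.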

We need to pause between slices to change the ZIP floppy when it is full
*** here comes the option: -p -b
telling dar to pause before writing a new slice (-p) and to ring the terminal bell (-b) when user action is needed.

We will compress data inside the backup
*** here comes the option: -z
By default the -z option uses the gzip compression algorithm (bzip2 and lzo are also available). Optionally, if speed is more important than archive size, you can lower the compression level: -z1 for example.

Now, we want to backup the whole file system.
*** here comes the option: -R /
This tells dar that no file outside this directory tree will be saved (here, it means that no file will be excluded from the backup if no filter is specified, see below). Here "R" stands for "Root".

There are some files you don't want to back up, like the backup files generated by emacs: "*~" and ".*~".
*** here comes the options: -X "*~" -X ".*~"
Note that you have to quote the mask so that it is not interpreted by the shell. The -X options (X for eXclude) do not apply to directories nor to paths, they only apply to filenames (see also the -I option (I for Include) in the man page for more information).

Among these files are several sub-trees you must not save: the /proc file system for example, as well as /dev/pts and /sys . These are virtual file systems; saving them would only make your backup bigger, filled with useless stuff.
*** here come the options: -P dev/pts -P proc -P sys
Note that the path must be relative to the -R option (thus it must have no leading '/'). Unlike the -X/-I options, the -P option (P for "prune") can apply to a directory. If a directory matches a -P option, all its subdirectories are excluded too. Note also that -P accepts wildcards, which must be quoted so that they are not interpreted by the shell: -P "home/*/.mozilla/cache" for example. Lastly, -P can also be used to exclude a plain file (if you don't want to exclude all files of a given name using the -X option): -P home/joe/.bashrc for example would only exclude joe's .bashrc file and no other file, while -X .bashrc would exclude any file of that name, including joe's. (See also the -g, -[ and -] options in the man page for more, as well as the "file selection in brief" paragraph.)
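
The difference between -X (which looks at the filename only) and -P (which looks at the path relative to -R) can be illustrated with plain shell patterns. The sketch below is NOT dar's matching code, just an approximation of the filters chosen in this tutorial; the function name is made up, and the argument is assumed to be a plain file given by its path relative to the root.

```shell
#!/bin/sh
# would_skip PATH   (hypothetical helper, approximation only)
# PATH is relative to the -R root and is assumed to name a plain file.
# Returns 0 (true) when the filters of this tutorial would leave it out.
would_skip() {
    path=$1
    name=${path##*/}        # last path component: what the -X masks look at
    case "$name" in
        *~|.*~) return 0 ;; # -X "*~" and -X ".*~"
    esac
    case "$path" in
        # -P dev/pts -P proc -P sys: the directory itself and all it contains
        dev/pts|dev/pts/*|proc|proc/*|sys|sys/*) return 0 ;;
    esac
    return 1
}

would_skip home/joe/notes.txt~ && echo "skipped: filename matches a -X mask"
would_skip proc/cpuinfo        && echo "skipped: inside a -P subtree"
would_skip dev/null            || echo "kept: dev/null is not under dev/pts"
```

Note how dev/null is kept: only the dev/pts sub-tree is pruned, not the whole dev directory.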

More importantly we must not save the backup itself:
*** here comes the option: -P mnt/zip
assuming that your ZIP disk is mounted under /mnt/zip . We could also have excluded all files with the "dar" extension (the backups generated by dar) using -X "*.*.dar", but this would also have excluded any other dar archive from the backup, which may not always fit your needs.

Now, as we don't save the /dev/pts, /proc, /sys and /mnt/zip directories, we would have to create these mount points by hand at recovery time to be able to mount the corresponding filesystems. But we can do better with the -D option, which does not totally ignore excluded directories but rather stores them as empty.
*** here comes the option: -D
thus at recovery time they will be created automatically.

Lastly, we have to give a name to this full backup. Let's call it "linux_full". As it is supposed to be stored on the ZIP disk, its path is /mnt/zip/linux_full
*** here comes the option: -c /mnt/zip/linux_full
Note that linux_full is not a complete filename, it is a "basename" to which dar will add a number and the extension ".dar"; this way the first slice will be a file named linux_full.1.dar located in /mnt/zip

Now, as we will have to mount and unmount the /mnt/zip file system, no process must be using it. In particular, dar's current directory must not be /mnt/zip, so we change to / for example.

All together we follow this procedure for our example:

Put an empty ZIP floppy in the device, and mount it according to your /etc/fstab file.

mount /mnt/zip

Copy the dar binary to the first ZIP disk (to be able to restore in case of a big problem, like a hard disk failure) and optionally the man pages and/or this tutorial.

cp `which dar_static` /mnt/zip

then, type the following:

cd /
dar -c /mnt/zip/linux_full -s 100M -S 95M -p -b -z -R / -X "*~" -X ".*~" -P dev/pts -P sys -P proc -P mnt/zip -D

Note that the order of the options does not matter. Some options may be used several times (-X, -I, -P), some others cannot (see the man page for more). When the first slice is done, dar will pause, ring the terminal bell and display a message. You will then have to unmount the disk

umount /mnt/zip

eject it, replace it with a new empty one and mount it

mount /mnt/zip

To be able to do that, you can switch to another virtual console by pressing the ALT+F? keys (if under Linux), or open another xterm if under X, or suspend dar by typing CTRL-Z and resume it after unmounting/mounting by typing `fg' (without the quotes).

Then proceed with dar for the next slice, pressing the <enter> key.

Dar will label slices this way:
slice 1: linux_full.1.dar
slice 2: linux_full.2.dar
and so on.

That's it! We have finished the first step. It may take a long time, depending on the amount of data to back up. The following step (differential backup), however, can be done often and will stay fast every time (OK, except if a big part of your system has changed, in which case you can consider making another full backup).
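
If you prefer scripting, the command of this first step could be wrapped in a small shell function such as the one below. It is only a sketch: the function name and the DRY_RUN variable are inventions of this example, not features of dar. Keep in mind that the -p option makes dar wait for you between slices, so such a wrapper is meant to be run from a terminal.

```shell
#!/bin/sh
# full_backup [BASENAME]   (hypothetical wrapper)
# With DRY_RUN=1 the dar command line is printed instead of being executed.
full_backup() {
    base=${1:-/mnt/zip/linux_full}
    # Same options as in the tutorial; the masks stay quoted so that the
    # shell hands them to dar unexpanded.
    set -- dar -c "$base" -s 100M -S 95M -p -b -z -R / \
        -X "*~" -X ".*~" -P dev/pts -P sys -P proc -P mnt/zip -D
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"           # shown without quotes, for reading only
    else
        ( cd / && "$@" )    # run from / so that /mnt/zip can be unmounted
    fi
}
```

Typing "DRY_RUN=1 full_backup" lets you review the command line before running the real thing with "full_backup".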

Just a little check on the archive you have just made: to list the contents of the backup, you would run:

dar -l /mnt/zip/linux_full

Before relying on the archive, it is recommended to either test it, or compare what is stored in it with the current file system:

dar -t /mnt/zip/linux_full

will test the whole archive.

It is recommended to first unmount and remount the removable disk, in order to flush the cache. Otherwise you may read data from the cache (in memory) and fail to detect an error on your disk. dar -t cannot check a single slice, it checks the whole archive. If you need to check a single slice (for example after burning it to CD), you can use the diff command. For example, suppose you burn each completed slice to CD-R but have just enough free space to store one slice on disk. You can then check the slice by typing something like:

diff /mnt/cdrom/linux_full.132.dar /tmp/linux_full.132.dar

where 132 has to be replaced by the real slice number.

You can also add the --hash option when you create the archive (for example --hash md5). It will produce for each slice a small hash file named after the slice: "linux_full.1.dar.md5", "linux_full.2.dar.md5", etc. Then, using the standard Unix command "md5sum", you can check the integrity of the slice:

md5sum -c linux_full.1.dar.md5

If all is OK for the slice on the removable disk (that is, when diff does not complain or md5sum reports "OK"), you can delete the slice from the hard disk (/tmp/slice.x.dar), and continue with dar. Otherwise, you will have to burn/write the slice to a new disk or retry on the same one.
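
When several slices and their hash files sit in the same directory, the md5sum check can be repeated for each of them with a small loop. A minimal sketch, assuming the hash files were produced with --hash md5 and lie beside their slices (the function name is made up):

```shell
#!/bin/sh
# verify_slices DIR   (hypothetical helper)
# Runs "md5sum -c" on every *.dar.md5 file found in DIR.
# Returns 0 only if every check succeeded.
verify_slices() {
    dir=$1
    rc=0
    for sum in "$dir"/*.dar.md5; do
        [ -e "$sum" ] || continue    # no hash file at all: nothing to check
        # the hash file names the slice without its directory, so md5sum
        # has to be run from the directory holding the slice
        ( cd "$dir" && md5sum -c "$(basename "$sum")" ) || rc=1
    done
    return $rc
}
```

For example "verify_slices /mnt/zip && echo all slices are fine" reports success only when no slice failed its check.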

Instead of testing the whole archive you could also compare it with the system you have just saved:

dar -d /mnt/zip/linux_full -R /

will compare the archive with the filesystem tree located at / . Same remark as before: it is recommended to first unmount and remount the disk to flush the system cache.

STEP 2: DIFFERENTIAL BACKUP

The only thing to add is the base name of the backup we take as reference
*** here comes the option: -A /mnt/zip/linux_full

Of course, we have to choose another name for that new backup, let's call it linux_diff1
*** here comes the option: -c /mnt/zip/linux_diff1

Last point: if you want to put the new backup right after the end of the full backup, you will have to change the -S option according to the space remaining on the last disk. Suppose the last slice of linux_full takes 24 MB: you then have 76 MB available for the first slice of the differential backup (and still 100 MB for the following ones),
*** here comes the option: -S 76M
but if you want to put the backup on a new disk, just drop the -S option.

Here we also want to produce a hash file to test the integrity of each slice before removing it from the hard disk:
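
Rather than computing the remaining space by hand, you can ask df for it. The following sketch builds a -S option from the free space reported for a mounted disk; the function name and the one megabyte safety margin are assumptions of this example, so check the result against what you know of your disk.

```shell
#!/bin/sh
# first_slice_option DIR [MARGIN_MB]   (hypothetical helper)
# Prints a "-S <n>M" option sized to the free space df reports for the
# filesystem holding DIR, minus a safety margin (1 MB by default).
# Prints nothing when there is no usable room left.
first_slice_option() {
    dir=$1
    margin_mb=${2:-1}
    # df -P -k: POSIX output format, sizes in kilobytes, 4th column = available
    free_kb=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
    free_mb=$(( free_kb / 1024 - margin_mb ))
    if [ "$free_mb" -gt 0 ]; then
        echo "-S ${free_mb}M"
    fi
}
```

Once the last disk of the previous backup is mounted, "first_slice_option /mnt/zip" gives the option to add to the dar command line. An empty answer means it is time to start on a new disk and to drop -S altogether.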
*** here comes the option: --hash md5

All together we get:

dar -c /mnt/zip/linux_diff1 -A /mnt/zip/linux_full -s 100M -S 76M -p -b -z -R / -X "*~" -X ".*~" -P dev/pts -P proc -P mnt/zip -P sys -D --hash md5

The only new point is that, just before actually starting the backup, dar will ask for the last slice of the archive of reference (linux_full). Then dar will pause (thanks to the -p option) for you to change the disk and insert the one where you want to write the new backup's first slice, then pause again for you to change the disk for the second slice, and so on.

STEP 3: ENDLESS DIFFERENTIAL BACKUP

You can make another differential backup, taking linux_diff1 as reference. In this case you would change only the following:

-c /mnt/zip/linux_diff2 -A /mnt/zip/linux_diff1

You could also decide to change media, taking a 1.44 MB floppy or a CD-R; this would not cause any problem at all. After some time, when you have many differential backups for a single full backup, you will have to make a new full backup. When exactly depends on the time you have available for doing it, and on your patience if one day you have to recover all the data after a disk crash: you would then have to restore the full backup, then all the following differential backups up to the most recent one. This requires more user intervention than restoring a single full backup. It is all a matter of balance between the time it takes to back up and the time it takes to restore.

Note that starting with release 1.2.0 a new command appeared that helps restore a small set of files from a lot of differential backups. Its name is dar_manager. See the end of this tutorial and the man page for more.

Another solution, when you have too many differential backups, is to make the next differential backup taking the last full backup as reference, instead of the last differential backup done. This way, it will take less time than doing a full backup, and you will not have to restore all the intermediate differential backups. Some people distinguish between "incremental" backups and "differential" backups. For dar they look the same; it just depends on the nature of the reference backup you take.

Of course, a given backup can be used as reference for several differential backups; there is no limitation in number nor in nature (the reference can be a full or a differential backup).
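
The naming used in this step (each backup taking the previous one as reference) can be generated rather than typed. A minimal sketch, where the function name is made up and the linux_full / linux_diffN names are those of this tutorial:

```shell
#!/bin/sh
# diff_args N [DIR]   (hypothetical helper)
# Prints the -c/-A pair for the N-th differential backup, taking the
# previous backup of the chain as reference (the full backup when N is 1).
diff_args() {
    n=$1
    dir=${2:-/mnt/zip}
    if [ "$n" -le 1 ]; then
        ref=linux_full
    else
        ref=linux_diff$((n - 1))
    fi
    echo "-c $dir/linux_diff$n -A $dir/$ref"
}

diff_args 1   # prints: -c /mnt/zip/linux_diff1 -A /mnt/zip/linux_full
diff_args 2   # prints: -c /mnt/zip/linux_diff2 -A /mnt/zip/linux_diff1
```

To always use the full backup as reference instead, it is enough to set ref=linux_full in both branches.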


STEP 4: RECOVER AFTER A DISK CRASH

Sorry, it happened: your old disk has crashed. OK, you are happy because you now have a good argument to buy the very fastest, very biggest, very latest hard disk available. Usually, you also cry because you have lost data and you will have to reinstall your whole system, which had been working for so long!

If however the last backup you made is recent, then keep smiling! OK, you have installed your new hard disk and configured your BIOS for it (if necessary). You will need a bootable floppy with a minimal Linux system on it, which allows you to access your ZIP drive and your new empty hard disk (in case your backup resides on ZIP disks). For example, use the Slackware floppy disks, they are nicely done. You don't need to install anything on your brand-new disk, just make partitions and format them as you want. We suppose your new disk is /dev/hda and /dev/sga is your ZIP drive.

1. Create the partition table as you wish, using
fdisk /dev/hda

2. Format the partition which will receive your data. Dar is filesystem independent: you can use ext2 (as here in the example), ext3, ext4, ReiserFS, Minix, UFS, HFS Plus, XFS, whatever Unix-like filesystem you need, even if the backed up data did not reside on such a filesystem at backup time!
mke2fs /dev/hda1

3. Additionally format the swap partition (if needed)
mkswap -c /dev/hda2

3bis. If you have a lot of files to restore, you can activate the swap on the partition of your new hard drive:
swapon /dev/hda2

4. Now we must mount the hard disk, somewhere.

cd /
mkdir disk
mount -t ext2 /dev/hda1 /disk

would do the trick

4bis. If you want to restore your system over several partitions, like /usr /var /home and / , you must create the partitions and format them. Then create the directories that will be used as mount points and mount the partitions on these directories:

mkdir /disk/usr /disk/var /disk/home
mount -t ext2 /dev/hda2 /disk/usr
mount -t ext2 /dev/hda3 /disk/var
mount -t ext2 /dev/hda4 /disk/home

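for example if you have / , /usr , /var and /home partitions (the device names are only examples, adapt them to your own partition table and to where you put the swap).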

5. We need to copy the dar binary from the ZIP disk to your hard disk, to be able to freely change ZIP disks later on. Insert the ZIP disk containing the dar_static binary:

cd /

mkdir /zip
mount -t ext2 /dev/sga /zip
cp /zip/dar_static /disk

where /dev/sga points to your ZIP drive. We will remove dar_static from your new hard drive at the end of the restoration.

6. Now we can restore the archive. The stuff has to go in the /disk subdirectory
*** here comes the option: -R /disk

7. The process may be long, thus it might be useful to be notified when a user action is required by dar.
*** here comes the option: -b
Note that the -p option is not required here, because if a slice is missing dar will pause and ask you for it by number. If slice "0" is requested, it means the "last" slice of the backup.

Let's start restoring!
/disk/dar_static -x /zip/linux_full -R /disk -b

... and when the next zip floppy is needed,
umount /zip

change the floppy and mount it:
mount -t ext2 /dev/sga /zip

As previously, use another xterm / virtual console, or suspend dar with CTRL-Z and wake it up again with the 'fg' command. Then press <enter> to proceed with dar.


7bis. Once finished with the restoration of linux_full, we have to do the same with every following differential/incremental backup. However, dar will then warn you each time it restores a more recent file (file overwriting) and each time a file that has been removed since the backup of reference has to be removed from the file system (suppression). If you don't want to press the <enter> key several thousand times:
*** here comes the option: -w
(don't warn). All files will be overwritten without warning. You may also use the -r option, which avoids overwriting files on the filesystem that are more recent than their archived version. It is not of great use here, as you restore a differential backup after its reference backup on an initially empty disk (files stored in the differential archive are more recent than those in the reference), but it might be useful in some other situations.

All together it makes:
/disk/dar_static -x /zip/linux_diff1 -R /disk -b -w

Then any additional archive:
/disk/dar_static -x /zip/linux_diff2 -R /disk -b -w
...
/disk/dar_static -x /zip/linux...    -R /disk -b -w
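
The sequence above (the full backup first, then each differential backup in order) lends itself to a loop. The sketch below assumes the archive names used in this tutorial; the function name and the DRY_RUN variable are made up for the example. You still have to change and mount the disks when dar asks for a slice.

```shell
#!/bin/sh
# restore_chain LAST   (hypothetical helper)
# Restores linux_full, then linux_diff1 .. linux_diffLAST under /disk.
# With DRY_RUN=1 the commands are printed instead of being executed.
restore_chain() {
    last=$1
    run=${DRY_RUN:+echo}    # "echo" when DRY_RUN is set and non-empty
    # stop at the first failure: a differential backup must not be
    # restored over an incomplete reference
    $run /disk/dar_static -x /zip/linux_full -R /disk -b || return 1
    i=1
    while [ "$i" -le "$last" ]; do
        $run /disk/dar_static -x "/zip/linux_diff$i" -R /disk -b -w || return 1
        i=$((i + 1))
    done
}
```

"DRY_RUN=1 restore_chain 2" only lists the three commands, while "restore_chain 2" runs them one after the other.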


8. Finally, remove the dar binary from the disk:
rm /disk/dar_static

9. And launch lilo for your Linux box to boot properly (if needed):
lilo -r /disk

If your boot loader is grub, simply launch grub and type at the prompt something like this:

grub> root (hd0,0)

10. You can reboot your machine and be happy with your brand-new hard disk with your old precious data on it:
shutdown -r now

OK, one day I will make something like a bootable floppy image with dar inside, maybe with a simple script for user interaction... If you have already done it, you can tell me or send it to me (or give me the URL where to get it, so that I can add a link to it from DAR's homepage). [Note: Knoppix seems to include dar]

STEP 4(bis): RECOVER ONLY SOME FILES

Gosh, you have removed an important file by mistake. You just need to restore that file, not the rest of the full and differential backups.

a) First method:

We could, as previously, go through all the archives, from the full backup up to the most recent differential backup, and restore just that file whenever it is present in the archive:

dar -R / -x /zip/linux_full -g home/denis/my_precious_file

This would restore only the file /home/denis/my_precious_file from the full backup.

OK, now we would also have to restore from all the differential backups the same way, since of course this file may have changed since the full backup.

dar -R / -x /zip/linux_diff1 -g home/denis/my_precious_file

and so on, up to the last differential archive.

dar -R / -x /zip/linux_diff29 -g home/denis/my_precious_file


b) Second method (more efficient):

We will restore our lost file starting from the most recent differential backup and *maybe* going back up to the full backup. Our file may or may not be present in a given differential archive, as it may or may not have changed since the previous backup. Thus we have to check whether our file gets restored, using the -v option (verbose):

dar -R / -x /zip/linux_diff29 -v -g home/denis/my_precious_file

If we can see a line like

restoring file: /home/denis/my_precious_file

Then we stop here, because we got the most recent backed up version of our lost file. Otherwise we have to continue with the previous differential backup, back to the full backup if necessary. This method has an advantage over the first one: it does not *in all cases* require all the backups made since the full backup.
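
This "newest first, stop as soon as the file shows up" search can be sketched as a loop. Several assumptions here: the archives are named as in this tutorial and are reachable under /zip, the verbose output contains a line worded like the "restoring file:" line shown above (the exact wording may differ between dar versions), and the function names are made up. The -r option is added as a safety net, so that an older version is not written over a newer one should the message not be recognized.

```shell
#!/bin/sh
# dar_cmd is a thin indirection, so that the loop can be tried out with a
# stand-in for dar.
dar_cmd() { dar "$@"; }

# restore_latest LAST FILE   (hypothetical helper)
# Tries linux_diffLAST, ..., linux_diff1, then linux_full, and stops at the
# first archive whose verbose output reports FILE as restored.
# Prints the name of that archive; returns 1 if FILE was never found.
restore_latest() {
    n=$1 file=$2
    while [ "$n" -ge 0 ]; do
        if [ "$n" -eq 0 ]; then arch=linux_full; else arch=linux_diff$n; fi
        # grep reads the whole output (no -q), so dar is never cut short
        if dar_cmd -R / -x "/zip/$arch" -v -r -g "$file" 2>&1 |
            grep -F "restoring file: /$file" >/dev/null; then
            echo "$arch"
            return 0
        fi
        n=$((n - 1))
    done
    return 1
}
```

For example "restore_latest 29 home/denis/my_precious_file" would start with linux_diff29. As dar's output is captured here, this sketch suits archives whose slices are all available without changing disks.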

If, on the other hand, you are lazy (as I am), have a look at dar_manager (at the end of the tutorial).

OK, now you have two files to restore. No problem, just follow the second method but add the -r option, so as not to overwrite any more recent file already restored in a previous step:

dar -x /zip/linux_diff29 -R / -r -v -g home/denis/my_precious_file -g etc/fstab

Check the output to see whether one or both of your files got restored. If not, continue with the previous backup, until you have seen for each file a line indicating it has been restored. Note that the most recent versions of the files may not be located in the same archive, thus you might get /etc/fstab restored from linux_diff28, and /home/denis/my_precious_file restored from linux_diff27. In case /etc/fstab is also present in linux_diff27, it will not be overwritten by this older version, thanks to the -r option.

This option is very important when restoring more than one file using the second method. If instead the first method is used (restoring first from the full backup, then from all the following differential backups), the -r option is not so important because, if overwriting occurs when you restore lost files, you only overwrite an older version with a newer one.

Same thing here: even if you are not lazy, dar_manager can help you a lot to automate the restoration of a set of files.

ISOLATING A CATALOGUE

We have seen previously how to make differential backups. When doing so, dar asks for the last slice of the archive of reference. This operation is required to read the table of contents (also known as the "catalogue" [this is a French word that means "catalog" in English; I will keep the French word in the following because it is also the name of the C++ class used in libdar]), which is located at the end of the archive (thus on the last slice(s)). You have the possibility to isolate (that is, to extract) a copy of this table of contents to a small file. This small file is almost exactly the same as a differential archive that holds no data. Let's take the full backup we did previously as an example to see how to extract a catalogue:

    dar -C /root/CAT_linux_full -A /mnt/zip/linux_full


Note that here we used the UPPERCASE 'C' letter, as opposed to the lowercase 'c' which is used for archive creation. Here we just created an isolated catalogue, which is usually a small archive. In addition, you can use the -z option to have it compressed, the -s and -S options to have it split into slices, and the -p and -b options, but for an isolated catalogue this is seldom necessary as it is usually rather small. The only thing we have seen for backup that you cannot do for isolation is to filter files (the -X, -I, -g, -P, -[ and -] options are not available for that operation).

So what, now we have our extracted catalogue, what can we do with it? Two things:

First, we can use the extracted catalogue in place of the archive, as reference for a differential backup. No need to manipulate the old ZIP disks: you can store the last backup's isolated catalogue on your hard disk instead. If we had used an isolated catalogue in the previous examples, we would have built our first differential backup this way (note that here we have chosen the CAT_ prefix to indicate that the archive is an isolated catalogue, but you are free to label isolated catalogues the way you want):

    dar -c linux_diff1 -A /root/CAT_linux_full ... (other options seen above stay the same)

Second, we can use the isolated catalogue as a backup of the internal catalogue in case it gets corrupted. Admittedly, to face data corruption the best solution ever invented is Parchive, an autonomous program that builds parity files (same mechanism as the one used for RAID disks) for a given file; we could use Parchive to create a parity file for each slice. But assuming you lack Parchive, and that you failed to read the full backup because the ZIP disk is corrupted in the part holding the internal catalogue, you can use an isolated catalogue as rescue:

    dar -x linux_full -A /root/CAT_linux_full ...
    dar -d linux_full -A /root/CAT_linux_full ...
    dar -t linux_full -A /root/CAT_linux_full ...
    dar -l /root/CAT_linux_full

An isolated catalogue can be built for any type of archive (full, differential or incremental archive, even for an already isolated catalogue, which I admit is rather useless). You can also create an isolated catalogue at the same time you do a backup, thanks to the -@ option:

    dar -c linux_diff1 -A /mnt/zip/linux_full -@ CAT_linux_diff1 ... (other options...)
    dar -c linux_full -@ CAT_linux_full ... (other options seen above stay the same for backup)

This is known as "on-fly" isolation.
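
If you isolate a catalogue after each backup, the naming convention used in this section can be left to a couple of shell functions. This is only a sketch: the function names and the DRY_RUN variable are made up, and /root as storage place is simply the one used in the examples above.

```shell
#!/bin/sh
# catalogue_name ARCHIVE [DIR]   (hypothetical helper)
# Prints the name of the isolated catalogue for ARCHIVE, following the
# CAT_ prefix convention of this tutorial (stored in /root by default).
catalogue_name() {
    echo "${2:-/root}/CAT_$(basename "$1")"
}

# isolate ARCHIVE   (hypothetical helper)
# Runs the isolation command, or only prints it when DRY_RUN=1.
isolate() {
    run=${DRY_RUN:+echo}    # "echo" when DRY_RUN is set and non-empty
    $run dar -C "$(catalogue_name "$1")" -A "$1"
}

catalogue_name /mnt/zip/linux_full   # prints: /root/CAT_linux_full
```

The next differential backup can then take "$(catalogue_name /mnt/zip/linux_full)" as argument to -A, without the old disks being needed.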

DAR_MANAGER TUTORIAL

dar_manager builds a database of all your archive contents, to automatically restore the latest versions of a given set of files. Dar_manager is not targeted at the restoration of a whole filesystem: the best ways to restore a whole filesystem have been described above and do not use dar_manager. So let's use dar_manager to restore a set of files. First, we have to create a "database" file:

dar_manager -C my_base.dmd

This created a file "my_base.dmd" where dmd stands for Dar Manager Database, but you are free to use any other extension.

This database is created empty. Each time you make a backup, be it full or differential, you will have to add its table of contents (aka "catalogue") to this database using the following command:

dar_manager -B my_base.dmd -A /mnt/zip/linux_full

This will add ("A" stands for "add") the archive contents to the database. In some cases you may not have the archive available but only its extracted catalogue. Of course, you can use the extracted catalogue in place of the archive!

dar_manager -B my_base.dmd -A ~/Catalogues/CAT_linux_full

The problem, however, is that when dar_manager needs to recover a file located in this archive, it will try to open the archive ~/Catalogues/CAT_linux_full for restoration, and this one does not contain any data because it is just the catalogue of the archive.

No problem if you made this mistake: thanks to the -b option you can change the basename of the archive, and thanks to the -p option you can change its path, at any time. But first we will list the database contents:

dar_manager -B my_base.dmd -l

It shows the following:


dar path    :
dar options :

archive #   |    path      |    basename
------------+--------------+---------------
        1       /home/denis/Catalogues      CAT_linux_full

We should change the path of archive number 1 so that dar_manager looks on the ZIP drive:

dar_manager -B my_base.dmd -p 1 /mnt/zip

and also replace the name of the extracted catalogue by the real archive name

dar_manager -B my_base.dmd -b 1 linux_full

Now we have exactly the same database as if we had used the real archive instead of its catalogue:

dar_manager -B my_base.dmd -l


dar path    :
dar options :

archive #   |    path      |    basename
------------+--------------+---------------
        1       /mnt/zip     linux_full


Instead of using the -b and -p options, you can also give the path and the name of the real archive to use at restoration time when you add the catalogue to the database:

dar_manager -B my_base.dmd -A ~/Catalogues/CAT_linux_full /mnt/zip/linux_full

This is done by adding an optional argument. The first one (~/Catalogues/...) is the archive to read the catalogue from, and the second (/mnt/zip/...) is the name to record for it. No access is made to this second archive at the time of the addition, thus it may be unavailable when the command is typed.

You can add up to 65534 archives to a given database, and have as many databases as you want.

Note that we have not yet stored in the database the important options to be passed to dar. For example, you will likely restore from the root of your filesystem, therefore dar, when called from dar_manager, must get the "-R /" option. This is done with:

dar_manager -B my_base.dmd -o -R /

All that follows -o is passed to dar as-is. You can see the options passed to dar when listing the database contents (-l option).
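
Since the database has to be fed after each backup, the commands seen so far can be grouped in a function called at the end of your backup script. A minimal sketch: the function name and the DRY_RUN variable are made up for this example, and the database is created on first use only.

```shell
#!/bin/sh
# register_archive DB CATALOGUE ARCHIVE   (hypothetical helper)
# Creates the database DB if it does not exist yet, then records the
# contents read from CATALOGUE under the path and basename of ARCHIVE.
# With DRY_RUN=1 the commands are printed instead of being executed.
register_archive() {
    db=$1 catalogue=$2 archive=$3
    run=${DRY_RUN:+echo}    # "echo" when DRY_RUN is set and non-empty
    if [ ! -e "$db" ]; then
        $run dar_manager -C "$db" || return 1
    fi
    $run dar_manager -B "$db" -A "$catalogue" "$archive"
}
```

For example "register_archive my_base.dmd /root/CAT_linux_diff1 /mnt/zip/linux_diff1" reads the isolated catalogue from the hard disk while recording where the real archive lives.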

Let's suppose that after each backup you took the time to update your database, and now you have just removed an important file by mistake.


Now, we can restore our /home/denis/my/precious/file :

dar_manager -B my_base.dmd -r home/denis/my/precious/file

dar_manager will find the proper archive to use, and call dar with the following options:

dar -x <archive> -R / home/denis/my/precious/file

which in turn will ask you for the corresponding slices. If you want to restore more files at a time, or even a directory tree, you can add several arguments after the -r option of dar_manager:

dar_manager -B my_base.dmd -r home/denis/my/precious/file etc/fstab home/joe

Once an archive becomes obsolete you can delete it from the database thanks to the -D option. You can also change the archive order (-m option), list the archives in which a given file is located (-f option), get the list of the most recent files in a given archive (-u option), and get overall statistics per archive (-s option). Lastly, you can specify which dar command to use by giving its path (-d option); by default, dar_manager uses the PATH shell variable to find the dar command.

A new feature for those who are really very lazy: dar_manager has an interactive mode, so you don't have to remember all these command-line switches except one:

dar_manager -B my_base.dmd -i

Interactive mode allows you to do all operations except restoration, which can be done as previously explained.

TO GO FURTHER WITH DAR/LIBDAR

Well, we have reached the end of this tutorial, but dar/libdar has still a lot of features to be discovered:
- strong encryption
- archive merging
- decremental backup
- dar command-line files (DCF)
- user commands between slices (and DUC files)
- Extended Attribute manipulations
- hard links
- Sparse files
- remote backup over ssh
- suspending/resuming a database from dar before/after backing it up
- using regex in place of glob expressions in masks
- using dar with tape thanks to the sequential reading mode
- having dar adding padded zeros to slice numbers
- excluding some files from compression
- asking dar to retry saving a file if it changes at the time of the backup
- what a "dirty" file is in a dar archive
- listing an archive contents under XML format
- using conditional syntax in DCF files
- using user targets
- adding user comments in dar archive
- using DAR_DCF_PATH and DAR_DUC_PATH environment variables

All this is described in much detail in the following documents:
FAQ, mini-howto, command-line usage notes, man pages. You can find out more precisely where, using the feature description page. However, if you find something unclear, feel free to report it or ask for help on the dar-support mailing list.

Well, English is not my mother tongue and I do not pretend to speak or write it perfectly, though I do my best to produce something correctly written. Thus, if you find some weird English sentences, spelling mistakes or typos, feel free to send me your feedback. You can use the dar-support mailing list or contact me directly (read the AUTHOR file from the dar source package to find out how to contact me).

Denis Corbin