Dar Documentation

Tutorial

Introduction

This tutorial shows you how to back up your file system (partially or totally) onto USB keys. Things work the same with hard disks or cloud storage, but we will stick to USB keys for simplicity. Most importantly, we will also see how to restore your system from scratch in case of hard disk failure (or other cataclysms).

Note:
This document was initially written circa 2003, so don't pay attention to the old hardware it mentions: dar usage stays the same with modern removable media or cloud storage, and the document has been updated with recent features as if that old stuff were still current :-)

In the following, for each feature we use, you will find a description of what it does, followed by the way to activate it using both its long option and its short option. Of course, it is up to you to use either the short or the long option (but not both at the same time for a given feature). Short options begin with a single dash (-) and are identified by a single letter, like -s. Long options begin with two dashes (--) and usually have a descriptive word to identify them, like --slice.

Short and long options may take no argument (-D) or a mandatory argument, which is the word following the option (-s 1M). A few rare ones take an optional argument: such an option is then either used alone (-z) or stuck to its optional argument (-zlz4), which for long options is done by means of the equal sign (=): --compression=lz4

The FULL backup

We first need to make a full backup, let's go:
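The complete option list of the full backup is not reproduced in this copy of the tutorial. As a minimal sketch, assuming it is the differential backup command shown further down without its -A and -S options, the full backup could be wrapped in a shell function like the one below. The function name full_backup and the DAR variable (used to point to another dar binary, such as dar_static) are hypothetical conventions of this sketch, not part of dar.

```shell
# Hypothetical helper: full backup of / into 100 MB slices on the USB key.
# The option list is ASSUMED to be the differential command of this tutorial
# without -A and -S; adapt it to what you really want to save.
full_backup() {
    dest="${1:-/mnt/usb}"          # mount point of the USB key
    # dar must not run from inside the mount point we unmount between slices
    cd / || return 1
    # ${dest#/} turns /mnt/usb into mnt/usb: -P paths are relative to -R
    "${DAR:-dar}" -c "$dest/linux_full" \
        -s 100M -p -b -z \
        -R / \
        -X "*~" -X ".*~" \
        -P dev/pts -P proc -P "${dest#/}" -P sys \
        -D --hash md5
}

# Usage:
#   full_backup /mnt/usb
```

Setting DAR=/path/to/dar_static beforehand would run that binary instead of the dar found in PATH.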

Now, as we will have to mount and unmount the /mnt/usb file system, no process must be using it. In particular, the current directory of dar must not be /mnt/usb, so we change to / for example.

All together we follow this procedure for our example:

To do that, you can switch to another virtual console by pressing ALT+Fn (under Linux), open another xterm if under X Window, or suspend dar by typing CTRL-Z and resume it after mounting/unmounting by typing `fg' (without the quotes).

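Then let dar proceed with the next slice by pressing the <enter> key. Dar labels the slices by appending the slice number and the .dar extension to the basename: linux_full.1.dar, linux_full.2.dar, and so on.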

That's it! We have finished the first step, the full backup, which may take a long time depending on the amount of data to back up. The following step (differential backup), however, can be done often, and it will stay fast every time (except if a large part of your system has changed, in which case you can consider making another full backup).

Test your Backups!

There are many reasons a backup can turn out to be useless: human error, saturated disk, lack of permission, and so on. The best test is to restore the data at least once, but there are some quicker (though less exhaustive) ways to test a backup:

Check the backup content

This one is usually quick: it tells you the backup is readable, but you have to check that all the expected files are present in the output:

dar -l /mnt/usb/linux_full

Testing the backup

One step further, you can let dar try to restore everything without actually writing anything (this mimics the cat > /dev/null paradigm). Doing so, you validate that the data and metadata of all files are not corrupted. This is a good thing to add to your backup script (or more generally to your backup process):

dar -t /mnt/usb/linux_full

If you use removable media of poor quality, it is recommended to first unmount and remount the removable disk to flush the system cache; otherwise you may read data from the cache (in memory) and not detect an error on the disk. Note that dar -t cannot check a single slice: it checks the whole archive. If you need to check a single slice (for example after burning it on a DVD-RW), you can use the diff command. For example, suppose you have burnt the last completed slice on DVD-RW but have just enough free space to store one slice on disk. You can then check the slice by typing something like this:

diff /mnt/cdrom/linux_full.132.dar /tmp/linux_full.132.dar

You can also add the --hash option when you create the backup (for example --hash md5): it produces for each slice a small hash file named after the slice, "linux_full.1.dar.md5", "linux_full.2.dar.md5", etc. Then, using the standard Unix command md5sum, you can check the integrity of each slice:

md5sum -c linux_full.1.dar.md5

If all is OK for the slice on the target medium (diff does not complain or md5sum reports "OK"), you can let dar proceed with the next slice.
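Rather than checking slices one at a time, all the hash files of an archive can be verified in one go with a small loop. This is a sketch assuming GNU md5sum and that each .md5 file names its slice relative to its own directory, as the md5sum -c example above suggests; check_slices is a hypothetical name.

```shell
# Hypothetical helper: verify every <base>.<N>.dar.md5 file that
# "dar ... --hash md5" produced next to the slices, using md5sum -c.
# Returns 0 only if at least one hash file exists and all slices match.
check_slices() {
    dir="$1"            # directory holding slices and hash files, e.g. /mnt/usb
    base="$2"           # archive basename, e.g. linux_full
    found=0
    for sum in "$dir/$base".*.dar.md5; do
        [ -e "$sum" ] || continue      # the glob matched nothing
        found=1
        # md5sum -c looks the slice up relative to the current directory,
        # so run it (in a subshell) from the directory holding both files
        ( cd "$dir" && md5sum -c "${sum##*/}" ) || return 1
    done
    [ "$found" -eq 1 ]
}

# Usage:
#   check_slices /mnt/usb linux_full
```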

Compare the backup content with filesystem

Instead of testing the whole archive, you could also compare it with the system you just saved:

dar -d /mnt/usb/linux_full -R /

This compares the archive with the filesystem tree located at /. Same remark as previously: it is recommended to first unmount and remount the removable media to flush the system cache.

If you back up a live filesystem, you may prefer 'testing' the archive, as this will not report errors about files that changed since the backup was made. But if you are archiving files, diffing is probably a better idea: you really compare the content of the files, and you should not see file changes on data you are archiving, since data about to be archived is most of the time old, steady data that is unlikely to change.

Differential backups

Compared to the full backup, the only thing to add is the basename of the backup we take as reference: --ref /mnt/usb/linux_full or -A /mnt/usb/linux_full

Of course, we have to choose another name for that new backup, let's call it linux_diff1: --create /mnt/usb/linux_diff1 or -c /mnt/usb/linux_diff1

Last point: if you want to put the new backup right after the end of the full backup, you have to set the first slice size according to the remaining space on the last USB key. Suppose the last slice of linux_full takes 34 MB and leaves 76 MB available for the first slice of the differential backup (the following slices still being 100 MB): --first-slice 76M or -S 76M

But if you want to put the backup on a new USB key, just forget the -S option.

Here we also want to produce a hash file to test each slice's integrity before removing it from the hard disk (md5, sha1 and sha512 are the available hash algorithms today): --hash md5 or -3 md5

All together we get:

dar -c /mnt/usb/linux_diff1 -A /mnt/usb/linux_full -s 100M -S 76M -p -b -z -R / -X "*~" -X ".*~" -P dev/pts -P proc -P mnt/usb -P sys -D --hash md5

The only new point is that, just before actually starting the backup, dar will ask for the last slice of the archive of reference (linux_full). Then dar will pause (thanks to the -p option) for you to change the disk if necessary and insert the one where you want to write the new backup's first slice, then pause again for you to change the disk for the second slice, and so on.
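The -S value does not have to be computed by hand. The sketch below, a hypothetical first_slice_size function, reads the free space of the mounted key from the POSIX df -P -k output and keeps a small safety margin; the default margin of 1 MiB is an arbitrary assumption.

```shell
# Hypothetical helper: suggest a value for dar's -S (first slice size) from
# the space still available on the mounted USB key.
first_slice_size() {
    mnt="$1"                    # mount point, e.g. /mnt/usb
    margin="${2:-1}"            # MiB kept free (hash file, filesystem rounding)
    # df -P -k: portable one-line-per-filesystem output, sizes in KiB;
    # field 4 of the second line is the available space
    avail_kib=$(df -P -k "$mnt" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kib" ] || return 1
    avail_mib=$(( avail_kib / 1024 - margin ))
    [ "$avail_mib" -gt 0 ] || return 1
    # a count of MiB followed by "M" never exceeds the free space,
    # whether dar reads M as 2^20 or 10^6 bytes
    echo "${avail_mib}M"
}

# Usage:
#   dar -c /mnt/usb/linux_diff1 -A /mnt/usb/linux_full -S "$(first_slice_size /mnt/usb)" ...
```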

Endless Differential Backups

You can make another differential backup, taking linux_diff1 as reference (this is called an incremental backup, while a differential backup always has a full backup as reference). In this case you would change only the following: -c /mnt/usb/linux_diff2 -A /mnt/usb/linux_diff1

You could also decide to change device, using 4.4 GiB DVD-RAM... or rather something more recent and bigger if you want; this would not cause any problem at all.

After some time, when you have many incremental backups for a single full backup, you will have to make a new full backup. When to do it depends on the time you have available for it, or on your patience if one day you have to recover the whole data after a disk crash: you would then have to restore the full backup, then all the following incremental backups up to the most recent one. This requires more user intervention than restoring a single full backup; it is all a matter of balance between the time it takes to back up and the time it takes to restore.

Note that starting with release 1.2.0 a new command appeared that helps restoring a few files from a lot of differential backups. Its name is dar_manager. See the end of this tutorial and the associated man page for more.

Another solution, when you have too many incremental backups, is to make the next backup a differential backup taking the last full backup as reference, instead of the last differential backup. This way, it will take less time than doing a full backup, and you will not have to restore all intermediate differential backups.

For dar, there is no difference in structure between a differential backup (having a full backup as reference) and an incremental backup (having a differential or another incremental backup as reference). Only the way you choose the backup of reference leads us to use two different words for what dar considers the same kind of archive.

Of course, a given backup can be used as reference for several differential backups; there is no limitation in number nor in nature (the reference can be a full or a differential backup).

Yet another solution is to set up decremental backups. These let you keep the full backup as the most recent one and the older ones as differences from the backup done just after them... but nothing is perfect: doing so takes much more time than doing a full backup at each step, but uses as little storage space as incremental backups, and restoration is as simple as restoring a full backup. Here too, it is all a matter of choice, taste and use case.

Recovering after a disk crash

Sorry, it happened: your old disk has crashed. OK, you are happy because you now have a good argument to buy the very fast, very large, latest hard disk available. Usually, you also cry because you have lost data and will have to reinstall your whole system, which was working so well and for so long!

If, however, the last backup you made is recent, then keep smiling! OK, you have installed your new hard disk (in ancient times you also had to configure the BIOS manually for the new disk; today you can forget about it).

  1. You first need to boot your computer with the new, empty disk in order to restore your data onto it. For that I would advise using Knoppix or, better, System Rescue CD, which let you boot from a CD or a USB key. You don't need to install anything on your brand-new disk, just create partitions and format them as you want (we will detail that below). You may even change the partition layout, add new partitions or merge several into a single one: what is important is that each one has enough space to hold the data to be restored into it. We suppose your new disk is /dev/sda and /dev/sdb is your removable media drive (USB key, DVD device, ...). For clarity, in the following we keep assuming it is a set of USB keys; with CDs, DVDs or other disks you would do much the same.

  2. Create the partition table as you wish, using fdisk /dev/sda, or gdisk /dev/sda for a more versatile and modern (GPT) partition table.

  3. Format the partition which will receive your data. dar is filesystem independent: you can use ext2 (as in this example), ext3, ext4, ReiserFS, Minix, UFS, HFS Plus, XFS, or whatever Unix-like filesystem you want, even if the backed-up data did not reside on such a filesystem at backup time:

    mke2fs /dev/sda1

  4. If the /etc/fstab we will restore in the next steps relies on UUIDs instead of fixed paths (like /dev/sda1 or /dev/mapper/...), copy the UUID of the generated filesystem and record it in a temporary file. You can also retrieve the UUID later by calling blkid.

  5. Additionally, if you created one, format the swap partition and also record the generated UUID if necessary:

    mkswap -c /dev/sda2

  6. If you have a lot of files to restore, you can activate the swap on the partition of your new hard drive:

    swapon /dev/sda2

  7. Now we must mount the hard disk:

    cd /
    mkdir /disk
    mount -t ext2 /dev/sda1 /disk
  8. As an alternative, if you want to restore your system over several partitions like /usr, /var, /home and /, you must create the partitions, format them as seen above, then create the directories that will be used as mount points and mount the partitions on them. For example, if you have /, /usr, /var and /home partitions, once / is mounted on /disk this would look like this:

    mkdir /disk/usr /disk/var /disk/home
    mount /dev/sda2 /disk/usr
    mount /dev/sda3 /disk/var
    mount /dev/sda4 /disk/home
  9. If the boot system you use does not already include dar/libdar (unlike System Rescue CD and Knoppix, for example), we need to copy the dar binary from a removable medium to your disk. Insert the USB key containing the dar_static binary and copy it, so that you can freely change keys later on:

    cd /
    mkdir /usb_key
    mount /dev/sdb /usb_key
    cp /usb_key/dar_static /disk

    where /dev/sdb points to your USB key drive (run "dmesg" just after plugging in the key to know which device to use in place of the fancy /dev/sdb). We will remove dar_static from your new hard drive at the end of the restoration.

  10. All the restored data has to go in /disk subdirectory: -R /disk

  11. The process may be long, so it might be useful to be notified when dar requires a user action: -b. Note that the -p option is not required here, because if a slice is missing dar will pause and ask you for it (if slice "0" is requested by dar, it means the "last" slice of the backup is requested).

  12. OK, now we have seen all the options, let's go restoring!

    /disk/dar_static -x /usb_key/linux_full -R /disk -b
  13. ...and when the next USB key is needed:

    umount /usb_key

    ...then unplug the key, plug the next one and mount it:

    mount /dev/sdb /usb_key

    As previously, to do that either use a second xterm or virtual console, or suspend dar with CTRL-Z and resume it with the 'fg' command. Then press <enter> to let dar proceed.

  14. Once the restoration of linux_full is finished, we have to do the same with every following differential/incremental backup. However, dar will then warn you each time it restores a more recent file (file overwriting), or each time a file that has been removed since the backup of reference has to be removed from the filesystem (suppression). If you don't want to press the <enter> key several thousand times, use the -w option (don't warn). All files will then be overwritten without warning, which is not an issue as we restore more recent data over older data.

  15. All together, for each differential backup, we have to call:

    /disk/dar_static -x /usb_key/linux_diff1 -R /disk -b -w
    /disk/dar_static -x /usb_key/linux_diff2 -R /disk -b -w
    /disk/dar_static -x /usb_key/linux...... -R /disk -b -w
  16. Finally, remove the dar binary from the disk:

    rm /disk/dar_static
  17. Then, if necessary, modify the restored /etc/fstab (that is /disk/etc/fstab at this stage) with the new UUIDs you have recorded (the blkid command lists them).

  18. Last, reinstall your original boot loader from the restored data:

    If you still use lilo type: lilo -r /disk

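    If your boot loader is grub/grub2, the commands must run from within the restored system, using chroot (lilo's -r option above does this for you):

    mount --bind /dev /disk/dev
    mount --bind /proc /disk/proc
    mount --bind /sys /disk/sys
    chroot /disk
    update-initramfs -u
    update-grub
    grub-install /dev/sda
    exit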
  19. You can reboot your machine and be happy with your brand-new hard disk with your old precious data on it:

    shutdown -r now
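Steps 12 to 17 lend themselves to a small script. The two functions below are a sketch under the assumptions of this procedure (new root mounted on /disk, slices read from /usb_key, dar_static copied to /disk); restore_chain, update_fstab_uuid and the DAR variable are hypothetical names. Archives must be given oldest first, and you still change keys by hand when dar asks for a slice.

```shell
# Hypothetical helpers for steps 12 to 17 above; they assume the new root
# filesystem is mounted on /disk and the slices are read from /usb_key.

# Restore the full backup, then each differential/incremental backup,
# oldest first. -w (no overwrite warning) is only used after the first one.
restore_chain() {
    dar_bin="${DAR:-/disk/dar_static}"
    src="$1"; shift                 # directory of the slices, e.g. /usb_key
    [ "$#" -gt 0 ] || return 1      # at least the full backup is needed
    warn_opt=""
    for archive in "$@"; do
        # $warn_opt is deliberately unquoted: empty means "no extra option"
        "$dar_bin" -x "$src/$archive" -R /disk -b $warn_opt || return 1
        warn_opt="-w"
    done
}

# Replace an old filesystem UUID by a new one (for instance the output of
# "blkid -s UUID -o value /dev/sda1") in the restored fstab, keeping a copy.
# UUIDs only contain hex digits and dashes, so they are safe as sed patterns.
update_fstab_uuid() {
    fstab="$1"; old="$2"; new="$3"
    cp "$fstab" "$fstab.bak" || return 1
    tmp=$(mktemp) || return 1
    # write back through cat so the file keeps its owner and permissions
    sed "s/$old/$new/g" "$fstab" > "$tmp" && cat "$tmp" > "$fstab"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage:
#   restore_chain /usb_key linux_full linux_diff1 linux_diff2
#   update_fstab_uuid /disk/etc/fstab <old-uuid> "$(blkid -s UUID -o value /dev/sda1)"
```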

During this operation dar also restored sparse files and hard-linked inodes, so you will have no drawback, and possibly even better space usage than on the original filesystem, as dar can transparently convert big plain files into smaller sparse files without any impact.

The "Flexibly Restoring a whole system with dar" document goes one step further in this direction by illustrating many use cases, such as the use of LVM and LUKS-encrypted filesystems, or even the full restoration of a Proxmox Virtual Environment system with all its virtual machines.

Recover only some files

Gosh, you have removed an important file by mistake. You just need to restore it, not the rest of the full and differential backups.

First method:

As previously, we could try all archives, starting from the full backup up to the most recent differential backup, and restore just the file if it is present in the archive:

dar -R / -x /usb/linux_full -g home/denis/my_precious_file

This would restore only the file /home/denis/my_precious_file from the full backup.

Of course, this file may have changed since the full backup, so we also have to restore it from every differential backup, the same way:

dar -R / -x /usb/linux_diff1 -g home/denis/my_precious_file

and so on, up to the last differential archive.

dar -R / -x /usb/linux_diff29 -g home/denis/my_precious_file
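The first method can be scripted as a loop over the archives, oldest first. This sketch adds -w, as in the full restoration procedure above, so that dar does not ask for confirmation each time a newer version overwrites an older one; restore_file_oldest_first and DAR are hypothetical names.

```shell
# Hypothetical helper for the first method: restore one file from the full
# backup, then from every following differential backup, oldest first, so
# that each newer version overwrites the older one. -w avoids a
# confirmation prompt at each overwrite.
restore_file_oldest_first() {
    file="$1"; shift            # path relative to /, e.g. home/denis/my_precious_file
    [ "$#" -gt 0 ] || return 1
    for archive in "$@"; do     # e.g. /usb/linux_full /usb/linux_diff1 ... /usb/linux_diff29
        "${DAR:-dar}" -R / -x "$archive" -w -g "$file" || return 1
    done
}

# Usage (the {1..29} brace expansion requires bash, zsh or ksh):
#   restore_file_oldest_first home/denis/my_precious_file /usb/linux_full /usb/linux_diff{1..29}
```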

Second method (more efficient):

We will restore our lost file starting from the most recent differential backup and *maybe* going back to the full backup. Our file may or may not be present in a given differential archive, depending on whether it changed since the previous backup, so we have to check whether our file got restored, using the -v option (verbose):

dar -R / -x /usb/linux_diff29 -v -g home/denis/my_precious_file

If we can see a line like this:

restoring file: /home/denis/my_precious_file

Then we are good. We can stop here, because we got the most recent backed-up version of our lost file. Otherwise we have to continue with the previous differential backup, up to the full backup if necessary. This method has an advantage over the first one: it does not *always* require using all the backups made since the full backup.

OK, now you have two files to restore. No problem, just use the second method, but add the -r option so as not to overwrite any more recent file already restored in a previous step:

dar -x /usb/linux_diff29 -R / -r -v -g home/denis/my_precious_file -g etc/fstab

Check the output to see whether one or both of your files got restored. If not, continue with the previous backup, until you have seen, for each file, a line indicating it has been restored. Note that the most recent versions of the files may not be located in the same archive: you might get /etc/fstab restored from linux_diff28 and /home/denis/my_precious_file restored from linux_diff27. If /etc/fstab is also present in linux_diff27, it will not be overwritten by that older version, thanks to the -r option.

This option is very important when restoring more than one file using the second method. On the contrary, when the first method is used (restoring first from the full backup, then from all the following differential backups), the -r option is not so important, because if overwriting occurs when you restore lost files, you only overwrite an older version with a newer one.
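The second method can be scripted too, walking the archives from newest to oldest and searching dar's verbose output for the message shown above. This is a sketch: the message format is taken from this tutorial and may differ with your dar version, dar's exit status is lost in the pipe to tee, and restore_latest and DAR are hypothetical names. Thanks to -r, a pattern that never matches only costs extra runs; it does not let an older version overwrite a newer one.

```shell
# Hypothetical helper for the second method: try the archives from the most
# recent one back to the full backup, and stop at the first one whose
# verbose output shows the file being restored.
restore_latest() {
    file="$1"; shift            # e.g. home/denis/my_precious_file
    log=$(mktemp) || return 1
    for archive in "$@"; do     # newest first: /usb/linux_diff29 ... /usb/linux_full
        # tee displays dar's output and keeps a copy to search
        "${DAR:-dar}" -R / -x "$archive" -r -v -g "$file" | tee "$log"
        if grep -qF "restoring file: /$file" "$log"; then
            echo "restored from $archive"
            rm -f "$log"
            return 0
        fi
    done
    rm -f "$log"
    echo "$file: not restored from any archive" >&2
    return 1
}

# Usage (the {29..1} brace expansion requires bash, zsh or ksh):
#   restore_latest home/denis/my_precious_file /usb/linux_diff{29..1} /usb/linux_full
```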

Third method (for the lazy guys like me)

If you are lazy (as I am), have a look at dar_manager (at the end of this tutorial). It relies on a database that compiles the content of all your backups. You can then ask dar_manager for a particular file, several files or even directories; it will find out which backups to fetch them from and invoke dar for you on the correct backup and file set.

Isolating a "catalogue"

We have seen previously how to do differential backups. Doing so, dar asks for the last slice of the archive of reference. This is required to read its table of contents (also known as "catalogue" [this is a French word that means "catalog" in English; I will keep this French word in the following because it is also the name of the C++ class used in libdar]), which is located at the end of the archive (thus in the last slice(s)). You have the possibility to isolate (that is, to extract) a copy of this table of contents to a small file. This small file is almost exactly the same as a differential archive that holds no data. Let's take the full backup we did previously as an example, to see how to isolate a catalogue:

dar -C /root/CAT_linux_full -A /mnt/usb/linux_full -z

Note that here we used the UPPERCASE 'C' letter, as opposed to the lowercase 'c' which is used for archive creation: here we just created an isolated catalogue, which is usually a small archive. In addition, you can use the -z option to have it compressed, the -s and -S options to have it split into slices, the -p option and the -b option, but for an isolated catalogue this is not often necessary as it is usually rather small. The only thing we have seen for backups that you cannot do for isolation is filtering files (the -X, -I, -g, -P, -[ and -] options are not available for that operation).

So what? Now that we have our isolated catalogue, what can we do with it? Two things:

First

We can use the isolated catalogue in place of the archive, as reference for a differential backup. No need to manipulate the old USB key: you can store the last backup's isolated catalogue on your hard disk instead and use it as reference for the next backup. Had we used an isolated catalogue in the previous examples, we would have built our first differential backup this way (note that here we chose the CAT_ prefix to indicate that the archive is an isolated catalogue, but it is up to you to label isolated catalogues the way you want):

dar -c linux_diff1 -A /root/CAT_linux_full ... (other options seen above stay the same)
Second

We can use the isolated catalogue as a backup of the internal catalogue if the latter gets corrupted. Well, to face data corruption the best solution ever invented is Parchive, an autonomous program that builds parity files (the same mechanism as the one used for RAID disks) for a given file. Here we can use Parchive to create a parity file for each slice. So, assuming you lack Parchive and failed to read the full backup because the USB key is corrupted in the part used to store the internal catalogue, you can use an isolated catalogue as rescue:

dar -x linux_full -A /root/CAT_linux_full ...
dar -d linux_full -A /root/CAT_linux_full ...
dar -t linux_full -A /root/CAT_linux_full ...
dar -l /root/CAT_linux_full

An isolated catalogue can be built for any type of archive (full, differential or incremental, even for an already isolated catalogue, which I admit is rather useless). You can also create an isolated catalogue at the same time you do a backup, thanks to the -@ option:

dar -c linux_diff1 -A /mnt/usb/linux_full -@ CAT_linux_diff1 ... (other options...)
dar -c linux_full -@ CAT_linux_full ... (other options seen above stay the same for backup)

This is known as "on-fly" isolation.
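Combining both uses of isolated catalogues, each backup can take the previous catalogue, kept on the hard disk, as reference, while isolating its own catalogue on the fly for the next run. The function below is a hypothetical sketch reusing the options of the differential example; backup_with_catalogue, CATDIR and DAR are assumptions of the sketch.

```shell
# Hypothetical helper: differential/incremental backup whose reference is an
# isolated catalogue kept on the hard disk (no need to insert the old USB key),
# isolating the new archive's own catalogue on the fly (-@) for the next run.
# Options other than -c, -A and -@ are those of the differential example.
backup_with_catalogue() {
    ref_cat="$1"                    # e.g. /root/CAT_linux_full
    name="$2"                       # e.g. linux_diff1
    dest="${3:-/mnt/usb}"           # mount point of the USB key
    catdir="${CATDIR:-/root}"       # where isolated catalogues are kept
    cd / || return 1
    "${DAR:-dar}" -c "$dest/$name" -A "$ref_cat" -@ "$catdir/CAT_$name" \
        -s 100M -p -b -z -R / -X "*~" -X ".*~" \
        -P dev/pts -P proc -P "${dest#/}" -P sys -D --hash md5
}

# Usage, each run taking the previous catalogue as reference:
#   backup_with_catalogue /root/CAT_linux_full  linux_diff1
#   backup_with_catalogue /root/CAT_linux_diff1 linux_diff2
```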

Dar_manager tutorial

dar_manager builds a database of all your archive contents, to automatically restore the latest versions of a given set of files. dar_manager is not targeted at restoring a whole filesystem; the best ways to restore a whole filesystem have been described above and do not rely on dar_manager. So let's use dar_manager to restore a set of files or a whole directory. First, we have to create a "database" file:

dar_manager -C my_base.dmd

This created a file "my_base.dmd" where dmd stands for Dar Manager Database, but you are free to use any other extension.

This database is created empty. Each time you make a backup, be it full or differential, you will have to add its table of contents (aka "catalogue") to this database using the following command:

dar_manager -B my_base.dmd -A /mnt/usb/linux_full

This will add ("A" stands for "add") the archive contents to the base. In some cases you may not have the archive available but its extracted catalogue instead. Of course, you can use the extracted catalogue in place of the archive!

dar_manager -B my_base.dmd -A ~/Catalogues/CAT_linux_full

The problem, however, is that when dar_manager needs to recover a file located in this archive, it will try to open the archive ~/Catalogues/CAT_linux_full for restoration, which does not contain any data because it is just the catalogue of the archive.

No problem in that case: thanks to the -b option we can change the basename of the archive afterward, and thanks to the -p option we can change its path at any time. Let's now list the database contents:

dar_manager -B my_base.dmd -l

It shows the following:

dar path    :
dar options :

archive #   |    path                  |    basename
------------+--------------------------+---------------
        1       /home/denis/Catalogues     CAT_linux_full

We should change the path of archive number 1 so that dar_manager looks on the USB key drive:

dar_manager -B my_base.dmd -p 1 /mnt/usb

...and also replace the name of the extracted catalogue by the real archive name:

dar_manager -B my_base.dmd -b 1 linux_full

Now we have exactly the same database as if we had used the real archive instead of its catalogue:

dar_manager -B my_base.dmd -l

dar path    :
dar options :

archive #   |    path     |    basename
------------+-------------+---------------
        1       /mnt/usb      linux_full

Instead of using the -b and -p options, you can also give the path and the name of the real archive to use at restoration time when you add the catalogue to the database:

dar_manager -B my_base.dmd -A ~/Catalogues/CAT_linux_full /mnt/usb/linux_full

This is done by adding an optional argument. The first one, ~/Catalogues/..., is the archive to read the catalogue from, and the second one, /mnt/usb/..., is the name to record for it. This second archive is not accessed at the time of the addition, so it may be unavailable when the command is typed.

You can add up to 65534 archives to a given database, and have as many databases as you want.

Note that we have not yet given the database the important options to pass to dar. For example, you will likely restore from the root of your filesystem, so when called from dar_manager, dar must get the "-R /" option. This is done with:

dar_manager -B my_base.dmd -o -R /

All that follows -o is passed to dar as-is. You can see the options passed to dar when listing the database contents (-l option).
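The database setup above can be gathered in a function that registers several archives from their isolated catalogues, giving each the path of the real archive, then stores the -R / option. This is a sketch: init_dmd, CATDIR, ARCHDIR and DAR_MANAGER are hypothetical names, and the default directories are those of the examples above. Archives are added in the order given, so list them oldest first, in the order the backups were made.

```shell
# Hypothetical helper: create a dar_manager database and register archives
# from their isolated catalogues, recording at the same time where the real
# archives will be found at restoration time (second argument of -A).
init_dmd() {
    base="$1"; shift                         # e.g. my_base.dmd
    dm="${DAR_MANAGER:-dar_manager}"
    catdir="${CATDIR:-$HOME/Catalogues}"     # where the CAT_* catalogues live
    archdir="${ARCHDIR:-/mnt/usb}"           # where the real archives live
    "$dm" -C "$base" || return 1
    for name in "$@"; do                     # oldest first, e.g. linux_full linux_diff1
        "$dm" -B "$base" -A "$catdir/CAT_$name" "$archdir/$name" || return 1
    done
    # everything after -o is passed as-is to dar when restoring
    "$dm" -B "$base" -o -R /
}

# Usage:
#   init_dmd my_base.dmd linux_full linux_diff1 linux_diff2
#   dar_manager -B my_base.dmd -r home/denis/my/precious/file
```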

Let's now suppose that you took the time to update your database after each backup, and that you have just removed an important file by mistake.

We can restore our /home/denis/my/precious/file using dar_manager this way:

dar_manager -B my_base.dmd -r home/denis/my/precious/file

dar_manager will find the proper archive to use and call dar with the following options: dar -x archive -R / -g home/denis/my/precious/file, which in turn will ask you for the corresponding slices. If you want to restore more files at a time, or even a directory tree, you can add several arguments after the -r option of dar_manager:

dar_manager -B my_base.dmd -r home/denis/my/precious/file etc/fstab home/joe

Once an archive becomes obsolete, you can delete it from the database thanks to the -D option. You can also change the archive order (-m option), get the list of archives in which a given file is located (-f option), get the list of the most recent files in a given archive (-u option), and get overall statistics per archive (-s option). Lastly, you can specify which dar command to use, given its path (-d option); by default, dar_manager uses the PATH shell variable to choose the dar command.

A feature for those who are really very lazy (as I still am): dar_manager has an interactive mode, so you don't have to remember all these command-line switches except one:

dar_manager -B my_base.dmd -i

Interactive mode allows you to do all operations except restoration, which can be done as previously explained.

To go further with dar/libdar

Well, we have reached the end of this tutorial, but dar/libdar still has a lot of features to be discovered:

All this is described in much more detail in the following documents:

You can also find documentation organized from the feature point of view on the feature description page. However, if you find something unclear, feel free to report it or ask for help on the dar-support mailing-list.