Dar Documentation


Command-line Usage Notes





Introduction

You will find here a collection of usage examples for several features of the dar suite command-line tools.

Dar and remote backup server

The situation is the following: you have a host (called "local" in the following) hosting an operational system that you want to back up regularly without disturbing its users. For security reasons you want to store the backup on another host (called "remote host" in the following) that is used only for backups. Of course, you do not have much space on the local host to store the archive.

Between these two hosts, you could use NFS and nothing more would be necessary to use dar as usual. But if for security reasons you do not want to use NFS (insecure network, local users must not have access to backups) and prefer to communicate through an encrypted session (using ssh for example), then you need the dar features brought by version 1.1.0:

dar can output its archive to stdout instead of a given file. To activate this, use "-" as the basename. Here is an example:

dar -c - -R / -z | some_program
or
dar -c - -R / -z > named_pipe_or_file

Note that file splitting is not available, as it has little meaning when writing to a pipe (a pipe has no name, and there is no way to skip (or seek) in a pipe, while dar needs to set back a flag in a slice header when it is not the last slice of the set). At the other end of the pipe (on the remote host), the data can be redirected to a file with a proper filename (something that matches "*.1.dar").

some_other_program > backup_name.1.dar

It is also possible to redirect the output to dar_xform, which can in turn, on the remote host, split the data flow into several files, pausing between them if necessary, exactly as dar is able to do:

some_other_program | dar_xform -s 100M - backup_name

this will create backup_name.1.dar and so on. The resulting archive is totally compatible with those directly generated by dar. OK, you are happy: you can back up the local filesystem to a remote server through a secure socket session, in a full featured dar archive, without using NFS. But now you want to make a differential backup taking this archive as reference. How to do that? The simplest way is to use the feature called "isolation", which extracts the catalogue from the archive and stores it in a small file. On the remote backup server you would type:

dar -A backup_name -C CAT_backup_name -z

if the catalogue is too big to fit on a floppy, you can split it as usual using dar:

dar -A backup_name -C CAT_backup_name -z -s 1440k

the generated archive (CAT_backup_name.1.dar, and so on) only contains the catalogue, but it can still be used as reference for a new backup, or as a rescue backup of the archive's internal catalogue (using -x and -A at the same time). You just need to transfer it back to the local host, either using floppies, through a secured socket session, or even directly isolating the catalogue to a pipe that goes from the remote host to the local host:

on remote host:
dar -A backup_name -C - -z | some_program

on local host:
some_other_program > CAT_backup_name.1.dar

or use dar_xform as previously if you need splitting:
some_other_program | dar_xform -s 1440k CAT_backup_name

then you can make your differential backup as usual:
dar -A CAT_backup_name -c - -z -R / | some_program

or if this time you prefer to save the archive locally:
dar -A CAT_backup_name -c backup_diff -z -R /

For differential backups, instead of isolating the catalogue, it is also possible to read an archive or its extracted catalogue through pipes. Yes, two pipes are required for dar to be able to read an archive. The first goes from dar to the external program "dar_slave" and carries orders (asking for some portions of the archive), while the other goes from "dar_slave" back to "dar" and carries the requested data for reading.

By default, if you specify "-" as basename for -l, -t, -d, -x, or -A (when used with -C or -c), dar and dar_slave will use their standard input and output to communicate. Thus you need an additional program to connect the output of one to the input of the other, and vice versa. Warning: you cannot use named pipes that way, because dar and dar_slave would get blocked upon opening of the first named pipe, waiting for the peer to open it as well, even before they have started (deadlock at shell level). For named pipes, the -i and -o options help: they receive a filename as argument, which may be a named pipe. The -i argument is used instead of stdin and -o instead of stdout. Note that for dar, -i and -o are only available when "-" is used as basename. Let's take an example:

You now want to restore an archive from your remote backup server. Thus, on it you have to run dar_slave this way:

on remote server:
some_prog | dar_slave backup_name | some_other_prog
or
dar_slave -o /tmp/pipe_todar -i /tmp/pipe_toslave backup_name

and on the local host you have to run dar this way:

some_prog | dar -x - -v ... | some_other_prog
or
dar -x - -i /tmp/pipe_todar -o /tmp/pipe_toslave -v ...

it does not matter whether dar or dar_slave is started first, and dar may use -i and/or -o while dar_slave does not, or vice versa. What is important here is to connect their inputs and outputs one way or another; it does not matter how. The only restriction is that the communication channel must be reliable: no data loss, no duplication, no reordering; thus communication over TCP is fine.

Of course, you can also isolate a catalogue through pipes, test an archive, make a comparison, use a reference catalogue this way, etc., and even then send the resulting archive to a pipe! If you use -C or -c with "-" while -A is also given "-", it is mandatory to use -o: the resulting archive or catalogue is generated on standard output, so the orders to dar_slave must go through another channel, given by -o:

       LOCAL HOST                                   REMOTE HOST
   +-----------------+                     +-----------------------------+
   |   filesystem    |                     |     backup of reference     |
   |       |         |                     |            |                |
   |       |         |                     |            |                |
   |       V         |                     |            V                |
   |    +-----+      | backup of reference |      +-----------+          |
   |    | DAR |--<-]=========================[-<--| DAR_SLAVE |          |
   |    |     |-->-]=========================[->--|           |          |
   |    +-----+      | orders to dar_slave |      +-----------+          |
   |       |         |                     |      +-----------+          |
   |       +--->---]=========================[->--| DAR_XFORM |--> backup|
   |                 |        saved data   |      +-----------+ to slices|
   +-----------------+                     +-----------------------------+

on local host:
dar -c - -A - -i /tmp/pipe_todar -o /tmp/pipe_toslave | some_prog

on the remote host:

dar_slave -i /tmp/pipe_toslave -o /tmp/pipe_todar full_backup
(dar_slave provides the full_backup archive for dar's -A option)

some_other_prog | dar_xform - diff -s 140M -p ...
(while dar_xform cuts the archive produced by dar into slices)

See below an example with netcat and another using ssh.



dar and ssh

As reported "DrMcCoy" in the historical forum "Dar Technical Questions", the netcat program can be very helpful if you plane to backup over the network.

The context of the following examples is this: a "local" host named "flower" has to be backed up to, or restored from, a remote host called "honey" (OK, the names of the machines are silly...).

Example of use with netcat. Note that the netcat command name is "nc".

Creating a full backup of "flower" saved on "honey"
on honey:
nc -l -p 5000 > backup.1.dar

then on flower:
dar -c - -R / -z | nc -w 3 honey 5000

but this will produce only one slice; instead you could use the following to have several slices on honey:

on honey:
nc -l -p 5000 | dar_xform -s 10M -S 5M -p - backup

on flower:
dar -c - -R / -z | nc -w 3 honey 5000

by the way, note that dar_xform can also launch a user script between slices, exactly the same way as dar does, thanks to the -E and -F options.
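For instance (a minimal sketch, the echo command being only a placeholder for a real notification), to be told as each slice lands on honey:

on honey:
nc -l -p 5000 | dar_xform -s 10M -E "echo slice %n of %b written" - backup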

Testing the archive
testing the archive can be done on honey, but you could also do it remotely, even if it is not very interesting to do it that way!

on honey:
nc -l -p 5000 | dar_slave backup | nc -l -p 5001

on flower:
nc -w 3 honey 5001 | dar -t - | nc -w 3 honey 5000

note also that dar_slave can run a script between slices: if for example you need to load slices from a robot, this can be done automatically, or if you just want to mount/unmount a removable medium, eject or load it and ask the user to change it...

Comparing with original filesystem
on honey:
nc -l -p 5000 | dar_slave backup | nc -l -p 5001

on flower:
nc -w 3 honey 5001 | dar -d - -R / | nc -w 3 honey 5000

Making a differential backup
Here the problem is that dar needs two pipes to send orders to and read data from dar_slave, and a third pipe to write out the new archive. This cannot be realized with stdin and stdout alone as previously. Thus we will need a named pipe (created by the mkfifo command).

on honey:
nc -l -p 5000 | dar_slave backup | nc -l -p 5001
nc -l -p 5002 | dar_xform -s 10M -p - diff_backup

on flower:
mkfifo toslave
nc -w 3 honey 5000 < toslave &
nc -w 3 honey 5001 | dar -A - -o toslave -c - -R / -z | nc -w 3 honey 5002


with netcat the data goes in clear over the network. You could use ssh instead if you want encryption over the network. The principle is the same.

Example of use with ssh

Creating a full backup of "flower" saved on "honey"
we assume you have an sshd daemon running on flower.
on honey:
ssh flower dar -c - -R / -z > backup.1.dar

or still on honey:
ssh flower dar -c - -R / -z | dar_xform -s 10M -S 5M -p - backup

Testing the archive
on honey:
dar -t backup

or from flower (assuming you have an sshd daemon on honey):

ssh honey dar -t backup

Comparing with original filesystem
on flower:
mkfifo todar toslave
ssh honey dar_slave backup > todar < toslave &
dar -d - -R / -i todar -o toslave


Important: depending on the shell you use, it may be necessary to invert the order in which "> todar" and "< toslave" are given on the command line. The problem is that the shell hangs trying to open the pipes. Thanks to "/PeO" for his feedback.

or on honey:
mkfifo todar toslave
ssh flower dar -d - -R / > toslave < todar &
dar_slave -i toslave -o todar backup


Making a differential backup
on flower:
mkfifo todar toslave
ssh honey dar_slave backup > todar < toslave &

and on honey:
ssh flower dar -c - -A - -i todar -o toslave > diff_linux.1.dar
or
ssh flower dar -c - -A - -i todar -o toslave | dar_xform -s 10M -S 5M -p - diff_linux


Bytes, bits, kilo, mega etc.


You probably know a bit about the metric system, where a dimension is expressed by a base unit (the meter for distance, the liter for volume, the Joule for energy, the Volt for electrical potential, the bar for pressure, the Watt for power, the second for time, etc.), and scaled using prefixes:

      prefix (symbol) = ratio
    ================
deci  (d) = 0.1
centi (c) = 0.01
milli (m) = 0.001
micro (u) = 0.000,001 (symbol is not "u" but the "mu" Greek letter)
nano  (n) = 0.000,000,001
pico  (p) = 0.000,000,000,001
femto (f) = 0.000,000,000,000,001
atto  (a) = 0.000,000,000,000,000,001
zepto (z) = 0.000,000,000,000,000,000,001
yocto (y) = 0.000,000,000,000,000,000,000,001
deca (da) = 10
hecto (h) = 100
kilo  (k) = 1,000  (yes, this is a lower case letter, not an upper case!)
mega  (M) = 1,000,000
giga  (G) = 1,000,000,000
tera  (T) = 1,000,000,000,000
peta  (P) = 1,000,000,000,000,000
exa   (E) = 1,000,000,000,000,000,000
zetta (Z) = 1,000,000,000,000,000,000,000
yotta (Y) = 1,000,000,000,000,000,000,000,000

This way two milliseconds (noted "2 ms") are 0.002 second, and 5 kilometers (noted "5 km") are 5,000 meters. All was fine and nice until computer science appeared: in that discipline, the need to measure the size of information storage arose. The smallest unit of size is the bit (contraction of binary digit), binary because it has two possible states: "0" and "1". Grouping bits by 8, computer scientists called the result a byte, or an octet. A byte has 256 different states (2 to the power 8). The ASCII (American Standard Code for Information Interchange) code arrived and assigned a letter or, more generally, a character to certain well defined values of a byte (A is assigned to 65, space to 32, etc.). And as most text is composed of sets of characters, people started to count sizes in bytes. Time after time, following technology evolution, memory sizes approached 1000 bytes.

But as memory is accessed through a bus, which is a fixed number of cables (or integrated circuits) on which only two possible voltages are allowed (to mean 0 or 1), the total number of bytes that a bus can address is always a power of 2. With a two cable bus, you can have 4 values (00, 01, 10 and 11, where a digit is the state of a cable), so you can address 4 bytes. Giving a value to each cable defines an address to read or write in the memory. Unfortunately 1000 is not a power of 2, so as memories approached 1000 bytes it was decided that a "kilobyte" would be 1024 bytes, which is 2 to the power 10. Some time after, and by extension, a megabyte was defined to be 1024 kilobytes, a gigabyte to be 1024 megabytes, etc., with the exception of the "1.44 MB" floppy, whose capacity is 1440 kilobytes: there "mega" means 1000 kilo...

In parallel, in the telecommunications domain, going from analog to digital signals brought the bit into use as well. In place of the analog signal came a flow of bits representing the samples of the original signal. For telecommunications the problem was rather one of flow rate: how many bits could be transmitted per second. In ancient times appeared 1200 bits per second, then 64000, also designated as 64 kbit/s. Thus here, kilo keeps its usual meaning of 1000 times the base unit. You can also find Ethernet at 10 Mbit/s, which is 10,000,000 bits per second, same thing with Token-Ring, which had rates of 4, 16 or 100 Mbit/s (4,000,000, 16,000,000 or 100,000,000 bits/s). But even for telecommunications, kilo is not always 1000 times the base unit: the E1 bandwidth at 2 Mbit/s, for example, is in fact 32*64 kbit/s, thus 2048 kbit/s... not 2000 kbit/s.

Anyway, back to dar: you have the possibility to give a size in bytes, or using a single letter as suffix (k, M, G, T, P, E, Z, Y, the base unit being implicitly the byte), thus the possibility to provide a size in kilo, mega, giga, tera, peta, exa, zetta or yotta bytes, with the computer science definition of these terms (powers of 1024) by default.

These suffixes exist for simplicity, so that you do not have to compute powers of 1024 yourself. For example, if you want to fill a CD-R you can use the "-s 650M" option, which is equivalent to "-s 681574400"; choose the one you prefer, the result is the same :-). Now, if you want 2 megabyte slices in the sense of the metric system, simply use "-s 2000000", or read about the option described below:
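To check the arithmetic yourself (a plain shell sketch):

echo $((650 * 1024 * 1024))    # 681574400 : the default, binary meaning of 650M
echo $((650 * 1000 * 1000))    # 650000000 : the SI meaning selected by -aSI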

Starting with version 2.2.0, you can alter the meaning of all the suffixes used by dar, using the following option:

--alter=SI-units

(which can be shortened to -aSI or -asi). It changes the meaning of the prefixes that follow on the command-line to the metric system (Système International) way of counting, up to the end of the line or up to a

--alter=binary-units

argument (which can be shortened to -abinary), after which we are back to the computer science meaning of kilo, mega, etc., up to the end of the line or up to the next --alter=SI-units. Thus, in place of -s 2000000 one could use:

   -aSI -s 2M
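Note that these options act on what follows them on the line, so both conventions can coexist in a single command. A hypothetical illustration (the archive name and sizes are just examples):

dar -c archive -R /some/dir -S 2M -aSI -s 2M

here the first slice (-S, given before -aSI) is 2 MiB (2097152 bytes), while the following slices (-s) are 2 SI megabytes (2000000 bytes).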


Yes, and to make things more confusing, marketing/sales arrived and made sellers count gigabytes a third way: I remember some time ago I bought a hard disk described as "2.1 GB" (OK, that's now long ago!), but in fact it had only 2097152 kilobytes available. This is far from 2202009 kilobytes (= 2.1 GiB in the computer science meaning), and a bit more than 2,000,000 kilobytes (metric system). OK, if it had held those 2202009 kilobytes (computer science meaning of 2.1 GB), this hard disk would have been sold under the label "2.5 GB"! ... just kidding :-)

Note that to distinguish the power-of-1024 kilo, mega, giga and so on from their metric counterparts, new prefixes have been officially defined, but they are not used within dar:
Ki = 1024
Mi = 1024*1024
Gi = 1024*1024*1024
Ti = 1024*1024*1024*1024
Pi, Ei, Zi, Yi = and so on...

For example, we have 1 KiB for 1 kibibyte (= 1024 bytes) and 1 Kibit for 1 kibibit (= 1024 bits), beside 1 kB (= 1000 bytes) and 1 kbit (= 1000 bits), ...



Running DAR in background


DAR can be run in background:

dar [command-line arguments] < /dev/null &
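For example (with hypothetical paths), a compressed backup launched in the background, its messages kept in a log file:

dar -c /var/backups/full -R / -z < /dev/null > /var/log/dar_full.log 2>&1 &

Redirecting stdin from /dev/null makes dar fail rather than wait forever for an answer it can never receive, should it need to ask a question.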



File extensions used

dar suite programs use several types of files:
  • slices (dar, dar_xform, dar_slave, dar_manager)
  • configuration files (dar, dar_xform, dar_slave)
  • databases  (dar_manager)
  • user commands for slices (dar, dar_xform, dar_slave, using -E, -F or -~ options)
  • user commands for files (dar only, during the backup process using -= option)
  • filter lists (dar's -[ and -] options)
While for slices the extension and even the filename format cannot be customized (basename.slicenumber.dar), there is no mandatory rule for the other types of files.

In case you have no idea how to name these, here are the extensions I use:
"*.dcf": Dar Configuration file, aka DCF files (used with dar's -B option)
"*.dmd": Dar Manager Database, aka DMD files (used with dar_manager's -B and -C options)
"*.duc": Dar User Command, aka DUC files (used with dar's -E, -F, -~ options)
"*.dbp": Dar Backup Preparation, aka DBP files (used with dar's -= option)
"*.dfl": Dar Filter List, aka DFL files (used with dar's -[ or -] options)

but you are totally free to use the filenames you want!   ;-)




Running commands or scripts from DAR


You can run commands from dar at two different places:
  • between slices: when dar has finished writing a slice (in backup, isolation or merging modes), or before dar needs a slice (in reading modes: testing, diffing, extracting, ..., and when reading an archive of reference) — these are the DUC files
  • before and after saving a given file during the backup process (DBP files)

A - Between slices:

This concerns the -E, -F and -~ options. They all receive a string as argument. Thus, if the argument must be a command with its own arguments, you have to put these between quotes so that they appear as a single string to the shell that interprets the dar command-line. For example, if you want to call

df .

[This is two words: "df" (the command) and "." (its argument)] then you have to use the following on the DAR command-line:

-E "df ."
or
-E 'df .'


DAR provides several substitution strings in that context:
  • %% is replaced by a single %. Thus if you need a % in your command line, you MUST write it as %% in the argument string of -E, -F or -~
  • %p is replaced by the path to the slices
  • %b is replaced by the basename of the slices
  • %n is replaced by the number of the slice
  • %N is replaced by the number of the slice with padded zeros (it may differ from %n only when --min-digits option is used)
  • %c is replaced by the context, which is "init", "operation" or "last_slice" depending on the situation (detailed below)
The number of the slice (%n) is either the just-written slice or the next slice to be read. For example, if you create a new archive (using -c, -C or -+), the %n macro in the -E option is the number of the last completed slice. Otherwise (using -t, -d, -A (with -c or -C), -l or -x), this is the number of the slice that will be required very soon. The %c macro (the context) is substituted as follows (a small usage sketch follows this list):

  • init : when the slice is requested before the catalogue has been read
  • operation : once the catalogue has been read and/or data treatment has begun
  • last_slice : when the last slice has been written (archive creation only)
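As a minimal sketch (the echo command being just a placeholder for a real action), the %c macro can be tested directly in the string given to -E, to act only once the archive is complete:

-E "if [ %c = last_slice ]; then echo archive completed; fi"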

What is the use of this feature? Suppose, for example, that you want to burn the brand-new slices on CD as soon as they are available.

let's build a little script for that:

%cat burner
#!/bin/bash

if [ "$1" == "" -o "$2" == "" ] ; then
  echo "usage: $0 <filename> <number>"
  exit 1
fi

mkdir T
mv "$1" T
mkisofs -o /tmp/image.iso -r -J -V "archive_$2" T
cdrecord dev=0,0 speed=8 -data /tmp/image.iso
rm /tmp/image.iso
# Now assuming an automounter will mount the freshly burnt CD:
if diff "/mnt/cdrom/$(basename "$1")" "T/$(basename "$1")" ; then
  rm -rf T
else
  exit 2
fi
%

This little script receives the slice filename and its number as arguments; it burns a CD with the slice and compares the resulting CD with the original file. Upon failure, the script returns 2 (or 1 if the syntax is not correct on the command-line). Note that this script is only here for illustration; there are many more interesting user scripts made by several dar users. These are available in the examples part of the documentation.

One could then use it this way:

-E "./burner %p/%b.%n.dar %n"

which can lead to the following DAR command-line:

dar -c ~/tmp/example -z -R / usr/local -s 650M -E "./burner %p/%b.%n.dar %n" -p

First note that as our script does not change the CD in the device, we need to pause between slices (-p option). The pause takes place after the execution of the command (-E option). Thus we could add in the script a command to send a mail or play music to inform us that the slice has been burnt. The advantage here is that we do not have to attend twice per slice: once when the slice is ready and once when it has been burnt.

Another example:

you want to send a huge file by email (OK, it would be better to use FTP, but sometimes people think that the less you can do, the more they control you, and thus they disable many services, either by fear of the unknown or by stupidity). So let's suppose that you only have email available to transfer your data:

dar -c toto -s 2M my_huge_file -E "uuencode %b.%n.dar %b.%n.dar | mail -s 'slice %n' your@email.address ; rm %b.%n.dar ; sleep 300"

Here we make an archive with slices of 2 megabytes, because our mail system does not allow larger emails. We save only one file: "my_huge_file" (but we could even save the whole filesystem, it would also work). The command we execute each time a slice is ready is:

  1. uuencode the file and send the output by email to our address,
  2. remove the slice,
  3. wait 5 minutes, so as not to overload the mail system too much; this is also useful if you have a small mailbox, from which it takes time to retrieve mail.
Note that we did not use the %p substitution string, as the slices are saved in the current directory.

The last example concerns extraction: in case the slices cannot all be present in the filesystem, you need a script or a command to fetch each slice just before it is requested. It could use ftp, lynx, ssh, etc. I let you write the script as an exercise :-) (or see the sketch just below). Note: if you plan to share your DUC files, please follow the convention for DUC files (described below).
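Here is nevertheless one possible minimal sketch (the hostname, the remote path and the use of scp are hypothetical):

%cat fetch.duc
#!/bin/sh
# fetch.duc: expected to be called by dar as: fetch.duc %p %b %n %e %c
# downloads the slice dar is about to request from a remote store
if [ "$5" = "" ]; then
  echo "usage: $0 <path> <basename> <number> <extension> <context>"
  exit 1
fi
scp "backupserver:/archives/$2.$3.$4" "$1/"
%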

B - Before and after saving a file:

This concerns the -=, -< and -> options. The -< (include) and -> (exclude) options let you define which files will need a command to be run before and after their backup, while the -= option lets you define which command to run for those files.

Let's suppose you have a very large, often-changing file located in /home/my/big/file, and several databases, each consisting of several files under /home/*/database/data, that need to have a coherent status and also change very often.

Saving them without precaution will most probably get your big file flagged as "dirty" in dar's archive, which means that the saved status of the file may be a status that never existed for that file: when dar saves a file it reads the first byte, then the second, etc. up to the end of the file. While dar is reading the middle of the file, an application may change the very beginning and then the very end of that file, but only the modified end of the file will be saved, leading the archive to contain a copy of the file in a state it never had.

For a database this is even worse: two or more files may need to have a coherent status. If dar saves a first file while another file is modified at the same time, this will not necessarily get the saved files flagged as "dirty", but it may lead the database to have its files saved in states that are incoherent with one another, thus leaving you with a backup of the database in a corrupted state.

For that situation not to occur, we will use the following options:

-R / "-<" home/my/big/file  "-<" "home/*/database/data"

First, you must pay attention to quote the -< and -> options so that the shell does not interpret them as redirections to stdout or from stdin. Back to the example: this says that for the file /home/my/big/file and for any "database/data" directory (or file) in the home directory of a user, a command will be run before and after saving that directory or file. We thus need to define the command to run, using the following option:

-= "/root/scripts/before_after_backup.sh %f %p %c"

Well, as you see, here too we may (and should) use substitution macros:
  • %% is replaced by a literal %
  • %p is replaced by the full path (including filename) of the file/directory to be saved
  • %f is replaced by the filename (without path) of the file/directory to be saved
  • %u is replaced by the uid of the file
  • %g is replaced by the gid of the file
  • %c is replaced by the context, which is either "start" or "end" depending on whether the file/directory is about to be saved or has been completely saved.

 And our script here could look like this:

cat /root/scripts/before_after_backup.sh
#!/bin/sh

if [ "$3" = "" ]; then
   echo "usage: $0 <filename> <dir+filename> <context>"
   exit 1
fi

# for better readability:
filename="$1"
path_file="$2"
context="$3"

if [ "$filename" = "data" ]; then
   if [ "$context" = "start" ]; then
       : # action to stop the database located in "$path_file"
   else
       : # action to restart the database located in "$path_file"
   fi
else
   if [ "$path_file" = "/home/my/big/file" ]; then
     if [ "$context" = "start" ]; then
       : # suspend the application that writes to that file
     else
       : # resume the application that writes to that file
     fi
   else
     : # do nothing, or warn that no action is defined for that file
   fi
fi


So now, if we run dar with all these options, dar will execute our script once before entering any database/data directory located in the home directory of a user, and once after all files of that directory have been saved. It will also run our script before and after saving our /home/my/big/file file.
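Putting the pieces together, the whole backup command could look like this (the archive name is hypothetical):

dar -c /var/backups/daily -R / -z "-<" home/my/big/file "-<" "home/*/database/data" -= "/root/scripts/before_after_backup.sh %f %p %c"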

If you plan to share your DBP files, please follow the DBP convention (described below).



Convention for DUC files

Since version 1.2.0, dar users can have dar call a command or script between slices, thanks to the -E, -F and -~ options; the scripts they call are known as DUC files. To be able to easily share your DUC commands or scripts, I propose the following convention:

- use the ".duc" extension to show anyone the script/command respect the following
- must be called from dar with the following arguments:

example.duc %p %b %n %e %c [other optional arguments]

- when called without arguments, it must provide brief help on what it does and what the expected arguments are. This is the standard "usage:" convention.
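A minimal skeleton respecting this convention could look as follows (the script name and its action are placeholders):

%cat example.duc
#!/bin/sh
# expected call from dar: example.duc %p %b %n %e %c [other optional arguments]
if [ $# -lt 5 ]; then
  echo "usage: $0 <path> <basename> <slice number> <extension> <context>"
  exit 1
fi
# act here on the slice file "$1/$2.$3.$4", knowing the context "$5"
exit 0    # a zero exit status tells dar that all went well
%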

Then, any user could share their DUC files without having to bother much about how to use them. Moreover, it would be easy to chain them:

if for example two persons created their own scripts, one "burn.duc" which burns a slice on DVD-R(W) and one "par.duc" which makes a Parchive redundancy file from a slice, anybody could use both at a time by giving the following argument to dar:

-E "par.duc %p %b %n %e %c 1 ; burn.duc %p %b %n %e %c"

or since version 2.1.0 with the following argument:

-E "par.duc %p %b %n %e %c 1" -E "burn.duc %p %b %n %e %c"

of course, a script does not have to use all its arguments; in the case of burn.duc for example, the %c (context) is probably useless and not used inside the script, while it is still possible to give it all the "normal" arguments of a DUC file: extra unused arguments are simply ignored.

If you have interesting DUC scripts, you are welcome to contact me by email, so I can add them to the web site and to the following releases. For now, check the doc/samples directory for a few examples of DUC files.

Note that all DUC scripts are expected to return an exit status of zero, meaning that the operation has succeeded. If another exit status is returned, dar asks the user for a decision (or aborts if no user interaction is possible, for example when dar is not run from a controlling terminal).



Convention for DBP files

Same as above, the following convention is proposed to ease the sharing of Dar Backup Preparation files:


- use the ".dbp" extension to show anyone the script/command respect the following
- must be called from dar with the following arguments:

example.dbp %p %f %u %g %c [other optional arguments]

- when called without arguments, it must provide brief help on what it does and what the expected arguments are. This is the standard "usage:" convention.

Identically to DUC files, DBP files are expected to return an exit status of zero; otherwise the backup process is suspended for the user to decide whether to retry, ignore the failure, or abort the whole backup process.



User targets in DCF

Since release 2.4.0, a DCF file (one given to the -B option) can contain user targets. A user target is an extension of the conditional syntax, so we will first briefly review the conditional syntax.

Conditional syntax in DCF files:

The conditional syntax gives the possibility to have options in a DCF file that are only active in a certain context:
  • archive extraction
  • archive creation
  • archive listing
  • archive testing
  • archive isolation
  • archive merging
  • no action yet defined
  • all context
  • when an archive of reference is used
  • when an auxiliary archive of reference is used
These contexts correspond to reserved keywords (see dar's man page for an exhaustive list). Let's take an example:

cat sample.dcf
# this is a comment

all:
-K aes:

extract:
-R /

reference:
-J aes: 

auxilliary:
-~ aes:

create:
-ac
-Z "*.mp3"
-Z "*.avi"

default:
-V

This way, the -Z options are only used when creating an archive, while the -K option is used in any case. Well, now that we have briefly reviewed the conditional syntax, you may have guessed that new "targets" (or keywords, if you prefer) can be added. Let's add the following in our DCF file:

compress:
-z lzo:5

In the usual situation, all that follows the target "compress", up to the next target or the end of the file, is not used to configure dar, unless you provide the "compress" keyword on the command-line:

dar -c test -B sample.dcf compress

Which will do exactly the same as if you had typed:

dar -c test -z lzo:5

Of course, you can use as many user targets as you wish in your files; the only constraint is that they must not bear the name of a reserved keyword of the conditional syntax. You can also mix conditional syntax and user targets. Here follows an example:

cat sample.dcf
# this is a comment

all:
-K aes:

extract:
-R /

reference:
-J aes: 

auxilliary:
-~ aes:

create:
-ac
-Z "*.mp3"
-Z "*.avi"

default:
-V

# our first user target named "compress":
compress:
-z lzo:5

# a second user target named "verbose":
verbose:
-v
-vs

# a third user target named "ring":
ring:
-b

# a last user target named "hash":
hash:
--hash sha1

So now, you can use dar and activate a set of commands by simply adding the name of the target on the command-line:

dar -c test -B sample.dcf compress ring verbose hash

which is equivalent to:

dar -c test -K aes: -ac -Z "*.mp3" -Z "*.avi" -z lzo:5 -v -vs -b --hash sha1

Last, for those who like complicated things, you can recursively use DCF files inside user targets, which may themselves contain conditional syntax and the same or other user targets of your own.





Using data protection with DAR & Parchive

Parchive (PAR in the following) is a very nice program that makes it possible to recover a file which has been corrupted. It creates redundancy data stored in a separate file (or set of files), which can be used to repair the original file. This additional data may itself be damaged: PAR will still be able to repair the original file as well as the redundancy files, up to a certain point, of course. This point is defined by the percentage of redundancy you defined for a given file. For more details, check the official PAR site.
Since version 2.4.0, dar is provided with a default /etc/darrc file. It contains a set of user targets, among which is "par2". This user target invokes the dar_par.dcf file provided beside dar, which automatically creates parity files for each slice during backup, and verifies and, if necessary, repairs slices when testing an archive. So now you only need to use dar this way to activate Parchive with dar:

dar [options] par2

Simple no?
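For example (the archive name and options are hypothetical), a compressed backup cut into CD-sized slices, each protected by its parity data:

dar -c /backup/full_monday -R / -z -s 650M par2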



Examples of file filtering

File filtering is what defines which files are saved, listed, restored, compared, tested, and so on. In brief, in the following we will say which files are elected for the operation, where "operation" means a backup, a restoration, an archive contents listing, an archive comparison, etc.

File filtering is done using the following options: -X, -I, -P, -R, -[, -] or -g.

OK, let's start with some concrete examples:

dar -c toto

this will back up the current directory and everything located in it, building the toto archive, also located in the current directory. Usually you should get a warning telling you that you are about to back up the archive itself.

Now let's see something less obvious:

dar -c toto -R / -g home/ftp

the -R option tells dar to consider all files under the / root directory, while the -g "home/ftp" argument tells dar to restrict the operation to the home/ftp subdirectory of the given root directory, thus here /home/ftp.

But this is a little bit different from the following:

dar -c toto -R /home/ftp

here dar will save any file under /home/ftp without any restriction. So what is the difference? Yes, exactly the same files will be saved as just above, but the file /home/ftp/welcome.msg, for example, will be stored as <ROOT>/welcome.msg, where <ROOT> will be replaced by the argument given to the -R option (which defaults to ".") at restoration or comparison time. In the previous example the same file would have been stored with the path <ROOT>/home/ftp/welcome.msg.

dar -c toto -R / -P home/ftp/pub -g home/ftp -g etc

as previously, but the -P option excludes all files under /home/ftp/pub from the operation. Additionally, the /etc directory and its subdirectories are saved.

dar -c toto -R / -P etc/password -g etc

here we save all of /etc except the /etc/password file. Arguments given to -P can be plain files too, but when they are directories, the exclusion applies to the directory itself and its contents. Note that using -X to exclude "password" does not have exactly the same effect:

dar -c toto -R / -X "password" -g etc

will save all of the /etc directory except any file whose name is "password". Thus, of course, /etc/password will not be saved, but /etc/rc.d/password, if it exists, will not be saved either, unless it is a directory. Yes, if a directory /etc/rc.d/password exists, it will not be affected by the -X option: like the -I option, the -X option does not apply to directories. The reason is to be able to filter certain kinds of files without excluding particular directories. Suppose, for example, you want to save all MP3 files and only MP3 files:

dar -c toto -R / -I "*.mp3" -I "*.MP3" home/ftp

will save any file ending in mp3 or MP3 under the /home/ftp directory and its subdirectories. If instead -I (or -X) applied to directories, we would only be able to recurse into subdirectories ending in ".mp3" or ".MP3". If you had a directory named "/home/ftp/Music", for example, full of mp3 files, you would not have been able to save it.

Note that glob expressions (where the shell-like wild-cards '*', '?' and so on come from) can do much more complicated things, like "*.[mM][pP]3". You could thus replace the previous example by:

dar -c toto -R / -I "*.[mM][pP]3" home/ftp

this would cover all .mp3, .mP3, .Mp3 and .MP3 files. One step further, the -acase option makes the filtering arguments that follow it case sensitive (which is the default), while -ano-case (-an for short) switches to case insensitive mode for the filtering arguments that follow it. In short, we could have:

dar -c toto -R / -an -I "*.mp3" home/ftp

And instead of glob expressions you can use regular expressions (regex), thanks to the -aregex option. You can also alternate both kinds, using -aglob to return to glob expressions. Each -aregex / -aglob option defines the expected type of expression for the -I/-X/-P/-g/-u/-U/-Z/-Y options that follow it, up to the end of the line or up to the next -aregex / -aglob option.

Last a more complete example:

dar -c toto -R / -P "*/.mozilla/*/[Cc]ache" -X ".*~" -X ".*~" -I "*.[Mm][pP][123]" -g home/ftp -g "fake"

so what ?

OK, here we save everything under /home/ftp and /fake, but we do not save the contents of "*/.mozilla/*/[Cc]ache" directories (like, for example, the "/home/ftp/.mozilla/ftp/abcd.slt/Cache" directory and its contents). Within the saved directories we save only files matching "*.[Mm][pP][123]", except those ending with a tilde ('~' character); thus, for example, files named "toto.mp3" or ".bloup.Mp2" are saved.

Now the inside algorithm:

 a file is elected for the operation if:
 1 - its name does not match any -X option, or it is a directory,
*and*
 2 - if some -I options are given, the file is either a directory or matches at least one of them,
*and*
 3 - its path and filename do not match any -P option,
*and*
 4 - if some -g options are given, the path to the file matches at least one of them.

The algorithm detailed above is the default one, historically called the unordered method. Since version 2.2.x there is also an ordered method (activated by adding the -am option), which gives even more power to filters; the dar man page will give you all the details.

In parallel to file filtering, you will find Extended Attributes filtering thanks to the -u and -U options (they work the same as the -X and -I options but apply to EA). You will also find file compression filtering (-Z and -Y options), which defines which files to compress or not to compress; here too they work the same way as the -X and -I options, and the -ano-case / -acase options, as well as the -am option, also apply. Last, all these filterings (file, EA, compression) can also use regular expressions in place of glob expressions (thanks to the -ag / -ar options).

Note, as a very last point, that the --backup-hook-include and --backup-hook-exclude options act the same as the -P and -g options, but apply to the files about to be saved and give the user the possibility to perform an action (--backup-hook-execute) before and after saving the files matching these masks. The dar man page will give you all the necessary details to use this feature.




Decremental Backup


Well, you have already heard about "full" backups, in which all files are completely saved in such a way that you can use this backup alone to completely restore your data. You have also probably heard about "differential" backups, in which only the changes that occurred since an archive of reference was made are stored. There are also "incremental" backups which, in substance, are the same as "differential" ones. The difference resides in the nature of the archive of reference: "differential" backups use only a "full" backup as reference, while "incremental" backups may use a "full" backup, a "differential" backup or another "incremental" backup as reference (well, in dar's documentation the term "differential" is commonly used in place of "incremental", since there is no conceptual difference from the point of view of the dar software).

Well, here we will describe what is meant by "decremental" backup. It all started with a feature request from Yuraukar on the dar-support mailing-list:

In the full/differential backup scheme, for a given file you have as many versions as changes that were detected from backup to backup. That is fair in terms of storage space required, as you do not store the same file state twice, as you would when doing only full backups. But the drawback is that you do not know in advance in which backup to find the latest version of a given file. Moreover, if you want to restore your entire system to the latest state available from your backup set, you need to restore the most ancient backup (the last full backup), then the others one by one in chronological order (the incremental/differential backups). This may take some time, yes, and it is moreover inefficient, because you will restore N old revisions of a file that changed often before restoring its last and most recent version.

Yuraukar's idea was to have all the latest versions of files in the latest backup made. Thus the most recent archive would always stay a full backup. But to still be able to restore a file in an older state than the most recent one (in case of accidental suppression), we need a so-called decremental backup. This backup's archive of reference is in the future (a more recent decremental backup, or the latest backup made, which is a full backup in this scheme). Such a "decremental" backup stores all the file differences from this archive of reference that let you go from the reference state back to an older state.

Assuming it is more probable that you will restore the latest version of a filesystem than any older state available, decremental backups seem an interesting alternative to incremental backups, as in that case you only have to use one archive (the latest), and each file gets restored only once (old data does not get overwritten at each archive restoration, as is the case with incremental restoration).

Let's take an example: we have 4 files in the system named f1, f2, f3 and f4. We make backups at four different times t1, t2, t3 and t4, in chronological order. We will also perform some changes in the filesystem along this period: f1 will be removed from the system between t3 and t4, while f4 will appear between t3 and t4. f2 will be modified between t2 and t3, while f3 will be changed between t3 and t4.

All this can be represented this way, where each row gives the state at a given date and each column represents a given file.
 
time
   ^
   |                       * represents the version 1 of a file
t4 +         #    #    *   # represents the version 2 of a file
   |
t3 +    *    #    *  
   |
t2 +    *    *    *
   |
t1 +    *    *    *
   |
   +----+----+----+----+---
        f1   f2   f3   f4  


Now we will represent the contents of the backups made at these different times, first using only full backups, then using incremental backups, and last using decremental backups. We will use the symbol '0' in place of data when a given file's data is not stored in the archive because it has not changed since the archive of reference was made. We will also use an 'x' when the archive records that a given file has been deleted since the archive of reference was made. This information is used at restoration time to remove the file from the filesystem, so as to reproduce the exact state the files had at the date the backup was made.

FULL BACKUPS

   ^
   |
t4 +         #    #    *           
   |
t3 +    *    #    *  
   |
t2 +    *    *    *
   |
t1 +    *    *    *
   |
   +----+----+----+----+---
        f1   f2   f3   f4  

Yes, this is easy: each backup contains all the files that existed at the time the backup was made. To restore the system in the state it had at a given date, we only use one backup, the one that best matches the date we want. The drawback is that we saved the files f1 and f3 in version 1 three times each, and f2 in version 2 twice, which is a waste of storage space.


FULL/INCREMENTAL BACKUPS


   ^
   |
t4 +    x    0    #    *     0 represents a file recorded as unchanged:
   |                         no data is stored in the archive, very
t3 +    0    #    0          little space is consumed by such an entry
   |
t2 +    0    0    0
   |
t1 +    *    *    *
   |
   +----+----+----+----+---
        f1   f2   f3   f4  

Now we see that the archive made at date t2 does not contain any data, as no change was detected between t1 and t2. This backup is quite small and needs only little storage. The archive made at t3 only stores f2's new version, and at t4 the archive stores the new file f4 and f3's new version. We also see that in the t4 archive f1 is recorded as removed from the filesystem, as it no longer exists in the filesystem while it exists in the archive of reference made at t3.

As you see, restoring to the latest state is more complicated compared to only using full backups, and it is not simple either to know in which backup to look for a given file's data at, say, date t3; but yes, we no longer waste storage space. The restoration process the user has to follow is to restore in turn:
- the archive made at t1, which will put in place the old versions of the files, including f1, which will later be removed,
- the archive made at t2, which will do nothing at all,
- the archive made at t3, which will replace f2's old version by its newer one,
- the archive made at t4, which will remove f1, add f4 and replace f3's old version by its latest version.

The latest versions of the files are scattered over the two last archives here, but on typical systems much of the data does not change at all and can only be found in the first backup (the full backup).

FULL/DECREMENTAL BACKUP

Here are represented the contents of the backups using the decremental approach. The most recent backup (t4) is always a full backup. Older backups are decremental backups based on the immediately more recent one (t3 is a difference based on t4, t1 is a difference based on t2). Unlike incremental backups, the archive of reference is in the future, not in the past.

   ^
   |
t4 +         #    #    *           
   |
t3 +    *    0    *    x
   |
t2 +    0    *    0
   |
t1 +    0    0    0
   |
   +----+----+----+----+---
        f1   f2   f3   f4  

Thus obtaining the latest version of the system is as easy as with full backups only. And you also see that the space required to store these decremental backups is equivalent to what is needed to store the incremental backups. The problem of locating the archive in which to find a given file's data at a given date still exists, however. But you may also note that the backup made at time t1 can safely be removed, as it became useless: it does not store any data, and losing the archives made at t1 and t2 is not a big problem, you just lose old states.

Now, if we want to restore the filesystem in the state it had at time t3, we have to restore the archive made at t4, then the archive made at t3. This last step will create f1, replace f3 by its older version, and delete f4, which did not exist at time t3 (the file is marked 'x', meaning that it has to be removed). If we want to go further in the past, we restore the decremental backup t2, which will only replace f2's version 2 by the older version 1. Last, restoring t1 will have no effect, as no change was made between t1 and t2.

This was for the theory. Now let's see the practice on how to build these decremental backups.

Assuming you have a full backup describing your system at date t1, can we, in one shot, obtain both the new full backup for time t2 and transform the full backup of time t1 into a decremental backup relative to time t2? In theory, yes. But there is a risk in case of failure (filesystem full, lack of electrical power, bug, ...): you may lose both backups, the one under construction as well as the one taken as reference, which was in the process of transformation into a decremental backup.

Another point is that you cannot shrink a given file: many (all?) operating systems provide hooks to create a file and to append or overwrite data in an existing file, but not to remove data from it and get a smaller file as a result. When needed, this operation is usually emulated by applications: a temporary file is created, into which the part to retain from the original file is copied; once the copy is finished, the original file is deleted and the temporary file is renamed to take the place of the original one. Thus the process of transforming a full backup into a decremental backup will not simply remove data: it will copy (thus add) the portion of data to keep into a new file, and only then remove the old data. So whatever the method used to make a decremental backup, you will at some point end up with two full archives, and will need the disk space to store both of them.

Given this, the dar implementation lets the user do a normal full backup at each step [doing just a differential backup sounds better at first, but it would end in more archive manipulation, as we would have to generate both the decremental backup and the new full backup, manipulating at least the same amount of data]. Then, with the two full backups, the user uses archive merging to create the decremental backup (using the -ad option). Last, once the resulting (decremental) archive has been tested and the user is sure this decremental backup is viable, he can remove the older full backup and store the new decremental backup beside the older ones and the new full backup. This last step only is what saves disk space, while letting you easily recover your system using the latest (full) backup.

Can one use an extracted catalogue instead of the old full backup to perform a decremental backup? No. The full backup to transform must hold the whole data in order to produce a decremental backup with data in it. Only the new full backup can be replaced by its extracted catalogue.

Now, let's look at the implementation used in dar to build a decremental backup. The operations the merging must follow to transform a full backup into a decremental backup are the following, assuming the archive of reference (-A option) is the old full backup and the auxiliary archive of reference (-@ option) is the new full backup:
- if a file is found in both archives and has the same date of modification, we just store it as "unchanged" since the archive of reference was made; if the dates differ, we keep the file from the old archive (the -A archive). Same thing with EA: if both have the same date, we mark the EA as "unchanged", else we keep the EA from the old archive.
- if a file is found only in the old archive, then we keep its data/EA in the old archive
- if a file is found only in the new archive, then we store an entry in the resulting archive to record that this file did not exist at the time of the old backup and that it must be destroyed from filesystem at restoration time of this decremental backup.

Well, the only thing that the pure merging operation cannot do is the last point. This point is out of the scope of the overwriting policy, as there is no conflict: the file is not present in both archives. However, as this is very close to a merging operation, and to avoid code duplication, a special command-line switch -ad (or --alter=decremental) has been designed: it modifies the merging operation to address this need. This switch also ignores any overwriting policy provided, and uses its own, which corresponds to what is needed for building a decremental backup.

In brief, the operations to follow to build a set of decremental backups are the following:

dar -c <new full backup t3> -R /filesystem [...options]
dar -+ <decremental backup t2> -A <old full backup t2> -@ <new full backup t3> -ad [...options]
dar -t <decremental backup t2>    (this is optional but strongly recommended)
rm <old full backup t2>
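As a concrete sketch (the archive names are hypothetical), a monthly cycle could be:

dar -c full_march -R / -z
dar -+ decr_february -A full_february -@ full_march -ad
dar -t decr_february && rm full_february.*.dar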



What about dar_manager? Well, by nature, there is no difference between a decremental backup and a differential/incremental backup. The only difference resides in the way (the order in which) they have to be used.

So even if you can add decremental backups to a dar_manager database, dar_manager is not designed to handle them correctly. It is thus better to keep dar_manager for incremental/differential/full backups only.


Door inodes (Solaris)

A door inode is a dynamic object created on top of an empty file; it exists only while a process has a reference to it, and it is thus not possible to restore it. But the empty file it is mounted on can be restored instead. As such, dar restores a door inode as an empty file having the same parameters as the door inode.

If a door inode is hard linked several times in the file system, dar will restore a plain file having as many hard links at the corresponding locations.

Dar is also able to handle Extended Attributes associated with a door file, if any. Last, if you list an archive containing door inodes, you will see the 'D' letter as their type (as opposed to 'd' for directories); this conforms to what the 'ls' command displays for such entries.