Wanting to be prepared for a hard disk failure in home server applications such as a NAS often leads to some kind of RAID setup. Since I want to balance data security against energy consumption, my current compromise is to avoid RAID (too many disks spin up when I only want to access one file) and instead sync the primary hard drives to secondary hard drives once a week. That way, the worst case is that I lose one week of data if one of the primary HDDs fails, and the secondary HDDs sleep most of the time, consuming just a few milliwatts - and they don't annoy me with their noise ;-).
For that, I use rsync with the following options:
rsync --del -aPEh /.../source/ /.../target
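For readers who prefer spelled-out options, here is a minimal sketch of the same call with long option names, assuming rsync 3.x (where -E means --executability; older Apple builds used -E for extended attributes). The wrapper name sync_disk is hypothetical and only there for illustration.

```shell
# sync_disk is a hypothetical wrapper; the long options mirror "--del -aPEh".
#   --delete-during   (--del) remove files on the target that vanished from the source
#   --archive         (-a)    same as -rlptgoD: recursive, symlinks, permissions,
#                             times, group, owner, devices/specials (no hard links!)
#   --partial --progress (-P) keep partially transferred files, show progress
#   --executability   (-E)    preserve the executable bit (rsync 3.x meaning)
#   --human-readable  (-h)    print sizes in a human-readable format
sync_disk() {
    # $1 = source directory (trailing slash: copy its contents)
    # $2 = target directory
    rsync \
        --delete-during \
        --archive \
        --partial --progress \
        --executability \
        --human-readable \
        "$1" "$2"
}
```

It would be called as sync_disk /.../source/ /.../target, for example from a weekly cron job.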
Then I noticed that the storage consumption on the primary and the secondary HDD deviated quite heavily. For instance, there was a difference of about 600 GB between the primary and the secondary HDD, both 4 TB devices. Digging into it (and at first assuming one disk to be broken), I finally found the explanation: there are many incremental backups from other servers and desktop computers on the primary HDD, and these use hard links for all unchanged files between the incremental backup snapshots. Rsync, surprisingly to me, preserves soft links when using the popular -a (--archive) option, but not hard links. Consequently, every file that had several hard links on the primary HDD was copied to the secondary HDD once per hard link, each copy being an independent file, which consumed a lot more disk space on the secondary HDD.
Even though tracking this down took a lot more time than I expected, the solution is rather simple: just add the -H (--hard-links) option, and hard links are preserved on the secondary HDD just as on the primary one. The command then looks like this:
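A quick way to see whether a directory tree is affected is to count the files in it that have more than one hard link. The following is only a sketch: count_hardlinked is a hypothetical helper name, and the count is off for file names that contain newlines.

```shell
# count_hardlinked is a hypothetical helper, not part of rsync.
# Prints the number of regular files below directory $1 whose link count
# is greater than 1, i.e. files reachable under at least two names.
count_hardlinked() {
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}
```

Calling count_hardlinked /.../source/ on the primary HDD gives a rough idea of how many file names would turn into separate copies. With GNU du, comparing du -sh (each hard-linked file counted once) against du -shl (counted once per link) on the same directory should show roughly how much extra space such copies would need.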
rsync --del -aPEhH /.../source/ /.../target
... and everything works just fine ;-)
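To convince yourself that two names on the target really are hard links to the same file and not separate copies, you can compare their inode numbers. This is a minimal sketch: same_inode is a hypothetical helper name, and it only compares inode numbers, so both paths are assumed to be on the same file system.

```shell
# same_inode is a hypothetical helper, not part of rsync.
# Succeeds (exit status 0) if paths $1 and $2 have the same inode number,
# which on a single file system means they are hard links to one file.
same_inode() {
    inode_a=$(ls -di "$1" | awk '{print $1}')
    inode_b=$(ls -di "$2" | awk '{print $1}')
    [ "$inode_a" = "$inode_b" ]
}
```

For example, same_inode followed by the paths of one unchanged file in two backup snapshots below /.../target should succeed after a sync with -H, and fail after a sync without it.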