If you've been programming for a while, you may have heard of rsync. It is a tool for transferring and synchronizing files between directories. These directories can be on the same machine or on two connected machines.
Some of you may wonder, "Well, why can't I just use the copy command cp? That way I don't have to learn a new command." The two programs work differently: cp copies everything from one location to another, while rsync copies only the deltas (the differences) from one location to another.
Suppose that your source directory A contains files totaling 1GB in size, and that directory B holds the same 1GB of files. Then you add about 0.1GB of new data to A. With rsync, you won't have to copy the whole 1.1GB of data from A to B. You will only have to transfer the 0.1GB that changed. Why copy mostly the same data when you can copy only the differences? This minimizes network usage, which is useful if your bandwidth is limited.
Let's jump straight to the code. I strongly encourage you to code along. I find that I learn a new tool better when I actually type the commands. Moreover, don't just type everything you see in this article and stop there. Experiment with these commands. Read the man rsync page. Make variations. Do things that I don't mention here. Break things! Just make sure you make a backup first (see what I did there? :D). Only by doing this will you get the most out of this article.
At its core, the rsync command looks like this:
rsync source destination
Suppose that you have a file ~/Projects/source/file1.txt that you want to sync to ~/Projects/destination/. Run:
rsync ~/Projects/source/file1.txt ~/Projects/destination/
You should see file1.txt inside destination/ now. Cool! However, practically speaking, you will rarely need to rsync a single file. Rsync is usually used on a directory.
To rsync the files inside the source/ directory to the destination/ directory, run:
rsync source/* destination/
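Now you will find that all the files inside source/ have been copied into destination/. If you run the command rsync source/* destination/ again without making any changes, there are no deltas to copy. (Strictly speaking, rsync skips unchanged files only when modification times are preserved, which the -a option covered later takes care of.)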
If you add a file inside source/ that is not yet in destination/, running the rsync command adds that file into destination/.
If you remove a file inside source/ and that file is also inside destination/, running the rsync command will not remove that file from destination/. Rsync is additive by default. To also delete a file in destination/ when the source file is deleted, pass the --delete option. Note that --delete only works when you sync a whole directory (for example with -r, covered next), not with a source/* file list.
Finally, if you add a directory inside source/, the rsync command above won't sync that directory or its contents. To sync directories within a source directory, you need to use rsync recursively.
The -r option syncs a directory recursively. If your source/ directory contains:
file1.txt
file2.txt
dir1/
dir1/file1.md
dir1/file2.md
then rsync source/* destination/ won't bring dir1/ (and the files inside it) into the destination/ directory. However, if you run:
rsync -r source/ destination/
Everything will carry over. Neat!
If you read online articles about rsync, you will notice that many developers use the -a option (--archive). This is equivalent to running rsync -rlptgoD. Whoa, that's a lot of options! Don't worry, let's break it down:
-r is recursive, just as you saw above
-l copies symlinks and keeps them as symlinks
-p preserves file permissions
-t preserves file modification times
-g preserves the group
-o preserves the owner (super-user only)
-D preserves device and special files
The big picture is that running rsync -a preserves all the important metadata when transferring files. It is safe to say you will want to run rsync -a 90% of the time.
You can use rsync over a network connection. If you have access to a remote server, you can quickly sync a local directory to the remote server, or vice versa.
Wait a second... doesn't that sound like Dropbox? Yup! Dropbox has tons of features that rsync doesn't, but at its core, Dropbox is a fancy and glorified rsync with durability added.
For this section, if you're coding along, I am assuming that you have access to a remote server. If you don't, keep reading and make a mental note. There will probably come a time when you need to do this.
To rsync your source/ directory to the remote server's ~/stash/destination/ directory, run:
rsync -a source/ yourUserName@123.456.788.000:~/stash/destination/
If you store a Host in your SSH config, you can use that instead. For example, I have a Host named gc (Google Compute). To sync the Projects/ directory, I can run:
rsync -a ~/Projects gc:~
Notice that I don't have a forward slash after Projects even though it is a directory. When you rsync a directory without a trailing slash, rsync creates a directory with the same name as the source inside the destination. Here, it creates a ~/Projects/ directory inside my home directory on the remote server. With a trailing slash, rsync copies the directory's contents instead.
Here are some options that can be helpful when transferring files over the net:
-z compresses data during transfer
-v stands for verbose. It prints the files being transferred
-P is shorthand for --partial --progress: --partial keeps a partially transferred file in case a transfer is interrupted, and --progress shows the file transfer progress. This option is useful for large files.
Btw, did you know that you can pass a command when running rsync?
For example, if I want to rsync only the files whose names end in test0.txt through test9.txt from the remote server, I can run:
rsync -avz gc:'`find . -name "*test[0-9].txt"`' ~/Projects/source
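The trick here is yourRemoteHost:'YOUR_CMD'. Note the backticks surrounding the command inside the single quotes: the single quotes keep your local shell from running it, so the remote shell runs it instead. On rsync 3.2.4 and later, remote arguments are escaped by default, so you may need to add the --old-args option for this trick to work.
Use this when you need to filter for specific files from a remote host instead of having to manually pick and choose the files.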
Rsync reminds me of file-backup services like Dropbox. Combined with cron, it lets you create an automated job that syncs data every day, every hour, and so on.
On a Mac, I can edit cron jobs with the crontab -e command (yours might be different depending on your OS).
To create multiple backups:
# Inside a crontab entry, % must be escaped as \% (see the last line)
00 */1 * * * rsync -a --delete /Users/iggy/source/ /Users/iggy/backup/hourly
00 17 * * * rsync -a --delete /Users/iggy/source/ /Users/iggy/backup/daily
00 18 * * 5 rsync -a --delete /Users/iggy/source/ /Users/iggy/backup/weekly
00 19 1 * * rsync -a --delete /Users/iggy/source/ /Users/iggy/backup/monthly_$(date +\%Y\%m)
This performs 4 backups:
- an hourly backup
- a daily backup every day at 5 PM
- a weekly backup every Friday (day 5) at 6 PM
- a monthly backup on the 1st at 7 PM
The first three backups overwrite the previous backup each time (each one syncs into a directory with the same name). The monthly backup gets a unique name, so every month is kept.
Rsync is a powerful command for creating backups or syncing two directories. If you only need to do a one-time copy, the cp command is probably simpler. But if you need to keep two directories in sync, rsync is a better option.
Rsync and cron are like peanut butter and jelly. Together they let you perform automated backups easily. What other uses of rsync can you think of?
Who knows, maybe in the future you will use rsync to create the next Dropbox rival! When you do, please let me know :).
Until then, happy coding!