Let me get straight to it: I used to transfer data from Files.com (or any other platform where files got dropped) to our cloud buckets using scripts. It was all okay while the file sizes were within a few MBs, but things got painful once they grew into GBs. Transfers started taking a lot more time.
To speed things up, I tried running the transfer on a VM. It did get faster, but not faster faster, especially when the size crossed 400+ GB.
That’s when I started looking for a better way to connect my GCP/AWS buckets directly with these storage platforms, something that could make the transfer process faster and more reliable. And that’s where rclone came into the picture.
Rclone
I have set it up on my VM as a scheduled job that runs the backups/transfers with ease (there's a rough cron sketch at the end of this post).
sudo apt update
curl https://rclone.org/install.sh | sudo bash
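Once that finishes, a quick sanity check that the binary landed on the PATH:
rclone version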
That's the usual installation process. Once done with it, let's set up the config; this is where we mention the details of the storage platforms we're transferring data from and to:
rclone config
It's going to throw a few options at you; pick the one to create a new remote.
From there it will take you through the list of storage platforms supported by rclone that can be used as remotes.
Choose the one you prefer. I used Files.com, gave the remote a name that I'll use to refer to it later on, and did the auth using an API key here.
PS: You might not find the API key option right away, so wait for the "edit advanced config" option.
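To double-check what got saved, rclone can list the remotes it knows about and dump a remote's stored settings (filescom here is just the name I gave mine):
rclone listremotes            # should print filescom:
rclone config show filescom   # prints the stored settings for that remote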
Now we're done with one remote; moving on to the next, follow the same steps as the first one: rclone config -> new remote -> pick the one you want and provide the auth method. I went with a GCS bucket here, mentioned the project number, and performed the auth using a service account JSON key.
Also, if you're particular about object ACLs and storage classes, you can pick the appropriate ones from the options.
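For reference, the entry that ends up in the rclone config file looks roughly like this; the exact key names depend on your rclone version and the choices you make in the wizard, and the project number and key path below are placeholders:
[gcs]
type = google cloud storage
project_number = 123456789012
service_account_file = /path/to/service-account-key.json
object_acl = private
storage_class = ARCHIVE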
Once you're done with it, you can check whether the remote works by listing it with the ls command along with the remote name:
rclone ls filescom:
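The same idea works for the GCS side; lsd lists the buckets the service account can see, and ls lists the objects inside one:
rclone lsd gcs:               # lists the buckets visible to the service account
rclone ls gcs:vault-archive   # lists objects inside a specific bucket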
And to copy files, the usual syntax is:
rclone copy <source> <destination> [flags]
We've got a bunch of flags: --progress to show the progress, --transfers [number] to set the number of parallel transfers, --dry-run to perform a simulation, and --exclude or --include to filter files (there's a quick dry-run example after the flag breakdown below). Here's the command I used for the actual transfer:
rclone copy filescom:/hawk gcs:vault-archive/ -P --transfers=8 --checkers=10 --buffer-size=64M \
  --fast-list --retries=5 --low-level-retries=10 --timeout=5m --contimeout=30s --retries-sleep=10s \
  --log-file=/home/mohamed-roshan-k/rclone_transfer.log --log-level=INFO
-P = progress bar
--checkers = number of parallel checkers that verify whether a file already exists in the destination
--buffer-size = in-memory buffer size used per file during the transfer
--retries = number of times it should retry the transfer if it fails
--low-level-retries = similar to --retries, but for low-level network and file errors
--timeout = aborts the transfer if it's stuck for longer than the mentioned time
--contimeout = connection timeout
--retries-sleep = interval between each retry
--log-file = path to the logs
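Before kicking off the real transfer, a dry run with a filter is a cheap way to sanity-check what would actually get copied (the *.zip pattern here is just an example):
rclone copy filescom:/hawk gcs:vault-archive/ --dry-run --include "*.zip" -P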
Some screenshots of the time taken for the transfer.
Do note, the process can be made faster if we increase:
- Transfer = --transfers
- Checkers = --checkers
- Buffer size = --buffer-size
If your VM has the specs to handle the increased load (CPU, RAM, and network), you’ll see a noticeable improvement in performance (pretty obvious but yea)
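And since I run this as a job on the VM, here's roughly what the crontab entry looks like; the schedule below is just a placeholder, and the rclone binary path may differ on your machine (check with which rclone):
# crontab -e  ->  run the transfer every night at 02:00 (hypothetical schedule)
0 2 * * * /usr/bin/rclone copy filescom:/hawk gcs:vault-archive/ --transfers=8 --checkers=10 --buffer-size=64M --log-file=/home/mohamed-roshan-k/rclone_transfer.log --log-level=INFO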