
Corentin Bettiol


How to efficiently download thousands of files?

Hello.

I need to copy ~40,000 files from a server to my computer, and I'm wondering what is the best approach to solve this problem.

using scp

  • slow
  • consumes lots of bandwidth
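
A minimal sketch of this approach (host@server and the folder path are placeholders):

# one connection, one file at a time, no compression
scp -r host@server:~/path/to/folder/ .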

using rsync

  • slow
  • consumes less bandwidth
  • can resume the copy after a network problem
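
A minimal sketch (same placeholders as above):

# -a preserves attributes, -z compresses in transit,
# --partial keeps incomplete files so an interrupted copy can resume
rsync -az --partial host@server:~/path/to/folder/ .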

using tar then scp

  • less slow
  • consumes less bandwidth
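
A sketch, assuming the archive is built on the server first:

# on server: pack everything into one compressed archive
tar czf files.tar.gz ~/path/to/folder/

# on local machine: fetch and unpack it
scp host@server:files.tar.gz .
tar xzf files.tar.gz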

using tar then rsync

  • less slow
  • consumes less bandwidth
  • can resume the copy after a network problem
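
A sketch of this variant; -z is left off rsync since the archive is already compressed:

# on server
tar czf files.tar.gz ~/path/to/folder/

# on local machine: --partial lets an interrupted copy resume
rsync --partial host@server:files.tar.gz .
tar xzf files.tar.gz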

using tar then split then parallel with scp

  • fast
  • consumes less bandwidth
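
A sketch of the scp variant (the rsync version is detailed in the edit below); the fragment_ prefix and the 20M fragment size are arbitrary choices:

# on server: archive, then cut the archive into 20M pieces
tar czf files.tar.gz ~/path/to/folder/
split -b 20M files.tar.gz fragment_

# on local machine: fetch the pieces over parallel connections
ssh host@server 'ls fragment_*' | parallel scp host@server:{} .
cat fragment_* > files.tar.gz
tar xzf files.tar.gz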

using tar then split then parallel with rsync

  • fast
  • consumes less bandwidth
  • can resume the copy after a network problem

I think I will opt for the last one, but what would you do in my case?


Edit: bash commands for using tar then split then parallel with rsync:

Prerequisite: install parallel and silence its citation notice:

sudo apt install parallel && echo "will cite" | parallel --citation &>/dev/null
# on server: archive the folder, then split the archive into 20M fragments
tar czf files.tar.gz ~/path/to/folder/
split -b 20M files.tar.gz fragment_

# on local machine: list the fragments remotely, fetch them in parallel,
# then reassemble and extract the archive
ssh host@server 'ls -1 fragment_*' | parallel rsync -z host@server:{} .
cat fragment_* > files.tar.gz
tar xvf files.tar.gz
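
To check that the fragments were reassembled correctly, one can compare checksums of the archive on both ends (assuming sha256sum is available on both machines):

# the two hashes must match before extracting
ssh host@server sha256sum files.tar.gz
sha256sum files.tar.gz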

Edit 2: In the end I used a simple rsync command, since it can compress files on the fly and restart from where the transfer stopped.

Since rsync already uses all the available bandwidth, the transfer isn't a bottleneck that parallel can fix.
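
The final command isn't shown above; a plausible one-liner along these lines (same placeholders as before):

# -a archive mode, -z on-the-fly compression, --partial resume support
rsync -az --partial --progress host@server:~/path/to/folder/ .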

