DEV Community

Corentin Bettiol


How to efficiently download thousands of files?

Hello.

I need to copy ~40,000 files from a server to my computer, and I'm wondering what the best approach is.

using scp

  • slow
  • consumes a lot of bandwidth

using rsync

  • slow
  • consumes less bandwidth
  • can resume the copy after a network problem

using tar then scp

  • less slow
  • consumes less bandwidth

using tar then rsync

  • less slow
  • consumes less bandwidth
  • can resume the copy after a network problem

using tar then split then parallel with scp

  • fast
  • consumes less bandwidth

using tar then split then parallel with rsync

  • fast
  • consumes less bandwidth
  • can resume the copy after a network problem

I think I will opt for the last one, but what would you do in my case?


Edit: bash commands for using tar then split then parallel with rsync:

Prerequisite: install parallel and silence its citation notice:

sudo apt install parallel && echo "will cite" | parallel --citation &>/dev/null
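# on server: archive the folder, then cut the archive into 20 MB fragments
tar cfz files.tar.gz ~/path/to/folder/
split -b 20M files.tar.gz fragment_

# on local machine: list the fragments on the server, fetch them in parallel,
# then reassemble the archive and extract it
ssh host@server 'ls -1 fragment_*' | parallel rsync -z host@server:{} .
cat fragment_* > files.tar.gz
tar xzvf files.tar.gz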
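Since the archive is cut into fragments and glued back together, it may be worth checking that the reassembled file is identical to the original before extracting it. Here is a small local sketch of that round trip. The sample folder, the tiny 200-byte fragment size and the use of sha256sum (from GNU coreutils) are assumptions made for the demo, not part of the commands above:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a small sample folder (stand-in for ~/path/to/folder/).
mkdir -p sample
for i in $(seq 1 50); do
  printf 'file number %s\n' "$i" > "sample/file_$i.txt"
done

# "Server" side: archive, record a checksum, then split into fragments.
tar czf files.tar.gz sample
sha256sum files.tar.gz > files.tar.gz.sha256
split -b 200 files.tar.gz fragment_

# "Local" side: copy the fragments elsewhere, reassemble, verify, extract.
mkdir received
cp fragment_* files.tar.gz.sha256 received/
cd received
cat fragment_* > files.tar.gz          # the glob sorts fragment_aa, fragment_ab, ...
sha256sum -c files.tar.gz.sha256       # fails (non-zero exit) on a mismatch
tar xzf files.tar.gz
extracted_count=$(find sample -type f | wc -l | tr -d ' ')
echo "extracted $extracted_count files"
```

In a real transfer, the checksum file would be copied from the server along with the fragments, and the cp step would be replaced by the parallel rsync download.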

Edit 2: In the end I used a plain rsync command, since it can compress files on the fly and resume from where the transfer stopped.

Since rsync already uses all the bandwidth available, the network is the bottleneck, and that can't be solved with parallel.
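The exact rsync command isn't shown here, so the following is only a guess at what such a transfer could look like. The retry helper is hypothetical (it is not part of rsync), and host@server and the paths are the same placeholders as above. The flags are standard rsync options: -a recurses and preserves metadata, -z compresses data in transit, and --partial keeps partially transferred files so a rerun can pick them up.

```shell
#!/usr/bin/env bash

# Hypothetical helper: re-run a command until it succeeds,
# giving up after max_attempts tries.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example call (placeholders, left commented out so the sketch runs anywhere):
# retry 5 10 rsync -az --partial host@server:path/to/folder/ ./folder/
```

Files that were already copied completely are skipped on each rerun, so a retry after a network drop only has to deal with what is still missing.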
