DEV Community

ufraaan

Originally published at ufraan.dev

BitTorrent Internals

BitTorrent is a decentralized peer-to-peer (P2P) file-sharing protocol designed for fast, efficient distribution of large files over the internet.

Let's first see how we classically download files from the internet, and why we even need something like BitTorrent.

[Figure: client-server]

The client requests a file from the server, and the server, which has the file, responds with it. That works fine for small files, but things get interesting when downloads get larger or many clients ask at once:

  • Server bandwidth is limited, so as more clients connect, speed slows down.
  • Speed of data transfer is capped by the server's upload capacity.

[Figure: Alice and Bob]

If Bob's upload speed is 60 Mbps, then no matter how fast Alice's download speed is, the overall download speed cannot exceed 60 Mbps.


Peer-to-Peer Network

In a P2P network, every party participating in the network has the exact same capabilities: they are all equal peers and can initiate conversations with each other.

The main highlight of P2P: even if a few nodes crash or are removed, the network keeps serving its purpose. No single point of failure.

This isn't just about outages: it also applies to the core service the network provides. For example, if the network's job is to serve files, other machines keep sharing those files with whoever needs them even when one machine goes down. There are no "system interruptions" as long as enough peers holding the data stay online.

P2P networks come in two flavors:

  • Pure P2P: No central entity. Every node can connect to every other node.
  • Hybrid P2P: Has a central entity that shares metadata about the data across peers, not the data itself.

[Figure: pure vs. hybrid P2P]

Note: If the central entity goes down, the network and its services are affected. This hybrid P2P architecture is what powers BitTorrent.

BitTorrent has a central entity called a tracker. Peers talk to each other, but to know who to talk to, they first consult the tracker.


Core Idea

The core idea of BitTorrent is to download a file from multiple machines concurrently.

We saw that download speed is limited by the upload capacity of the sender: be it a user, a server, or anything else. If you can download at 100 Mbps but the sender can only upload at 60 Mbps, you'll max out at 60 Mbps.

But what if, instead of downloading from one machine, we distributed the file across the network and connected to 50 different peers to download from them in parallel? That's the idea behind BitTorrent.


  • Faster downloads.
  • Upload load is distributed among peers. Every peer may hold some fragment of the file and can serve it to others. You still get high download speeds, but the upload burden is shared across the network.
  • A large number of downloads puts only a small load on each peer, because it's highly distributed.
  • Breaking a file into smaller chunks boosts concurrency.
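To make the speed argument concrete, here's a rough Go model. Purely for illustration, it assumes each sender contributes its full upload rate and the only other limit is your own download capacity. It ignores latency, protocol overhead, and the other downloaders competing for the same senders. The function name `effectiveDownloadRate` is hypothetical.

```go
package main

import "fmt"

// effectiveDownloadRate estimates the download rate (in Mbps) a client gets
// when it pulls from several senders at once. Simplifying assumption: each
// sender contributes its full upload rate, and the only other cap is the
// client's own download capacity.
func effectiveDownloadRate(downloadCap float64, senderUploads []float64) float64 {
	total := 0.0
	for _, up := range senderUploads {
		total += up
	}
	if total > downloadCap {
		return downloadCap
	}
	return total
}

func main() {
	// One sender at 60 Mbps: capped by the sender's upload (client-server case).
	fmt.Println(effectiveDownloadRate(100, []float64{60})) // 60
	// Three senders: their combined upload now exceeds the client's 100 Mbps cap.
	fmt.Println(effectiveDownloadRate(100, []float64{60, 30, 25})) // 100
}
```

With a single 60 Mbps sender you're stuck at 60 Mbps (the Alice and Bob case). Add two more senders and your own 100 Mbps download capacity becomes the ceiling instead.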

A Simplified Download Flow

When a user wants to download a file, they sniff around the network to find peers that have the pieces. For this, they use a tracker.

The user goes to the tracker and says "I want this file." The tracker responds with a list of peers that have it. The user then connects directly to those peers and downloads the file.


Let's say a user wants a file that has 4 chunks. They go to the tracker, which responds with a list of peers sharing that file. The user talks to those peers to learn which chunks each one holds, downloads each chunk from a peer that has it, and concatenates the chunks locally to get the full file.
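Here's a toy Go version of this flow with everything kept in memory. The `peer` type and `downloadFile` function are hypothetical stand-ins, and the tracker's reply is just a slice of peers. For readability, chunks are fetched one after another; a real client fetches them in parallel over the network.

```go
package main

import (
	"bytes"
	"fmt"
)

// peer is a hypothetical in-memory stand-in for a remote peer: it maps chunk
// indexes to the chunk bytes it currently holds.
type peer struct {
	name   string
	chunks map[int][]byte
}

// downloadFile fetches chunks 0..numChunks-1 from whichever peer in the
// tracker-supplied list holds each one, then concatenates them in order.
func downloadFile(peers []peer, numChunks int) ([]byte, error) {
	var file bytes.Buffer
	for i := 0; i < numChunks; i++ {
		found := false
		for _, p := range peers {
			if chunk, ok := p.chunks[i]; ok {
				file.Write(chunk)
				found = true
				break
			}
		}
		if !found {
			return nil, fmt.Errorf("no peer has chunk %d", i)
		}
	}
	return file.Bytes(), nil
}

func main() {
	// Pretend the tracker returned these two peers for the file.
	peers := []peer{
		{name: "A", chunks: map[int][]byte{0: []byte("Hel"), 2: []byte("Wor")}},
		{name: "B", chunks: map[int][]byte{1: []byte("lo "), 3: []byte("ld")}},
	}
	file, err := downloadFile(peers, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(file)) // Hello World
}
```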


Nomenclature & Terminology

These terms are useful when analyzing BitTorrent’s behavior and algorithms.


1. Pieces and Blocks

  • A file shared on the BitTorrent network is divided into pieces.
  • Each piece is further subdivided into blocks.
  • Data transfer happens at the block level (one block per request).
  • Example:
    • A 16 MB piece → 1024 blocks of 16 KB each.
  • A piece is considered complete only once all its blocks are received, and valid only if its SHA1 hash matches the one listed in the torrent file.
  • The client reconstructs the original file by concatenating all pieces.
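Here's a small Go sketch that splits one piece into block-sized requests. It assumes the 16 KB block size from the example above. The names `blockRequests` and `blockRequest` are hypothetical, though the begin/length pair mirrors the offset and length that a BitTorrent request message carries alongside the piece index.

```go
package main

import "fmt"

// blockSize is the 16 KiB request size used in the example.
const blockSize = 16 * 1024

// blockRequest describes one request: a byte range inside a piece.
type blockRequest struct {
	Begin  int // offset within the piece
	Length int // number of bytes requested
}

// blockRequests splits a piece of pieceLen bytes into 16 KiB requests.
// The final block is shorter when pieceLen is not a multiple of blockSize.
func blockRequests(pieceLen int) []blockRequest {
	var reqs []blockRequest
	for begin := 0; begin < pieceLen; begin += blockSize {
		length := blockSize
		if pieceLen-begin < length {
			length = pieceLen - begin
		}
		reqs = append(reqs, blockRequest{Begin: begin, Length: length})
	}
	return reqs
}

func main() {
	reqs := blockRequests(16 * 1024 * 1024) // a 16 MiB piece
	fmt.Println(len(reqs))                  // 1024
	last := blockRequests(40000)            // a piece that isn't a multiple of 16 KiB
	fmt.Println(last[len(last)-1])          // {32768 7232}
}
```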

[Figure: pieces and blocks]


2. Peer Set

  • The peer set is the list of peers a node can connect to for uploading/downloading.
  • Typically obtained from a tracker.
  • Example:
    • If peer A receives {C, E} from the tracker, it exchanges data only with C and E.

[Figure: peer set]


3. Active Peer Set

  • A subset of the peer set used for active data transfer.
  • Not all peers are connected simultaneously.
  • Example:
    • Out of 50 peers received, only ~10 may be actively connected.
  • Purpose:
    • Limits bandwidth usage.
    • Reduces network congestion.
    • Improves stability of connections.

[Figure: active peer set]


4. Seeders & Leechers

  • Seeder:

    • A peer that has the complete file.
    • Uploads pieces to others.
  • Leecher:

    • A peer that is still downloading.
    • May also upload already downloaded pieces.

[Figure: seeder vs. leecher]

Impact on Performance

  • More seeders → higher availability → faster downloads.
  • Few seeders → bottleneck (resembles client-server model).
  • If leechers ≫ seeders:
    • Increased contention.
    • Slower download speeds.

BitTorrent is Popularity-Friendly

New and popular files tend to have many seeders and download faster. Old or unpopular files have fewer seeders and download more slowly.

For example, when a new version of an operating system is released, many people want to download it at once. Ubuntu and Debian offer official torrent downloads, and a fresh release attracts many seeders, so whoever wants to download it gets fast speeds.


Applications of BitTorrent

  1. Downloading Linux distributions (often faster than FTP or HTTP), as well as large software, movies, games, etc.
  2. Sending patches to users (e.g., security patches). You can run a small BitTorrent-based system where you drop a file into one node and it automatically spreads to every machine in your network, which can then apply the patches. Large data centers have used this approach to distribute security patches.
  3. Facebook has used this to power massive deployments and distribute build artifacts across servers. Instead of thousands of servers all downloading a binary from one source, the file is split into pieces that servers fetch from each other. The network gradually converges until every node has the full file.

The Torrent File

To download or upload any file from the torrent network, you need a .torrent file. This file holds metadata about the file you want to download.

For example, if you want to download Ubuntu from the torrent network, the Ubuntu ISO has a corresponding .torrent file. You download that .torrent file, which contains all the metadata, and your client then uses it to fetch the actual ISO from the network.

[Figure: torrent file]

Lifecycle of a Torrent File

Seeders are seeding data in the network, and as long as at least 1 seeder is serving the file, the torrent is alive. Otherwise, the torrent is dead.

It's therefore very important to have at least 1 seeder: otherwise nobody can download the file.

What separates BitTorrent from a classic blockchain/cryptocurrency use case is that there's no built-in reward for staying on as a seeder once your download finishes. Cryptocurrency incentivizes participation in the network: BitTorrent doesn't.

[Figure: user download via HTTP]


What Does the Torrent File Hold?

The torrent file is static: no matter when you download it, it will always have the same content.

It holds metadata about the file, not the actual data.

A torrent file is essentially a dictionary of key-value pairs:

  1. announce: URL of the tracker. This tells your torrent client which tracker to contact to find peers in the network.
  2. created by: Name and version of the program that created the torrent.
  3. creation date: Creation timestamp in Unix epoch.
  4. encoding: Encoding used for strings in the info dictionary. Defaults to UTF-8.
  5. comment: Optional comment from the author.
  6. info: A dictionary describing the file(s) of the torrent. For example, if you're downloading Ubuntu, it would contain information about the Ubuntu image itself.
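One way to picture this dictionary is as a pair of Go structs. This is only a sketch: the type and field names are hypothetical, and the comments give the actual bencoded keys. The info fields shown (name, length, files, piece length, pieces) are the common ones behind the single-file and multi-file layouts. A decoder would still need to fill these structs from the bencoded data.

```go
package main

import (
	"fmt"
	"time"
)

// FileEntry is one file in a multi-file torrent's "files" list.
type FileEntry struct {
	Length int64    // "length": file size in bytes
	Path   []string // "path": path components, e.g. ["docs", "readme.txt"]
}

// InfoDict mirrors the "info" dictionary.
type InfoDict struct {
	Name        string      // "name": file name (single-file) or directory name (multi-file)
	PieceLength int64       // "piece length"
	Pieces      string      // "pieces": concatenated 20-byte SHA1 hashes
	Length      int64       // "length": set only in single-file torrents
	Files       []FileEntry // "files": set only in multi-file torrents
}

// MetaInfo mirrors the top-level dictionary of a .torrent file.
type MetaInfo struct {
	Announce     string    // "announce"
	CreatedBy    string    // "created by"
	CreationDate time.Time // "creation date" (stored as Unix epoch seconds)
	Encoding     string    // "encoding"
	Comment      string    // "comment"
	Info         InfoDict  // "info"
}

// TotalLength returns the number of bytes the torrent describes.
func (i InfoDict) TotalLength() int64 {
	if len(i.Files) == 0 {
		return i.Length
	}
	var total int64
	for _, f := range i.Files {
		total += f.Length
	}
	return total
}

func main() {
	m := MetaInfo{
		Announce:     "http://tracker.example.com/announce", // hypothetical tracker URL
		CreationDate: time.Unix(1700000000, 0),
		Info: InfoDict{
			Name:        "album",
			PieceLength: 262144,
			Files: []FileEntry{
				{Length: 3000000, Path: []string{"01.flac"}},
				{Length: 2500000, Path: []string{"02.flac"}},
			},
		},
	}
	fmt.Println(m.Info.TotalLength()) // 5500000
}
```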

BitTorrent supports two types of downloads: single-file and multi-file. Depending on the type, the structure of the info dictionary varies.

[Figure: single-file info dictionary format]

[Figure: multi-file info dictionary format]

File Data Information

The info dictionary also stores information about the pieces:

  1. piece length: Number of bytes in each piece.
  2. pieces: 20-byte SHA1 hash values for each piece, concatenated together.


Since a file is split into equal-size pieces (except possibly the last one, which holds whatever bytes remain), piece length tells you how big each one is.

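For example, a 1 GB file with a piece size of 1 MB would have 1024 pieces. The torrent file doesn't store the actual piece data: instead, for each piece it stores a 20-byte SHA1 hash and concatenates all of them together, so the pieces field here is 1024 × 20 = 20,480 bytes long. After downloading a piece, the client hashes it and compares the result with the stored hash to verify it.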


Torrent File Format: Bencoding

Torrent files use a custom encoding format called bencoding: not JSON.

When you open a .torrent file in a client like qBittorrent, the client first decodes the bencoded file to extract the metadata. The component that does this is called a bencoding decoder.


Bencoding Specification

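Every torrent file is a bencoded dictionary. The bencoding specification supports only 4 data types, each with its own compact syntax:

  • Strings: <length>:<bytes>, e.g. 4:spam
  • Integers: i<number>e, e.g. i42e
  • Lists: l<items>e, e.g. l4:spami42ee
  • Dictionaries: d<key><value>...e, with string keys in sorted order, e.g. d3:cow3:mooe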

[Figure: bencoding breakdown]


I wrote one in Go and understood it way better (https://ufraan.dev/projects/bencode-foo).
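For reference, here's a compact bencode decoder sketch in Go. It is not the implementation linked above. It decodes strings to `string`, integers to `int64`, lists to `[]any`, and dictionaries to `map[string]any`. Go strings can hold arbitrary bytes, so binary fields like pieces survive intact. The sketch is deliberately lenient: it doesn't enforce canonical rules such as sorted dictionary keys or no leading zeros in integers, and it doesn't reject trailing bytes.

```go
package main

import (
	"fmt"
	"strconv"
)

// decoder walks a bencoded byte slice, tracking the current offset.
type decoder struct {
	data []byte
	pos  int
}

// Decode parses one bencoded value: strings become string, integers int64,
// lists []any, and dictionaries map[string]any.
func Decode(data []byte) (any, error) { return (&decoder{data: data}).decode() }

func (d *decoder) decode() (any, error) {
	if d.pos >= len(d.data) {
		return nil, fmt.Errorf("unexpected end of input")
	}
	switch c := d.data[d.pos]; {
	case c == 'i': // integer: i<number>e
		d.pos++
		s, err := d.readUntil('e')
		if err != nil {
			return nil, err
		}
		return strconv.ParseInt(s, 10, 64)
	case c == 'l': // list: l<values>e
		d.pos++
		list := []any{}
		for !d.atEnd() {
			v, err := d.decode()
			if err != nil {
				return nil, err
			}
			list = append(list, v)
		}
		d.pos++ // skip the closing 'e'
		return list, nil
	case c == 'd': // dictionary: d<string key><value>...e
		d.pos++
		dict := map[string]any{}
		for !d.atEnd() {
			k, err := d.decodeString()
			if err != nil {
				return nil, err
			}
			v, err := d.decode()
			if err != nil {
				return nil, err
			}
			dict[k] = v
		}
		d.pos++ // skip the closing 'e'
		return dict, nil
	case c >= '0' && c <= '9': // string: <length>:<bytes>
		return d.decodeString()
	default:
		return nil, fmt.Errorf("unexpected byte %q at offset %d", c, d.pos)
	}
}

// atEnd reports whether the next byte is the 'e' closing a list or dictionary.
func (d *decoder) atEnd() bool { return d.pos < len(d.data) && d.data[d.pos] == 'e' }

// readUntil returns the text before the next delim and moves past the delim.
func (d *decoder) readUntil(delim byte) (string, error) {
	start := d.pos
	for d.pos < len(d.data) && d.data[d.pos] != delim {
		d.pos++
	}
	if d.pos >= len(d.data) {
		return "", fmt.Errorf("missing %q after offset %d", delim, start)
	}
	d.pos++
	return string(d.data[start : d.pos-1]), nil
}

// decodeString parses <length>:<bytes>.
func (d *decoder) decodeString() (string, error) {
	n, err := d.readUntil(':')
	if err != nil {
		return "", err
	}
	length, err := strconv.Atoi(n)
	if err != nil || length < 0 || length > len(d.data)-d.pos {
		return "", fmt.Errorf("invalid string length %q", n)
	}
	d.pos += length
	return string(d.data[d.pos-length : d.pos]), nil
}

func main() {
	v, err := Decode([]byte("d3:cow3:moo4:spaml1:a1:bee"))
	fmt.Println(v, err) // map[cow:moo spam:[a b]] <nil>
}
```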


The BitTorrent Architecture

The BitTorrent architecture consists of 4 entities:

  1. .torrent file
  2. Trackers
  3. Seeders
  4. Leechers

Pieces

Whenever a file is shared on the BitTorrent network, it's not shared in its entirety. It's first broken into pieces, which become the unit of transmission.

The downloader gets these pieces and concatenates them locally to form the complete file. All pieces are the same length, except the last one, which may be shorter.

For example, a 3 MB file with a piece size of 1 MB creates 3 pieces: p1, p2, p3.
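Here's a quick Go sketch of the piece arithmetic. When the file size isn't an exact multiple of the piece size, the final piece just holds the leftover bytes. `pieceLengths` is a hypothetical helper.

```go
package main

import "fmt"

// pieceLengths returns the size of every piece for a file of totalLen bytes.
// All pieces are pieceLen bytes except the last, which holds the remainder.
func pieceLengths(totalLen, pieceLen int64) []int64 {
	var lengths []int64
	for offset := int64(0); offset < totalLen; offset += pieceLen {
		n := pieceLen
		if remaining := totalLen - offset; remaining < n {
			n = remaining
		}
		lengths = append(lengths, n)
	}
	return lengths
}

func main() {
	const MB = 1 << 20
	fmt.Println(len(pieceLengths(3*MB, 1*MB)))   // 3 (p1, p2, p3)
	fmt.Println(pieceLengths(2*MB+500, 1*MB)[2]) // 500 (short last piece)
}
```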

[Figure: pieces in BitTorrent]

When you join the network and download a piece (from a seeder or any other peer), you immediately announce it to the peers you're connected to: "I have this piece now: if anyone needs it, come to me instead."

As each peer completes a piece, it informs its peers. This is the power of P2P: every downloader quickly becomes an uploader too.
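Here's a Go sketch of the bookkeeping behind this. Peers usually track who has what with a bitfield, one bit per piece. The bit order here (piece 0 is the high bit of the first byte) follows the BitTorrent wire protocol's bitfield message. The `peer` type, `completePiece`, and `onHave` are hypothetical in-memory stand-ins for sending a "have" message over the network.

```go
package main

import "fmt"

// bitfield tracks which pieces a peer has, one bit per piece; piece 0 is the
// high bit of byte 0.
type bitfield []byte

func newBitfield(numPieces int) bitfield { return make(bitfield, (numPieces+7)/8) }

func (b bitfield) has(i int) bool { return b[i/8]>>(7-uint(i%8))&1 == 1 }

func (b bitfield) set(i int) { b[i/8] |= 1 << (7 - uint(i%8)) }

// peer is a hypothetical in-memory peer that remembers what its neighbors hold.
type peer struct {
	name      string
	numPieces int
	have      bitfield
	neighbors []*peer
	known     map[string]bitfield // neighbor name -> pieces it announced
}

func newPeer(name string, numPieces int) *peer {
	return &peer{name: name, numPieces: numPieces, have: newBitfield(numPieces), known: map[string]bitfield{}}
}

// onHave records that neighbor `from` now holds piece i.
func (p *peer) onHave(from string, i int) {
	bf, ok := p.known[from]
	if !ok {
		bf = newBitfield(p.numPieces)
		p.known[from] = bf
	}
	bf.set(i)
}

// completePiece marks piece i as downloaded and tells every neighbor.
func (p *peer) completePiece(i int) {
	p.have.set(i)
	for _, n := range p.neighbors {
		n.onHave(p.name, i)
	}
}

func main() {
	a, b, c := newPeer("A", 10), newPeer("B", 10), newPeer("C", 10)
	a.neighbors = []*peer{b, c}
	a.completePiece(3)
	fmt.Println(b.known["A"].has(3), c.known["A"].has(3), c.known["A"].has(4)) // true true false
}
```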



Torrent File

A metafile that holds static information about the file: filename, size, piece information, etc. It does not hold the actual data.

One critical field it holds is the announce URL: the tracker URL. The tracker is the only central entity in the BitTorrent architecture, acting as a metadata store where you can get information about other peers and the torrent.

[Figure: seeder vs. leecher]

Each torrent is uniquely identified by an infohash: the SHA1 hash of the bencoded info dictionary in the .torrent file. The .torrent file itself is typically downloaded through a regular HTTP web server.


Tracker

The tracker is the only central entity in this P2P network, and it's very lightweight.

For a given torrent, the .torrent file contains the tracker URL. Every peer in the network connects to this tracker to get metadata about who else is in the network.

The network as a whole can have many trackers, but for a given .torrent file you connect to the tracker named in its announce URL.

Note: The tracker does not download or transfer files. It only holds information about peers and their distribution: that's why it's so lightweight.

The core jobs of a tracker:

  1. Keep track of peers that hold the file.
  2. Keep track of peers that are downloading.
  3. Help peers find other peers to download content from.

A tracker is essentially a simple HTTP server that:

  1. Hands out peer information to the network.
  2. Periodically collects stats from peers.

[Figure: high-level architecture breakdown]

When you have a .torrent file, you first extract info from it, then contact the tracker saying "I want to join your network." The tracker responds with roughly 50 peers that are part of this network.
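When a client asks for the compact format, the tracker's bencoded response packs its peer list into a single byte string. Each peer takes 6 bytes: a 4-byte IPv4 address followed by a 2-byte big-endian port. Here's a Go sketch of unpacking it; it handles IPv4 only, and `parseCompactPeers` is a hypothetical helper.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// parseCompactPeers decodes a compact peer list: 6 bytes per peer,
// a 4-byte IPv4 address followed by a 2-byte big-endian port.
func parseCompactPeers(b []byte) ([]string, error) {
	const peerSize = 6
	if len(b)%peerSize != 0 {
		return nil, fmt.Errorf("compact peer list length %d is not a multiple of %d", len(b), peerSize)
	}
	peers := make([]string, 0, len(b)/peerSize)
	for i := 0; i < len(b); i += peerSize {
		ip := net.IP(b[i : i+4])
		port := binary.BigEndian.Uint16(b[i+4 : i+6])
		peers = append(peers, net.JoinHostPort(ip.String(), fmt.Sprint(port)))
	}
	return peers, nil
}

func main() {
	raw := []byte{192, 168, 1, 10, 0x1A, 0xE1, 10, 0, 0, 7, 0x1A, 0xE2}
	peers, err := parseCompactPeers(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(peers) // [192.168.1.10:6881 10.0.0.7:6882]
}
```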

[Figure: peer set and state]

Information flows both ways: peers in the network also periodically report back to the tracker with how much they've downloaded and uploaded, which torrent they're part of, and so on.

[Figure: peer set and connections]

[Figure: peer set gossip]
