
Kavya Sahai

Originally published at acaduveda.online

How does the Internet Work? Part 2

In this lesson/post, we'll talk about Networks.

Introduction to Bits, Signals, and Packets

The ability to deliver and exchange information over the world's communication networks has revolutionized how people work, play, and live. At the turn of the century, the U.S. National Academy of Engineering listed 20 technologies with the most significant societal impact in the 20th century. This list included life-changing innovations like electrification, the automobile, and the airplane. It also included four communication technologies—radio and television, the telephone, the Internet, and computers—whose technological underpinnings are the focus of this book.

Surprisingly, the Internet ranked only #13, as it was developed late in the century, and the committee believed its most significant impacts would occur in the 21st century. That prediction seems accurate: the spread of wireless networks and mobile devices, the rise of social networks, and anytime, anywhere communication have changed commerce, social connections, and even driven societal and political changes.

Communication is fundamental to modern life. It's hard to imagine life without the Internet, its applications, or networked mobile devices. By early 2011, over 5 billion mobile phones were active worldwide, over a billion with "broadband" connectivity – exceeding the number of people with electricity, shoes, toothbrushes, or toilets in 2011!


Objectives

This post/lesson (call it what you like) aims to explain how communication networks work. The subject is worth studying for two reasons:

  • to understand the design principles and analysis techniques used in these systems;
  • because the technical ideas are relevant to other fields in computer science (CS) and electrical engineering (EE). This makes studying communication systems a great way to learn broadly applicable concepts.

At MIT (the Massachusetts Institute of Technology), sophomores take such a course after some exposure to probability and Fourier series.

Traditionally, "low-level communication" (how information moves across a single link) has been considered an EE topic, while "networking" (building networks of multiple links) has been a CS topic. Traditional digital communication courses rarely address network building, and computer network courses treat communication over physical links as a black box. This book aims to bridge this gap, providing a comprehensive understanding of both aspects.

Understanding Communication Networks

We'll cover communication systems end-to-end: from the source with information to transmit, to packets (messages broken down for transmission), to bits ("0" or "1"), to signals (analog waveforms sent over links like wires, fiber, radio, or acoustic waves). We'll examine diverse networks: simple point-to-point links, shared media with multiple nodes, and larger multi-hop networks connected to form even bigger networks.

Themes

Three challenges are central to digital communication: reliability, sharing, and scalability. This introductory section focuses on the first two.


Reliability

Many factors make communication unreliable. We'll study techniques to improve reliability; all of them employ redundancy to build reliable systems out of unreliable components, relying on component failures being independent.

The main challenge is overcoming noise, interference, bit errors, and packet losses (from uncorrectable errors, queue overflows, or link/software failures), all of which degrade communication quality.

Besides reliability, speed is also important. Reliability techniques often use redundancy, reducing speed. Many communication systems balance reliability and speed.

Communication speeds have increased dramatically, from kilobits per second in the early 1980s to 100+ Megabits per second wirelessly and 1-10 Gigabits per second over wired links today.

We'll explore why communication is unreliable and how to address it, using error-correcting codes, handling inter-symbol interference, retransmission protocols, and fault-tolerant routing.

Efficient Sharing

Dedicated links for every node pair are prohibitively expensive. Sharing is essential. We'll study sharing point-to-point links, shared media, and multi-hop networks.

We'll cover sharing a medium (relevant to Ethernet, WiFi, cellular networks, and satellite networks), modulation/demodulation (transmitting over different frequencies), and medium access control (MAC) protocols (rules governing network node behavior). We'll explore time sharing (each node transmits for a dedicated time) and frequency sharing (dividing bandwidth). Then we'll move on to multi-hop networks, where many communications share links, orchestrated by switches.

Key questions include how multiple communications share the network, how messages traverse the network, and how to ensure reliable communication across a multi-hop network.

Sharing techniques and reliability mechanisms determine network efficiency. Efficiency can be framed as minimizing cost for given requirements or maximizing "useful work" (aggregate throughput, throughput variation, and average delay/latency) for a given network. This post focuses on throughput and latency.

Scalability

Scalability (designing networks that handle large sizes) is important. This post touches upon it briefly, leaving detailed discussion for later lessons.

Outline and Plan

The lesson is divided into four parts, studied in this order: the source, then the abstractions of bits, signals, and packets.

  1. The source: We begin with the basics of information, entropy, source coding (data compression), Huffman codes, and the Lempel-Ziv-Welch algorithm.
  2. Bits: We address overcoming bit errors with error-correcting codes: linear block codes and convolutional codes.
  3. Signals: We cover modulation/demodulation, modeling signal distortions with linear time-invariant (LTI) channels, time/frequency domain representations, and frequency response of channels/filters.
  4. Packets: We study medium sharing with MAC protocols, routing in multi-hop networks, and reliable data transport protocols.

Information, Entropy, and the Motivation for Source Codes

Claude Shannon's theory of information (developed in the late 1940s) is a groundbreaking idea that has transformed many technological fields, especially communication systems and networks. This chapter introduces the intuition behind information, defines it mathematically, and links it to entropy, a property of data sources.

These concepts enable efficient data compression before communication or storage, allowing for recovery of the original data without distortion. A core idea is source coding, which maps each symbol from a data source to a codeword with desirable properties. A message is a sequence of symbols. Our focus is lossless source coding, where the original message can be perfectly recovered from an uncorrupted transmission.

Information and Entropy

Shannon, building on Hartley's work, realized that information can be defined generally, independent of the application or message semantics. Communication involves a sender (S) choosing and sending one of several possible messages to a receiver (R). For example, S could indicate the British arrival route:

  • "1" if by land
  • "2" if by sea

If each choice is equally likely (no prior knowledge), the information conveyed is 1 bit. This bit can encode the choice. 1000 such independent events can be encoded with 1000 bits.

If prior knowledge suggests a higher probability for one choice (e.g., land due to a storm), then the less likely message (sea) conveys more information. Similarly, a temperature of 75°F in Boston in January is more informative than 32°F.

Information about an event depends on its probability (p). Lower probability (less likely event) implies higher information.

Definition of information

Hartley defined information (I) as:

I = log(1/p) = -log(p) (2.1)

The base-2 logarithm is used, and the unit of information is a bit. The logarithmic function ensures additivity: information from two independent events A and B (probabilities pA and pB) adds up:

IA + IB = log(1/pA) + log(1/pB) = log(1/(pA*pB)) = log(1/P(A and B))

Examples

Information Content of Decimal Digit Events
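As a quick illustration (my own sketch, not code from the original notes), here is the information content of a few events, including a single decimal digit when all ten values are equally likely:

```python
import math

def information_bits(p: float) -> float:
    """Information content I = log2(1/p) of an event with probability p."""
    return math.log2(1 / p)

# One decimal digit, all ten values equally likely: log2(10) ~ 3.32 bits
print(information_bits(1 / 10))    # 3.3219...

# A fair coin flip conveys exactly 1 bit
print(information_bits(1 / 2))     # 1.0

# A very unlikely event (p = 1/1024) conveys 10 bits
print(information_bits(1 / 1024))  # 10.0
```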

Entropy

Entropy (H) quantifies the expected information from a set of mutually exclusive events. If event i has probability pi:

H(p1, p2, ... pN) = Σ pi * log(1/pi) (2.2)


Entropy is measured in bits and represents the average uncertainty. For two mutually exclusive events with probabilities p and 1-p:

H(p, 1-p) = -p*log(p) - (1-p)*log(1-p) (2.3)

H(p) is symmetric around p = 1/2, with a maximum of 1 bit at p = 1/2. H(0) = H(1) = 0. Entropy is always non-negative and H(p1, p2, ... pN) ≤ log N.
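Here's a short Python sketch of Equation 2.2 (my own illustration) that makes these properties easy to check numerically:

```python
import math

def entropy(probs):
    """H = sum_i p_i * log2(1/p_i); terms with p_i == 0 contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit: the maximum for two outcomes
print(entropy([0.9, 0.1]))   # ~0.469 bits: less uncertainty
print(entropy([1.0, 0.0]))   # 0.0 bits: no uncertainty at all
print(entropy([0.25] * 4))   # 2.0 bits, which equals log2(4) = log N
```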

Source Codes

Source coding efficiently encodes messages. Many messages have standard encodings (ASCII, image pixels, audio samples). These are fixed-length encodings, easily manipulated.

However, these encodings can be inefficient. In English text, 'e' occurs more frequently than 'x'. Encoding 'e' with fewer bits and 'x' with more bits can shorten the average message length. This aligns with the information concept: frequent symbols (higher pi) convey less information and need fewer bits.

A code maps information to bit sequences. A codeword is a bit sequence in the code. Source codes aim to compress data by matching encoded message length to the information content (entropy).

Example: encoding 1000 grades (A, B, C, D), where each grade occurs with a known but unequal probability.

Fixed-length encoding: 2 bits/grade (2000 bits total). Decoding is simple, but the encoding is inefficient.

Variable-length encoding (example): A=10, B=0, C=110, D=111. Length depends on the message. Decoding requires sequential processing. This example code is not optimal.

How Much Compression Is Possible?

Ideally, compression uses only the necessary bits to represent the information. Entropy (Equation 2.2) provides the lower bound on the average number of bits needed to avoid ambiguity. Sending fewer bits leads to unresolved choices at the receiver.

In the grades example, the expected information per grade is 1.626 bits. Encoding 1000 grades requires 1626 bits on average. The example variable-length code uses 1667 bits, so it's not optimal. Encoding sequences of grades can improve compression.
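These numbers are easy to reproduce. The grade probabilities aren't listed here, so the values below (A: 1/3, B: 1/2, C: 1/12, D: 1/12) are an assumption of mine that happens to match the 1.626-bit and 1667-bit figures above:

```python
import math

# Assumed grade probabilities, and the lengths of the example variable-length code
probs   = {"A": 1/3, "B": 1/2, "C": 1/12, "D": 1/12}
lengths = {"A": 2, "B": 1, "C": 3, "D": 3}   # A=10, B=0, C=110, D=111

entropy = sum(p * math.log2(1 / p) for p in probs.values())
expected_len = sum(probs[g] * lengths[g] for g in probs)

print(round(entropy, 3))           # ~1.626 bits per grade (the lower bound)
print(round(expected_len, 3))      # ~1.667 bits per grade for this code
print(round(1000 * expected_len))  # ~1667 bits for 1000 grades
```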

Finding good codes is challenging. Sometimes, context-specific codes can be very efficient (e.g., encoding Shakespeare sonnets using just 8 bits if both sender and receiver know all the sonnets).

Why Compression?

Compression offers several advantages:

  • Faster transmission: Shorter messages reduce transmission time and free up network capacity.
  • Efficient resource use: Smaller messages conserve network resources (bandwidth, buffers), accommodating more communications.
  • Improved throughput over error-prone links: Compression before error-correction coding allows for optimized redundancy for error resilience.

Compression is typically an end-to-end function (application layer), but can also be applied at the link layer if the data is compressible and contains redundancy. The next chapter covers Huffman Codes and Lempel-Ziv-Welch (LZW) compression.

Compression Algorithms: Huffman and Lempel-Ziv-Welch (LZW)

This chapter discusses two source coding algorithms for message compression (where a message is a sequence of symbols): Huffman coding and Lempel-Ziv-Welch (LZW). Huffman coding is efficient when symbol probabilities are known and independent. LZW is adaptive, requiring no prior knowledge of probabilities. Both are widely used (GIF, JPEG, MPEG, MP3, etc.).

Properties of Good Source Codes

A code maps symbols from an alphabet (text, pixel intensities, or abstract symbols) to codewords. Binary codewords are convenient for many communication channels.

Example: Encoding grades in 6.02: A=1, B=01, C=000, D=001. A sequence of grades could be transmitted as 0010001110100001, decoded as "DCAAABCB".

Instantaneous codes: A symbol is decoded as soon as its codeword is received. The grade encoding above is instantaneous. Non-instantaneous codes require looking ahead or decoding from the end, making them harder to decode.
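Here's a minimal decoder sketch of mine for the grade code above (A=1, B=01, C=000, D=001); it shows why the code is instantaneous: each symbol is emitted the moment its codeword is complete.

```python
# Prefix-free grade code from the example above
code = {"1": "A", "01": "B", "000": "C", "001": "D"}

def decode(bits: str) -> str:
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in code:          # a complete codeword: emit it immediately
            out.append(code[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

print(decode("0010001110100001"))  # DCAAABCB, as in the text
```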

Code trees and prefix-free codes: A code tree visualizes codes. In a binary code tree, edges are labeled with 0 or 1. The path from the root to a symbol gives its encoding. Prefix-free codes have symbols only at leaf nodes, ensuring no codeword is a prefix of another. Prefix-free codes are instantaneous.

Expected code length (L): For N symbols with probabilities pi and codeword lengths li: L = Σ pi * li. Codes with small L are desirable for compression. An optimal code has minimum L. Shannon showed that L ≥ H (entropy), and codes achieving entropy asymptotically exist.

Huffman Codes

Huffman codes provide efficient encoding given symbol probabilities. More likely symbols get shorter codes.

Huffman's algorithm builds the code tree bottom-up, starting with the least probable symbols:

  1. Input: Set S of (probability, symbol) tuples.
  2. Remove the two least probable tuples from S, combine them into a new tuple (combined symbol, sum of the probabilities), and add the new tuple back to S.
  3. Repeat step 2 until S has only one tuple (the root).

The resulting code tree defines the variable-length code.
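Here's a compact Python sketch of the algorithm just described (my own illustration, using heapq for the "pick the two least probable" step; ties may be broken differently from other implementations, which only changes which of several equally good codes you get). The grade probabilities are the same assumed values as earlier:

```python
import heapq
from itertools import count

def huffman(probabilities):
    """Build a prefix-free code from {symbol: probability}; returns {symbol: bitstring}."""
    tiebreak = count()  # keeps heap tuples comparable when probabilities are equal
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)   # least probable subtree -> prefix "0"
        p1, _, code1 = heapq.heappop(heap)   # next least probable -> prefix "1"
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

probs = {"A": 1/3, "B": 1/2, "C": 1/12, "D": 1/12}  # assumed grade probabilities
code = huffman(probs)
print(code)  # one optimal code, e.g. B='0', A='11', C='100', D='101'
expected_len = sum(probs[s] * len(code[s]) for s in probs)
print(round(expected_len, 3))  # ~1.667 bits/symbol for this distribution
```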

Properties of Huffman Codes

  • Non-uniqueness: Multiple optimal codes (and trees) may exist due to arbitrary tie-breaking during tree construction.
  • Optimality: Huffman codes have minimal expected length among instantaneous codes for independent symbols drawn from a fixed, known probability distribution.

LZW: An Adaptive Variable-length Source Code

Simple Huffman coding based on letter probabilities has limitations. Adaptive encoding, which adjusts to the message content, can perform better. LZW is a popular adaptive algorithm.

LZW builds a string table mapping symbol sequences to/from N-bit indices. The table (2^N entries) is initialized with single-byte sequences (0-255). Encoding:

  1. Accumulate bytes while the sequence (S) is in the table.
  2. When S + next byte (b) is not in the table:
    • Transmit the code for S.
    • Add S + b to the table.
    • Reset S to b.
  3. Repeat until all bytes are processed; then transmit the code for the final S.

The decoder rebuilds the table as it receives codes, enabling it to recover the original message.
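Here's a minimal LZW encoder sketch following these steps (my own illustration; real implementations add details such as a bounded table size and CLEAR codes):

```python
def lzw_encode(data: bytes):
    """Return a list of integer codes for `data`, building the string table greedily."""
    table = {bytes([i]): i for i in range(256)}  # initialize with single-byte sequences
    next_code = 256
    codes, s = [], b""
    for byte in data:
        candidate = s + bytes([byte])
        if candidate in table:
            s = candidate                 # keep accumulating: still a known sequence
        else:
            codes.append(table[s])        # transmit the code for S
            table[candidate] = next_code  # add S + b to the table
            next_code += 1
            s = bytes([byte])             # reset S to b
    if s:
        codes.append(table[s])            # code for the final S
    return codes

print(lzw_encode(b"abababab"))  # [97, 98, 256, 258, 98] -- repeated pairs compress
```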

LZW observations:

  • Greedy: Finds longest matches.
  • Adaptive: Table entries reflect actual message sequences.
  • Suboptimal: May include unused entries.
  • Variable compression: Compression increases as the table fills.
  • Table reinitialization: The table is reset when full, adapting to changing probabilities. Some variants use a CLEAR code for explicit resets.

Why Digital? Communication Abstractions and Digital Signaling

This chapter explains analog and digital communication, focusing on the problems with analog and the rationale for digital. It presents methods for sending and receiving digital data over analog communication links (necessary because physical links are fundamentally analog at the lowest level). It also introduces a layered communication model: messages → packets → bits → signals, which forms the basis for the rest of the book.

Sources of Data

Communication technologies enable users (human or application) to exchange messages. Data sources can be inherently digital (e.g., computer-generated data) or analog (e.g., video, audio, sensor data). Modern systems often digitize all data, regardless of the source.

Why Digital?

Digital communication excels for two reasons:

  1. Modularity: The digital abstraction enables building large systems by composing modules.
  2. Sophisticated Processing: It allows using algorithms to enhance data quality and system performance.

However, physical links are analog, requiring digital-to-analog and analog-to-digital conversions.

Why analog is natural in many applications

Analog representations map well to physical link properties. Examples:

  • Black-and-white TV: Image luminance (shade of gray) is represented by voltage levels (0V = black, 1V = white).

  • Analog telephone: Sound waves are converted to electrical signals.

Analog signals can be sent at different voltage/intensity/wavelength levels, easily measured by the receiver.

So why not analog?

No communication link is error-free. Noise and distortions perturb signals, and these errors accumulate across multiple transmission stages (cascading effect). This makes analog systems difficult to build reliably, especially complex ones. Digital signaling addresses this problem.

Digital Signaling: Mapping Bits to Signals

Digital signaling uses discrete values to represent bits, enabling robust distinction from noise. Binary signaling uses two voltages: V0 for "0" and V1 for "1". V0 and V1 must be sufficiently separated to handle noise.

The receiver uses a threshold voltage (Vth = (V0 + V1) / 2) to map received voltages to bits (voltage < Vth → "0", voltage ≥ Vth → "1"). Precisely measuring near Vth is difficult, but not crucial if V0 and V1 are far apart.

Signals in this lesson

Transmission signals are fixed-voltage waveforms held for a specific time. Continuous signals are represented by discrete-time samples. The sample rate (samples per second) is agreed upon by sender and receiver. Sample interval is the time between samples.
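Here's a small sketch of mine, with arbitrary example values (V0 = 0.0 V, V1 = 1.0 V, 4 samples per bit), that maps bits to voltage samples and thresholds them back into bits:

```python
V0, V1 = 0.0, 1.0            # assumed voltages for "0" and "1"
VTH = (V0 + V1) / 2          # threshold halfway between them
SAMPLES_PER_BIT = 4          # assumed, agreed on by sender and receiver

def bits_to_samples(bits):
    """Hold the voltage for SAMPLES_PER_BIT samples per transmitted bit."""
    return [V1 if b else V0 for b in bits for _ in range(SAMPLES_PER_BIT)]

def samples_to_bits(samples):
    """Pick one sample per bit period (here: the middle one) and threshold it."""
    mid = SAMPLES_PER_BIT // 2
    return [1 if samples[i + mid] >= VTH else 0
            for i in range(0, len(samples), SAMPLES_PER_BIT)]

tx = bits_to_samples([1, 0, 1, 1, 0])
rx = [v + 0.1 * (-1) ** i for i, v in enumerate(tx)]   # add a little fake noise
print(samples_to_bits(rx))                             # [1, 0, 1, 1, 0]
```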

Clocking Transmissions

Sender and receiver must agree on a clock rate. Bits are sent on clock transitions. Samples_per_bit is the number of samples per bit. The receiver infers clock edges from transitions in received samples.

Challenges:

  1. Clock synchronization: Sender and receiver clocks may differ. Solution: Adaptive timing at the receiver based on detected transitions.
  2. Ensuring frequent transitions: Needed for clock recovery. Solution: Line coding (e.g., 8b/10b).

Clock and Data Recovery

Receiver clock may be slightly faster or slower than the sender's. The receiver dynamically adjusts its sampling index based on transitions:

  • If the halfway sample between current and previous points is the same as the current sample (sampling too late): Increment the index by samples_per_bit - 1.
  • If the halfway sample is different (sampling too early): Increment by samples_per_bit + 1.
  • If no transition: Increment by samples_per_bit.

Initial correction is aided by a training sequence of alternating 0s and 1s at the start of transmission.
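Here's a rough sketch of that adjustment rule (my own reading of it in Python, not the course's reference code); the samples are assumed to be already thresholded to 0/1, and the halfway check is applied only when a transition is detected:

```python
def recover_bits(samples, samples_per_bit):
    """Follow transitions to keep the sampling index centred on each bit."""
    bits = []
    i = samples_per_bit // 2          # start roughly in the middle of the first bit
    prev = samples[i]
    while i < len(samples):
        cur = samples[i]
        bits.append(cur)
        if cur != prev:                               # a transition occurred
            halfway = samples[i - samples_per_bit // 2]
            if halfway == cur:                        # transition came early: sampling too late
                step = samples_per_bit - 1
            else:                                     # transition came late: sampling too early
                step = samples_per_bit + 1
        else:                                         # no transition: keep nominal spacing
            step = samples_per_bit
        prev = cur
        i += step
    return bits
```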

Line Coding with 8b/10b

8b/10b addresses DC balance and frequent transitions. It maps 8-bit symbols to 10-bit transmission symbols, ensuring:

  • Maximum run of 0s or 1s is 5 bits.
  • Maximum difference between 1s and 0s count at any sample is 6.
  • Special 7-bit sequences enable packet boundary detection.

Encoding process:

  1. Packet data is divided into bytes.
  2. Each byte is mapped to a 10-bit symbol.
  3. Packets are framed with a training sequence and a SYNC pattern for synchronization.

Communication Abstractions

A communication system involves several modules: Mapper (bits to signals), Demapper (signals to bits), Channel coding (error correction), Channel decoding. Messages are broken into packets and transmitted over multiple links. The three key abstractions are packets, bits, and signals. The book focuses on these and their interactions within the communication network.

Coping with Bit Errors using Error Correction Codes

This chapter discusses techniques for reliable digital communication, focusing on adding redundancy to combat inevitable bit errors in communication channels and storage media. The core concept is channel coding: encoding at the sender and decoding at the receiver to correct errors or detect uncorrectable errors. The chapter focuses on error correction codes, specifically linear block codes and (later) convolutional codes.

Bit Errors and BSC

The Binary Symmetric Channel (BSC) model characterizes bit errors with a single parameter, ε (bit-flip probability), where a transmitted bit flips with probability ε, independently of other bits. ε can be estimated empirically. Packet error probability (PER) for a packet of size S bits:

PER = 1 - (1 - ε)^S (5.1)

When ε << 1: PER ≈ Sε (5.2)
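A quick numeric check of Equations 5.1 and 5.2, with values chosen arbitrarily for illustration:

```python
eps = 1e-6          # assumed bit-flip probability
S = 10_000          # assumed packet size in bits

per_exact  = 1 - (1 - eps) ** S   # Equation 5.1
per_approx = S * eps              # Equation 5.2, valid when eps << 1 and S*eps is small

print(per_exact)    # ~0.00995
print(per_approx)   # 0.01
```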

Real-world channels may exhibit burst errors where error probability depends on history (higher if recent bits were also in error).

The Simplest Code: Repetition

A repetition code encodes bit b as n copies of b. The code rate is 1/n. Maximum likelihood decoding selects the most likely message given the received codeword. For a BSC, this means choosing the codeword with the most bits in common with the received one.

The decoding error probability for the repetition code (Equation 5.3 in the original text) falls roughly exponentially as n grows, i.e., as the code rate 1/n shrinks, but the code is inefficient because of its high overhead.
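Here's a tiny sketch of a repetition code with majority-vote decoding (my illustration, using n = 3):

```python
from collections import Counter

def rep_encode(bits, n=3):
    """Repeat each bit n times; the code rate is 1/n."""
    return [b for b in bits for _ in range(n)]

def rep_decode(coded, n=3):
    """Maximum-likelihood decoding for the BSC: majority vote within each group of n."""
    return [Counter(coded[i:i + n]).most_common(1)[0][0]
            for i in range(0, len(coded), n)]

tx = rep_encode([1, 0, 1])        # [1, 1, 1, 0, 0, 0, 1, 1, 1]
rx = tx[:]
rx[1] ^= 1                        # flip one bit in the first group
rx[5] ^= 1                        # and one in the second
print(rep_decode(rx))             # [1, 0, 1] -- both single errors are corrected
```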

Embeddings and Hamming Distance

Hamming distance (HD) between two n-bit words is the number of differing bit positions. For single error correction (SEC), the HD between any two valid codewords must be at least 3. A code with minimum Hamming distance D can detect D-1 errors and correct floor((D-1)/2) errors.
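Hamming distance is straightforward to compute; a small sketch:

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length words differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance("1011", "1001"))  # 1
print(hamming_distance("0000", "0111"))  # 3: the minimum separation an SEC code needs
```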

Linear Block Codes and Parity Calculations

Linear block codes (n, k) map k-bit messages to n-bit codewords using linear functions (weighted sums) over message bits. Algebraic block codes perform such operations within the block. (n,k,d) codes denote block codes with a minimum Hamming distance 'd'. Code rate = k/n.

Linear codes require that the sum of any two codewords is also a codeword. The all-zeros codeword exists in any linear code. The minimum Hamming distance of a linear block code equals the smallest non-zero codeword's weight (number of 1s). Binary linear codes use modulo-2 arithmetic (Galois Field F2).

Rectangular Parity SEC Code

Parity is the modulo-2 sum of bits. Even parity code adds a parity bit to each message, making the codeword have even parity. This detects single errors (HD=2).

The rectangular parity code arranges k bits into an r x c array and adds row and column parities. Codeword: message + row parities + column parities. Length: n = rc + r + c. This code is an SEC code (HD=3). Decoding involves checking row and column parities and correcting the corresponding bit if both parities indicate an error.
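Here's a sketch of the rectangular parity code for a 2 x 3 message block (my own illustration): encode by appending row and column parities, then correct a single flipped data bit by intersecting the failing row and column checks.

```python
def encode_rect(bits, r, c):
    """Codeword = message bits + r row parities + c column parities (even parity)."""
    assert len(bits) == r * c
    rows = [bits[i * c:(i + 1) * c] for i in range(r)]
    row_par = [sum(row) % 2 for row in rows]
    col_par = [sum(rows[i][j] for i in range(r)) % 2 for j in range(c)]
    return bits + row_par + col_par

def correct_rect(codeword, r, c):
    """Recompute parities; if exactly one row and one column check fail, flip that data bit."""
    bits = list(codeword[:r * c])
    row_par = codeword[r * c:r * c + r]
    col_par = codeword[r * c + r:]
    bad_rows = [i for i in range(r) if sum(bits[i * c:(i + 1) * c]) % 2 != row_par[i]]
    bad_cols = [j for j in range(c) if sum(bits[i * c + j] for i in range(r)) % 2 != col_par[j]]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        bits[bad_rows[0] * c + bad_cols[0]] ^= 1      # single data-bit error: fix it
    return bits

msg = [1, 0, 1, 0, 1, 1]                  # r=2, c=3
cw = encode_rect(msg, 2, 3)
cw[4] ^= 1                                # flip one data bit in transit
print(correct_rect(cw, 2, 3) == msg)      # True
```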

How many parity bits are needed in an SEC code?

Any linear code can be converted to a systematic code (message bits followed by parity bits). For SEC, the number of parity combinations (2^(n-k)) must be greater than or equal to the number of correctable situations (n+1):

n + 1 ≤ 2^(n-k) (5.6)

Parity bit count grows at least logarithmically with message bits.
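A quick check of inequality 5.6, computing the smallest number of parity bits needed for single error correction of a k-bit message (illustration only):

```python
def min_parity_bits(k: int) -> int:
    """Smallest r such that k + r + 1 <= 2**r (inequality 5.6 with n = k + r)."""
    r = 1
    while k + r + 1 > 2 ** r:
        r += 1
    return r

for k in (4, 11, 26, 57):
    print(k, min_parity_bits(k))   # 4->3, 11->4, 26->5, 57->6: logarithmic growth
```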

Hamming Codes

Hamming codes are efficient SEC codes with logarithmic parity bit growth. Each parity bit protects multiple data bits, and single errors produce unique parity error combinations.

Syndrome bits (Ei) are computed at the receiver by checking parities. The combination of syndrome bits indicates the erroneous bit (if any).

Is There a Logic to the Hamming Code Construction?

Hamming code construction:

  1. Assign parity bits to indices that are powers of 2 (1, 2, 4, 8,...).
  2. Assign data bits to remaining indices.
  3. Data bit di is included in the parity computation for pj if and only if index(pj) contributes to index(di) in binary representation (bitwise AND is non-zero).

The syndrome bits (E3E2E1 in the (7,4) example) treated as a binary number give the index of the bit to correct.
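The construction above can be sketched directly for the (7,4) code (my illustration; bit positions are 1-based, with parity bits at positions 1, 2, and 4):

```python
def hamming74_encode(data):
    """data = [d1, d2, d3, d4] -> 7-bit codeword with parity bits at positions 1, 2, 4."""
    cw = [0] * 8                      # index 0 unused so positions are 1-based
    cw[3], cw[5], cw[6], cw[7] = data
    for p in (1, 2, 4):               # parity bit p covers every position whose index has bit p set
        cw[p] = sum(cw[i] for i in range(1, 8) if i & p and i != p) % 2
    return cw[1:]

def hamming74_correct(codeword):
    """Recompute parities; the syndrome, read as a binary number, indexes the bad bit."""
    cw = [0] + list(codeword)
    syndrome = sum(p for p in (1, 2, 4)
                   if sum(cw[i] for i in range(1, 8) if i & p) % 2 != 0)
    if syndrome:
        cw[syndrome] ^= 1
    return cw[1:]

cw = hamming74_encode([1, 0, 1, 1])
cw[5] ^= 1                            # flip the bit at position 6
print(hamming74_correct(cw) == hamming74_encode([1, 0, 1, 1]))  # True
```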

Note: this is just the information one needs for web development. For sysops, networks and their fundamentals are a two-semester course.
