- 1 billion events/day at LinkedIn launch in 2011 — immediate production scale from day one
- 7 trillion messages/day by 2019 — same core architecture, 7,000× growth
- ~50 MB/sec Kafka producer throughput vs ~2 MB/sec ActiveMQ — in the original 2011 benchmark
- 9 bytes per-message overhead vs 144 bytes in ActiveMQ — 16× storage efficiency
- Stateless brokers — consumers track their own offset; broker memory doesn't scale with consumers
- 80%+ of Fortune 100 run Kafka today; Confluent IPO'd at $4.5B valuation in 2021
In 2010, LinkedIn was drowning in data it couldn't move. Every ML model, every recommendation engine, every real-time feature was starving because there was no reliable way to get activity data from the website into the systems that needed it. Jay Kreps, Jun Rao, and Neha Narkhede spent a year building a fix. They named it after Franz Kafka. The rest of the internet adopted it.
The Story
By 2010, LinkedIn had dozens of data source systems and dozens of consumer systems — ML models, analytics pipelines, search indexers, real-time features — all needing the same activity stream data. The solution was point-to-point custom pipelines: each one custom-built, each one brittle, none sharing infrastructure. Adding one new data source meant writing N new pipelines. Adding one new consumer meant updating M existing sources. Jay Kreps, leading data infrastructure engineering, described the root cause directly: "Everyone wanted to build fancy machine-learning algorithms, but without the data, the algorithms were useless. Getting the data from source systems and reliably moving it around was very difficult."
Kreps, alongside Jun Rao (from IBM's database group) and Neha Narkhede (from Oracle), evaluated every existing solution. ActiveMQ (an open-source message broker implementing JMS, designed for reliable ordered message delivery between enterprise applications) and RabbitMQ (a message broker built around AMQP, designed for flexible routing and delivery guarantees) were built for a different problem — reliable delivery of individual task messages, not high-throughput streaming of millions of activity events. Their per-message broker state tracking consumed memory proportional to outstanding messages. They couldn't support the scenario where a Hadoop job needed to replay yesterday's activity data. Most critically: ActiveMQ's message format carried 144 bytes of overhead per message. LinkedIn needed millions of messages per second.
Problem
LinkedIn's Data Was Locked in Silos
By 2010, LinkedIn had an N×M integration problem — every data source needed a custom pipeline to every data destination. Existing messaging systems (ActiveMQ, RabbitMQ) were designed for task queues, not event streams, and couldn't handle LinkedIn's throughput requirements or support replay.
Cause
No Tool Existed for High-Throughput Real-Time Event Streaming
Batch systems (Hadoop) could handle large volumes but only hours later. Traditional message queues could deliver in real-time but couldn't scale to LinkedIn's volume or support replay. No system simultaneously provided high throughput, low latency, durability, replayability, and horizontal scalability. The three engineers concluded that the tool they needed did not exist.
Solution
One Year Building Kafka: The Append-Only Distributed Log
Kreps, Rao, and Narkhede spent approximately one year building the first version of Kafka. The core architectural decision was treating the message store as an append-only log rather than a queue. This single choice enabled sequential disk I/O (orders of magnitude faster than random I/O), stateless brokers (consumers track their own position), arbitrary replay (consumers read from any offset), and horizontal partitioning (each partition is an independent log that scales independently).
Result
1 Billion Events Per Day at Launch, 7 Trillion by 2019
Kafka went into production at LinkedIn in 2011 and immediately processed over 1 billion events per day. LinkedIn open-sourced it in early 2011. It became an Apache Top-Level Project in October 2012. By 2015: 1 trillion messages per day. By 2019: 7 trillion. Kreps, Narkhede, and Rao left LinkedIn in November 2014 to found Confluent, building the commercial ecosystem around Kafka.
I thought that since Kafka was a system optimized for writing, using a writer's name would make sense. I had taken a lot of lit classes in college and liked Franz Kafka. Plus the name sounded cool for an open source project.
— Jay Kreps, on naming Kafka, via Quora
The Fix
Five Design Decisions That Made Kafka Fast
Kafka's performance advantage was not clever optimisation of a standard architecture — it was a fundamentally different architecture where every key decision reinforced the same goal: maximise throughput for streaming event data. Five decisions stand out as architecturally defining, and each was a deliberate rejection of how existing messaging systems had been built.
- ~50 MB/s — Kafka producer throughput in the original 2011 benchmark vs ~2 MB/s for ActiveMQ at 200-byte messages
- 9 bytes — per-message overhead in Kafka vs 144 bytes in ActiveMQ — 16× storage efficiency
- Stateless — Kafka brokers; consumer offset tracking is done by the consumer, not the broker
- Sequential — disk access pattern for both writes and reads; append-only means no random I/O
// The five key Kafka design decisions illustrated in code
// DECISION 1: Append-only log storage (not a queue)
// Each partition is a directory of sequential segment files
// /kafka-logs/my-topic-0/00000000000000000000.log
// /kafka-logs/my-topic-0/00000000000000100000.log
// → Sequential writes: disk seeks are expensive; sequential I/O is ~100x faster
// DECISION 2: Consumer tracks its own offset — broker holds no state
long consumerOffset = consumer.position(topicPartition); // consumer owns this
// → Brokers are stateless: no per-consumer memory, no ack tracking overhead
consumer.seek(topicPartition, 0); // replay from the beginning — any time
// DECISION 3: Topics partitioned for horizontal scale
ProducerRecord record = new ProducerRecord<>(
"user-activity",
userId, // partition key: same user → same partition = ordered per user
eventJson // the message payload
);
// → N partitions = N consumers in parallel = linear throughput scaling
// DECISION 4: Batch I/O from client to broker
props.put("batch.size", 16384); // batch up to 16KB before sending
props.put("linger.ms", 5); // or wait 5ms for the batch to fill
// Original paper: batch size 50 improved throughput ~10x vs batch size 1
// DECISION 5: Zero-copy transfer via OS sendfile()
// Consumer fetch path: disk page cache → network socket (no userspace copy)
// → No data enters JVM heap → no GC pressure → consistent low latency
// → Delivers data at near-network-hardware-limit throughput
Kafka vs traditional message queues — original 2011 benchmarks and design properties:
| Property | ActiveMQ / RabbitMQ | Kafka |
|---|---|---|
| Storage model | Queue — messages deleted after ack | Append-only log — retained by time/size |
| Broker state | Tracks ack state per message per consumer | Stateless — consumers track own offset |
| Producer throughput | ~2 MB/sec (ActiveMQ) | ~50 MB/sec (batch size 50) |
| Message overhead | 144 bytes (ActiveMQ JMS header) | 9 bytes |
| Consumer replay | Not supported | Supported — seek to any offset |
| Horizontal scale | Limited (complex cluster configs) | Native — add partitions, add consumers |
| Use case fit | Task queues, guaranteed delivery, routing | Event streaming, log aggregation, activity tracking |
Zero-copy: the OS kernel trick that doubled throughput
One of Kafka's most impactful performance optimisations is invisible to application code. In a traditional data transfer, data moves: disk → kernel buffer → userspace → socket buffer → network. In Kafka's consumer path, the OS sendfile() syscall transfers data directly from the page cache to the network socket, bypassing userspace entirely. No data is copied into the JVM heap — no GC pressure, no object allocation overhead. At LinkedIn's throughput rates, this optimisation alone accounts for significant throughput gains and, more importantly, consistent low latency even under high load.
The log/table duality: Jay Kreps' deeper insight
In his 2013 essay "The Log," Kreps articulated a concept beyond Kafka's implementation: the log/table duality — any database table can be derived by replaying a log of changes from the beginning, and any log can be materialised into a table by applying each event as a state update. Every database table is a log in disguise. Every stream of events can be materialised into a table. This duality means a Kafka topic is simultaneously a stream and a database — query it as a stream in motion (stream processing) or materialise it as a snapshot (a table). This insight became the foundation for Kafka Streams, ksqlDB, and the entire stream-processing ecosystem that followed.
Architecture
Kafka's architecture has three layers. The storage layer is a set of partitioned, replicated append-only log files on disk — each partition is an independent, totally ordered sequence of records. The broker layer is a cluster of server processes managing partition assignment, replication, and client connections — holding no consumer state. The client layer is producers writing to partitions and consumer groups reading from them, each group maintaining its own independent offset per partition.
Before Kafka: N×M Integration Spaghetti
View interactive diagram on TechLogStack →
Interactive diagram available on TechLogStack (link above).
After Kafka: The Centralised Log Hub
View interactive diagram on TechLogStack →
Interactive diagram available on TechLogStack (link above).
Inside Kafka: Topics, Partitions, Offsets, and Consumer Groups
View interactive diagram on TechLogStack →
Interactive diagram available on TechLogStack (link above).
Lessons
Before building, verify no existing tool solves your problem at your scale. The Kafka team evaluated ActiveMQ, RabbitMQ, and existing log aggregation systems before building. Their conclusion — existing tools were designed for the wrong problem — was evidence-based. The benchmark (50 MB/sec vs 2 MB/sec) made the decision concrete. Never rebuild what can be adopted; never adopt what demonstrably can't serve your workload.
The append-only log (a data structure where records are only ever added to the end, never modified in place — enabling sequential I/O, arbitrary consumer replay, and stateless brokers) is the universal data integration primitive. Any system moving data between producers and consumers is implementing a log, whether it knows it or not. Recognising and building explicitly on this pattern is what gave Kafka its performance advantage and its flexibility.
Stateless brokers make systems horizontally scalable in ways stateful brokers cannot match. When the broker tracks delivery state per consumer per message, broker memory scales with consumers × outstanding messages. When consumers track their own offsets, broker memory scales with partitions only. This single architectural choice is why Kafka can serve hundreds of consumer groups without broker degradation.
Sequential I/O is dramatically faster than random I/O on both HDDs and SSDs. An append-only log turns a bursty stream of writes into sequential disk operations, allowing Kafka to approach disk hardware throughput limits. Systems that update records in-place pay random I/O costs on every write. Kafka writes append-only and leverages the OS page cache for reads, achieving throughput that surprised the entire industry.
Open-sourcing infrastructure that solves a universal problem creates compounding returns. LinkedIn open-sourced Kafka in 2011 because the team recognised it solved a problem every data-intensive company had. Community contributions from Netflix, Uber, Twitter, and thousands of others built tooling LinkedIn could never have built alone: Kafka Streams, Kafka Connect, ksqlDB, MirrorMaker, Schema Registry. The return on open-sourcing infrastructure is measured in ecosystem, not just code.
Engineering Glossary
Append-only log — a data structure where records are only ever added to the end, never modified in place. Enables sequential disk I/O, arbitrary consumer replay, and stateless brokers. The core data structure underlying Kafka's architecture and the reason for its performance advantage over traditional message queues.
Consumer group — a set of Kafka consumers that collectively read from a topic, with each partition assigned to exactly one consumer in the group at a time. Enables parallel consumption: a topic with N partitions can be consumed by up to N consumers simultaneously.
Consumer offset — the position of a consumer within a partition, tracking which messages have been read. In Kafka, consumers (not brokers) own and commit their offsets — the key architectural decision that makes Kafka brokers stateless.
Log/table duality — the mathematical relationship where any database table can be derived by replaying a log of changes from the beginning, and any log can be materialised into a table by applying each event as a state update. The theoretical foundation for Kafka Streams and ksqlDB.
Partition — the unit of parallelism in Kafka. Each topic is divided into one or more partitions, each of which is an independent append-only log stored on a single broker. Producers write to partitions by key; consumers read from partitions independently.
Stateless broker — a broker that holds no per-consumer delivery state. Kafka brokers store bytes in partitioned logs; consumers own their own offset positions. Broker memory scales with partition count, not consumer count — the property that makes Kafka horizontally scalable to hundreds of consumer groups without broker degradation.
Write-ahead log (WAL) — a sequential record of all changes made to a database, written to disk before the changes are applied. Used for crash recovery and replication in MySQL, Postgres, and virtually every serious database. The inspiration for Kafka's append-only log architecture.
Zero-copy transfer — the use of the OS sendfile() syscall to transfer data directly from the kernel page cache to a network socket, bypassing userspace entirely. Used in Kafka's consumer fetch path to eliminate JVM heap copies, GC pressure, and the associated latency spikes at high throughput.
This case is a plain-English retelling of publicly available engineering material.
Read the full case on TechLogStack →
(Interactive diagrams, source links, and the full reader experience)
TechLogStack — built at scale, broken in public, rebuilt by engineers.
Top comments (0)