Ramsis Hammadi

Posted on May 21

X's Feed Ranking Algorithm: How Grok Ranks 500M Posts in 200ms

#ai #llm #architecture #webdev

TL;DR Summary

xAI open-sourced the full production code behind X's For You feed on GitHub — 22.5k stars, Apache 2.0, commercial use allowed
The system pulls from 500 million daily posts, narrows them to roughly 1,500 candidates, and ranks those in under 200 milliseconds using a Grok-based transformer
Zero hand-engineered features: the Grok transformer predicts 14 engagement types (including like, reply, repost, click, dwell, block, and report) and a Weighted Scorer combines them into a single score
Four components: Home Mixer (orchestration), Thunder (in-network, sub-ms lookups), Phoenix (Grok transformer retrieval + ranking), Candidate Pipeline (reusable framework)
A pre-trained mini Phoenix model ships with the repo — run inference without training anything

Direct Answer Block

X's For You feed algorithm is a four-component recommendation system: Home Mixer orchestrates the pipeline, Thunder serves in-network posts from followed accounts at sub-millisecond speed, Phoenix uses a Grok-based transformer to retrieve out-of-network posts and rank all candidates by predicting 14 engagement probabilities, and the Candidate Pipeline provides a reusable, composable framework for the entire system.

Introduction

Open-sourcing a recommendation algorithm that serves hundreds of millions of users isn't just a transparency gesture; it's also a rare look at a production architecture. X's system draws on 500 million daily posts, narrows them to roughly 1,500 candidates, and ranks those in under 200ms. The Grok-based transformer does the heavy lifting with zero hand-engineered features, and most heuristics have been removed along with them. Here's how the pipeline actually works, component by component.

How does the X For You feed rank 500 million posts in under 200 milliseconds?

The system achieves this speed through a layered pipeline that progressively narrows the candidate set:

1. Thunder serves in-network posts instantly: an in-memory post store with sub-millisecond lookups. Posts from accounts you follow are already indexed and retrievable without hitting any external database. Thunder consumes post create/delete events from Kafka and automatically trims posts older than the retention period.
2. Phoenix Retrieval finds out-of-network candidates: a two-tower model encodes users and posts into embeddings, then retrieves top-K candidates via dot product similarity across the global corpus. This ML-based search discovers content from accounts you don't follow.
3. Pre-scoring filters eliminate ineligible candidates: duplicates, old posts, self-posts, blocked/muted accounts, previously seen/served posts, muted keywords, and paywalled content are removed before the expensive transformer inference runs.
4. Phoenix Ranking scores remaining candidates: the Grok-based transformer predicts 14 engagement probabilities for each post. The Weighted Scorer combines them into a final score.
5. Selection picks the top K: sorted by final score, with author diversity attenuation to prevent feed monopolization.

"We have eliminated every single hand-engineered feature and most heuristics from the system. The Grok-based transformer does all the heavy lifting." — xAI, from the repository README

The 200ms target is achieved because the expensive ML inference (transformer ranking) runs only on the already-filtered candidate set — roughly 1,500 posts — not on the 500 million raw corpus.
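To make the funnel concrete, here is a minimal sketch of the "filter cheaply, score expensively" ordering. Everything in it is hypothetical: the `Post` fields, the `prefilter` and `rank` helpers, and the filter set are stand-ins chosen for illustration, not names from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    # Hypothetical fields; the real candidate type carries far more.
    post_id: int
    author_id: int
    age_hours: float


def prefilter(posts, viewer_id, blocked_authors, seen_ids, max_age_hours):
    """Cheap eligibility checks that run before any model inference."""
    kept, kept_ids = [], set()
    for post in posts:
        if post.post_id in kept_ids:            # duplicate
            continue
        if post.age_hours > max_age_hours:      # too old
            continue
        if post.author_id == viewer_id:         # viewer's own post
            continue
        if post.author_id in blocked_authors:   # blocked or muted author
            continue
        if post.post_id in seen_ids:            # already seen or served
            continue
        kept_ids.add(post.post_id)
        kept.append(post)
    return kept


def rank(posts, score_fn, top_k):
    """Call the expensive scorer once per surviving post, keep the best top_k."""
    scored = [(score_fn(post), post) for post in posts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [post for _, post in scored[:top_k]]
```

Passing the output of `prefilter` into `rank` means `score_fn` (the stand-in for transformer inference) is never invoked on a post that a constant-time check could have rejected.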

What are the four components — Home Mixer, Thunder, Phoenix, and Candidate Pipeline — and how do they fit together?

Home Mixer (Orchestration Layer)

The entry point. Exposes a gRPC endpoint (ScoredPostsService) that returns ranked posts for a given user. It leverages the Candidate Pipeline framework with 8 stages: Query Hydrators → Sources → Hydrators → Filters → Scorers → Selector → Post-Selection Filters → Side Effects.

The May 15th, 2026 update added query hydrators for user context including followed topics, starter packs, impression bloom filters, IP, mutual follow graphs, and served history.

Thunder (In-Network Post Store)

An in-memory post store that tracks recent posts from all users. Written in Rust. It consumes post create/delete events from Kafka, maintains per-user stores for original posts, replies/reposts, and video posts, and serves in-network candidates from accounts the requesting user follows.

The key performance characteristic: sub-millisecond lookups without hitting an external database. Posts are trimmed automatically after the retention period. This design eliminates the database bottleneck that would make 200ms impossible at X's scale.
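The general shape of such a store can be sketched in a few lines. This is an illustrative Python toy, not Thunder's Rust implementation: the class name, method names, and the tombstone set are assumptions, and it presumes create events arrive in timestamp order per author.

```python
from collections import defaultdict, deque


class InMemoryPostStore:
    """Toy per-author post store with time-based retention (hypothetical API)."""

    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        # author_id -> deque of (created_at, post_id), oldest first.
        self.by_author = defaultdict(deque)
        self.deleted = set()  # tombstones for delete events

    def on_create(self, author_id, post_id, created_at):
        # Would be driven by a "post created" event from the stream.
        self.by_author[author_id].append((created_at, post_id))

    def on_delete(self, post_id):
        # Would be driven by a "post deleted" event from the stream.
        self.deleted.add(post_id)

    def trim(self, now):
        """Drop posts older than the retention window."""
        cutoff = now - self.retention
        for posts in self.by_author.values():
            while posts and posts[0][0] < cutoff:
                _, post_id = posts.popleft()
                self.deleted.discard(post_id)

    def in_network(self, followed_ids, now):
        """Return live post ids from followed authors, using memory only."""
        cutoff = now - self.retention
        result = []
        for author_id in followed_ids:
            for created_at, post_id in self.by_author.get(author_id, ()):
                if created_at >= cutoff and post_id not in self.deleted:
                    result.append(post_id)
        return result
```

A lookup touches only the deques of the accounts the viewer follows, which is why the cost scales with the follow list rather than with the total number of posts held in memory.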

Phoenix (Grok Transformer — Retrieval + Ranking)

The ML component with two distinct functions:

Retrieval (Two-Tower Model): The User Tower encodes user features and engagement history into an embedding. The Candidate Tower encodes all posts into embeddings. Similarity search retrieves the top-K posts via dot product.
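The retrieval step reduces to "score every post by dot product with the user vector, keep the best K". A brute-force sketch follows; the function names and the dictionary layout are illustrative assumptions, and a production system would presumably use an approximate nearest-neighbor index rather than scanning every embedding.

```python
import heapq


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(user_embedding, post_embeddings, k):
    """Return the k post ids whose embeddings best match the user embedding.

    post_embeddings maps post_id -> vector (output of the candidate tower).
    """
    return heapq.nlargest(
        k,
        post_embeddings,
        key=lambda post_id: dot(user_embedding, post_embeddings[post_id]),
    )
```

For example, with a user vector `[1.0, 0.0]` and posts `a=[0.9, 0.1]`, `b=[0.1, 0.9]`, `c=[0.5, 0.5]`, asking for the top 2 returns `a` then `c`. Because the two towers are independent, post embeddings can be computed ahead of time and only the user vector needs to be produced at request time.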

Ranking (Transformer with Candidate Isolation): Takes user context (engagement history) and candidate posts as input. Uses special attention masking so candidates cannot attend to each other — they can only attend to user context. This ensures a post's score doesn't depend on which other posts are in the batch, making scores consistent and cacheable.

Candidate Pipeline (Reusable Framework)

A Rust trait-based framework defining six traits: Source, Hydrator, Filter, Scorer, Selector, and SideEffect. Sources and hydrators run in parallel where possible, with configurable error handling. This makes the pipeline composable — new candidate sources, filters, or scorers can be added without modifying the framework.

How does the Grok-based Phoenix transformer predict 14 different engagement types and combine them into a single score?

Instead of predicting a single "relevance" score, Phoenix predicts probabilities for 14 distinct actions:

Signal type	Actions
Positive	favorite, reply, repost, quote, click, profile_click, video_view, photo_expand, share, dwell, follow_author
Negative	not_interested, block_author, mute_author, report

The Weighted Scorer combines these into:

Final Score = Σ (weight_i × P(action_i))

Positive actions carry positive weights. Negative actions carry negative weights — pushing down content the user would likely dislike. This multi-action approach is more nuanced than a single relevance score because it captures how a user engages, not just whether they engage.
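As a worked example of the formula, here is a small sketch with made-up numbers. The weights below are purely illustrative assumptions; the article does not state the production weights.

```python
def weighted_score(probabilities, weights):
    """Final score = sum over actions of weight_i * P(action_i)."""
    return sum(weights[action] * p for action, p in probabilities.items())


# Hypothetical weights: positive actions add to the score, negative ones subtract.
example_weights = {"favorite": 1.0, "reply": 2.0, "report": -10.0}

# Hypothetical model output for one candidate post.
example_probabilities = {"favorite": 0.30, "reply": 0.10, "report": 0.01}

score = weighted_score(example_probabilities, example_weights)
# 1.0 * 0.30 + 2.0 * 0.10 + (-10.0) * 0.01 = 0.40
```

Note how a 1% chance of a report cancels out a third of the benefit from a 30% chance of a like under these weights. That asymmetry is the practical effect of assigning large negative weights to rare but costly actions.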

The transformer implementation is ported from the Grok-1 open source release by xAI, adapted for recommendation system use cases. It uses hash-based embeddings for both retrieval and ranking lookups.
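The idea behind hash-based embeddings is to avoid one embedding row per ID by hashing IDs into a fixed-size table. The sketch below is a generic version of that trick, not the Phoenix implementation: the class name, the use of two hash functions, the summing of rows, and the random initialization are all assumptions made for illustration.

```python
import hashlib
import random


class HashEmbedding:
    """Fixed-size embedding table addressed by hashing IDs (generic sketch)."""

    def __init__(self, num_buckets, dim, num_hashes=2, seed=0):
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        self.num_hashes = num_hashes
        # In a trained model these rows would be learned parameters.
        self.table = [
            [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(num_buckets)
        ]

    def _bucket(self, entity_id, hash_index):
        # Salting with hash_index yields several independent hash functions.
        data = f"{hash_index}:{entity_id}".encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.num_buckets

    def lookup(self, entity_id):
        """Sum the rows selected by each hash function."""
        rows = [
            self.table[self._bucket(entity_id, i)]
            for i in range(self.num_hashes)
        ]
        return [sum(column) for column in zip(*rows)]
```

The table size stays constant no matter how many distinct post or user IDs appear, and an ID never seen in training still maps to a valid vector. Combining several hashed rows lowers the chance that two IDs end up with exactly the same embedding.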

"Rather than predicting a single 'relevance' score, the model predicts probabilities for many actions." — xAI, from the repository README

How does Phoenix's candidate isolation mechanism prevent posts from influencing each other's rankings?

Candidate isolation is one of the five key design decisions highlighted in the repository. During transformer inference, candidates use special attention masking so they cannot attend to each other — only to the user context.

This achieves two critical properties:

Score consistency — a post's score doesn't change based on which other posts happen to be in the same batch. The same post gets the same score whether it's ranked against 10 candidates or 1,500.
Score cacheability — because scores don't depend on batch composition, they can be pre-computed and cached. This is essential for the 200ms latency target at X's scale.

Without candidate isolation, the ranking would exhibit a listwise dependency: a post's score would shift depending on what else was in the ranking pool, so a score computed in one request could not be safely cached or reused in another.

The attention mask achieves this by allowing each candidate to attend to the user context sequence while blocking attention between different candidates. The transformer still encodes all candidates in a single forward pass (for efficiency), but the attention pattern is constrained so that batch composition cannot affect any individual score.
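One way to picture the mask is as a boolean matrix over a sequence laid out as user context followed by candidates. The sketch below assumes that layout, assumes full attention inside the user context, and lets each candidate attend to itself; those details and the function name are illustrative guesses rather than facts from the repository.

```python
def build_isolation_mask(n_context, n_candidates):
    """mask[q][k] is True when query position q may attend to key position k.

    Positions 0..n_context-1 hold user context; the rest hold candidates.
    """
    n = n_context + n_candidates
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            if k < n_context:
                # Every position may read the user context.
                mask[q][k] = True
            elif q == k:
                # A candidate may read itself, but no other candidate.
                mask[q][k] = True
    return mask
```

With 2 context positions and 3 candidates, the first candidate's row is `[True, True, True, False, False]`: it sees both context positions and itself, and nothing else. That row's visible entries are the same whether the batch holds 1 candidate or 1,500, which is the property that makes scores batch-independent.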

Why did xAI eliminate every hand-engineered feature — and what does the transformer learn instead?

Traditional recommendation systems rely heavily on hand-engineered features: text features, author popularity, recency boosts, content category matching, engagement velocity heuristics. Each feature requires engineering effort, A/B testing, and maintenance as user behavior shifts.

xAI's approach replaces all of that with a single principle: let the transformer learn relevance from user engagement sequences.

The transformer takes as input:

User's recent engagement history (what they liked, replied to, shared, clicked)
Candidate post content and metadata
User features (following list, preferences)

From this raw data, it learns to predict the 14 engagement probabilities. No engineer needs to define a "recency weight" or "author popularity multiplier" — the model discovers these patterns from the data.

The benefit, according to the repository: "This significantly reduces the complexity in our data pipelines and serving infrastructure." Features that previously required dedicated data pipelines, feature stores, and serving infrastructure are now learned implicitly by the transformer.

The Author Diversity Scorer is one of the few post-transformer adjustments — it attenuates scores for repeated authors to prevent the feed from being dominated by a single account. This isn't a hand-engineered relevance feature; it's a diversity constraint applied after ML scoring.
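A simple form of author attenuation is geometric decay: each additional post from the same author has its score multiplied by a shrinking factor. The sketch below uses that rule as an assumption; the decay value, the function name, and the tuple layout are hypothetical, and the repository's actual formula may differ.

```python
def attenuate_repeated_authors(scored_posts, decay=0.5):
    """Down-weight later posts from an author who already appears higher up.

    scored_posts is a list of (post_id, author_id, score) tuples.
    Assumes non-negative scores, since multiplying a negative score by a
    factor below 1 would raise it rather than lower it.
    """
    ordered = sorted(scored_posts, key=lambda item: item[2], reverse=True)
    seen_count = {}
    adjusted = []
    for post_id, author_id, score in ordered:
        repeats = seen_count.get(author_id, 0)
        adjusted.append((post_id, author_id, score * decay ** repeats))
        seen_count[author_id] = repeats + 1
    adjusted.sort(key=lambda item: item[2], reverse=True)
    return adjusted
```

Given posts scored 1.0 and 0.9 from author `a` and one scored 0.6 from author `b`, the second post from `a` drops to 0.45 and falls below `b`'s post, so the feed alternates authors instead of stacking one account.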

What can developers learn from X's composable pipeline architecture and in-memory post store design?

Three architectural lessons stand out:

1. Trait-based pipeline composition

The Candidate Pipeline framework defines six traits (Source, Hydrator, Filter, Scorer, Selector, SideEffect) that new pipeline stages implement. This separates pipeline execution and monitoring from business logic. New candidate sources, filters, or scorers can be added by implementing the relevant trait — no pipeline code changes needed.
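The same separation can be sketched in Python, using `typing.Protocol` as a rough analogue of Rust traits. Only two of the six stage kinds are shown, and every class and method name here is an illustrative assumption rather than the framework's API.

```python
from typing import Protocol


class Filter(Protocol):
    def keep(self, candidate: dict) -> bool: ...


class Scorer(Protocol):
    def score(self, candidate: dict) -> float: ...


class Pipeline:
    """Runs stages in a fixed order; knows nothing about what each stage does."""

    def __init__(self, filters: list[Filter], scorers: list[Scorer]):
        self.filters = filters
        self.scorers = scorers

    def run(self, candidates: list[dict], top_k: int) -> list[dict]:
        kept = [c for c in candidates if all(f.keep(c) for f in self.filters)]
        ranked = sorted(
            kept,
            key=lambda c: sum(s.score(c) for s in self.scorers),
            reverse=True,
        )
        return ranked[:top_k]


# Business logic lives in small classes that satisfy the protocols.
class NotSelfPost:
    def __init__(self, viewer_id):
        self.viewer_id = viewer_id

    def keep(self, candidate):
        return candidate["author_id"] != self.viewer_id


class MaxAge:
    def __init__(self, max_hours):
        self.max_hours = max_hours

    def keep(self, candidate):
        return candidate["age_hours"] <= self.max_hours


class ModelScore:
    def score(self, candidate):
        return candidate["model_score"]
```

Adding a new rule means writing another class with a `keep` or `score` method and passing it to the constructor, for example `Pipeline([NotSelfPost(7), MaxAge(48)], [ModelScore()])`. The `Pipeline` class itself never changes.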

2. In-memory serving for latency-critical paths

Thunder demonstrates that at planet scale, the fastest database query is no database query. By keeping recent posts in memory, consuming from Kafka for updates, and trimming old data automatically, Thunder achieves sub-millisecond lookups without any external storage dependency. This pattern is applicable to any system where the working set fits in memory and freshness matters.

3. Parallel execution where independent

The framework runs sources and hydrators in parallel where possible. Independent stages don't wait on each other, so the latency of a parallel step is set by its slowest member rather than by the sum of all of them. That shortens the time before candidates reach the expensive transformer inference step, which matters when the whole request has a 200ms budget.
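Here is a minimal sketch of parallel hydration with tolerant error handling, written with Python's `asyncio` rather than the framework's Rust. The hydrator functions and field names are hypothetical, and "skip a failed hydrator" is just one possible error policy.

```python
import asyncio


async def hydrate(candidate, hydrators):
    """Run independent hydrators concurrently and merge their fields."""
    results = await asyncio.gather(
        *(hydrator(candidate) for hydrator in hydrators),
        return_exceptions=True,  # a failure is returned, not raised
    )
    merged = dict(candidate)
    for result in results:
        if isinstance(result, Exception):
            continue  # policy choice: drop the failed hydrator's fields
        merged.update(result)
    return merged


# Hypothetical hydrators; the sleeps stand in for network calls.
async def author_info(candidate):
    await asyncio.sleep(0.01)
    return {"author_name": f"user{candidate['author_id']}"}


async def media_info(candidate):
    await asyncio.sleep(0.01)
    return {"has_video": False}


async def flaky(candidate):
    raise TimeoutError("backend unavailable")


hydrated = asyncio.run(
    hydrate({"post_id": 1, "author_id": 42}, [author_info, media_info, flaky])
)
```

The two working hydrators overlap their waits, so the call takes about as long as one of them, and the failing hydrator costs the request a missing field instead of an error.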

The repository includes a pre-trained mini Phoenix model (256-dim embeddings, 4 attention heads, 2 transformer layers, ~3 GB) distributed via Git LFS, enabling out-of-the-box inference without training. This makes the system accessible for experimentation and learning — you can study how a production recommendation system works without needing X's training infrastructure.

Frequently Asked Questions

Q: Can I use X's ranking algorithm in a commercial product?

Yes. The repository is licensed under Apache 2.0, which permits commercial use, modification, and distribution. The Grok-1 model weights are separate and have their own license.

Q: What languages is the system written in?

The Candidate Pipeline, Thunder, and Home Mixer are written in Rust (57.4% of the repo). Phoenix (the ML component) is written in Python (42.6%). The Grok-based transformer was ported from xAI's Grok-1 open source release.

Q: Does the repo include training data or only inference code?

The repo includes the inference pipeline and a pre-trained mini Phoenix model. Training data and the full production model weights are not included. This is common for recommendation system open-source releases — you get the architecture and inference code, not user data.

Q: How does this compare to Twitter's 2023 algorithm release?

Twitter's 2023 release was the precursor. xAI's release is a major update: the transformer was ported from Grok-1 (replacing the earlier ML model), all hand-engineered features were eliminated, and the system now includes ads blending, Grox content understanding (spam, classification, policy enforcement), and an end-to-end inference pipeline.

Q: Can I run the mini Phoenix model on my laptop?

Yes. The pre-trained mini model is ~3 GB and distributed via Git LFS. The phoenix/run_pipeline.py script provides a single entry point for retrieval → ranking inference from exported checkpoints.

Q: How often is the codebase updated?

The repository's README states code updates are "promised roughly every four weeks." The May 15th, 2026 update was the most recent at time of analysis, adding the end-to-end inference pipeline, pre-trained model artifacts, Grox content understanding, ads blending, and expanded hydrators/sources.

Glossary

Home Mixer: The orchestration layer that assembles the For You feed — handles query hydration, candidate sourcing, filtering, scoring, and selection
Thunder: An in-memory post store serving in-network content (posts from followed accounts) at sub-millisecond speeds
Phoenix: The Grok-based ML component handling out-of-network retrieval (two-tower model) and candidate ranking (transformer with 14 engagement predictions)
Candidate Pipeline: A reusable Rust trait-based framework for building recommendation pipelines with Source, Hydrator, Filter, Scorer, Selector, and SideEffect traits
Candidate isolation: An attention masking technique ensuring candidates cannot attend to each other during transformer inference — only to user context — making scores consistent and cacheable
Multi-action prediction: Predicting 14 engagement probabilities (like, reply, repost, click, block, report, etc.) rather than a single relevance score

Author

Ramsis Hammadi, AI/ML engineer specializing in GenAI, LLM engineering, and automation.
