Year: 2019–2022 · Crisis: "The Data Team can't keep up"
The Problem: Data Team Bottleneck
Until 2019, Airbnb had a centralized Data Engineering team. This team was responsible for serving the entire company's data needs:
- Marketing wants campaign ROI
- Finance wants revenue reports
- Product wants A/B test results
- Legal wants GDPR compliance reports
- ...
The result: the Data Team became the company's bottleneck. Every report required filing a ticket with them. Response time: 2–4 weeks. The company slowed down.
Architectural Decision: Data Mesh
In 2020, Airbnb transitioned to a Data Mesh architecture — a concept introduced by Zhamak Dehghani in 2019.
The 4 Principles of Data Mesh:
1. Domain-Oriented Data Ownership
Data belongs to the domain that produces it, not the central data team.
OLD MODEL:
┌─────────────────────────────────────┐
│ CENTRAL DATA TEAM (bottleneck) │
│ They manage all the data │
└───────┬───────┬───────┬─────────────┘
│ │ │
Finance Marketing Product
(client) (client) (client)
NEW MODEL (DATA MESH):
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Finance │ │ Marketing│ │ Product │
│ Domain │ │ Domain │ │ Domain │
│ │ │ │ │ │
│ Produces │ │ Produces │ │ Produces │
│ and │ │ and │ │ and │
│ serves │ │ serves │ │ serves │
│ its data │ │ its data │ │ its data │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
└────────────┼────────────┘
▼
┌──────────────────────┐
│ SELF-SERVE PLATFORM │
│ (provides infra) │
└──────────────────────┘
2. Data as a Product
Each domain serves its data as a product. This product has:
- Documentation (Data Catalog)
- SLAs (freshness, quality, availability)
- Consumers (other domains)
- An owner (a data product owner within the domain)
3. Self-Serve Data Platform
A central team provides infrastructure to the domains:
- Spark clusters
- Data warehouse (Snowflake, BigQuery)
- Data catalog (data discovery)
- Monitoring and quality checks
But the domains write their own data pipelines.
4. Federated Computational Governance
Each domain makes its own decisions, but must comply with global standards:
- PII (personally identifiable information) masking
- GDPR/CCPA compliance
- Data quality standards
- Naming conventions
Airbnb's Practical Implementation: Minerva
Airbnb built a metric platform called Minerva. Each domain defines its own metrics in Minerva:
# Finance domain's revenue metric
metric:
name: daily_revenue
owner: finance-team@airbnb.com
description: "Daily total revenue (in USD)"
source:
table: finance.bookings
filters:
- status = 'COMPLETED'
- created_at >= CURRENT_DATE - INTERVAL '1 day'
aggregation:
type: SUM
field: amount_usd
dimensions:
- country
- listing_type
- host_tier
sla:
freshness: "09:00 UTC daily"
quality_score: "> 95%"
consumers:
- analytics-team
- executive-dashboard
- finance-reports
This way, when the Product team needs the "daily revenue" metric, they use the standard metric served by the Finance domain. Nobody invents their own revenue calculation → Single Source of Truth.
Airbnb's Data Quality System
Every data product has a quality score:
- Freshness: How current is the data?
- Completeness: Did the expected number of rows arrive?
- Accuracy: Do logical checks pass? (e.g., revenue can't be negative)
- Consistency: Is it consistent with other data products?
If a data product's quality drops, all consumers are automatically alerted.
Trade-offs
✅ Gains:
- Bottleneck removed: Domains manage their own data
- Speed: Quick iteration within the domain instead of filing tickets with the Data Team
- Ownership: The domain is responsible for the quality of its own data
- Scale: The data team doesn't have to grow as the company grows
❌ Costs:
- Cultural shift: "I'm just a backend developer, data isn't my job" mentality dies
- Duplicate efforts: Each domain can build its own data pipelines → Platform Engineering becomes critical
- Governance difficulty: Enforcing global standards in a federated model is harder than with central authority
🛠️ Takeaways
If your data team keeps saying "we can't keep up," it might be time to consider Data Mesh. Treat data not as a by-product but as a first-class product — documentation, SLAs, and ownership are essential. The team that produces the data owns it; the central data team only provides the platform. And for domains to be self-sufficient, a strong self-serve platform is non-negotiable.
Next up — Chapter 5: How Twitter solved the Lady Gaga problem — 50 million timelines per tweet. 🐦
Top comments (0)