DEV Community

Mehmet TURAÇ


Scale Wars #4 — Airbnb: Data Mesh and the Death of the Central Data Team

Year: 2019–2022 · Crisis: "The Data Team can't keep up"


The Problem: Data Team Bottleneck

Until 2019, Airbnb had a centralized Data Engineering team. This team was responsible for serving the entire company's data needs:

  • Marketing wants campaign ROI
  • Finance wants revenue reports
  • Product wants A/B test results
  • Legal wants GDPR compliance reports
  • ...

The result: the Data Team became the company's bottleneck. Every report required filing a ticket with them. Response time: 2–4 weeks. The company slowed down.

Architectural Decision: Data Mesh

In 2020, Airbnb transitioned to a Data Mesh architecture — a concept introduced by Zhamak Dehghani in 2019.

The 4 Principles of Data Mesh:

1. Domain-Oriented Data Ownership

Data belongs to the domain that produces it, not the central data team.

```
OLD MODEL:
┌─────────────────────────────────────┐
│   CENTRAL DATA TEAM (bottleneck)    │
│   They manage all the data          │
└───────┬───────┬───────┬─────────────┘
        │       │       │
     Finance Marketing Product
     (client)  (client) (client)

NEW MODEL (DATA MESH):
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Finance  │ │ Marketing│ │ Product  │
│ Domain   │ │ Domain   │ │ Domain   │
│          │ │          │ │          │
│ Produces │ │ Produces │ │ Produces │
│ and      │ │ and      │ │ and      │
│ serves   │ │ serves   │ │ serves   │
│ its data │ │ its data │ │ its data │
└────┬─────┘ └────┬─────┘ └────┬─────┘
     │            │            │
     └────────────┼────────────┘
                  ▼
        ┌──────────────────────┐
        │  SELF-SERVE PLATFORM │
        │  (provides infra)    │
        └──────────────────────┘
```

2. Data as a Product

Each domain serves its data as a product. This product has:

  • Documentation (Data Catalog)
  • SLAs (freshness, quality, availability)
  • Consumers (other domains)
  • An owner (a data product owner within the domain)
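To make these four properties concrete, here is a minimal Python sketch of a data product descriptor. The `DataProduct` and `SLA` classes, their field names, and the `is_publishable` rule are hypothetical illustrations, not Airbnb's schema or the API of any data catalog tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SLA:
    """Targets the product promises its consumers."""
    freshness_deadline_utc: str  # e.g. "09:00": yesterday's data is ready by 09:00 UTC
    min_quality_score: float     # 0.0 to 1.0
    min_availability: float      # e.g. 0.999


@dataclass
class DataProduct:
    """Hypothetical descriptor for a domain-owned data product."""
    name: str
    domain: str
    owner: str        # data product owner inside the domain
    description: str  # documentation surfaced in the data catalog
    sla: SLA
    consumers: set[str] = field(default_factory=set)

    def add_consumer(self, consumer: str) -> None:
        # Track who depends on this product so they can be notified of changes.
        self.consumers.add(consumer)

    def is_publishable(self) -> bool:
        # Assumed rule: no catalog listing without an owner and documentation.
        return bool(self.owner.strip()) and bool(self.description.strip())


bookings = DataProduct(
    name="completed_bookings",
    domain="finance",
    owner="finance-data-owner@example.com",  # hypothetical owner
    description="One row per completed booking, amounts in USD.",
    sla=SLA(freshness_deadline_utc="09:00", min_quality_score=0.95, min_availability=0.999),
)
bookings.add_consumer("marketing")
bookings.add_consumer("product")
print(bookings.is_publishable(), sorted(bookings.consumers))  # True ['marketing', 'product']
```

Keeping the consumer list on the product itself is what later lets the platform alert every dependent team when the product misses its SLA.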

3. Self-Serve Data Platform

A central team provides infrastructure to the domains:

  • Spark clusters
  • Data warehouse (Snowflake, BigQuery)
  • Data catalog (data discovery)
  • Monitoring and quality checks

But the domains write their own data pipelines.
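The split of responsibilities can be sketched as follows: the platform owns registration, execution, and run bookkeeping, while each domain supplies only its transform logic. `SelfServePlatform`, `register_pipeline`, and `run` are hypothetical names; a real platform would schedule jobs on shared infrastructure such as Spark clusters rather than run them in-process.

```python
from typing import Callable

Row = dict[str, object]
Transform = Callable[[list[Row]], list[Row]]


class SelfServePlatform:
    """Hypothetical platform: owns execution and bookkeeping; domains own the logic."""

    def __init__(self) -> None:
        self._pipelines: dict[str, Transform] = {}
        self.run_log: list[tuple[str, int]] = []  # (pipeline key, rows produced)

    def register_pipeline(self, domain: str, name: str, transform: Transform) -> str:
        key = f"{domain}.{name}"  # namespacing keeps domains from colliding
        if key in self._pipelines:
            raise ValueError(f"pipeline '{key}' is already registered")
        self._pipelines[key] = transform
        return key

    def run(self, key: str, rows: list[Row]) -> list[Row]:
        output = self._pipelines[key](rows)
        # Shared concern implemented once by the platform, not in every domain.
        self.run_log.append((key, len(output)))
        return output


# The Finance domain writes only this function.
def completed_only(rows: list[Row]) -> list[Row]:
    return [row for row in rows if row["status"] == "COMPLETED"]


platform = SelfServePlatform()
key = platform.register_pipeline("finance", "completed_bookings", completed_only)
out = platform.run(key, [
    {"status": "COMPLETED", "amount_usd": 120.0},
    {"status": "CANCELLED", "amount_usd": 80.0},
])
print(out, platform.run_log)
```

Adding a new cross-cutting feature (say, lineage tracking) then means changing `run` once instead of asking every domain to update its pipelines.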

4. Federated Computational Governance

Each domain makes its own decisions, but must comply with global standards:

  • PII (personally identifiable information) masking
  • GDPR/CCPA compliance
  • Data quality standards
  • Naming conventions
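"Computational" governance means the global standards are checked by code in the platform instead of by manual review. The sketch below covers two of them, naming conventions and PII masking. The regex, the `PII_COLUMNS` set, and `MASKING_KEY` are assumptions for illustration; the sketch says nothing about what GDPR or CCPA actually require.

```python
import hashlib
import hmac
import re

# Hypothetical platform-wide policies: defined once, enforced for every domain.
TABLE_NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*$")  # domain.table in snake_case
PII_COLUMNS = frozenset({"email", "phone", "full_name"})
MASKING_KEY = b"replace-with-a-managed-secret"  # assumption: loaded from a secrets manager in practice


def mask(value: str) -> str:
    """Replace a PII value with a stable keyed-hash token (same input, same token)."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "pii_" + digest[:16]


def mask_rows(rows: list[dict], masked_columns: set[str]) -> list[dict]:
    """Mask the given columns, leaving all other values untouched."""
    return [
        {col: mask(str(val)) if col in masked_columns else val for col, val in row.items()}
        for row in rows
    ]


def check_table(table: str, columns: list[str], masked_columns: set[str]) -> list[str]:
    """Return policy violations for a table a domain wants to publish (empty list = compliant)."""
    violations = []
    if not TABLE_NAME_PATTERN.match(table):
        violations.append(f"naming: '{table}' is not domain.table in snake_case")
    for col in columns:
        if col in PII_COLUMNS and col not in masked_columns:
            violations.append(f"pii: column '{col}' must be masked")
    return violations


print(check_table("finance.bookings", ["booking_id", "email", "amount_usd"], set()))
# ["pii: column 'email' must be masked"]
print(check_table("finance.bookings", ["booking_id", "email", "amount_usd"], {"email"}))
# []
```

A keyed hash keeps tokens joinable across domains (the same email always maps to the same token) while keeping raw values out of shared tables.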

Airbnb's Practical Implementation: Minerva

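Airbnb built a metrics platform called Minerva, in which each domain defines its own metrics. A simplified, illustrative definition (not Minerva's exact configuration format) might look like this: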

```yaml
# Finance domain's revenue metric
metric:
  name: daily_revenue
  owner: finance-team@airbnb.com
  description: "Daily total revenue (in USD)"

  source:
    table: finance.bookings
    filters:
      - status = 'COMPLETED'
      # Previous full day only, so today's partial data is excluded
      - created_at >= CURRENT_DATE - INTERVAL '1 day'
      - created_at < CURRENT_DATE

  aggregation:
    type: SUM
    field: amount_usd

  dimensions:
    - country
    - listing_type
    - host_tier

  sla:
    freshness: "09:00 UTC daily"
    quality_score: "> 95%"

  consumers:
    - analytics-team
    - executive-dashboard
    - finance-reports
```

This way, when the Product team needs the "daily revenue" metric, they use the standard metric served by the Finance domain. Nobody invents their own revenue calculation → Single Source of Truth.
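The single-source-of-truth idea can be sketched in Python: the Finance domain publishes one metric definition, and every consumer calls the same `compute_metric` function instead of writing its own SQL. `MetricDefinition` and `compute_metric` are hypothetical stand-ins, not Minerva's API, and the filters are re-expressed as a Python predicate over in-memory rows.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    owner: str
    sum_field: str
    dimensions: frozenset[str]
    row_filter: Callable[[dict, date], bool]


# Owned and published once by the Finance domain.
DAILY_REVENUE = MetricDefinition(
    name="daily_revenue",
    owner="finance-team@airbnb.com",
    sum_field="amount_usd",
    dimensions=frozenset({"country", "listing_type", "host_tier"}),
    row_filter=lambda row, day: row["status"] == "COMPLETED" and row["created_at"] == day,
)


def compute_metric(metric: MetricDefinition, rows: list[dict], day: date,
                   group_by: Optional[str] = None) -> dict[str, float]:
    """SUM the metric's field over rows passing its filter, optionally split by one dimension."""
    if group_by is not None and group_by not in metric.dimensions:
        raise ValueError(f"{metric.name} has no dimension '{group_by}'")
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        if metric.row_filter(row, day):
            key = row[group_by] if group_by else "total"
            totals[key] += row[metric.sum_field]
    return dict(totals)


rows = [
    {"status": "COMPLETED", "created_at": date(2022, 1, 1), "amount_usd": 120.0, "country": "US"},
    {"status": "COMPLETED", "created_at": date(2022, 1, 1), "amount_usd": 80.5, "country": "FR"},
    {"status": "CANCELLED", "created_at": date(2022, 1, 1), "amount_usd": 50.0, "country": "US"},
    {"status": "COMPLETED", "created_at": date(2021, 12, 31), "amount_usd": 30.0, "country": "US"},
]
print(compute_metric(DAILY_REVENUE, rows, date(2022, 1, 1)))                      # {'total': 200.5}
print(compute_metric(DAILY_REVENUE, rows, date(2022, 1, 1), group_by="country"))  # {'US': 120.0, 'FR': 80.5}
```

Restricting `group_by` to the declared dimensions means a consumer cannot slice revenue in a way the owning domain has not vetted.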

Airbnb's Data Quality System

Every data product has a quality score:

  • Freshness: How current is the data?
  • Completeness: Did the expected number of rows arrive?
  • Accuracy: Do logical checks pass? (e.g., revenue can't be negative)
  • Consistency: Is it consistent with other data products?

If a data product's quality drops, all consumers are automatically alerted.
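A minimal sketch of the four checks and the consumer alert follows. The thresholds (24-hour freshness, 95% of expected rows, 1% consistency tolerance), the equal-weight score, and the `send` callback are assumptions for illustration; the article does not describe how Airbnb weights these checks or delivers alerts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable


@dataclass
class QualityReport:
    checks: dict[str, bool]

    @property
    def score(self) -> float:
        # Assumed scoring: the share of checks that passed, all weighted equally.
        return sum(self.checks.values()) / len(self.checks)


def evaluate(rows: list[dict], last_updated: datetime, now: datetime, expected_rows: int,
             reference_total: float, max_age: timedelta = timedelta(hours=24),
             completeness_ratio: float = 0.95, tolerance: float = 0.01) -> QualityReport:
    total = sum(row["amount_usd"] for row in rows)
    return QualityReport({
        "freshness": now - last_updated <= max_age,
        "completeness": len(rows) >= completeness_ratio * expected_rows,
        "accuracy": all(row["amount_usd"] >= 0 for row in rows),  # revenue can't be negative
        # Compare against the same figure from another data product.
        "consistency": abs(total - reference_total) <= tolerance * max(abs(reference_total), 1.0),
    })


def notify_consumers(product: str, consumers: set[str], report: QualityReport,
                     threshold: float = 0.95, send: Callable[[str], None] = print) -> bool:
    """Alert every registered consumer when the score drops below the threshold."""
    if report.score >= threshold:
        return False
    failed = sorted(name for name, ok in report.checks.items() if not ok)
    for consumer in sorted(consumers):
        send(f"[{product}] quality {report.score:.0%} < {threshold:.0%} for {consumer}: failed {failed}")
    return True


now = datetime(2022, 1, 2, 9, 0, tzinfo=timezone.utc)
rows = [{"amount_usd": 120.0}, {"amount_usd": -5.0}]  # one negative amount fails accuracy
report = evaluate(rows, last_updated=now - timedelta(hours=2), now=now,
                  expected_rows=2, reference_total=115.0)
sent: list[str] = []
alerted = notify_consumers("daily_revenue", {"analytics-team", "executive-dashboard"},
                           report, send=sent.append)
```

Here `report.score` is 0.75, so both consumers receive an alert naming the failed `accuracy` check.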

Trade-offs

Gains:

  • Bottleneck removed: Domains manage their own data
  • Speed: Quick iteration within the domain instead of filing tickets with the Data Team
  • Ownership: The domain is responsible for the quality of its own data
  • Scale: The central data team no longer has to grow in proportion to the company's demand for data

Costs:

  • Cultural shift: The "I'm just a backend developer, data isn't my job" mentality has to go
  • Duplicated effort: Domains may each build similar pipelines, so Platform Engineering becomes critical
  • Governance difficulty: Enforcing global standards in a federated model is harder than with central authority

🛠️ Takeaways

If your data team keeps saying "we can't keep up," it might be time to consider Data Mesh. Treat data not as a by-product but as a first-class product — documentation, SLAs, and ownership are essential. The team that produces the data owns it; the central data team only provides the platform. And for domains to be self-sufficient, a strong self-serve platform is non-negotiable.


Next up — Chapter 5: How Twitter solved the Lady Gaga problem — 50 million timelines per tweet. 🐦
