<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: David Marcelo Petrocelli</title>
    <description>The latest articles on DEV Community by David Marcelo Petrocelli (@david_marcelopetrocelli_).</description>
    <link>https://dev.to/david_marcelopetrocelli_</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2543178%2F3a31cd9c-8f87-4afe-8d2b-6f888d30d8ca.png</url>
      <title>DEV Community: David Marcelo Petrocelli</title>
      <link>https://dev.to/david_marcelopetrocelli_</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/david_marcelopetrocelli_"/>
    <language>en</language>
    <item>
      <title>How Spotify Uses Data to Build the Product 713 Million Users Actually Want</title>
      <dc:creator>David Marcelo Petrocelli</dc:creator>
      <pubDate>Tue, 03 Mar 2026 14:14:26 +0000</pubDate>
      <link>https://dev.to/david_marcelopetrocelli_/how-spotify-uses-data-to-build-the-product-713-million-users-actually-want-j42</link>
      <guid>https://dev.to/david_marcelopetrocelli_/how-spotify-uses-data-to-build-the-product-713-million-users-actually-want-j42</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Difficulty Level: 300 - Advanced&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spotify processes 1 trillion+ events per day through 38,000+ active data pipelines — every play, skip, and save is a signal that feeds back into every product decision&lt;/li&gt;
&lt;li&gt;Discover Weekly generated 100 billion+ streams in its first 10 years using three ML layers: collaborative filtering, NLP, and audio CNNs — now augmented with LLMs via custom Semantic IDs&lt;/li&gt;
&lt;li&gt;Their A/B testing culture runs tens of thousands of experiments/year across 300+ teams — including 520 experiments in a single year on one screen — and they measure learning rate (64%), not just win rate (12%)&lt;/li&gt;
&lt;li&gt;Backstage, born as Spotify's internal developer portal, catalogs 2,000+ services and 4,000 data pipelines — and is now used by 3,000+ companies as the CNCF standard&lt;/li&gt;
&lt;li&gt;The real lesson isn't any single tool: it's the tight coupling between organizational design (squads own their services) and technical design (services are independently deployable)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;The 1 Trillion Events Question&lt;/h2&gt;

&lt;p&gt;Spotify hit 713 million monthly active users in Q3 2025. That number looks impressive in a press release and terrifying in a system design meeting.&lt;/p&gt;

&lt;p&gt;Scale alone doesn't explain Spotify's success. What matters is that every one of those events — every play, every skip, every playlist add at 2am — feeds directly into product decisions. Not after a quarterly review. In near real-time.&lt;/p&gt;

&lt;p&gt;Most companies collect data and build dashboards. Spotify built a closed loop: user behavior shapes the product, the product generates more behavior, and the cycle compounds over 20 years of iteration. In 2024, Spotify posted its first annual profit: €1.1B on €15.6B in revenue. The closed loop is working.&lt;/p&gt;

&lt;p&gt;After years of building data systems for enterprise clients and teaching these patterns at university, I've found that the most common mistake teams make is copying Spotify's tools rather than their discipline. In this article I'll break down the actual mechanisms behind their data pipeline, recommendation engine, experimentation culture, and developer platform — and tell you which patterns you can realistically steal.&lt;/p&gt;

&lt;h3&gt;Prerequisites&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Familiarity with stream processing concepts (Kafka, Pub/Sub, or similar)&lt;/li&gt;
&lt;li&gt;Basic understanding of microservices architecture (service decomposition, database-per-service)&lt;/li&gt;
&lt;li&gt;Experience with A/B testing fundamentals&lt;/li&gt;
&lt;li&gt;Some exposure to ML recommendation systems (collaborative filtering concepts)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;What You'll Learn&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;How Spotify's event pipeline evolved from self-managed Kafka to GCP Pub/Sub at 3 million events/second&lt;/li&gt;
&lt;li&gt;Why Discover Weekly uses three separate ML layers and what each one contributes&lt;/li&gt;
&lt;li&gt;How their A/B testing culture measures 64% learning rate instead of just win rate&lt;/li&gt;
&lt;li&gt;What Backstage is and why 3,000+ companies adopted it after Spotify open-sourced it&lt;/li&gt;
&lt;li&gt;Which Spotify patterns scale down to your team — and which ones don't&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;The Numbers Behind 713 Million Users&lt;/h2&gt;

&lt;p&gt;The scale numbers aren't just impressive — they explain every architectural decision.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Monthly Active Users&lt;/td&gt;
&lt;td&gt;713M (Q3 2025)&lt;/td&gt;
&lt;td&gt;Up from 600M in mid-2024&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Premium Subscribers&lt;/td&gt;
&lt;td&gt;281M&lt;/td&gt;
&lt;td&gt;~39% conversion rate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Annual Revenue&lt;/td&gt;
&lt;td&gt;€15.6B (2024)&lt;/td&gt;
&lt;td&gt;First profitable year&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Music Catalog&lt;/td&gt;
&lt;td&gt;100M+ tracks&lt;/td&gt;
&lt;td&gt;Grows ~60K tracks/day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Podcasts&lt;/td&gt;
&lt;td&gt;~7M titles&lt;/td&gt;
&lt;td&gt;Second only to Apple&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Events per day&lt;/td&gt;
&lt;td&gt;1 trillion+&lt;/td&gt;
&lt;td&gt;1,800+ event types&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Active data pipelines&lt;/td&gt;
&lt;td&gt;38,000+&lt;/td&gt;
&lt;td&gt;Hourly + daily scheduled&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production components&lt;/td&gt;
&lt;td&gt;Thousands&lt;/td&gt;
&lt;td&gt;80%+ fleet-managed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A/B experiments/year&lt;/td&gt;
&lt;td&gt;Tens of thousands&lt;/td&gt;
&lt;td&gt;300+ teams running tests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Discover Weekly streams (10yr)&lt;/td&gt;
&lt;td&gt;100 billion+&lt;/td&gt;
&lt;td&gt;56M new discoveries/week&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This scale didn't emerge from a grand architectural vision. It's the result of 20+ years of small, data-driven decisions — each one measured, validated, and shipped incrementally.&lt;/p&gt;




&lt;h2&gt;From Monolith to Thousands of Microservices&lt;/h2&gt;

&lt;p&gt;Spotify started as a monolithic Python application in 2006. By 2010, the codebase had grown to the point where no single team could understand all of it, and deployments required coordinating across multiple squads.&lt;/p&gt;

&lt;p&gt;The migration to microservices wasn't a big-bang rewrite. It was driven by a single organizational principle: &lt;strong&gt;each squad should be able to deploy independently, without coordinating with other teams.&lt;/strong&gt; If a team needed another team's sign-off to ship, something was wrong — either in the service design or the org structure.&lt;/p&gt;

&lt;h3&gt;The Database-Per-Service Pattern&lt;/h3&gt;

&lt;p&gt;Each Spotify microservice owns its own data store, chosen for the access patterns of that service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cassandra + BigTable&lt;/strong&gt;: High-speed key-value lookups (user state, session data, real-time features)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL&lt;/strong&gt;: Transactional data (payments, account management)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Cloud Storage&lt;/strong&gt;: Large objects (audio files, model artifacts)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BigQuery&lt;/strong&gt;: Analytical queries and data pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By 2023, the number of distinct production components had grown to "thousands" — enough that Spotify needed a new abstraction to manage them: Fleet Management.&lt;/p&gt;

&lt;h3&gt;Fleet Management: Treating Services as a Fleet&lt;/h3&gt;

&lt;p&gt;The key insight behind Fleet Management is that individual service owners are blind to fleet-wide patterns. If 300 teams each manage their own dependencies, you get 300 different versions of Log4j in production. You can't patch a critical vulnerability in 9 hours by asking each team to update manually.&lt;/p&gt;

&lt;p&gt;Fleet Management flips the model: infrastructure defaults to secure and up-to-date, and teams opt out for exceptions (with documented justification).&lt;/p&gt;
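&lt;p&gt;The opt-out model can be sketched in a few lines. This is an illustrative sketch, not Spotify's actual tooling — the &lt;code&gt;FLEET_DEFAULTS&lt;/code&gt; map, the field names, and the justification rule are all assumptions:&lt;/p&gt;

```python
# Illustrative sketch of "default-on, documented opt-out" fleet policy.
# FLEET_DEFAULTS and all field names are invented for this example.

FLEET_DEFAULTS = {"log4j": "2.17.1", "base_image": "jre-17"}

def resolve_config(service):
    # Start from the fleet-wide, centrally-maintained defaults
    config = dict(FLEET_DEFAULTS)
    # Apply per-service exceptions, but only if each one is justified
    for dep, pin in service.get("opt_outs", {}).items():
        if not pin.get("justification"):
            raise ValueError(f"{service['name']}: opt-out for {dep} needs a documented justification")
        config[dep] = pin["version"]
    return config

svc = {
    "name": "playlist-api",
    "opt_outs": {"log4j": {"version": "2.17.0", "justification": "pinned for plugin compatibility"}},
}
assert resolve_config(svc)["log4j"] == "2.17.0"   # documented exception honored
```

The point of the inversion is in the failure mode: an unjustified exception fails loudly at config-resolution time, instead of silently drifting out of date.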

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBTcXVhZHNbIjMwMCsgU3F1YWRzIl0KICAgICAgICBTMVsiU3F1YWQgQTxici8-U2VydmljZSArIERCIl0KICAgICAgICBTMlsiU3F1YWQgQjxici8-U2VydmljZSArIERCIl0KICAgICAgICBTM1siU3F1YWQgTjxici8-U2VydmljZSArIERCIl0KICAgIGVuZAoKICAgIHN1YmdyYXBoIEZsZWV0WyJGbGVldCBNYW5hZ2VtZW50Il0KICAgICAgICBCUVsiQmlnUXVlcnk8YnIvPkNvZGViYXNlIEluZGV4Il0KICAgICAgICBGTVsiRmxlZXQgQXV0b21hdGlvbjxici8-MzAwSyBjaGFuZ2VzL3llYXIiXQogICAgICAgIEJTWyJCYWNrc3RhZ2U8YnIvPkNhdGFsb2ciXQogICAgZW5kCgogICAgc3ViZ3JhcGggR0NQWyJHb29nbGUgQ2xvdWQgUGxhdGZvcm0iXQogICAgICAgIEs4U1siS3ViZXJuZXRlczxici8-T3BlcmF0b3JzIl0KICAgICAgICBDUkRbIkN1c3RvbSBSZXNvdXJjZTxici8-RGVmaW5pdGlvbnMiXQogICAgZW5kCgogICAgUzEgJiBTMiAmIFMzIC0tPiBCUwogICAgQlMgLS0-IEJRCiAgICBCUSAtLT4gRk0KICAgIEZNIC0tPiBLOFMKICAgIEs4UyAtLT4gQ1JECiAgICBDUkQgLS4tPnxyZWNvbmNpbGV8IFMxICYgUzIgJiBTMwoKICAgIHN0eWxlIFNxdWFkcyBmaWxsOiNlOGY1ZTkKICAgIHN0eWxlIEZsZWV0IGZpbGw6I2UzZjJmZAogICAgc3R5bGUgR0NQIGZpbGw6I2ZmZjNlMA%3D%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBTcXVhZHNbIjMwMCsgU3F1YWRzIl0KICAgICAgICBTMVsiU3F1YWQgQTxici8-U2VydmljZSArIERCIl0KICAgICAgICBTMlsiU3F1YWQgQjxici8-U2VydmljZSArIERCIl0KICAgICAgICBTM1siU3F1YWQgTjxici8-U2VydmljZSArIERCIl0KICAgIGVuZAoKICAgIHN1YmdyYXBoIEZsZWV0WyJGbGVldCBNYW5hZ2VtZW50Il0KICAgICAgICBCUVsiQmlnUXVlcnk8YnIvPkNvZGViYXNlIEluZGV4Il0KICAgICAgICBGTVsiRmxlZXQgQXV0b21hdGlvbjxici8-MzAwSyBjaGFuZ2VzL3llYXIiXQogICAgICAgIEJTWyJCYWNrc3RhZ2U8YnIvPkNhdGFsb2ciXQogICAgZW5kCgogICAgc3ViZ3JhcGggR0NQWyJHb29nbGUgQ2xvdWQgUGxhdGZvcm0iXQogICAgICAgIEs4U1siS3ViZXJuZXRlczxici8-T3BlcmF0b3JzIl0KICAgICAgICBDUkRbIkN1c3RvbSBSZXNvdXJjZTxici8-RGVmaW5pdGlvbnMiXQogICAgZW5kCgogICAgUzEgJiBTMiAmIFMzIC0tPiBCUwogICAgQlMgLS0-IEJRCiAgICBCUSAtLT4gRk0KICAgIEZNIC0tPiBLOFMKICAgIEs4UyAtLT4gQ1JECiAgICBDUkQgLS4tPnxyZWNvbmNpbGV8IFMxICYgUzIgJiBTMwoKICAgIHN0eWxlIFNxdWFkcyBmaWxsOiNlOGY1ZTkKICAgIHN0eWxlIEZsZWV0IGZpbGw6I2UzZjJmZAogICAgc3R5bGUgR0NQIGZpbGw6I2ZmZjNlMA%3D%3D" alt="diagram" width="1741" height="527"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The results are concrete:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;300,000+ automated changes&lt;/strong&gt; merged across the fleet in 3 years&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;7,500 automated changes/week&lt;/strong&gt; with 75% auto-merged without human review&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log4j vulnerability&lt;/strong&gt;: patched to 80% of backend services in 9 hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Framework updates&lt;/strong&gt;: reach 70% of fleet in under 7 days (previously ~200 days)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;95%&lt;/strong&gt; of Spotify developers report Fleet Management improved software quality&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;The Data Pipeline: How Every Play Becomes a Signal&lt;/h2&gt;

&lt;p&gt;Every user interaction at Spotify — a play, a skip, a search, a playlist add — generates an event. Those events are the raw material for every recommendation, every A/B test result, every product decision.&lt;/p&gt;

&lt;p&gt;Here's how that data flows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBDbGllbnRbIlVzZXIgQ2xpZW50cyJdCiAgICAgICAgaU9TWyJpT1MgQXBwIl0KICAgICAgICBBbmRbIkFuZHJvaWQgQXBwIl0KICAgICAgICBXZWJbIldlYiBQbGF5ZXIiXQogICAgZW5kCgogICAgc3ViZ3JhcGggSW5nZXN0WyJFdmVudCBJbmdlc3Rpb24iXQogICAgICAgIFNES1siQ2xpZW50IFNESzxici8-RXZlbnQgVmFsaWRhdGlvbiJdCiAgICAgICAgUFNbIkdvb2dsZSBDbG91ZDxici8-UHViL1N1Yjxici8-M00gZXZlbnRzL3NlYyBwZWFrIl0KICAgIGVuZAoKICAgIHN1YmdyYXBoIFByb2Nlc3NbIkRhdGEgUHJvY2Vzc2luZyJdCiAgICAgICAgU2Npb1siU2NpbyAvIEFwYWNoZSBCZWFtPGJyLz5CYXRjaCArIFN0cmVhbWluZyJdCiAgICAgICAgRmxpbmtbIkFwYWNoZSBGbGluazxici8-TG93LWxhdGVuY3kgc3RyZWFtaW5nIl0KICAgICAgICBERlsiR29vZ2xlIERhdGFmbG93PGJyLz5NYW5hZ2VkIEJlYW0iXQogICAgZW5kCgogICAgc3ViZ3JhcGggU3RvcmVbIkRhdGEgU3RvcmVzIl0KICAgICAgICBCUVsiQmlnUXVlcnk8YnIvPjEwTSsgcXVlcmllcy9tb250aCJdCiAgICAgICAgQ1NbIkNhc3NhbmRyYS9CaWdUYWJsZTxici8-RmVhdHVyZSBTdG9yZSJdCiAgICAgICAgR0NTWyJDbG91ZCBTdG9yYWdlPGJyLz5Nb2RlbCBBcnRpZmFjdHMiXQogICAgZW5kCgogICAgaU9TICYgQW5kICYgV2ViIC0tPiBTREsKICAgIFNESyAtLT4gUFMKICAgIFBTIC0tPiBTY2lvCiAgICBQUyAtLT4gRmxpbmsKICAgIFNjaW8gLS0-IERGCiAgICBERiAtLT4gQlEKICAgIEZsaW5rIC0tPiBDUwogICAgQlEgLS0-IEdDUwoKICAgIHN0eWxlIENsaWVudCBmaWxsOiNlM2YyZmQKICAgIHN0eWxlIEluZ2VzdCBmaWxsOiNmM2U1ZjUKICAgIHN0eWxlIFByb2Nlc3MgZmlsbDojZThmNWU5CiAgICBzdHlsZSBTdG9yZSBmaWxsOiNmZmYzZTA%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBDbGllbnRbIlVzZXIgQ2xpZW50cyJdCiAgICAgICAgaU9TWyJpT1MgQXBwIl0KICAgICAgICBBbmRbIkFuZHJvaWQgQXBwIl0KICAgICAgICBXZWJbIldlYiBQbGF5ZXIiXQogICAgZW5kCgogICAgc3ViZ3JhcGggSW5nZXN0WyJFdmVudCBJbmdlc3Rpb24iXQogICAgICAgIFNES1siQ2xpZW50IFNESzxici8-RXZlbnQgVmFsaWRhdGlvbiJdCiAgICAgICAgUFNbIkdvb2dsZSBDbG91ZDxici8-UHViL1N1Yjxici8-M00gZXZlbnRzL3NlYyBwZWFrIl0KICAgIGVuZAoKICAgIHN1YmdyYXBoIFByb2Nlc3NbIkRhdGEgUHJvY2Vzc2luZyJdCiAgICAgICAgU2Npb1siU2NpbyAvIEFwYWNoZSBCZWFtPGJyLz5CYXRjaCArIFN0cmVhbWluZyJdCiAgICAgICAgRmxpbmtbIkFwYWNoZSBGbGluazxici8-TG93LWxhdGVuY3kgc3RyZWFtaW5nIl0KICAgICAgICBERlsiR29vZ2xlIERhdGFmbG93PGJyLz5NYW5hZ2VkIEJlYW0iXQogICAgZW5kCgogICAgc3ViZ3JhcGggU3RvcmVbIkRhdGEgU3RvcmVzIl0KICAgICAgICBCUVsiQmlnUXVlcnk8YnIvPjEwTSsgcXVlcmllcy9tb250aCJdCiAgICAgICAgQ1NbIkNhc3NhbmRyYS9CaWdUYWJsZTxici8-RmVhdHVyZSBTdG9yZSJdCiAgICAgICAgR0NTWyJDbG91ZCBTdG9yYWdlPGJyLz5Nb2RlbCBBcnRpZmFjdHMiXQogICAgZW5kCgogICAgaU9TICYgQW5kICYgV2ViIC0tPiBTREsKICAgIFNESyAtLT4gUFMKICAgIFBTIC0tPiBTY2lvCiAgICBQUyAtLT4gRmxpbmsKICAgIFNjaW8gLS0-IERGCiAgICBERiAtLT4gQlEKICAgIEZsaW5rIC0tPiBDUwogICAgQlEgLS0-IEdDUwoKICAgIHN0eWxlIENsaWVudCBmaWxsOiNlM2YyZmQKICAgIHN0eWxlIEluZ2VzdCBmaWxsOiNmM2U1ZjUKICAgIHN0eWxlIFByb2Nlc3MgZmlsbDojZThmNWU5CiAgICBzdHlsZSBTdG9yZSBmaWxsOiNmZmYzZTA%3D" alt="diagram" width="1835" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;The Migration from Kafka to GCP Pub/Sub&lt;/h3&gt;

&lt;p&gt;In 2016-2017, Spotify migrated their event delivery system from self-managed Kafka clusters to Google Cloud Pub/Sub. This wasn't a trivial decision — Kafka was working. But managing Kafka at Spotify's scale required significant operational overhead that distracted from product engineering.&lt;/p&gt;

&lt;p&gt;The results after migration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Peak throughput scaled from 800,000 to &lt;strong&gt;3,000,000 events/second&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Half a trillion daily ingested events&lt;/strong&gt; (70 TB compressed)&lt;/li&gt;
&lt;li&gt;Pub/Sub handles &lt;strong&gt;1 trillion requests/day&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;BigQuery runs &lt;strong&gt;10 million+ queries and scheduled jobs/month&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
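&lt;p&gt;Before any of that throughput, events pass client-side validation (the "Client SDK" box in the diagram above). A minimal sketch of that step — the schema, field names, and event types here are invented for illustration; the real SDK validates against versioned schemas:&lt;/p&gt;

```python
# Toy sketch of client-side event validation before publishing.
# REQUIRED_FIELDS and KNOWN_EVENT_TYPES are invented placeholders.
import time
import uuid

REQUIRED_FIELDS = {"event_type", "user_id", "timestamp_ms"}
KNOWN_EVENT_TYPES = {"stream_complete", "skip_immediately", "add_to_playlist"}

def validate(event):
    missing = REQUIRED_FIELDS - set(event)
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if event["event_type"] not in KNOWN_EVENT_TYPES:
        raise ValueError(f"unknown event type: {event['event_type']}")

def make_event(event_type, user_id, **payload):
    event = {
        "event_id": str(uuid.uuid4()),   # idempotency key for at-least-once delivery
        "event_type": event_type,
        "user_id": user_id,
        "timestamp_ms": int(time.time() * 1000),
        **payload,
    }
    validate(event)   # reject malformed events at the edge, not in the warehouse
    return event
```

Validating at the edge matters at this scale: a malformed event type that slips past 3 million events/second becomes a very expensive backfill.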

&lt;h3&gt;Scio: Spotify's Open-Source Apache Beam API&lt;/h3&gt;

&lt;p&gt;Spotify developed &lt;a href="https://github.com/spotify/scio" rel="noopener noreferrer"&gt;Scio&lt;/a&gt;, a Scala API for Apache Beam, to process billions of events. It handles both batch and streaming workloads, running on either Dataflow (managed) or Flink (lower-latency) depending on requirements.&lt;/p&gt;

&lt;p&gt;Every data endpoint in the platform has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Retention policies&lt;/strong&gt;: data deleted after defined period&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access controls&lt;/strong&gt;: squad-level permissions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lineage tracking&lt;/strong&gt;: full trace from source event to derived dataset&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality checks&lt;/strong&gt;: automated alerts for lateness, failures, anomalies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 38,000+ active pipelines are orchestrated, monitored, and surfaced through Backstage — so any squad can inspect the health of their data at any time.&lt;/p&gt;
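&lt;p&gt;Those four guarantees can be modeled as per-dataset metadata with automated checks. A toy sketch — the class name, fields, and SLA check are hypothetical, not Spotify's platform API:&lt;/p&gt;

```python
# Toy model of per-dataset governance metadata: retention, ownership,
# lineage, and a freshness check. All names here are illustrative.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class DatasetEndpoint:
    name: str
    retention_days: int       # data deleted after this period
    owners: set               # squad-level access control
    upstream: list            # lineage: source datasets this one derives from
    last_success: datetime    # last successful pipeline run

    def is_late(self, sla_hours=24):
        # Quality check: alert if the dataset hasn't landed within its SLA
        age = datetime.now(timezone.utc) - self.last_success
        return age > timedelta(hours=sla_hours)

ds = DatasetEndpoint(
    name="user_stream_counts_daily",
    retention_days=365,
    owners={"squad-personalization"},
    upstream=["raw_play_events"],
    last_success=datetime.now(timezone.utc) - timedelta(hours=2),
)
assert not ds.is_late()   # fresh dataset: no alert
```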




&lt;h2&gt;Recommendations at Scale: Discover Weekly Deconstructed&lt;/h2&gt;

&lt;p&gt;Discover Weekly launched in July 2015 with a simple premise: every Monday morning, 30 personalized songs you've never heard before. In 10 years, it generated 100 billion streams and 56 million new artist discoveries every week.&lt;/p&gt;

&lt;p&gt;That impact comes from a three-layer ML architecture, each layer catching different signals:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBzdWJncmFwaCBTaWduYWxzWyJVc2VyIFNpZ25hbHMiXQogICAgICAgIElTWyJJbXBsaWNpdCBGZWVkYmFjazxici8-cGxheXMsIHNraXBzLCBzYXZlczxici8-cGxheWxpc3QgYWRkcyJdCiAgICAgICAgVFhbIlRleHQgU2lnbmFsczxici8-d2ViIGNyYXdscywgYmxvZ3M8YnIvPnJldmlldyBsYW5ndWFnZSJdCiAgICAgICAgQVVbIkF1ZGlvIFNpZ25hbHM8YnIvPnNwZWN0cm9ncmFtczxici8-d2F2ZWZvcm1zIl0KICAgIGVuZAoKICAgIHN1YmdyYXBoIExheWVyc1siUmVjb21tZW5kYXRpb24gTGF5ZXJzIl0KICAgICAgICBDRlsiTGF5ZXIgMTogQ29sbGFib3JhdGl2ZSBGaWx0ZXJpbmc8YnIvPk1hdHJpeCBGYWN0b3JpemF0aW9uPGJyLz5XaG8gZWxzZSBsaXN0ZW5zIHRvIHRoaXM_Il0KICAgICAgICBOTFBbIkxheWVyIDI6IE5MUCBBbmFseXNpczxici8-V2hhdCBkbyBwZW9wbGUgU0FZPGJyLz5hYm91dCB0aGlzIG11c2ljPyJdCiAgICAgICAgQ05OWyJMYXllciAzOiBBdWRpbyBDTk48YnIvPldoYXQgZG9lcyB0aGlzIG11c2ljPGJyLz5TT1VORCBsaWtlPyJdCiAgICBlbmQKCiAgICBzdWJncmFwaCBSYW5rWyJSYW5raW5nICYgUGVyc29uYWxpemF0aW9uIl0KICAgICAgICBNUlsiTXVsdGktT2JqZWN0aXZlIFJhbmtlcjxici8-UmVsZXZhbmNlICsgTm92ZWx0eSArIERpdmVyc2l0eSJdCiAgICAgICAgTExNWyJMTE0gTGF5ZXIgKDIwMjQrKTxici8-U2VtYW50aWMgSURzPGJyLz5Db250ZXh0ICsgRXhwbGFuYXRpb24iXQogICAgICAgIERXWyJEaXNjb3ZlciBXZWVrbHk8YnIvPjMwIHNvbmdzLCBNb25kYXkgQU0iXQogICAgZW5kCgogICAgSVMgLS0-IENGCiAgICBUWCAtLT4gTkxQCiAgICBBVSAtLT4gQ05OCiAgICBDRiAmIE5MUCAmIENOTiAtLT4gTVIKICAgIE1SIC0tPiBMTE0KICAgIExMTSAtLT4gRFcKCiAgICBzdHlsZSBTaWduYWxzIGZpbGw6I2UzZjJmZAogICAgc3R5bGUgTGF5ZXJzIGZpbGw6I2YzZTVmNQogICAgc3R5bGUgUmFuayBmaWxsOiNlOGY1ZTk%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBzdWJncmFwaCBTaWduYWxzWyJVc2VyIFNpZ25hbHMiXQogICAgICAgIElTWyJJbXBsaWNpdCBGZWVkYmFjazxici8-cGxheXMsIHNraXBzLCBzYXZlczxici8-cGxheWxpc3QgYWRkcyJdCiAgICAgICAgVFhbIlRleHQgU2lnbmFsczxici8-d2ViIGNyYXdscywgYmxvZ3M8YnIvPnJldmlldyBsYW5ndWFnZSJdCiAgICAgICAgQVVbIkF1ZGlvIFNpZ25hbHM8YnIvPnNwZWN0cm9ncmFtczxici8-d2F2ZWZvcm1zIl0KICAgIGVuZAoKICAgIHN1YmdyYXBoIExheWVyc1siUmVjb21tZW5kYXRpb24gTGF5ZXJzIl0KICAgICAgICBDRlsiTGF5ZXIgMTogQ29sbGFib3JhdGl2ZSBGaWx0ZXJpbmc8YnIvPk1hdHJpeCBGYWN0b3JpemF0aW9uPGJyLz5XaG8gZWxzZSBsaXN0ZW5zIHRvIHRoaXM_Il0KICAgICAgICBOTFBbIkxheWVyIDI6IE5MUCBBbmFseXNpczxici8-V2hhdCBkbyBwZW9wbGUgU0FZPGJyLz5hYm91dCB0aGlzIG11c2ljPyJdCiAgICAgICAgQ05OWyJMYXllciAzOiBBdWRpbyBDTk48YnIvPldoYXQgZG9lcyB0aGlzIG11c2ljPGJyLz5TT1VORCBsaWtlPyJdCiAgICBlbmQKCiAgICBzdWJncmFwaCBSYW5rWyJSYW5raW5nICYgUGVyc29uYWxpemF0aW9uIl0KICAgICAgICBNUlsiTXVsdGktT2JqZWN0aXZlIFJhbmtlcjxici8-UmVsZXZhbmNlICsgTm92ZWx0eSArIERpdmVyc2l0eSJdCiAgICAgICAgTExNWyJMTE0gTGF5ZXIgKDIwMjQrKTxici8-U2VtYW50aWMgSURzPGJyLz5Db250ZXh0ICsgRXhwbGFuYXRpb24iXQogICAgICAgIERXWyJEaXNjb3ZlciBXZWVrbHk8YnIvPjMwIHNvbmdzLCBNb25kYXkgQU0iXQogICAgZW5kCgogICAgSVMgLS0-IENGCiAgICBUWCAtLT4gTkxQCiAgICBBVSAtLT4gQ05OCiAgICBDRiAmIE5MUCAmIENOTiAtLT4gTVIKICAgIE1SIC0tPiBMTE0KICAgIExMTSAtLT4gRFcKCiAgICBzdHlsZSBTaWduYWxzIGZpbGw6I2UzZjJmZAogICAgc3R5bGUgTGF5ZXJzIGZpbGw6I2YzZTVmNQogICAgc3R5bGUgUmFuayBmaWxsOiNlOGY1ZTk%3D" alt="diagram" width="874" height="876"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Layer 1: Collaborative Filtering&lt;/h3&gt;

&lt;p&gt;Collaborative filtering answers the question: &lt;em&gt;who else listens to what you listen to, and what else do they listen to?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Spotify's approach uses &lt;strong&gt;Logistic Matrix Factorization (LMF)&lt;/strong&gt; on implicit feedback — not explicit star ratings, but behavioral signals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Simplified: how Spotify weights implicit feedback signals
# Real implementation uses distributed matrix factorization at scale
&lt;/span&gt;
&lt;span class="n"&gt;SIGNAL_WEIGHTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream_complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# Listened to 80%+ of song
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;save_to_library&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="mf"&gt;2.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# Strong positive signal
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;add_to_playlist&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# Strong positive signal
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream_partial&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# Weak positive signal
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skip_after_30s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# Negative signal
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skip_immediately&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# Strong negative signal
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_interaction_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Compute a weighted interaction score for a user-track pair.
    Used as input to the matrix factorization model.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;signal_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;weight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SIGNAL_WEIGHTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;signal_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Clamp to non-negative for LMF
&lt;/span&gt;
&lt;span class="c1"&gt;# The factorization produces: user_vector @ item_vector = predicted_preference
# Trained via ALS (Alternating Least Squares) on GCP with billions of interactions
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
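&lt;p&gt;Once trained, prediction is just a dot product between the learned user and item vectors. A toy sketch with made-up factors (these are not real embeddings — real ones have hundreds of dimensions):&lt;/p&gt;

```python
# Toy sketch: after matrix factorization, predicted preference is the
# dot product of the user vector and the item vector. Vectors below
# are hand-picked illustrations, not learned embeddings.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

user_vec       = [0.9, 0.1, 0.4]   # hypothetical learned user factors
liked_track    = [0.8, 0.2, 0.5]   # track aligned with the user's taste
disliked_track = [0.1, 0.9, 0.0]   # track orthogonal to it

# Higher dot product means higher predicted preference
assert dot(user_vec, liked_track) > dot(user_vec, disliked_track)
```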



&lt;p&gt;The training runs on Hendrix, Spotify's ML platform (named after Jimi Hendrix). Hendrix uses Ray for distributed training on GCP, serves 600+ ML practitioners, and handles the full lifecycle from prototype to production.&lt;/p&gt;

&lt;h3&gt;Layer 2: NLP Analysis&lt;/h3&gt;

&lt;p&gt;NLP fills in gaps where behavioral data is sparse — for new artists, for niche genres, for tracks uploaded last week.&lt;/p&gt;

&lt;p&gt;Spotify runs web crawlers across music blogs, review sites, and social platforms to extract how people describe songs and artists. The output: vector embeddings where songs described with similar language cluster together.&lt;/p&gt;

&lt;p&gt;A song described as "dreamy, lo-fi, bedroom pop" clusters with other songs sharing those descriptors — even if no user has yet listened to both.&lt;/p&gt;
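&lt;p&gt;The clustering idea can be sketched with a bag-of-descriptors "embedding" and cosine similarity. The descriptor lists below are invented; the real system learns dense vectors from crawled text rather than counting raw terms:&lt;/p&gt;

```python
# Toy sketch of the NLP layer's core idea: songs described with
# overlapping language score as similar, even with zero shared listeners.
import math
from collections import Counter

def embed(descriptors):
    # Bag-of-words "embedding": descriptor term, count (a stand-in for
    # the dense embeddings a real NLP pipeline would learn)
    return Counter(descriptors)

def cosine(a, b):
    num = sum(a[k] * b[k] for k in a)   # Counter returns 0 for missing keys
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

song_a = embed(["dreamy", "lo-fi", "bedroom pop"])
song_b = embed(["lo-fi", "dreamy", "chillwave"])
song_c = embed(["death metal", "aggressive"])

assert cosine(song_a, song_b) > cosine(song_a, song_c)   # shared language wins
```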

&lt;h3&gt;Layer 3: Audio CNNs&lt;/h3&gt;

&lt;p&gt;For truly new content — songs uploaded with no listening history and no web presence — audio analysis is the only signal available.&lt;/p&gt;

&lt;p&gt;Convolutional neural networks analyze spectrograms (visual representations of audio). The model learns to detect: tempo, energy, instrumentation, tonality, rhythm patterns. Songs with similar audio characteristics cluster together regardless of metadata.&lt;/p&gt;

&lt;h3&gt;The LLM Layer (2024-2025)&lt;/h3&gt;

&lt;p&gt;In 2024, Spotify added a fourth layer: LLMs for contextual recommendations and the AI DJ feature.&lt;/p&gt;

&lt;p&gt;The challenge: LLMs don't know Spotify's catalog of 100M tracks. The solution was &lt;strong&gt;Semantic IDs&lt;/strong&gt; — compact token identifiers derived from collaborative-filtering embeddings, generated via RQ-KMeans. The LLM learns to treat these IDs as vocabulary tokens, effectively learning to "speak Spotify."&lt;/p&gt;
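&lt;p&gt;Residual quantization — the core of RQ-KMeans — can be sketched in a few lines: each stage picks the nearest codebook entry, and the next stage quantizes what's left over. The codebooks below are tiny and hand-picked purely for illustration; the real system learns them from collaborative-filtering embeddings:&lt;/p&gt;

```python
# Toy sketch of residual quantization: encode a continuous embedding
# as a short sequence of codebook indices (a compact "Semantic ID").
# Codebooks are hand-picked here; RQ-KMeans learns them from data.

def nearest(codebook, vec):
    # Index of the codebook entry closest to vec (squared distance)
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def rq_encode(vec, codebooks):
    ids, residual = [], list(vec)
    for cb in codebooks:
        i = nearest(cb, residual)
        ids.append(i)
        # Subtract the chosen entry; the next stage quantizes the residual
        residual = [r - c for r, c in zip(residual, cb[i])]
    return ids

codebooks = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],    # coarse stage
    [[0.0, 0.0], [0.1, 0.1], [-0.1, 0.1]],   # fine stage, over residuals
]
semantic_id = rq_encode([0.1, 1.1], codebooks)   # a short token sequence
```

The LLM then treats each index sequence as vocabulary, so a 100M-track catalog compresses into a token space a language model can actually learn.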

&lt;p&gt;Outcomes from live experiments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;4% increase in listening time&lt;/strong&gt; from preference-tuned recommendations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;14% improvement&lt;/strong&gt; from Llama fine-tuned on Spotify's domain vs. vanilla Llama&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;70% reduction in tool errors&lt;/strong&gt; for the AI DJ orchestration system&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;A/B Testing Culture: How Spotify Ships Without Breaking Things&lt;/h2&gt;

&lt;p&gt;Most companies say they have an "experimentation culture." Spotify has metrics to back it up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;300+ teams&lt;/strong&gt; run experiments. The mobile home screen alone hosted &lt;strong&gt;520 experiments in one year&lt;/strong&gt; across 58 simultaneous teams. Total experiments run: &lt;strong&gt;tens of thousands per year&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The architecture behind this starts with their coordination engine, which manages mutual exclusion between experiments. When 58 teams are simultaneously testing changes to the same screen, you need a system that prevents two experiments from conflicting — and that randomly reshuffles user assignments between experiment runs (the "salt machine").&lt;/p&gt;
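&lt;p&gt;The "salt machine" idea — deterministic bucketing that reshuffles when the salt changes — is straightforward to sketch with a hash. Names and bucket counts here are illustrative, not Spotify's implementation:&lt;/p&gt;

```python
# Sketch of salted experiment bucketing: hash(experiment, salt, user)
# gives a stable bucket within a run, and a new salt reshuffles the
# population between runs so the same users aren't always in treatment.
import hashlib

def assign_bucket(user_id, experiment, salt, n_buckets=100):
    key = f"{experiment}:{salt}:{user_id}".encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return digest % n_buckets

# Deterministic within a run: same inputs, same bucket
b = assign_bucket("user-42", "new-home-layout", salt="run-1")
assert b == assign_bucket("user-42", "new-home-layout", salt="run-1")

# Changing the salt reshuffles assignments for the next run
run1 = {u: assign_bucket(u, "exp", "run-1") for u in map(str, range(1000))}
run2 = {u: assign_bucket(u, "exp", "run-2") for u in map(str, range(1000))}
assert run1 != run2
```

Mutual exclusion falls out of the same machinery: reserve disjoint bucket ranges for experiments that touch the same surface, and two conflicting treatments can never reach the same user.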

&lt;h3&gt;ABBA to Confidence: Three Generations of Experimentation&lt;/h3&gt;

&lt;p&gt;Spotify's experimentation platform evolved through three generations:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Generation&lt;/th&gt;
&lt;th&gt;Era&lt;/th&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ABBA&lt;/td&gt;
&lt;td&gt;Early 2010s&lt;/td&gt;
&lt;td&gt;Feature flags + basic metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Experimentation Platform (EP)&lt;/td&gt;
&lt;td&gt;2015-2023&lt;/td&gt;
&lt;td&gt;Full orchestration, metrics catalog, coordination&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Confidence&lt;/td&gt;
&lt;td&gt;2023+&lt;/td&gt;
&lt;td&gt;Commercial product, Backstage plugin, APIs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;The Metric That Changed Everything&lt;/h3&gt;

&lt;p&gt;The most important shift in Spotify's experimentation culture wasn't a new platform — it was a new metric: &lt;strong&gt;learning rate&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Win rate (the conventional metric) measures what percentage of experiments "succeed." At Spotify, that's ~12%.&lt;/p&gt;

&lt;p&gt;Learning rate measures what percentage of experiments produce &lt;strong&gt;decision-ready insights&lt;/strong&gt; — whether the answer is yes, no, or "we need to test something different." That's 64%.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Win rate:      12%  (the experiment confirmed our hypothesis)
Learning rate: 64%  (the experiment gave us actionable information)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reframe matters enormously for culture. A team that runs 100 experiments and "wins" 12 shouldn't feel like they failed 88% of the time. Every "failed" experiment that disproves a hypothesis saved months of building the wrong thing.&lt;/p&gt;
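&lt;p&gt;The distinction is easy to operationalize: tally hypothesis confirmations for win rate, and any decision-ready result — including clear negatives — for learning rate. A toy tally with invented outcome labels, chosen to reproduce the 12%/64% split:&lt;/p&gt;

```python
# Toy tally illustrating win rate vs. learning rate. The outcome labels
# and their counts are invented to match the percentages in the text.

outcomes = (
    ["confirmed"] * 12                       # hypothesis held up
    + ["rejected"] * 40                      # hypothesis clearly disproved
    + ["directional"] * 12                   # clear enough to decide next step
    + ["no_signal"] * 36                     # genuinely inconclusive
)

# Win rate counts only confirmations; learning rate counts every
# outcome a team can act on, including disproofs.
DECISION_READY = {"confirmed", "rejected", "directional"}

win_rate = outcomes.count("confirmed") / len(outcomes)
learning_rate = sum(1 for o in outcomes if o in DECISION_READY) / len(outcomes)

assert win_rate == 0.12
assert learning_rate == 0.64
```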

&lt;h3&gt;Using Confidence for Feature Flags&lt;/h3&gt;

&lt;p&gt;Spotify open-sourced and commercialized &lt;a href="https://confidence.spotify.com/" rel="noopener noreferrer"&gt;Confidence&lt;/a&gt; in August 2023. It's available as a managed service, a Backstage plugin, or via API. Here's what a basic feature flag + A/B test looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;spotify_confidence&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Confidence&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize with your project credentials
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Confidence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_secret&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-client-secret&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Resolve a feature flag for a specific user
&lt;/span&gt;&lt;span class="n"&gt;flag_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resolve_boolean_flag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;flag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;new-home-layout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;default_value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;evaluation_context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;targeting_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;country&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_country&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ios&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;flag_value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;render_new_home_layout&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;render_legacy_home_layout&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Track events for analysis
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;home-layout-engaged&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_duration_s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;session_seconds&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Confidence platform handles user assignment, experiment coordination, statistical analysis, and validity checks automatically. Squads see results in real time without writing SQL.&lt;/p&gt;
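&lt;p&gt;The key property of that user assignment is determinism: the same targeting key always lands in the same bucket, so no assignment table needs to be stored or synchronized. A minimal sketch of the idea, using a salted hash (not Confidence's actual algorithm):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib

def assign_bucket(targeting_key, salt, num_buckets=100):
    """Deterministically map a targeting key to a bucket in [0, num_buckets)."""
    digest = hashlib.sha256(f"{salt}:{targeting_key}".encode()).hexdigest()
    return int(digest, 16) % num_buckets

def in_rollout(targeting_key, flag_name, rollout_pct):
    """True when the key falls in the first rollout_pct buckets for this flag."""
    bucket = assign_bucket(targeting_key, salt=flag_name)
    return bucket in range(rollout_pct)  # i.e., bucket is below rollout_pct

# Stable across calls: ramping from 10% to 20% keeps the original 10% enrolled
assert in_rollout("user-123", "new-home-layout", 100)
assert not in_rollout("user-123", "new-home-layout", 0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Salting with the flag name keeps buckets independent across flags, so landing in one experiment's treatment group doesn't correlate with landing in another's.&lt;/p&gt;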




&lt;h2&gt;
  
  
  Backstage: The Developer Portal That Escaped Spotify
&lt;/h2&gt;

&lt;p&gt;By 2019, Spotify had a problem that no amount of engineering talent could solve manually: 280+ teams managing thousands of services, datasets, APIs, and pipelines — with no shared understanding of what existed or who owned it.&lt;/p&gt;

&lt;p&gt;The answer was an internal project called "System Z." In March 2020, Spotify open-sourced it as &lt;strong&gt;Backstage&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBCYWNrc3RhZ2VbIkJhY2tzdGFnZSBEZXZlbG9wZXIgUG9ydGFsIl0KICAgICAgICBzdWJncmFwaCBDb3JlWyJDb3JlIENhcGFiaWxpdGllcyJdCiAgICAgICAgICAgIENBVFsiU29mdHdhcmUgQ2F0YWxvZzxici8-RXZlcnkgc2VydmljZSwgQVBJLDxici8-ZGF0YXNldCwgZG9jIl0KICAgICAgICAgICAgU0NGWyJTY2FmZm9sZGVyPGJyLz5Hb2xkZW4gcGF0aCB0ZW1wbGF0ZXM8YnIvPlNlY3VyaXR5ICsgQ0kvQ0QgYnkgZGVmYXVsdCJdCiAgICAgICAgICAgIFREWyJUZWNoRG9jczxici8-RG9jcy1hcy1jb2RlPGJyLz5BdXRvLWdlbmVyYXRlZCJdCiAgICAgICAgZW5kCiAgICAgICAgc3ViZ3JhcGggUGx1Z2luc1siUGx1Z2luIEVjb3N5c3RlbSJdCiAgICAgICAgICAgIEs4U1siS3ViZXJuZXRlczxici8-UGx1Z2luIl0KICAgICAgICAgICAgR0hBWyJHaXRIdWIgQWN0aW9uczxici8-UGx1Z2luIl0KICAgICAgICAgICAgRERbIkRhdGFkb2c8YnIvPlBsdWdpbiJdCiAgICAgICAgICAgIFBEWyJQYWdlckR1dHk8YnIvPlBsdWdpbiJdCiAgICAgICAgICAgIEFXU1siQVdTPGJyLz5QbHVnaW4iXQogICAgICAgIGVuZAogICAgZW5kCgogICAgREVWWyJEZXZlbG9wZXJzIl0gLS0-IEJhY2tzdGFnZQogICAgQmFja3N0YWdlIC0tPiBJTkZSQVsiQWxsIEluZnJhc3RydWN0dXJlPGJyLz5PbmUgUGFuZSBvZiBHbGFzcyJdCgogICAgc3R5bGUgQ29yZSBmaWxsOiNlM2YyZmQKICAgIHN0eWxlIFBsdWdpbnMgZmlsbDojZjNlNWY1CiAgICBzdHlsZSBCYWNrc3RhZ2UgZmlsbDojZmZmOWM0" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBCYWNrc3RhZ2VbIkJhY2tzdGFnZSBEZXZlbG9wZXIgUG9ydGFsIl0KICAgICAgICBzdWJncmFwaCBDb3JlWyJDb3JlIENhcGFiaWxpdGllcyJdCiAgICAgICAgICAgIENBVFsiU29mdHdhcmUgQ2F0YWxvZzxici8-RXZlcnkgc2VydmljZSwgQVBJLDxici8-ZGF0YXNldCwgZG9jIl0KICAgICAgICAgICAgU0NGWyJTY2FmZm9sZGVyPGJyLz5Hb2xkZW4gcGF0aCB0ZW1wbGF0ZXM8YnIvPlNlY3VyaXR5ICsgQ0kvQ0QgYnkgZGVmYXVsdCJdCiAgICAgICAgICAgIFREWyJUZWNoRG9jczxici8-RG9jcy1hcy1jb2RlPGJyLz5BdXRvLWdlbmVyYXRlZCJdCiAgICAgICAgZW5kCiAgICAgICAgc3ViZ3JhcGggUGx1Z2luc1siUGx1Z2luIEVjb3N5c3RlbSJdCiAgICAgICAgICAgIEs4U1siS3ViZXJuZXRlczxici8-UGx1Z2luIl0KICAgICAgICAgICAgR0hBWyJHaXRIdWIgQWN0aW9uczxici8-UGx1Z2luIl0KICAgICAgICAgICAgRERbIkRhdGFkb2c8YnIvPlBsdWdpbiJdCiAgICAgICAgICAgIFBEWyJQYWdlckR1dHk8YnIvPlBsdWdpbiJdCiAgICAgICAgICAgIEFXU1siQVdTPGJyLz5QbHVnaW4iXQogICAgICAgIGVuZAogICAgZW5kCgogICAgREVWWyJEZXZlbG9wZXJzIl0gLS0-IEJhY2tzdGFnZQogICAgQmFja3N0YWdlIC0tPiBJTkZSQVsiQWxsIEluZnJhc3RydWN0dXJlPGJyLz5PbmUgUGFuZSBvZiBHbGFzcyJdCgogICAgc3R5bGUgQ29yZSBmaWxsOiNlM2YyZmQKICAgIHN0eWxlIFBsdWdpbnMgZmlsbDojZjNlNWY1CiAgICBzdHlsZSBCYWNrc3RhZ2UgZmlsbDojZmZmOWM0" alt="diagram" width="1191" height="751"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What Backstage Manages at Spotify Today
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource Type&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Backend Services&lt;/td&gt;
&lt;td&gt;2,000+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Websites&lt;/td&gt;
&lt;td&gt;300&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Pipelines&lt;/td&gt;
&lt;td&gt;4,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mobile Features&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;strong&gt;Software Catalog&lt;/strong&gt; is the source of truth. Every component has a &lt;code&gt;catalog-info.yaml&lt;/code&gt; file in its repo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# catalog-info.yaml&lt;/span&gt;
&lt;span class="c1"&gt;# Every Spotify service has one of these in its repo root&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backstage.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Component&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;discover-weekly-generator&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Weekly&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;batch&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;job&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;generating&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;personalized&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Discover&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Weekly&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;playlists"&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;github.com/project-slug&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;spotify/discover-weekly&lt;/span&gt;
    &lt;span class="na"&gt;backstage.io/techdocs-ref&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dir:.&lt;/span&gt;
    &lt;span class="na"&gt;pagerduty.com/service-id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;P2XYZAB&lt;/span&gt;
    &lt;span class="na"&gt;datadog.com/service-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;discover-weekly-generator&lt;/span&gt;
  &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ml&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;recommendations&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;batch&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service&lt;/span&gt;
  &lt;span class="na"&gt;lifecycle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
  &lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;recommendations-squad&lt;/span&gt;
  &lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;recommendation-platform&lt;/span&gt;
  &lt;span class="na"&gt;dependsOn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;resource:default/user-feature-store&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;resource:default/track-embedding-store&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;component:default/hendrix-ml-platform&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;Scaffolder&lt;/strong&gt; generates new services from golden path templates — templates that include security scanning, observability hooks, CI/CD pipelines, and Backstage registration by default. The "right way" is the easy way.&lt;/p&gt;
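&lt;p&gt;A golden path template is itself a catalog entity. Here's a minimal sketch in Backstage's public Scaffolder format (the names and skeleton contents are illustrative, not Spotify's internal templates):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# template.yaml -- a minimal golden path (illustrative)
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: python-service-golden-path
  description: "New Python service with CI/CD, scanning, and catalog registration by default"
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service details
      required:
        - name
      properties:
        name:
          type: string
          description: Unique service name
        repoUrl:
          type: string
          description: Destination repository
  steps:
    - id: fetch
      name: Fetch skeleton
      action: fetch:template
      input:
        url: ./skeleton   # includes CI/CD config, security scanning, catalog-info.yaml
        values:
          name: ${{ parameters.name }}
    - id: publish
      name: Publish repository
      action: publish:github
      input:
        repoUrl: ${{ parameters.repoUrl }}
    - id: register
      name: Register in catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps['publish'].output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because registration is a template step rather than a follow-up chore, nothing ships without appearing in the catalog.&lt;/p&gt;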

&lt;h3&gt;
  
  
  Outside Spotify
&lt;/h3&gt;

&lt;p&gt;Five years after open-sourcing, Backstage has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;3,400+ adopting companies&lt;/strong&gt; (Expedia, American Airlines, Zalando, Netflix, Twilio, Wayfair)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;1,600+ open-source contributors&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Donated to the &lt;strong&gt;CNCF&lt;/strong&gt; — now the standard for internal developer portals&lt;/li&gt;
&lt;li&gt;Evolved into &lt;strong&gt;Spotify Portal&lt;/strong&gt; (enterprise SaaS, GA October 2025)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Squad Model: What Actually Works
&lt;/h2&gt;

&lt;p&gt;The "Spotify Model" — Squads, Tribes, Chapters, Guilds — is the most imitated and most misunderstood organizational pattern in tech.&lt;/p&gt;

&lt;p&gt;Here's what the &lt;a href="https://blog.crisp.se/2012/11/14/henrikkniberg/scaling-agile-at-spotify" rel="noopener noreferrer"&gt;original 2012 whitepaper&lt;/a&gt; actually said:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Unit&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Squad&lt;/td&gt;
&lt;td&gt;6-12 people&lt;/td&gt;
&lt;td&gt;Full ownership: design, build, test, release, operate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tribe&lt;/td&gt;
&lt;td&gt;40-150 people&lt;/td&gt;
&lt;td&gt;Coordination across squads in same product area&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chapter&lt;/td&gt;
&lt;td&gt;6-15 specialists&lt;/td&gt;
&lt;td&gt;Craft community within a tribe (e.g., all iOS engineers)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Guild&lt;/td&gt;
&lt;td&gt;Any size&lt;/td&gt;
&lt;td&gt;Voluntary community of interest across the company&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key principle: &lt;strong&gt;"Loosely coupled but tightly aligned."&lt;/strong&gt; Squads move fast independently, but all move in the same strategic direction.&lt;/p&gt;

&lt;p&gt;But here's what Henrik Kniberg himself says now: &lt;em&gt;"Don't copy the Spotify model. That's the opposite of what we intended."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Spotify no longer follows the original model exactly — it evolved constantly. The org chart was always secondary to the &lt;strong&gt;autonomy principle&lt;/strong&gt;: if a squad can't deploy independently, something is wrong in the service design or the org design. Fix whichever is broken.&lt;/p&gt;

&lt;p&gt;The technical manifestation of squad autonomy is the inverse Conway maneuver: design your organization first, and your service architecture will follow. Spotify's thousands of independently deployable microservices exist because hundreds of autonomous squads own them end to end.&lt;/p&gt;




&lt;h2&gt;
  
  
  What to Steal (and What to Leave Behind)
&lt;/h2&gt;

&lt;p&gt;Here's what's actually worth taking from Spotify's playbook — and what requires Spotify-level scale to justify:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Steal It?&lt;/th&gt;
&lt;th&gt;Minimum Scale&lt;/th&gt;
&lt;th&gt;Effort&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Software Catalog (Backstage)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10+ teams&lt;/td&gt;
&lt;td&gt;Low — free, CNCF standard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Golden path templates (Scaffolder)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5+ teams&lt;/td&gt;
&lt;td&gt;Medium — template once, scale forever&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;64% learning rate metric&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Any scale&lt;/td&gt;
&lt;td&gt;Low — just change what you measure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Feature flags + gradual rollouts&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Any scale&lt;/td&gt;
&lt;td&gt;Low — Confidence or LaunchDarkly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fleet automation for dependencies&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;50+ services&lt;/td&gt;
&lt;td&gt;Medium — Dependabot + custom automation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Squad autonomy principle&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes (carefully)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3+ teams&lt;/td&gt;
&lt;td&gt;High — org change, not tech change&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3-layer recommendation engine&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Adapted&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10K+ users&lt;/td&gt;
&lt;td&gt;High — need data volume to work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GCP Pub/Sub at 3M events/sec&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;No (yet)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100M+ events/day&lt;/td&gt;
&lt;td&gt;Infrastructure complexity not worth it early&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hendrix ML platform&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100+ ML practitioners&lt;/td&gt;
&lt;td&gt;Overkill; use SageMaker/Vertex AI instead&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The three questions worth asking your team right now:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Can each team deploy independently, without coordinating with other teams?&lt;/strong&gt; If no, fix the service design or the team structure — but fix it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Are you measuring learning rate or just win rate?&lt;/strong&gt; Every experiment that disproves a bad idea is a win. Build a culture that treats it that way.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Does your internal developer portal make the right thing the easy thing?&lt;/strong&gt; If developers skip security scanning because setting it up is hard, the problem isn't the developers.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Spotify's data-driven architecture didn't emerge from a whiteboard session or a consulting engagement. It emerged from 20 years of building autonomy into every layer of the organization and letting that autonomy produce the architecture.&lt;/p&gt;

&lt;p&gt;The event pipeline processes 1 trillion events a day not because Spotify chose GCP Pub/Sub, but because 300+ squads each own their data and ship their pipelines without waiting for a central team.&lt;/p&gt;

&lt;p&gt;Discover Weekly recommends music that feels personal not because of any single ML breakthrough, but because a recommendations squad owned that problem for 10 years and had the freedom to experiment every Monday.&lt;/p&gt;

&lt;p&gt;Backstage manages 4,000 data pipelines and 2,000 services not because it's technically clever, but because the alternative (no catalog) gets exponentially more painful as you grow.&lt;/p&gt;

&lt;p&gt;The tools are available to any company. Most of them are open source or commercially available today. The discipline is what differentiates Spotify — and that part you have to build yourself.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Official Documentation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://engineering.atspotify.com/" rel="noopener noreferrer"&gt;Spotify Engineering Blog&lt;/a&gt; — primary source for all technical patterns described here&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://research.atspotify.com/" rel="noopener noreferrer"&gt;Spotify Research&lt;/a&gt; — 200+ ML and recommendation papers&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://backstage.io/" rel="noopener noreferrer"&gt;Backstage.io&lt;/a&gt; — open source, free, CNCF graduated&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://confidence.spotify.com/" rel="noopener noreferrer"&gt;Confidence&lt;/a&gt; — Spotify's A/B testing platform, now commercial&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Books
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Building Microservices&lt;/em&gt; by Sam Newman (O'Reilly, 2nd ed. 2021) — covers squad/service alignment&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Designing Data-Intensive Applications&lt;/em&gt; by Martin Kleppmann (O'Reilly, 2017) — event streaming fundamentals&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Engineering Blog Posts
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://engineering.atspotify.com/2023/04/spotifys-shift-to-a-fleet-first-mindset-part-1" rel="noopener noreferrer"&gt;Fleet Management at Spotify Part 1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://engineering.atspotify.com/2024/5/data-platform-explained-part-ii" rel="noopener noreferrer"&gt;Data Platform Explained Part II&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://engineering.atspotify.com/2023/8/coming-soon-confidence-an-experimentation-platform-from-spotify" rel="noopener noreferrer"&gt;Coming Soon: Confidence&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://engineering.atspotify.com/2023/02/unleashing-ml-innovation-at-spotify-with-ray" rel="noopener noreferrer"&gt;Unleashing ML Innovation with Ray&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://engineering.atspotify.com/2025/4/celebrating-five-years-of-backstage" rel="noopener noreferrer"&gt;Celebrating Five Years of Backstage&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Research Papers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://research.atspotify.com/2025/9/semantic-ids-for-generative-search-and-recommendation" rel="noopener noreferrer"&gt;Semantic IDs for Generative Search and Recommendation&lt;/a&gt; (NeurIPS 2025)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://research.atspotify.com/2023/02/users-interests-are-multi-faceted-recommendation-models-should-be-too" rel="noopener noreferrer"&gt;Users' Interests are Multi-faceted: Recommendation Models Should Be Too&lt;/a&gt; (WSDM 2023)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://research.atspotify.com/2023/07/optimizing-for-the-long-term-without-delay" rel="noopener noreferrer"&gt;Optimizing for the Long-Term Without Delay&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Original Squad Model Reference
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://blog.crisp.se/2012/11/14/henrikkniberg/scaling-agile-at-spotify" rel="noopener noreferrer"&gt;Scaling Agile @ Spotify&lt;/a&gt; by Henrik Kniberg &amp;amp; Anders Ivarsson (2012)&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Did you find this article helpful?&lt;/strong&gt; Follow me for more content on system design, data engineering, and cloud architecture!&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>microservices</category>
      <category>dataengineering</category>
      <category>spotify</category>
    </item>
    <item>
      <title>How Netflix Turns 2 Trillion Daily Events Into Architectural Decisions (And How You Can Too)</title>
      <dc:creator>David Marcelo Petrocelli</dc:creator>
      <pubDate>Tue, 03 Mar 2026 02:05:37 +0000</pubDate>
      <link>https://dev.to/david_marcelopetrocelli_/how-netflix-turns-2-trillion-daily-events-into-architectural-decisions-and-how-you-can-too-58k1</link>
      <guid>https://dev.to/david_marcelopetrocelli_/how-netflix-turns-2-trillion-daily-events-into-architectural-decisions-and-how-you-can-too-58k1</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Difficulty Level: 300 - Advanced&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Netflix processes 2+ trillion events/day through Kafka and 20,000+ Flink jobs, but the real differentiator is not scale -- it is using that data to drive every architectural decision, from Java version migrations to database selection.&lt;/li&gt;
&lt;li&gt;Their Data Mesh platform with Streaming SQL democratized real-time processing: 1,200 SQL processors created in one year by non-infrastructure teams, processing 100 million events/second across 5,000+ pipelines.&lt;/li&gt;
&lt;li&gt;Every product change goes through A/B testing (150K-450K RPS, &amp;lt;1ms cache-warm latency), and in 2025 ML-optimized experimentation reduces experiment duration by up to 40%.&lt;/li&gt;
&lt;li&gt;The biggest lesson is what NOT to copy: Netflix explicitly warns against "streaming all the things," and their architecture reflects 15+ years of incremental evolution with 10,000+ engineers -- blindly replicating it is a documented anti-pattern.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The 2 Trillion Events Question
&lt;/h2&gt;

&lt;p&gt;Netflix processes over 2 trillion events every single day. Three petabytes of data ingested. Seven petabytes output.&lt;/p&gt;

&lt;p&gt;Those numbers are staggering, but scale is not what makes Netflix's architecture remarkable. What makes it remarkable is that every one of those events feeds back into decisions about what to build next.&lt;/p&gt;

&lt;p&gt;Netflix runs 1,000+ microservices on AWS across 100,000+ EC2 instances, serving 300M+ subscribers and generating $39B in revenue (2024). Their estimated annual AWS spend exceeds $1.3B. But the companies that try to replicate Netflix's infrastructure miss the point entirely. The architecture is not the product of a grand design -- it is the product of 15+ years of data-driven decisions, each one measured, validated, and rolled out incrementally.&lt;/p&gt;

&lt;p&gt;After years of building distributed systems for enterprise clients and teaching these patterns at university, I have found that the most common mistake teams make is copying Netflix's tools rather than Netflix's discipline. In this article, I will break down how their real-time data pipeline feeds architectural decisions across experimentation, observability, chaos engineering, and platform engineering -- and identify the patterns you can actually adopt.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Familiarity with microservices architecture patterns (circuit breakers, service discovery, API gateways)&lt;/li&gt;
&lt;li&gt;Basic understanding of stream processing concepts (Kafka, Flink, or similar)&lt;/li&gt;
&lt;li&gt;Experience with distributed systems at any scale&lt;/li&gt;
&lt;li&gt;Understanding of A/B testing fundamentals&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What You'll Learn
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;How Netflix's Kafka + Flink pipeline evolved from 45 billion events/day (2011) to 2+ trillion&lt;/li&gt;
&lt;li&gt;Why Netflix rejected graph databases for their 8-billion-node distributed graph and chose Cassandra instead&lt;/li&gt;
&lt;li&gt;How their A/B testing platform handles 450K RPS with sub-millisecond latency&lt;/li&gt;
&lt;li&gt;What Netflix's observability stack looks like at 17 billion metrics/day and 700 billion traces/day&lt;/li&gt;
&lt;li&gt;Which Netflix patterns you should adopt -- and which ones you should absolutely avoid&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Data Pipeline: Kafka, Flink, and Data Mesh at Trillions Scale
&lt;/h2&gt;

&lt;p&gt;Netflix's real-time data infrastructure evolved through &lt;a href="https://zhenzhongxu.com/the-four-innovation-phases-of-netflixs-trillions-scale-real-time-data-infrastructure-2370938d7f01" rel="noopener noreferrer"&gt;four distinct innovation phases&lt;/a&gt; over 13 years. Understanding this evolution matters because it reveals that no one designed a "trillions-scale pipeline" from scratch. Every layer was added to solve a concrete problem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBJbmdlc3Rpb24KICAgICAgICBVQVtVc2VyIEFjdGlvbnNdIC0tPiBBR1tBUEkgR2F0ZXdheV0KICAgICAgICBBRyAtLT4gS0ZbIkthZmthPGJyLz5BdnJvICsgU2NoZW1hIFJlZ2lzdHJ5PGJyLz4xTSBtc2cvc2VjL3RvcGljIl0KICAgIGVuZAoKICAgIHN1YmdyYXBoIFByb2Nlc3NpbmcKICAgICAgICBLRiAtLT4gRkxbIkFwYWNoZSBGbGluazxici8-MjBLKyBqb2JzIl0KICAgICAgICBTUUxbIlN0cmVhbWluZyBTUUw8YnIvPjEsMjAwKyBwcm9jZXNzb3JzIl0gLS4tPnx3cmFwc3wgRkwKICAgICAgICBGTCAtLT4gRE1bIkRhdGEgTWVzaDxici8-NU0gcmVjb3Jkcy9zZWMiXQogICAgZW5kCgogICAgc3ViZ3JhcGggU2lua3MKICAgICAgICBETSAtLT4gSUNbIkFwYWNoZSBJY2ViZXJnPGJyLz5XYXJlaG91c2UiXQogICAgICAgIERNIC0tPiBDU1siQ2Fzc2FuZHJhPGJyLz5LVkRBTCJdCiAgICAgICAgRE0gLS0-IERSWyJBcGFjaGUgRHJ1aWQ8YnIvPkFuYWx5dGljcyJdCiAgICAgICAgRE0gLS0-IE1TWyJEb3duc3RyZWFtPGJyLz5NaWNyb3NlcnZpY2VzIl0KICAgIGVuZAoKICAgIHN0eWxlIEluZ2VzdGlvbiBmaWxsOiNlM2YyZmQKICAgIHN0eWxlIFByb2Nlc3NpbmcgZmlsbDojZmZmM2UwCiAgICBzdHlsZSBTaW5rcyBmaWxsOiNlOGY1ZTk%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBJbmdlc3Rpb24KICAgICAgICBVQVtVc2VyIEFjdGlvbnNdIC0tPiBBR1tBUEkgR2F0ZXdheV0KICAgICAgICBBRyAtLT4gS0ZbIkthZmthPGJyLz5BdnJvICsgU2NoZW1hIFJlZ2lzdHJ5PGJyLz4xTSBtc2cvc2VjL3RvcGljIl0KICAgIGVuZAoKICAgIHN1YmdyYXBoIFByb2Nlc3NpbmcKICAgICAgICBLRiAtLT4gRkxbIkFwYWNoZSBGbGluazxici8-MjBLKyBqb2JzIl0KICAgICAgICBTUUxbIlN0cmVhbWluZyBTUUw8YnIvPjEsMjAwKyBwcm9jZXNzb3JzIl0gLS4tPnx3cmFwc3wgRkwKICAgICAgICBGTCAtLT4gRE1bIkRhdGEgTWVzaDxici8-NU0gcmVjb3Jkcy9zZWMiXQogICAgZW5kCgogICAgc3ViZ3JhcGggU2lua3MKICAgICAgICBETSAtLT4gSUNbIkFwYWNoZSBJY2ViZXJnPGJyLz5XYXJlaG91c2UiXQogICAgICAgIERNIC0tPiBDU1siQ2Fzc2FuZHJhPGJyLz5LVkRBTCJdCiAgICAgICAgRE0gLS0-IERSWyJBcGFjaGUgRHJ1aWQ8YnIvPkFuYWx5dGljcyJdCiAgICAgICAgRE0gLS0-IE1TWyJEb3duc3RyZWFtPGJyLz5NaWNyb3NlcnZpY2VzIl0KICAgIGVuZAoKICAgIHN0eWxlIEluZ2VzdGlvbiBmaWxsOiNlM2YyZmQKICAgIHN0eWxlIFByb2Nlc3NpbmcgZmlsbDojZmZmM2UwCiAgICBzdHlsZSBTaW5rcyBmaWxsOiNlOGY1ZTk%3D" alt="diagram"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Keystone Pipeline
&lt;/h3&gt;

&lt;p&gt;At the core is Keystone, a petabyte-scale real-time event streaming and processing system. It scaled from 1 trillion events/day in 2017 to 2+ trillion today -- roughly doubling in four years.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kafka serves as the universal backbone.&lt;/strong&gt; Thousands of topics carry roughly 1 million messages per second per topic, all Avro-encoded with schemas persisted in a centralized internal registry. Every record is dual-written to both streaming consumers (Flink) and the analytical warehouse (Apache Iceberg), enabling real-time processing and historical backfills simultaneously.&lt;/p&gt;
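&lt;p&gt;The dual-write pattern is worth internalizing: one logical event, two sinks with different jobs. A toy in-memory sketch of the shape (plain lists standing in for Kafka consumers and Iceberg tables; nothing here is Netflix code):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;class DualWriter:
    """Fan each published event out to a real-time stream and an archive."""

    def __init__(self):
        self.stream = []   # stand-in for streaming consumers (Flink jobs)
        self.archive = []  # stand-in for the warehouse (Iceberg tables)

    def publish(self, event):
        # One logical write, two destinations: the stream powers real-time
        # processing; the archive powers historical backfills.
        self.stream.append(event)
        self.archive.append(event)

    def backfill(self, predicate):
        """Replay archived events matching a predicate, e.g. after a bug fix."""
        return [e for e in self.archive if predicate(e)]

writer = DualWriter()
writer.publish({"member_id": 1, "action": "play"})
writer.publish({"member_id": 2, "action": "pause"})
replayed = writer.backfill(lambda e: e["action"] == "play")  # one event
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The point of the shape: real-time consumers can stay simple and lossy-tolerant, because the archive guarantees that any derived state can always be rebuilt.&lt;/p&gt;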

&lt;p&gt;&lt;strong&gt;Apache Flink is the processing engine.&lt;/strong&gt; Netflix runs 20,000+ Flink jobs concurrently, handling everything from graph materialization to observability analytics to ad event processing. The Data Mesh platform writes 5 million records per second across these pipelines.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Year&lt;/th&gt;
&lt;th&gt;Daily Events&lt;/th&gt;
&lt;th&gt;Key Innovation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2011&lt;/td&gt;
&lt;td&gt;45 billion&lt;/td&gt;
&lt;td&gt;Chukwa-based ingestion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2015&lt;/td&gt;
&lt;td&gt;500 billion&lt;/td&gt;
&lt;td&gt;Keystone pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2017&lt;/td&gt;
&lt;td&gt;1 trillion&lt;/td&gt;
&lt;td&gt;Managed Kafka platform&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2021+&lt;/td&gt;
&lt;td&gt;2+ trillion&lt;/td&gt;
&lt;td&gt;Data Mesh + Streaming SQL&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Streaming SQL: Democratizing Real-Time Processing
&lt;/h3&gt;

&lt;p&gt;The most impactful recent evolution was not a scale increase -- it was an accessibility one. Netflix introduced &lt;a href="https://netflixtechblog.com/streaming-sql-in-data-mesh-0d83f5a00d08" rel="noopener noreferrer"&gt;Streaming SQL in Data Mesh&lt;/a&gt;, wrapping Flink's complex DataStream API behind standard SQL.&lt;/p&gt;

&lt;p&gt;The results were immediate: &lt;strong&gt;1,200 SQL processors created within one year of launch&lt;/strong&gt;, built by non-infrastructure teams. The platform now processes 100 million events per second across 5,000+ pipelines. Netflix won the Confluent Data Streaming Award for this work.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Netflix Data Mesh Streaming SQL Processor&lt;/span&gt;
&lt;span class="c1"&gt;-- Domain experts write standard SQL against streaming sources&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;member_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;TUMBLE_START&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'5'&lt;/span&gt; &lt;span class="k"&gt;MINUTE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;window_start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;interaction_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;member_events&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
    &lt;span class="n"&gt;member_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;TUMBLE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;INTERVAL&lt;/span&gt; &lt;span class="s1"&gt;'5'&lt;/span&gt; &lt;span class="k"&gt;MINUTE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This is the democratization pattern in action: build complex infrastructure (Flink), then wrap it in an accessible interface (SQL). Domain experts build data products without being stream processing specialists.&lt;/p&gt;

&lt;p&gt;Netflix explicitly warns against the opposite approach: &lt;a href="https://www.infoq.com/articles/netflix-migrating-stream-processing/" rel="noopener noreferrer"&gt;"Don't stream all the things."&lt;/a&gt; When they migrated critical pipelines from 24-hour batch latency to real-time, they documented the "pioneer tax" -- increased on-call burden, JAR hell, and complex failure recovery. Batch processing remains the right choice when real-time does not add measurable business value.&lt;/p&gt;

&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://current.confluent.io/post-conference-videos-2025/democratising-stream-processing-how-netflix-empowers-teams-with-data-mesh-and-streaming-sql-lnd25" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;current.confluent.io&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;








&lt;h2&gt;
  
  
  The Real-Time Distributed Graph: Architecture Under the Hood
&lt;/h2&gt;

&lt;p&gt;In October 2025, Netflix published the architecture behind their &lt;a href="https://netflixtechblog.com/how-and-why-netflix-built-a-real-time-distributed-graph-part-1-ingesting-and-processing-data-80113e124acc" rel="noopener noreferrer"&gt;Real-Time Distributed Graph (RDG)&lt;/a&gt; -- a system modeling member interactions at internet scale. The numbers: 8 billion+ nodes, 150 billion+ edges, sustaining 2 million reads/second and 6 million writes/second.&lt;/p&gt;

&lt;p&gt;What makes this architecturally instructive is not the scale but the storage decision.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBLVFtLYWZrYSBUb3BpY3NdIC0tPiBGTFsiRmxpbms8YnIvPkdyYXBoIE1hdGVyaWFsaXphdGlvbiJdCiAgICBGTCAtLT4gS1ZbIktWREFMPGJyLz5nUlBDIEludGVyZmFjZSJdCgogICAgc3ViZ3JhcGggU3RvcmFnZVsiQ2Fzc2FuZHJhIENsdXN0ZXJzPGJyLz4xMiBjbHVzdGVycyDCtyAyLDQwMCBFQzIgaW5zdGFuY2VzIl0KICAgICAgICBOUzFbTmFtZXNwYWNlIEFdCiAgICAgICAgTlMyW05hbWVzcGFjZSBCXQogICAgICAgIE5TM1tOYW1lc3BhY2UgTl0KICAgIGVuZAoKICAgIEtWIC0tPiBOUzEKICAgIEtWIC0tPiBOUzIKICAgIEtWIC0tPiBOUzMKCiAgICBFVlsiRVZDYWNoZTxici8-U3ViLW1zIExhdGVuY3kiXSAtLi0-fGNhY2hlIGxheWVyfCBLVgoKICAgIEtWIC0tPiBTVkNbIk1pY3Jvc2VydmljZXM8YnIvPjJNIHJlYWRzL3MgwrcgNk0gd3JpdGVzL3MiXQoKICAgIHN0eWxlIFN0b3JhZ2UgZmlsbDojZmNlNGVj" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBLVFtLYWZrYSBUb3BpY3NdIC0tPiBGTFsiRmxpbms8YnIvPkdyYXBoIE1hdGVyaWFsaXphdGlvbiJdCiAgICBGTCAtLT4gS1ZbIktWREFMPGJyLz5nUlBDIEludGVyZmFjZSJdCgogICAgc3ViZ3JhcGggU3RvcmFnZVsiQ2Fzc2FuZHJhIENsdXN0ZXJzPGJyLz4xMiBjbHVzdGVycyDCtyAyLDQwMCBFQzIgaW5zdGFuY2VzIl0KICAgICAgICBOUzFbTmFtZXNwYWNlIEFdCiAgICAgICAgTlMyW05hbWVzcGFjZSBCXQogICAgICAgIE5TM1tOYW1lc3BhY2UgTl0KICAgIGVuZAoKICAgIEtWIC0tPiBOUzEKICAgIEtWIC0tPiBOUzIKICAgIEtWIC0tPiBOUzMKCiAgICBFVlsiRVZDYWNoZTxici8-U3ViLW1zIExhdGVuY3kiXSAtLi0-fGNhY2hlIGxheWVyfCBLVgoKICAgIEtWIC0tPiBTVkNbIk1pY3Jvc2VydmljZXM8YnIvPjJNIHJlYWRzL3MgwrcgNk0gd3JpdGVzL3MiXQoKICAgIHN0eWxlIFN0b3JhZ2UgZmlsbDojZmNlNGVj" alt="diagram"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Netflix Rejected Graph Databases
&lt;/h3&gt;

&lt;p&gt;Netflix &lt;a href="https://netflixtechblog.medium.com/how-and-why-netflix-built-a-real-time-distributed-graph-part-2-building-a-scalable-storage-layer-ff4a8dbd3d1f" rel="noopener noreferrer"&gt;evaluated and rejected Neo4j&lt;/a&gt; for the RDG storage layer. Neo4j performed well for millions of records but became inefficient beyond hundreds of millions due to high memory requirements and limited horizontal scaling.&lt;/p&gt;

&lt;p&gt;Instead, they chose KVDAL (Key-Value Data Abstraction Layer), built on Apache Cassandra. The storage layer spans approximately 27 namespaces across 12 Cassandra clusters backed by 2,400 EC2 instances. EVCache (Memcached-based) sits in front of Cassandra, providing sub-millisecond read latency on hot data.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criteria&lt;/th&gt;
&lt;th&gt;Neo4j&lt;/th&gt;
&lt;th&gt;Cassandra + KVDAL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Scale&lt;/td&gt;
&lt;td&gt;Millions of records&lt;/td&gt;
&lt;td&gt;Billions+ (8B nodes, 150B edges)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Horizontal scaling&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Linear&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write performance&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;6M writes/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read latency (cached)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Sub-millisecond (EVCache)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Netflix verdict&lt;/td&gt;
&lt;td&gt;Rejected&lt;/td&gt;
&lt;td&gt;Selected&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The Data Abstraction Layer Pattern
&lt;/h3&gt;

&lt;p&gt;The critical design decision here is not "use Cassandra" -- it is the abstraction layer. Applications interact with KVDAL via gRPC, so storage backends can be swapped without code changes. The namespace model supports flexible backends: different namespaces can use different Cassandra clusters or entirely different storage technologies.&lt;/p&gt;

&lt;p&gt;In my experience building distributed storage systems, this pattern pays for itself the first time you need to migrate backends. Netflix's approach -- evaluate with data, abstract the interface, isolate by namespace -- is directly adoptable regardless of your scale.&lt;/p&gt;
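&lt;p&gt;To make the pattern concrete, here is a minimal Python sketch of a namespace-routed abstraction layer. This is illustrative only -- KVDAL's actual gRPC API is not public, and every class and method name below is hypothetical:&lt;/p&gt;

```python
# Illustrative sketch of a key-value data abstraction layer (names hypothetical).
# Applications depend only on KeyValueStore; backends are swapped per namespace.

class KeyValueStore:
    """Minimal storage interface the application codes against."""
    def put(self, key, value):
        raise NotImplementedError
    def get(self, key):
        raise NotImplementedError

class InMemoryStore(KeyValueStore):
    """Stand-in for a real backend (e.g. a Cassandra-backed store)."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

class DataAbstractionLayer:
    """Routes each namespace to its own backend, so one namespace can
    migrate storage technologies without code changes elsewhere."""
    def __init__(self):
        self._namespaces = {}
    def register(self, namespace, store):
        self._namespaces[namespace] = store
    def put(self, namespace, key, value):
        self._namespaces[namespace].put(key, value)
    def get(self, namespace, key):
        return self._namespaces[namespace].get(key)

dal = DataAbstractionLayer()
dal.register("member_graph", InMemoryStore())
dal.put("member_graph", "member:42", {"edges": ["content:7"]})
```

&lt;p&gt;Because callers only ever see the &lt;code&gt;put&lt;/code&gt;/&lt;code&gt;get&lt;/code&gt; interface, moving a namespace to a different backend is a one-line registration change.&lt;/p&gt;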




&lt;h2&gt;
  
  
  A/B Testing as an Architectural Principle
&lt;/h2&gt;

&lt;p&gt;At Netflix, &lt;strong&gt;every&lt;/strong&gt; product change goes through A/B testing before becoming the default. This is not a feature -- it is an architectural principle. As Netflix puts it, the goal is &lt;a href="https://netflixtechblog.com/its-all-a-bout-testing-the-netflix-experimentation-platform-4e1ca458c15" rel="noopener noreferrer"&gt;"product decisions driven by data, not by the most opinionated and vocal employees."&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBBW1Byb2R1Y3QgQ2hhbmdlIFByb3Bvc2FsXSAtLT4gQltFeHBlcmltZW50IERlc2lnbl0KICAgIEIgLS0-IEN7SGFzaC1CYXNlZCBBbGxvY2F0aW9ufQogICAgQyAtLT4gRFtUcmVhdG1lbnQgR3JvdXBdCiAgICBDIC0tPiBFW0NvbnRyb2wgR3JvdXBdCiAgICBEIC0tPiBGW1JlYWwtVGltZSBNZXRyaWNzIHZpYSBLYWZrYSBhbmQgRmxpbmtdCiAgICBFIC0tPiBGCiAgICBGIC0tPiBHW1NlcXVlbnRpYWwgU3RhdGlzdGljYWwgQW5hbHlzaXNdCiAgICBHIC0tPiBIe0RlY2lzaW9ufQogICAgSCAtLT58UG9zaXRpdmV8IElbU2hpcCB0byBBbGwgVXNlcnNdCiAgICBIIC0tPnxOZWdhdGl2ZXwgSltLaWxsIHRoZSBDaGFuZ2VdCiAgICBIIC0tPnxJbmNvbmNsdXNpdmV8IEtbSXRlcmF0ZSBhbmQgUmV0ZXN0XQogICAgRyAtLi0-fE1MIE9wdGltaXphdGlvbiAyMDI1fCBMW1JlZHVjZSBEdXJhdGlvbiB1cCB0byA0MCVdCiAgICBMIC0uLT4gRw%3D%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRECiAgICBBW1Byb2R1Y3QgQ2hhbmdlIFByb3Bvc2FsXSAtLT4gQltFeHBlcmltZW50IERlc2lnbl0KICAgIEIgLS0-IEN7SGFzaC1CYXNlZCBBbGxvY2F0aW9ufQogICAgQyAtLT4gRFtUcmVhdG1lbnQgR3JvdXBdCiAgICBDIC0tPiBFW0NvbnRyb2wgR3JvdXBdCiAgICBEIC0tPiBGW1JlYWwtVGltZSBNZXRyaWNzIHZpYSBLYWZrYSBhbmQgRmxpbmtdCiAgICBFIC0tPiBGCiAgICBGIC0tPiBHW1NlcXVlbnRpYWwgU3RhdGlzdGljYWwgQW5hbHlzaXNdCiAgICBHIC0tPiBIe0RlY2lzaW9ufQogICAgSCAtLT58UG9zaXRpdmV8IElbU2hpcCB0byBBbGwgVXNlcnNdCiAgICBIIC0tPnxOZWdhdGl2ZXwgSltLaWxsIHRoZSBDaGFuZ2VdCiAgICBIIC0tPnxJbmNvbmNsdXNpdmV8IEtbSXRlcmF0ZSBhbmQgUmV0ZXN0XQogICAgRyAtLi0-fE1MIE9wdGltaXphdGlvbiAyMDI1fCBMW1JlZHVjZSBEdXJhdGlvbiB1cCB0byA0MCVdCiAgICBMIC0uLT4gRw%3D%3D" alt="diagram"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Experimentation Platform
&lt;/h3&gt;

&lt;p&gt;Netflix's experimentation platform handles &lt;a href="https://netflixtechblog.com/its-all-a-bout-testing-the-netflix-experimentation-platform-4e1ca458c15" rel="noopener noreferrer"&gt;150K to 450K requests per second&lt;/a&gt; with cache-warm latency under 1ms and real-time evaluation averaging approximately 50ms. Allocation is deterministic: a hash of &lt;code&gt;member_id + experiment_id&lt;/code&gt; assigns each user to an experiment cell consistently across sessions and devices.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Netflix experiment allocation pattern
# Each member is assigned to experiment cells deterministically
# using member_id + experiment_id hash
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;allocate_member&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;member_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;experiment_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_cells&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Deterministic allocation ensures consistent user experience
    across sessions and devices.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;hash_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;member_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;experiment_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hash_value&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;num_cells&lt;/span&gt;

&lt;span class="c1"&gt;# Sequential testing: allows early stopping
# Netflix monitors experiments continuously rather than
# waiting for fixed sample sizes
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;a href="https://netflixtechblog.com/sequential-a-b-testing-keeps-the-world-streaming-netflix-part-1-continuous-data-cba6c7ed49df" rel="noopener noreferrer"&gt;Sequential testing&lt;/a&gt; is critical for infrastructure experiments where bad changes could degrade streaming quality for millions. Unlike fixed-horizon tests, sequential tests let Netflix stop experiments early when results are conclusive, reducing both time and user exposure to suboptimal experiences.&lt;/p&gt;
&lt;h3&gt;
  
  
  The 2025 Evolution: ML-Optimized Experimentation
&lt;/h3&gt;

&lt;p&gt;Beginning in 2025, Netflix started using &lt;a href="https://netflixtechblog.com/a-b-testing-and-beyond-improving-the-netflix-streaming-experience-with-experimentation-and-data-5b0ae9295bdf" rel="noopener noreferrer"&gt;machine learning to optimize A/B testing&lt;/a&gt;. Adaptive causal-inference models reduce experiment duration by up to 40%. Combined with server-driven UI -- which enables experimentation without app store releases -- Netflix continuously iterates on the experience of 300M+ subscribers.&lt;/p&gt;

&lt;p&gt;Their causal inference extends well beyond simple A/B testing: contextual bandits for content matching, counterfactual logging for offline experiments, and surrogate metrics for inferring long-term effects from short-term data. Data scientists analyze billions of rows on single machines using Python and R -- a deliberate architectural choice prioritizing analyst productivity over distributed computing complexity.&lt;/p&gt;


&lt;h2&gt;
  
  
  Observability as an Architectural Decision Engine
&lt;/h2&gt;

&lt;p&gt;Netflix's observability stack is not just for debugging. It is the feedback loop that drives architectural evolution.&lt;/p&gt;

&lt;p&gt;The numbers: &lt;a href="https://netflix.github.io/atlas-docs/overview/" rel="noopener noreferrer"&gt;Atlas&lt;/a&gt; processes 17 billion metrics per day. The platform handles 700 billion distributed traces per day and 1.5 petabytes of log data. All of this costs less than 5% of Netflix's total infrastructure spend -- a deliberate and measured investment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBTVkNbIk1pY3Jvc2VydmljZXM8YnIvPjEsMDAwKyJdIC0tPiBBVFsiQXRsYXM8YnIvPjE3QiBtZXRyaWNzL2RheSJdCiAgICBTVkMgLS0-IFRSWyJEaXN0cmlidXRlZCBUcmFjaW5nPGJyLz43MDBCIHRyYWNlcy9kYXkiXQogICAgU1ZDIC0tPiBMR1siTG9nczxici8-MS41IFBCIl0KCiAgICBzdWJncmFwaCBBbmFseXNpc1siU3RyZWFtaW5nIEFuYWx5c2lzIl0KICAgICAgICBGTFtBcGFjaGUgRmxpbmtdCiAgICBlbmQKCiAgICBBVCAtLT4gRkwKICAgIFRSIC0tPiBGTAogICAgTEcgLS0-IEZMCgogICAgRkwgLS0-IEFMW0FsZXJ0aW5nXQogICAgRkwgLS0-IERCW0Rhc2hib2FyZHNdCiAgICBGTCAtLT4gQURbIkFyY2hpdGVjdHVyYWw8YnIvPkRlY2lzaW9ucyJdCgogICAgc3ViZ3JhcGggUGlwZWxpbmVbIkRhdGEgUGlwZWxpbmUgVHJhY2luZyJdCiAgICAgICAgRFBbRGF0YSBQaXBlbGluZXNdIC0tPiBJTlsiSW5jYTxici8-VVVJRCBwZXIgbWVzc2FnZSJdCiAgICAgICAgSU4gLS0-IExEWyJMb3NzICYgRHVwbGljYXRlPGJyLz5EZXRlY3Rpb24iXQogICAgZW5kCgogICAgTEQgLS0-IEFUCgogICAgc3R5bGUgQW5hbHlzaXMgZmlsbDojZmZmM2UwCiAgICBzdHlsZSBQaXBlbGluZSBmaWxsOiNmM2U1ZjU%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBTVkNbIk1pY3Jvc2VydmljZXM8YnIvPjEsMDAwKyJdIC0tPiBBVFsiQXRsYXM8YnIvPjE3QiBtZXRyaWNzL2RheSJdCiAgICBTVkMgLS0-IFRSWyJEaXN0cmlidXRlZCBUcmFjaW5nPGJyLz43MDBCIHRyYWNlcy9kYXkiXQogICAgU1ZDIC0tPiBMR1siTG9nczxici8-MS41IFBCIl0KCiAgICBzdWJncmFwaCBBbmFseXNpc1siU3RyZWFtaW5nIEFuYWx5c2lzIl0KICAgICAgICBGTFtBcGFjaGUgRmxpbmtdCiAgICBlbmQKCiAgICBBVCAtLT4gRkwKICAgIFRSIC0tPiBGTAogICAgTEcgLS0-IEZMCgogICAgRkwgLS0-IEFMW0FsZXJ0aW5nXQogICAgRkwgLS0-IERCW0Rhc2hib2FyZHNdCiAgICBGTCAtLT4gQURbIkFyY2hpdGVjdHVyYWw8YnIvPkRlY2lzaW9ucyJdCgogICAgc3ViZ3JhcGggUGlwZWxpbmVbIkRhdGEgUGlwZWxpbmUgVHJhY2luZyJdCiAgICAgICAgRFBbRGF0YSBQaXBlbGluZXNdIC0tPiBJTlsiSW5jYTxici8-VVVJRCBwZXIgbWVzc2FnZSJdCiAgICAgICAgSU4gLS0-IExEWyJMb3NzICYgRHVwbGljYXRlPGJyLz5EZXRlY3Rpb24iXQogICAgZW5kCgogICAgTEQgLS0-IEFUCgogICAgc3R5bGUgQW5hbHlzaXMgZmlsbDojZmZmM2UwCiAgICBzdHlsZSBQaXBlbGluZSBmaWxsOiNmM2U1ZjU%3D" alt="diagram"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Trace Explosion Problem
&lt;/h3&gt;

&lt;p&gt;Consider encoding a single episode of Squid Game Season 2. According to Netflix engineers at &lt;a href="https://www.infoq.com/presentations/stream-pipeline-observability/" rel="noopener noreferrer"&gt;QCon London 2025&lt;/a&gt;, this generates &lt;strong&gt;1 million trace spans&lt;/strong&gt;, 140 video encodes, 552 audio encodes, and consumes 122,000 CPU hours.&lt;/p&gt;

&lt;p&gt;At that density, traditional tracing tools collapse. 300K+ spans per request overwhelm conventional visualization. Netflix solved this with a request-first tree visualization and stream processing via Flink, transforming raw spans into actionable business intelligence.&lt;/p&gt;

&lt;p&gt;The high-cardinality metrics client uses metadata tagging and a taxonomy service exposed via GraphQL API, ensuring consistent metadata across hundreds of services.&lt;/p&gt;
&lt;h3&gt;
  
  
  Observability Driving Business Outcomes
&lt;/h3&gt;

&lt;p&gt;The business outcomes from this investment are concrete: ROI-based resource allocation, workflow caching without user intervention, and measurable cost efficiency improvements. Netflix also built Inca, a message-level tracing system for data pipelines where each message gets a UUID, enabling detection of loss and duplicates across trillions of daily events.&lt;/p&gt;

&lt;p&gt;The key insight: observability at Netflix is not a cost center. It is the mechanism by which data shapes architecture. When encoding costs spike for a particular content type, observability data drives the decision to cache workflows. When trace analysis reveals inefficient service-to-service calls, it informs decomposition decisions.&lt;/p&gt;

&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://www.infoq.com/presentations/stream-pipeline-observability/" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fres.infoq.com%2Fpresentations%2Fstream-pipeline-observability%2Fen%2Fcard_header_image%2Ftwitter-card-sujana-sooreddy-naveen-mareddy-1766067087244.jpg" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://www.infoq.com/presentations/stream-pipeline-observability/" rel="noopener noreferrer" class="c-link"&gt;
            From Confusion to Clarity: Advanced Observability Strategies for Media Workflows at Netflix - InfoQ
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            Naveen Mareddy and Sujana Sooreddy discuss the evolution of Netflix’s media processing observability, moving from monolithic tracing to a high-cardinality analytics platform. They explain how to handle "trace explosion" using stream processing and a "request-first" tree visualization, and share how to transform raw spans into actionable business intelligence.
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.infoq.com%2Fstatics_s1_20260319113023%2Ffavicon.ico"&gt;
          infoq.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;








&lt;h2&gt;
  
  
  Chaos Engineering and Resilience: Breaking Things With Data
&lt;/h2&gt;

&lt;p&gt;Netflix invented chaos engineering in 2011 with Chaos Monkey, which randomly terminates production VM instances. It evolved into the &lt;a href="https://www.sei.cmu.edu/blog/devops-case-study-netflix-and-the-chaos-monkey/" rel="noopener noreferrer"&gt;Simian Army&lt;/a&gt; -- Latency Monkey, Conformity Monkey, Doctor Monkey, Security Monkey -- each injecting different failure modes. The discipline was formalized in a &lt;a href="https://arxiv.org/pdf/1702.05843" rel="noopener noreferrer"&gt;2017 whitepaper&lt;/a&gt; establishing five core principles.&lt;/p&gt;

&lt;p&gt;The industry data validates the approach: organizations adopting chaos engineering report a &lt;a href="https://dev.to/jagkush/breaking-things-on-purpose-what-i-learned-from-netflixs-chaos-monkey-2f8p"&gt;35% average reduction in outages and 41% improvement in MTTR&lt;/a&gt;. In 2024, TravelTech implemented Chaos Monkey and discovered a single point of failure in payment processing, preventing a potential outage affecting 30,000+ customers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resilience Evolution: From Libraries to Infrastructure
&lt;/h3&gt;

&lt;p&gt;Netflix's resilience approach has fundamentally shifted. The original library-based patterns (Hystrix for circuit breaking, Ribbon for client load balancing) have been deprecated in favor of infrastructure-based resilience via Envoy service mesh -- zero-configuration resilience that does not require application code changes.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Era&lt;/th&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Implementation&lt;/th&gt;
&lt;th&gt;Status (2026)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2011&lt;/td&gt;
&lt;td&gt;Chaos testing&lt;/td&gt;
&lt;td&gt;Chaos Monkey / Simian Army&lt;/td&gt;
&lt;td&gt;Active (evolved)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2012&lt;/td&gt;
&lt;td&gt;Circuit breaker&lt;/td&gt;
&lt;td&gt;Hystrix&lt;/td&gt;
&lt;td&gt;Deprecated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2014&lt;/td&gt;
&lt;td&gt;Client load balancing&lt;/td&gt;
&lt;td&gt;Ribbon&lt;/td&gt;
&lt;td&gt;Deprecated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2015&lt;/td&gt;
&lt;td&gt;API gateway&lt;/td&gt;
&lt;td&gt;Zuul&lt;/td&gt;
&lt;td&gt;Zuul 2 (Netty)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2020&lt;/td&gt;
&lt;td&gt;Circuit breaker v2&lt;/td&gt;
&lt;td&gt;Resilience4j&lt;/td&gt;
&lt;td&gt;Active&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026&lt;/td&gt;
&lt;td&gt;Service mesh&lt;/td&gt;
&lt;td&gt;Envoy proxies&lt;/td&gt;
&lt;td&gt;Active (new)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
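&lt;p&gt;The circuit-breaker pattern itself is unchanged across these eras -- only where it lives has moved, from library code to the mesh. A minimal illustrative Python sketch (not the Hystrix or Resilience4j implementation):&lt;/p&gt;

```python
# Minimal circuit breaker sketch (illustrative, not Hystrix/Resilience4j).
# After `threshold` consecutive failures the circuit opens and calls fail
# fast until a cooldown elapses, protecting the struggling dependency.
import time

class CircuitOpenError(Exception):
    pass

class CircuitBreaker:
    def __init__(self, threshold=3, cooldown_seconds=30.0):
        self.threshold = threshold
        self.cooldown = cooldown_seconds
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at >= self.cooldown:
                self.opened_at = None   # half-open: allow one trial call
            else:
                raise CircuitOpenError("failing fast; dependency unhealthy")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0   # any success resets the failure count
        return result
```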

&lt;h3&gt;
  
  
  Stateful Systems and Automated Mitigation
&lt;/h3&gt;

&lt;p&gt;Joseph Lynch's work on &lt;a href="https://www.infoq.com/presentations/netflix-stateful-cache/" rel="noopener noreferrer"&gt;Netflix's stateful systems&lt;/a&gt; demonstrates data-driven reliability engineering. Near-caches handle billions of requests per second at sub-100-microsecond latency. When a KeyValueService experienced unexpected traffic doubling, automated mitigation recovered the system within 5 minutes -- no human intervention required.&lt;/p&gt;

&lt;p&gt;The five principles of chaos engineering remain foundational: (1) build a hypothesis around steady state, (2) vary real-world events, (3) run experiments in production, (4) automate continuous experiments, and (5) minimize blast radius. But the real lesson is that chaos engineering without robust observability is just breaking things. You need the feedback loop.&lt;/p&gt;
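&lt;p&gt;Those five principles translate into a small experiment loop: hypothesize a steady state, inject a failure into a limited blast radius, and verify the steady-state metric holds. An illustrative sketch, with all names hypothetical:&lt;/p&gt;

```python
# Illustrative chaos-experiment loop (names hypothetical): measure a
# steady-state metric, inject a failure into a small blast radius, and
# check that the metric stays within tolerance of the baseline.
def run_chaos_experiment(measure_metric, inject_failure, restore,
                         tolerance=0.05, blast_radius_pct=1):
    baseline = measure_metric()          # 1. steady-state hypothesis
    inject_failure(blast_radius_pct)     # 2/3/5. real failure, small radius
    try:
        during = measure_metric()
        deviation = abs(during - baseline) / baseline
        return {"baseline": baseline, "during": during,
                "passed": tolerance >= deviation}
    finally:
        restore()                        # always roll back the injection

# Toy usage: a "service" whose success rate dips slightly under failure
state = {"injected": False}
def measure():
    return 0.97 if state["injected"] else 0.99
def inject(pct):
    state["injected"] = True
def restore():
    state["injected"] = False

result = run_chaos_experiment(measure, inject, restore)
```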

&lt;h3&gt;
  
  
  The Live Streaming Stress Test
&lt;/h3&gt;

&lt;p&gt;Even Netflix's battle-tested architecture has limits. The Tyson-Paul fight in November 2024 drew &lt;a href="https://www.infoq.com/news/2025/12/netflix-live-streaming-pipeline/" rel="noopener noreferrer"&gt;65 million concurrent streams and 108 million total viewers&lt;/a&gt;, generating 100K+ Downdetector reports. CDN limitations were exposed.&lt;/p&gt;

&lt;p&gt;But microservices isolation proved its worth: on-demand streaming was NOT affected. The failure was contained to the live event. This is resilience architecture working as designed -- not preventing all failures, but preventing failures from cascading.&lt;/p&gt;




&lt;h2&gt;
  
  
  The GraphQL Federation and Platform Engineering Story
&lt;/h2&gt;

&lt;p&gt;Netflix's API evolution tells a story about data-driven platform decisions. The progression: REST ("OpenAPI") to "API.next" to "DNA" (GraphQL-like) to &lt;a href="https://www.apollographql.com/blog/redefining-api-strategy-why-netflix-platform-engineering-chose-federated-graphql" rel="noopener noreferrer"&gt;Federated GraphQL with the DGS Framework&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Today, 250+ Domain Graph Services maintained by 200+ teams compose a unified API graph. The gateway processes thousands of queries per second with sub-100ms response times and query planning overhead under 10ms.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Netflix DGS Framework - Domain Graph Service&lt;/span&gt;
&lt;span class="nd"&gt;@DgsComponent&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ShowsDataFetcher&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;@DgsQuery&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Show&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;shows&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@InputArgument&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;titleFilter&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Each team owns their domain's data fetchers&lt;/span&gt;
        &lt;span class="c1"&gt;// Composed into unified supergraph via federation&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;showsService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getShows&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;titleFilter&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@DgsData&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parentType&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Show"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"reviews"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Review&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;reviews&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;DgsDataFetchingEnvironment&lt;/span&gt; &lt;span class="n"&gt;dfe&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Show&lt;/span&gt; &lt;span class="n"&gt;show&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dfe&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getSource&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;reviewsService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getReviewsForShow&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getId&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Java at Netflix Scale
&lt;/h3&gt;

&lt;p&gt;Netflix runs &lt;a href="https://www.infoq.com/presentations/netflix-java/" rel="noopener noreferrer"&gt;2,800 Java applications&lt;/a&gt; with approximately 1,500 internal libraries. Their migration from Java 8 to Java 17 delivered &lt;strong&gt;20% better CPU usage with zero code changes&lt;/strong&gt; -- a data-driven validation that justified the migration effort across all 2,800 applications.&lt;/p&gt;

&lt;p&gt;Netflix engineers call Java 21 virtual threads "the most exciting Java feature since lambdas," with the strongest results in Tomcat thread pools and GraphQL query execution. However, gRPC worker pools showed a performance decrease. This is data-driven decision making in action -- adopt where the numbers support it, hold where they do not.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Platform Engineering Flywheel
&lt;/h3&gt;

&lt;p&gt;Netflix's workflow orchestrator &lt;a href="https://netflixtechblog.com/100x-faster-how-we-supercharged-netflix-maestros-workflow-engine-028e9637f041" rel="noopener noreferrer"&gt;Maestro&lt;/a&gt; handles hundreds of thousands of workflows and 2 million jobs per day, achieving a 100x performance improvement via an actor model combined with Java 21 virtual threads. Their incremental processing with Apache Iceberg reduced costs to 10% of the original pipeline for some workflows while improving data freshness from daily to hourly.&lt;/p&gt;
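&lt;p&gt;The essence of incremental processing is watermark tracking: each run touches only the partitions that arrived since the last run, instead of recomputing the full table. An illustrative sketch -- this is not the Maestro or Iceberg API, and the function names are hypothetical:&lt;/p&gt;

```python
# Illustrative incremental-processing sketch (not the Maestro/Iceberg API).
# Only partitions newer than the stored watermark are processed each run,
# which is where the savings over full recomputation come from.
def incremental_run(partitions, watermark, process):
    """partitions: dict mapping partition timestamp to rows.
    Returns the new watermark and how many partitions were processed."""
    new_parts = {ts: rows for ts, rows in partitions.items() if ts > watermark}
    for ts in sorted(new_parts):
        process(new_parts[ts])
    new_watermark = max(new_parts, default=watermark)
    return new_watermark, len(new_parts)

processed = []
table = {1: ["a"], 2: ["b"], 3: ["c"]}
wm, count = incremental_run(table, watermark=1, process=processed.extend)
```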

&lt;p&gt;The container platform &lt;a href="https://netflix.github.io/" rel="noopener noreferrer"&gt;Titus&lt;/a&gt; launches 1M+ containers per week. Spinnaker supports 4,000+ deploys per day. Netflix spends &lt;a href="https://www.infoq.com/presentations/ips-maestro-iceberg/" rel="noopener noreferrer"&gt;$150 million annually&lt;/a&gt; on compute and storage for data pipelines alone.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Scale&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DGS Framework&lt;/td&gt;
&lt;td&gt;GraphQL Federation&lt;/td&gt;
&lt;td&gt;250+ services&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maestro&lt;/td&gt;
&lt;td&gt;Workflow orchestration&lt;/td&gt;
&lt;td&gt;2M jobs/day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Metaflow&lt;/td&gt;
&lt;td&gt;ML infrastructure&lt;/td&gt;
&lt;td&gt;3,000+ projects at Netflix&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Titus&lt;/td&gt;
&lt;td&gt;Container management&lt;/td&gt;
&lt;td&gt;1M+ containers/week&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spinnaker&lt;/td&gt;
&lt;td&gt;Continuous delivery&lt;/td&gt;
&lt;td&gt;4,000+ deploys/day&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Atlas&lt;/td&gt;
&lt;td&gt;Telemetry&lt;/td&gt;
&lt;td&gt;17B metrics/day&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The open-source flywheel is deliberate. DGS, Maestro, Metaflow (used by hundreds of companies for ML), and Spinnaker create external contributions that flow back into Netflix's platform investment.&lt;/p&gt;

&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://www.infoq.com/presentations/ips-maestro-iceberg/" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fres.infoq.com%2Fpresentations%2Fips-maestro-iceberg%2Fen%2Fcard_header_image%2Fjun-hee-twitter-card-1738146151045.jpg" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://www.infoq.com/presentations/ips-maestro-iceberg/" rel="noopener noreferrer" class="c-link"&gt;
            Efficient Incremental Processing with Netflix Maestro and Apache Iceberg - InfoQ
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            Jun He discusses how to use an IPS to build more reliable, efficient, and scalable data pipelines, unlocking new data processing patterns.
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn.infoq.com%2Fstatics_s2_20260319113822%2Ffavicon.ico"&gt;
          infoq.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;








&lt;h2&gt;
  
  
  What NOT to Copy: The Anti-Patterns
&lt;/h2&gt;

&lt;p&gt;This is the most important section of this article.&lt;/p&gt;

&lt;p&gt;"Netflix's architecture is for Netflix's org chart, not your startup." A 10-person team with 50 microservices creates operational overhead that destroys velocity. Netflix's 1,000+ microservices reflect a 10,000+ person engineering organization. Conway's Law is not a suggestion -- it is a constraint.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "Don't Stream All the Things" Warning
&lt;/h3&gt;

&lt;p&gt;Netflix explicitly warns against universal stream processing. Their migration from batch to streaming documented the &lt;a href="https://www.infoq.com/articles/netflix-migrating-stream-processing/" rel="noopener noreferrer"&gt;pioneer tax&lt;/a&gt;: increased on-call burden, JAR hell between Flink and Netflix OSS libraries, and complex failure recovery. Streaming failures must be addressed immediately -- unlike batch, where you can simply re-run the job.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Cargo-Culting Trap
&lt;/h3&gt;

&lt;p&gt;Three patterns I see teams consistently get wrong:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Chaos engineering without observability.&lt;/strong&gt; You break things but cannot learn from failures. Invest in monitoring first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Microservices without a platform team.&lt;/strong&gt; Every team reinvents deployment, monitoring, and configuration. The overhead kills you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Building on deprecated Netflix OSS.&lt;/strong&gt; Adopting Hystrix, Ribbon, or Zuul 1.x in 2025+ creates immediate technical debt. Use Resilience4j, Spring Cloud Load Balancer, and Spring Cloud Gateway instead.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Even Netflix stumbled at their own game. The Tyson-Paul fight generated 100K+ Downdetector reports, proving that on-demand architecture does not automatically translate to live event capability.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Right Approach: Adopt Patterns, Not Tools
&lt;/h3&gt;

&lt;p&gt;Start with a monolith. Extract services when pain points emerge organically. Prioritize in this order: (1) observability first, (2) experimentation platform, (3) event-driven communication, (4) microservices only when needed.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Adopt When&lt;/th&gt;
&lt;th&gt;Skip When&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Event-driven (Kafka)&lt;/td&gt;
&lt;td&gt;Multiple teams need async communication&lt;/td&gt;
&lt;td&gt;Single team, synchronous is fine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stream processing (Flink)&lt;/td&gt;
&lt;td&gt;Real-time adds measurable business value&lt;/td&gt;
&lt;td&gt;Batch latency is acceptable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A/B testing platform&lt;/td&gt;
&lt;td&gt;10+ experiments/quarter&lt;/td&gt;
&lt;td&gt;Fewer than 5 experiments/year&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chaos engineering&lt;/td&gt;
&lt;td&gt;Running 50+ microservices in production&lt;/td&gt;
&lt;td&gt;Fewer than 10 services&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GraphQL Federation&lt;/td&gt;
&lt;td&gt;5+ teams need API ownership&lt;/td&gt;
&lt;td&gt;Single API team&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Mesh&lt;/td&gt;
&lt;td&gt;Multiple data domains with different owners&lt;/td&gt;
&lt;td&gt;Centralized data team&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
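&lt;p&gt;The adopt/skip thresholds above can be sketched as a rough decision helper. This is illustrative only: the pattern names and threshold numbers come from the table, but the function itself is a hypothetical convenience, not a Netflix tool.&lt;/p&gt;

```python
# Illustrative sketch: encode the "Adopt When" thresholds from the table above.
# The numbers (10+ experiments/quarter, 50+ microservices, 5+ teams) come from
# the table; the helper itself is hypothetical.

THRESHOLDS = {
    "event_driven":       lambda ctx: ctx.get("teams_needing_async", 0) > 1,
    "stream_processing":  lambda ctx: ctx.get("realtime_has_business_value", False),
    "ab_platform":        lambda ctx: ctx.get("experiments_per_quarter", 0) >= 10,
    "chaos_engineering":  lambda ctx: ctx.get("microservices_in_prod", 0) >= 50,
    "graphql_federation": lambda ctx: ctx.get("teams_owning_apis", 0) >= 5,
    "data_mesh":          lambda ctx: ctx.get("data_domains_with_owners", 0) > 1,
}

def should_adopt(pattern: str, context: dict) -> bool:
    """Return True if the context clears the table's 'Adopt When' bar."""
    return THRESHOLDS[pattern](context)

# A 10-person startup running a monolith clears none of the bars:
startup = {"teams_needing_async": 1, "experiments_per_quarter": 2,
           "microservices_in_prod": 1, "teams_owning_apis": 1}
print([p for p in THRESHOLDS if should_adopt(p, startup)])  # → []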




&lt;h2&gt;
  
  
  Conclusion: Building Your Data-Driven Architecture
&lt;/h2&gt;

&lt;p&gt;Netflix's power is not scale. It is the feedback loop between data and decisions.&lt;/p&gt;

&lt;p&gt;They never did a "big rewrite." Every architectural evolution -- from monolith to microservices, from batch to streaming, from REST to GraphQL Federation, from Hystrix to Envoy -- was measured, validated against production data, and rolled out incrementally. The Java 17 migration happened because they measured a 20% CPU improvement. Streaming SQL replaced the Flink DataStream API because they measured 1,200 new processors built in a single year by non-infrastructure teams.&lt;/p&gt;

&lt;p&gt;Any organization can start building this feedback loop with three pillars:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Instrument everything.&lt;/strong&gt; You cannot make data-driven decisions without data. Netflix invests less than 5% of infrastructure costs in observability -- and considers it their highest-leverage architectural investment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experiment on everything.&lt;/strong&gt; Build an A/B testing capability. It does not need to handle 450K RPS. It needs to exist so that decisions are driven by evidence, not opinions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Let data drive architecture.&lt;/strong&gt; When Netflix evaluated Neo4j vs. Cassandra for their distributed graph, they measured at scale and chose the tool that survived the data. Do the same with your technology decisions.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Pick ONE pattern from this article. Implement it in your current architecture. Measure the result. That is the Netflix way -- not copying their tools, but copying their discipline.&lt;/p&gt;

&lt;p&gt;The best architecture is the one that can prove why it made the choices it did.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Official Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://netflixtechblog.com/" rel="noopener noreferrer"&gt;Netflix Tech Blog&lt;/a&gt; - Primary source for architecture decisions&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://netflix.github.io/atlas-docs/overview/" rel="noopener noreferrer"&gt;Netflix Atlas Documentation&lt;/a&gt; - Observability platform&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://netflix.github.io/dgs/" rel="noopener noreferrer"&gt;Netflix DGS Framework&lt;/a&gt; - GraphQL Federation&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/Netflix/maestro" rel="noopener noreferrer"&gt;Netflix Maestro on GitHub&lt;/a&gt; - Workflow orchestration (open-source)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Netflix Tech Blog Posts:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://netflixtechblog.com/streaming-sql-in-data-mesh-0d83f5a00d08" rel="noopener noreferrer"&gt;Streaming SQL in Data Mesh&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://netflixtechblog.com/how-and-why-netflix-built-a-real-time-distributed-graph-part-1-ingesting-and-processing-data-80113e124acc" rel="noopener noreferrer"&gt;Real-Time Distributed Graph Part 1&lt;/a&gt; and &lt;a href="https://netflixtechblog.medium.com/how-and-why-netflix-built-a-real-time-distributed-graph-part-2-building-a-scalable-storage-layer-ff4a8dbd3d1f" rel="noopener noreferrer"&gt;Part 2&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://netflixtechblog.com/its-all-a-bout-testing-the-netflix-experimentation-platform-4e1ca458c15" rel="noopener noreferrer"&gt;The Netflix Experimentation Platform&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://netflixtechblog.com/sequential-a-b-testing-keeps-the-world-streaming-netflix-part-1-continuous-data-cba6c7ed49df" rel="noopener noreferrer"&gt;Sequential A/B Testing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://netflixtechblog.com/100x-faster-how-we-supercharged-netflix-maestros-workflow-engine-028e9637f041" rel="noopener noreferrer"&gt;100X Faster Maestro Workflow Engine&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Conference Talks:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://current.confluent.io/post-conference-videos-2025/democratising-stream-processing-how-netflix-empowers-teams-with-data-mesh-and-streaming-sql-lnd25" rel="noopener noreferrer"&gt;Democratising Stream Processing - Confluent Current 2025&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.infoq.com/presentations/stream-pipeline-observability/" rel="noopener noreferrer"&gt;Advanced Observability Strategies - QCon London 2025&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.infoq.com/presentations/ips-maestro-iceberg/" rel="noopener noreferrer"&gt;Efficient Incremental Processing with Maestro and Iceberg - QCon SF 2024&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.infoq.com/presentations/netflix-java/" rel="noopener noreferrer"&gt;How Netflix Really Uses Java - QCon SF 2023&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.infoq.com/presentations/microservices-netflix-industry/" rel="noopener noreferrer"&gt;Microservices Retrospective - Adrian Cockcroft, QCon London 2023&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.infoq.com/presentations/netflix-stateful-cache/" rel="noopener noreferrer"&gt;Netflix Stateful Systems - Joseph Lynch, QCon SF 2023&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Books:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Designing Data-Intensive Applications&lt;/em&gt; by Martin Kleppmann (O'Reilly) - Foundational theory for Netflix's data pipeline patterns&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Chaos Engineering: System Resiliency in Practice&lt;/em&gt; by Casey Rosenthal &amp;amp; Nora Jones (O'Reilly) - Written by Netflix chaos engineering pioneers&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Data Mesh&lt;/em&gt; by Zhamak Dehghani (O'Reilly) - The architectural philosophy Netflix adopted&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Microservices Patterns&lt;/em&gt; by Chris Richardson (Manning) - Pattern catalog applicable to Netflix's architecture&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Observability Engineering&lt;/em&gt; by Charity Majors, Liz Fong-Jones &amp;amp; George Miranda (O'Reilly) - Principles behind Netflix's observability stack&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Academic References:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basiri et al., &lt;a href="https://arxiv.org/pdf/1702.05843" rel="noopener noreferrer"&gt;"Chaos Engineering"&lt;/a&gt;, arXiv 2017 - The original chaos engineering whitepaper from Netflix&lt;/li&gt;
&lt;li&gt;Netflix Research, &lt;a href="https://netflixtechblog.com/a-survey-of-causal-inference-applications-at-netflix-b62d25175e6f" rel="noopener noreferrer"&gt;"A Survey of Causal Inference Applications at Netflix"&lt;/a&gt; - Beyond A/B testing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Did you find this article helpful?&lt;/strong&gt; Follow me for more content on AWS, GenAI, and Cloud Architecture!&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>microservices</category>
      <category>dataengineering</category>
      <category>netflix</category>
    </item>
    <item>
      <title>Amazon Bedrock: From Zero to Production in 30 Minutes</title>
      <dc:creator>David Marcelo Petrocelli</dc:creator>
      <pubDate>Wed, 07 Jan 2026 16:00:03 +0000</pubDate>
      <link>https://dev.to/david_marcelopetrocelli_/amazon-bedrock-from-zero-to-production-in-30-minutes-34a7</link>
      <guid>https://dev.to/david_marcelopetrocelli_/amazon-bedrock-from-zero-to-production-in-30-minutes-34a7</guid>
      <description>&lt;h1&gt;
  
  
  Amazon Bedrock: From Zero to Production in 30 Minutes
&lt;/h1&gt;

&lt;p&gt;If you've been curious about Generative AI but haven't dived in yet, Amazon Bedrock is the easiest way to start. No model training, no GPU management, no ML expertise required—just API calls to state-of-the-art foundation models.&lt;/p&gt;

&lt;p&gt;In this guide, I'll take you from zero to a working application that you can actually deploy to production.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Amazon Bedrock?
&lt;/h2&gt;

&lt;p&gt;Amazon Bedrock is a fully managed service that provides access to foundation models (FMs) from leading AI companies through a unified API. Think of it as "LLMs as a Service."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Available models include:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude 4 &amp;amp; Claude 3.5 (Anthropic)&lt;/strong&gt; - Best for complex reasoning and long documents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Titan (Amazon)&lt;/strong&gt; - Cost-effective for general tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Llama 3 (Meta)&lt;/strong&gt; - Open-source performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mistral Large&lt;/strong&gt; - Fast inference, great for code and chat&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stable Diffusion 3 (Stability AI)&lt;/strong&gt; - Image generation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Setting Up Your Environment
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Enable Bedrock Models
&lt;/h3&gt;

&lt;p&gt;First, request access to the models you want to use:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to Amazon Bedrock in the AWS Console&lt;/li&gt;
&lt;li&gt;Navigate to "Model access"&lt;/li&gt;
&lt;li&gt;Click "Manage model access"&lt;/li&gt;
&lt;li&gt;Select the models you need (I recommend starting with Claude 3.5 Sonnet or Claude 4 Sonnet)&lt;/li&gt;
&lt;li&gt;Submit the request&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Most models are approved instantly. Some (like Claude 4) may take a few minutes.&lt;/p&gt;
&lt;/blockquote&gt;
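&lt;p&gt;You can verify model access programmatically instead of refreshing the console. A minimal sketch: boto3's control-plane client ("bedrock", distinct from the "bedrock-runtime" client used for inference) exposes &lt;code&gt;list_foundation_models()&lt;/code&gt;; the filtering helper below is my own convenience for this article, not part of the AWS API.&lt;/p&gt;

```python
# Sketch: check which Anthropic models your account can see in a region.
# list_foundation_models() is the real boto3 call on the "bedrock"
# control-plane client; anthropic_model_ids() is a hypothetical helper.

def anthropic_model_ids(models: list) -> list:
    """Pick Anthropic model IDs out of a list_foundation_models response."""
    return [m["modelId"] for m in models
            if m.get("providerName") == "Anthropic"]

def check_access(region: str = "us-east-1") -> list:
    # boto3 is imported here so the filter above stays dependency-free.
    import boto3
    client = boto3.client("bedrock", region_name=region)
    summaries = client.list_foundation_models()["modelSummaries"]
    return anthropic_model_ids(summaries)
```

&lt;p&gt;If a model you requested is missing from the returned list, the access request has not been approved yet.&lt;/p&gt;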

&lt;h3&gt;
  
  
  2. Configure IAM Permissions
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"bedrock:InvokeModel"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"bedrock:InvokeModelWithResponseStream"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:bedrock:*::foundation-model/anthropic.claude-*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:bedrock:*::foundation-model/amazon.titan*"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Install Dependencies
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;boto3 langchain-aws
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Your First Bedrock Application
&lt;/h2&gt;

&lt;p&gt;Let's build a simple text generator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the client
&lt;/span&gt;&lt;span class="n"&gt;bedrock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Generate text using Claude 3.5 Sonnet.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-2023-05-31&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-3-5-sonnet-20241022-v2:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contentType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;accept&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;response_body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response_body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;


&lt;span class="c1"&gt;# Test it
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain Kubernetes in 3 sentences for a beginner.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Kubernetes is a system that helps you run and manage applications in containers
across multiple computers automatically. It handles tasks like starting your
applications, restarting them if they crash, and distributing traffic between
them. Think of it as an automated IT team that keeps your applications running
24/7 without manual intervention.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Streaming Responses
&lt;/h2&gt;

&lt;p&gt;For better user experience, stream the response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_text_streaming&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Stream text generation for real-time output.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-2023-05-31&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model_with_response_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-3-5-sonnet-20241022-v2:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contentType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;chunk&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bytes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content_block_delta&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;delta&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;# Use it
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;text_chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;generate_text_streaming&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a haiku about cloud computing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
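&lt;p&gt;One more thing before calling this production-ready: Bedrock throttles bursty callers. A minimal retry sketch, assuming AWS-style error codes such as &lt;code&gt;ThrottlingException&lt;/code&gt; (the helper names and backoff values are my own choices, not AWS guidance) -- it retries any callable whose exception carries a retryable code:&lt;/p&gt;

```python
# Sketch: exponential backoff with jitter around a Bedrock call.
# RETRYABLE and with_retries() are hypothetical helpers; the error codes
# are the AWS-style codes a botocore ClientError typically carries.
import random
import time

RETRYABLE = {"ThrottlingException", "ServiceUnavailableException"}

def aws_error_code(exc: Exception) -> str:
    """Extract the error code from a botocore-style ClientError, if any."""
    return getattr(exc, "response", {}).get("Error", {}).get("Code", "")

def with_retries(call, max_attempts: int = 5, base_delay: float = 1.0):
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            if aws_error_code(exc) not in RETRYABLE or attempt == max_attempts - 1:
                raise
            # Exponential backoff scaled by base_delay, with up to 2x jitter:
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

&lt;p&gt;Usage would look like &lt;code&gt;with_retries(lambda: generate_text("Explain VPCs"))&lt;/code&gt;. Non-retryable errors (bad model ID, missing access) are re-raised immediately.&lt;/p&gt;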



&lt;h2&gt;
  
  
  Using LangChain for Production Apps
&lt;/h2&gt;

&lt;p&gt;For more complex applications, LangChain provides a cleaner interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_aws&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatBedrock&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.messages&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SystemMessage&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the model
&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatBedrock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-3-5-sonnet-20241022-v2:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Simple chat
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful AWS architect.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the best way to set up a VPC?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Building a RAG Application
&lt;/h3&gt;

&lt;p&gt;Retrieval-Augmented Generation (RAG) lets you query your own documents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_aws&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BedrockEmbeddings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_community.vectorstores&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FAISS&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.text_splitter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.prompts&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_core.runnables&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RunnablePassthrough&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Initialize embeddings
&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amazon.titan-embed-text-v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Load and split your documents
&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...]&lt;/span&gt;  &lt;span class="c1"&gt;# Your documents here
&lt;/span&gt;&lt;span class="n"&gt;text_splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RecursiveCharacterTextSplitter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chunk_overlap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;splits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text_splitter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Create vector store
&lt;/span&gt;&lt;span class="n"&gt;vectorstore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FAISS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;splits&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectorstore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_retriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Create RAG chain
&lt;/span&gt;&lt;span class="n"&gt;template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Answer based on the following context:

Context: {context}

Question: {question}

Answer:&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ChatPromptTemplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;rag_chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;RunnablePassthrough&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;
    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 5. Query your documents
&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rag_chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What is our refund policy?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
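&lt;p&gt;One refinement to the chain above: the retriever feeds raw &lt;code&gt;Document&lt;/code&gt; objects into the &lt;code&gt;{context}&lt;/code&gt; slot, which stringifies their metadata along with the text. A small formatting helper keeps the prompt clean (the helper and the stand-in class below are mine, purely for illustration):&lt;/p&gt;

```python
class Doc:
    # Minimal stand-in for a LangChain Document, used here so the
    # helper can be demonstrated without any retrieval dependencies
    def __init__(self, page_content, metadata=None):
        self.page_content = page_content
        self.metadata = metadata or {}

def format_docs(docs):
    # Keep only each retrieved chunk's text, separated by blank lines
    return "\n\n".join(d.page_content for d in docs)

print(format_docs([Doc("Refunds within 30 days."), Doc("Contact support first.")]))

# In the real chain, wire it in as:
#   {"context": retriever | format_docs, "question": RunnablePassthrough()}
```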



&lt;h2&gt;
  
  
  Cost Optimization Tips
&lt;/h2&gt;

&lt;p&gt;On-demand Bedrock pricing is metered per input and output token, so cost scales directly with prompt length and response length. Here's how to optimize:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Choose the Right Model
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Recommended Model&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple Q&amp;amp;A&lt;/td&gt;
&lt;td&gt;Titan Lite&lt;/td&gt;
&lt;td&gt;$&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;General chat&lt;/td&gt;
&lt;td&gt;Claude 3.5 Haiku&lt;/td&gt;
&lt;td&gt;$$&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complex reasoning&lt;/td&gt;
&lt;td&gt;Claude 3.5 Sonnet&lt;/td&gt;
&lt;td&gt;$$$&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Advanced code &amp;amp; reasoning&lt;/td&gt;
&lt;td&gt;Claude 4 Sonnet/Opus&lt;/td&gt;
&lt;td&gt;$$$$&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
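&lt;p&gt;The table above translates into a small helper that estimates the spend for a given call. The per-1K-token prices below are illustrative figures from memory, not authoritative; always confirm them against the current Bedrock pricing page before budgeting:&lt;/p&gt;

```python
# Illustrative on-demand prices in USD per 1K tokens; verify against the
# current Amazon Bedrock pricing page, since these change over time.
PRICING = {
    "anthropic.claude-3-5-haiku-20241022-v1:0":  {"in": 0.0008, "out": 0.004},
    "anthropic.claude-3-5-sonnet-20241022-v2:0": {"in": 0.003,  "out": 0.015},
}

def estimate_cost(model_id, input_tokens, output_tokens):
    # Cost scales linearly with token counts in each direction
    p = PRICING[model_id]
    return input_tokens / 1000 * p["in"] + output_tokens / 1000 * p["out"]

sonnet = "anthropic.claude-3-5-sonnet-20241022-v2:0"
print(round(estimate_cost(sonnet, 2000, 500), 4))  # 0.0135
```

Running the numbers like this before picking a tier makes the trade-off concrete: at these rates, a Haiku call costs roughly a quarter of the equivalent Sonnet call.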

&lt;h3&gt;
  
  
  2. Use Provisioned Throughput for High Volume
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# For production workloads with consistent traffic
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arn:aws:bedrock:us-east-1:123456789:provisioned-model/my-model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Cache Frequent Responses
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;lru_cache&lt;/span&gt;

&lt;span class="nd"&gt;@lru_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maxsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cached_generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt_hash&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;generate_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_with_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;prompt_hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;md5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;cached_generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt_hash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
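&lt;p&gt;A quick way to sanity-check the caching layer is to swap the real Bedrock call for a counting stub and confirm that a repeated prompt only triggers one generation (the stub below is mine, purely for illustration):&lt;/p&gt;

```python
from functools import lru_cache

calls = {"n": 0}

def fake_generate(prompt):
    # Stand-in for the real Bedrock call; counts how often it runs
    calls["n"] += 1
    return "answer to " + prompt

@lru_cache(maxsize=1000)
def cached(prompt):
    return fake_generate(prompt)

cached("What is S3?")
cached("What is S3?")  # served from cache; fake_generate is not called again
print(calls["n"])  # 1
```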



&lt;h2&gt;
  
  
  Security Best Practices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Use VPC Endpoints
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc_endpoint"&lt;/span&gt; &lt;span class="s2"&gt;"bedrock"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;service_name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"com.amazonaws.us-east-1.bedrock-runtime"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_endpoint_type&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Interface"&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_ids&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_subnet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;private&lt;/span&gt;&lt;span class="p"&gt;[*].&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;security_group_ids&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bedrock_endpoint&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;private_dns_enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Enable Model Invocation Logging
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# CloudWatch logging for compliance
&lt;/span&gt;&lt;span class="n"&gt;bedrock_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bedrock&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;bedrock_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_model_invocation_logging_configuration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;loggingConfig&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cloudWatchConfig&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;logGroupName&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/aws/bedrock/invocations&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;roleArn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;arn:aws:iam::123456789:role/BedrockLogging&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;textDataDeliveryEnabled&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;imageDataDeliveryEnabled&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Use Guardrails
&lt;/h3&gt;

&lt;p&gt;Amazon Bedrock Guardrails help filter harmful content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-3-5-sonnet-20241022-v2:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;guardrailIdentifier&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-guardrail-id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;guardrailVersion&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DRAFT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
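&lt;p&gt;It's worth checking whether the guardrail actually intervened before returning output to the user. To the best of my knowledge, the &lt;code&gt;InvokeModel&lt;/code&gt; response body carries an &lt;code&gt;amazon-bedrock-guardrailAction&lt;/code&gt; field when a guardrail is attached; treat that field name as an assumption and verify it against the current API reference:&lt;/p&gt;

```python
import json

def guardrail_intervened(raw_body):
    # raw_body: the bytes read from response["body"]. The field name
    # "amazon-bedrock-guardrailAction" is assumed from Bedrock docs;
    # it reports whether the guardrail blocked or masked content.
    body = json.loads(raw_body)
    return body.get("amazon-bedrock-guardrailAction") == "INTERVENED"

blocked = json.dumps({"amazon-bedrock-guardrailAction": "INTERVENED"}).encode()
print(guardrail_intervened(blocked))  # True
```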



&lt;h2&gt;
  
  
  Real-World Architecture
&lt;/h2&gt;

&lt;p&gt;Here's a production-ready architecture I use for enterprise clients:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                    ┌─────────────────┐
                    │   CloudFront    │
                    └────────┬────────┘
                             │
                    ┌────────▼────────┐
                    │   API Gateway   │
                    └────────┬────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
     ┌────────▼───────┐ ┌────▼────┐ ┌───────▼──────┐
     │ Lambda (Chat)  │ │ Lambda  │ │   Lambda     │
     │                │ │ (RAG)   │ │ (Streaming)  │
     └────────┬───────┘ └────┬────┘ └───────┬──────┘
              │              │              │
              └──────────────┼──────────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
     ┌────────▼───────┐ ┌────▼────┐ ┌───────▼──────┐
     │    Bedrock     │ │ OpenSrch│ │  DynamoDB    │
     │ (Foundation M) │ │ (Vector)│ │  (Sessions)  │
     └────────────────┘ └─────────┘ └──────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
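&lt;p&gt;As a concrete sketch of the &lt;code&gt;Lambda (Chat)&lt;/code&gt; box above: an API Gateway proxy event comes in, the handler builds the same Messages-API body used earlier in this post, and forwards it to Bedrock. The event shape and function names are my assumptions about your setup, not a fixed convention:&lt;/p&gt;

```python
import json

MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"

def build_body(user_message, max_tokens=1000):
    # Same Anthropic Messages-API request shape used earlier in the post
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": user_message}],
    }

def handler(event, context):
    # API Gateway proxy integration: the prompt arrives in the JSON body.
    # In production, create the boto3 client at module scope for reuse.
    import boto3
    prompt = json.loads(event["body"])["prompt"]
    bedrock = boto3.client("bedrock-runtime")
    response = bedrock.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps(build_body(prompt)),
    )
    answer = json.loads(response["body"].read())["content"][0]["text"]
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}
```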



&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;Now that you have the basics, here are some directions to explore:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Agents for Bedrock&lt;/strong&gt; - Create autonomous agents that can use tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge Bases&lt;/strong&gt; - Managed RAG with automatic chunking and embeddings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-tuning&lt;/strong&gt; - Customize models with your own data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-modal&lt;/strong&gt; - Work with images and PDFs using Claude's built-in vision support&lt;/li&gt;
&lt;/ol&gt;
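&lt;p&gt;Of these, Knowledge Bases is the shortest path from the hand-rolled RAG chain above to a managed one: a single &lt;code&gt;retrieve_and_generate&lt;/code&gt; call on the &lt;code&gt;bedrock-agent-runtime&lt;/code&gt; client replaces the splitter, vector store, and prompt chain. A request-shape sketch (the IDs are placeholders, and you should confirm the exact schema against the boto3 docs for your SDK version):&lt;/p&gt;

```python
def build_rag_request(kb_id, model_arn, question):
    # Request shape for bedrock-agent-runtime's retrieve_and_generate:
    # Bedrock handles retrieval, chunking, and prompt assembly for you
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

# Usage (requires AWS credentials and a provisioned knowledge base):
#   client = boto3.client("bedrock-agent-runtime")
#   resp = client.retrieve_and_generate(**build_rag_request(kb_id, model_arn, "What is our refund policy?"))
#   print(resp["output"]["text"])
```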




&lt;p&gt;&lt;em&gt;Have questions about implementing Bedrock in your architecture? Drop a comment below!&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;About the author:&lt;/strong&gt; David Petrocelli is a Senior Cloud Architect at Caylent, PhD in Computer Science, and University Professor specializing in cloud architecture and generative AI applications.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>amazonbedrock</category>
      <category>generativeai</category>
      <category>python</category>
    </item>
  </channel>
</rss>
