<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: SpicyCode</title>
    <description>The latest articles on DEV Community by SpicyCode (@isspicycode).</description>
    <link>https://dev.to/isspicycode</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3752558%2F36cb39da-df10-4ffc-b8f4-cea31fb0aff4.png</url>
      <title>DEV Community: SpicyCode</title>
      <link>https://dev.to/isspicycode</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/isspicycode"/>
    <language>en</language>
    <item>
      <title>The evolution of AI prompting: how 4 years of research inspired my new Claude Code Skill</title>
      <dc:creator>SpicyCode</dc:creator>
      <pubDate>Sun, 22 Feb 2026 20:35:47 +0000</pubDate>
      <link>https://dev.to/isspicycode/the-evolution-of-ai-prompting-how-4-years-of-research-inspired-my-new-claude-code-skill-nfh</link>
      <guid>https://dev.to/isspicycode/the-evolution-of-ai-prompting-how-4-years-of-research-inspired-my-new-claude-code-skill-nfh</guid>
      <description>&lt;p&gt;We use Large Language Models every day to write code across different languages and frameworks. But how does an AI actually reason about our code?&lt;/p&gt;

&lt;p&gt;I recently read six major research papers published between 2022 and 2026.&lt;br&gt;
They trace the entire history of how AI models think, moving from blind trust to a sharp reality check.&lt;/p&gt;

&lt;p&gt;Rather than merely taking notes, I decided to turn this academic research into a practical tool.&lt;br&gt;
I built a custom Claude Code skill called &lt;code&gt;cot-skill-claude-code&lt;/code&gt;.&lt;br&gt;
It forces the AI to apply the best prompting strategies directly in my terminal.&lt;/p&gt;




&lt;h2&gt;The golden age of prompting&lt;/h2&gt;

&lt;p&gt;In 2022, researchers discovered a technique called &lt;strong&gt;Chain-of-Thought (CoT)&lt;/strong&gt;.&lt;br&gt;
They found that asking an AI to explain its logic step by step drastically improved its answers.&lt;br&gt;
This mirrors asking a senior developer to explain their architecture before writing a single line of Dart code.&lt;/p&gt;

&lt;p&gt;By 2023, a new strategy emerged: &lt;strong&gt;Least-to-Most Prompting&lt;/strong&gt;.&lt;br&gt;
Instead of solving a massive problem at once, the AI broke it into smaller sequential tasks.&lt;/p&gt;
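&lt;p&gt;A minimal sketch of the idea, with a stubbed model call (&lt;code&gt;callModel&lt;/code&gt; and the sub-questions are illustrative, not part of any real API):&lt;/p&gt;

```java
// Least-to-Most sketch: solve sub-problems in order, feeding every
// earlier answer back as context for the next one.
public class LeastToMost {
    // Stub: in practice this would be a call to an LLM API.
    static String callModel(String prompt) {
        return "[answer to: " + prompt + "]";
    }

    public static void main(String[] args) {
        String[] subproblems = {
            "List the fields the User class needs",
            "Define validation rules for each field",
            "Write the final User class using those rules"
        };
        StringBuilder context = new StringBuilder();
        for (String step : subproblems) {
            String prompt = context + "Q: " + step + "\n";
            String answer = callModel(prompt);
            // Each answer becomes context for the next, simpler-to-harder.
            context.append(prompt).append("A: ").append(answer).append("\n");
        }
        System.out.println(context);
    }
}
```

&lt;p&gt;Each answer is appended to the running context, so later sub-questions build on earlier ones instead of attacking the whole problem at once.&lt;/p&gt;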

&lt;p&gt;Then came &lt;strong&gt;Progressive-Hint Prompting&lt;/strong&gt; in 2023.&lt;br&gt;
This method fed the AI's previous answers back into the prompt as hints, allowing it to refine its own logic iteratively.&lt;/p&gt;
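&lt;p&gt;The loop behind Progressive-Hint is tiny. A hedged sketch, where &lt;code&gt;callModel&lt;/code&gt; stands in for a real LLM call and convergence is faked for illustration:&lt;/p&gt;

```java
// Progressive-Hint sketch: the previous answer is re-attached to the
// prompt as a hint, nudging the model to refine its own output.
public class ProgressiveHint {
    // Stub: pretend the model converges once it sees a hint.
    static String callModel(String prompt) {
        return prompt.contains("Hint:") ? "42" : "41";
    }

    public static void main(String[] args) {
        String question = "How many tests does the module need?";
        String answer = callModel(question);
        // Re-ask, feeding the first answer back as a hint.
        String refined = callModel(question + " Hint: the answer is near " + answer);
        System.out.println(refined);
    }
}
```

&lt;p&gt;In the real technique this loop repeats until two consecutive answers agree.&lt;/p&gt;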

&lt;h2&gt;The reality check&lt;/h2&gt;

&lt;p&gt;The honeymoon phase ended with a 2025 paper called the &lt;strong&gt;CoT Mirage&lt;/strong&gt;.&lt;br&gt;
Its authors presented evidence that the AI does not actually reason.&lt;br&gt;
It relies on sophisticated pattern matching over its training data.&lt;br&gt;
When tasked with building a highly custom architecture, the AI can sound confident yet fail completely.&lt;/p&gt;

&lt;p&gt;To solve this trust issue, a 2026 paper introduced the &lt;strong&gt;Thinker-Executor&lt;/strong&gt; model.&lt;br&gt;
It proposes splitting the work into two separate parts.&lt;br&gt;
One AI agent plans the strict logic and another agent simply executes the code.&lt;/p&gt;
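&lt;p&gt;A rough sketch of that split, with both agents stubbed (the function names and the plan text are illustrative):&lt;/p&gt;

```java
// Thinker-Executor sketch: one call produces a strict plan,
// a second call only executes it, never re-planning.
public class ThinkerExecutor {
    // Stub for the planning agent.
    static String think(String task) {
        return "1) validate input 2) update state 3) emit event";
    }

    // Stub for the executing agent: it sees only the plan.
    static String execute(String plan) {
        return "code generated strictly from: " + plan;
    }

    public static void main(String[] args) {
        String plan = think("Add an email change feature to User");
        String code = execute(plan);
        System.out.println(code);
    }
}
```

&lt;p&gt;The key property is that the executor receives only the plan, never the original open-ended task.&lt;/p&gt;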




&lt;h2&gt;What I Built: the CoT Claude Code Skill&lt;/h2&gt;

&lt;p&gt;I realized that developers need a way to control how much "reasoning" an AI applies to a problem.&lt;br&gt;
I therefore built a Claude Code skill that puts these research findings into practice.&lt;/p&gt;

&lt;p&gt;When you run my plugin, it asks you what kind of reasoning mode you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Flash Mode&lt;/strong&gt;: A direct, fast answer for simple syntax checks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Normal Mode&lt;/strong&gt;: Full structured reasoning using the Least-to-Most decomposition strategy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep Mode&lt;/strong&gt;: A multi-step validation process inspired by the Thinker-Executor model, used for complex architectures.&lt;/li&gt;
&lt;/ul&gt;
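&lt;p&gt;For context, a Claude Code skill is described by a &lt;code&gt;SKILL.md&lt;/code&gt; file with YAML frontmatter. A stripped-down sketch of how such modes could be expressed (illustrative wording, not the actual contents of the plugin):&lt;/p&gt;

```yaml
---
name: cot-reasoning
description: Ask which reasoning mode to apply before answering a coding question.
---

Before answering, ask the user to pick a mode:

- flash: answer directly, with no intermediate reasoning.
- normal: decompose the problem least-to-most, then solve each part in order.
- deep: write a strict plan first, then execute it step by step and verify the result.
```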

&lt;p&gt;The skill forces Claude to break down problems, analyze constraints, and verify its own logic before generating any code.&lt;/p&gt;

&lt;p&gt;If you want to try it out, you can find the plugin on my GitHub:&lt;br&gt;
&lt;a href="https://github.com/isSpicyCode/cot-skill-claude-code" rel="noopener noreferrer"&gt;isSpicyCode/cot-skill-claude-code&lt;/a&gt;.&lt;br&gt;
It's fully open-source and built for developers who want reliable answers, not just fast ones.&lt;/p&gt;




&lt;p&gt;The six source references, in recommended reading order:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;arXiv:2201.11903 — Chain-of-Thought Prompting, Wei et al., 2022&lt;/li&gt;
&lt;li&gt;arXiv:2203.11171 — Self-Consistency, Wang et al., 2022&lt;/li&gt;
&lt;li&gt;arXiv:2205.10625 — Least-to-Most Prompting, Zhou et al., 2022&lt;/li&gt;
&lt;li&gt;arXiv:2304.09797 — Progressive-Hint Prompting, Zheng et al., 2023&lt;/li&gt;
&lt;li&gt;arXiv:2508.01191 — Is CoT a Mirage, Zhao et al., 2025&lt;/li&gt;
&lt;li&gt;arXiv:2602.17544 — Reusability and Verifiability of CoT, Aggarwal et al., 2026&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>promptengineering</category>
      <category>github</category>
    </item>
    <item>
      <title>Android 2026: Google Closes the Door. "What Every Developer Should Know"</title>
      <dc:creator>SpicyCode</dc:creator>
      <pubDate>Thu, 19 Feb 2026 21:53:48 +0000</pubDate>
      <link>https://dev.to/isspicycode/android-2026-google-closes-the-door-what-every-developer-should-know-37p7</link>
      <guid>https://dev.to/isspicycode/android-2026-google-closes-the-door-what-every-developer-should-know-37p7</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Google is making identity verification mandatory in 2026 to distribute APKs, moving AOSP to 2 releases per year, and releasing Android 17 Beta with notable breaking changes. If you publish on the Play Store: nothing changes. If you distribute outside: this article concerns you.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;Table of Contents&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Context&lt;/li&gt;
&lt;li&gt;The Problem&lt;/li&gt;
&lt;li&gt;The 4 Major Changes&lt;/li&gt;
&lt;li&gt;What Doesn't Change&lt;/li&gt;
&lt;li&gt;Key Points&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Context&lt;/h2&gt;

&lt;p&gt;Since its inception, Android was built on a fundamental principle: &lt;strong&gt;distribution freedom&lt;/strong&gt;. Anyone could compile an APK, share it on GitHub or via email, and have it installed on any device. Facing iOS and its locked App Store, this was the key difference.&lt;/p&gt;

&lt;p&gt;In 2026, this philosophy takes a serious hit.&lt;/p&gt;

&lt;h3&gt;Prerequisites to understand this article&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Have already published or attempted to publish an Android app&lt;/li&gt;
&lt;li&gt;Basic knowledge of the Play Store and sideloading&lt;/li&gt;
&lt;li&gt;These rules apply only to &lt;strong&gt;certified Android devices&lt;/strong&gt; (with Google Mobile Services) — non-GMS Custom ROMs (/e/OS, LineageOS) are not affected&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;The Problem&lt;/h2&gt;

&lt;p&gt;Several quiet announcements, scattered between August 2025 and February 2026, paint a picture of an Android where &lt;strong&gt;Google controls the entire distribution chain&lt;/strong&gt; — even outside its own store.&lt;/p&gt;

&lt;p&gt;Put together, these changes mark a turning point. Here are the details.&lt;/p&gt;




&lt;h2&gt;The 4 Major Changes&lt;/h2&gt;

&lt;h3&gt;1. Developer Verification — End of Anonymous Sideloading&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Warning&lt;/strong&gt;: Starting from &lt;strong&gt;September 2026&lt;/strong&gt;, any Android app distributed outside the Play Store must be signed by a Google-verified developer.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Google announced this officially on the Android Developers Blog of August 25, 2025, in a post signed by &lt;strong&gt;Suzanne Frey, VP Product Trust &amp;amp; Growth&lt;/strong&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Android will require all apps to be registered by verified developers in order to be installed by users on certified Android devices."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The official justification? Google claims to have detected &lt;strong&gt;50x more malware&lt;/strong&gt; from sideloaded sources than on the Play Store. The stated target: malicious actors who impersonate real developers to distribute convincing fake apps.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Google's metaphor&lt;/strong&gt;: &lt;em&gt;"Think of it like an ID check at the airport — confirming a traveler's identity but separate from the security screening of their bags."&lt;/em&gt; Google verifies &lt;strong&gt;who you are&lt;/strong&gt;, not &lt;strong&gt;what your app contains&lt;/strong&gt; nor &lt;strong&gt;where it comes from&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;What verification actually requires&lt;/strong&gt; (source: official Android Developer Console preview document):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Account Type&lt;/th&gt;
&lt;th&gt;Requirements&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Personal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Government ID + verified phone number + $25 one-time fee&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Organization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ID + phone + company legal registration documents + verified website + $25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Student / Hobbyist&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Streamlined process, no fee — details not yet published&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Important technical detail&lt;/strong&gt;: You must register each &lt;strong&gt;package name&lt;/strong&gt; of your app with its &lt;strong&gt;signing public key&lt;/strong&gt;, proven by uploading an APK signed with the corresponding private key. You are &lt;strong&gt;not required to upload the final APK&lt;/strong&gt; that will be distributed — just to prove that you control the signing key pair.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Official Timeline:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Early access (gradual invitations)&lt;/td&gt;
&lt;td&gt;October 2025&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Past&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Verification open to all devs&lt;/td&gt;
&lt;td&gt;March 2026&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Now&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enforcement (Brazil, Indonesia, Singapore, Thailand)&lt;/td&gt;
&lt;td&gt;September 2026&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Upcoming&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Global rollout&lt;/td&gt;
&lt;td&gt;2027+&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Upcoming&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Best practice&lt;/strong&gt;: You can sign up for early access now at &lt;a href="https://goo.gle/android-verification-early-access" rel="noopener noreferrer"&gt;goo.gle/android-verification-early-access&lt;/a&gt;. Early sign-ups = priority support + opportunity to give feedback on the process.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;What it really means for you&lt;/strong&gt; (depending on your situation):&lt;/p&gt;

&lt;p&gt;You're a student or hobbyist?&lt;/p&gt;

&lt;p&gt;Google has explicitly planned a separate streamlined account with no fees. Details are not yet published. Monitor &lt;a href="https://developer.android.com/developer-verification" rel="noopener noreferrer"&gt;developer.android.com/developer-verification&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You already distribute on the Play Store?&lt;/p&gt;

&lt;p&gt;If you have an existing Play Console account (verification in place since 2023), you have &lt;strong&gt;very likely already met&lt;/strong&gt; these requirements. Check &lt;a href="https://developer.android.com/developer-verification#play-developers" rel="noopener noreferrer"&gt;the official guides&lt;/a&gt;. No new account needed.&lt;/p&gt;

&lt;p&gt;You use /e/OS or LineageOS?&lt;/p&gt;

&lt;p&gt;These devices are &lt;strong&gt;not Android certified&lt;/strong&gt; (no Google Mobile Services). The new rules don't apply to them. However, some apps like WhatsApp or Revolut that use the &lt;strong&gt;Play Integrity API&lt;/strong&gt; already refuse to run on these devices — and the developers of these apps have no obligation to change that.&lt;/p&gt;




&lt;h4&gt;The Developer Community Reaction&lt;/h4&gt;

&lt;p&gt;The Register gathered direct testimonials from developers, and the tone is unequivocal.&lt;/p&gt;

&lt;p&gt;A Reddit developer summarizes the frustration:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"I can install an app onto a Windows computer from any source without verification by Microsoft. An Android device is a computer, like any other computer. It doesn't have to be this way. It's this way because a giant corporation controls it."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Another indie developer interviewed by The Register:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Google is making it harder and harder to build apps. Every year they do something to make it harder — Chrome extensions, Docs add-ons… every single thing that runs in something of theirs gets more difficult to distribute. It used to be the case that if you were just creating a Chrome extension for yourself and a few colleagues, you could easily submit it as unlisted. But now, even private extensions have to go through verification which takes days, and even if you've changed one line of code can be arbitrarily rejected."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pro tip&lt;/strong&gt;: This pattern is documented across several Google products — Chrome Extensions, Workspace Add-ons, and now Android. It's an underlying trend, not an isolated incident.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;2. AOSP Moves to 2 Releases Per Year&lt;/h3&gt;

&lt;p&gt;Google quietly announced through official documentation updates that the &lt;strong&gt;Android Open Source Project (AOSP)&lt;/strong&gt; will receive only &lt;strong&gt;2 source code drops per year&lt;/strong&gt;: Q2 and Q4, compared to 4 previously.&lt;/p&gt;

&lt;p&gt;This change is part of the transition to the &lt;strong&gt;"Trunk Stable"&lt;/strong&gt; model: all features are developed on a single branch, hidden by feature flags (&lt;code&gt;aconfig&lt;/code&gt;), then gradually activated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Official recommendation to contributors&lt;/strong&gt;: Google recommends moving from the &lt;code&gt;aosp-main&lt;/code&gt; branch to &lt;code&gt;android-latest-release&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concrete impact by profile:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Who&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;th&gt;Detail&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Play Store Devs&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;No change in workflow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OEMs (Samsung, Xiaomi)&lt;/td&gt;
&lt;td&gt;Positive&lt;/td&gt;
&lt;td&gt;More time to integrate → less fragmentation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom ROM (LineageOS, GrapheneOS)&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;6 months wait between drops, complex patches to integrate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Small OEMs emerging markets&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;AOSP dependency without Google services — penalizing delays&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Common mistake&lt;/strong&gt;: Believing monthly security patches stop. No — they continue. It's their &lt;strong&gt;integration into custom AOSP builds&lt;/strong&gt; that becomes more complex.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The GrapheneOS case&lt;/strong&gt;: In late 2025, they signaled that the quarterly September 2025 release still hadn't been pushed to AOSP weeks after its internal deployment. With 2 releases per year, these delays risk becoming structural.&lt;/p&gt;

&lt;p&gt;What is the "Trunk Stable" model?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trunk Stable&lt;/strong&gt; is a development model where all features are continuously merged to the main branch (&lt;code&gt;main&lt;/code&gt;), protected by &lt;strong&gt;feature flags&lt;/strong&gt; (&lt;code&gt;aconfig&lt;/code&gt;). Google can activate or deactivate a feature remotely.&lt;/p&gt;
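&lt;p&gt;Concretely, an &lt;code&gt;aconfig&lt;/code&gt; flag is declared in a small text file checked into the tree. A representative declaration (the package, namespace, and bug number are illustrative):&lt;/p&gt;

```
package: "com.example.feature"
container: "system"

flag {
    name: "enable_new_parser"
    namespace: "example_team"
    description: "Gates the new parser until it is ready to ship."
    bug: "123456789"
}
```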

&lt;p&gt;Advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fewer long-term branches to maintain&lt;/li&gt;
&lt;li&gt;More reliable continuous integration tests&lt;/li&gt;
&lt;li&gt;Fine control over feature activation by device/region&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Disadvantages for open-source:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Public code may contain undocumented "hidden" features&lt;/li&gt;
&lt;li&gt;Less visibility into Google's actual roadmap&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;3. Android 17 Beta 1 — Canary Replaces Developer Previews&lt;/h3&gt;

&lt;p&gt;In February 2026, Google launched the &lt;strong&gt;first Android 17 beta&lt;/strong&gt; (API level 37, codename &lt;em&gt;Cinnamon Bun&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;The "Developer Preview" channel is &lt;strong&gt;replaced by a continuous Canary channel&lt;/strong&gt;: devs have permanent access to the latest changes without waiting for specific windows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Notable breaking changes:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Common mistake&lt;/strong&gt;: Targeting API 37 without checking your app's Vulkan support — OpenGL ES is now routed via ANGLE.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenGL ES&lt;/td&gt;
&lt;td&gt;Direct&lt;/td&gt;
&lt;td&gt;Via ANGLE (Vulkan required)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large screen opt-out&lt;/td&gt;
&lt;td&gt;Possible&lt;/td&gt;
&lt;td&gt;Removed (sw &amp;gt; 600dp)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom notifications&lt;/td&gt;
&lt;td&gt;Free size&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ProfilingManager triggers&lt;/td&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;COLD_START&lt;/code&gt;, &lt;code&gt;OOM&lt;/code&gt;, &lt;code&gt;KILL_EXCESSIVE_CPU&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
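&lt;p&gt;If your app still renders through OpenGL ES, it is worth declaring the Vulkan capability that ANGLE relies on, so that incompatible devices are filtered out up front. An illustrative manifest fragment (&lt;code&gt;0x401000&lt;/code&gt; is the standard constant for Vulkan 1.1):&lt;/p&gt;

```xml
&lt;!-- AndroidManifest.xml: restrict to devices reporting Vulkan 1.1 support --&gt;
&lt;uses-feature
    android:name="android.hardware.vulkan.version"
    android:version="0x401000"
    android:required="true" /&gt;
```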




&lt;h3&gt;4. Target API Level: Mandatory Maintenance or Invisibility&lt;/h3&gt;

&lt;p&gt;Apps whose target API level lags more than &lt;strong&gt;2 years&lt;/strong&gt; behind the latest major Android version will be blocked for new users on the Play Store.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: A 6-month extension can be requested, but it's not automatic. A stable unmaintained app disappears from search results for new devices — without clear notification.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Google claims: &lt;em&gt;"developers will have the same freedom to distribute their apps directly to users through sideloading or to use any app store they prefer."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The community responds: this freedom existed &lt;strong&gt;without having to ask Google's permission&lt;/strong&gt;. This is no longer the case.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The timing is not coincidental. The rollout starts in &lt;strong&gt;4 Southeast Asian countries&lt;/strong&gt; — priority markets for mobile fraud, but also markets where antitrust regulatory pressure is lower than in Europe or the US. Europe and the US arrive in 2027, once Google has refined the system away from the most active regulators.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;What Doesn't Change&lt;/h2&gt;

&lt;p&gt;Let's be honest: if you publish on the Play Store, &lt;strong&gt;you'll feel almost nothing&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Who is really affected?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The real losers:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Devs who distribute outside Play Store without a Google account&lt;/li&gt;
&lt;li&gt;Open-source projects valuing contributor anonymity (F-Droid, Aurora)&lt;/li&gt;
&lt;li&gt;Custom ROM communities (LineageOS, GrapheneOS)&lt;/li&gt;
&lt;li&gt;Small OEMs in emerging markets&lt;/li&gt;
&lt;li&gt;Devs in sensitive geopolitical contexts (encrypted communication apps)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Not affected:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Play Store devs already verified (verification already done since 2023)&lt;/li&gt;
&lt;li&gt;Apps on non-GMS devices (/e/OS, LineageOS)&lt;/li&gt;
&lt;li&gt;Devs in France until at least 2027&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Key Points&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Don't Ignore&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sideloading beta&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sign up now at &lt;a href="https://goo.gle/android-verification-early-access" rel="noopener noreferrer"&gt;goo.gle/android-verification-early-access&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Signing keys&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Register your package name + public key before September 2026&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AOSP contributors&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Migrate from &lt;code&gt;aosp-main&lt;/code&gt; to &lt;code&gt;android-latest-release&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Target API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Any app unmaintained for 2 years becomes invisible to new users&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Android 17&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Test Vulkan/ANGLE support now on the Canary channel&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;strong&gt;References&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Google Android Developers Blog — &lt;a href="https://android-developers.googleblog.com/2025/08/elevating-android-security.html" rel="noopener noreferrer"&gt;A new layer of security for certified Android devices&lt;/a&gt; — Suzanne Frey, VP Product Trust &amp;amp; Growth, August 25, 2025&lt;/li&gt;
&lt;li&gt;The Register — &lt;a href="https://www.theregister.com/2025/08/26/android_developer_verification_sideloading/" rel="noopener noreferrer"&gt;Google kneecaps indie Android devs, forces them to register&lt;/a&gt; — Tim Anderson, August 26, 2025&lt;/li&gt;
&lt;li&gt;WebProNews — &lt;a href="https://www.webpronews.com/google-cuts-android-aosp-releases-to-biannual-starting-2026/" rel="noopener noreferrer"&gt;Google Cuts Android AOSP Releases to Biannual Starting 2026&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Android Developers Blog — &lt;a href="https://android-developers.googleblog.com/2026/02/the-first-beta-of-android-17.html" rel="noopener noreferrer"&gt;The First Beta of Android 17&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Android Authority — &lt;a href="https://www.androidauthority.com/aosp-source-code-schedule-3630018/" rel="noopener noreferrer"&gt;AOSP Source Code Schedule&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Droid Life — &lt;a href="https://www.droid-life.com/2026/01/06/google-switches-to-publishing-android-source-code-twice-per-year/" rel="noopener noreferrer"&gt;Google Switches to Publishing Android Source Code Twice Per Year&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>android</category>
      <category>google</category>
      <category>mobile</category>
      <category>security</category>
    </item>
    <item>
      <title>Your LLMs don't do real OOP, and it's structural.</title>
      <dc:creator>SpicyCode</dc:creator>
      <pubDate>Wed, 18 Feb 2026 10:35:57 +0000</pubDate>
      <link>https://dev.to/isspicycode/your-llms-dont-do-real-oop-and-its-structural-5gpc</link>
      <guid>https://dev.to/isspicycode/your-llms-dont-do-real-oop-and-its-structural-5gpc</guid>
      <description>&lt;p&gt;Generative AIs write code every day: classes, services, models, controllers. At first glance, everything looks correct. It compiles, it passes tests and it "does the job."&lt;/p&gt;

&lt;p&gt;And yet, there's a recurring problem:&lt;br&gt;
&lt;strong&gt;code generated by LLMs is often poorly encapsulated.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not "a little."&lt;br&gt;
&lt;strong&gt;structurally poorly encapsulated.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Classes filled with getters and setters, little to no behavior, business logic scattered everywhere. In short: data-oriented code, not object-oriented.&lt;/p&gt;

&lt;p&gt;Why?&lt;br&gt;
And more importantly: &lt;strong&gt;how to do better when using an AI?&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;What OOP originally meant (and what we forgot)&lt;/h2&gt;

&lt;p&gt;When we talk about object-oriented programming today, we often think of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classes&lt;/li&gt;
&lt;li&gt;private properties&lt;/li&gt;
&lt;li&gt;getters / setters&lt;/li&gt;
&lt;li&gt;interfaces&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But this is &lt;strong&gt;not&lt;/strong&gt; the original vision.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;Alan Kay&lt;/strong&gt;, considered one of the fathers of OOP, the central idea wasn't the class, but &lt;strong&gt;the message&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;His definition is famous:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In other words:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;objects &lt;strong&gt;communicate&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;they &lt;strong&gt;keep their state to themselves&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;they &lt;strong&gt;hide their internal logic&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;they are &lt;strong&gt;loosely coupled&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The analogy he used was biological:&lt;br&gt;
autonomous cells that interact without exposing their internal organs.&lt;/p&gt;


&lt;h2&gt;What LLMs generate instead&lt;/h2&gt;

&lt;p&gt;Let's take a typical example generated by an AI:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;getEmail&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;setEmail&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;email&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's clean.&lt;br&gt;
It's "best practice" according to many tutorials.&lt;br&gt;
But it's &lt;strong&gt;not&lt;/strong&gt; encapsulation.&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;internal state is exposed&lt;/li&gt;
&lt;li&gt;internal type is fixed&lt;/li&gt;
&lt;li&gt;validation is absent&lt;/li&gt;
&lt;li&gt;business logic is pushed outside&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result:&lt;br&gt;
behavior ends up in services, controllers, or worse… duplicated everywhere.&lt;/p&gt;

&lt;p&gt;We call this an &lt;strong&gt;anemic class&lt;/strong&gt;:&lt;br&gt;
a simple bag of data with accessors.&lt;/p&gt;


&lt;h2&gt;The false sense of security of getters / setters&lt;/h2&gt;

&lt;p&gt;Getters and setters give the illusion of encapsulation, but in reality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;they expose internal structure&lt;/li&gt;
&lt;li&gt;they create strong coupling&lt;/li&gt;
&lt;li&gt;they freeze implementation decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Changing a field, its type, or its logic quickly ripples into widespread breakage across the codebase.&lt;/p&gt;

&lt;p&gt;In OOP, &lt;strong&gt;exposing state is almost always an abstraction leak.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;A better question to ask an object&lt;/h2&gt;

&lt;p&gt;Instead of asking:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getEmail&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// logic here&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ask:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;canBeContacted&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// logic here&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is already progress:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;behavior is localized&lt;/li&gt;
&lt;li&gt;business rule is in the object&lt;/li&gt;
&lt;li&gt;implementation can evolve&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But we can go even further.&lt;/p&gt;




&lt;h2&gt;The message and event approach&lt;/h2&gt;

&lt;p&gt;In Alan Kay's vision, an object doesn't say &lt;em&gt;what it is&lt;/em&gt;, it responds to &lt;em&gt;what it's asked.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Instead of reading state:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you send an intention&lt;/li&gt;
&lt;li&gt;the object decides&lt;/li&gt;
&lt;li&gt;state remains internal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An event-driven or message-oriented model allows exactly this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;internal state transitions&lt;/li&gt;
&lt;li&gt;strong decoupling&lt;/li&gt;
&lt;li&gt;logic concentrated in one place&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's not "more complex."&lt;br&gt;
It's &lt;strong&gt;more explicit.&lt;/strong&gt;&lt;/p&gt;
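
&lt;p&gt;A minimal sketch of that idea in Python (the &lt;code&gt;User&lt;/code&gt; class and its &lt;code&gt;contact&lt;/code&gt; message are illustrative, not a real API): the caller sends an intention, and the object decides using state it never exposes.&lt;/p&gt;

```python
# Tell, don't ask: the caller sends an intention; the object decides.
class User:
    def __init__(self, email=None, opted_out=False):
        self._email = email            # state stays internal
        self._opted_out = opted_out

    def contact(self, send):
        """Respond to the 'contact me' message; the rule lives here."""
        if self._email is None or self._opted_out:
            return False               # the object refuses; callers never see why
        send(self._email)
        return True

sent = []
User(email="a@example.com").contact(sent.append)
User().contact(sent.append)            # no email: silently refused
print(sent)                            # ['a@example.com']
```

&lt;p&gt;No caller reads &lt;code&gt;_email&lt;/code&gt;, so the business rule can change without touching any of them.&lt;/p&gt;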




&lt;h2&gt;
  
  
  Why LLMs struggle so much with real encapsulation
&lt;/h2&gt;

&lt;p&gt;It's not because AIs are "bad."&lt;/p&gt;

&lt;p&gt;It's structural.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;They learn from existing code&lt;/strong&gt;&lt;br&gt;
And GitHub is filled with CRUDs, DTOs, anemic classes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Getters / setters are statistically dominant&lt;/strong&gt;&lt;br&gt;
So they're "probable," therefore generated.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Business behavior is contextual&lt;/strong&gt;&lt;br&gt;
Yet LLMs excel at local patterns and struggle with global consistency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Message-oriented code is less verbose but more conceptual&lt;/strong&gt;&lt;br&gt;
And therefore harder to infer without explicit intention.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The AI doesn't understand your domain.&lt;br&gt;
It extrapolates patterns.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to better use an AI to write OOP code
&lt;/h2&gt;

&lt;p&gt;The solution isn't to stop using AI.&lt;br&gt;
The solution is &lt;strong&gt;to guide it better.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you generate a class, ask yourself (and ask it) these questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does this class &lt;strong&gt;do something&lt;/strong&gt;, or does it just transport data?&lt;/li&gt;
&lt;li&gt;Do I &lt;strong&gt;ask&lt;/strong&gt; the object, or do I &lt;strong&gt;read its state&lt;/strong&gt;?&lt;/li&gt;
&lt;li&gt;Is behavior &lt;strong&gt;localized&lt;/strong&gt; or scattered?&lt;/li&gt;
&lt;li&gt;Can I change the implementation without breaking callers?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the answers point toward transporting data and reading state, it's probably not real OOP.&lt;/p&gt;
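
&lt;p&gt;The last question is the easiest to test. A hedged sketch (the &lt;code&gt;Account&lt;/code&gt; class is hypothetical): callers depend only on a question, so the internal representation can evolve freely.&lt;/p&gt;

```python
# Behavior is localized: callers ask a question, never read the field.
class Account:
    def __init__(self, cents):
        self._cents = cents   # internal representation; could become Decimal later

    def can_withdraw(self, amount_cents):
        # The rule stays here; changing _cents breaks no caller.
        return amount_cents in range(1, self._cents + 1)

acct = Account(1500)
print(acct.can_withdraw(500))    # True
print(acct.can_withdraw(2000))   # False
```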




&lt;h2&gt;
  
  
  The real problem isn't the AI
&lt;/h2&gt;

&lt;p&gt;The problem is that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;we've normalized anemic OOP&lt;/li&gt;
&lt;li&gt;we've confused encapsulation with visibility&lt;/li&gt;
&lt;li&gt;we've replaced behavior with data structures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LLMs merely &lt;strong&gt;reproduce what we've produced for years.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Encapsulation is not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;private fields&lt;/li&gt;
&lt;li&gt;public getters&lt;/li&gt;
&lt;li&gt;passive models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Encapsulation is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;objects responsible for their state&lt;/li&gt;
&lt;li&gt;localized business rules&lt;/li&gt;
&lt;li&gt;messages rather than direct access&lt;/li&gt;
&lt;li&gt;minimal coupling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI can help.&lt;br&gt;
But &lt;strong&gt;it will never replace good modeling.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Further reading&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://dave.autonoma.ca/blog/2026/02/03/lloopy-loops/" rel="noopener noreferrer"&gt;Read "Loopy Loops" on Dave's blog&lt;/a&gt;&lt;/p&gt;

</description>
      <category>oop</category>
      <category>programming</category>
      <category>llm</category>
      <category>designpatterns</category>
    </item>
    <item>
      <title>Cache Strategies Explained: Part 2 - Advanced Architectures</title>
      <dc:creator>SpicyCode</dc:creator>
      <pubDate>Mon, 16 Feb 2026 12:13:50 +0000</pubDate>
      <link>https://dev.to/isspicycode/cache-strategies-explained-part-2-advanced-architectures-1m90</link>
      <guid>https://dev.to/isspicycode/cache-strategies-explained-part-2-advanced-architectures-1m90</guid>
      <description>&lt;p&gt;&lt;strong&gt;From Write-Behind to Write-Ahead Log: How Netflix guarantees zero data loss at global scale&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;This article is the continuation of &lt;a href="https://dev.to/isspicycode/cache-strategies-explained-part-1-the-fundamentals-2e1h"&gt;Part 1 - The Fundamentals&lt;/a&gt;. If you haven't read the first part, I recommend starting there to understand caching basics.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Recap: The Netflix Incident&lt;/li&gt;
&lt;li&gt;Why Write-Behind Isn't Enough Anymore&lt;/li&gt;
&lt;li&gt;
The Write-Ahead Log (WAL)

&lt;ul&gt;
&lt;li&gt;Fundamental Principle&lt;/li&gt;
&lt;li&gt;Architecture for Global Replication&lt;/li&gt;
&lt;li&gt;The WAL API&lt;/li&gt;
&lt;li&gt;The 3 WAL Personas&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Real-World WAL Use Cases at Netflix&lt;/li&gt;

&lt;li&gt;Write-Behind vs WAL: Comparison&lt;/li&gt;

&lt;li&gt;Incident Resolution: Minute by Minute&lt;/li&gt;

&lt;li&gt;Lessons Learned by Netflix&lt;/li&gt;

&lt;li&gt;WAL vs Using Kafka/SQS Directly&lt;/li&gt;

&lt;li&gt;Conclusion&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Recap: The Netflix Incident
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Netflix, Production Incident (Reported September 2025)
&lt;/h3&gt;

&lt;p&gt;A developer types &lt;code&gt;ALTER TABLE user_preferences...&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three seconds later:&lt;/strong&gt; massive database corruption.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Transparency note:&lt;/strong&gt; Netflix hasn't publicly disclosed the exact number of affected records. The incident demonstrated the critical importance of their cache + WAL architecture, but specific numbers aren't verifiable from public sources.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Zero customer complaints, zero downtime, zero data loss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Thanks to two silent technologies:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A cache with extendable TTL&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;Write-Ahead Log (WAL)&lt;/strong&gt; that had captured all mutations&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this Part 2, we'll break down exactly how Netflix transformed classic Write-Behind into enterprise-grade critical architecture.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Write-Behind Isn't Enough Anymore
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Context: 6 Critical Challenges at Netflix Scale
&lt;/h3&gt;

&lt;p&gt;In 2024-2025, Netflix was facing recurring challenges causing production incidents:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Accidental data loss&lt;/strong&gt; and corruption in databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System entropy&lt;/strong&gt; between different datastores (Cassandra and Elasticsearch becoming inconsistent)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-partition updates&lt;/strong&gt; (e.g., building secondary indexes on NoSQL)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data replication&lt;/strong&gt; (in-region and cross-region)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliable retry mechanisms&lt;/strong&gt; for real-time pipelines at scale&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mass deletions&lt;/strong&gt; causing OOM (Out Of Memory) on Key-Value nodes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Direct quote from Netflix article (September 2025):&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"During a particular incident, a developer executed an ALTER TABLE command that caused data corruption. Fortunately, the data was protected by cache, so the ability to quickly extend cache TTL combined with the application writing mutations to Kafka allowed us to recover. Without the application's resilience features, there would have been permanent data loss."&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  The Problem with Traditional Write-Behind
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Application → Cache (instant)
                ↓
           Async Queue → Database (later)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What happens if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The message queue crashes before writing to DB?&lt;/li&gt;
&lt;li&gt;The database is corrupted?&lt;/li&gt;
&lt;li&gt;You need to replicate across 4 geographic regions?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Answer: Data loss.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Classic Write-Behind wasn't enough for Netflix anymore. They needed a solution with enterprise-grade durability guarantees.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Write-Ahead Log (WAL)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Fundamental Principle of WAL
&lt;/h3&gt;

&lt;p&gt;Netflix developed a generic WAL system that transforms Write-Behind into enterprise-grade critical architecture.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Application
    ↓
1. DURABLE write to Kafka (Write-Ahead Log)
    ↓
2. Only after confirmation → write to Cache
    ↓
3. Consumers read from Kafka → write to DB
    ↓
4. On failure → automatic infinite retry until success
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Guarantee: zero data loss, even in catastrophic scenarios.&lt;/strong&gt;&lt;/p&gt;
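
&lt;p&gt;The ordering above can be sketched in a few lines of Python (a toy model, not Netflix's code: a list stands in for Kafka, and plain dicts for the cache and database):&lt;/p&gt;

```python
import json

log = []        # stands in for Kafka: durable, ordered, replayable
cache = {}      # volatile
database = {}   # may fail and recover

def write(key, value):
    log.append(json.dumps({"key": key, "value": value}))  # 1. durable log first
    cache[key] = value                                    # 2. cache only after

def replay_into_db():
    for record in log:                 # 3. consumers drain the log into the DB,
        m = json.loads(record)         #    retrying until every record lands
        database[m["key"]] = m["value"]

write("user:123", {"plan": "premium"})
replay_into_db()
print(database["user:123"])            # {'plan': 'premium'}
```

&lt;p&gt;If the process crashes after step 1, the mutation is already durable and can be replayed; classic Write-Behind has no such anchor.&lt;/p&gt;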

&lt;h3&gt;
  
  
  Classic Write-Behind vs Netflix WAL: Architectural Difference
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Classic Write-Behind (cache-first approach):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Application
    ↓ 1. INSTANT write
Cache (volatile memory)
    ↓ 2. ASYNCHRONOUS write (non-durable queue)
Database

RISK: if crash between step 1 and 2 → DATA LOSS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Netflix WAL (durability-first approach):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Application
    ↓ 1. DURABLE write
Kafka (Write-Ahead Log)
    ↓ 2. PARALLEL write after Kafka confirmation
    ├──→ Cache
    ├──→ Database
    └──→ Other consumers

Guarantee: even in case of crash → ZERO LOSS (replay from Kafka)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The fundamental difference:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Write-Behind&lt;/strong&gt; = performance optimization (cache first)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WAL&lt;/strong&gt; = durability guarantee (durable log first)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Netflix inverted the priorities: durability before speed.&lt;/p&gt;




&lt;h3&gt;
  
  
  WAL Architecture for Global EVCache Replication
&lt;/h3&gt;

&lt;p&gt;Here's how Netflix synchronizes its cache across the world:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────────┐
│  EVCache Client      │  (Region US-WEST)
│  Application writes  │
└──────────┬───────────┘
           │
           ↓ Write mutations to Kafka (WAL)
           │
┌──────────┴───────────────────────────────────┐
│         Kafka Topics (Durable WAL)           │
│  • Sequence numbers for guaranteed order      │
│  • Configurable retention                     │
│  • Internal Kafka replication                 │
└──────────┬───────────────────────────────────┘
           │
           ├────────────┬─────────────┬─────────────┐
           ↓            ↓             ↓             ↓
     ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐
     │Consumer │  │Consumer │  │Consumer │  │Consumer │
     │ US-EAST │  │   EU    │  │  APAC   │  │  LATAM  │
     └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘
          │            │            │            │
          ↓            ↓            ↓            ↓
     ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐
     │ Writer  │  │ Writer  │  │ Writer  │  │ Writer  │
     │ Groups  │  │ Groups  │  │ Groups  │  │ Groups  │
     └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘
          │            │            │            │
          ↓            ↓            ↓            ↓
     EVCache      EVCache      EVCache      EVCache
     Servers      Servers      Servers      Servers
    (Regional)   (Regional)   (Regional)   (Regional)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Detailed flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Application (region US-WEST)&lt;/strong&gt; writes a mutation: &lt;code&gt;SET user:123 = {...}&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;WAL Producer&lt;/strong&gt; writes to Kafka with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Key: &lt;code&gt;user:123&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Value: data + metadata&lt;/li&gt;
&lt;li&gt;Sequence number: &lt;code&gt;12,847,392&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Timestamp: &lt;code&gt;2025-02-16T10:32:45Z&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;4 Regional consumers&lt;/strong&gt; (US-EAST, EU, APAC, LATAM):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read from the same Kafka topic&lt;/li&gt;
&lt;li&gt;Consume in parallel and independently&lt;/li&gt;
&lt;li&gt;Each maintains its own offset&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Local Writer Groups:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Receive mutations&lt;/li&gt;
&lt;li&gt;Write to their region's EVCache servers&lt;/li&gt;
&lt;li&gt;Retry on failure&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; one write in US-WEST is automatically and reliably replicated across 4 regions.&lt;/p&gt;
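
&lt;p&gt;Step 3 is the key to regional independence: each consumer tracks its own position in the shared log. A toy sketch of per-region offsets (region names come from the diagram; the code is illustrative):&lt;/p&gt;

```python
# One shared log, one independent offset per regional consumer.
log = ["SET user:123", "SET user:456", "DEL user:99"]
offsets = {"US-EAST": 0, "EU": 0, "APAC": 0, "LATAM": 0}

def poll(region, max_messages=10):
    start = offsets[region]
    batch = log[start:start + max_messages]
    offsets[region] = start + len(batch)   # commit this region's offset only
    return batch

print(poll("EU", 2))      # ['SET user:123', 'SET user:456']
print(poll("EU", 2))      # ['DEL user:99']
print(offsets["APAC"])    # 0 -- a slow region never blocks the others
```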




&lt;h3&gt;
  
  
  The WAL API: Intentional Simplicity
&lt;/h3&gt;

&lt;p&gt;One of the strengths of Netflix's WAL is its extremely simple API. Here's the main endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight protobuf"&gt;&lt;code&gt;&lt;span class="k"&gt;rpc&lt;/span&gt; &lt;span class="n"&gt;WriteToLog&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WriteToLogRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;returns&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;WriteToLogResponse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Request&lt;/span&gt;
&lt;span class="kd"&gt;message&lt;/span&gt; &lt;span class="nc"&gt;WriteToLogRequest&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="na"&gt;namespace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;        &lt;span class="c1"&gt;// Identifier for a particular WAL&lt;/span&gt;
  &lt;span class="n"&gt;Lifecycle&lt;/span&gt; &lt;span class="na"&gt;lifecycle&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;// Delay and original write timestamp&lt;/span&gt;
  &lt;span class="kt"&gt;bytes&lt;/span&gt; &lt;span class="na"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;           &lt;span class="c1"&gt;// Message content&lt;/span&gt;
  &lt;span class="n"&gt;Target&lt;/span&gt; &lt;span class="na"&gt;target&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;           &lt;span class="c1"&gt;// Where to send the payload&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Response&lt;/span&gt;
&lt;span class="kd"&gt;message&lt;/span&gt; &lt;span class="nc"&gt;WriteToLogResponse&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;Trilean&lt;/span&gt; &lt;span class="na"&gt;durable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// SUCCESS / FAILED / UNKNOWN&lt;/span&gt;
  &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="kd"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;// Failure reason&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this simplicity?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easy onboarding for teams&lt;/li&gt;
&lt;li&gt;Complete abstraction of underlying implementation&lt;/li&gt;
&lt;li&gt;Flexibility via the "namespace" concept&lt;/li&gt;
&lt;/ul&gt;
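
&lt;p&gt;A hedged sketch of what a call looks like from the client side. The field names mirror &lt;code&gt;WriteToLogRequest&lt;/code&gt; above; the transport is faked with a plain function rather than a real gRPC stub:&lt;/p&gt;

```python
def write_to_log(request):
    # A real client would issue the gRPC call; here we only validate shape.
    required = {"namespace", "payload", "target"}
    if not required.issubset(request):
        return {"durable": "FAILED", "message": "missing fields"}
    return {"durable": "SUCCESS", "message": ""}

resp = write_to_log({
    "namespace": "evcache_foobar",      # which WAL to write to
    "lifecycle": {"delay_secs": 0},     # delay and original write timestamp
    "payload": b"SET user:123",
    "target": {"region": "us-east-1"},  # where to send the payload
})
print(resp["durable"])                  # SUCCESS
```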




&lt;h3&gt;
  
  
  The 3 WAL Personas
&lt;/h3&gt;

&lt;p&gt;Netflix's WAL can adopt 3 different personas depending on namespace configuration.&lt;/p&gt;

&lt;h4&gt;
  
  
  Persona #1: Delayed Queue
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Example configuration (Product Data Systems):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"namespace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pds"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"persistenceConfiguration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"physicalStorage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SQS"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"config"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"wal-queue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"dgwwal-dq-pds"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"wal-dlq-queue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"dgwwal-dlq-pds"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"queue.poll-interval.secs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"queue.max-messages-per-poll"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Usage:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Send a message that will be delivered in 3600 seconds (1h)
&lt;/span&gt;&lt;span class="n"&gt;wal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;delay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Backend:&lt;/strong&gt; SQS (Amazon Simple Queue Service)&lt;/p&gt;




&lt;h4&gt;
  
  
  Persona #2: Generic Cross-Region Replication
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Example configuration (EVCache):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"namespace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"evcache_foobar"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"persistenceConfiguration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"physicalStorage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"KAFKA"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"config"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"consumer_stack"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"consumer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"target"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"us-east-1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dgwwal.foobar.cluster.us-east-1.netflix.net"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"us-east-2"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dgwwal.foobar.cluster.us-east-2.netflix.net"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"us-west-2"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dgwwal.foobar.cluster.us-west-2.netflix.net"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"eu-west-1"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dgwwal.foobar.cluster.eu-west-1.netflix.net"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"wal-kafka-topics"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"evcache_foobar"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"wal-kafka-dlq-topics"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Usage:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Write to EVCache in region US-WEST-2
&lt;/span&gt;&lt;span class="n"&gt;evcache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user:123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# WAL automatically replicates to:
# → US-EAST-1
# → US-EAST-2
# → EU-WEST-1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Backend:&lt;/strong&gt; Kafka&lt;/p&gt;




&lt;h4&gt;
  
  
  Persona #3: Multi-Partition Mutations (2-Phase Commit)
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Example configuration (Key-Value):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"namespace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"kv_foobar"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"persistenceConfiguration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"physicalStorage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"KAFKA"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"config"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"durable_storage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"kv"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"namespace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"foobar_wal_type"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"shard"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"walfoobar"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"wal-kafka-topics"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"foobar_kv_multi_id"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"wal-kafka-dlq-topics"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"foobar_kv_multi_id-dlq"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Usage:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Single request that modifies multiple tables/partitions
&lt;/span&gt;&lt;span class="n"&gt;kv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mutate_items&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="nc"&gt;PutItem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_data&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;PutItem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;profiles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;profile_data&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;DeleteItem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;old:123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# WAL guarantees ALL operations will eventually succeed
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Backend:&lt;/strong&gt; Kafka + Durable Storage (for 2-phase commit)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key detail:&lt;/strong&gt; presence of &lt;code&gt;durable_storage&lt;/code&gt; enables 2-phase commit semantics.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World WAL Use Cases at Netflix
&lt;/h2&gt;

&lt;p&gt;The generic WAL isn't just for EVCache. Netflix uses it for:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Queues with Intelligent Retries
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Mutation failed → WAL
    ↓
Exponential backoff retry
    ↓
Retry until success (or DLQ after X attempts)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
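&lt;p&gt;The retry flow above can be sketched in a few lines of Python. This is a minimal, hypothetical model (the &lt;code&gt;apply&lt;/code&gt; callback and &lt;code&gt;dlq&lt;/code&gt; list stand in for the real consumer and dead-letter queue), not Netflix's actual implementation:&lt;/p&gt;

```python
import time

def replay_with_backoff(mutation, apply, dlq, max_attempts=5, base_delay=0.1):
    """Retry a failed mutation with exponential backoff; park it in the
    DLQ after max_attempts (the 'or DLQ after X attempts' branch above)."""
    for attempt in range(max_attempts):
        try:
            return apply(mutation)
        except Exception:
            time.sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
    dlq.append(mutation)  # non-transient failure: hand off to operators
    return None
```

&lt;p&gt;A transient failure resolves itself within a few attempts; a mutation that never succeeds ends up in the DLQ instead of blocking the queue forever.&lt;/p&gt;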






&lt;h3&gt;
  
  
  2. Cross-Region Replication (EVCache Global)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;4 synchronized geographic regions&lt;/li&gt;
&lt;li&gt;Replication latency: a few seconds&lt;/li&gt;
&lt;li&gt;Guaranteed eventual consistency&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  3. Multi-Partition / Multi-Table Mutations
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Complex transaction:
1. Write to Table A (partition 1)
2. Write to Table B (partition 7)
3. Update Cache

With WAL:
- Two-phase commit semantics
- Atomic guarantee
- Automatic rollback on partial failure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
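&lt;p&gt;The atomic guarantee can be illustrated with a toy rollback sketch. The in-memory dict "tables" here are purely for illustration; the real WAL achieves atomicity with Kafka plus durable storage, not application-side rollback:&lt;/p&gt;

```python
_MISSING = object()  # sentinel distinguishing "no previous value" from None

def apply_atomically(ops, tables):
    """Apply every (table, key, value) op or none of them.
    value=None means delete; an unknown table aborts and rolls back."""
    undo = []
    try:
        for table, key, value in ops:
            store = tables[table]            # KeyError here triggers rollback
            undo.append((store, key, store.get(key, _MISSING)))
            if value is None:
                store.pop(key, None)
            else:
                store[key] = value
    except KeyError:
        for store, key, prev in reversed(undo):  # undo in reverse order
            if prev is _MISSING:
                store.pop(key, None)
            else:
                store[key] = prev
        return False
    return True
```

&lt;p&gt;If step 2 of the complex transaction fails, every write already applied in step 1 is undone, so readers never observe a partial state.&lt;/p&gt;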






&lt;h3&gt;
  
  
  4. Database Failure Protection
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Catastrophe scenario:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;11:30 - Cassandra database becomes unavailable
11:31 - Applications continue writing to WAL (Kafka)
13:00 - Cassandra comes back online
13:01 - WAL automatically replays all missed mutations
13:15 - System 100% synchronized, ZERO data loss
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
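&lt;p&gt;Here is a toy model of that timeline, assuming a simple in-memory log and store (hypothetical names; Kafka plays the role of &lt;code&gt;wal&lt;/code&gt; in production). Replay is idempotent here because each mutation is a plain key overwrite:&lt;/p&gt;

```python
from collections import deque

class WalProtectedStore:
    """Toy model of the timeline above: while the database is down,
    writes still land in the log; on recovery, the log is replayed in order."""

    def __init__(self):
        self.wal = deque()   # durable log (Kafka stand-in)
        self.db = {}
        self.db_available = True

    def write(self, key, value):
        self.wal.append((key, value))        # always log first
        if self.db_available:
            self.db[key] = value
        # if the DB is down, the write simply waits in the WAL

    def recover(self):
        self.db_available = True
        for key, value in self.wal:          # replay missed mutations in order
            self.db[key] = value
```

&lt;p&gt;Applications keep calling &lt;code&gt;write&lt;/code&gt; during the outage; once the database returns, &lt;code&gt;recover&lt;/code&gt; brings it back in sync with zero data loss.&lt;/p&gt;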






&lt;h2&gt;
  
  
  Write-Behind vs WAL: Head-to-Head Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Classic Write-Behind&lt;/th&gt;
&lt;th&gt;Netflix WAL&lt;/th&gt;
&lt;th&gt;Winner&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Durability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not guaranteed (memory queue)&lt;/td&gt;
&lt;td&gt;Strong guarantee (Kafka)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;WAL&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Failure Resilience&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Possible loss&lt;/td&gt;
&lt;td&gt;No loss&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;WAL&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Retries&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual/basic&lt;/td&gt;
&lt;td&gt;Automatic/intelligent&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;WAL&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-Region&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not natively supported&lt;/td&gt;
&lt;td&gt;Native multi-region support&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;WAL&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Operation Ordering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Can be lost&lt;/td&gt;
&lt;td&gt;Preserved (sequence numbers)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;WAL&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple&lt;/td&gt;
&lt;td&gt;Complex (Kafka, consumers, etc.)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Write-Behind&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Write Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ultra-fast (&amp;lt;1ms)&lt;/td&gt;
&lt;td&gt;Fast (~5-10ms Kafka)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Write-Behind&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;td&gt;Heavy (Kafka cluster, consumers)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Write-Behind&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Conclusion:&lt;/strong&gt; WAL sacrifices some simplicity and latency to gain enterprise-grade durability guarantees.&lt;/p&gt;




&lt;h2&gt;
  
  
  When NOT to Use WAL
&lt;/h2&gt;

&lt;p&gt;The Netflix WAL is powerful but comes with significant costs. Here's when it's &lt;strong&gt;over-engineered&lt;/strong&gt;:&lt;/p&gt;

&lt;h3&gt;
  
  
  Don't Use WAL If:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Startup / Small Team (&amp;lt; 10 people)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Managed Kafka infrastructure cost (AWS MSK, Confluent Cloud): €500-€2,000/month minimum&lt;/li&gt;
&lt;li&gt;Operational complexity: monitoring, consumer tuning, DLQ management&lt;/li&gt;
&lt;li&gt;Development time: 2-4 weeks implementation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alternative:&lt;/strong&gt; Simple Write-Behind with SQS/RabbitMQ queue is sufficient&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Non-Critical Data&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Logs, analytics, metrics, tracking events&lt;/li&gt;
&lt;li&gt;Loss of a few entries is acceptable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alternative:&lt;/strong&gt; Simple Write-Behind or even fire-and-forget&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Critical Latency (&amp;lt; 5ms required)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;WAL adds 5-10ms latency (Kafka round-trip)&lt;/li&gt;
&lt;li&gt;Real-time gaming, high-frequency trading&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alternative:&lt;/strong&gt; Write-Behind + asynchronous replication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Simple Infrastructure / Single-Region&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No geographic replication needed&lt;/li&gt;
&lt;li&gt;Single datacenter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alternative:&lt;/strong&gt; Cache-Aside + regular backups is sufficient&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Limited Budget&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Infrastructure: Kafka cluster (3+ brokers) + Zookeeper/KRaft&lt;/li&gt;
&lt;li&gt;Operations: DevOps expertise required&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alternative:&lt;/strong&gt; Simple managed services (Redis Cloud + RDS with replication)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  When WAL Becomes NECESSARY:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Critical data (finance, healthcare, user profiles)&lt;/li&gt;
&lt;li&gt;Zero tolerance for data loss&lt;/li&gt;
&lt;li&gt;Multi-region replication mandatory&lt;/li&gt;
&lt;li&gt;Complex operations (multi-table, atomic)&lt;/li&gt;
&lt;li&gt;Mature infrastructure with dedicated DevOps team&lt;/li&gt;
&lt;li&gt;Infrastructure budget &amp;gt; €5,000/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Golden rule:&lt;/strong&gt; Start simple (Cache-Aside + Write-Behind), evolve to WAL when your durability constraints justify it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Incident Resolution: Minute by Minute
&lt;/h2&gt;

&lt;p&gt;Back to our ALTER TABLE corruption incident. Here's exactly what happened:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Detection (T+3 seconds)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Alert: Database corrupted
Status: Millions of records affected
Severity: CRITICAL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Immediate Protection (T+30 seconds)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Extend cache TTL to buy time
&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend_ttl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_preferences:*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;7200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 2 hours
&lt;/span&gt;
&lt;span class="c1"&gt;# Users continue to be served by cache
# No one notices the problem
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Recovery (T+5 minutes)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Identify last healthy Kafka offset
&lt;/span&gt;&lt;span class="n"&gt;last_good_offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kafka&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_offset_before&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corruption_timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Isolate corrupted database
&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_read_only&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Restore from backup
&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;restore_from_snapshot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;corruption_timestamp&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Replay (T+15 minutes)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Replay all mutations from WAL
&lt;/span&gt;&lt;span class="n"&gt;wal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replay_from_offset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;start_offset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;last_good_offset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;target_database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verify&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;  &lt;span class="c1"&gt;# Integrity verification
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Verify consistency
&lt;/span&gt;&lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;expected_count&lt;/span&gt;
&lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;check_consistency&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5: Back to Normal (T+20 minutes)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Reset TTL to normal
&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_ttl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_preferences:*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 1 hour
&lt;/span&gt;
&lt;span class="c1"&gt;# Re-enable writes
&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_read_write&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Check service
&lt;/span&gt;&lt;span class="n"&gt;monitoring&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;check_all_metrics&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# ALL GREEN
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Final result:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero data loss&lt;/li&gt;
&lt;li&gt;Zero service interruption&lt;/li&gt;
&lt;li&gt;Recovery time: 20 minutes&lt;/li&gt;
&lt;li&gt;Business impact: $0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Without cache + WAL:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Millions of users affected&lt;/li&gt;
&lt;li&gt;Several hours of interruption&lt;/li&gt;
&lt;li&gt;Customer data loss&lt;/li&gt;
&lt;li&gt;Business impact: tens of millions of dollars&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Lessons Netflix Learned Building WAL
&lt;/h2&gt;

&lt;p&gt;Netflix publicly shared the key lessons from this project:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Pluggable Architecture Is Fundamental
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"The ability to support different targets — databases, caches, queues, or upstream applications — via configuration rather than code changes has been fundamental to WAL's success."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Concrete example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Same API, different backends per use case:
- Delayed Queue → SQS
- Cross-Region Replication → Kafka
- Multi-Partition → Kafka + Durable Storage

Backend change = config change, not code!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
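&lt;p&gt;A sketch of what "backend change = config change" can look like in practice. The backend classes below are invented stand-ins, not Netflix's code; the point is that callers only ever see &lt;code&gt;make_wal(config)&lt;/code&gt;:&lt;/p&gt;

```python
class InMemoryQueueBackend:
    """Stand-in for SQS: a simple FIFO queue."""
    def __init__(self):
        self.items = []

    def publish(self, message):
        self.items.append(message)

class LoggingQueueBackend(InMemoryQueueBackend):
    """Stand-in for Kafka: FIFO plus offsets, enabling replay."""
    def publish(self, message):
        super().publish(message)
        return len(self.items) - 1  # offset of the appended message

BACKENDS = {"sqs": InMemoryQueueBackend, "kafka": LoggingQueueBackend}

def make_wal(config):
    # Swapping SQS for Kafka is a config edit, not a code change
    return BACKENDS[config["backend"]]()
```

&lt;p&gt;The application code publishing mutations never changes; only the entry in &lt;code&gt;config&lt;/code&gt; does.&lt;/p&gt;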






&lt;h3&gt;
  
  
  2. Reuse Existing Building Blocks
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"We already had control plane infrastructure, Key-Value abstractions, and other components in place. Building on top of these existing abstractions allowed us to focus on the unique challenges WAL needed to solve."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Lesson for your project:&lt;/strong&gt;&lt;br&gt;
Don't reinvent the wheel. If your company already has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A messaging system (Kafka, RabbitMQ)&lt;/li&gt;
&lt;li&gt;A database abstraction&lt;/li&gt;
&lt;li&gt;A monitoring system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Build ON TOP rather than redoing everything from scratch.&lt;/p&gt;


&lt;h3&gt;
  
  
  3. Separation of Concerns = Scalability
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"By separating message processing from consumption, and allowing independent scaling of each component, we can handle traffic spikes and failures more gracefully."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Netflix WAL architecture:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Producer Group (independent scaling)
    ↕ Auto-scale based on CPU/Network
Queue (Kafka/SQS)
    ↕ Auto-scale based on CPU/Network
Consumer Group (independent scaling)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If producers are overloaded → scale just producers.&lt;br&gt;
If consumers are slow → scale just consumers.&lt;/p&gt;


&lt;h3&gt;
  
  
  4. Systems Fail — Understand Tradeoffs
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"WAL itself has failure modes, including traffic spikes, slow consumers, and non-transient errors. We use abstractions and operational strategies like data partitioning and backpressure signals to manage this, but tradeoffs must be understood."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;WAL failure modes identified by Netflix:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Traffic Surge&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Problem: 10x normal traffic suddenly&lt;/li&gt;
&lt;li&gt;Solution: automatic load shedding + backpressure&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Slow Consumer&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Problem: one consumer processes 10x more slowly&lt;/li&gt;
&lt;li&gt;Solution: automatic scaling + DLQ for problematic messages&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Non-Transient Errors&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Problem: a mutation always fails (e.g., DB constraint violated)&lt;/li&gt;
&lt;li&gt;Solution: DLQ after X attempts + operator alerts&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Queue Lag Building Up&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Problem: messages accumulate faster than they can be processed&lt;/li&gt;
&lt;li&gt;Solution: lag monitoring + proactive auto-scaling&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
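&lt;p&gt;A simple load-shedding heuristic illustrates the backpressure idea behind failure modes 1 and 4: estimate how long the current backlog would take to drain, and shed non-critical writes past a threshold. The function and threshold are hypothetical, not Netflix's actual signals:&lt;/p&gt;

```python
def should_shed(queue_depth, consumer_rate, max_lag_seconds=30):
    """Shed non-critical writes when the backlog would take longer than
    max_lag_seconds to drain at the current consumption rate (msgs/sec)."""
    if consumer_rate == 0:
        return True  # consumers stalled: shed immediately
    return queue_depth / consumer_rate > max_lag_seconds
```

&lt;p&gt;Paired with lag monitoring, the same drain-time estimate can also drive proactive consumer auto-scaling before shedding becomes necessary.&lt;/p&gt;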

&lt;p&gt;&lt;strong&gt;The fundamental tradeoff accepted:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Eventual Consistency (few seconds delay)
    VS
Immediate Consistency (data always up-to-date)

Netflix chose: Eventual Consistency
Why? Performance + Zero Data Loss
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  WAL vs Using Kafka/SQS Directly
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Legitimate question:&lt;/strong&gt; why not just use Kafka directly?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Netflix's answer:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Kafka/SQS Direct&lt;/th&gt;
&lt;th&gt;Netflix WAL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Initial setup&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Complex (configs, topics, consumers, DLQ, monitoring)&lt;/td&gt;
&lt;td&gt;Simple (1 API call)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backend change&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Code rewrite&lt;/td&gt;
&lt;td&gt;Config change&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Retry logic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Must implement yourself&lt;/td&gt;
&lt;td&gt;Built-in with exponential backoff&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DLQ&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manually configure&lt;/td&gt;
&lt;td&gt;Default for each namespace&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-region&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Must architect yourself&lt;/td&gt;
&lt;td&gt;Ready-to-use persona&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2-Phase Commit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Implement from scratch&lt;/td&gt;
&lt;td&gt;Persona with durable storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Monitoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Build yourself&lt;/td&gt;
&lt;td&gt;Integrated (Data Gateway)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Authentication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Configure&lt;/td&gt;
&lt;td&gt;Automatic mTLS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Netflix's conclusion:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"WAL is an abstraction over underlying queues, so the underlying technology can be changed per use case without code changes. WAL emphasizes a simple but effective API that saves users from complicated setups and configurations."&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Conclusion: Lessons from the Giants
&lt;/h2&gt;

&lt;h3&gt;
  
  
  From Incident to Innovation
&lt;/h3&gt;

&lt;p&gt;Remember: a single &lt;code&gt;ALTER TABLE&lt;/code&gt; command could have cost millions of dollars and affected millions of users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What made the difference?&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A well-sized cache with flexible TTL&lt;/li&gt;
&lt;li&gt;A Write-Ahead Log capturing all mutations&lt;/li&gt;
&lt;li&gt;A prepared team with runbooks for this type of incident&lt;/li&gt;
&lt;li&gt;Resilient architecture treating cache as protection, not just optimization&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This incident perfectly illustrates what we explored in this series: caching isn't just about performance, it's about system resilience.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Perfect Cache Doesn't Exist
&lt;/h3&gt;

&lt;p&gt;Every strategy has tradeoffs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TTL → Can serve stale data&lt;/li&gt;
&lt;li&gt;LRU → Can evict important data&lt;/li&gt;
&lt;li&gt;Write-Through → Write latency&lt;/li&gt;
&lt;li&gt;Write-Behind → Risk of loss (without WAL)&lt;/li&gt;
&lt;li&gt;WAL → Infrastructure complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The best cache is the one adapted to YOUR use case.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And as Netflix demonstrated: the best cache is the one that saves you when everything goes wrong at 11:30 on a Tuesday morning.&lt;/p&gt;




&lt;h3&gt;
  
  
  When to Adopt WAL?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Adopt WAL if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your data is critical (financial, healthcare, user profiles)&lt;/li&gt;
&lt;li&gt;You can't tolerate ANY data loss&lt;/li&gt;
&lt;li&gt;You need to replicate across geographic regions&lt;/li&gt;
&lt;li&gt;You have complex operations (multi-table, atomic)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Simple Write-Behind is sufficient if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Non-critical data (logs, analytics, metrics)&lt;/li&gt;
&lt;li&gt;Loss of a few entries is acceptable&lt;/li&gt;
&lt;li&gt;Simple infrastructure (1 region, 1 datacenter)&lt;/li&gt;
&lt;li&gt;You're starting out (start simple, evolve later)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Recommended Resources
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Official Engineering Blogs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Netflix TechBlog: &lt;a href="https://netflixtechblog.com" rel="noopener noreferrer"&gt;https://netflixtechblog.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Facebook Engineering: &lt;a href="https://engineering.fb.com" rel="noopener noreferrer"&gt;https://engineering.fb.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Twitter Engineering: &lt;a href="https://blog.x.com/engineering" rel="noopener noreferrer"&gt;https://blog.x.com/engineering&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Spotify Engineering: &lt;a href="https://engineering.atspotify.com" rel="noopener noreferrer"&gt;https://engineering.atspotify.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LinkedIn Engineering: &lt;a href="https://engineering.linkedin.com" rel="noopener noreferrer"&gt;https://engineering.linkedin.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Academic Papers:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TAO: Facebook's Distributed Data Store (USENIX)&lt;/li&gt;
&lt;li&gt;CacheSack: Admission Algorithms for Flash Caches (Google)&lt;/li&gt;
&lt;li&gt;Spanner: Google's Globally-Distributed Database&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Open Source Tools:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Redis: &lt;a href="https://redis.io" rel="noopener noreferrer"&gt;https://redis.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Memcached: &lt;a href="https://memcached.org" rel="noopener noreferrer"&gt;https://memcached.org&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;EVCache (Netflix): &lt;a href="https://github.com/Netflix/EVCache" rel="noopener noreferrer"&gt;https://github.com/Netflix/EVCache&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Acknowledgments
&lt;/h2&gt;

&lt;p&gt;All information in this article is based on verifiable public sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Official company engineering blogs&lt;/li&gt;
&lt;li&gt;Published academic papers&lt;/li&gt;
&lt;li&gt;Technical conferences (QCon, USENIX, etc.)&lt;/li&gt;
&lt;li&gt;Official system documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Special mention:&lt;/strong&gt; Netflix article "Building a Resilient Data Platform with Write-Ahead Log at Netflix" (September 2025) by Prudhviraj Karumanchi, Samuel Fu, Sriram Rangarajan, Vidhya Arvind, Yun Wang, and John Lu, which provided exceptionally rich details on the ALTER TABLE incident and complete WAL architecture.&lt;/p&gt;

&lt;p&gt;Big thanks to the engineering teams sharing their practices with the community!&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>distributedsystems</category>
      <category>performance</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Cache Strategies Explained: Part 1 - The Fundamentals</title>
      <dc:creator>SpicyCode</dc:creator>
      <pubDate>Mon, 16 Feb 2026 12:06:13 +0000</pubDate>
      <link>https://dev.to/isspicycode/cache-strategies-explained-part-1-the-fundamentals-2e1h</link>
      <guid>https://dev.to/isspicycode/cache-strategies-explained-part-1-the-fundamentals-2e1h</guid>
      <description>&lt;p&gt;&lt;strong&gt;How tech giants (Netflix, Facebook, Google, Twitter) serve billions of requests per second using caching&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The Incident That Changed Everything&lt;/li&gt;
&lt;li&gt;Why Caching Is Not Optional&lt;/li&gt;
&lt;li&gt;
The 6 Fundamental Strategies

&lt;ul&gt;
&lt;li&gt;1. TTL (Time-To-Live)&lt;/li&gt;
&lt;li&gt;2. LRU (Least Recently Used)&lt;/li&gt;
&lt;li&gt;3. LFU (Least Frequently Used)&lt;/li&gt;
&lt;li&gt;4. Write-Through vs Write-Behind&lt;/li&gt;
&lt;li&gt;5. Cache-Aside&lt;/li&gt;
&lt;li&gt;6. Read-Through&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Comparison Table&lt;/li&gt;

&lt;li&gt;How Giants Use Caching&lt;/li&gt;

&lt;li&gt;

Real-World Challenges

&lt;ul&gt;
&lt;li&gt;Thundering Herd&lt;/li&gt;
&lt;li&gt;Cache Warming&lt;/li&gt;
&lt;li&gt;Geographic Consistency&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;The Invalidation Problem&lt;/li&gt;

&lt;li&gt;Getting Started Guide&lt;/li&gt;

&lt;li&gt;Essential Metrics&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Incident That Changed Everything
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Netflix, Production Incident (Reported September 2025)
&lt;/h3&gt;

&lt;p&gt;An experienced developer types an &lt;code&gt;ALTER TABLE&lt;/code&gt; command in their terminal. This is routine work, something they've done hundreds of times. They hit Enter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;user_preferences&lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Three seconds later, the alert fires.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Dashboards light up red. The primary database just suffered massive corruption. Critical user preference data (profiles, watch lists, personalized recommendations) became unusable.&lt;/p&gt;

&lt;p&gt;In a typical company, this is where you start calculating the millions of dollars this incident will cost. Where careers can hang in the balance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But at Netflix, something unexpected happens.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No customer noticed anything. No complaints, no service interruption. 200+ million subscribers kept watching their shows peacefully.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How is this possible?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Two silent technologies saved the day:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A cache continuing to serve valid data&lt;/li&gt;
&lt;li&gt;A Write-Ahead Log (WAL) that had captured all mutations before the corruption&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Engineers simply extended the cache TTL, replayed mutations from Kafka, cleaned up the corruption, and resumed operations. Result: zero data loss, zero downtime.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Transparency note&lt;/strong&gt;: Netflix hasn't publicly disclosed the exact number of affected records or full incident details. Information comes from their official blog post (September 2025) demonstrating the critical importance of their cache + WAL architecture for resilience.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why Caching Is Not Optional
&lt;/h2&gt;

&lt;p&gt;This incident proves that caching isn't just a performance optimization. It's a critical protection layer that can mean the difference between a minor incident and a multi-million dollar catastrophe.&lt;/p&gt;

&lt;p&gt;In this two-part series, we'll explore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 1&lt;/strong&gt;: Fundamental strategies every developer should know&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 2&lt;/strong&gt;: Enterprise-grade advanced architectures (WAL, multi-region, resilience)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The 6 Fundamental Strategies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. TTL (Time-To-Live) - Temporal Expiration
&lt;/h3&gt;

&lt;p&gt;TTL defines how long data remains valid in cache before being automatically deleted or refreshed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Redis with TTL
&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user:123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Expires after 1 hour
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Ideal use cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Weather data (hourly refresh)&lt;/li&gt;
&lt;li&gt;News feeds (updated every 5 minutes)&lt;/li&gt;
&lt;li&gt;Product prices (daily changes)&lt;/li&gt;
&lt;li&gt;User sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;TTL is universal. Every major tech company uses it in some form.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important: TTL and eviction policies work together&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In production, TTL and LRU/LFU operate &lt;strong&gt;simultaneously&lt;/strong&gt; in Redis/Memcached:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Redis configuration: maxmemory-policy allkeys-lru
&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user:123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# This item will expire in 1 hour OR be evicted earlier if cache is full (LRU)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Data can disappear from cache for two reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TTL expired&lt;/strong&gt;: time elapsed (3600 seconds in the example)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eviction&lt;/strong&gt;: cache full, least recently used item removed (LRU)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This combination ensures both data freshness (TTL) and optimal memory usage (LRU).&lt;/p&gt;




&lt;h3&gt;
  
  
  2. LRU (Least Recently Used) - Priority to Recent Items
&lt;/h3&gt;

&lt;p&gt;When the cache is full, LRU evicts the least recently accessed data. It's like organizing your desk: you keep what you use often within reach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visual workflow:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cache (capacity: 3 items)
1. Access A → [A]
2. Access B → [A, B]
3. Access C → [A, B, C]
4. Access D → [B, C, D]  // A removed (oldest)
5. Access B → [C, D, B]  // B becomes most recently used
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Ideal use cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Web pages (repeated navigation)&lt;/li&gt;
&lt;li&gt;Active user sessions&lt;/li&gt;
&lt;li&gt;Browsing history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Used in production by:&lt;/strong&gt; Netflix (EVCache with client-side LRU)&lt;/p&gt;
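The desk analogy maps directly onto a small implementation. Here is a minimal sketch (the class and method names are illustrative, not from any specific library), built on Python's OrderedDict:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU sketch: evicts the least recently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self.items:
            return None  # cache MISS
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used
```

Replaying the workflow above (A, B, C, D, then B) leaves the cache holding C, D, B, with A evicted.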




&lt;h3&gt;
  
  
  3. LFU (Least Frequently Used) - Priority to Popularity
&lt;/h3&gt;

&lt;p&gt;LFU keeps the most frequently requested data, regardless of last access time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LRU vs LFU difference:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LRU: "When did you last use this?"&lt;/li&gt;
&lt;li&gt;LFU: "How many times have you used this in total?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Concrete example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Data: A (used 10x), B (used 2x), C (used 5x)
Cache full → Remove B (least frequent)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Ideal use cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;E-commerce best-sellers&lt;/li&gt;
&lt;li&gt;Viral content with lasting popularity&lt;/li&gt;
&lt;li&gt;Repetitive search queries&lt;/li&gt;
&lt;/ul&gt;
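To make the popularity policy concrete, here is a deliberately simple LFU sketch (names are illustrative; real implementations, such as Redis's approximated LFU, avoid the linear scan used here):

```python
class LFUCacheSketch:
    """Illustrative LFU: on overflow, evict the key with the lowest access count."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}
        self.counts = {}  # key -> number of accesses

    def get(self, key):
        if key not in self.values:
            return None  # cache MISS
        self.counts[key] += 1
        return self.values[key]

    def set(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            victim = min(self.counts, key=self.counts.get)  # least frequent key
            del self.values[victim]
            del self.counts[victim]
        self.values[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1
```

With A accessed 10 times, B twice, and C 5 times, inserting a fourth key evicts B, exactly as in the example above.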




&lt;h3&gt;
  
  
  4. Write-Through vs Write-Behind - Write Strategies
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Write-Through (Synchronous Write)
&lt;/h4&gt;

&lt;p&gt;Application writes to cache AND database simultaneously.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Both at the same time
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; guaranteed data consistency&lt;br&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; higher write latency&lt;br&gt;
&lt;strong&gt;Use case:&lt;/strong&gt; banking, financial transactions, critical data&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Used by:&lt;/strong&gt; Facebook TAO (synchronous cache + DB writes)&lt;/p&gt;


&lt;h4&gt;
  
  
  Write-Behind / Write-Back (Asynchronous Write)
&lt;/h4&gt;

&lt;p&gt;Application writes to cache first, then to database asynchronously.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;save_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_job&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;save_to_db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Async (via message queue)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; ultra-fast writes&lt;br&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; risk of data loss if the process crashes before the DB save&lt;br&gt;
&lt;strong&gt;Use case:&lt;/strong&gt; logs, analytics, non-critical metrics&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important note:&lt;/strong&gt; Simple Write-Behind has production limitations. In Part 2, we'll see how Netflix transformed it into Write-Ahead Log (WAL) for enterprise-grade durability guarantees.&lt;/p&gt;


&lt;h3&gt;
  
  
  5. Cache-Aside (Lazy Loading) - The Most Common Pattern
&lt;/h3&gt;

&lt;p&gt;This is the dominant strategy in the industry. The application manages the cache itself.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Check cache
&lt;/span&gt;    &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;  &lt;span class="c1"&gt;# Cache HIT
&lt;/span&gt;
    &lt;span class="c1"&gt;# 2. Not in cache? Fetch from DB
&lt;/span&gt;    &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Cache MISS
&lt;/span&gt;
    &lt;span class="c1"&gt;# 3. Store in cache for next time
&lt;/span&gt;    &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Used by:&lt;/strong&gt; Netflix, Spotify, Twitter, and most web applications&lt;/p&gt;




&lt;h3&gt;
  
  
  6. Read-Through Cache - Delegation to Cache
&lt;/h3&gt;

&lt;p&gt;The cache itself automatically manages database reads (transparent to the application).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Application simply asks the cache
&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user:123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Cache automatically fetches from DB if needed
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Used by:&lt;/strong&gt; Facebook (TAO, as their architecture evolved beyond look-aside caching)&lt;/p&gt;




&lt;h2&gt;
  
  
  Comparison Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TTL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple, predictable&lt;/td&gt;
&lt;td&gt;May serve stale data&lt;/td&gt;
&lt;td&gt;Weather, news&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LRU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Adapts to temporal patterns&lt;/td&gt;
&lt;td&gt;May evict important data&lt;/td&gt;
&lt;td&gt;Sessions, navigation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LFU&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Keeps popular data&lt;/td&gt;
&lt;td&gt;More complex to implement&lt;/td&gt;
&lt;td&gt;Best-sellers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Write-Through&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Guaranteed consistency&lt;/td&gt;
&lt;td&gt;Write latency&lt;/td&gt;
&lt;td&gt;Banking, critical data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Write-Behind&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Very fast&lt;/td&gt;
&lt;td&gt;Risk of loss&lt;/td&gt;
&lt;td&gt;Logs, analytics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cache-Aside&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Flexible, full control&lt;/td&gt;
&lt;td&gt;App manages logic&lt;/td&gt;
&lt;td&gt;Most cases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Read-Through&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Transparent to app&lt;/td&gt;
&lt;td&gt;Requires middleware&lt;/td&gt;
&lt;td&gt;Complex systems&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  How Giants Use Caching
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Netflix - EVCache: Billions of Requests/Second
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Distributed cache based on Memcached&lt;/li&gt;
&lt;li&gt;Combined strategies: TTL + LRU + Cache-Aside&lt;/li&gt;
&lt;li&gt;Geographic replication across 4 global regions&lt;/li&gt;
&lt;li&gt;Some clusters with 2 copies, others with 9 (depending on criticality)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Verified performance:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handles billions of requests per second&lt;/li&gt;
&lt;li&gt;Cache warming: network traffic reduced from 45 GB/s to 100 MB/s&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Multi-tier architecture:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;L1: Local memory cache (client-side LRU)
    ↓
L2: EVCache distributed (TTL)
    ↓
L3: Multi-zone replication
    ↓
Database
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key lesson:&lt;/strong&gt; Netflix pre-computes and pre-loads the cache before putting servers into production (cache warming).&lt;/p&gt;




&lt;h3&gt;
  
  
  Facebook/Meta - TAO: 1 Billion Reads/Second
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Architectural evolution:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Phase 1:&lt;/strong&gt; Memcache + MySQL (look-aside, i.e. Cache-Aside)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phase 2:&lt;/strong&gt; TAO (The Associations and Objects) - abstraction layer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Current strategy:&lt;/strong&gt; Write-Through (synchronous cache + DB writes)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Verified performance:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;96.4% hit rate on reads&lt;/li&gt;
&lt;li&gt;Over 1 billion read requests/second&lt;/li&gt;
&lt;li&gt;Millions of writes/second&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technical innovation: "Leases"&lt;/strong&gt;&lt;br&gt;
To avoid the thundering herd problem (massive rush when cache expires):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only one request can hit the database every 10 seconds per key&lt;/li&gt;
&lt;li&gt;Other requests wait or retrieve the freshly calculated value&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Concrete result:&lt;/strong&gt; database load reduced from 17,000 req/s to 1,300 req/s during peaks.&lt;/p&gt;


&lt;h3&gt;
  
  
  Twitter/X - Manhattan + Redis: Consistency at Scale
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manhattan (distributed key-value store)&lt;/li&gt;
&lt;li&gt;Redis (Haplo) as primary cache for Timeline&lt;/li&gt;
&lt;li&gt;Strategy: Cache-Aside + eventual consistency by default&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Verified performance:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;320 million packets/second&lt;/li&gt;
&lt;li&gt;120 GB/s network throughput&lt;/li&gt;
&lt;li&gt;Tens of millions of read QPS&lt;/li&gt;
&lt;li&gt;Cache represents only 3% of infrastructure but is critical&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Notable feature:&lt;/strong&gt; a strong-consistency option is available via consensus for critical data.&lt;/p&gt;


&lt;h3&gt;
  
  
  Google - Bigtable + Spanner: Multi-Tier Cache
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Sophisticated architecture:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;L1: Row cache (in-memory) → Reduces CPU by 25%
    ↓
L2: Block cache (local SSD)
    ↓
L3: Colossus Flash Cache (datacenter)
    ↓
Persistent storage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Verified performance:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bigtable: 17,000 point reads/second per node (1.7x improvement)&lt;/li&gt;
&lt;li&gt;Colossus Flash Cache: over 5 billion requests/second&lt;/li&gt;
&lt;li&gt;Spanner automatically caches query execution plans&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Innovation: CacheSack&lt;/strong&gt;&lt;br&gt;
Intelligent admission algorithm for flash cache that optimizes total cost of ownership (TCO).&lt;/p&gt;


&lt;h2&gt;
  
  
  Real-World Challenges
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. The Thundering Herd
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt;&lt;br&gt;
When a popular key expires, thousands of requests simultaneously hit the database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cache expires at 12:00:00
    ↓
10,000 requests arrive at 12:00:01
    ↓
All go to DB simultaneously → CRASH
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Facebook solution (Leases):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only one request per key is allowed through every 10 seconds&lt;/li&gt;
&lt;li&gt;Others wait or read the freshly calculated value&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Measured result:&lt;/strong&gt; 17,000 req/s → 1,300 req/s&lt;/p&gt;
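The lease idea can be sketched in a few lines. This is a single-process illustration only (the function and variable names are mine, and Facebook's real leases live inside Memcache itself); the point is that losers of the race back off instead of hitting the database:

```python
import threading
import time

_leases = {}                 # key -> expiry timestamp of the current lease
_lease_lock = threading.Lock()
LEASE_SECONDS = 10           # one DB fetch per key per 10 s, as above

def get_with_lease(key, cache, fetch_from_db):
    value = cache.get(key)
    if value is not None:
        return value                       # cache HIT
    now = time.time()
    with _lease_lock:
        won_lease = _leases.get(key, 0) < now
        if won_lease:
            _leases[key] = now + LEASE_SECONDS
    if won_lease:
        value = fetch_from_db(key)         # only the lease holder hits the DB
        cache.set(key, value)
        return value
    time.sleep(0.05)                       # everyone else backs off briefly...
    return cache.get(key)                  # ...then re-reads the cache
```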




&lt;h3&gt;
  
  
  2. Cache Warming
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt;&lt;br&gt;
Starting with an empty cache means terrible latency for the first few minutes or hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Netflix solution:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Copy data from EBS snapshots&lt;/li&gt;
&lt;li&gt;Load cache BEFORE putting servers in production&lt;/li&gt;
&lt;li&gt;Avoids the "warm-up" period&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Measured result:&lt;/strong&gt; network traffic reduced from 45 GB/s to 100 MB/s&lt;/p&gt;
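A generic version of the idea is easy to sketch. Netflix restores from EBS snapshots; this hypothetical warm_cache helper simply reads hot keys from the database before the node accepts traffic:

```python
def warm_cache(cache, database, popular_keys, ttl=3600):
    """Pre-load known-hot keys so the first real requests are cache hits."""
    for key in popular_keys:
        value = database.query(key)
        cache.set(key, value, ttl=ttl)
    # Only after this loop finishes should the node be added to the load balancer.
```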


&lt;h3&gt;
  
  
  3. Geographic Consistency
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt;&lt;br&gt;
How to synchronize caches across multiple continents?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Adopted solutions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Eventual consistency by default (few seconds delay acceptable)&lt;/li&gt;
&lt;li&gt;Optional strong consistency for critical data&lt;/li&gt;
&lt;li&gt;Asynchronous replication between regions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spotify: EU ↔ NA replication&lt;/li&gt;
&lt;li&gt;Netflix: 4 global regions&lt;/li&gt;
&lt;li&gt;Facebook: global datacenters with synchronization&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  The Invalidation Problem
&lt;/h2&gt;

&lt;p&gt;As Phil Karlton famously said:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"There are only 2 hard problems in computer science: cache invalidation and naming things."&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  The 4 Invalidation Strategies
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. TTL (Time-To-Live)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product:123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Auto-expires
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Simple, predictable&lt;/li&gt;
&lt;li&gt;May serve stale data&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;2. Manual Invalidation&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;update_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;new_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Explicit deletion
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Full control&lt;/li&gt;
&lt;li&gt;Risk of missing some keys&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;3. Event-Based&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# When an event occurs
&lt;/span&gt;&lt;span class="n"&gt;event_bus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_updated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Automatic, decoupled&lt;/li&gt;
&lt;li&gt;System complexity&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;4. Version Tagging&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:v&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# When updating, just change the version
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;No need to delete the old entry&lt;/li&gt;
&lt;li&gt;Uses more memory&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Getting Started Guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Decision Tree: Which Strategy Should You Choose?
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Are your data critical (banking, healthcare, user profiles)?
│
├─ YES → Is zero data loss required?
│   │
│   ├─ YES → Multi-region replication necessary?
│   │   │
│   │   ├─ YES → Write-Through + WAL (Netflix-style)
│   │   │         Example: Banking, Healthcare
│   │   │
│   │   └─ NO → Write-Through (synchronous cache + DB)
│   │             Example: E-commerce, B2B SaaS
│   │
│   └─ NO → Loss of a few seconds acceptable?
│       │
│       └─ YES → Write-Behind (asynchronous)
│                 Example: Analytics, metrics
│
└─ NO → Highly skewed popularity (a few items dominate traffic)?
    │
    ├─ YES → Cache-Aside + LFU
    │         Example: E-commerce (best-selling products)
    │
    └─ NO → Data with limited lifetime?
        │
        ├─ YES → Cache-Aside + TTL
        │         Example: Weather API, RSS feeds
        │
        └─ NO → Cache-Aside + LRU (universal default)
                  Example: Majority of web applications
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Concrete use cases by company size:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Users&lt;/th&gt;
&lt;th&gt;Recommended Stack&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Startup&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&amp;lt; 100K&lt;/td&gt;
&lt;td&gt;Cache-Aside + Redis + TTL&lt;/td&gt;
&lt;td&gt;Blog, MVP, early-stage SaaS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scale-up&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100K-1M&lt;/td&gt;
&lt;td&gt;Cache-Aside + Redis Cluster + LRU&lt;/td&gt;
&lt;td&gt;E-commerce, growth SaaS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Enterprise&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1M-10M&lt;/td&gt;
&lt;td&gt;Write-Through + Multi-region&lt;/td&gt;
&lt;td&gt;Fintech, Healthcare&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hyper-scale&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10M+&lt;/td&gt;
&lt;td&gt;Write-Through + WAL + Flash Cache&lt;/td&gt;
&lt;td&gt;Netflix, Facebook&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Simple rule:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don't know what to choose? → Start with &lt;strong&gt;Cache-Aside + TTL + LRU&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;This is what 80% of web applications use successfully&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  To Start: Cache-Aside + TTL
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Why this choice?&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It's the most used pattern in the industry&lt;/li&gt;
&lt;li&gt;Used by Netflix, Spotify, Twitter, and most startups&lt;/li&gt;
&lt;li&gt;Easy to understand and implement&lt;/li&gt;
&lt;li&gt;Works for the vast majority of use cases&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Universal starting pattern:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Check cache
&lt;/span&gt;    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;  &lt;span class="c1"&gt;# Cache HIT
&lt;/span&gt;
    &lt;span class="c1"&gt;# 2. Cache MISS → go to DB
&lt;/span&gt;    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Store in cache
&lt;/span&gt;    &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 5 minutes
&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Progressive Evolution: The Maturity Curve
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Phase 1: Early Days (1-100K users)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple cache: Redis or Memcached&lt;/li&gt;
&lt;li&gt;Pattern: Cache-Aside + TTL&lt;/li&gt;
&lt;li&gt;Infrastructure: 1-2 cache servers&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Phase 2: Growth (100K-1M users)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Distributed cache (Redis/Memcached cluster)&lt;/li&gt;
&lt;li&gt;Monitoring: hit rate, latency&lt;/li&gt;
&lt;li&gt;Add cache warming for popular data&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Phase 3: Scale (1M-10M users)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-tier architecture (memory + distributed)&lt;/li&gt;
&lt;li&gt;Geographic replication&lt;/li&gt;
&lt;li&gt;Anti-thundering herd system&lt;/li&gt;
&lt;li&gt;Event-based invalidation&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Phase 4: Hyper-scale (10M+ users)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flash cache (SSD)&lt;/li&gt;
&lt;li&gt;Sophisticated admission algorithms&lt;/li&gt;
&lt;li&gt;Global replication&lt;/li&gt;
&lt;li&gt;Strong consistency for critical data&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Essential Metrics
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Hit Rate
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hit Rate = (Cache Hits / Total Requests) × 100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Targets:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Excellent: &amp;gt;95%&lt;/li&gt;
&lt;li&gt;Good: 90-95%&lt;/li&gt;
&lt;li&gt;Needs improvement: &amp;lt;90%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Hit rate measured at Facebook:&lt;/strong&gt; 96.4%&lt;/p&gt;
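The formula is trivial to turn into code, which is handy for dashboards. As a sanity check, 964 hits out of 1,000 requests gives the 96.4% figure measured at Facebook:

```python
def hit_rate(hits, total_requests):
    """Hit rate as a percentage: (cache hits / total requests) * 100."""
    if total_requests == 0:
        return 0.0  # no traffic yet: avoid division by zero
    return hits / total_requests * 100
```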




&lt;h3&gt;
  
  
  2. Latency (P50, P95, P99)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;P50: 50% of requests respond in less than X ms
P95: 95% of requests respond in less than Y ms
P99: 99% of requests respond in less than Z ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Typical targets:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cache hit: &amp;lt;1ms&lt;/li&gt;
&lt;li&gt;Cache miss: &amp;lt;50ms (including DB)&lt;/li&gt;
&lt;/ul&gt;
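For modest sample sizes these percentiles can be computed directly with the nearest-rank method, sketched below (production systems usually use streaming estimators such as t-digest or HDR histograms instead of sorting every sample):

```python
import math

def percentile(latencies_ms, p):
    """Nearest-rank percentile: the latency under which p% of requests complete."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(len(ordered) * p / 100)  # nearest-rank position (1-based)
    return ordered[max(rank - 1, 0)]
```

For 100 samples of 1 ms to 100 ms, percentile(data, 95) is 95 ms.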




&lt;h3&gt;
  
  
  3. Eviction Rate
&lt;/h3&gt;

&lt;p&gt;How often is data evicted from the cache because it ran out of space?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If too high:&lt;/strong&gt; increase the cache size or tune your TTLs&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 1 Conclusion
&lt;/h2&gt;

&lt;p&gt;In this first part, we covered the fundamental caching strategies used by all web giants.&lt;/p&gt;

&lt;p&gt;You now understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The 6 basic strategies (TTL, LRU, LFU, Write-Through/Write-Behind, Cache-Aside, Read-Through)&lt;/li&gt;
&lt;li&gt;How Netflix, Facebook, Google, and Twitter use caching&lt;/li&gt;
&lt;li&gt;Real-world challenges (thundering herd, cache warming, consistency)&lt;/li&gt;
&lt;li&gt;Where to start for your own project&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;In Part 2: Advanced Architectures&lt;/strong&gt;, we'll discover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Netflix's Write-Ahead Log (WAL) in detail&lt;/li&gt;
&lt;li&gt;How to survive database corruption with zero downtime&lt;/li&gt;
&lt;li&gt;Multi-region replication&lt;/li&gt;
&lt;li&gt;Tradeoffs and lessons learned at enterprise scale&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Up next: Part 2 - From Write-Behind to Write-Ahead Log: How Netflix Guarantees Zero Data Loss&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>caching</category>
      <category>redis</category>
      <category>systemdesign</category>
      <category>architecture</category>
    </item>
    <item>
      <title>The Reconquest: How Developers Are Taking Back Control in 2026</title>
      <dc:creator>SpicyCode</dc:creator>
      <pubDate>Wed, 04 Feb 2026 11:50:52 +0000</pubDate>
      <link>https://dev.to/isspicycode/la-reconquete-comment-les-developpeurs-reprennent-le-controle-en-2026-5a7b</link>
      <guid>https://dev.to/isspicycode/la-reconquete-comment-les-developpeurs-reprennent-le-controle-en-2026-5a7b</guid>
      <description>&lt;p&gt;Le marché de l'emploi tech est en crise. Des milliers de développeurs qualifiés, diplômés, expérimentés, se retrouvent au chômage pendant des mois. Certains envoient 800 candidatures pour 10 entretiens. D'autres, après 15 ans de carrière, découvrent qu'ils ne valent plus rien aux yeux des recruteurs.&lt;/p&gt;

&lt;p&gt;Mais pendant que certains attendent qu'une entreprise veuille bien d'eux, d'autres ont compris quelque chose de fondamental : &lt;strong&gt;le jeu a changé, et les règles aussi&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The myth of stability
&lt;/h2&gt;

&lt;p&gt;For years, we were all sold the same dream: land a permanent contract, climb the ladder, secure your retirement. A good developer can always find work. If you're struggling, you're just not good enough.&lt;/p&gt;

&lt;p&gt;That story is dead.&lt;/p&gt;

&lt;p&gt;Seniors with 20 years of experience are living in trailers after being cast aside. Juniors with prestigious degrees apply for a year with nothing to show for it. The market no longer rewards skill or experience the way it used to.&lt;/p&gt;

&lt;p&gt;The stability we were promised no longer exists. Mass layoffs, hiring freezes, AI replacing juniors: all of it is real. Waiting passively for a company to pick you means accepting the loss of control over your own life.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real security
&lt;/h2&gt;

&lt;p&gt;Real security in 2026 is not a contract. It is your ability to create value on your own.&lt;/p&gt;

&lt;p&gt;When you can solve real problems for people who pay, you no longer need a company to validate you. You stop begging for interviews. You start proposing solutions.&lt;/p&gt;

&lt;p&gt;This is not romantic entrepreneurship. It is pragmatic: building a credible alternative while others send out their 500th application.&lt;/p&gt;




&lt;h2&gt;
  
  
  The three pillars of autonomy
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Solve invisible problems
&lt;/h3&gt;

&lt;p&gt;Developers love building tools for other developers. It's comfortable: we understand the problem, we speak the same language.&lt;/p&gt;

&lt;p&gt;But nobody pays.&lt;/p&gt;

&lt;p&gt;The real problems, the ones that generate revenue, are elsewhere: in the small businesses still running everything on Excel, in the tradespeople losing hours to administrative tasks, in the regulated professions drowning in paperwork.&lt;/p&gt;

&lt;p&gt;These people aren't looking for elegant solutions. They're looking for someone who understands their pain and makes it disappear, no matter how.&lt;/p&gt;




&lt;h3&gt;
  
  
  Sell before you build
&lt;/h3&gt;

&lt;p&gt;The developer's reflex: "I'll code something brilliant, and then I'll find users."&lt;/p&gt;

&lt;p&gt;It works the other way around.&lt;/p&gt;

&lt;p&gt;Find people who are hurting. Ask them to describe their problem. Offer to solve it. If they agree to pay before you've written a single line of code, you've got something real.&lt;/p&gt;

&lt;p&gt;If nobody wants to pay, you've just saved yourself three months of your life.&lt;/p&gt;




&lt;h3&gt;
  
  
  Build in public
&lt;/h3&gt;

&lt;p&gt;Distribution kills more projects than bad products do. You can have the best solution in the world; if nobody knows it exists, you've failed.&lt;/p&gt;

&lt;p&gt;Document what you build. Share your struggles. Show your failures. Explain your choices. It attracts the right people: the ones with the same problems, the ones willing to pay so they don't have to do it themselves.&lt;/p&gt;

&lt;p&gt;This isn't personal branding. It's trust-building.&lt;/p&gt;




&lt;h2&gt;
  
  
  The myth of the big leap
&lt;/h2&gt;

&lt;p&gt;We fantasize about the developer who quits everything, launches a SaaS, and becomes a millionaire. Those stories exist, but they represent a tiny minority.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The real trajectory looks like this:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You keep your unemployment benefits or take a day job to pay the bills. In the evenings, you solve problems for one or two people. You invoice them. It's ugly, it's small, but it's real.&lt;/p&gt;

&lt;p&gt;Then you find three more people with the same problem. You refine your solution. You charge a little more. You build a reputation in a micro-niche nobody has heard of.&lt;/p&gt;

&lt;p&gt;Six months later, you have a steady side income. Nothing spectacular, but enough to breathe. To negotiate. To turn down mediocre offers.&lt;/p&gt;

&lt;p&gt;A year later, that side income exceeds your old salary. And then, you get to choose.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That is freedom.&lt;/strong&gt; Not the Instagram fantasy of the digital nomad, but the concrete ability to say no.&lt;/p&gt;




&lt;h2&gt;
  
  
  The year of transitions
&lt;/h2&gt;

&lt;p&gt;2026 is not the year of the great AI replacement. It's the year the line gets drawn between those who endure and those who build.&lt;/p&gt;

&lt;p&gt;AI isn't replacing developers. It amplifies those who know what to do with it. A developer with modern tools can ship in two weeks what used to take three months. That compression of time is a massive opportunity for those who exploit it.&lt;/p&gt;

&lt;p&gt;Remote work is now accepted. You can work for clients anywhere. Geographic barriers are falling.&lt;/p&gt;

&lt;p&gt;No-code tools are exploding, but companies need someone to connect them to their actual business. The gray zone between "off-the-shelf tool" and "development from scratch" is fertile territory.&lt;/p&gt;

&lt;p&gt;Everything is aligned for those who dare.&lt;/p&gt;




&lt;h2&gt;
  
  
  The price of inaction
&lt;/h2&gt;

&lt;p&gt;Every month spent sending applications with nothing to show for it is a month of lost momentum. A month you could have spent building something, learning to sell, validating an idea, failing and starting over.&lt;/p&gt;

&lt;p&gt;The traditional job market won't go back to what it was. Companies have figured out they can do more with less. Juniors are being replaced by AI, and seniors by cheaper contractors.&lt;/p&gt;

&lt;p&gt;Waiting for things to improve is betting against the evidence.&lt;/p&gt;




&lt;p&gt;The market isn't going to fix itself. But you can build yourself an alternative.&lt;/p&gt;

&lt;p&gt;And in a world where experienced seniors end up in trailers after 800 applications, having an alternative is no longer optional.&lt;/p&gt;

&lt;p&gt;It's vital.&lt;/p&gt;




&lt;p&gt;2026 belongs to those who build while everyone else waits.&lt;/p&gt;

</description>
      <category>career</category>
      <category>unemployment</category>
      <category>developers</category>
      <category>indie</category>
    </item>
  </channel>
</rss>
