Why You Should Stop Reading Tutorials

Deepak Doriya — Wed, 08 Jul 2026 17:14:26 +0000

What I Learned Auditing a World-Class Python Library (And Why You Should Stop Reading Tutorials)

When you're learning Python, most courses follow the same path: variables, lists, basic loops, object-oriented programming. But there's a massive chasm between a tutorial script and the code that runs production-grade libraries.

To bridge that gap, I recently audited the source of HTTPX — a modern, fully typed HTTP client for Python with 100% test coverage.

Here are 4 advanced patterns I found that standard tutorials never teach you — and how they hold up in real production code.

1. The Naked Asterisk `*`: Enforcing Clean API Calls

When a function has 10+ optional configurations, it's easy to pass arguments in the wrong order. HTTPX prevents this using a naked asterisk * in its signatures:

def request(self, method: str, url: str, *, headers: dict | None = None):
    ...

The * acts as a barrier. Every argument placed after it can no longer be passed positionally — the caller is forced to write the parameter name explicitly.

# ❌ Raises TypeError:
client.request("GET", "https://google.com", {"User-Agent": "Bot"})

# ✅ Required syntax (forces clarity):
client.request("GET", "https://google.com", headers={"User-Agent": "Bot"})

Takeaway: Use * to force callers to write explicit, self-documenting code.
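To see the barrier in action without installing anything, here is a minimal runnable sketch. The Client class below is a hypothetical stand-in that only records its arguments, not HTTPX's real client, and the timeout parameter is added purely for illustration.

```python
from __future__ import annotations


class Client:
    """Hypothetical stand-in for an HTTP client; it only echoes its arguments."""

    def request(
        self,
        method: str,
        url: str,
        *,  # everything after this marker is keyword-only
        headers: dict | None = None,
        timeout: float | None = None,
    ) -> dict:
        return {
            "method": method,
            "url": url,
            "headers": headers or {},
            "timeout": timeout,
        }


client = Client()

# Keyword form is accepted.
ok = client.request("GET", "https://example.com", headers={"User-Agent": "Bot"})

# Positional form is rejected before the function body ever runs.
rejected = False
try:
    client.request("GET", "https://example.com", {"User-Agent": "Bot"})
except TypeError:
    rejected = True
```

A side benefit of this design: because callers can never rely on the position of headers or timeout, the library is free to add or reorder keyword-only options later without breaking existing call sites.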

2. Sentinels: Solving the "Default Value" Dilemma

We're taught to write timeout = None when a parameter is optional. But what happens if None is actually a valid choice for the user to make?

Scenario A: client.get(url) — the user wants the client's default 5-second timeout.
Scenario B: client.get(url, timeout=None) — the user explicitly wants no timeout (infinite wait).

If your function signature is timeout=None, you can't tell these two cases apart — omitted and "explicitly None" look identical.

HTTPX solves this with a sentinel object called USE_CLIENT_DEFAULT:

# Define the sentinel (simplified: HTTPX itself uses an instance of a small UseClientDefault class)
USE_CLIENT_DEFAULT = object()

def get(self, url, timeout=USE_CLIENT_DEFAULT):
    if timeout is USE_CLIENT_DEFAULT:
        timeout = self.default_timeout  # falls back to client default (e.g. 5s)
    # if the user passed None explicitly, we skip the if-block and timeout stays None

Takeaway: Use sentinels when you need to distinguish "argument omitted" from "argument explicitly set to None."
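Here is a self-contained sketch of the same idea that you can run as-is. The Client class, its default_timeout attribute, and the returned dict are assumptions made for the demo. The sentinel is an instance of a tiny dedicated class rather than a bare object(), which gives it a readable repr in signatures and tracebacks.

```python
class _UseClientDefault:
    """Hypothetical sentinel type: its only job is to be a unique object."""

    def __repr__(self) -> str:
        return "USE_CLIENT_DEFAULT"


USE_CLIENT_DEFAULT = _UseClientDefault()


class Client:
    def __init__(self, default_timeout=5.0):
        self.default_timeout = default_timeout

    def get(self, url, timeout=USE_CLIENT_DEFAULT):
        # Identity check: only the sentinel itself means "argument omitted".
        if timeout is USE_CLIENT_DEFAULT:
            timeout = self.default_timeout
        # An explicit None survives untouched and means "no timeout".
        return {"url": url, "timeout": timeout}


client = Client()
omitted = client.get("https://example.com")
explicit_none = client.get("https://example.com", timeout=None)
custom = client.get("https://example.com", timeout=1.5)
```

The comparison uses `is`, not `==`, because the whole point of a sentinel is that no value a caller could reasonably pass will ever be the same object.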

3. Abstract Base Classes (ABCs): Creating Reliable Custom Types

If you want a custom dictionary — like HTTPX's Headers class, which needs case-insensitive keys — your first instinct might be to inherit from dict:

class Headers(dict):
    ...

This is a trap. Built-in types written in C often bypass your overrides. If you override __setitem__ to lowercase keys, methods like dict.update() will ignore your override and write raw keys anyway.

Instead, HTTPX inherits from MutableMapping, an Abstract Base Class:

from collections.abc import MutableMapping

class Headers(MutableMapping):
    ...

By agreeing to this "contract," you only need to implement five core methods (__getitem__, __setitem__, __delitem__, __iter__, __len__). MutableMapping then supplies the other dict-like methods (.get(), .pop(), .update(), and more) as mixins built on top of those five, so they always route through your custom logic.

Takeaway: Avoid subclassing dict or list when you need to change how items are stored or looked up. Build on collections.abc instead.
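The difference is easy to reproduce. The sketch below compares a dict subclass with a MutableMapping subclass that both try to lowercase keys. LowerDict and LowerMapping are hypothetical names for this demo. HTTPX's real Headers class does considerably more, so treat this as an illustration of the pattern only.

```python
from collections.abc import MutableMapping


class LowerDict(dict):
    """The trap: dict.update() never calls this override."""

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)


class LowerMapping(MutableMapping):
    """Implements the five required methods; the ABC supplies the rest."""

    def __init__(self, *args, **kwargs):
        self._store = {}
        self.update(*args, **kwargs)  # mixin update() goes through __setitem__

    def __getitem__(self, key):
        return self._store[key.lower()]

    def __setitem__(self, key, value):
        self._store[key.lower()] = value

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


broken = LowerDict()
broken.update({"Content-Type": "text/html"})  # raw key slips through

safe = LowerMapping()
safe.update({"Content-Type": "text/html"})  # key is lowercased on the way in
```

Note that .get() and .update() on LowerMapping were never written by hand. They come from MutableMapping and are implemented in terms of __getitem__ and __setitem__, which is why the lowercasing applies everywhere.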

4. Mypy Strict Mode: Type Safety at Scale

Python is dynamically typed, but large-scale libraries can't afford runtime type errors. HTTPX enforces safety by running mypy in strict mode. In their pyproject.toml:

disallow_untyped_defs = true: every function must be fully type-annotated.
disallow_incomplete_defs = true: no partially type-hinted signatures.
check_untyped_defs = true: mypy type-checks the bodies of unannotated functions instead of skipping them.

Takeaway: If you're publishing a library, wire up mypy strict mode from day one — retrofitting types onto an untyped codebase later is far more painful.
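To make those flags concrete, here is a small sketch with three hypothetical functions. The comments describe what I would expect mypy to report under the settings above. I have not pasted real mypy output here, so run it yourself to see the exact messages.

```python
from typing import Optional


# No annotations at all: expected to be rejected by disallow_untyped_defs.
def parse_port(value):
    return int(value)


# Partially annotated: expected to be rejected by disallow_incomplete_defs
# (and by disallow_untyped_defs, which also covers incomplete signatures).
def parse_host(value: str, default=None):
    return value or default


# Fully annotated: this is the shape strict settings push every function toward.
def parse_timeout(value: Optional[str], default: float = 5.0) -> float:
    if value is None:
        return default
    return float(value)
```

All three functions run fine at runtime, which is exactly the point: the type checker catches the gaps before a user of your library does.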

Conclusion: Stop Reading Tutorials, Start Auditing Code

Instead of reading a standard Python textbook, I set myself a codebase scavenger hunt on HTTPX: hunting down specific classes, tracing parameters, and figuring out how their types are structured. It forced me to read actual production code, and in 30 minutes it taught me more about intermediate Python design than any tutorial course could.

If you want to level up, pick a library, set up a few questions to find, and start digging.

Source: httpx/_models.py on GitHub.

How Netflix Knows What You Want to Watch: Matrix Factorization & Architecture

Deepak Doriya — Fri, 03 Jul 2026 13:24:52 +0000


Have you ever finished a binge-worthy series on Netflix, only for the algorithm to instantly recommend the perfect follow-up show? It feels like magic, but under the hood, it’s one of the most sophisticated Machine Learning systems in the world.

As I dive deeper into Data Science and Machine Learning, I recently studied the architecture behind Netflix's recommendation engine. It’s not just a simple "if/then" script—it requires complex linear algebra and a highly distributed microservices architecture. Here is a technical breakdown of how it actually works.

1. The Math: Matrix Factorization

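At the core of many recommendation engines is a technique called Matrix Factorization. In recommender-system writing it is often loosely called "SVD," although the factors are usually learned by gradient descent or alternating least squares on the observed ratings only, rather than by a classical Singular Value Decomposition, which needs a complete matrix.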

Imagine a massive grid (a matrix) where the rows are millions of Netflix users and the columns are thousands of movies. The cells contain ratings or engagement scores. Because most users have only watched a tiny fraction of the library, this matrix is incredibly sparse (mostly empty).

Matrix Factorization solves this by breaking that giant matrix down into two smaller, dense matrices:

A User Matrix representing latent user preferences.
An Item Matrix representing latent movie traits.

These "latent traits" are hidden features the algorithm discovers on its own. For example, a trait might heavily correlate with "quirky indie comedies starring Steve Carell" without anyone ever explicitly programming that rule. By calculating the dot product of a user's vector and a movie's vector, the system can estimate how much that user will enjoy a movie they’ve never seen.
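To make the dot-product idea tangible, here is a toy sketch in pure Python. The ratings, the number of latent factors, and the hyperparameters are all made up for illustration, and the training loop is plain stochastic gradient descent over the known ratings, a common textbook approach. Nothing here reflects Netflix's actual models or data.

```python
import random

# Made-up ratings: (user index, item index, rating). Missing pairs are unknown.
RATINGS = [
    (0, 0, 5.0), (0, 1, 4.0),
    (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 1.0), (2, 2, 5.0),
    (3, 0, 1.0), (3, 2, 4.0),
]
N_USERS, N_ITEMS, N_FACTORS = 4, 3, 2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean_squared_error(users, items):
    """Average squared error over the observed ratings only."""
    total = sum((r - dot(users[u], items[i])) ** 2 for u, i, r in RATINGS)
    return total / len(RATINGS)


def train(epochs=2000, lr=0.01, reg=0.02, seed=0):
    rng = random.Random(seed)
    # Small random starting vectors for every user and every item.
    users = [[rng.uniform(0.0, 0.5) for _ in range(N_FACTORS)] for _ in range(N_USERS)]
    items = [[rng.uniform(0.0, 0.5) for _ in range(N_FACTORS)] for _ in range(N_ITEMS)]
    initial_error = mean_squared_error(users, items)
    for _ in range(epochs):
        for u, i, r in RATINGS:
            err = r - dot(users[u], items[i])
            for k in range(N_FACTORS):
                pu, qi = users[u][k], items[i][k]
                # Nudge both vectors to shrink the error, with light regularization.
                users[u][k] += lr * (err * qi - reg * pu)
                items[i][k] += lr * (err * pu - reg * qi)
    return users, items, initial_error


users, items, initial_error = train()
final_error = mean_squared_error(users, items)

# The payoff: a score for a pair that never appeared in RATINGS.
predicted = dot(users[0], items[2])
```

The sparse grid never has to be stored or filled in. Only the two small sets of vectors are kept, and any missing cell can be scored on demand with a single dot product.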

2. The Architecture: Offline, Nearline, and Online Computation

Matrix Factorization is computationally expensive. You can't recalculate the entire matrix for 250+ million users every time someone clicks "Play." To solve this, Netflix splits its machine learning computation into three distinct tiers:

Offline Computation: This is the heavy lifting. Massive batch jobs run on Apache Spark or Hadoop clusters overnight or weekly. This is where models are trained on historical data and where the heavy matrix factorization occurs. It’s highly accurate but very slow.
Nearline Computation: This tier runs asynchronously. It listens for events (like you finishing an episode) and quickly recalculates localized recommendations or updates your profile in the background. It provides a sweet spot between responsiveness and deep analysis, usually executing within seconds or minutes.
Online Computation: This is the real-time layer. When your app loads, this synchronous layer must respond within milliseconds. It takes the pre-computed models from the offline layer, updates them instantly with real-time context (like what device you are on or the current time of day), and serves the final ranked list to your screen.
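The division of labor between the three tiers can be sketched in a few lines. Everything below is a hypothetical toy: the show titles, the scores, the "demote a finished title" rule, and the mobile boost are assumptions I made up to show who does what, not a description of Netflix's services.

```python
from collections import defaultdict

# Offline tier (assumed): a batch job has already written base scores per user.
OFFLINE_SCORES = {
    "user_1": {"Show A": 0.9, "Show B": 0.7, "Show C": 0.4},
}

# Hypothetical metadata used as request-time context.
SHORT_EPISODE_TITLES = {"Show C"}

# Nearline tier (assumed): event handlers keep a small per-user overlay fresh.
nearline_adjustments = defaultdict(dict)


def on_episode_finished(user_id, title):
    """Nearline: react to an event shortly after it happens."""
    nearline_adjustments[user_id][title] = -1.0  # demote what was just watched


def rank_for_request(user_id, device):
    """Online: cheap lookups plus request context, no model training."""
    scores = dict(OFFLINE_SCORES.get(user_id, {}))
    for title, delta in nearline_adjustments[user_id].items():
        if title in scores:
            scores[title] += delta
    if device == "mobile":
        for title in SHORT_EPISODE_TITLES & scores.keys():
            scores[title] += 0.5  # made-up context boost
    return sorted(scores, key=scores.get, reverse=True)


before = rank_for_request("user_1", "tv")
on_episode_finished("user_1", "Show A")
after_tv = rank_for_request("user_1", "tv")
after_mobile = rank_for_request("user_1", "mobile")
```

The online function only reads precomputed numbers and applies a couple of additions, which is what lets it answer quickly, while the expensive work stays in the batch job that produced OFFLINE_SCORES.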

3. Personalizing the Artwork

Beyond ranking the shows, Netflix also relies heavily on Contextual Bandits to personalize the thumbnails.

If the system recommends Good Will Hunting to you, the thumbnail image you see will depend on your watch history:

If your user vector leans toward romance, the thumbnail might feature Matt Damon and Minnie Driver about to kiss.
If your user vector leans toward comedy, the thumbnail might feature Robin Williams laughing.

By optimizing the artwork dynamically, Netflix aims to raise its Click-Through Rate (CTR): the same title can earn more clicks simply by showing each viewer the image most likely to appeal to them.
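Here is a minimal sketch of the idea, using the simplest contextual bandit I know of: a tabular epsilon-greedy policy that keeps a separate click-rate estimate for each (context, image) pair. The context labels, image names, and the simulated click rule are invented for the demo. Production systems use richer features and models, and I am not claiming this is what Netflix runs.

```python
import random


class ArtworkBandit:
    """Epsilon-greedy bandit with separate click estimates per context."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = {}   # (context, arm) -> times shown
        self.clicks = {}  # (context, arm) -> clicks observed

    def _rate(self, context, arm):
        shown = self.shows.get((context, arm), 0)
        return self.clicks.get((context, arm), 0) / shown if shown else 0.0

    def best_arm(self, context):
        """Pure exploitation: the image with the highest observed click rate."""
        return max(self.arms, key=lambda arm: self._rate(context, arm))

    def select(self, context):
        untried = [a for a in self.arms if (context, a) not in self.shows]
        if untried:
            return untried[0]  # show every image at least once per context
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # explore
        return self.best_arm(context)  # exploit

    def update(self, context, arm, clicked):
        key = (context, arm)
        self.shows[key] = self.shows.get(key, 0) + 1
        self.clicks[key] = self.clicks.get(key, 0) + int(clicked)


# Made-up ground truth, used only to simulate clicks.
PREFERRED = {"romance_fan": "couple_image", "comedy_fan": "comedian_image"}

bandit = ArtworkBandit(["couple_image", "comedian_image"])
for _ in range(50):
    for context in PREFERRED:
        arm = bandit.select(context)
        bandit.update(context, arm, clicked=(arm == PREFERRED[context]))
```

The occasional random pick is what keeps the system from locking onto an early favorite: it keeps testing the other image in case tastes, or the available artwork, change.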

Conclusion

At the end of the day, Netflix’s ultimate metric isn't just prediction accuracy—it is user retention. Every model they deploy, across every tier of their architecture, is designed to keep you engaged.

As I continue my journey into Machine Learning, dissecting these industry-scale systems shows just how powerful core math concepts become when paired with scalable engineering!

What is the most accurate recommendation an algorithm has ever given you? Let me know in the comments!

DEV Community: Deepak Doriya
