<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Daniel Popoola</title>
    <description>The latest articles on DEV Community by Daniel Popoola (@lisan_al_gaib).</description>
    <link>https://dev.to/lisan_al_gaib</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3278167%2F8337ed8b-d96c-4736-82d5-c44818266123.jpg</url>
      <title>DEV Community: Daniel Popoola</title>
      <link>https://dev.to/lisan_al_gaib</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lisan_al_gaib"/>
    <language>en</language>
    <item>
      <title>ML in Warehouse Operations - How I Built a Production ML System to Automate Fashion Return Classification</title>
      <dc:creator>Daniel Popoola</dc:creator>
      <pubDate>Mon, 16 Mar 2026 06:50:31 +0000</pubDate>
      <link>https://dev.to/lisan_al_gaib/ml-in-warehouse-operations-how-i-built-a-production-ml-system-to-automate-fashion-return-54gf</link>
      <guid>https://dev.to/lisan_al_gaib/ml-in-warehouse-operations-how-i-built-a-production-ml-system-to-automate-fashion-return-54gf</guid>
      <description>&lt;p&gt;&lt;em&gt;From a warehouse problem I read about to a working MLOps pipeline&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;There's a stat that stuck with me when I started this project: &lt;strong&gt;online fashion retailers see return rates of up to 30%.&lt;/strong&gt; That's nearly 1 in 3 items coming back.&lt;/p&gt;

&lt;p&gt;Behind that number is a real operational headache. Every returned item — a pair of casual shoes, a handbag, a watch — has to be physically inspected, categorized, and processed. Is it a shirt or a top? Does it go back on the shelf or get refurbished? That decision, made by a human staring at an item after a long shift, happens &lt;strong&gt;hundreds of times a day&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I wanted to solve that with machine learning. Not just train a model and call it a day — but build something that could actually run in the background of a warehouse operation: automated, reliable, and observable.&lt;/p&gt;

&lt;p&gt;That project is &lt;strong&gt;RefundClassifier&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem with "Just Training a Model"
&lt;/h2&gt;

&lt;p&gt;When I started thinking about this, my first instinct was the same as any ML student's: find a dataset, train a classifier, hit 90%+ accuracy, done.&lt;/p&gt;

&lt;p&gt;But accuracy on a test set doesn't keep a warehouse running. The real questions are harder:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What happens when the batch job crashes halfway through 400 images at 2 AM?&lt;/li&gt;
&lt;li&gt;How do you update the model without taking the whole system down?&lt;/li&gt;
&lt;li&gt;How do you know if predictions are quietly degrading weeks after deployment?&lt;/li&gt;
&lt;li&gt;Who reviews the results in the morning — and in what format?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are MLOps problems. And they're the gap between a notebook demo and a system someone can actually trust.&lt;/p&gt;

&lt;p&gt;RefundClassifier is my attempt to close that gap.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the System Does
&lt;/h2&gt;

&lt;p&gt;In plain terms: every night at 2 AM, the system picks up all the return images uploaded during the business day, runs them through an ML model, and writes out a results file that warehouse staff can review in the morning.&lt;/p&gt;

&lt;p&gt;The five categories it classifies are: &lt;strong&gt;Casual Shoes, Handbags, Shirts, Tops, and Watches&lt;/strong&gt; — trained on 2,500 product images with &lt;strong&gt;96.53% accuracy&lt;/strong&gt; on the test set.&lt;/p&gt;

&lt;p&gt;But the interesting parts aren't the model. They're the infrastructure around it.&lt;/p&gt;




&lt;h2&gt;
  
  
  How It's Built
&lt;/h2&gt;

&lt;p&gt;The architecture has three main layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The Model Service (FastAPI)&lt;/strong&gt;&lt;br&gt;
A lightweight REST API that loads the EfficientNet-B0 model from an MLflow registry and serves &lt;code&gt;/predict&lt;/code&gt; endpoints. It's stateless — it doesn't know or care about batches. It just classifies what it's given.&lt;/p&gt;

&lt;p&gt;Separating the model into its own service was a deliberate choice. It means I can update, restart, or swap the model without touching the batch processing logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The Batch Orchestrator (Python)&lt;/strong&gt;&lt;br&gt;
This is the core of the system. It runs on a cron schedule, scans the input directory for unprocessed images, calls the Model Service in batches of 10, writes results to a CSV, and pushes metrics to Prometheus.&lt;/p&gt;

&lt;p&gt;The most important feature here: &lt;strong&gt;checkpoint recovery&lt;/strong&gt;. If the job crashes at image 287 of 400, it doesn't restart from zero. It reads the checkpoint, skips what's already done, and continues. In a production warehouse context, reprocessing already-classified items creates data integrity issues. This prevents that.&lt;/p&gt;
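&lt;p&gt;A minimal sketch of the checkpoint idea (the file name, function names, and batch size are illustrative, not the project's actual code):&lt;/p&gt;

```python
import json
from pathlib import Path

CHECKPOINT = Path("checkpoint.json")  # illustrative path, not the project's real layout

def load_done() -> set:
    """Return the set of image names already processed in a previous run."""
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()

def save_done(done: set) -> None:
    """Persist progress after every batch so a crash loses at most one batch."""
    CHECKPOINT.write_text(json.dumps(sorted(done)))

def run_batch(images, classify, batch_size=10):
    done = load_done()
    pending = [img for img in images if img not in done]  # skip what's finished
    results = {}
    for i in range(0, len(pending), batch_size):
        batch = pending[i:i + batch_size]
        for img in batch:
            results[img] = classify(img)
        done.update(batch)
        save_done(done)  # checkpoint after each batch
    return results
```

&lt;p&gt;The key property: a crash between checkpoints loses at most one batch of work, and a re-run after a crash classifies only what's left.&lt;/p&gt;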

&lt;p&gt;&lt;strong&gt;3. Monitoring (Prometheus + Grafana)&lt;/strong&gt;&lt;br&gt;
Every batch run pushes metrics — inference latency, batch success rate, class distribution — to a Prometheus Pushgateway. Grafana dashboards surface those metrics visually. If the model starts misclassifying at unusual rates, or a batch takes 3x longer than normal, it shows up.&lt;/p&gt;
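&lt;p&gt;To make "misclassifying at unusual rates" concrete, here's a standard-library sketch of the kind of check a Grafana alert on class distribution encodes; the baseline shares and threshold are made-up numbers, not the project's real metrics:&lt;/p&gt;

```python
from collections import Counter

# Historical share of each class in nightly batches (illustrative numbers only).
BASELINE = {"Casual Shoes": 0.25, "Handbags": 0.20, "Shirts": 0.20,
            "Tops": 0.20, "Watches": 0.15}

def drift_alerts(predictions, threshold=0.15):
    """Return classes whose share deviates from baseline by more than `threshold`.

    This mirrors what an alert on a class-distribution metric would catch:
    a sudden skew toward one label often means the inputs or the model
    changed, not the warehouse.
    """
    counts = Counter(predictions)
    total = len(predictions)
    alerts = []
    for label, expected in BASELINE.items():
        observed = counts.get(label, 0) / total if total else 0.0
        if abs(observed - expected) > threshold:
            alerts.append((label, round(observed, 2), expected))
    return alerts
```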

&lt;p&gt;This was the part I underestimated the most. Monitoring isn't a "nice to have." It's how you find out something is wrong before a human has to tell you.&lt;/p&gt;




&lt;h2&gt;
  
  
  Model Versioning with MLflow
&lt;/h2&gt;

&lt;p&gt;The model is registered in MLflow with a &lt;strong&gt;production alias&lt;/strong&gt; — a pointer that says "this is the version the Model Service should load." When I retrain with new data, I register the new version and promote it to production. The service picks it up on restart, no code changes needed.&lt;/p&gt;

&lt;p&gt;This is the simplest version of a deployment pipeline, but it enforces a useful discipline: the model is never just a file on disk. It has a version, experiment metadata, accuracy metrics attached to it, and a clear promotion path.&lt;/p&gt;




&lt;h2&gt;
  
  
  The UI
&lt;/h2&gt;

&lt;p&gt;There's also a Streamlit interface for manual use — useful for ad-hoc classification or demos. Staff can upload a batch of images, trigger classification, and see the results in a table without touching the command line.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Actually Learned
&lt;/h2&gt;

&lt;p&gt;Building this taught me a few things that no ML course covered:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Batch processing is underrated.&lt;/strong&gt; Most tutorials show real-time inference. But most real business operations don't need sub-second latency — they need reliable, scheduled, auditable processing. Batch is often the right answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 10% that isn't model accuracy is 90% of the work.&lt;/strong&gt; Getting to 96% accuracy took two days. Getting checkpoint recovery, metric pushing, model registry integration, and error handling right took the rest of the project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability is the difference between a deployed model and a trusted system.&lt;/strong&gt; A model running in the dark is not production. A model with dashboards, alerts, and traceable outputs is.&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/DanielPopoola/autorma" rel="noopener noreferrer"&gt;github.com/DanielPopoola/autorma&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Dataset: Fashion Product Images (Kaggle) — 2,500 images across 5 categories&lt;/li&gt;
&lt;li&gt;Stack: PyTorch · FastAPI · MLflow · Prometheus · Grafana · Streamlit · Docker&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This was my final year CS project. I'm currently looking for roles in backend engineering and ML engineering — feel free to connect.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>automation</category>
      <category>devops</category>
      <category>machinelearning</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Building a Payment Gateway That Doesn't Lie: How I Solved Distributed State Failures in Go</title>
      <dc:creator>Daniel Popoola</dc:creator>
      <pubDate>Fri, 20 Feb 2026 17:55:38 +0000</pubDate>
      <link>https://dev.to/lisan_al_gaib/i-built-a-production-grade-payment-gateway-in-go-heres-what-i-learned-about-distributed-systems-3882</link>
      <guid>https://dev.to/lisan_al_gaib/i-built-a-production-grade-payment-gateway-in-go-heres-what-i-learned-about-distributed-systems-3882</guid>
      <description>&lt;p&gt;Your server just charged a customer's card. The bank confirmed it — funds reserved, authorization ID returned. Then, a millisecond later, your server crashes.&lt;/p&gt;

&lt;p&gt;Your database never got the memo.&lt;/p&gt;

&lt;p&gt;Now your system thinks the payment failed. The order service re-routes the customer to a failure page, maybe even prompts them to retry. But the bank already has a hold on their money. The customer gets charged twice, or worse — their funds are locked in limbo with no order attached.&lt;/p&gt;

&lt;p&gt;This isn't a hypothetical. It's the fundamental challenge of payment processing in distributed systems, and it's deceptively easy to ignore until it happens in production. I built &lt;strong&gt;FicMart Payment Gateway&lt;/strong&gt; — a production-grade payment gateway in Go — specifically to confront this problem head-on. Here's how I thought through it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Enemy: Partial Failures
&lt;/h2&gt;

&lt;p&gt;Most engineers think about failures in binary terms. Either a request succeeds or it fails. But distributed systems introduce a third, nastier category: &lt;strong&gt;partial failures&lt;/strong&gt; — where some things succeed and others don't, with no clean way to tell which is which.&lt;/p&gt;

&lt;p&gt;In payment processing, this is especially dangerous because two systems are involved: your gateway and the bank. When you ask the bank to capture $50, the sequence looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Gateway calls bank: "Capture $50 for Auth #123"&lt;/li&gt;
&lt;li&gt;Bank processes it: "Done. Capture ID: #456"&lt;/li&gt;
&lt;li&gt;Gateway prepares to save &lt;code&gt;CAPTURED&lt;/code&gt; to the database&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Gateway crashes&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Database still says &lt;code&gt;AUTHORIZED&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The money has moved. But your system doesn't know it. And because you have no record of Capture #456, you have no way to reconcile without manual intervention.&lt;/p&gt;

&lt;p&gt;This is the problem I set out to solve. The solution came down to three interlocking patterns.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 1: Capture Intent Before Acting
&lt;/h2&gt;

&lt;p&gt;The core insight is simple: &lt;strong&gt;your database needs to know what you're &lt;em&gt;about&lt;/em&gt; to do, not just what you've done.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before the gateway makes any external bank call, it persists the payment in an intermediate state. For a capture, that means transitioning from &lt;code&gt;AUTHORIZED&lt;/code&gt; to &lt;code&gt;CAPTURING&lt;/code&gt; &lt;em&gt;before&lt;/em&gt; touching the bank. A naive state machine looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PENDING → AUTHORIZED → CAPTURED
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But this leaves a blind spot. If the gateway crashes between &lt;code&gt;AUTHORIZED&lt;/code&gt; and &lt;code&gt;CAPTURED&lt;/code&gt;, there's no record that a capture was ever attempted. Was the bank called? Did it succeed? You don't know.&lt;/p&gt;

&lt;p&gt;The intermediate state closes that gap:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PENDING → AUTHORIZED → CAPTURING → CAPTURED → REFUNDING → REFUNDED
                    ↓
                 VOIDING → VOIDED
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;CAPTURING&lt;/code&gt; is not just a status — it's a signal of intent. It says: &lt;em&gt;"A capture was started here. If you find me stuck in this state, you know exactly what to do."&lt;/em&gt; The transition into it happens inside the same database transaction that acquires the idempotency lock, so the intent is either fully committed or fully rolled back — no ambiguity.&lt;/p&gt;

&lt;p&gt;This is borrowed from database engineering: the Write-Ahead Log pattern, where you record what you're &lt;em&gt;about&lt;/em&gt; to do before doing it so recovery is always possible.&lt;/p&gt;

&lt;p&gt;For authorizations specifically, this gets more nuanced. PCI compliance means you can never store raw card details, so if a crash happens during authorization, there's no way to retry it — the card data is gone. Rather than pretending this is solvable automatically, &lt;code&gt;PENDING&lt;/code&gt; authorizations older than 10 minutes are marked &lt;code&gt;FAILED&lt;/code&gt; and flagged for manual reconciliation. Some failures can't be fully automated away, and being honest about that is better than silently losing money.&lt;/p&gt;

&lt;p&gt;The domain layer enforces all of this with zero database or HTTP dependencies. Business rules — you can't void a captured payment, you can't refund an unauthorized one — live in pure Go. The domain is the source of truth for what's &lt;em&gt;allowed&lt;/em&gt;, completely independent of what's &lt;em&gt;stored&lt;/em&gt;.&lt;/p&gt;
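&lt;p&gt;Those rules are small enough to express directly as a transition table. A hedged sketch of such a pure-Go domain check (identifiers are illustrative, not the gateway's actual types):&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
)

// State names mirror the diagram above; the map encodes which
// transitions the domain allows (illustrative, not the gateway's code).
type State string

const (
	Pending    State = "PENDING"
	Authorized State = "AUTHORIZED"
	Capturing  State = "CAPTURING"
	Captured   State = "CAPTURED"
	Refunding  State = "REFUNDING"
	Refunded   State = "REFUNDED"
	Voiding    State = "VOIDING"
	Voided     State = "VOIDED"
)

var allowed = map[State][]State{
	Pending:    {Authorized},
	Authorized: {Capturing, Voiding},
	Capturing:  {Captured},
	Captured:   {Refunding},
	Refunding:  {Refunded},
	Voiding:    {Voided},
}

var ErrInvalidTransition = errors.New("invalid state transition")

// Transition returns the next state or rejects it: no database, no HTTP.
func Transition(from, to State) (State, error) {
	for _, next := range allowed[from] {
		if next == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("%w: %s -> %s", ErrInvalidTransition, from, to)
}

func main() {
	if _, err := Transition(Captured, Voiding); err != nil {
		fmt.Println(err) // you can't void a captured payment
	}
}
```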




&lt;h2&gt;
  
  
  Pattern 2: Background Workers That Heal the System
&lt;/h2&gt;

&lt;p&gt;Intermediate states create the evidence. Background workers act on it.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;RetryWorker&lt;/strong&gt; polls the database on a configurable interval, looking for payments stuck in &lt;code&gt;CAPTURING&lt;/code&gt;, &lt;code&gt;VOIDING&lt;/code&gt;, or &lt;code&gt;REFUNDING&lt;/code&gt; past their retry window. For each one, it re-invokes the appropriate bank operation using the &lt;em&gt;original idempotency key&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;That last part is what makes this safe. Because the bank supports idempotency, sending the same key twice doesn't trigger a second charge — it returns the cached result from the first attempt. The worker doesn't need to know whether the original call succeeded or not. If the bank already processed it, we get the success response back and update the database. If it didn't, we process it now. Either way, the database eventually converges to reality.&lt;/p&gt;
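&lt;p&gt;A toy illustration of why the replay is safe, using an in-memory stub bank that caches by idempotency key (not the real bank simulator's API):&lt;/p&gt;

```go
package main

import "fmt"

// A toy bank that honors idempotency keys: the first call does the work,
// every replay returns the cached result (illustrative stub only).
type Bank struct {
	seen     map[string]string
	captures int
}

func NewBank() *Bank { return &Bank{seen: make(map[string]string)} }

func (b *Bank) Capture(idemKey, authID string) string {
	if res, ok := b.seen[idemKey]; ok {
		return res // replay: no second charge, just the cached capture ID
	}
	b.captures++
	res := fmt.Sprintf("capture-%d-for-%s", b.captures, authID)
	b.seen[idemKey] = res
	return res
}

func main() {
	bank := NewBank()
	first := bank.Capture("key-123", "auth-123")  // original attempt
	replay := bank.Capture("key-123", "auth-123") // RetryWorker replays the same key
	fmt.Println(first == replay, bank.captures)   // true 1: money moved exactly once
}
```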

&lt;p&gt;Before any retry decision is made, errors are classified:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Transient errors&lt;/strong&gt; (timeouts, 500s) — retry with exponential backoff and jitter to avoid hammering the bank&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Permanent errors&lt;/strong&gt; (card declined, insufficient funds, auth expired) — fail fast, no retry&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business rule violations&lt;/strong&gt; (invalid state transitions) — reject immediately at the domain layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This classification is what separates a robust retry system from one that makes things worse. Retrying a permanent error doesn't fix anything — a declined card won't become approved on the fifth attempt. Treating it as retryable wastes cycles and delays the customer from finding out their payment failed.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;ExpirationWorker&lt;/strong&gt; handles a different edge case: authorized payments approaching the bank's 7-day authorization window. Rather than trusting the local clock blindly, the worker checks the bank's state before marking anything expired — with a 48-hour grace period to account for distributed clock skew.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 3: Idempotency as the Safety Net
&lt;/h2&gt;

&lt;p&gt;Recovery workers only work if retrying is safe. That guarantee comes entirely from idempotency.&lt;/p&gt;

&lt;p&gt;Every external-facing operation requires an &lt;code&gt;Idempotency-Key&lt;/code&gt; header. But the enforcement here goes deeper than most implementations.&lt;/p&gt;

&lt;p&gt;Idempotency state is stored in PostgreSQL, not Redis — deliberately. This means it survives restarts and is subject to ACID guarantees. The &lt;code&gt;idempotency_keys&lt;/code&gt; table does two jobs simultaneously.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's a response cache.&lt;/strong&gt; Once an operation completes, the result is stored against the key. Future requests with the same key get the cached response instantly, without touching the bank.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's a distributed lock.&lt;/strong&gt; A &lt;code&gt;locked_at&lt;/code&gt; timestamp is set when an operation begins and cleared when it finishes. If two requests arrive with the same key at the same time, the second enters a polling loop — checking every 100ms — until the first completes, then receives the same response. No double-processing, no race conditions.&lt;/p&gt;

&lt;p&gt;There's also a subtler protection: a &lt;code&gt;request_hash&lt;/code&gt; (SHA-256 of the request body) stored alongside each key. If a client tries to reuse an idempotency key with &lt;em&gt;different&lt;/em&gt; parameters — a different amount, a different payment — the gateway rejects it with an &lt;code&gt;IDEMPOTENCY_MISMATCH&lt;/code&gt; error. This prevents a class of silent bugs where key reuse returns a stale result for a completely different operation.&lt;/p&gt;

&lt;p&gt;The three patterns form a chain: intermediate states give workers something to act on → workers retry using the original idempotency key → idempotency makes those retries safe. Remove any one of them and the others stop working.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Do Differently at Scale
&lt;/h2&gt;

&lt;p&gt;Building this taught me as much about the limits of my approach as about its strengths.&lt;/p&gt;

&lt;p&gt;The most important change in a high-traffic environment would be moving idempotency lookups to Redis. PostgreSQL works here, but for a gateway handling thousands of requests per second, sub-millisecond idempotency checks matter. I'd keep Postgres as the durable fallback but use Redis as the hot path.&lt;/p&gt;

&lt;p&gt;I'd also move to &lt;strong&gt;event sourcing&lt;/strong&gt; for payment state. Right now, the &lt;code&gt;payments&lt;/code&gt; table stores the current state — you can see that a payment is &lt;code&gt;CAPTURED&lt;/code&gt;, but you can't see the full timeline of how it got there. An append-only &lt;code&gt;payment_events&lt;/code&gt; table would make debugging orphaned authorizations significantly easier: you'd be able to reconstruct exactly where the gap between the bank's state and yours opened up.&lt;/p&gt;

&lt;p&gt;The retry worker would also benefit from &lt;code&gt;FOR UPDATE SKIP LOCKED&lt;/code&gt; on its database queries. Currently, multiple worker instances compete for the same stuck payments. Skip-locked semantics let workers divide the work without blocking each other — a meaningful concurrency improvement once the system is under real load.&lt;/p&gt;

&lt;p&gt;Finally, I'd add chaos testing: deliberately crashing the gateway at the exact millisecond between a bank response and the database commit. That's the failure mode this entire system is designed to handle, and the only way to be truly confident it works is to make it happen on purpose.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Really Taught Me
&lt;/h2&gt;

&lt;p&gt;Payment systems forced me to think about a dimension of engineering I hadn't fully internalized before: &lt;strong&gt;correctness under failure&lt;/strong&gt;, not just correctness under normal conditions.&lt;/p&gt;

&lt;p&gt;It's easy to build a service that works when everything goes right. The interesting engineering happens when you ask: &lt;em&gt;what is the worst possible moment for this process to crash, and what does the system look like afterward?&lt;/em&gt; That question shapes every decision in this gateway — the intermediate states, the write-ahead pattern, the idempotency locking, the recovery workers.&lt;/p&gt;

&lt;p&gt;The result is a system that doesn't just handle payments. It handles uncertainty. And in distributed systems, uncertainty is the only thing you can count on.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The full source code is available on GitHub: &lt;a href="https://github.com/DanielPopoola/ficmart-payment-gateway" rel="noopener noreferrer"&gt;DanielPopoola/ficmart-payment-gateway&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>distributedsystems</category>
      <category>go</category>
      <category>softwareengineering</category>
      <category>postgres</category>
    </item>
    <item>
      <title>Building a Health-Check Microservice with FastAPI</title>
      <dc:creator>Daniel Popoola</dc:creator>
      <pubDate>Fri, 20 Jun 2025 10:44:57 +0000</pubDate>
      <link>https://dev.to/lisan_al_gaib/building-a-health-check-microservice-with-fastapi-26jo</link>
      <guid>https://dev.to/lisan_al_gaib/building-a-health-check-microservice-with-fastapi-26jo</guid>
      <description>&lt;p&gt;In modern application development, health checks play a crucial role in ensuring reliability, observability, and smooth orchestration—especially in containerized environments like Docker or Kubernetes. In this post, I’ll walk you through how I built a production-ready health-check microservice using &lt;strong&gt;FastAPI&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This project features structured logging, clean separation of concerns, and asynchronous service checks for both a database and Redis—all built in a modular and extensible way.&lt;/p&gt;

&lt;p&gt;GitHub Repo: &lt;a href="https://github.com/DanielPopoola/fastapi-microservice-health-check" rel="noopener noreferrer"&gt;https://github.com/DanielPopoola/fastapi-microservice-health-check&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 What This Project Covers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Creating a &lt;code&gt;/health/&lt;/code&gt; endpoint with real service checks (DB, Redis)&lt;/li&gt;
&lt;li&gt;Supporting &lt;code&gt;/live&lt;/code&gt; and &lt;code&gt;/ready&lt;/code&gt; endpoints for Kubernetes probes&lt;/li&gt;
&lt;li&gt;Using async &lt;code&gt;asyncio.gather()&lt;/code&gt; for fast, parallel checks&lt;/li&gt;
&lt;li&gt;Configurable settings with Pydantic&lt;/li&gt;
&lt;li&gt;Structured logging with custom log formatting via Loguru&lt;/li&gt;
&lt;li&gt;Middleware for request timing and error handling&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  📁 Project Structure
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;project/
├── main.py             # App factory and configuration
├── config.py           # App settings via Pydantic
├── routers/
│   ├── health.py       # Health check endpoints
│   └── echo.py         # Echo endpoint (for demo)
├── utils/
│   └── logging.py      # Custom logger setup
└── ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🔍 Under the Hood: &lt;code&gt;main.py&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;main.py&lt;/code&gt; acts as the orchestrator. Here's what it handles:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. App Lifecycle Management
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@asynccontextmanager&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lifespan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Application starting up&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;yield&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Application shutting down&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This cleanly logs startup and shutdown events, essential for container lifecycle awareness.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. App Factory Pattern
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;create_app()&lt;/code&gt; function encapsulates app setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loads settings with &lt;code&gt;get_settings()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Sets up structured logging&lt;/li&gt;
&lt;li&gt;Registers CORS middleware&lt;/li&gt;
&lt;li&gt;Adds global and HTTP exception handlers&lt;/li&gt;
&lt;li&gt;Includes routers for modularity&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Middleware
&lt;/h3&gt;

&lt;p&gt;A custom middleware logs request data and execution time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_requests&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;call_next&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;start_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;call_next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Response-Time&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;ms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Exception Handling
&lt;/h3&gt;

&lt;p&gt;Two global handlers catch errors and format them consistently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One for &lt;code&gt;HTTPException&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;One for unexpected &lt;code&gt;Exception&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ⚕️ Health Check Logic (&lt;code&gt;routers/health.py&lt;/code&gt;)
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;routers/health.py&lt;/code&gt; file houses the core of this service:&lt;/p&gt;

&lt;h3&gt;
  
  
  ✅ &lt;code&gt;/health/&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Performs parallel health checks using &lt;code&gt;asyncio.gather()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;perform_health_checks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Settings&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ServiceCheck&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;checks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;database_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;database&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;check_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;database_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;health_check_timeout&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;  
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redis_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;check_redis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;redis_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;health_check_timeout&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;return_exceptions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;checks&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is a combined status response showing the health of each component.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔁 &lt;code&gt;/live&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;A simple liveness check returning HTTP 200 to signal the app is alive.&lt;/p&gt;

&lt;h3&gt;
  
  
  📦 &lt;code&gt;/ready&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Waits for both Redis and DB to pass checks before returning 200. Useful for Kubernetes readiness probes.&lt;/p&gt;
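&lt;p&gt;A minimal sketch of how that readiness aggregation can work (the check functions here are stand-ins, not the repo's actual implementations):&lt;/p&gt;

```python
import asyncio

async def check_database() -> bool:
    # Stand-in for a real "SELECT 1" round trip (illustrative only).
    return True

async def check_redis() -> bool:
    # Stand-in for a real PING (illustrative only).
    return True

async def readiness() -> tuple:
    """Return (status_code, body): 200 only when every dependency is up."""
    results = await asyncio.gather(check_database(), check_redis(),
                                   return_exceptions=True)
    ok = all(r is True for r in results)  # an exception counts as a failure
    checks = dict(zip(["database", "redis"],
                      ["up" if r is True else "down" for r in results]))
    return (200 if ok else 503), {"status": "ready" if ok else "not ready",
                                  "checks": checks}
```

&lt;p&gt;The difference from &lt;code&gt;/live&lt;/code&gt; is exactly this aggregation: liveness says the process is running, readiness says its dependencies are too.&lt;/p&gt;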




&lt;h2&gt;
  
  
  📡 Root Endpoint and Echo
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/&lt;/code&gt; returns app metadata like name, version, and timestamp&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/echo&lt;/code&gt; is a simple test endpoint to verify connectivity&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🛠️ How to Run It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uvicorn app.main:app &lt;span class="nt"&gt;--reload&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or using the embedded &lt;code&gt;__main__&lt;/code&gt; block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🌟 What’s Next?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Add more service checks (e.g., external APIs, caches)&lt;/li&gt;
&lt;li&gt;Integrate with Docker’s &lt;code&gt;HEALTHCHECK&lt;/code&gt; instruction&lt;/li&gt;
&lt;li&gt;Configure Kubernetes readiness/liveness probes&lt;/li&gt;
&lt;/ul&gt;
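&lt;p&gt;For the Docker integration, one possible &lt;code&gt;HEALTHCHECK&lt;/code&gt; wiring against the &lt;code&gt;/live&lt;/code&gt; endpoint (the port and path are assumptions about the deployment, not taken from the repo):&lt;/p&gt;

```dockerfile
# Assumes the app listens on 8000 and exposes the /live endpoint described above.
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD curl -fsS http://localhost:8000/live || exit 1
```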




&lt;h2&gt;
  
  
  🧠 Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Building robust health checks is one of the simplest yet most impactful ways to improve system reliability. With FastAPI’s speed and async support, this project offers a solid base for both simple and enterprise-grade applications.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/DanielPopoola/fastapi-microservice-health-check" rel="noopener noreferrer"&gt;DanielPopoola/fastapi-microservice-health-check&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>fastapi</category>
      <category>beginners</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
