DEV Community

Muhammad Kamran Kabeer

How I Built a Self-Healing Database on a 10-Year-Old Laptop (Using Docker + Ansible)

A practical experiment in resilience engineering on aging hardware—with modern DevOps tools.

🚀 Introduction

Running production-grade systems on old hardware sounds like a bad idea… until you treat it as a lab.

I set out to build a self-healing database system on a 10-year-old laptop—but this time with a more modern approach:

  • 8 GB RAM
  • SSD (thankfully!)
  • Docker for isolation
  • Ansible for automation

The goal wasn’t raw performance. It was resilience, repeatability, and recovery.


🧠 What “Self-Healing” Meant in This Project

In this setup, self-healing means:

  • Detecting failures automatically
  • Restarting or replacing failed components
  • Recovering corrupted or lost state
  • Rebuilding the system with minimal manual intervention

And most importantly:

Everything should be recoverable using code.


🖥️ Why This Setup Works (Even on Old Hardware)

The SSD made a huge difference compared to traditional HDD setups:

  • Faster I/O → better database responsiveness
  • Quicker container restarts
  • Improved log handling and recovery

With 8 GB RAM, I had just enough room to:

  • Run multiple containers
  • Simulate primary + replica
  • Keep monitoring lightweight

⚙️ Architecture Overview

The system is composed of:

  • Primary database container
  • Replica database container
  • Monitoring container / scripts
  • Backup service
  • Ansible playbooks (control layer)

Everything runs locally but is logically separated using Docker.


🐳 Containerized Database Setup

I used Docker to run isolated database instances.

Why Docker?

  • Clean environment separation
  • Easy restarts and redeployments
  • Fault isolation
  • Reproducibility

Example (Simplified)

```yaml
version: '3'
services:
  db_primary:
    image: postgres:latest
    environment:
      POSTGRES_PASSWORD: example   # the official postgres image refuses to start without a password
    ports:
      - "5432:5432"

  db_replica:
    image: postgres:latest
    environment:
      POSTGRES_PASSWORD: example
    ports:
      - "5433:5432"
```

Each container behaves like an independent node.

🔁 Replication Strategy

Even on a single laptop, I implemented logical replication:

  • Primary handles writes
  • Replica syncs asynchronously
  • Replica stays ready for failover

Key Idea

If the primary fails:

  • Promote the replica
  • Spin up a new replica using automation
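The replication wiring can be expressed as playbook tasks. Below is a minimal sketch, assuming the community.postgresql collection, a postgres superuser whose password is in `postgres_password`, and illustrative names `pub_all` / `sub_replica` (primary on host port 5432, replica on 5433, matching the compose file):

```yaml
- name: Create a publication for all tables on the primary
  community.postgresql.postgresql_query:
    login_host: localhost
    login_port: 5432
    login_user: postgres
    login_password: "{{ postgres_password }}"
    db: postgres
    query: "CREATE PUBLICATION pub_all FOR ALL TABLES;"

- name: Subscribe the replica to the primary's publication
  community.postgresql.postgresql_query:
    login_host: localhost
    login_port: 5433
    login_user: postgres
    login_password: "{{ postgres_password }}"
    db: postgres
    autocommit: true   # CREATE SUBSCRIPTION cannot run inside a transaction block
    query: >-
      CREATE SUBSCRIPTION sub_replica
      CONNECTION 'host=db_primary port=5432 user=postgres
                  password={{ postgres_password }} dbname=postgres'
      PUBLICATION pub_all;
```

Two caveats: the primary needs `wal_level=logical` (for the compose setup above, a `command: -c wal_level=logical` override works), and `CREATE PUBLICATION` has no IF NOT EXISTS form, so an idempotent playbook would check for existence first.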

🤖 Automation with Ansible

This is where things got interesting.

Instead of manually fixing things, I used Ansible playbooks to:

  • Provision containers
  • Configure replication
  • Restart failed services
  • Rebuild broken nodes

Example Playbook Task



```yaml
- name: Ensure database container is running
  community.docker.docker_container:
    name: db_primary
    image: postgres:latest
    state: started
    restart_policy: always
```

With this, recovery becomes:

Run a playbook → system fixes itself


👀 Health Monitoring

I implemented lightweight monitoring using scripts + container checks:

What I monitored:

  • Container health/status
  • Database connectivity
  • Replication lag
  • Disk usage

Basic Logic

  • If container stops → restart it
  • If DB not responding → recreate container
  • If replication breaks → reconfigure replica via Ansible
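The first two checks can be delegated to Docker itself. A sketch of a compose healthcheck using `pg_isready`, a client tool shipped inside the official postgres image (intervals are illustrative):

```yaml
services:
  db_primary:
    image: postgres:latest
    restart: always                   # bring the process back if it exits
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 3                      # unhealthy after three failed probes
```

One nuance: restart policies react to the process exiting, not to a container going "unhealthy", so an external script (or autoheal-style watcher) still has to act on failed healthchecks.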

🔧 Self-Healing Mechanisms

Here’s how the system heals itself:

1. Container Restart (First Line of Defense)

Docker restart policies:

  • Automatically restart failed containers

2. Ansible Reconciliation

If something drifts from the desired state:

  • Re-run playbooks
  • Recreate containers
  • Reapply configs

This mimics Infrastructure as Code recovery.
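As a concrete example, the docker_container module is already declarative: re-running the task recreates the container when its configuration no longer matches the playbook. A sketch, using the `recreate` flag to force a clean rebuild:

```yaml
- name: Reconcile the replica container with its desired state
  community.docker.docker_container:
    name: db_replica
    image: postgres:latest
    state: started
    restart_policy: always
    recreate: true   # force recreation, e.g. after a suspected bad state
```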


3. Replica Promotion

If primary fails:

  • Stop primary container
  • Redirect traffic to replica
  • Promote replica to primary
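With logical replication, "promotion" is mostly a traffic change: the replica is already a writable instance, so failover amounts to stopping the dead primary and detaching the subscription so the replica stops trying to pull from it. A sketch, assuming a subscription named `sub_replica`:

```yaml
- name: Stop the failed primary container
  community.docker.docker_container:
    name: db_primary
    state: stopped

- name: Detach the replica's subscription to the dead primary
  community.postgresql.postgresql_query:
    login_host: localhost
    login_port: 5433
    login_user: postgres
    login_password: "{{ postgres_password }}"
    db: postgres
    autocommit: true   # DROP SUBSCRIPTION cannot run inside a transaction block
    query: "DROP SUBSCRIPTION IF EXISTS sub_replica;"
```

After this, application traffic is pointed at port 5433 and a fresh replica can be provisioned against the new primary.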

4. Rebuild Failed Node

Using Ansible:

  • Destroy broken container
  • Recreate it
  • Resync from current primary

5. Backup + Restore

  • Periodic volume backups
  • Fast restore using Docker volumes

Even if both containers fail, recovery is still possible.
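A minimal sketch of the backup side, assuming `pg_dump` over `docker exec` and an illustrative backup path and schedule:

```yaml
- name: Ensure the backup directory exists
  ansible.builtin.file:
    path: /opt/db_backups
    state: directory
    mode: "0750"

- name: Schedule a nightly logical backup of the primary
  ansible.builtin.cron:
    name: "pg_dump db_primary"
    hour: "2"
    minute: "0"
    job: >
      docker exec db_primary pg_dump -U postgres postgres
      | gzip > /opt/db_backups/primary-$(date +\%F).sql.gz
```

A logical dump like this restores onto any fresh container with `psql`, which is exactly what rebuilding from playbooks needs.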


💾 Storage Strategy (SSD Advantage)

Using an SSD improved:

  • WAL/log write speed
  • Backup performance
  • Container startup time

Docker Volumes

  • Persistent storage for database data
  • Survives container restarts
  • Easily backed up

🔥 Failure Testing

I intentionally broke the system multiple times:

  • docker kill on primary
  • Deleted volumes
  • Simulated corruption
  • Stopped replication

Results

  • Containers restarted automatically
  • Ansible restored desired state quickly
  • Replica promotion worked reliably
  • Full recovery was possible from backups

📉 Trade-Offs

This setup isn’t perfect.

Downsides:

  • Single physical machine = single point of failure
  • Limited RAM → careful tuning required
  • SSD wear over time
  • Not truly “distributed”

But still valuable because:

  • It simulates real-world failure scenarios
  • Teaches recovery patterns
  • Builds DevOps discipline

🧩 Key Lessons

1. Docker + Ansible is a powerful combo

  • Docker handles runtime
  • Ansible handles desired state

Together, they approximate orchestration.


2. Self-healing = automation + observability

Without monitoring, automation is blind.


3. Old hardware is a great teacher

Failures happen more often → faster learning.


4. Infrastructure as Code is the real backup

If you can rebuild everything from playbooks:

You’re already halfway to self-healing.


🌱 What I’d Do Next

To push this further:

  • Add Prometheus + Grafana for observability
  • Introduce alerting (email/Slack)
  • Use Docker Swarm or Kubernetes
  • Move to multi-node setup (even with cheap machines)

🎯 Final Thoughts

This project reinforced a simple idea:

Reliability is not about powerful hardware—it’s about good design.

Even on a 10-year-old laptop, using:

  • Docker
  • Ansible
  • Smart recovery strategies

…you can build a system that fails gracefully and recovers automatically.


📌 GitHub Repo

https://github.com/muhammadkamrankabeer-oss/MK_Labs/tree/main/Lab4_Database

If you’ve experimented with self-healing systems or run labs on constrained hardware, I’d love to hear how you approached it!
