DEV Community

Muhammad Kamran Kabeer

How I Built a Self-Healing Database on a 10-Year-Old Laptop (Using Docker + Ansible)

A practical experiment in resilience engineering on aging hardware—with modern DevOps tools.

🚀 Introduction

Running production-grade systems on old hardware sounds like a bad idea… until you treat it as a lab.

I set out to build a self-healing database system on a 10-year-old laptop—but this time with a more modern approach:

  • 8 GB RAM
  • SSD (thankfully!)
  • Docker for isolation
  • Ansible for automation

The goal wasn’t raw performance. It was resilience, repeatability, and recovery.


🧠 What “Self-Healing” Meant in This Project

In this setup, self-healing means:

  • Detecting failures automatically
  • Restarting or replacing failed components
  • Recovering corrupted or lost state
  • Rebuilding the system with minimal manual intervention

And most importantly:

Everything should be recoverable using code.


🖥️ Why This Setup Works (Even on Old Hardware)

The SSD made a huge difference compared to traditional HDD setups:

  • Faster I/O → better database responsiveness
  • Quicker container restarts
  • Improved log handling and recovery

With 8 GB RAM, I had just enough room to:

  • Run multiple containers
  • Simulate primary + replica
  • Keep monitoring lightweight

⚙️ Architecture Overview

The system is composed of:

  • Primary database container
  • Replica database container
  • Monitoring container / scripts
  • Backup service
  • Ansible playbooks (control layer)

Everything runs locally but is logically separated using Docker.


🐳 Containerized Database Setup

I used Docker to run isolated database instances.

Why Docker?

  • Clean environment separation
  • Easy restarts and redeployments
  • Fault isolation
  • Reproducibility

Example (Simplified)

```yaml
version: '3'
services:
  db_primary:
    image: postgres:latest
    environment:
      POSTGRES_PASSWORD: example   # the official postgres image refuses to start without a password
    ports:
      - "5432:5432"

  db_replica:
    image: postgres:latest
    environment:
      POSTGRES_PASSWORD: example
    ports:
      - "5433:5432"
```

Each container behaves like an independent node.

🔁 Replication Strategy

Even on a single laptop, I implemented logical replication:

  • Primary handles writes
  • Replica syncs asynchronously
  • Replica stays ready for failover

Key Idea

If the primary fails:

  • Promote the replica
  • Spin up a new replica using automation
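The replication wiring can be expressed as playbook tasks. Below is a minimal sketch, assuming the community.postgresql collection, a postgres superuser whose password is in `postgres_password`, and illustrative names `pub_all` / `sub_replica` (primary on host port 5432, replica on 5433, matching the compose file):

```yaml
- name: Create a publication for all tables on the primary
  community.postgresql.postgresql_query:
    login_host: localhost
    login_port: 5432
    login_user: postgres
    login_password: "{{ postgres_password }}"
    db: postgres
    query: "CREATE PUBLICATION pub_all FOR ALL TABLES;"

- name: Subscribe the replica to the primary's publication
  community.postgresql.postgresql_query:
    login_host: localhost
    login_port: 5433
    login_user: postgres
    login_password: "{{ postgres_password }}"
    db: postgres
    autocommit: true   # CREATE SUBSCRIPTION cannot run inside a transaction block
    query: >-
      CREATE SUBSCRIPTION sub_replica
      CONNECTION 'host=db_primary port=5432 user=postgres
                  password={{ postgres_password }} dbname=postgres'
      PUBLICATION pub_all;
```

Two caveats: the primary needs `wal_level=logical` (for the compose setup above, a `command: -c wal_level=logical` override works), and `CREATE PUBLICATION` has no IF NOT EXISTS form, so an idempotent playbook would check for existence first.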

🤖 Automation with Ansible

This is where things got interesting.

Instead of manually fixing things, I used Ansible playbooks to:

  • Provision containers
  • Configure replication
  • Restart failed services
  • Rebuild broken nodes

Example Playbook Task



```yaml
- name: Ensure database container is running
  community.docker.docker_container:
    name: db_primary
    image: postgres:latest
    state: started
    restart_policy: always
```

With this, recovery becomes:

Run a playbook → system fixes itself


👀 Health Monitoring

I implemented lightweight monitoring using scripts + container checks:

What I monitored:

  • Container health/status
  • Database connectivity
  • Replication lag
  • Disk usage

Basic Logic

  • If container stops → restart it
  • If DB not responding → recreate container
  • If replication breaks → reconfigure replica via Ansible
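The first two checks can be delegated to Docker itself. A sketch of a compose healthcheck using `pg_isready`, a client tool shipped inside the official postgres image (intervals are illustrative):

```yaml
services:
  db_primary:
    image: postgres:latest
    restart: always                   # bring the process back if it exits
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 3                      # unhealthy after three failed probes
```

One nuance: restart policies react to the process exiting, not to a container going "unhealthy", so an external script (or autoheal-style watcher) still has to act on failed healthchecks.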

🔧 Self-Healing Mechanisms

Here’s how the system heals itself:

1. Container Restart (First Line of Defense)

Docker restart policies:

  • Automatically restart failed containers

2. Ansible Reconciliation

If something drifts from the desired state:

  • Re-run playbooks
  • Recreate containers
  • Reapply configs

This mimics Infrastructure as Code recovery.
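As a concrete example, the docker_container module is already declarative: re-running the task recreates the container when its configuration no longer matches the playbook. A sketch, using the `recreate` flag to force a clean rebuild:

```yaml
- name: Reconcile the replica container with its desired state
  community.docker.docker_container:
    name: db_replica
    image: postgres:latest
    state: started
    restart_policy: always
    recreate: true   # force recreation, e.g. after a suspected bad state
```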


3. Replica Promotion

If primary fails:

  • Stop primary container
  • Redirect traffic to replica
  • Promote replica to primary
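With logical replication, "promotion" is mostly a traffic change: the replica is already a writable instance, so failover amounts to stopping the dead primary and detaching the subscription so the replica stops trying to pull from it. A sketch, assuming a subscription named `sub_replica`:

```yaml
- name: Stop the failed primary container
  community.docker.docker_container:
    name: db_primary
    state: stopped

- name: Detach the replica's subscription to the dead primary
  community.postgresql.postgresql_query:
    login_host: localhost
    login_port: 5433
    login_user: postgres
    login_password: "{{ postgres_password }}"
    db: postgres
    autocommit: true   # DROP SUBSCRIPTION cannot run inside a transaction block
    query: "DROP SUBSCRIPTION IF EXISTS sub_replica;"
```

After this, application traffic is pointed at port 5433 and a fresh replica can be provisioned against the new primary.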

4. Rebuild Failed Node

Using Ansible:

  • Destroy broken container
  • Recreate it
  • Resync from current primary

5. Backup + Restore

  • Periodic volume backups
  • Fast restore using Docker volumes

Even if both containers fail, recovery is still possible.
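A minimal sketch of the backup side, assuming `pg_dump` over `docker exec` and an illustrative backup path and schedule:

```yaml
- name: Ensure the backup directory exists
  ansible.builtin.file:
    path: /opt/db_backups
    state: directory
    mode: "0750"

- name: Schedule a nightly logical backup of the primary
  ansible.builtin.cron:
    name: "pg_dump db_primary"
    hour: "2"
    minute: "0"
    job: >
      docker exec db_primary pg_dump -U postgres postgres
      | gzip > /opt/db_backups/primary-$(date +\%F).sql.gz
```

A logical dump like this restores onto any fresh container with `psql`, which is exactly what rebuilding from playbooks needs.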


💾 Storage Strategy (SSD Advantage)

Using an SSD improved:

  • WAL/log write speed
  • Backup performance
  • Container startup time

Docker Volumes

  • Persistent storage for database data
  • Survives container restarts
  • Easily backed up

🔥 Failure Testing

I intentionally broke the system multiple times:

  • docker kill on primary
  • Deleted volumes
  • Simulated corruption
  • Stopped replication

Results

  • Containers restarted automatically
  • Ansible restored desired state quickly
  • Replica promotion worked reliably
  • Full recovery was possible from backups

📉 Trade-Offs

This setup isn’t perfect.

Downsides:

  • Single physical machine = single point of failure
  • Limited RAM → careful tuning required
  • SSD wear over time
  • Not truly “distributed”

But still valuable because:

  • It simulates real-world failure scenarios
  • Teaches recovery patterns
  • Builds DevOps discipline

🧩 Key Lessons

1. Docker + Ansible is a powerful combo

  • Docker handles runtime
  • Ansible handles desired state

Together, they approximate orchestration.


2. Self-healing = automation + observability

Without monitoring, automation is blind.


3. Old hardware is a great teacher

Failures happen more often → faster learning.


4. Infrastructure as Code is the real backup

If you can rebuild everything from playbooks:

You’re already halfway to self-healing.


🌱 What I’d Do Next

To push this further:

  • Add Prometheus + Grafana for observability
  • Introduce alerting (email/Slack)
  • Use Docker Swarm or Kubernetes
  • Move to multi-node setup (even with cheap machines)

🎯 Final Thoughts

This project reinforced a simple idea:

Reliability is not about powerful hardware—it’s about good design.

Even on a 10-year-old laptop, using:

  • Docker
  • Ansible
  • Smart recovery strategies

…you can build a system that fails gracefully and recovers automatically.


📌 GitHub Repo

https://github.com/muhammadkamrankabeer-oss/MK_Labs/tree/main/Lab4_Database

If you’ve experimented with self-healing systems or run labs on constrained hardware, I’d love to hear how you approached it!
