TL;DR
- What is it? PureMyHA is a lightweight, asynchronous High Availability manager built exclusively for MySQL 8.4.
- Why Haskell? To leverage its robust type system and fearless concurrency for rock-solid state management where failure is not an option.
- The Goal: Provide automated failover, split-brain protection, and modern MySQL 8.4 syntax support without the complexity of full-scale database clustering systems.
🔗 GitHub Repository: ikaro1192/PureMyHA (If you find it interesting, a star would be awesome!)
The State of MySQL 8.4 High Availability
In modern database operations, the need to build highly available (HA) MySQL clusters from scratch on bare metal or IaaS is decreasing, thanks to managed services like Amazon Aurora. However, due to strict latency requirements, cost optimization, or existing infrastructure constraints, self-hosted MySQL HA is still a harsh reality for many of us.
When looking at HA solutions for the recently released MySQL 8.4, the official InnoDB Cluster is a common choice. But there's a catch: it is built on Group Replication, which must certify every transaction across the group before commit. For workloads highly sensitive to write latency, this can be a dealbreaker.
What about community solutions?
Orchestrator (now under Percona) supports MySQL 8.4, but its semi-synchronous replication support is still in the Tech Preview stage.
Vitess fully supports MySQL 8.4 and asynchronous replication. However, Vitess is a massive, all-in-one database clustering system. For a modest setup of just a few nodes, it is simply overkill.
The Missing Piece: PureMyHA
To summarize, there is still a strong need for a lightweight HA tool based on standard asynchronous replication—a tool that stays out of the way and only intervenes during failures.
That’s why I built PureMyHA: a simple yet powerful HA manager written in Haskell, fully compatible with MySQL 8.4's new syntax and authentication methods.
Core Features
PureMyHA is designed to be minimal but heavily armed for production edge-cases. Instead of packing every possible database feature, it focuses purely on making asynchronous replication highly available and easy to operate.
Here are the highlights:
🐬 1. Exclusively MySQL 8.4 Native
We dropped legacy baggage. PureMyHA uses only modern syntax and defaults:
- Strictly uses SHOW REPLICA STATUS and CHANGE REPLICATION SOURCE TO.
- Fully supports caching_sha2_password authentication (the default in 8.4), leaving mysql_native_password in the past.
🛡️ 2. Bulletproof Automatic Failover
Failovers shouldn't be scary. PureMyHA ensures safe promotions with zero-data-loss semantics where possible:
- Errant GTID Repair: Automatically detects and neutralizes errant GTIDs by injecting empty transactions before promotion.
- Split-Brain Auto-Fencing: If multiple nodes act as a source, it can automatically enforce super_read_only=ON to prevent write divergence.
- Anti-Flap Protection: Prevents endless failover loops during network instability by enforcing a configurable recovery_block_period.
- Consecutive Failure Thresholds: Avoids false positives from momentary TCP timeouts by requiring N consecutive probe failures.
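The errant-GTID repair mentioned above relies on the standard MySQL empty-transaction technique: on each node that is missing the errant transaction, an empty transaction is committed under that GTID so it is no longer "errant" anywhere. As a hedged sketch (the function name is invented for this article; the SQL itself is standard MySQL):

```haskell
-- Sketch of the empty-transaction technique for neutralizing one errant
-- GTID before promotion. Statement generation is illustrative; these
-- statements would be issued on each node missing the errant transaction.
neutralizeStmts :: String -> [String]
neutralizeStmts gtid =
  [ "SET gtid_next = '" ++ gtid ++ "'"  -- pin the next transaction's GTID
  , "BEGIN"
  , "COMMIT"                            -- the empty transaction claims the GTID
  , "SET gtid_next = 'AUTOMATIC'"       -- back to normal GTID assignment
  ]

main :: IO ()
main = mapM_ putStrLn (neutralizeStmts "3E11FA47-71CA-11E1-9E33-C80AA9429562:23")
```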
🛠️ 3. Built for the Operator
We know that maintaining clusters involves more than just waiting for crashes.
- Zero-Downtime Config Reloads: Tweak monitoring thresholds or webhooks and apply them via SIGHUP without restarting the daemon.
- Native CLONE Plugin Support: Re-seed a lagging or broken replica effortlessly from a donor node with a single CLI command (puremyha clone).
- Granular CLI Controls: Pause/resume auto-failover, exclude specific replicas during maintenance, or perform dry-runs for manual switchovers.
📊 4. Observability & Integrations
PureMyHA fits right into modern cloud-native stacks.
- Prometheus Metrics: Exposes a /metrics endpoint out-of-the-box tracking replication lag, failure counts, and cluster health.
- Kubernetes-Ready: Provides /health liveness probes.
- Custom Hooks: Trigger your own shell scripts (e.g., Slack alerts, DNS updates) via lifecycle events like pre_failover or on_lag_threshold_exceeded.
Note: For the complete, exhaustive list of features and configuration options, check out the Feature Reference in the docs.
Architecture: The Unix Philosophy
PureMyHA is designed around the classic Unix philosophy: separate the heavy background lifting from the user interface. It consists of two main components communicating over a local socket.
- puremyhad (the Daemon): The brain of the operation. This long-running background process handles topology auto-discovery, continuous health monitoring, and automatic failovers.
- puremyha (the CLI): The control panel. It sends commands to the daemon and formats responses for the operator.
- Communication: They talk via a Unix domain socket (/run/puremyhad.sock) using newline-delimited JSON (NDJSON), ensuring fast and secure local-only access.
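One reason NDJSON is a pleasant wire format here: framing over a stream socket reduces to splitting on newlines, with a carry buffer for partial reads. A small illustrative sketch (not PureMyHA's actual protocol code):

```haskell
-- Sketch: NDJSON framing over a stream socket. Each complete line is one
-- message; any bytes after the last newline are carried into the next read.
splitFrames :: String -> String -> ([String], String)
splitFrames carry chunk = go (carry ++ chunk)
  where
    go s = case break (== '\n') s of
      (frame, '\n' : rest) -> let (frames, leftover) = go rest
                              in (frame : frames, leftover)
      (partial, _)         -> ([], partial)  -- incomplete frame: carry it over

main :: IO ()
main =
  -- One complete frame, plus a partial tail that would wait for the next read.
  print (splitFrames "" "{\"cmd\":\"status\"}\n{\"cmd\":\"topo")
```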
Quick Start: Up and Running in Minutes
One of the main goals of PureMyHA is operational simplicity. Because it compiles to a single, statically-linked binary, you don't need to set up external dependencies like etcd, Consul, or ZooKeeper.
Here is how easily you can get a cluster under management.
1. Installation
The easiest way to install PureMyHA is via the pre-built packages for your distribution. (Docker builds and source compilation via cabal are also available).
$ DOWNLOAD_URL=$(curl -s https://api.github.com/repos/ikaro1192/PureMyHA/releases/latest \
| jq -r '.assets[].browser_download_url' \
| grep "$(uname -m)" \
| grep "\.rpm$")
$ wget "$DOWNLOAD_URL"
$ FILE_NAME=$(basename "$DOWNLOAD_URL")
$ sudo rpm -ivh "$FILE_NAME"
$ sudo systemctl enable puremyhad
2. The Minimal Configuration
Configuration is done via a straightforward YAML file. You just need to define your cluster nodes and provide the monitoring credentials and any hooks; the daemon handles the rest by auto-discovering the topology.
# cp -abi /etc/puremyha/config.yaml{.example,}
# vim /etc/puremyha/config.yaml
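For a sense of scale, a minimal config might look something like the sketch below. The key names here are invented for this article; the authoritative reference is the shipped config.yaml.example and the Feature Reference in the docs.

```yaml
# Hypothetical sketch -- key names are illustrative, not authoritative.
cluster:
  name: main
  nodes:
    - host: db01
      port: 3306
    - host: db02
      port: 3306
    - host: db03
      port: 3306
monitor:
  user: puremyha
  password: "********"
  consecutive_failure_threshold: 3
hooks:
  pre_failover: /etc/puremyha/hooks/notify-slack.sh
```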
3. Start the Daemon
Start the daemon and check that it is running. It will immediately connect to the nodes, map the replication tree, and begin continuous health monitoring. Now you can use the puremyha CLI to interact with the daemon over the local Unix socket.
# systemctl start puremyhad
# systemctl status puremyhad
Demo: Chaos in Action
Setting up is easy, but how does PureMyHA handle a real fire? Let’s walk through a simulated disaster and recovery scenario using a 3-node cluster, controlling everything strictly through the CLI.
1. The Initial State
First, let's check our healthy cluster. db01 is our active primary (source), with two replicas smoothly following along.
# puremyha status
CLUSTER HEALTH SOURCE NODES PAUSED RECOVERY BLOCKED
----------------------------------------------------------------------------------------
main Healthy db01 3 no 2026-04-08T13:04:40Z
# puremyha topology
Cluster: main
[SOURCE] db01:3306 [Healthy]
[REPLICA] db02:3306 [Healthy] lag=0s
[REPLICA] db03:3306 [Healthy] lag=0s
2. Pulling the Plug (Automatic Failover)
Now, let's simulate a hard crash by stopping the MySQL service on our primary node, db01.
Instantly, the background daemon detects the disruption, confirms quorum, and executes an automatic failover. When we check the topology again, we can see that db02 has been safely promoted to the new source, and db03 has been re-pointed to it. db01 is correctly marked as unreachable.
# puremyha topology
Cluster: main
[SOURCE] db02:3306 [Healthy]
[REPLICA] db03:3306 [Healthy] lag=0s
[REPLICA] db01:3306 [NodeUnreachable: Network.Socket.connect: <socket: 16>: does not exist (Connection refused)]
3. Bringing the Dead Node Back
Let's fast-forward. We fixed the issue on db01 and brought the server back online.
PureMyHA detects that db01 is alive again, but because of its strict safety principles, it doesn't just blindly start replicating. It waits for operator instruction, marking the node as [NotReplicating].
# puremyha topology
Cluster: main
[SOURCE] db02:3306 [Healthy]
[REPLICA] db03:3306 [Healthy] lag=0s
[REPLICA] db01:3306 [NotReplicating]
4. Rejoining the Cluster
To bring db01 back into the fold safely, we use the demote command. This instructs PureMyHA to configure db01 as a standard replica under our new source, db02.
The cluster is fully healthy again!
# puremyha demote --host db01 --source db02
OK: Demote completed: db01 is now a replica
# puremyha topology
Cluster: main
[SOURCE] db02:3306 [Healthy]
[REPLICA] db03:3306 [Healthy] lag=0s
[REPLICA] db01:3306 [Healthy] lag=0s
5. The Graceful Switchover
Finally, to complete our maintenance, we want our original topology back with db01 at the helm. Since this is a planned operation, we demand zero data loss. We trigger a manual switchover.
PureMyHA handles the delicate dance under the hood: locking writes, waiting for db01 to catch up to the exact GTID, promoting it, and re-pointing db02 and db03.
# puremyha switchover --to=db01
OK: Switchover completed
# puremyha topology
Cluster: main
[SOURCE] db01:3306 [Healthy]
[REPLICA] db02:3306 [Healthy] lag=0s
[REPLICA] db03:3306 [Healthy] lag=0s
And just like that, we are exactly back where we started. No missing transactions, no split-brain, no manual GTID math—just clean, predictable operations.
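The "delicate dance" of a lossless switchover can be summarized as an ordered plan. A hedged sketch follows (the function and step layout are invented for this article; the SQL named in the steps is standard MySQL 8.x):

```haskell
-- Illustrative outline of a lossless switchover plan. Order matters:
-- writes are frozen first, the target catches up fully, only then is it
-- promoted and everyone else re-pointed.
switchoverPlan :: String -> String -> [String] -> [String]
switchoverPlan oldSource newSource otherReplicas =
  [ "on " ++ oldSource ++ ": SET GLOBAL super_read_only = ON"          -- freeze writes
  , "on " ++ newSource ++ ": SELECT WAIT_FOR_EXECUTED_GTID_SET(...)"   -- catch up to the exact GTID
  , "on " ++ newSource ++ ": SET GLOBAL read_only = OFF"               -- promote
  ] ++
  [ "on " ++ r ++ ": CHANGE REPLICATION SOURCE TO SOURCE_HOST='" ++ newSource ++ "'"
  | r <- oldSource : otherReplicas ]                                   -- re-point everyone

main :: IO ()
main = mapM_ putStrLn (switchoverPlan "db02" "db01" ["db03"])
```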
Engineering Decisions
Why Haskell for an HA Manager?
When building infrastructure tooling—especially an HA manager like PureMyHA—you are dealing with a hostile environment. Network partitions, split-brain scenarios, and manual interventions create a chaotic storm of asynchronous events.
Managing this state machine correctly is the difference between keeping a database highly available and accidentally wiping out production. To handle this "complex state management" and "concurrent execution" safely, Haskell's language guarantees proved to be invaluable.
1. Pure Functions for State Transitions (Fearless Testing)
The Problem: In many languages, I/O (like polling MySQL) and state mutations get tangled together. This creates a breeding ground for race conditions and makes thorough testing nearly impossible without complex mocking.
The Haskell Solution: Haskell enforces strict separation between side effects (I/O) and business logic. We modeled the core state transitions as a completely pure function (a Reducer): State -> Event -> State.
Because this logic is detached from the network or database, we can write lightning-fast, exhaustive unit tests that cover thousands of edge cases deterministically.
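To make this concrete, here is a hedged miniature of the idea (all names are invented for this article, not PureMyHA's actual types): a pure reducer that implements the consecutive-failure threshold, testable with nothing but a fold.

```haskell
-- Hypothetical sketch of a pure state reducer. No I/O anywhere: probing
-- MySQL produces Events elsewhere; this function only folds them into state.
data NodeState = Healthy | Suspect Int | Failed
  deriving (Eq, Show)

data Event = ProbeOk | ProbeFailed
  deriving (Eq, Show)

-- A node is only declared Failed after n consecutive probe failures;
-- a single successful probe resets the counter.
reduce :: Int -> NodeState -> Event -> NodeState
reduce _ _           ProbeOk     = Healthy
reduce n (Suspect k) ProbeFailed
  | k + 1 >= n = Failed
  | otherwise  = Suspect (k + 1)
reduce n Healthy     ProbeFailed = if n <= 1 then Failed else Suspect 1
reduce _ Failed      ProbeFailed = Failed

main :: IO ()
main = do
  -- Three consecutive failures with threshold 3: declared Failed.
  print (foldl (reduce 3) Healthy [ProbeFailed, ProbeFailed, ProbeFailed])
  -- A success in between resets the counter: still only Suspect.
  print (foldl (reduce 3) Healthy [ProbeFailed, ProbeOk, ProbeFailed])
```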
2. Lock-Free Concurrency with STM
The Problem: Traditional concurrency models rely on mutexes and locks. This inevitably leads to deadlocks, forgotten unlocks, or Time-of-Check to Time-of-Use (TOCTOU) bugs—fatal flaws for an HA manager.
The Haskell Solution: Software Transactional Memory (STM). By using atomically blocks, we can treat the entire flow—popping an event from a queue, calculating the new state, and updating the state—as an indivisible transaction. If a conflict occurs, the Haskell runtime safely and automatically retries the transaction. Data corruption between threads is structurally impossible.
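As a hedged illustration of the pattern (names invented for this article, not PureMyHA's actual code), popping an event and updating shared state can live inside one STM transaction:

```haskell
import Control.Concurrent.STM
import Control.Monad (replicateM_)

data Event = ReplicaLost | ReplicaRecovered deriving (Eq, Show)

-- Reading the queue and updating the failure counter happen in ONE
-- transaction: no thread can observe a half-applied update, and the
-- runtime retries automatically on conflict.
step :: TQueue Event -> TVar Int -> STM ()
step queue failures = do
  ev <- readTQueue queue            -- retries (blocks) until an event exists
  case ev of
    ReplicaLost      -> modifyTVar' failures (+ 1)
    ReplicaRecovered -> writeTVar failures 0

runDemo :: IO Int
runDemo = do
  queue    <- newTQueueIO
  failures <- newTVarIO 0
  mapM_ (atomically . writeTQueue queue) [ReplicaLost, ReplicaLost, ReplicaRecovered]
  replicateM_ 3 (atomically (step queue failures))
  readTVarIO failures

main :: IO ()
main = runDemo >>= print   -- 0: two losses, then a recovery resets the counter
```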
3. Strict Modeling with Algebraic Data Types (ADTs)
The Problem: Relying on boolean flags or string constants to track state leads to "impossible states" or loss of context (e.g., knowing a TopologyDrift occurred, but losing the specific reason why).
The Haskell Solution: ADTs allow us to model facts and events rigorously. Every state transition is explicit. Even better, the compiler enforces exhaustive pattern matching. If we introduce a new failure scenario or event type, the code literally will not compile until we have explicitly handled that event everywhere in the system. No unhandled exceptions at runtime.
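A hedged miniature of what this looks like (the event model is invented for this article): the reason travels with the event, and with -Wall / -Wincomplete-patterns, forgetting a case after adding a constructor is caught at compile time, not in production.

```haskell
-- Hypothetical event model. Adding a new DriftReason or ClusterEvent
-- constructor makes every non-exhaustive match a compile-time diagnostic.
data DriftReason
  = UnexpectedSource String   -- a replica follows a node we don't manage
  | ErrantGtid String         -- transactions on a replica missing on the source
  deriving (Eq, Show)

data ClusterEvent
  = NodeDown String
  | TopologyDrift DriftReason -- the *reason* is carried with the event
  deriving (Eq, Show)

describe :: ClusterEvent -> String
describe (NodeDown host)                      = "node down: " ++ host
describe (TopologyDrift (UnexpectedSource h)) = "drift: unexpected source " ++ h
describe (TopologyDrift (ErrantGtid set))     = "drift: errant GTID set " ++ set

main :: IO ()
main = putStrLn (describe (TopologyDrift (ErrantGtid "3E11FA47:5")))
```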
4. Natural Backpressure and Async Side-Effects
The Problem: An HA manager often needs to run external hook scripts (like Slack alerts or DNS updates). If these scripts hang or events spike, it can block the main monitoring loop or cause memory exhaustion.
The Haskell Solution: Haskell makes robust concurrency patterns trivial. Using bounded queues (TBQueue) provides natural backpressure, preventing the system from being overwhelmed. Furthermore, executing side-effects asynchronously (fire-and-forget) while maintaining thread safety takes only a few lines of code using the async library.
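A hedged sketch of the shape (names invented for this article; this version uses base and stm only, where the real code could also reach for the async package): a bounded TBQueue gives backpressure because writeTBQueue blocks when the queue is full, while a worker thread drains hooks without ever blocking the monitoring loop.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (forever, replicateM)

-- Enqueue hook jobs into a bounded queue and let a background worker run
-- them. If more than 8 jobs are pending, the producer blocks: natural
-- backpressure instead of unbounded memory growth.
runHooks :: [String] -> IO [String]
runHooks names = do
  hooks <- newTBQueueIO 8
  done  <- newTQueueIO
  _ <- forkIO . forever $ do
         name <- atomically (readTBQueue hooks)   -- drain in FIFO order
         atomically (writeTQueue done ("ran " ++ name))
  mapM_ (atomically . writeTBQueue hooks) names   -- "fire" from the main loop
  replicateM (length names) (atomically (readTQueue done))

main :: IO ()
main = runHooks ["pre_failover", "on_promoted"] >>= mapM_ putStrLn
```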
The Unix Philosophy: Simple by Deliberate Omission
In an era where database tools tend to evolve into heavyweight, all-in-one platforms, PureMyHA takes a step back. It embraces the classic Unix philosophy: write programs that do one thing and do it well.
Here are the core principles that guided its architecture:
🔪 1. Do One Thing Well (and Delegate the Rest)
PureMyHA is a highly focused HA tool. It is not a proxy, a query router, or a schema migration framework. It detects failures, promotes a replica safely, and gets out of the way.
Furthermore, it strictly delegates what it does not own. For example, PureMyHA does not implement its own distributed consensus or leader election for the daemon itself. Its own high availability is delegated entirely to tools like Pacemaker, which are already purpose-built for that exact problem.
🛡️ 2. Correctness Before Convenience
A failover that corrupts data is worse than no failover at all. Every decision made by PureMyHA is strictly GTID-aware. It actively detects and neutralizes errant GTIDs, waits for relay log application before promotion, and identifies split-brain scenarios before acting. Safety is never compromised for speed.
🚫 3. Simple by Deliberate Omission
We explicitly target MySQL 8.4+ and nothing else. There is no support for legacy syntax, older authentication plugins (mysql_native_password), or non-GTID topologies. Saying "no" to backward compatibility layers keeps the codebase remarkably small, auditable, and inherently correct.
📦 4. Pure, Stateless, and Dependency-Free
Pure Haskell: PureMyHA is built entirely on mysql-haskell, a pure-Haskell MySQL client. There is no libmysqlclient, no CGo, and no FFI. The result is a single, statically-linked binary that you can drop in and run anywhere.
Stateless by Design: The daemon itself holds no durable state. All topology knowledge is dynamically derived directly from MySQL on startup and continuously refreshed. If the daemon crashes, recovery is trivially safe—just restart it.
🔍 5. Transparent Operation
Infrastructure operators need to trust their tools. PureMyHA provides dry-run modes, config hot-reloads, and pause/resume controls, giving operators full visibility and control over the cluster without ever requiring a daemon restart.
Wrapping Up
The database ecosystem is inherently complex, and the stakes in production are incredibly high. However, by leveraging Haskell's robust concurrency guarantees and adhering strictly to the Unix philosophy—doing one thing and doing it flawlessly—I believe PureMyHA brings a breath of fresh air to modern MySQL 8.4 operations.
It proves that infrastructure tooling can be incredibly safe without being highly complex.
Try It Out! 🚀
If you are running MySQL 8.4 (or planning an upgrade soon) and feel that existing HA solutions are either overkill for your modest setup or too complex to audit, I highly encourage you to give PureMyHA a spin.
Grab the binary, spin up a few local Docker containers, and try pulling the plug on a source node. I think you'll appreciate how smoothly and predictably it handles the chaos.
Star, Fork, and Contribute 🌟
This project is completely open-source, and I would love to grow it with the community.
- Drop a Star: If you found this article interesting or like the architectural approach, please consider giving the project a ⭐ Star on GitHub! It means the world to me and helps the project gain visibility.
- Issues & PRs Welcome: I am actively looking for feedback from real-world operators. Did you find an edge case? Do you have an idea for a new webhook trigger? Please open an Issue! Pull Requests—whether for code, tests, or documentation—are more than welcome.
Let's build a rock-solid, minimalist HA ecosystem for MySQL 8.4 together. Thanks for reading!
