ElasticRelay is an open-source CDC gateway that streamlines real-time data synchronization from MySQL, PostgreSQL, and MongoDB into Elasticsearch. It provides a lightweight, reliable alternative to heavy streaming platforms by integrating data governance, batch writing, and failure recovery into a single, Go-based pipeline.
In many teams, Elasticsearch is no longer “just a search engine.”
It has become a core piece of infrastructure for search, operational analytics, log correlation, real‑time dashboards, and accelerated business queries.
With that evolution comes a familiar challenge: upstream data is usually scattered across OLTP databases such as MySQL, PostgreSQL, and MongoDB. How do you reliably, continuously, and with low operational overhead synchronize those changes into Elasticsearch?
Traditional approaches can solve the problem, but often at a high cost. You may need to maintain complex data pipelines, handle the transition between initial full imports and incremental updates, manage checkpoints and retries, tune indexing performance, and deal with field cleanup, masking, filtering, and index routing. For teams that do not want to introduce a heavy streaming platform, these tasks can quickly consume project time and energy.
This is exactly the problem ElasticRelay is designed to solve.
What Is ElasticRelay?
From its codebase and repository structure, ElasticRelay’s positioning is very clear:
It is a multi‑source CDC (Change Data Capture) gateway purpose‑built for Elasticsearch, continuously synchronizing data changes from MySQL, PostgreSQL, and MongoDB into Elasticsearch.
ElasticRelay is also fully open source, independently developed by Yogoo Software Co., Ltd.
For teams that care about open‑source ecosystems and long‑term sustainability, this matters: you can use it out of the box, or extend, integrate, and customize it based on real, production‑grade code.

Currently, ElasticRelay supports three common database types:
- MySQL: binlog‑based change capture, with support for initial sync and parallel snapshots
- PostgreSQL: logical replication / WAL‑based incremental capture with LSN management
- MongoDB: Change Streams‑based real‑time subscriptions, compatible with replica sets and sharded clusters
In one sentence, ElasticRelay can be described as:
A CDC middleware layer designed specifically for Elasticsearch, making database‑to‑index synchronization simpler, more direct, and easier for backend teams to operate independently.
Why the Elastic Community Should Care
For Elasticsearch users, the hardest part is rarely “can I write data into ES?”
The real pain point is “can I write it continuously, reliably, and at low cost?”
ElasticRelay addresses this with several very pragmatic design choices.
1. Unified Multi‑Source Configuration Model
ElasticRelay consolidates multiple database sources into a single Go service.
Its MultiConfig model separates data_sources, sinks, jobs, and global settings, allowing teams to define—using a single configuration approach—which source writes to which Elasticsearch target and under which job.
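As an illustration of that separation, a MultiConfig file might look roughly like the sketch below. The field names and shapes here are illustrative, not the exact schema; consult the sample configuration files in the repository for the authoritative format.

```json
{
  "data_sources": [
    { "name": "orders-mysql", "type": "mysql", "host": "127.0.0.1", "port": 3306 }
  ],
  "sinks": [
    { "name": "search-es", "type": "elasticsearch", "addresses": ["http://127.0.0.1:9200"] }
  ],
  "jobs": [
    { "name": "orders-to-es", "source": "orders-mysql", "sink": "search-es" }
  ],
  "global": { "log_level": "info" }
}
```

The point of the shape is that sources and sinks are declared once and reused; a job is just a named link between one source and one sink.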
2. A Write Path Designed Around Elasticsearch
The ES sink uses the official go-elasticsearch/v8 client and BulkIndexer for batched writes.
Index names are dynamically generated based on _table or _collection metadata in events, for example:
- elasticrelay-users
- elasticrelay-orders
This is especially friendly for teams that want to split indices by business entities.
3. Built‑In Data Governance
ElasticRelay embeds “governance” directly into the synchronization pipeline.
Its Transform Engine supports rule‑based matching by source and table/collection, enabling:
- filtering
- field mapping
- type conversion
- expression processing
- data masking
As a result, much of the “pre‑index cleanup” work no longer needs to live in ad‑hoc scripts or external services.
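To make the filter-and-mask step concrete, here is a minimal, self-contained Go sketch of this kind of rule application. It is not ElasticRelay's actual Transform Engine API; the `Event` type, rule maps, and masking helper are all hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Event is a simplified change event: column names mapped to values.
type Event map[string]any

// maskEmail keeps the first character and the domain, masking the rest,
// e.g. "alice@example.com" becomes "a***@example.com".
func maskEmail(s string) string {
	at := strings.Index(s, "@")
	if at <= 1 {
		return "***"
	}
	return s[:1] + "***" + s[at:]
}

// applyRules drops internal fields and masks sensitive string fields,
// returning a new event ready for indexing.
func applyRules(e Event, drop, mask map[string]bool) Event {
	out := Event{}
	for k, v := range e {
		if drop[k] {
			continue // filtered out before indexing
		}
		if mask[k] {
			if s, ok := v.(string); ok {
				v = maskEmail(s)
			}
		}
		out[k] = v
	}
	return out
}

func main() {
	e := Event{"id": 1, "email": "alice@example.com", "_internal": "tmp"}
	out := applyRules(e,
		map[string]bool{"_internal": true}, // fields to drop
		map[string]bool{"email": true})     // fields to mask
	fmt.Println(out["email"]) // a***@example.com
}
```

In a real rule system the drop/mask sets would be matched per source and table pattern rather than hard-coded, but the per-event transformation has this shape.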
4. Failure‑ and Recovery‑Oriented Design
When sink writes fail, ElasticRelay persists events into a durable **DLQ (Dead Letter Queue)** and ties them to checkpoints for recovery and retry. Failures are not silently dropped.
How the Internal Data Pipeline Works
ElasticRelay is not an abstract concept—it is a very concrete data pipeline.
Core Components
- Connector: reads changes from databases
- Orchestrator: manages job lifecycle and synchronization flow
- Transform Engine: applies rule‑based data governance
- ES Sink: handles batched writes into Elasticsearch
- DLQ: persists failed events and manages retries and cleanup
Runtime Flow
A typical runtime flow looks like this:
- Create a synchronization job defining source, sink, and job configuration
- If required, run an initial snapshot to import existing data into Elasticsearch
- Start CDC to continuously receive incremental changes
- Events enter an asynchronous batching queue, decoupling database reads from sink write speed
- Batched events pass through the Transform Engine for filtering, mapping, and masking
- Processed events are streamed to the ES sink using bulk writes
- On success, checkpoints are committed; on failure, events are written to the DLQ for retry
A key design choice here is the explicit decoupling of database change reading from Elasticsearch writes.
Binlog / WAL consumption is not directly blocked by temporary ES slowdowns, which is critical for system stability.
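The decoupling can be sketched in a few lines of Go: a buffered channel sits between the reader and the batcher, so the reader only blocks once the buffer itself fills up. This is a simplified illustration, not ElasticRelay's actual code; a real pipeline would also flush on a timer, handle errors, and commit checkpoints.

```go
package main

import "fmt"

// drainInBatches reads events from ch and groups them into batches of at
// most maxBatch, calling flush for each batch. In a CDC pipeline the
// flush step would be a bulk write; here it is just a callback.
func drainInBatches(ch <-chan string, maxBatch int, flush func([]string)) {
	batch := make([]string, 0, maxBatch)
	for ev := range ch {
		batch = append(batch, ev)
		if len(batch) == maxBatch {
			flush(batch)
			batch = make([]string, 0, maxBatch)
		}
	}
	if len(batch) > 0 {
		flush(batch) // flush the final partial batch
	}
}

func main() {
	// The buffer is what absorbs short sink slowdowns: the reader keeps
	// consuming binlog/WAL events until 1024 are queued.
	events := make(chan string, 1024)

	// Reader goroutine: simulates binlog/WAL consumption.
	go func() {
		for i := 0; i < 10; i++ {
			events <- fmt.Sprintf("event-%d", i)
		}
		close(events)
	}()

	drainInBatches(events, 4, func(b []string) {
		fmt.Println("flush", len(b), "events") // stand-in for a bulk write
	})
}
```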
Features Especially Friendly to Elasticsearch Use Cases
1. Dynamic Index Naming
ElasticRelay extracts table or collection names from events and automatically generates target index names using a prefix.
This allows a single pipeline to naturally route different business entities into separate indices.
This is valuable for scenarios such as:
- Writing orders, users, and products into separate indices
- Applying different mappings and query strategies per entity
- Aligning index lifecycle management with business domains
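A minimal Go sketch of this routing logic might look as follows. The `elasticrelay-` prefix matches the index examples above, and `_table`/`_collection` are the event metadata keys mentioned earlier; the function itself is illustrative, not ElasticRelay's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// indexName derives the target index from event metadata using the
// prefix + table/collection pattern described above.
func indexName(prefix string, meta map[string]string) string {
	name := meta["_table"] // relational sources set _table
	if name == "" {
		name = meta["_collection"] // MongoDB sources set _collection
	}
	if name == "" {
		name = "unknown"
	}
	// Elasticsearch index names must be lowercase.
	return prefix + strings.ToLower(name)
}

func main() {
	fmt.Println(indexName("elasticrelay-", map[string]string{"_table": "Users"}))       // elasticrelay-users
	fmt.Println(indexName("elasticrelay-", map[string]string{"_collection": "orders"})) // elasticrelay-orders
}
```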
2. Batch Writes Instead of Per‑Document Writes
The ES sink relies on BulkIndexer, which aligns with Elasticsearch’s throughput model.
For continuous CDC streams, batch writes are a baseline requirement for production readiness.
3. Built‑In Data Governance
In real‑world projects, OLTP data is rarely indexed “as‑is.” Common needs include:
- removing internal fields
- unifying field names
- handling nulls and type normalization
- masking sensitive data
- filtering out records that should not be indexed
ElasticRelay’s Transform Engine abstracts these into a rule system matched by source and table patterns, making it a synchronization layer tailored specifically for search index construction—not just database replication.
4. Failure‑Aware by Design
Real pipelines encounter failures: ES outages, index creation errors, network hiccups, rule evaluation exceptions.
ElasticRelay’s philosophy is not “ignore failures,” but to persist them in DLQ with error context, retry counts, and checkpoints.
For Elasticsearch users, this turns silent data loss into an observable, traceable, and recoverable engineering problem.
Who Is ElasticRelay For?
ElasticRelay is particularly well‑suited for teams that:
- Have production databases and want to quickly sync data into Elasticsearch for search or analytics
- Need to integrate MySQL, PostgreSQL, and MongoDB without maintaining separate solutions
- Do not want to introduce heavy infrastructure like Kafka or Flink solely for CDC
- Want field governance, masking, and filtering built into the pipeline
- Prefer a dedicated service responsible for “database → Elasticsearch” synchronization
Its engineering style is also very clear: a single Go binary, gRPC for internal APIs, JSON‑driven configuration, and multi‑stage Docker builds.
This form factor is especially friendly for small to mid‑sized teams in terms of deployment and operational cost.
A Practical Way to Get Started
If this is your first time working with ElasticRelay, you do not need to understand all of its internal implementation up front. A more practical approach is to first run a minimal end-to-end pipeline: prepare the configuration, fill in the database and Elasticsearch connection details, start the service, and then check whether data is beginning to flow into your index.
Using standalone deployment as an example, the entire process can be reduced to just a few steps:
- Create the `config`, `logs`, and `dlq` directories, and copy a sample configuration file as your starting point.
- Fill in `data_sources`, `sinks`, and `jobs` in the configuration, linking MySQL, PostgreSQL, or MongoDB to the target Elasticsearch cluster.
- Start the service with `docker-compose -f docker-compose.elasticrelay.yml up -d`.
- Verify that both the initial sync and subsequent incremental sync are working by checking the logs, checkpoints, and Elasticsearch index status.
What makes the experience user-friendly is not just the small number of steps, but also the clarity of the configuration model itself. You do not need to set up a full messaging or stream-processing stack first. Instead, you can organize everything around three simple questions: where the data comes from, where it goes, and under what rules it is synchronized.
For many teams, this means they can build a working proof of concept in a very short time. Start with a single table or collection, sync its fields into a target index such as myapp-users, and once the pipeline is confirmed to be stable, gradually expand to more tables, more rules, and more complete data-governance logic. That “start simple, then refine” path is exactly what makes ElasticRelay easy to adopt.
Why Elastic Users Would Care About ElasticRelay
For many Elasticsearch projects, the real challenge is often not how to write queries or design indexes, but how to reliably and continuously synchronize data from upstream business databases.
This is exactly where ElasticRelay shows its value. It is not a general-purpose data platform; instead, it is a lighter, more focused solution built around typical Elasticsearch integration needs. It captures changes from multiple source databases, processes them through bulk writes, rule-based transformations, checkpoint management, and failure recovery, and ultimately forms a synchronization pipeline that can be put into practice.
That is also why it deserves attention from the Elastic community. For teams building search, analytics, or real-time query capabilities, ElasticRelay offers more than just the ability to "write data into ES" - it provides an engineering solution that is easier to deploy, easier to maintain, and better aligned with real-world business scenarios.
Conclusion
From an implementation perspective, ElasticRelay already shows a solid production‑oriented shape: multi‑source ingestion, initial sync, incremental CDC, batch writes, rule‑based transformation, DLQ, and checkpointing are all part of a single main pipeline.
For teams looking to reduce the complexity of Elasticsearch data ingestion, tools like this offer immediate, tangible value.
For Elastic Meetup discussions, ElasticRelay represents more than “yet another sync tool.”
It reflects a growing engineering preference:
Instead of building ever‑larger pipelines, build focused systems that make the database‑to‑Elasticsearch path deep, stable, and understandable.
Looking ahead, two promising directions would be richer observability and a more mature control plane with visual configuration. Even in its current state, however, ElasticRelay clearly demonstrates one idea:
Making Elasticsearch data ingestion simpler is itself a valuable form of innovation.
As a fully open‑source project developed by Shanghai Yogoo Software Co., Ltd., ElasticRelay welcomes more developers and users from the Elastic community to explore, discuss, and contribute.
Project: https://github.com/YogooSoft/elasticrelay
Discussions: https://github.com/YogooSoft/elasticrelay/discussions
Twitter: https://twitter.com/elasticrelay

