Rusab Naeem Khan
How We Built OpenETL: A Simple, Scalable Data Migration Tool for Everyone 🚀

Why Did We Do It?

A while ago, my friends and I were freelancing as data engineers, and every day we found ourselves migrating data for
clients. It was always the same routine: writing custom scripts for each project, juggling different tools, and
repeating the whole process from scratch. We thought, "Why not write boilerplate code we can reuse for all these
migrations?" But we didn't stop there. We decided to make it more accessible by adding a UI and turning it into a
full-fledged app!

You can find the code here: OpenETL on GitHub (https://github.com/RusabKhan/OpenETL).

That's how OpenETL was born. It’s an open-source ETL tool designed to simplify data migration with minimal setup.

We worked hard to maintain code quality while keeping the tool beginner-friendly. Our goal was to help people who deal
with daily data migration tasks, whether they're just starting out or are mid-level engineers. The design is easy to
understand and use, so you can focus on your work instead of battling complex configurations.


How to Use OpenETL

Before starting, ensure you have the following (a quick way to verify each item is sketched after the list):

  1. Python Installed: OpenETL is a Python-based tool. Install Python 3.7 or later.

  2. Access to HubSpot: You’ll need to generate an API key or private token.

  3. PostgreSQL Database: Ensure you have a running PostgreSQL instance.

  4. Docker: Install Docker to run OpenETL in a container.
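
A quick way to confirm these prerequisites from the command line (assuming a typical Linux/macOS shell; psql is only needed if you want to test the database outside OpenETL):

python3 --version          # should report 3.7 or later
docker --version           # Docker engine
docker compose version     # Compose v2 plugin, used in the commands below
psql --version             # optional: PostgreSQL client for manual checks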

Step 1: Configure Environment Settings

Rename and edit the .env file in the OpenETL directory to include your environment configuration:

OPENETL_DOCUMENT_HOST=localhost  # Replace with your host
OPENETL_DOCUMENT_DB=airflow  # Replace with your database name
OPENETL_DOCUMENT_SCHEMA=open_etl  # Replace with your schema
OPENETL_DOCUMENT_USER=MY_USER  # Replace with your username
OPENETL_DOCUMENT_PASS=1234  # Replace with your password
OPENETL_DOCUMENT_PORT=5432  # Replace with your port
OPENETL_DOCUMENT_ENGINE=PostgreSQL  # Use PostgreSQL (recommended)
OPENETL_HOME=/Users/usr/OpenETL  # Path to OpenETL repository
CELERY_BROKER_URL=redis://redis:6379/0  # Replace with your Redis URL
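
Before starting the containers, it's worth confirming that the credentials in .env actually work. Here's a minimal check with psql, using the example values above; substitute your own host, port, user, and database:

# Connect using the .env values and list schemas; you should be able
# to see or create the open_etl schema
PGPASSWORD=1234 psql -h localhost -p 5432 -U MY_USER -d airflow -c '\dn'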

Step 2: Install and Start OpenETL

Clone the OpenETL repository, install dependencies, and start the application using Docker Compose:

git clone https://github.com/RusabKhan/OpenETL
cd OpenETL
docker compose up --build -d backend && docker compose up --build -d
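
Once the containers are built, you can confirm everything started cleanly. The service name backend comes from the command above; other service names depend on the repository's compose file:

# List the services and their current status
docker compose ps

# Follow the backend logs to watch it come up
docker compose logs -f backend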

Step 3: Setting Up Connections in OpenETL

[Screenshot: the Create Connection screen]

OpenETL makes it easy to configure both source and target connections via its user interface:

1. Source Connection:

– Navigate to the Create Connection screen.

– Select your choice of connector and provide the authentication details (for HubSpot, you can sanity-check your token first; see the sketch after this list).

2. Target Connection:

– Navigate to the Create Connection screen.

– Select the target and enter your database credentials.
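
If HubSpot is your source, a quick token check from the command line can save a failed connection attempt. This uses HubSpot's standard CRM v3 contacts endpoint; HUBSPOT_PRIVATE_APP_TOKEN is a placeholder for your own token:

# Fetch a single contact; an HTTP 200 with JSON means the token works
curl -s -H "Authorization: Bearer $HUBSPOT_PRIVATE_APP_TOKEN" \
  "https://api.hubapi.com/crm/v3/objects/contacts?limit=1"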

Step 4: Creating an ETL Pipeline

[Screenshot: the Create Integration screen]

After configuring connections, you can set up the ETL pipeline in OpenETL:

  1. Navigate to Create ETL from the sidebar.

  2. Specify the source details, such as the table (Contacts) and type (API).

  3. Enter the target details, including your connection.

  4. Configure optional compute settings like Spark or Hadoop if needed, or skip to the next step.

  5. Set ETL parameters:

     – Load Type: Choose full or incremental.

     – Batch Size: Define the number of records processed per batch.

     – Schedule: Specify the pipeline frequency (e.g., hourly, daily).

  6. Click Create Integration to finalize your ETL pipeline.
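
Once the first run completes, you can spot-check the target directly in PostgreSQL. The connection values follow the .env example, and open_etl.contacts is a hypothetical table name; substitute whatever your integration actually writes to:

# Count the rows the pipeline loaded (names are illustrative)
PGPASSWORD=1234 psql -h localhost -p 5432 -U MY_USER -d airflow \
  -c 'SELECT COUNT(*) FROM open_etl.contacts;'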

Step 5: Monitoring the Integration

Once the pipeline is created, you can view its status and logs:

[Screenshot: integration logs]

1. Integration Screen:

– Navigate to the Integrations page.

– Click on the integration ID to view its history and execution details.

2. API Logs:

– Navigate to the Logs screen.

– Click on the integration ID to check logs for troubleshooting or debugging.

3. Dashboard:

[Screenshot: the Dashboard]

– Navigate to the Dashboard screen.

– The Dashboard provides a visual representation of the pipeline’s progress.

Tips for Working with OpenETL

1. Debugging Issues: Use the API logs in OpenETL to identify and resolve errors in the pipeline; for container-level
logs, see the sketch after this list.

2. Data Transformation: OpenETL provides built-in transformations such as mapping, normalization, and null-value
handling.

3. Scheduling Pipelines: Leverage OpenETL's scheduling features or external workflow tools like Apache Airflow for
automation.
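
For the container-level logs mentioned in the first tip, a few Docker Compose commands cover most debugging sessions (service names depend on the repository's compose file; backend is the one used earlier):

# Show the most recent backend log lines
docker compose logs --tail 100 backend

# Follow logs from all services while reproducing an issue
docker compose logs -f

# Open a shell inside the backend container for deeper inspection
docker compose exec backend /bin/sh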
