Rusab Naeem Khan
How We Built OpenETL: A Simple, Scalable Data Migration Tool for Everyone 🚀

Why Did We Do It?

A while ago, my friends and I were freelancing as data engineers, and every day we found ourselves migrating data for
clients. It was always the same routine: writing custom scripts for each project, juggling different tools, and
repeating the whole process from scratch. We thought, "Why not write boilerplate code we can reuse for all these
migrations?" But we didn't stop there. We decided to make it more accessible by adding a UI and turning it into a
full-fledged app!

You can find the code here: OpenETL on GitHub (https://github.com/RusabKhan/OpenETL).

That's how OpenETL was born. It’s an open-source ETL tool designed to simplify data migration with minimal setup.

We worked hard to maintain code quality while keeping the tool beginner-friendly. Our goal was to help people who deal
with daily data migration tasks, whether they're just starting out or are mid-level engineers. The design is easy to
understand and use, so you can focus on your work instead of battling complex configurations.


How to Use OpenETL

Before starting, ensure you have the following (a quick way to verify each item is sketched after the list):

  1. Python Installed: OpenETL is a Python-based tool. Install Python 3.7 or later.

  2. Access to HubSpot: You’ll need to generate an API key or private token.

  3. PostgreSQL Database: Ensure you have a running PostgreSQL instance.

  4. Docker: Install Docker to run OpenETL in a container.
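
A quick way to confirm these prerequisites from the command line (assuming a typical Linux/macOS shell; psql is only needed if you want to test the database outside OpenETL):

python3 --version          # should report 3.7 or later
docker --version           # Docker engine
docker compose version     # Compose v2 plugin, used in the commands below
psql --version             # optional: PostgreSQL client for manual checks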

Step 1: Configure Environment Settings

Rename and edit the .env file in the OpenETL directory to include your environment configuration:

OPENETL_DOCUMENT_HOST=localhost  # Replace with your host
OPENETL_DOCUMENT_DB=airflow  # Replace with your database name
OPENETL_DOCUMENT_SCHEMA=open_etl  # Replace with your schema
OPENETL_DOCUMENT_USER=MY_USER  # Replace with your username
OPENETL_DOCUMENT_PASS=1234  # Replace with your password
OPENETL_DOCUMENT_PORT=5432  # Replace with your port
OPENETL_DOCUMENT_ENGINE=PostgreSQL  # Use PostgreSQL (recommended)
OPENETL_HOME=/Users/usr/OpenETL  # Path to OpenETL repository
CELERY_BROKER_URL=redis://redis:6379/0  # Replace with your Redis URL
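
Before starting the containers, it's worth confirming that the credentials in .env actually work. Here's a minimal check with psql, using the example values above; substitute your own host, port, user, and database:

# Connect using the .env values and list schemas; you should be able
# to see or create the open_etl schema
PGPASSWORD=1234 psql -h localhost -p 5432 -U MY_USER -d airflow -c '\dn'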

Step 2: Install and Start OpenETL

Clone the OpenETL repository, install dependencies, and start the application using Docker Compose:

git clone https://github.com/RusabKhan/OpenETL
cd OpenETL
docker compose up --build -d backend && docker compose up --build -d
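
Once the containers are built, you can confirm everything started cleanly. The service name backend comes from the command above; other service names depend on the repository's compose file:

# List the services and their current status
docker compose ps

# Follow the backend logs to watch it come up
docker compose logs -f backend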

Step 3: Setting Up Connections in OpenETL

[Screenshot: the Create Connection screen]

OpenETL makes it easy to configure both source and target connections via its user interface:

1. Source Connection:

– Navigate to the Create Connection screen.

– Select your choice of connector and provide the authentication details (for HubSpot, you can sanity-check your token first; see the sketch after this list).

2. Target Connection:

– Navigate to the Create Connection screen.

– Select the target and enter your database credentials.
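
If HubSpot is your source, a quick token check from the command line can save a failed connection attempt. This uses HubSpot's standard CRM v3 contacts endpoint; HUBSPOT_PRIVATE_APP_TOKEN is a placeholder for your own token:

# Fetch a single contact; an HTTP 200 with JSON means the token works
curl -s -H "Authorization: Bearer $HUBSPOT_PRIVATE_APP_TOKEN" \
  "https://api.hubapi.com/crm/v3/objects/contacts?limit=1"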

Step 4: Creating an ETL Pipeline

[Screenshot: the Create Integration screen]

After configuring connections, you can set up the ETL pipeline in OpenETL:

  1. Navigate to Create ETL from the sidebar.

  2. Specify the source details, such as the table (Contacts) and type (API).

  3. Enter the target details, including your connection.

  4. Configure optional compute settings like Spark or Hadoop if needed, or skip to the next step.

  5. Set ETL parameters:

     – Load Type: Choose full or incremental.

     – Batch Size: Define the number of records processed per batch.

     – Schedule: Specify the pipeline frequency (e.g., hourly, daily).

  6. Click Create Integration to finalize your ETL pipeline.
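
Once the first run completes, you can spot-check the target directly in PostgreSQL. The connection values follow the .env example, and open_etl.contacts is a hypothetical table name; substitute whatever your integration actually writes to:

# Count the rows the pipeline loaded (names are illustrative)
PGPASSWORD=1234 psql -h localhost -p 5432 -U MY_USER -d airflow \
  -c 'SELECT COUNT(*) FROM open_etl.contacts;'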

Step 5: Monitoring the Integration

Once the pipeline is created, you can view its status and logs:

[Screenshot: integration logs]

1. Integration Screen:

– Navigate to the Integrations page.

– Click on the integration ID to view its history and execution details.

2. API Logs:

– Navigate to the Logs screen.

– Click on the integration ID to check logs for troubleshooting or debugging.

3. Dashboard:

[Screenshot: the Dashboard]

– Navigate to the Dashboard screen.

– The Dashboard provides a visual representation of the pipeline’s progress.

Tips for Working with OpenETL

1. Debugging Issues: Use the API logs in OpenETL to identify and resolve errors in the pipeline; for container-level
logs, see the sketch after this list.

2. Data Transformation: OpenETL provides built-in transformations such as mapping, normalization, and null-value
handling.

3. Scheduling Pipelines: Leverage OpenETL's scheduling features or external workflow tools like Apache Airflow for
automation.
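
For the container-level logs mentioned in the first tip, a few Docker Compose commands cover most debugging sessions (service names depend on the repository's compose file; backend is the one used earlier):

# Show the most recent backend log lines
docker compose logs --tail 100 backend

# Follow logs from all services while reproducing an issue
docker compose logs -f

# Open a shell inside the backend container for deeper inspection
docker compose exec backend /bin/sh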
