
teaganga


Popular tools used in ETL/ELT workflows

I compiled this categorized list of popular tools used in ETL/ELT workflows with the help of ChatGPT.


🚀 Orchestration & Workflow Management

Tools that schedule, coordinate, and monitor data pipelines.

  • Apache Airflow
  • Prefect
  • Dagster
  • Luigi
  • Azkaban
  • Apache Oozie
  • Argo Workflows (Kubernetes-native)
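
To make "schedule and coordinate" a bit more concrete, here is a minimal sketch of the idea these tools have in common: tasks form a dependency graph and run in dependency order. It uses only Python's standard library (`graphlib`, Python 3.9+), not the API of any tool listed above. The task names and the `run_pipeline` helper are hypothetical, made up for illustration.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run every task after all of its upstream tasks.

    tasks: dict mapping task name -> zero-argument callable
    dependencies: dict mapping task name -> set of upstream task names
    Returns the order in which the tasks were executed.
    """
    # static_order() yields each node only after all of its predecessors.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order


# Hypothetical three-step pipeline; each task just records that it ran.
log = []
tasks = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

order = run_pipeline(tasks, dependencies)
print(order)
```

Real orchestrators add scheduling, retries, monitoring, and distributed execution on top of this basic ordering step.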

🏗️ Transformation Frameworks (ELT)

Tools that transform data inside warehouses (dbt-like).

  • dbt (data build tool)
  • Dataform (now part of Google Cloud)
  • SQLMesh
  • Transform (metrics layer; acquired by dbt Labs)
  • Coalesce
  • Meltano (plugin-based pipelines including dbt)
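
The "transform inside the warehouse" pattern can be sketched with an in-memory SQLite database standing in for the warehouse: raw rows are loaded first, and a SQL statement then builds a cleaned table from them. The table names, column names, and the `build_models` helper are assumptions for illustration only; this is not how dbt or any other tool above is implemented.

```python
import sqlite3


def build_models(conn):
    """Load raw rows, then derive a cleaned table with SQL inside the database."""
    # Load step: raw data lands untransformed.
    conn.execute(
        "CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [(1, 1250, "paid"), (2, 800, "cancelled"), (3, 4300, "paid")],
    )
    # Transform step: the database engine does the work, not the client.
    conn.execute(
        """
        CREATE TABLE orders_paid AS
        SELECT id, amount_cents / 100.0 AS amount
        FROM raw_orders
        WHERE status = 'paid'
        """
    )
    return conn.execute("SELECT id, amount FROM orders_paid ORDER BY id").fetchall()


conn = sqlite3.connect(":memory:")
rows = build_models(conn)
print(rows)
```

The point of the sketch is the order of operations: load first, transform afterwards, with the transformation expressed as SQL that runs where the data already lives.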

🔄 ETL/ELT Platforms (Full Stack)

All-in-one platforms for extraction, loading, and some transformation.

  • Talend
  • Informatica PowerCenter / Informatica Cloud
  • Fivetran
  • Stitch
  • Matillion
  • Airbyte
  • Hevo Data
  • Pentaho Data Integration (Kettle)
  • SSIS (SQL Server Integration Services)
  • AWS Glue

🧰 Streaming & Real-Time ETL

For event-driven or continuous data processing.

  • Apache Kafka
  • Kafka Connect
  • Apache Flink
  • Apache Spark Streaming
  • ksqlDB
  • Redpanda
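
As a rough illustration of continuous processing, the sketch below consumes events one at a time and emits per-key counts whenever a fixed-size (tumbling) time window closes. It is plain Python with a generator standing in for a stream and does not use Kafka, Flink, or any other tool from this list. The event shape `(timestamp, key)` and the function name are assumptions, and events are assumed to arrive in timestamp order.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, counts) each time a window closes.

    events: iterable of (timestamp_seconds, key), sorted by timestamp.
    """
    current_start = None
    counts = Counter()
    for ts, key in events:
        # Align the timestamp to the start of its window.
        start = ts - (ts % window_seconds)
        if current_start is None:
            current_start = start
        elif start != current_start:
            # The event belongs to a later window: emit the finished one.
            yield current_start, dict(counts)
            current_start, counts = start, Counter()
        counts[key] += 1
    # Flush the last open window once the input ends.
    if current_start is not None:
        yield current_start, dict(counts)


# Hypothetical event stream: (timestamp in seconds, event type).
events = [(0, "click"), (3, "view"), (4, "click"), (11, "click"), (25, "view")]
windows = list(tumbling_window_counts(events, 10))
print(windows)
```

Production stream processors also handle out-of-order events, fault tolerance, and state that does not fit in memory, none of which this sketch attempts.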

💾 Data Processing Engines

Heavy-duty compute frameworks for batch or streaming.

  • Apache Spark
  • Apache Beam
  • Databricks
  • Google Cloud Dataflow

🏢 Cloud-Native ETL Services

Managed ETL options provided by cloud platforms.

  • AWS Glue
  • AWS Data Pipeline
  • Azure Data Factory
  • Google Cloud Dataflow
  • Google Cloud Data Fusion
  • Snowflake Tasks & Streams

📊 Data Integration + iPaaS (Low-code)

Often business-friendly and UI-driven.

  • Zapier (simple workflows)
  • Make.com
  • Workato
  • Boomi
  • Tray.io

Below is an expanded list of ETL/ELT tools, including many lesser-known, specialized, and enterprise options, broken down by category.


🔥 More ETL / ELT Tools (Extended List)

🛠️ Orchestration & Workflow Engines (More)

Beyond Airflow, Prefect, Dagster, Luigi:

  • Airlift
  • Metaflow (Netflix)
  • Kedro (QuantumBlack)
  • Flyte (Lyft)
  • Kubeflow Pipelines
  • Apache NiFi
  • Control-M (BMC)
  • Tecton (feature pipelines)
  • Google Cloud Composer (managed Airflow)
  • Azure Data Factory Pipelines
  • Astronomer (managed Airflow)

🧱 Transformation-Only Tools (More ELT / SQL Modeling)

Beyond dbt, Dataform, SQLMesh:

  • Cube (metrics layer, modeling)
  • MetricFlow (built by Transform, now part of the dbt Semantic Layer)
  • MindsDB (AI transforms)
  • LookML (Looker): transformation/model layer on the BI side
  • Y42 (full stack, but dbt-like transformations)
  • Narrator.ai (modeling methodology)
  • PipeRider (data testing and profiling)
  • Datafold (data quality + diffs for SQL)

🔄 Extraction & Loading Tools (More Connectors)

In addition to Fivetran, Stitch, Airbyte, Hevo:

  • Singer (Tap/Target framework)
  • RudderStack (CDP + pipelines)
  • Segment (customer data pipelines)
  • Blendo
  • Etleap
  • Xplenty (now Integrate.io)
  • CloverDX
  • FlyData
  • Matillion (also supports transformations)
  • Portable.io (long-tail connector provider)
  • Grouparoo (open-source reverse ETL)
  • Hightouch (reverse ETL)
  • Census (reverse ETL)

🧬 Reverse ETL (More)

Sending data back to SaaS tools:

  • Polytomic
  • OmniAnalytics reverse ETL
  • Workato (enterprise integration)

Real-Time & Streaming ETL (More)

Beyond Kafka, Flink, Spark Streaming:

  • Apache Pulsar
  • Apache Storm
  • Materialize (streaming SQL)
  • Rockset
  • Confluent Cloud (managed Kafka ecosystem)
  • StreamSets
  • Decodable
  • Quix
  • Estuary Flow

💾 Processing Engines & Compute Frameworks (More)

In addition to Spark, Beam, Dataflow:

  • Snowpark (Snowflake’s developer framework for running Python, Java, and Scala code inside Snowflake)
  • Presto / Trino
  • Dask
  • Ray
  • Hive
  • ClickHouse pipelines
  • Delta Live Tables (Databricks)
  • EMR (AWS for Spark/Hadoop)

🌥️ Cloud ETL / Integration Tools (More)

Cloud-specific ETL services:

AWS

  • AWS Glue Studio
  • AWS Lambda pipelines
  • AWS Step Functions
  • Amazon AppFlow

Azure

  • Azure Databricks
  • Synapse Pipelines
  • Logic Apps

Google Cloud

  • BigQuery Data Transfer Service (DTS)
  • GKE + Argo Workflows
  • Workflows (GCP native orchestration)

🤖 ML-Focused Pipelines

ETL for feature engineering, ML data prep:

  • Feast (feature store)
  • Tecton (enterprise feature store)
  • ZenML
  • MLflow Recipes (formerly MLflow Pipelines)
  • Hopsworks

🧩 Enterprise ETL Platforms (More)

Besides Talend, Informatica, SSIS:

  • IBM DataStage
  • SAS Data Integration Studio
  • SAP Data Services
  • Oracle Data Integrator (ODI)
  • Ab Initio
  • Alteryx
  • Qlik Replicate (formerly Attunity)
  • SnapLogic
  • Informatica MDM
  • Collibra Data Quality (DQ Pipelines)
