gjdickens
Choosing an ETL Tool in 2026

As the founder of Dark Seahorses, a new agency that provides data engineering and analysis services to small businesses, I had the opportunity to survey the current ETL (extract, transform, load) landscape in 2026 and pick the best tool out there for us.

When picking any tool, it’s important to start with your vision for where the marketplace is heading and how you can add value to it. Only then can you see clearly which tools fit that vision. When it comes to ETL, I see a marketplace heading towards customization, and as an agency it’s our job to deliver that in a way that scales for us and our clients.

Why ETL Will Become More and More Customized

Previously, a lot of data engineering was about piping data from a predictable set of sources into a limited number of destination databases and data warehouses. Several companies succeeded in building connector platforms that provided these pipes through a low-code or no-code interface. This freed up developer resources while empowering non-technical analysts to take more control over their data pipelines.

Recently, however, two trends are disrupting the dynamics of this structure in the marketplace. The first of those is the rise of AI-assisted programming which is reducing the amount of time it takes for developers to create custom integrations. And the second of those is the proliferation of APIs and potential data sources to connect to.

What was once a problem with a very centralized solution has now turned into something with a much longer tail. Every company relies on a slightly different set of digital platforms. Many of these are common across industries, while others are not. Legacy centralized platforms naturally prioritized the most profitable data sources to integrate, usually those aimed at digital marketers, which left all the big platforms offering the same limited set of data sources.

The demands of the marketplace require a lot more customization in order to cover all of the different data sources that are out there. But how do you do that at scale while keeping costs under control?

Delivering Customized ETL at Scale

A key tenet of our philosophy as an agency is to get clients onto open-source software that keeps them from being locked into unaffordable platforms while enabling the customization they need. We can scale this up with an assist from AI, but another trend also helps us scale services down when they’re not required.

Serverless computing services (like AWS Lambda) have provided the infrastructure to pay for only the services that are being used at any one time, and are perfectly suited to ETL jobs that often run for only a few minutes every day or week. By taking advantage of serverless architectures, we can keep client costs down to the absolute minimum of what they actually need.
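As a rough sketch of what that looks like in practice, here is the shape of a Lambda-style ETL handler. The extract and load steps are stubs, and the event shape and field names are invented for illustration; a real job would call an API and write to a warehouse.

```python
# Minimal sketch of an ETL job shaped for AWS Lambda: the handler does a
# small extract -> transform -> load pass and returns a summary.
import json

def extract(event):
    # Stub: pretend the invocation event carries the source records.
    return event.get("records", [])

def transform(records):
    # Example transform: keep paid orders and normalize the amount field.
    return [
        {"order_id": r["id"], "amount_usd": round(float(r["amount"]), 2)}
        for r in records
        if r.get("status") == "paid"
    ]

def load(rows):
    # Stub: a real implementation would write to S3, BigQuery, etc.
    return len(rows)

def handler(event, context):
    rows = transform(extract(event))
    return {"statusCode": 200, "body": json.dumps({"rows_loaded": load(rows)})}
```

Because the handler only runs for the seconds the job actually needs, an idle pipeline costs nothing between scheduled invocations.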

Key Decision Criteria Based on Our Vision

So with this vision in mind, we had the following key criteria for choosing the platform we will focus on for new client projects:

  • Open-source: need full control, and pricing that stays affordable as we scale
  • Code-first (as opposed to no-code): need deep customization, increasingly accelerated by AI assistance
  • Lightweight, can run on serverless: need the ability to scale up and down on demand

Proprietary No-Code Platforms

The first category is not one that we seriously considered, as it goes against two of our main criteria: open-source and code-first. But we did think it was important enough to understand the offering of the primary proprietary no-code platforms.

Fivetran

As far as no-code platforms go, Fivetran is considered the industry standard. They have over 300 pre-built connectors and a UI that seems simple enough to get started with. But with no options for custom code and pricing that can add up quickly, it just wasn’t the right fit for our vision.

To give you an idea of how quickly the pricing can add up: our first sample project involved moving 15k rows of data, which already prices out at $12/month for a demo we can run on AWS for pennies.

Fivetran
  • No-code
  • Pre-built connectors
  • Enterprise standard
  • No customization
  • Expensive

Stitch

We saw Stitch as a similar option to Fivetran but with the added con that it seems to be lagging behind in the marketplace in terms of connections. On the plus side, the pricing seems more reasonable than Fivetran’s but again it’s not a fit for us. For someone who can find the connector they’re looking for and just wants a simple no-code connection, this could be a good fit.

Stitch
  • No-code
  • Pre-built connectors, but limited compared to Fivetran
  • Acquired, not much sign of new development
  • Better pricing than Fivetran but can get expensive at scale

Open-Source No-Code Platforms

The next category fits closer to our vision, and we seriously weighed the trade-offs. Would it make sense for us to operate some connections through a no-code interface and only escape out to code when necessary? Or would the orchestration capabilities of a fully baked platform come in handy and save us some AWS tooling?

Airbyte

Airbyte is the name to know in this category, and we had a good long look at it. In the end, we decided that it was a bit overbuilt for what we needed, and we didn’t want to give up the flexibility of running on serverless. We had other ideas for orchestration (like simple AWS cron jobs), so that aspect of the platform didn’t offer us much value. Also, judging from developer comments we saw, debugging issues can be challenging. Getting closer, but heavier-weight than we needed.

Airbyte
  • Largest selection of pre-built connectors
  • Can’t run serverless, needs a dedicated server
  • Data source configuration can be done with no-code
  • Customization possible but main focus is on pre-built connectors

Code-first Tools

Custom Python scripts

I’ve gone down this route before with prior projects and know that you can get a long way with Python and libraries like Pandas and Polars. But some things would be nice to have out of the box: interfaces with AWS or Google BigQuery, plus logic for managing incremental loading and keeping track of load metrics. Ideally we wanted to stay as close as possible to vanilla Python and these core libraries, to avoid learning frameworks that might fall out of favor later, but some additional tooling would be welcome.

Python
  • Fully customizable
  • Libraries like Pandas and Polars make it easy to work with tabular data
  • Better to have some help with schemas, state and incremental loading patterns
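To make the hand-rolled approach concrete, here is a minimal ETL pass using pandas and sqlite3. Both choices are illustrative and the data is invented; note that nothing here handles incremental state, which is exactly the boilerplate we wanted help with.

```python
# A minimal hand-rolled ETL pass: extract from an in-memory source,
# transform with pandas, load into a local SQLite table.
import sqlite3
import pandas as pd

# Extract: stand-in for an API response or CSV read.
raw = pd.DataFrame(
    {"id": [1, 2, 3], "amount": ["10.0", "n/a", "32.5"], "region": ["us", "eu", "us"]}
)

# Transform: coerce types and drop rows that fail to parse.
clean = raw.assign(
    amount=pd.to_numeric(raw["amount"], errors="coerce")
).dropna(subset=["amount"])

# Load: pandas can write straight to a sqlite3 connection.
conn = sqlite3.connect(":memory:")
clean.to_sql("orders", conn, index=False, if_exists="replace")
print(pd.read_sql("SELECT COUNT(*) AS n FROM orders", conn)["n"][0])
```

Swapping SQLite for a warehouse client is straightforward, but schema tracking, retries, and incremental bookmarks all still have to be written and maintained by hand.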

Meltano

Meltano comes really close to delivering everything we were looking for. It’s open-source, code-first and can be run on serverless. Its primary strength is also its primary weakness: it’s built on the open-source Singer specification which was developed by Stitch. This means there are lots of pre-built connectors, but the developer opinions are mixed as to how well these work. The specification is also quite complex and requires a lot of up-front work to get a new data source set up.

We thought about investing the time to learn the specification, but didn’t see much enthusiasm from the developer community in this direction and saw too big a risk that it would become throw-away work. A viable option, but we found something else that fit our needs just a bit better.

Meltano
  • Python first
  • Open-source
  • Has pre-built connectors
  • Lots of up-front effort to understand Singer specification (risk of throw-away work)

dlt

In the end, we chose dlt, a relatively new library for Python that handles a lot of the ETL legwork, like incremental loading and schema inference, while still allowing for full customization of the actual data extraction. You can use as much Python as you want in writing your connections, with very little new syntax to learn.

For us, it worked out well because we weren’t looking for lots of pre-built connections or orchestration capability and really valued how lightweight and easy it was to run on serverless.
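As a sketch of what a dlt pipeline looks like: the orders endpoint and field names below are hypothetical, and the extract step yields static sample rows in place of real API calls. The incremental cursor on updated_at is the part dlt tracks for you between runs.

```python
# Sketch of a dlt pipeline for a hypothetical "orders" data source.

def fetch_orders(updated_after=None):
    """Hypothetical extract step: a real pipeline would page through an
    API here. ISO-8601 timestamps compare correctly as strings."""
    rows = [
        {"id": 1, "amount": 120.0, "updated_at": "2026-01-05T10:00:00+00:00"},
        {"id": 2, "amount": 75.5, "updated_at": "2026-01-06T08:30:00+00:00"},
    ]
    for row in rows:
        if updated_after is None or row["updated_at"] > updated_after:
            yield row

if __name__ == "__main__":
    import dlt

    # dlt infers the schema from the yielded dicts and persists the
    # incremental cursor, so reruns only load new or changed rows.
    @dlt.resource(primary_key="id", write_disposition="merge")
    def orders(updated_at=dlt.sources.incremental("updated_at")):
        yield from fetch_orders(updated_after=updated_at.last_value)

    pipeline = dlt.pipeline(
        pipeline_name="orders_demo",
        destination="duckdb",  # any supported destination works the same way
        dataset_name="shop",
    )
    print(pipeline.run(orders()))
```

The extraction logic stays plain Python, which is what made it easy to deploy the same code unchanged inside a Lambda handler.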

dlt
  • Available anywhere Python runs
  • Lightweight
  • Handles incremental loading and schema inference
  • Code your own connections and orchestration

Conclusion

This is the thought process that led us to choose dlt as the primary ETL tool that we are running client projects on. It meets all the criteria of our vision of customization that scales up and down, delivering control and affordability to our clients.

For sure, other businesses will have different needs that might make a proprietary platform a better fit or lead someone to value no-code interfaces or orchestration features more highly, but that’s why it’s important to have your vision and choose your tools based on that. For us and our vision of the marketplace, dlt is the right tool for ETL in 2026.
