adrian

Posted on
Transfer SQL → analytics 30x faster with ConnectorX + Arrow + dlt


dlt is a recently released Python library for data extraction and loading (the EL in ETL). At dltHub we are big fans of optimising things and of integrating those optimisations into our toolkit so that others can re-use them.

Speed boosts and schema from Arrow, loading with schema evolution from dlt

In this example, we combine ConnectorX, Arrow, and dlt to extract data and load it into a strongly typed environment 30x faster than a classic data transfer via SQLAlchemy.
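A minimal sketch of how the three pieces can fit together. The connection string, query, table name, and DuckDB destination below are placeholder assumptions for illustration, not details from the post; ConnectorX's `read_sql` can return a PyArrow table directly, and dlt can load Arrow tables while inferring and evolving the schema.

```python
import connectorx as cx  # pip install connectorx
import dlt               # pip install dlt[duckdb]

# Placeholder connection string and query (assumptions, adjust for your DB).
conn = "postgresql://user:password@localhost:5432/mydb"
query = "SELECT * FROM chat_message"

# ConnectorX materialises the result set straight into an Arrow table,
# skipping per-row Python object creation entirely.
arrow_table = cx.read_sql(conn, query, return_type="arrow")

# dlt derives a typed schema from the Arrow table and loads it,
# evolving the destination schema if columns are added or change.
pipeline = dlt.pipeline(
    pipeline_name="cx_arrow_demo",
    destination="duckdb",
    dataset_name="analytics",
)
load_info = pipeline.run(arrow_table, table_name="chat_message")
print(load_info)
```

Running this requires a live source database, so treat it as a template rather than a copy-paste script.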

Result: much faster, but mind the memory usage

In this example we see a 30x overall speedup on extraction and normalisation with Arrow: the process took 16 seconds with Arrow versus 8 minutes with SQLAlchemy plus dlt's JSON normaliser for 10M rows.

The output of both methods is the same (Parquet files or loaded data), with schema evolution. However, with Arrow we are not iterating row by row, so we cannot apply the optimisations available when streaming from SQLAlchemy, such as microbatching to keep memory use low.
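To illustrate the trade-off, here is a small sketch of the microbatching pattern that row-by-row streaming allows (a hypothetical helper, not dlt's internal implementation): rows from a cursor-like iterator are grouped into bounded chunks, so only one chunk is in memory at a time.

```python
from itertools import islice

def microbatch(row_iter, batch_size=10_000):
    """Yield lists of at most batch_size rows from any row iterator,
    keeping memory bounded while streaming from a database cursor."""
    it = iter(row_iter)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Example: 25 "rows" batched into chunks of 10 -> sizes 10, 10, 5
batches = list(microbatch(range(25), batch_size=10))
```

With Arrow, by contrast, ConnectorX hands back the whole result set at once, which is what makes it fast but also what drives the memory usage noted above.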

Read more about it, plus implementation docs, on our blog here

