abdu masah

Arrowjet is now a Cross-Database Sync Tool in Python (PG, MySQL, Redshift)

I've been building Arrowjet, an open-source Python library for fast bulk data movement. It started as a Redshift speed tool, but it now supports PostgreSQL, MySQL, and cross-database transfers.

The latest addition: stateful sync that keeps tables in sync across databases.

The problem

Moving data between databases usually means writing custom scripts per source/destination pair. Add incremental logic, schema drift handling, retry on failure, and you're maintaining a mini-ETL framework.

What sync does

One function call:

import arrowjet
from arrowjet_pro import sync

# pg_conn and mysql_conn are already-open database connections (setup not shown)
pg = arrowjet.Engine(provider="postgresql")
my = arrowjet.Engine(provider="mysql")

result = sync(
    source_engine=pg, source_conn=pg_conn,
    dest_engine=my, dest_conn=mysql_conn,
    table="orders",
    key_column="updated_at",  # incremental
)
# Sync SUCCESS: 12,000 rows (incremental)

It decides between a full and an incremental sync automatically, based on the state saved by the previous run. On a full sync it truncates the destination first. After every run it validates row counts, and on failure it retries with backoff.
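To make that flow concrete, here is a minimal sketch of the same idea using only the standard library and two in-memory SQLite databases. It is not Arrowjet's implementation: `sync_table`, `with_retry`, the `state` dict, and the upsert via `INSERT OR REPLACE` are all hypothetical names and choices made for illustration.

```python
import sqlite3
import time


def with_retry(fn, retries=2, base_delay=0.1):
    """Call fn(); on failure sleep base_delay * 2**attempt, then try again."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise
            time.sleep(base_delay * 2 ** attempt)


def sync_table(src, dst, table, key_column, state):
    """Copy rows from src to dst. `state` maps table name -> last synced key."""
    watermark = state.get(table)
    if watermark is None:
        # No previous state: full sync into an emptied destination.
        mode = "full"
        dst.execute(f"DELETE FROM {table}")
        rows = src.execute(f"SELECT * FROM {table}").fetchall()
    else:
        # Previous state exists: fetch only rows changed since the watermark.
        mode = "incremental"
        rows = src.execute(
            f"SELECT * FROM {table} WHERE {key_column} > ?", (watermark,)
        ).fetchall()
    if rows:
        placeholders = ", ".join("?" * len(rows[0]))
        # Upsert so that updated rows overwrite their old copies (needs a primary key).
        dst.executemany(
            f"INSERT OR REPLACE INTO {table} VALUES ({placeholders})", rows
        )
    dst.commit()

    # Validate row counts after the transfer.
    src_count = src.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    dst_count = dst.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if src_count != dst_count:
        raise RuntimeError(f"row count mismatch: {src_count} != {dst_count}")

    # Remember the highest key seen so the next run can be incremental.
    state[table] = src.execute(
        f"SELECT MAX({key_column}) FROM {table}"
    ).fetchone()[0]
    return {"mode": mode, "rows": len(rows)}


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL, updated_at TEXT)"
    )
    return conn


src, dst, state = make_db(), make_db(), {}
src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 9.5, "2024-01-01"), (2, 20.0, "2024-01-02")],
)
src.commit()
first = with_retry(lambda: sync_table(src, dst, "orders", "updated_at", state))

# One new row and one updated row arrive before the second run.
src.execute("INSERT INTO orders VALUES (3, 5.0, '2024-01-03')")
src.execute("UPDATE orders SET total = 10.0, updated_at = '2024-01-04' WHERE id = 1")
src.commit()
second = with_retry(lambda: sync_table(src, dst, "orders", "updated_at", state))

print(first)   # {'mode': 'full', 'rows': 2}
print(second)  # {'mode': 'incremental', 'rows': 2}
```

The sketch ignores deletes on the source side, which a watermark on `updated_at` cannot see. In that case the row count check fails and signals that a full sync is needed.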

Schema-level sync

Sync an entire schema with filtering:

from arrowjet_pro import sync_schema

result = sync_schema(
    source_engine=pg, source_conn=pg_conn,
    dest_engine=my, dest_conn=mysql_conn,
    schema="public",
    exclude=["*_tmp", "*_backup"],
)
# Multi-table sync: ALL OK
#   Tables: 14/14 succeeded
#   Total rows: 2,340,000
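The `exclude` patterns above look like shell-style globs. Assuming that is how they are matched (the exact matching rules are not spelled out here), the filtering step can be sketched with the standard library's `fnmatch` module. `select_tables` is a hypothetical helper, not part of Arrowjet.

```python
from fnmatch import fnmatchcase


def select_tables(tables, exclude=()):
    """Return the tables that match none of the glob patterns in `exclude`."""
    return [
        t for t in tables
        if not any(fnmatchcase(t, pattern) for pattern in exclude)
    ]


tables = ["orders", "users", "orders_tmp", "users_backup", "tmp_notes"]
selected = select_tables(tables, exclude=["*_tmp", "*_backup"])
print(selected)  # ['orders', 'users', 'tmp_notes']
```

Note that `tmp_notes` survives: `*_tmp` only matches names that end in `_tmp`.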

YAML config for repeatable jobs

source:
  profile: my-postgres
destination:
  profile: my-mysql
defaults:
  mode: auto
  key_column: updated_at
  retry: 2
tables:
  - orders
  - users
  - name: products
    dest_table: product_catalog
    mode: full
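One way to read this config: each entry under `tables` is either a bare table name or a mapping with overrides, and anything not overridden falls back to `defaults`. The sketch below shows that merge on a plain dict mirroring the YAML above. The merge order and the `resolve_tables` helper are assumptions for illustration, not Arrowjet's actual loader.

```python
# The YAML config above, written as the dict a YAML parser would produce.
config = {
    "defaults": {"mode": "auto", "key_column": "updated_at", "retry": 2},
    "tables": [
        "orders",
        "users",
        {"name": "products", "dest_table": "product_catalog", "mode": "full"},
    ],
}


def resolve_tables(config):
    """Expand each table entry into a full job dict, applying defaults."""
    defaults = config.get("defaults", {})
    jobs = []
    for entry in config["tables"]:
        if isinstance(entry, str):
            # Bare table name: no per-table overrides.
            entry = {"name": entry}
        job = {**defaults, **entry}  # per-table values win over defaults
        job.setdefault("dest_table", job["name"])  # same name unless renamed
        jobs.append(job)
    return jobs


jobs = resolve_tables(config)
for job in jobs:
    print(job["name"], "->", job["dest_table"], job["mode"])
```

Here `products` is written to `product_catalog` with a forced full sync, while `orders` and `users` keep `mode: auto` and the shared `key_column`.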

CLI

arrowjet sync --table orders \
  --from-profile pg --to-profile mysql \
  --key-column updated_at

arrowjet sync --schema public \
  --from-profile pg --to-profile mysql \
  --exclude "*_tmp" --dry-run

Under the hood

All transfers use the fast path for each database:

  • PostgreSQL: COPY protocol (850x faster than INSERT)
  • MySQL: LOAD DATA LOCAL INFILE (6.6x faster)
  • Redshift: COPY/UNLOAD via S3

Arrow is the in-memory bridge between databases. No intermediate files, no serialization overhead.

Future

  • pip install arrowjet: bulk read/write/transfer, CLI, 3 database providers
  • pip install arrowjet-pro: sync, drift detection, schema auto-fix, alerting, operation log

GitHub: https://github.com/arrowjet/arrowjet

PyPI: https://pypi.org/project/arrowjet/0.6.0/
