I've been building Arrowjet, an open-source Python library for fast bulk data movement. It started as a Redshift speed tool, but it now supports PostgreSQL, MySQL, and cross-database transfers.
The latest addition: stateful sync that keeps tables in sync across databases.
The problem
Moving data between databases usually means writing custom scripts per source/destination pair. Add incremental logic, schema drift handling, retry on failure, and you're maintaining a mini-ETL framework.
What sync does
One function call:
```python
import arrowjet
from arrowjet_pro import sync

pg = arrowjet.Engine(provider="postgresql")
my = arrowjet.Engine(provider="mysql")

# pg_conn and mysql_conn are open connections to each database
result = sync(
    source_engine=pg, source_conn=pg_conn,
    dest_engine=my, dest_conn=mysql_conn,
    table="orders",
    key_column="updated_at",  # incremental sync keyed on this column
)
# Sync SUCCESS: 12,000 rows (incremental)
```
It decides full vs incremental automatically based on previous state. Truncates destination on full sync. Validates row counts after. Retries with backoff on failure.
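The full-vs-incremental decision and the retry loop can be sketched like this. This is an illustrative model of the behavior described above, not Arrowjet's actual internals; the state store and function names are hypothetical:

```python
import time

def decide_mode(state, table):
    """If we have a saved high-water mark for the table, only rows past it
    need to move (incremental); otherwise do a full sync.
    Illustrative sketch -- not Arrowjet's actual implementation."""
    watermark = state.get(table)
    if watermark is None:
        return ("full", None)
    return ("incremental", watermark)

def run_with_retry(fn, retries=2, base_delay=1.0):
    """Retry a transfer with exponential backoff, as the sync layer
    does on transient failures."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise
            time.sleep(base_delay * 2 ** attempt)

state = {}  # no prior state -> first run is a full sync
print(decide_mode(state, "orders")[0])   # full

state["orders"] = "2024-06-01T00:00:00"  # watermark saved after first run
print(decide_mode(state, "orders")[0])   # incremental
```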
Schema-level sync
Sync an entire schema with filtering:
```python
from arrowjet_pro import sync_schema

result = sync_schema(
    source_engine=pg, source_conn=pg_conn,
    dest_engine=my, dest_conn=mysql_conn,
    schema="public",
    exclude=["*_tmp", "*_backup"],
)
# Multi-table sync: ALL OK
# Tables: 14/14 succeeded
# Total rows: 2,340,000
```
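The exclude patterns are shell-style globs. A sketch of how that filtering typically works, using the standard library's fnmatch (an assumption about the matching semantics, not Arrowjet's code):

```python
from fnmatch import fnmatch

def filter_tables(tables, exclude):
    """Drop any table whose name matches one of the shell-style
    exclude patterns (illustrative; not Arrowjet's internals)."""
    return [t for t in tables
            if not any(fnmatch(t, pat) for pat in exclude)]

tables = ["orders", "users", "orders_tmp", "users_backup"]
print(filter_tables(tables, ["*_tmp", "*_backup"]))
# ['orders', 'users']
```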
YAML config for repeatable jobs
```yaml
source:
  profile: my-postgres
destination:
  profile: my-mysql
defaults:
  mode: auto
  key_column: updated_at
  retry: 2
tables:
  - orders
  - users
  - name: products
    dest_table: product_catalog
    mode: full
```
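The defaults block merges into every table entry, and entries can be either a bare table name or a dict with per-table overrides. A sketch of that merge (the helper name and job shape are hypothetical, inferred from the config above):

```python
def expand_jobs(config):
    """Merge the defaults block into each table entry. Entries may be a
    bare table name or a dict with per-table overrides (sketch only)."""
    defaults = config.get("defaults", {})
    jobs = []
    for entry in config["tables"]:
        if isinstance(entry, str):
            entry = {"name": entry}
        job = {**defaults, **entry}          # per-table keys win over defaults
        job.setdefault("dest_table", job["name"])
        jobs.append(job)
    return jobs

config = {
    "defaults": {"mode": "auto", "key_column": "updated_at", "retry": 2},
    "tables": [
        "orders",
        "users",
        {"name": "products", "dest_table": "product_catalog", "mode": "full"},
    ],
}
for job in expand_jobs(config):
    print(job["name"], "->", job["dest_table"], job["mode"])
# orders -> orders auto
# users -> users auto
# products -> product_catalog full
```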
CLI
```bash
arrowjet sync --table orders \
  --from-profile pg --to-profile mysql \
  --key-column updated_at

arrowjet sync --schema public \
  --from-profile pg --to-profile mysql \
  --exclude "*_tmp" --dry-run
```
Under the hood
All transfers use the fast path for each database:
- PostgreSQL: COPY protocol (850x faster than row-by-row INSERTs)
- MySQL: LOAD DATA LOCAL INFILE (6.6x faster)
- Redshift: COPY/UNLOAD via S3
Arrow is the in-memory bridge between databases. No intermediate files, no serialization overhead.
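The "no intermediate files" claim comes down to streaming batches straight from the source reader to the destination writer. A schematic of that flow, with plain Python lists standing in for Arrow record batches (this is a conceptual sketch, not Arrowjet's transfer code):

```python
def read_batches(rows, batch_size=3):
    """Yield fixed-size batches from the source
    (stand-in for Arrow record batches)."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

def transfer(rows, write):
    """Stream each batch straight to the destination writer --
    nothing is spilled to disk between read and write."""
    total = 0
    for batch in read_batches(rows):
        write(batch)
        total += len(batch)
    return total

dest = []
moved = transfer(list(range(7)), dest.extend)
print(moved)  # 7
print(dest)   # [0, 1, 2, 3, 4, 5, 6]
```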
Future
- pip install arrowjet: bulk read/write/transfer, CLI, 3 database providers
- pip install arrowjet-pro: sync, drift detection, schema auto-fix, alerting, operation log
GitHub: https://github.com/arrowjet/arrowjet
PyPI: https://pypi.org/project/arrowjet/0.6.0/