I recently had to migrate a large MongoDB collection to PostgreSQL for a client project. Instead of writing a complex ETL pipeline, I used a simple Python script with pandas. Here's the core logic:
python
import pandas as pd
from pymongo import MongoClient
import psycopg2
Extract from MongoDB
mongo_client = MongoClient('mongodb://localhost:27017/')
db = mongo_client['mydb']
collection = db['mycollection']
data = list(collection.find({}, {'_id': 0}))
df = pd.DataFrame(data)
Transform data
df['created_at'] = pd.to_datetime(df['created_at'])
df['price'] = df['price'].astype(float)
Load to PostgreSQL
conn = psycopg2.connect(
host='localhost',
database='mydb',
user='user',
password='password'
)
df.to_sql('mytable', conn, if_exists='replace', index=False)
conn.close()
print(f'Migrated {len(df)} records successfully!')
For larger datasets, I've been using a tool called DataBridge that handles streaming and schema mapping automatically. What's your go-to method for database migrations?
Top comments (1)
For small and medium datasets, a simple Python-based approach is often all you need.
What I’ve found interesting is that the bottleneck usually isn’t extraction or loading. It’s understanding the assumptions hidden inside the source data.
MongoDB gives teams a lot of flexibility. PostgreSQL tends to force explicit structure. The migration becomes less about moving records and more about deciding which flexibility should survive the transition.
That’s usually where the most valuable architectural decisions are made.