I love MongoDB for its flexibility, but I’ve always hated building analytics dashboards on top of it.
The problem is always the same: Nested JSON.
If you want to visualize your data in a standard BI tool, you usually have to write a script or a complex aggregation pipeline ($unwind, anyone?) to flatten the arrays and objects into a tabular format.
I got tired of maintaining those ETL scripts, so I built a tool to do it automatically.
The Project: NeoShiftBI
I’ve been building NeoShiftBI, an AI-powered analytics platform. My goal for the MongoDB connector was simple: connect a cluster and get a flat table instantly.
Here is a quick 90-second demo of how auto-flattening and Incremental Sync work:
How it works under the hood
I built a custom schema inference engine (inferMongoSchema) to handle the translation to BigQuery. Here is the logic:
Sampling & Recursion: The connector fetches a sample of up to 100 documents and recursively analyzes each field (analyzeDocument) to determine its most common data type.
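To make the sampling step concrete, here is a minimal TypeScript sketch of sample-based type inference. It is not NeoShiftBI's actual inferMongoSchema/analyzeDocument code: the names inferTypes and analyzeValue are hypothetical, and the sketch assumes documents arrive as plain parsed JSON, records each leaf field under a dotted path, and picks the most frequent non-null type per path.

```typescript
// Hypothetical sketch of sample-based type inference (not NeoShiftBI's actual code).
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

const SAMPLE_SIZE = 100; // mirrors the "up to 100 documents" sample described above

function typeName(value: Json): string {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value; // "string" | "number" | "boolean" | "object"
}

// Recursively walk a value, counting the type of every leaf under its dotted path.
function analyzeValue(value: Json, path: string, counts: Map<string, Map<string, number>>): void {
  const t = typeName(value);
  if (t === "object") {
    const obj = value as { [key: string]: Json };
    for (const key of Object.keys(obj)) {
      analyzeValue(obj[key], path ? `${path}.${key}` : key, counts);
    }
    return;
  }
  let perType = counts.get(path);
  if (perType === undefined) {
    perType = new Map<string, number>();
    counts.set(path, perType);
  }
  perType.set(t, (perType.get(t) ?? 0) + 1);
}

// Pick the most frequent non-null type per path; ties keep the type seen first.
function inferTypes(docs: Json[]): Map<string, string> {
  const counts = new Map<string, Map<string, number>>();
  for (const doc of docs.slice(0, SAMPLE_SIZE)) analyzeValue(doc, "", counts);

  const schema = new Map<string, string>();
  counts.forEach((perType, path) => {
    let best = "null"; // stays "null" only if the field was null in every sampled document
    let bestCount = 0;
    perType.forEach((n, t) => {
      if (t !== "null" && n > bestCount) {
        best = t;
        bestCount = n;
      }
    });
    schema.set(path, best);
  });
  return schema;
}
```

With this rule, a field that is a number in most sampled documents but a string in a few resolves to "number"; a production engine would probably also want to flag such mixed types rather than silently pick one.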
Type Normalization: MongoDB-specific types often break SQL pipelines. We normalize them on the fly:
ObjectId → converted to a string
ISODate wrappers → extracted as clean timestamps
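The normalization rules above can be sketched as a recursive pass over each document. This assumes the documents have been serialized to MongoDB Extended JSON, where an ObjectId appears as {"$oid": ...} and a date as {"$date": ...} (an ISO-8601 string in relaxed mode, or {"$numberLong": ...} milliseconds in canonical mode). The names unwrap and normalizeValue are hypothetical, not the connector's API.

```typescript
// Hypothetical sketch: normalize MongoDB Extended JSON wrappers into SQL-friendly scalars.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// If `value` is a single-key wrapper like {"$oid": "..."}, return the wrapped value.
function unwrap(value: Json, key: string): Json | undefined {
  if (value === null || typeof value !== "object" || Array.isArray(value)) return undefined;
  const keys = Object.keys(value);
  return keys.length === 1 && keys[0] === key ? value[key] : undefined;
}

function normalizeValue(value: Json): Json {
  // ObjectId: {"$oid": "<24 hex chars>"} -> plain string
  const oid = unwrap(value, "$oid");
  if (typeof oid === "string") return oid;

  // Date: relaxed {"$date": "<ISO-8601>"} or canonical {"$date": {"$numberLong": "<ms>"}}
  const date = unwrap(value, "$date");
  if (typeof date === "string") return new Date(date).toISOString();
  if (date !== undefined) {
    const millis = unwrap(date, "$numberLong");
    if (typeof millis === "string") return new Date(Number(millis)).toISOString();
  }

  // Otherwise recurse into arrays and plain objects; other scalars pass through unchanged.
  if (Array.isArray(value)) return value.map(normalizeValue);
  if (value !== null && typeof value === "object") {
    const out: JsonObject = {};
    for (const key of Object.keys(value)) out[key] = normalizeValue(value[key]);
    return out;
  }
  return value;
}
```

Emitting ISO-8601 UTC strings keeps the output easy to load into a timestamp column, whichever Extended JSON mode the source used.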
The Flattening Strategy: Since BigQuery doesn't allow dots in column names, we flatten nested objects using an underscore separator.
Input: {"user": {"address": {"city": "NYC"}}}
Output Column: user_address_city
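Here is a minimal sketch of the underscore-flattening strategy. flattenDocument is a hypothetical helper, and array handling is an assumption here: this version simply serializes arrays to JSON strings rather than unnesting them into separate rows.

```typescript
// Hypothetical sketch of underscore flattening for BigQuery-style column names.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };
type Scalar = string | number | boolean | null;

const SEPARATOR = "_";

// Recursively flatten nested objects into a single-level row keyed by joined paths.
function flattenDocument(
  doc: JsonObject,
  prefix = "",
  row: Record<string, Scalar> = {},
): Record<string, Scalar> {
  for (const key of Object.keys(doc)) {
    const value = doc[key];
    const column = prefix ? `${prefix}${SEPARATOR}${key}` : key;
    if (Array.isArray(value)) {
      // Assumption: arrays are stored as JSON strings instead of being unnested into rows.
      row[column] = JSON.stringify(value);
    } else if (value !== null && typeof value === "object") {
      flattenDocument(value, column, row); // descend into the nested object
    } else {
      row[column] = value;
    }
  }
  return row;
}
```

One design caveat: underscore-joined paths can collide. A top-level key named user_address_city and the nested path user.address.city both map to the same column (in this sketch the later one silently overwrites the earlier), so a real implementation needs a collision policy.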
Incremental Sync (CDC): Once the schema is set, we use a tracking column (like updated_at) to fetch only new or changed documents, so each sync avoids a full collection scan on your production DB (provided that column is indexed).
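The incremental sync loop boils down to two pieces: a query that only asks for documents past the last-seen value of the tracking column, and a watermark that advances after each batch. This is a hedged sketch: buildIncrementalQuery and advanceWatermark are hypothetical names, and it assumes the tracking column holds BSON dates (JavaScript Date objects in the Node.js driver).

```typescript
// Hypothetical sketch of watermark-based incremental sync (names are illustrative).
interface IncrementalQuery {
  filter: Record<string, unknown>;
  sort: Record<string, 1 | -1>;
}

// Build the filter/sort for the next sync: everything newer than the watermark, oldest first.
function buildIncrementalQuery(trackingColumn: string, watermark: Date | null): IncrementalQuery {
  // On the first run (no watermark yet) the filter stays empty, i.e. a full initial load.
  const filter: Record<string, unknown> = {};
  if (watermark !== null) filter[trackingColumn] = { $gt: watermark };
  const sort: Record<string, 1 | -1> = {};
  sort[trackingColumn] = 1;
  return { filter, sort };
}

// After loading a batch, move the watermark to the largest tracking value seen.
function advanceWatermark(
  docs: Array<Record<string, unknown>>,
  trackingColumn: string,
  current: Date | null,
): Date | null {
  let next = current;
  for (const doc of docs) {
    const value = doc[trackingColumn];
    if (value instanceof Date && (next === null || value.getTime() > next.getTime())) next = value;
  }
  return next;
}
```

With the official Node.js driver, the result would be used as collection.find(filter).sort(sort). Two caveats of this approach: a strict $gt can skip documents written later with exactly the watermark timestamp, and timestamp-based syncs never see deletes.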
Try the Beta (Feedback Wanted)
NeoShiftBI is currently in public beta, and I’m looking for developers with complex or messy MongoDB collections to stress-test the flattening logic.
If you want to try it out, I’m upgrading all Dev.to users to the Basic Plan ($29/mo) for free during the beta.
Link: https://bi.neoshift.ai/#/register
Invite Code: BETA-790DA393
Let me know if the parser handles your schema correctly or if you manage to break it! 🐛