How I automated MongoDB JSON Flattening for Analytics (No ETL)

#mongodb #dataengineering #showdev #database

I love MongoDB for its flexibility, but I’ve always hated building analytics dashboards on top of it.

The problem is always the same: Nested JSON.

If you want to visualize your data in a standard BI tool, you usually have to write a script or a complex aggregation pipeline ($unwind, anyone?) to flatten the arrays and objects into a tabular format.

I got tired of maintaining those ETL scripts, so I built a tool to do it automatically.

The Project: NeoShiftBI

I’ve been building NeoShiftBI, an AI-powered analytics platform. My goal for the MongoDB connector was simple: Connect a cluster and get a flat table instantly.

Here is a quick 90-second demo of how the auto-flattening and Incremental Sync work:

How it works under the hood

I built a custom schema inference engine (inferMongoSchema) to handle the translation to BigQuery. Here is the logic:

Sampling & Recursion: The connector fetches a sample of up to 100 documents and recursively analyzes each field (analyzeDocument) to determine that field's most common data type across the sample.
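To make the sampling step concrete, here is a minimal Python sketch of the idea. It is not the actual NeoShiftBI code: `infer_schema`, `analyze_document`, `type_name`, and the type labels are illustrative stand-ins, and the documents are plain dicts rather than a live cursor.

```python
from collections import Counter, defaultdict


def type_name(value):
    """Map a Python value to a coarse type label (labels are assumptions)."""
    if value is None:
        return "null"
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "unknown"


def analyze_document(doc, counts, prefix=""):
    """Walk one document and tally the type seen at every field path."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Nested object: recurse and extend the dotted path.
            analyze_document(value, counts, path)
        else:
            counts[path][type_name(value)] += 1


def infer_schema(docs, sample_size=100):
    """Return {field_path: most_common_type} from the first sample_size docs."""
    counts = defaultdict(Counter)
    for doc in docs[:sample_size]:
        analyze_document(doc, counts)
    return {path: c.most_common(1)[0][0] for path, c in counts.items()}


sample = [
    {"age": 30, "user": {"city": "NYC"}},
    {"age": "31", "user": {"city": "LA"}},
    {"age": 32, "user": {"city": None}},
]
print(infer_schema(sample))
```

With this sample, `age` resolves to `integer` (two votes against one `string`) and `user.city` resolves to `string`. A real engine would also need a policy for ties and for fields that appear in only a few documents.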

Type Normalization: MongoDB-specific types often break SQL pipelines. We normalize them on the fly:

ObjectId → converted to string

ISODate wrappers → extracted as a clean timestamp
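The two conversions above can be sketched roughly like this. I'm assuming the documents arrive as MongoDB Extended JSON, where an ObjectId looks like `{"$oid": "..."}` and a date looks like `{"$date": ...}`; `normalize_value` and `parse_date` are hypothetical helper names, not the connector's real functions.

```python
from datetime import datetime, timezone


def parse_date(raw):
    """Turn the payload of a {"$date": ...} wrapper into a UTC datetime."""
    # Canonical Extended JSON stores milliseconds as {"$numberLong": "<ms>"}.
    if isinstance(raw, dict) and "$numberLong" in raw:
        raw = int(raw["$numberLong"])
    if isinstance(raw, (int, float)):
        return datetime.fromtimestamp(raw / 1000, tz=timezone.utc)
    # Relaxed Extended JSON stores an ISO-8601 string, usually ending in "Z".
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))


def normalize_value(value):
    """Recursively replace MongoDB-specific wrappers with SQL-friendly values."""
    if isinstance(value, dict):
        if set(value) == {"$oid"}:
            return str(value["$oid"])          # ObjectId -> string
        if set(value) == {"$date"}:
            return parse_date(value["$date"])  # ISODate -> timestamp
        return {k: normalize_value(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_value(v) for v in value]
    return value


doc = {
    "_id": {"$oid": "507f1f77bcf86cd799439011"},
    "created": {"$date": "2024-01-15T10:30:00Z"},
    "tags": ["a", "b"],
}
print(normalize_value(doc))
```

If you read documents through a driver instead of Extended JSON, the values arrive as native driver types, so the checks would be type checks rather than key checks.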

The Flattening Strategy: Since BigQuery doesn't allow dots in column names, we flatten nested objects using an underscore separator.

Input: {"user": {"address": {"city": "NYC"}}}

Output Column: user_address_city
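Here is a small sketch of that underscore flattening in Python, using the example document from above. The `flatten` function is illustrative only. It leaves arrays untouched, since how arrays get expanded is a separate decision.

```python
def flatten(doc, sep="_", prefix=""):
    """Flatten nested objects into a single-level dict of column -> value."""
    flat = {}
    for key, value in doc.items():
        column = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Nested object: recurse, carrying the column name built so far.
            flat.update(flatten(value, sep, column))
        else:
            flat[column] = value
    return flat


print(flatten({"user": {"address": {"city": "NYC"}}}))
# {'user_address_city': 'NYC'}
```

One thing to watch with any separator-based scheme: a document that has both a top-level `user_address` field and a nested `user.address` object would map to the same column name, so a real implementation needs a collision rule.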

Incremental Sync (CDC): Once the schema is set, we use a tracking column (like updated_at) to fetch only new or changed documents, so a sync doesn't have to re-read the whole collection from your production DB.
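The tracking-column approach boils down to a "high-water mark" query. Below is a rough sketch of the idea, not the connector's actual code: `build_incremental_filter` and `next_watermark` are hypothetical names, and where the watermark is stored between runs is left out.

```python
from datetime import datetime, timezone


def build_incremental_filter(tracking_field, last_synced):
    """Build a MongoDB query filter that matches only documents changed since last_synced."""
    if last_synced is None:
        return {}  # First run: no watermark yet, so sync everything.
    return {tracking_field: {"$gt": last_synced}}


def next_watermark(docs, tracking_field, last_synced):
    """Return the newest tracking value in this batch, or keep the old watermark."""
    values = [d[tracking_field] for d in docs if tracking_field in d]
    return max(values, default=last_synced)


last = datetime(2024, 1, 1, tzinfo=timezone.utc)
query = build_incremental_filter("updated_at", last)
print(query)
# With PyMongo this filter could be passed to collection.find(query).

batch = [
    {"_id": 1, "updated_at": datetime(2024, 1, 2, tzinfo=timezone.utc)},
    {"_id": 2, "updated_at": datetime(2024, 1, 5, tzinfo=timezone.utc)},
]
print(next_watermark(batch, "updated_at", last))
```

Two caveats apply to this pattern in general: the tracking field needs an index, otherwise the filtered query still scans the collection, and deleted documents never show up because they have no updated_at to match.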

Try the Beta (Feedback Wanted)

NeoShiftBI is currently in public beta, and I’m looking for developers with complex or messy MongoDB collections to stress-test the flattening logic.

If you want to try it out, I’m upgrading all Dev.to users to the Basic Plan ($29/mo) for free during the beta.

Link: https://bi.neoshift.ai/#/register

Invite Code: BETA-790DA393

Let me know if the parser handles your schema correctly or if you manage to break it! 🐛