The origins of MongoDB

#architecture #backend #database #startup

The Internet Archive holds some real gems. Let’s trace the origins of MongoDB with links to its archived 2008 content. The earliest snapshot is of 10gen.com, the company that created MongoDB as the internal data layer subsystem of a larger platform before it became a standalone product.

MongoDB was first described by its founders as an object-oriented DBMS, offering an interface similar to an ORM but as the native database interface rather than a translation layer, making it faster, more powerful, and easier to set up. The terminology later shifted to document-oriented database, which better reflects a key architectural point: object databases store objects together with their behavior (methods, class definitions, executable code), while document databases store only the data — the structure and values describing an entity. In MongoDB, this data is represented in JSON (because it is easier to read than XML), or more precisely BSON (Binary JSON), which extends JSON with types such as dates, binary data, and more precise numeric values.

Like object-oriented databases, MongoDB stores an entity's data — or, in DDD terms, an aggregate of related entities and values — as a single, hierarchical structure with nested objects, arrays, and relationships, instead of decomposing it into rows across multiple normalized tables, as relational databases do.
Like relational databases, MongoDB keeps data and code separate, a core principle of database theory. The database stores only data. Behavior and logic live in the application, where they can be version-controlled, tested, and deployed independently.

MongoDB's goal was to combine the speed and scalability of key-value stores with the rich functionality of relational databases, while simplifying coding significantly using BSON (binary JSON) to map modern object-oriented languages without a complicated ORM layer.

Object-oriented, object-relational, and XML databases never gained broad adoption, but the JSON document model did: MongoDB became the most popular NoSQL database, and all major SQL databases now offer JSON support and a document API.

An early 10gen white paper, A Brief Introduction to MongoDB, framed MongoDB's creation within a broader database evolution — from three decades of relational dominance, through the rise of OLAP for analytics, to the need for a similar shift in operational workloads. The paper identified three converging forces: big data with high operation rates, agile development demanding continuous deployment and short release cycles, and cloud computing on commodity hardware. Today, releasing every week or even every day is common, whereas in the relational world, a schema migration every month is often treated as an anomaly in the development process.

The same paper explains that horizontal scalability is central to the architecture, using sharding and replica sets to be cloud-native — unlike relational databases, where replication was added later by reusing crash and media recovery techniques to send write-ahead logs over the network.

Before MongoDB, founders Dwight Merriman and Eliot Horowitz had already built large-scale systems. Dwight co-founded DoubleClick, an internet advertising platform that handled hundreds of thousands of ad requests per second and was later acquired by Google, where it still underpins much of online advertising. Eliot, at ShopWiki, shared Dwight's frustration with the state of databases. Whether they used Oracle, MySQL, or Berkeley DB, nothing fit their needs, forcing them to rely on workarounds like ORMs, caches that could serve stale data, and application-level sharding.

Dwight Merriman explained this frustration in Databases and the Cloud.

In 2007, architects widely accepted duct-tape solutions and workarounds for SQL databases:

Caching layers in front of databases, with no immediate consistency. Degraded consistency guarantees were treated as normal because SQL databases were saturated by the calls from the new object-oriented applications.
Hand-coded, fragile, application-specific sharding. Each team reinvented distributed data management from scratch, inheriting bugs, edge cases, and heavy maintenance.
Stored procedures to reduce the multi-statement transactions to a single call to the database. Writes went through stored procedures while reads hit the database directly, pushing critical business logic into the database, outside version control, and forcing developers to work in three languages: the application language, SQL, and the stored procedure language.
Query construction via string concatenation, effectively embedding custom code generators in applications to build SQL dynamically. Although the SQL standard defined embedded SQL, precompilers were available only for non–object-oriented languages.
Vertical scaling: when you needed more capacity, you bought a bigger server. Teams had to plan scale and costs upfront, ran into a hard ceiling where only parallelism could help, and paid a premium for large enterprise machines compared with commodity hardware. Meanwhile, startups were moving to EC2 and cloud computing. A database that scaled only vertically was fundamentally at odds with the cloud-native future they saw coming.

Beyond infrastructure workarounds, there was a deeper disconnect with how software was being built. By 2008, agile development dominated. Teams iterated quickly — at Facebook, releases went out daily, and broken changes were simply rolled back. Relational databases, however, remained in a waterfall world. Schema migrations meant downtime, and rollbacks were risky. The database had become the primary obstacle to the agile experience teams wanted.

Scaling horizontally was the other key challenge. Many NoSQL databases solved it by sharply reducing functionality—sometimes to little more than primary-key get/put—making distribution trivial. MongoDB instead asked: what is the minimum we must drop to scale out? It kept much more of the relational model: ad hoc queries, secondary indexes, aggregation, and sorting. It dropped only what it couldn’t yet support at large distributed scale: joins across thousands of servers and full multi-document transactions. Transactions weren’t removed but were limited to a single document, which could be rich enough to represent the business transaction that might otherwise be hundreds of rows across several relational tables. Later, distributed joins and multi-document ACID transactions were added via the lookup aggregation stage and multi-document transactions.

Many people think MongoDB has no schema, but "schemaless" is misleading. MongoDB uses a dynamic, or implicit, schema. When you start a new MongoDB project, you still design a schema—you just don’t define it upfront in the database dictionary. And it has schema validation, relationships and consistency, all within the document boundaries owned by the application service.

It's interesting to look at the history and see what remains true or has changed. SQL databases have evolved and allow more agility, with some online DDL and JSON datatypes. As LLMs become fluent at generating and understanding code, working with multiple languages may matter less. The deeper problem is when business logic sits outside main version control and test pipelines, and is spread across different execution environments.

Cloud-native infrastructure is even more important today, as the application infrastructure must not only be cost-efficient on commodity hardware but also resilient to the new failure modes that arise in those environments. Agile development methods are arguably even more relevant with AI-generated applications. Rather than building one central database with all referential integrity enforced synchronously, teams increasingly need small, independent bounded contexts that define their own consistency and transaction boundaries — decoupled from other microservices to reduce the blast radius of failures and changes.

Finally, the video from the What Is MongoDB page from 2011 summarizes all that:

Over the past two decades, MongoDB—like many other databases—has evolved dramatically. It began with a strong focus on developer experience and the idea that data consistency should be handled in the application layer, where data lifecycle starts, before the database. This marked a shift as developers took architectural decisions away from DBAs. Today, we face a new inflection point, with AI agents increasingly making these choices. To navigate this era, where application builders are agents, it’s useful to revisit past trade-offs, learning from both mistakes and innovations.

DEV Community

The origins of MongoDB

Top comments (0)