DEV Community

Jonatan Kruszewski

Mastering MongoDB Associate Data Modeler Exam: The Ultimate Guide

Introduction

If you are planning to take the MongoDB Data Modeling Exam and are looking for someone who can explain from A to Z what it was like, how to prepare, and which strategies to use, you have come to the right place. I published this article because when I went through the exam, there wasn’t any resource available from someone who had already taken it. So here we are.

Why I took the MongoDB Data Modeling Exam

The truth is that I am not one of those developers with years and years of experience under their belt. I started developing about five years ago, and even though I got to a point where I am comfortable with what I know, a) it is never enough, and b) I get bored quickly.

Also, since experience can’t be gained faster than the pace of the clock, I turn to certifications to prove myself. At this point, I might have about 30.

So here we are: me, my 30 certifications, my boredom, and the fact that I saw that MongoDB had a new one… The challenge was screaming my name. Also, I already had the C100DEV certification from when MongoDB was at version 4.4 (time flies), so the foundations were there.

Exam Overview

The exam is about designing data structures and applying design patterns to them. What does this mean? Given data requirements, query needs, and other parameters, which pattern will fit better in the use case?

For example, you are working on an application that stores Posts. Each post can hold several Users’ Comments. So, we have three entities: Posts, Users, and Comments.

How should you model that relationship?

If you come from the SQL world, the first answer that might come to your mind is: “Easy! Create a post document that can hold an array of the comments’ IDs. Store the comments in a different collection.”

In MongoDB culture, that is called referencing, which is the opposite of embedding. For this use case, it may not be the most adequate solution. While it takes care of the unbounded growth of the comments, reads will be relatively slow since they need to query two collections, and comments are read far more often than they are written.

What MongoDB actually recommends is to live and breathe the motto:

Data that is accessed together should be stored together.

So, going back to our example, as long as you know that a post and its comments can’t exceed the maximum document size of 16 MB, you should store the comments as an array of documents inside the Post entity. That is called the Embedded Document Pattern. It is helpful for read-heavy workloads, fast data retrieval, and simplicity. On the other hand, it may not scale very well, and writing could be slow if there is a peak of concurrent writes.
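As a rough illustration of the embedded approach, a post document could be shaped like the sketch below. The field names (`title`, `body`, `comments`, `author`, `text`, `createdAt`) are made up for this example and are not taken from the exam or the course.

```javascript
// Hypothetical post document using the Embedded Document Pattern.
// All field names and values are illustrative assumptions.
const post = {
  _id: 1,
  title: "Modeling comments in MongoDB",
  body: "Post content goes here",
  comments: [
    { author: "ana", text: "Great post!", createdAt: new Date("2024-04-01T10:00:00Z") },
    { author: "ben", text: "Thanks for sharing.", createdAt: new Date("2024-04-02T08:30:00Z") },
  ],
};

// A single read returns the post together with its comments,
// so no second query against another collection is needed.
const commentAuthors = post.comments.map((c) => c.author);
console.log(commentAuthors);
```

The trade-off is visible in the shape itself: every new comment grows the same document, which is why the 16 MB limit matters.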

I know what you are thinking right now. Most posts will be okay with that for a while, but some will have an incredible number of comments that may outgrow the maximum document size.

In that case, you can mix it up by embedding the last 10–20 comments and moving the rest to a separate collection. You will pay some latency price on every write, but reads will be simplified. This is called the Subset pattern.
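The idea can be sketched in plain JavaScript, without a database. The helper name `applySubsetPattern`, the `recentComments` field, and the `postId` back-reference are assumptions made for this illustration; in a real update, the `$push` operator with its `$each` and `$slice` modifiers can cap the embedded array on the server side.

```javascript
// Hypothetical helper: keep only the newest comments embedded in the post
// and hand the rest over for storage in a separate collection.
function applySubsetPattern(post, newComment, maxEmbedded = 10) {
  // Newest comment first, followed by the ones already embedded.
  const all = [newComment, ...post.recentComments];
  return {
    // Copy of the post with the capped array of recent comments.
    post: { ...post, recentComments: all.slice(0, maxEmbedded) },
    // Comments that no longer fit, tagged with the post they belong to.
    overflow: all.slice(maxEmbedded).map((c) => ({ ...c, postId: post._id })),
  };
}

const before = { _id: 1, title: "Hello", recentComments: [{ text: "b" }, { text: "a" }] };
const result = applySubsetPattern(before, { text: "c" }, 2);
console.log(result.post.recentComments.map((c) => c.text));
console.log(result.overflow);
```

This follows the variant described above, where older comments move out of the post. The extra bookkeeping on each write is the latency price mentioned earlier.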

As you can see, there is no “one size fits all” solution: the answer is always “It depends.” You will need to know the whole picture and pay attention to the small details in the questions to arrive at the correct answer.

Getting a discount for the exam

As of April 2024, the good news is that if you finish their Learning Path, you get a 50% discount on the price of the exam, resulting in a total of $75 instead of $150. If you are a student or educator, that discount goes up to 100% of the price.

There is no reason not to go over the content, but strictly speaking you can “finish” the learning path by skipping all the labs (they are optional), failing all the quizzes (you need to complete them, not pass them), and waiting 5 seconds on each video until it is marked as “visited.” With that technique, you can get the discount in 20ish minutes of skipping content without consuming it. That is up to you.

Their learning dashboard has been redesigned since I last signed in, in 2021. It looks nice, but the UI for the learning path is cumbersome.

The Exam Format

Once you buy your exam (link here), you can schedule it in Examity, the platform service that provides the exam. The booking hours were broad, with slots usually available the next day. You must upload some ID, undergo computer checks for hardware, and fill in some details.

I recommend connecting to the platform 15 minutes before your scheduled hour on exam day because all the setups and ceremonies with your proctor take time.

If you don’t know what a “proctored exam” means: a person will watch you through your webcam for the whole exam. You will usually need to clear the desk where you will take the exam: no extra monitors, no papers on top, and nothing other than your laptop or computer. You will need to take your laptop for a walk around the room to show that no one is hiding in it. For me, that is as cringe as it gets, but rules are rules.

The official length of the exam is 1 hour and 45 minutes, but it can be done in less time, even for non-native English speakers. It has 70 multiple-choice questions. You can go to the bathroom and take a break, but when you return, you must show again with your webcam that no one is in the room or hiding under the desk 😂.

Only 60 of those questions count toward your score; the other 10 test new variants, and you won’t know which ones count. You must clear at least 70%. Since you can’t tell scored from unscored questions, aim to answer at least 49 out of 70 correctly. (Technically, you can miss up to 28 if 10 of those misses fall on unscored questions, but let’s keep the gamble low.)
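The arithmetic behind those numbers can be checked with a few lines. This assumes, as the figures above imply, that the 70% pass mark is applied to the 60 scored questions; the variable names are only for illustration.

```javascript
// Numbers taken from the exam description above.
const totalQuestions = 70;
const scoredQuestions = 60;
const passPercent = 70;

// Minimum correct answers among the scored questions (assumed scoring rule).
const minScoredCorrect = Math.ceil((scoredQuestions * passPercent) / 100);
// Safe target when you cannot tell scored from unscored questions.
const safeTarget = Math.ceil((totalQuestions * passPercent) / 100);
// Best-case number of misses that still passes: all 10 unscored questions
// wrong, plus every scored question you can afford to miss.
const maxWrong = totalQuestions - minScoredCorrect;

console.log({ minScoredCorrect, safeTarget, maxWrong });
```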

Most of the multiple-choice questions are “choose the right answer,” but some require marking two out of four or five options.

Timewise, the timeframe is generous: 105 minutes for 70 questions means that, at about a minute per question, you will spend about 70 minutes answering calmly and have some 35 left to double-check your answers. This is more than enough to go over all the questions you marked for review.

As a reference, I averaged about 1 minute per question: I read each question meticulously, thought a bit about each, and double-checked my answers before moving to the next. I didn’t rush it, quite the opposite. I stopped to think when I needed to, but I didn’t spend too much time: if a question troubled me, I would pick the best option, mark it, and revisit it later. Of the 35 minutes remaining, I only used about 15 to revisit the bookmarked questions, finishing my exam 20 minutes ahead of time.

How to NOT prepare for the exam

As mentioned, MongoDB has crafted content called MongoDB University. This was not the first time I went through their lectures, quizzes, and labs; I did that some time ago for the Developer exam.

While the path for the developer exam was a resounding deep dive, the path provided for this certification was too superficial and naive. You can check it out here. It is a free, 9-hour course divided into 7 slow-paced parts. On its own, the content won’t get you through the exam; it is just enough for you to get the broader picture and understand the foundational concepts. It includes a quiz after each chapter to check your knowledge, and while that is nice to have, the questions don’t come close to the real deal. The labs are more challenging, but you won’t get those tasks in the exam. The videos can easily be sped up to 1.5–1.75x.

There is no chance that someone will clear the exam by only finishing this. This can be enough to clear the “easy” questions, but about half of the exam requires more in-depth knowledge. Don’t worry; I will point you to the right resources so you can clear it on the first try.

Additionally, MongoDB provides a practice test with 25 questions. These questions are closer to the exam's difficulty level but intentionally less challenging, so achieving a high score doesn’t necessarily reflect what you’ll encounter on the exam. A good resource, but not enough.

Let’s go over the Study Guide.

First, if you have some general knowledge of MongoDB and common sense and have finished your Learning Path, you can cover the exam's first 30–35 questions. That's not bad.

To cover the second half of the exam, I recommend going over the official study guide subjects and their descriptions:

  • Requirements Gathering (10%)

  • Entities (13%)

  • Relationships (8.5%)

  • Workload/Usage (10%)

  • Data model Design (28%)

  • Modeling for technical requirements (10%)

  • Indexing (13%)

  • Monitoring and evolving Data models (7.5%)

Let’s go over the subjects you should know, one by one.

  • Requirements gathering: Make sure you read the questions correctly and notice small clues that might change the answer. Look for details on the write speed: if it says something like “fast writes are required,” you will know to cross out some patterns. Also, sometimes they specify that they want to model for “scalability,” meaning that embedding will probably not be the right option. Pay attention to those clues. They also appeared a lot in questions about analytics: sometimes the question says that the analytics need to be computed on the fly, and sometimes it suggests that they can be pre-computed and stored in a separate collection, for example: “the data modeler needs to review the analytics once a month.” Be sharp on all the data models to recognize these clues quickly.

  • Entities: You will need to know what weak and strong entities are and the relationships between them. This section is a walk in the park.

  • Relationships: You should know the basic one-to-one, one-to-many, and many-to-many. Something that surprised me was that I had questions about one-to-few. This article covers this topic very well. I didn’t encounter questions about one-to-squillions in my exam. Usually, these were among the easy questions, too.

  • Workload/Usage: This section relates to the first one: correctly understanding the needs of the situation described in the question plays a crucial role in choosing the correct answer. You should know how to identify the workload. Ensure you understand the whole chapter on query optimization, especially how to read the output of an explain command: the winning plan, the executionStats section, and how to interpret the numbers. Some questions were phrased like “on some query… there are X totalKeysExamined and Y totalDocsExamined while nReturned was Z”. This subject also relates to indexes, because changing or adding an index sometimes improves these results. This section is also about historical data. When you have a “historical data” requirement, a TTL index will probably be involved, and a separate collection will be needed to store that data.

  • Data model design: This is the most important unit in the exam. Go over all 12 patterns and ensure you know them by heart. Some questions describe a somewhat unclear situation where a bucket or pre-computed pattern might seem like the best fit. Again, pay attention to the small details of the questions. Ensure you understand when to embed, when to reference, and how to choose a proper model based on scalability or efficiency [on read or write]. One thing to keep in mind in this section is JSON Schema validation: how to create a schema, how to make elements unique, and how to lock elements to a list (enum). Go over the whole section; while creating a schema is a simple task, some questions focused on details of the validationLevel and validationAction options. Ensure you know what happens when you try inserting an invalid document with validationLevel set to strict and to moderate. The same goes for validationAction.

  • Modeling for technical requirements: This unit is closely related to Requirements Gathering and Data Model Design. They could be grouped: combined, they add up to 48% of the exam by the weights listed above. In this unit, you should focus on each model's limitations, disadvantages, and trade-offs. Ensure you know when not to use a model: sometimes, crossing out nonviable options will lead you to the correct answer more reliably than trying to spot the right one directly.

  • Indexing: Indexing was a subject that appeared frequently in my exam. You should know how indexes work, how to optimize them, and all their bells and whistles: compound indexes, multikey indexes, partial indexes, sparse indexes, clustered indexes, and TTL indexes. This unit also covers partial indexes using the partialFilterExpression option. One tricky case that I encountered is that some questions asked you to choose the proper way to index a collection, and something like these two options was provided:

// Option 1: partial index covering only documents where cuisine exists
db.restaurants.createIndex(
   { cuisine: 1 },
   { partialFilterExpression: { cuisine: { $exists: true } } }
)

// Option 2: regular index on cuisine
db.restaurants.createIndex(
   { cuisine: 1 }
)
  • While the first option looks like a partial index, its filter expression makes it behave like a sparse index. Besides the fact that a sparse index skips some documents, the difference from a regular index is that a regular index contains every document in the collection, storing null for documents that do not include the indexed field, whereas a sparse index does not store those documents at all. Keep that detail in mind.

  • You should also go over the documentation on hidden indexes and TTL indexes. I encountered some questions in the exam on using TTL indexes to expire documents after a specific time (e.g., one hour) and to expire documents on a particular date set programmatically. I can’t recall whether I encountered questions about clustered indexes, but make sure you know what you can/can’t do with them.

  • I also encountered questions where you have at least two common queries that share some fields and must choose the best index strategy.

  • Monitoring and evolving Data models: Ensure you know how and when a data model needs to be updated based on changes in the data or its usage: increased throughput, peak demands, analytics, and new requirements. Understand how you can capitalize on the db.collection.stats() command, as well as db.serverStatus(), especially its indexStats section. Remember that you can also get index statistics through aggregation, with the $indexStats stage.
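As one way to act on index statistics, the sketch below picks out indexes that have never been used. The `$indexStats` stage returns one document per index with a `name` and an `accesses` sub-document (`ops` and `since`); the sample data and the helper name `findUnusedIndexes` are made up for this example, and `ops` is written as a plain number here even though the shell reports it as a 64-bit integer.

```javascript
// In mongosh, the input would come from something like:
//   db.restaurants.aggregate([{ $indexStats: {} }]).toArray()
// The sample below is made-up data shaped like that output.
const sampleIndexStats = [
  { name: "_id_", accesses: { ops: 1200, since: new Date("2024-04-01T00:00:00Z") } },
  { name: "cuisine_1", accesses: { ops: 0, since: new Date("2024-04-01T00:00:00Z") } },
  { name: "cuisine_1_borough_1", accesses: { ops: 350, since: new Date("2024-04-01T00:00:00Z") } },
];

// Hypothetical helper: names of indexes with zero recorded accesses.
// The mandatory _id index is skipped because it cannot be dropped.
function findUnusedIndexes(indexStats) {
  return indexStats
    .filter((ix) => ix.name !== "_id_" && Number(ix.accesses.ops) === 0)
    .map((ix) => ix.name);
}

console.log(findUnusedIndexes(sampleIndexStats));
```

An index with zero accesses over a long enough window is a candidate for hiding first and dropping later, which ties back to the hidden indexes mentioned earlier.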

What you won’t need for the exam

In my case, I didn't encounter any of the following:

  • Questions about shards

  • Questions about clusters

  • Questions about how to run some query commands in a collection

  • Questions about journaling or the WiredTiger engine

  • Questions about transactions.

It makes sense since those subjects weren’t listed in the study guide, but you never know.

What took me by surprise in the exam

One thing that I noticed is that the exam presents an imbalanced level of difficulty among the questions. While some questions were easy-peasy, as simple as the ones in their practice exam, others weren’t for the faint-hearted. They dove so much into niche subjects that you must remember particular configurations, like the ones on validationAction.
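To make those settings easier to remember, here is a small model of how `validationLevel` and `validationAction` interact when a write would produce an invalid document. It is a simplified sketch of the documented behavior, not MongoDB code; the function name and the returned strings are my own.

```javascript
// In mongosh the options are set on the collection, for example:
//   db.createCollection("posts", {
//     validator: { $jsonSchema: { required: ["title"] } },
//     validationLevel: "moderate",
//     validationAction: "warn"
//   })
// ("posts" and the schema are placeholders.)

// Hypothetical model: what happens to a write that yields an INVALID document.
function outcomeForInvalidWrite({ operation, existingDocWasValid = true }, options = {}) {
  // MongoDB defaults: validationLevel "strict", validationAction "error".
  const { validationLevel = "strict", validationAction = "error" } = options;

  // "off" disables validation. "moderate" still checks inserts, but skips
  // updates to documents that were already invalid before the update.
  const skipped =
    validationLevel === "off" ||
    (validationLevel === "moderate" && operation === "update" && !existingDocWasValid);
  if (skipped) return "accepted (not validated)";

  // "error" rejects the write; "warn" lets it through and logs the violation.
  return validationAction === "warn" ? "accepted (logged)" : "rejected";
}

console.log(outcomeForInvalidWrite({ operation: "insert" }));
console.log(outcomeForInvalidWrite({ operation: "update", existingDocWasValid: false }, { validationLevel: "moderate" }));
console.log(outcomeForInvalidWrite({ operation: "insert" }, { validationAction: "warn" }));
```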

Another subject that caught me off guard, accounting for some 2–3 questions, was write concerns. I remember having a question about what would happen if an insertOne operation with a write concern of “majority” and a wtimeout of 5000 fails to achieve that within the time limit. I needed to know exactly how it would respond:

wtimeout causes write operations to return with an error after the specified limit, even if the required write concern will eventually succeed. When these write operations return, MongoDB does not undo successful data modifications performed before the write concern exceeded the wtimeout time limit.
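The scenario from that question looks roughly like the sketch below. The wrapper `insertWithMajority` is a hypothetical helper; `collection` stands for anything that exposes `insertOne()`, such as a mongosh collection.

```javascript
// Write concern from the question: wait for a majority of members,
// but give up waiting after 5000 milliseconds.
const writeConcern = { w: "majority", wtimeout: 5000 };

// Hypothetical wrapper around insertOne().
function insertWithMajority(collection, doc) {
  // If a majority has not acknowledged within 5000 ms, the call reports a
  // write concern error. The document may still exist on the primary:
  // MongoDB does not undo the write.
  return collection.insertOne(doc, { writeConcern });
}
```

In mongosh this could be called as `insertWithMajority(db.posts, { title: "Hello" })`, where `db.posts` is a placeholder collection. The key takeaway for the exam is that an error here means "not confirmed in time", not "not written".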

Lastly, in the entities section, I encountered 1–2 questions that included an element of Domain. I couldn’t find anything specific about what they wanted from my life, so I chose whatever seemed right to me.

Wrapping up

You can review the Learning Path to get the basics to pass the exam. Make sure you understand:

  • The 12 design patterns that MongoDB recommends, with their advantages and disadvantages, as well as application cases,

  • Entities and their relationships and how they relate to design patterns,

  • How indexes work, how to optimize them, and all their bells and whistles: compound indexes, multikey indexes, partial indexes, sparse indexes, clustered indexes, and TTL indexes,

  • JSON Schema validation: creating a schema, making elements unique, and locking elements to a list (enum). You will also need to know the different levels of validationLevel available, the default behavior, and the same for validationAction,

  • How to read the result of an execution plan and recommend actions based on it,

  • Implementation details on write concern.
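To make the execution-plan item concrete, here is a rough sketch of the kind of reasoning those questions expect, using the three `executionStats` fields mentioned earlier. The sample numbers, the helper name `assessQuery`, and the ratio threshold of 10 are arbitrary assumptions for illustration.

```javascript
// Made-up numbers shaped like the executionStats section returned by
//   db.collection.find(...).explain("executionStats")
const sampleExplain = {
  executionStats: { nReturned: 10, totalKeysExamined: 0, totalDocsExamined: 5000 },
};

// Hypothetical helper: flag queries that scan far more than they return.
function assessQuery(explainOutput, maxRatio = 10) {
  const { nReturned, totalKeysExamined, totalDocsExamined } = explainOutput.executionStats;
  // Documents examined per document returned; guard against division by zero.
  const docsPerResult = totalDocsExamined / Math.max(nReturned, 1);

  // No index keys touched but documents scanned: probably a collection scan.
  if (totalKeysExamined === 0 && totalDocsExamined > 0) {
    return "no index used (likely COLLSCAN): consider adding an index";
  }
  // An index was used, but it still lets through many non-matching documents.
  if (docsPerResult > maxRatio) {
    return "index is not selective enough: consider a better or compound index";
  }
  return "looks efficient";
}

console.log(assessQuery(sampleExplain));
```

The ideal case is when the keys examined, the documents examined, and `nReturned` are all close to each other.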

Combine the Learning Path, the practice exam, the documentation, and the tips provided here to level up your knowledge and ensure you can hit 85% or above in any mock exam. Set aside about 2 hours for the exam and connect 15 minutes before your slot.

All in all, it is not a challenging exam if you understand the concepts, and if you do well enough, you will get a beautiful badge like this one:

[Image: certification badge]

Good luck with the exam 🤞!
