MongoDB has become a popular choice among developers for NoSQL databases. Unlike the AUTO_INCREMENT id in MySQL, it uses an ObjectId, which is generated automatically when we insert a record. This ObjectId serves as the primary key that uniquely identifies each record. It is a 12-byte value, usually written as a 24-character hex string, and it is generated by MongoDB.
We often use this ObjectId in our URLs and APIs and thereby expose it. But how safe is it to expose an ObjectId? To answer that, we first need to understand how an ObjectId is generated.
The way MongoDB generates an ObjectId differs slightly between v3.0 and v4.0. Both layouts are listed below: first v4.0, then v3.0.
- a 4-byte timestamp value, representing the ObjectId’s creation, measured in seconds since the Unix epoch
- a 5-byte random value
- a 3-byte incrementing counter, initialized to a random value
In v3.0, the ObjectId consists of:
- a 4-byte value representing the seconds since the Unix epoch,
- a 3-byte machine identifier,
- a 2-byte process id, and
- a 3-byte counter, starting with a random value.
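To make the v4.0 layout concrete, here is a minimal sketch that splits an ObjectId's hex string into its three fields using only the Python standard library. The function name `parse_object_id` and the sample id are made up for illustration; they are not part of MongoDB or its drivers.

```python
from datetime import datetime, timezone


def parse_object_id(oid_hex: str) -> dict:
    """Split a 24-character hex ObjectId into the v4.0 fields (hypothetical helper)."""
    raw = bytes.fromhex(oid_hex)
    if len(raw) != 12:
        raise ValueError("an ObjectId is 12 bytes (24 hex characters)")
    # Bytes 0-3: creation time, big-endian seconds since the Unix epoch.
    seconds = int.from_bytes(raw[0:4], "big")
    return {
        "created_at": datetime.fromtimestamp(seconds, tz=timezone.utc),
        # Bytes 4-8: the 5-byte random value.
        "random": raw[4:9].hex(),
        # Bytes 9-11: the 3-byte incrementing counter.
        "counter": int.from_bytes(raw[9:12], "big"),
    }


# Made-up ObjectId, used only as sample input.
parts = parse_object_id("5f1d7f3e8b0c4a2d9e6f1a2b")
print(parts["created_at"].isoformat(), parts["random"], parts["counter"])
```

Anyone who sees the id can run the same decoding, which is why the creation time should be treated as public information.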
(I will only discuss version 4.0 here, since version 3.0 is deprecated, but the same reasoning applies to 3.0.)
- The first 4 bytes contain the timestamp of the object's creation, so an ObjectId can expose the signup date/time of a user, or the creation date/time of any document.
- The last 3 bytes are a counter, so one can use it to find your hidden objects: i.e., if the counter part goes from 0x....c1 to 0x....c9 between times t1 and t2, one can guess the ObjectIds created within that interval. However, guessing ids is most likely useless if you enforce access permissions.
- By crawling your API/website, one can infer the time of day when users on your site are most active.
E.g., if most objects were created between 3 PM and 4 PM, that is when your site gets the most traffic.
- By using the timestamps of objects, one can also infer the timezone of your most active audience.
E.g., if your website is one that people use mostly at lunchtime, then one could measure peaks of ObjectIds and deduce that a peak at 7 AM UTC means the audience is in India (because 7:00 AM UTC = 12:30 PM IST).
- Generally speaking, the only information an ObjectId can disclose is analytics derived from object creation timestamps, such as the period in which the site is most active and perhaps the user/object count.
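The inferences in the list above can be sketched in a few lines of Python. Both helpers are hypothetical names invented for this example. `candidate_ids` assumes the two observed ids came from the same process (identical 5-byte random value), that the hex is lowercase, and that the counter did not wrap around between them.

```python
from collections import Counter
from datetime import datetime, timezone


def creation_hour_histogram(object_ids):
    """Count how many ObjectIds were created in each UTC hour of the day."""
    hours = Counter()
    for oid in object_ids:
        seconds = int(oid[:8], 16)  # first 4 bytes = creation timestamp
        hours[datetime.fromtimestamp(seconds, tz=timezone.utc).hour] += 1
    return hours


def candidate_ids(first_oid, last_oid):
    """Yield guesses for ids created between two observed ids.

    Assumption: same random value in both ids and no counter wraparound.
    Each unseen counter value is paired with every second from t1 to t2.
    """
    if first_oid[8:18] != last_oid[8:18]:
        raise ValueError("ids do not share the same random value")
    middle = first_oid[8:18]
    t1, t2 = int(first_oid[:8], 16), int(last_oid[:8], 16)
    c1, c2 = int(first_oid[18:], 16), int(last_oid[18:], 16)
    for counter in range(c1 + 1, c2):
        for seconds in range(t1, t2 + 1):
            yield f"{seconds:08x}{middle}{counter:06x}"


# Made-up ids: created 2 seconds apart, counters 0xc1 and 0xc4.
guesses = list(candidate_ids("0000000a01020304050000c1",
                             "0000000c01020304050000c4"))
print(len(guesses))
```

The number of guesses grows with both the counter gap and the time gap, so widely spaced observations quickly become expensive to enumerate, and every guess is still a request that access checks can reject.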
Though an ObjectId looks like a fairly random string, it mostly exposes timestamp information plus some information from the counter. Still, someone who is determined enough will scrape as much of that information as he can.
- The best measure is to strengthen and enforce access permissions on all private objects, which makes guessing ids most likely useless.
- Add a rate limiter to the API, which makes it tough for a user to scrape the API/website.
- [Not Recommended] Create a random unique ID (like a UUID/GUID) for every object to avoid exposing the ObjectId. But this adds another level of headache: a second identifier to store and look up, and possible collisions of the random IDs to handle.
It would be extremely burdensome to design a system that never exposes ObjectIds at all. Thus, it's important to understand the risks and take care to address them. As long as you take the security measures above and enforce access permissions on private objects, you should not worry about exposing ObjectIds.
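As one illustration of the rate-limiting measure, here is a minimal token-bucket sketch. It is an assumption about how such a limiter could look, not a feature of MongoDB or of any particular web framework. The class name `TokenBucket` and its parameters are invented for this example, and a real deployment would keep one bucket per client (for instance per API key or IP address).

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Return True and spend one token if the request may proceed."""
        now = self.clock()
        # Refill according to the elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage: at most 5 requests in a burst, then roughly 1 request per second.
bucket = TokenBucket(rate=1, capacity=5)
if not bucket.allow():
    print("429 Too Many Requests")
```

A limiter like this slows scraping down but does not replace access permissions; it only raises the cost of enumerating ids.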