DEV Community

Wellington Gasparin


1.1 - MongoDB Types

Get ready for the first section of the MongoDB Developer Certification! This part carries an 8% weighting and focuses on document Types and Shapes.

Since this post has become longer than anticipated, it will be divided into two parts.


1.1 Identify the set of value types MongoDB BSON supports.

To keep things focused, I took the liberty of splitting the types into two categories:

Common types

| Type | Size | Number | Alias | Notes |
|---|---|---|---|---|
| ObjectId | 12 bytes | 7 | "objectId" | |
| Boolean | 1 byte | 8 | "bool" | true or false |
| 32-bit integer | 4 bytes | 16 | "int" | between -2^31 and 2^31-1 |
| 64-bit integer | 8 bytes | 18 | "long" | between -2^63 and 2^63-1 |
| Decimal128 | 16 bytes | 19 | "decimal" | up to 34 decimal digits |
| Double | 8 bytes | 1 | "double" | 15 to 17 decimal digits |
| String | | 2 | "string" | Variable size (UTF-8 encoded) |
| Object | 4 bytes | 3 | "object" | + size of the object |
| Array | 4 bytes | 4 | "array" | + size of elements |
| Binary data | | 5 | "binData" | |
| Date | 8 bytes | 9 | "date" | |
| Timestamp | 8 bytes | 17 | "timestamp" | |
| Null | 0 bytes | 10 | "null" | |

Less common types

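| Type | Size | Number | Alias | Notes |
|---|---|---|---|---|
| Min key | | -1 | "minKey" | |
| Max key | | 127 | "maxKey" | |
| Regular Expression | | 11 | "regex" | |
| JavaScript | | 13 | "javascript" | |
| DBPointer | | 12 | "dbPointer" | Deprecated. |
| Symbol | | 14 | "symbol" | Deprecated. |
| Undefined | | 6 | "undefined" | Deprecated. |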
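
The numbers and aliases in these tables are the values the $type operator works with. As a minimal sketch in plain JavaScript, the common-type table can be written as a lookup. The names BSON_TYPES and aliasOf are hypothetical helpers made up for this illustration; they are not part of any MongoDB API.

```javascript
// Hypothetical lookup built from the "Common types" table (number -> alias).
const BSON_TYPES = new Map([
  [1, "double"], [2, "string"], [3, "object"], [4, "array"], [5, "binData"],
  [7, "objectId"], [8, "bool"], [9, "date"], [10, "null"],
  [16, "int"], [17, "timestamp"], [18, "long"], [19, "decimal"],
]);

// Returns the alias for a BSON type number, or undefined if it is not in the map.
function aliasOf(typeNumber) {
  return BSON_TYPES.get(typeNumber);
}

console.log(aliasOf(2));  // string
console.log(aliasOf(18)); // long
```

In a query filter, the number and the alias are interchangeable: { value: { $type: 2 } } and { value: { $type: "string" } } select the same documents.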

The most common types

The list of types is extensive, and the MongoDB documentation doesn't go into much detail on every one of them, so let's focus on the most common types.


ObjectId

An ObjectId is 12 bytes long and is composed of:

  • A 4-byte timestamp, measured in seconds since Unix epoch.
  • A 5-byte random value generated once per process. This random value is unique to the machine and process.
  • A 3-byte incrementing counter, initialized to a random value.

So, it's small, likely unique, fast to generate, and roughly ordered by creation time.

Example of ObjectId value: 66b7ccfcde5c167d5c6c9561

With Mongosh, it's possible to retrieve the timestamp from an ObjectId.

> ObjectId('66b7ccfcde5c167d5c6c9561').getTimestamp()
< 2024-08-10T20:26:36.000Z
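
The three-part layout listed above can also be checked by hand. The following is a minimal sketch in plain JavaScript that slices the example ObjectId into its timestamp, random value, and counter. The parseObjectId function is a hypothetical helper written for this post, not a driver API.

```javascript
// Split a 24-character ObjectId hex string into its three documented parts.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // 4-byte timestamp, in seconds
  return {
    timestamp: new Date(seconds * 1000),        // Date expects milliseconds
    random: hex.slice(8, 18),                   // 5-byte per-process random value
    counter: parseInt(hex.slice(18, 24), 16),   // 3-byte incrementing counter
  };
}

const parts = parseObjectId("66b7ccfcde5c167d5c6c9561");
console.log(parts.timestamp.toISOString()); // 2024-08-10T20:26:36.000Z
console.log(parts.random);                  // de5c167d5c
console.log(parts.counter);
```

The timestamp matches what getTimestamp() returned, which shows that the first four bytes alone carry the creation time.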

Important
While ObjectId values should increase over time, they are not necessarily monotonic. This is because they:

  • Only contain one second of temporal resolution, so ObjectId values created within the same second do not have a guaranteed ordering, and
  • Are generated by clients, which may have differing system clocks.

Int32 and Int64

In Mongosh, if a number literal can be converted to a 32-bit integer, it is stored as an Int32; otherwise, it is stored as a Double by default, not as an Int64. To store a 64-bit integer, use the Long constructor explicitly.

Using Mongosh, you can explicitly specify which type you want to use.

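> db.types.insertOne(
{
  // Fits in 32 bits, so it is stored as Int32.
  "intValue": 2147483647,
  "intValueExplicity": Int32(1),
  // A bare literal this large is a JavaScript number and loses precision.
  "longValue": 9223372036854775807,
  // Passing a string to Long() keeps every digit.
  "longValueExplicity": Long("9223372036854775807")
});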

In Mongosh, when you want to store a value explicitly as a long, pass it to Long() as a string; a bare number literal of that size loses precision in JavaScript.
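
The reason for the string is a JavaScript limitation rather than a MongoDB one. Here is a small sketch in plain JavaScript, with no MongoDB calls: fitsInt32 is a hypothetical helper for the Int32 range from the table, and BigInt stands in for Long only to show that parsing from a string keeps every digit.

```javascript
// Int32 range check: -2^31 .. 2^31 - 1.
function fitsInt32(n) {
  return Number.isInteger(n) && n >= -(2 ** 31) && n <= 2 ** 31 - 1;
}

console.log(fitsInt32(2147483647)); // true
console.log(fitsInt32(2147483648)); // false

// A bare literal this large is rounded to the nearest double.
const asNumber = 9223372036854775807;
console.log(Number.isSafeInteger(asNumber)); // false
console.log(String(asNumber));               // 9223372036854776000

// Parsing the same digits from a string keeps them all.
const asBigInt = BigInt("9223372036854775807");
console.log(asBigInt.toString());            // 9223372036854775807
```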

[Image: example of an inserted int64 value]


Decimal128

Decimal128 values are 128-bit decimal-based floating-point numbers that emulate decimal rounding with exact precision. They support 34 decimal digits of precision, for example 9.999999999999999999999999999999999.

This type is intended for applications that need exact decimal arithmetic, such as financial, tax, and scientific computations.

Mongosh inserting the value 10,000,000,000,000.123456789

> db.decimal.insertOne({value: new Decimal128("10000000000000.123456789")})

Retrieving the data

> db.decimal.find()
{
  _id: ObjectId('66c8c08ada326cac4262e372'),
  value: Decimal128('10000000000000.123456789')
}

Double

Double is less precise than Decimal128: it is a 64-bit binary floating-point value that keeps roughly 15 to 17 significant decimal digits. If your application doesn't need more precision than that, you can use double to store decimal numbers.

Mongosh inserting the value 10,000,000,000,000.123456789

> db.double.insertOne({value: 10000000000000.123456789})

Retrieving the data

> db.double.find()
{
  _id: ObjectId('66c8bfbeda326cac4262e371'),
  value: 10000000000000.123
}
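
The digits are lost in the double itself, before anything is stored. A quick sketch in plain JavaScript, whose numbers are the same 64-bit doubles, reproduces the result without a database:

```javascript
// The same literal used in the insert above, held as a 64-bit double.
const value = 10000000000000.123456789;
console.log(String(value)); // 10000000000000.123

// Classic binary floating-point rounding: 0.1 and 0.2 have no exact binary form.
const sum = 0.1 + 0.2;
console.log(sum === 0.3);   // false
console.log(String(sum));   // 0.30000000000000004
```

This kind of rounding is why Decimal128 is the safer choice when the decimal digits must come back exactly as they were written.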

String

BSON strings are stored as UTF-8, which makes it possible to store most international text.

Important
Given strings using UTF-8 character sets, using sort() on strings will be reasonably correct. However, because internally sort() uses the C++ strcmp API, the sort order may handle some characters incorrectly.

To verify this observation, I asked MongoDB to return just 4 documents in descending order. I expected Иванов to come before O'Connor, but I don't know whether И (wiki) is treated as equivalent to N.

[Image: example of sorting]
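
To see why a byte-wise comparison behaves this way, here is a minimal sketch in Node.js that sorts strings by their UTF-8 bytes, similar in spirit to strcmp. It assumes a plain byte-by-byte comparison with no collation. The compareUtf8Bytes helper and the names "adams" and "Zhang" are made up for the illustration; only Иванов and O'Connor come from the test above.

```javascript
// Compare two strings by their UTF-8 bytes (byte-wise, like strcmp).
function compareUtf8Bytes(a, b) {
  return Buffer.compare(Buffer.from(a, "utf8"), Buffer.from(b, "utf8"));
}

const names = ["adams", "Иванов", "O'Connor", "Zhang"];

const ascending = [...names].sort(compareUtf8Bytes);
const descending = [...ascending].reverse();

console.log(ascending.join(", "));  // O'Connor, Zhang, adams, Иванов
console.log(descending.join(", ")); // Иванов, adams, Zhang, O'Connor
```

Uppercase ASCII sorts before lowercase ASCII, and every Cyrillic letter sorts after both because its UTF-8 encoding starts with a higher byte. In a byte-wise descending sort, Иванов therefore comes first regardless of which Latin letter it resembles.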


Boolean

No mysteries here: a boolean can hold only true or false.


Date

BSON Date is a 64-bit integer that represents the number of milliseconds since the Unix epoch (Jan 1, 1970). This results in a representable date range of about 290 million years into the past and future.
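
As a quick illustration of that representation, the sketch below uses the JavaScript Date type, which also counts milliseconds since the Unix epoch. The sample instant and its UTC-3 offset are arbitrary values chosen for the example.

```javascript
// Zero milliseconds is the Unix epoch itself.
console.log(new Date(0).toISOString()); // 1970-01-01T00:00:00.000Z

// A local time written with a UTC-3 offset and the same instant written in UTC
// map to one and the same millisecond count.
const local = new Date("2024-08-10T17:26:36-03:00");
const utc = new Date("2024-08-10T20:26:36Z");
console.log(local.getTime() === utc.getTime()); // true
console.log(local.toISOString());               // 2024-08-10T20:26:36.000Z

// Rough range of a signed 64-bit millisecond counter, in 365-day years.
const msPerYear = 1000n * 60n * 60n * 24n * 365n;
const years = 2n ** 63n / msPerYear;
console.log(years.toString()); // a bit over 290 million
```

Because the stored value is only a millisecond count, it carries no time zone; the zone is applied when the value is displayed.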

Given the following C# code:

using System;

public class Date
{
    // Captured in UTC.
    public DateTime DateTimeUtc { get; set; } = DateTime.UtcNow;

    // Captured in the machine's local time zone.
    public DateTime DateTimeLocal { get; set; } = DateTime.Now;
}

Both values are saved as UTC, including the one captured with DateTime.Now.

[Screenshot: local and UTC date values printed by the application]

[Screenshot: MongoDB Compass showing both the local and the UTC value stored as UTC]


Timestamp

I had difficulty finding a straightforward way to save timestamps with C#, so I used Mongosh instead.

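BSON has a special Timestamp type for internal MongoDB use; it is not associated with the regular Date type. It is a 64-bit value in which the most significant 32 bits are seconds since the Unix epoch (t) and the least significant 32 bits are an incrementing ordinal for operations within that second (i). Check MongoDB timestamps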

Insert with Mongosh console

> db.date.insertOne({timestamp: new Timestamp()})

Retrieve the data

> db.date.findOne({_id: ObjectId('66d478ac10620d368380a43f')})
{
  _id: ObjectId('66d478ac10620d368380a43f'),
  timestamp: Timestamp({ t: 1725200556, i: 4 })
}
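
The t and i fields in that output can be read with a few lines of plain JavaScript. This is only a sketch: packTimestamp and unpackTimestamp are hypothetical helpers that assume the usual layout of seconds in the high 32 bits and the ordinal in the low 32 bits, and they use BigInt because the packed value does not fit in a JavaScript number.

```javascript
// The values returned by the findOne call above.
const ts = { t: 1725200556, i: 4 };

// t is seconds since the Unix epoch, so multiply by 1000 for a Date.
console.log(new Date(ts.t * 1000).toISOString()); // 2024-09-01T14:22:36.000Z

// Pack into one 64-bit value: seconds in the high 32 bits, ordinal in the low 32.
function packTimestamp({ t, i }) {
  return (BigInt(t) << 32n) | BigInt(i);
}

// Reverse the packing.
function unpackTimestamp(packed) {
  return { t: Number(packed >> 32n), i: Number(packed & 0xffffffffn) };
}

const roundTrip = unpackTimestamp(packTimestamp(ts));
console.log(roundTrip.t === ts.t && roundTrip.i === ts.i); // true
```

The seconds value is the same one embedded in the document's ObjectId (its first four bytes, 66d478ac), since both were generated within the same second.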

Object and Array

With MongoDB, it's possible to store complex object structures or arrays within a document.

Example of object

{
  _id: ObjectId("66c1fec432fc73d4982e5ee9"),
  name: "Liam Wilson",
  address: {
    street: "",
    zipcode: "",
    city: ""
  }
}

Example of array

{
  _id: ObjectId("66c1fec432fc73d4982e5ee9"),
  name: "Liam Wilson",
  address: ["street name 1", "street name 2"]
}

Example of array of objects

{
  _id: ObjectId("66c1fec432fc73d4982e5ee9"),
  name: "Liam Wilson",
  address: [{
    street: "",
    zipcode: "",
    city: ""
  }]
}

Null

Being schema-less, MongoDB allows the same field to hold different types in different documents, which means documents with different shapes can live in the same collection.

The example below shows this: a field is created as a string, and after its value is updated to null, its type changes to null.

> db.string.insertOne({value: "lorem ipsum"})

Let's check the type

> db.string.aggregate([{$project: {value: 1, nameType: {$type: "$value"}}}])

< 
{
  _id: ObjectId('66c9d9d875b145385f4c7db6'),
  value: 'lorem ipsum',
  nameType: 'string'
}

I'll cover the aggregate method in the CRUD post.

Let's update the value field to null

> db.string.updateOne({_id: ObjectId('66c9d9d875b145385f4c7db6')}, {$set: {value: null}})

Checking the type again.

> db.string.aggregate([{$project: {value: 1, nameType: {$type: "$value"}}}])

< 
{
  _id: ObjectId('66c9d9d875b145385f4c7db6'),
  value: null,
  nameType: 'null'
}

BinaryData

BSON Binary Values are a fundamental data type in the BSON format, which is used for storing data in MongoDB. They essentially represent raw binary data, such as images, audio files, or other binary-encoded information.

I'll focus only on UUID and describe it, but here's the complete table with possible binary types.

| Number | Subtype |
|---|---|
| 0 | Generic binary subtype |
| 1 | Function data |
| 2 | Binary (old) |
| 3 | UUID (old) |
| 4 | UUID |
| 5 | MD5 |
| 6 | Encrypted BSON value |
| 7 | Compressed time series data |
| 128 | Custom data |

UUID

A Universally Unique Identifier (UUID) is a 128-bit value, usually written as 32 hexadecimal characters. More about UUID

Let's get into the code and see the differences between UUID (old) and UUID.

Saving a Guid property from C# without any extra configuration stored the data as subtype 3 - UUID (old).

It took me some time to understand why I was getting UUID (old) instead of subtype 4 - UUID.

To save the UUID 057f3e75-24a0-468c-8788-5b3bbb7be407 as subtype 4 - UUID, I had to decorate the property with the BsonGuidRepresentation attribute set to GuidRepresentation.Standard. Otherwise, it was saved as UUID (old).

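using System;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

public class Binary
{
    // Properties are public so the driver's default class mapping picks them up.
    public string Uuid_AsString { get; set; } = "057f3e75-24a0-468c-8788-5b3bbb7be407";

    // No attribute: stored as subtype 3 (UUID old) in this setup.
    public Guid Uuid_SubType3 { get; set; } = Guid.Parse("057f3e75-24a0-468c-8788-5b3bbb7be407");

    // Standard representation: stored as subtype 4 (UUID).
    [BsonGuidRepresentation(GuidRepresentation.Standard)]
    public Guid Uuid_SubType4 { get; set; } = Guid.Parse("057f3e75-24a0-468c-8788-5b3bbb7be407");
}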

MongoDB Compass displays a subtype 4 value as UUID('057f3e75-24a0-468c-8788-5b3bbb7be407'), which is much easier to read than Binary.createFromBase64('dT5/BaAkjEaHiFs7u3vkBw==', 3) for UUID (old). That helps when you have to find a document by its UUID.

[Screenshot: UUID stored as binary data]

And the document

{
  "Uuid_AsString": "057f3e75-24a0-468c-8788-5b3bbb7be407",
  "Uuid_SubType3": {
    "$binary": {
      "base64": "dT5/BaAkjEaHiFs7u3vkBw==",
      "subType": "03"
    }
  },
  "Uuid_SubType4": {
    "$binary": {
      "base64": "BX8+dSSgRoyHiFs7u3vkBw==",
      "subType": "04"
    }
  }
}

Notice that the Base64 values differ between the two subtypes. This can cause problems, so be careful. Since subtype 3 is the old format, prefer subtype 4.

Decoding the two Base64 values to hexadecimal shows that the subtype 3 value dT5/BaAkjEaHiFs7u3vkBw== becomes 753e7f05-a024-8c46-8788-5b3bbb7be407, while the subtype 4 value BX8+dSSgRoyHiFs7u3vkBw== becomes 057f3e75-24a0-468c-8788-5b3bbb7be407. The first three groups are byte-reversed in the old C# format.
I found this amazing tool this week, https://cryptii.com/, so check it out.
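
The same conversion can be done locally. Below is a minimal Node.js sketch with two hypothetical helpers written for this post: base64ToUuidText decodes a Base64 value into the usual 8-4-4-4-12 hexadecimal layout, and csharpLegacyToStandard reverses the bytes of the first three groups, which is the only difference between the two values above.

```javascript
// Decode Base64 and format the 16 bytes as 8-4-4-4-12 hexadecimal groups.
function base64ToUuidText(b64) {
  const hex = Buffer.from(b64, "base64").toString("hex");
  return [
    hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
    hex.slice(16, 20), hex.slice(20),
  ].join("-");
}

// Reverse the byte order of the first three groups (4, 2 and 2 bytes).
function csharpLegacyToStandard(b64) {
  const bytes = Buffer.from(b64, "base64");
  const reordered = Buffer.concat([
    Buffer.from(bytes.subarray(0, 4)).reverse(), // copy first, reverse() is in place
    Buffer.from(bytes.subarray(4, 6)).reverse(),
    Buffer.from(bytes.subarray(6, 8)).reverse(),
    bytes.subarray(8),                           // last 8 bytes are unchanged
  ]);
  return reordered.toString("base64");
}

const legacy = "dT5/BaAkjEaHiFs7u3vkBw==";
const standard = "BX8+dSSgRoyHiFs7u3vkBw==";

console.log(base64ToUuidText(legacy));       // 753e7f05-a024-8c46-8788-5b3bbb7be407
console.log(base64ToUuidText(standard));     // 057f3e75-24a0-468c-8788-5b3bbb7be407
console.log(csharpLegacyToStandard(legacy)); // BX8+dSSgRoyHiFs7u3vkBw==
```

Both values hold the same 16 bytes in a different order, which is why a filter built with one representation does not match data stored with the other.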

The other problem I ran into was retrieving the data. Filtering by Uuid_SubType4 directly didn't work: nothing was returned. I had to set GuidRepresentationMode = GuidRepresentationMode.V3 to retrieve the data.

GuidRepresentationMode = GuidRepresentationMode.V3 is already marked obsolete and will be removed in a later release. I haven't found another way yet, and the MongoDB documentation still shows it as a solution. Let me know in the comments if you know another way to solve this.

// This property will be removed in a later release.
BsonDefaults.GuidRepresentationMode = GuidRepresentationMode.V3;

After forcing my application to use GuidRepresentationMode.V3, I had to change my POCO to this:

public class Binary
{
    public string Uuid_AsString { get; set; } = "057f3e75-24a0-468c-8788-5b3bbb7be407";

    // In V3 mode each Guid property states its representation explicitly.
    [BsonGuidRepresentation(GuidRepresentation.CSharpLegacy)]
    public Guid Uuid_SubType3 { get; set; } = Guid.Parse("057f3e75-24a0-468c-8788-5b3bbb7be407");

    [BsonGuidRepresentation(GuidRepresentation.Standard)]
    public Guid Uuid_SubType4 { get; set; } = Guid.Parse("057f3e75-24a0-468c-8788-5b3bbb7be407");
}

After that, I was able to filter my data.

var filterSubType4 = Builders<Binary>.Filter
                                     .Where(p => p.Uuid_SubType4 == Guid.Parse("057f3e75-24a0-468c-8788-5b3bbb7be407"));

var result = binary.Find(filterSubType4).ToList();

I had no idea that the UUID topic would take so long to describe. Anyway, here are my takeaways:

  • Mark your properties with GuidRepresentation.Standard to ensure they use binary subtype 4;
    • It helps when you need to find data with MongoDB Compass, because the UUID value is shown instead of a Base64 value.
  • Mark any Guid property that has no explicit representation as GuidRepresentation.CSharpLegacy;
  • Force your application to use BsonDefaults.GuidRepresentationMode = GuidRepresentationMode.V3;
    • This helps when querying your data;
    • Remember that this is an obsolete property.
  • Prefer binary data over strings for UUIDs. It's faster and smaller.
