WiredTiger is MongoDB’s default storage engine, but what really occurs behind the scenes when collections and indexes are saved to disk? In this short deep dive, we’ll explore the internals of WiredTiger data files, covering everything from _mdb_catalog metadata and B-Tree page layouts to BSON storage, primary and secondary indexes, and multi-key array handling. The goal is to introduce useful low-level tools like wt and other utilities.
I ran this experiment in a Docker container, set up as described in a previous blog post:
docker run --rm -it --cap-add=SYS_PTRACE mongo bash
# install required packages
apt-get update && apt-get install -y git xxd strace curl jq python3 python3-dev python3-pip python3-venv python3-pymongo python3-bson build-essential cmake gcc g++ libstdc++-12-dev libtool autoconf automake swig liblz4-dev zlib1g-dev libmemkind-dev libsnappy-dev libsodium-dev libzstd-dev
# (optional) download the latest WiredTiger release tarball; it is not used below
curl -L $(curl -s https://api.github.com/repos/wiredtiger/wiredtiger/releases/latest | jq -r '.tarball_url') -o wiredtiger.tar.gz
# get the WiredTiger main branch
git clone https://github.com/wiredtiger/wiredtiger.git
cd wiredtiger
# Compile
mkdir build && cmake -S /wiredtiger -B /wiredtiger/build \
-DCMAKE_C_FLAGS="-O0 -Wno-error -Wno-format-overflow -Wno-error=array-bounds -Wno-error=format-overflow -Wno-error=nonnull" \
-DHAVE_BUILTIN_EXTENSION_SNAPPY=1 \
-DCMAKE_BUILD_TYPE=Release
cmake --build /wiredtiger/build
# add `wt` binaries and other tools in the PATH
export PATH=$PATH:/wiredtiger/build:/wiredtiger/tools
# Start mongodb
mongod &
I use the mongo image, add the WiredTiger sources from the main branch, compile them to get wt, and start mongod.
I create a small collection with three documents and an index, then stop mongod:
mongosh <<'JS'
db.franck.insertMany([
{_id:"aaa",val1:"xxx",val2:"yyy",val3:"zzz",msg:"hello world"},
{_id:"bbb",val1:"xxx",val2:"yyy",val3:"zzz",msg:["hello","world"]},
{_id:"ccc",val1:"xxx",val2:"yyy",val3:"zzz",msg:["hello","world","hello","again"]}
]);
db.franck.createIndex({_id:1,val1:1,val2:1,val3:1,msg:1});
db.franck.find().showRecordId();
use admin;
db.shutdownServer();
JS
I stop MongoDB so that I can access the WiredTiger files with wt without them being opened and locked by another program. Before stopping, I displayed the documents:
[
{
_id: 'aaa',
val1: 'xxx',
val2: 'yyy',
val3: 'zzz',
msg: 'hello world',
'$recordId': Long('1')
},
{
_id: 'bbb',
val1: 'xxx',
val2: 'yyy',
val3: 'zzz',
msg: [ 'hello', 'world' ],
'$recordId': Long('2')
},
{
_id: 'ccc',
val1: 'xxx',
val2: 'yyy',
val3: 'zzz',
msg: [ 'hello', 'world', 'hello', 'again' ],
'$recordId': Long('3')
}
]
The files are stored in /data/db, the default data directory. The MongoDB catalog, which maps MongoDB collections to their storage attributes, is stored in a WiredTiger table, _mdb_catalog, visible as _mdb_catalog.wt in the listing:
root@72cf410c04cb:/wiredtiger# ls -altU /data/db
drwxr-xr-x. 4 root root 32 Sep 1 23:10 ..
-rw-------. 1 root root 0 Sep 13 20:33 mongod.lock
drwx------. 2 root root 74 Sep 13 20:29 journal
-rw-------. 1 root root 21 Sep 12 22:47 WiredTiger.lock
-rw-------. 1 root root 50 Sep 12 22:47 WiredTiger
-rw-------. 1 root root 73728 Sep 13 20:33 WiredTiger.wt
-rw-r--r--. 1 root root 1504 Sep 13 20:33 WiredTiger.turtle
-rw-------. 1 root root 4096 Sep 13 20:33 WiredTigerHS.wt
-rw-------. 1 root root 36864 Sep 13 20:33 sizeStorer.wt
-rw-------. 1 root root 36864 Sep 13 20:33 _mdb_catalog.wt
-rw-------. 1 root root 114 Sep 12 22:47 storage.bson
-rw-------. 1 root root 20480 Sep 13 20:33 collection-0-3767590060964183367.wt
-rw-------. 1 root root 20480 Sep 13 20:33 index-1-3767590060964183367.wt
-rw-------. 1 root root 36864 Sep 13 20:33 collection-2-3767590060964183367.wt
-rw-------. 1 root root 36864 Sep 13 20:33 index-3-3767590060964183367.wt
-rw-------. 1 root root 20480 Sep 13 20:20 collection-4-3767590060964183367.wt
-rw-------. 1 root root 20480 Sep 13 20:20 index-5-3767590060964183367.wt
-rw-------. 1 root root 20480 Sep 13 20:33 index-6-3767590060964183367.wt
drwx------. 2 root root 4096 Sep 13 20:33 diagnostic.data
drwx------. 3 root root 21 Sep 13 20:17 .mongodb
-rw-------. 1 root root 20480 Sep 13 20:33 collection-0-6917019827977430149.wt
-rw-------. 1 root root 20480 Sep 13 20:23 index-1-6917019827977430149.wt
-rw-------. 1 root root 20480 Sep 13 20:25 index-2-6917019827977430149.wt
Catalog
_mdb_catalog maps MongoDB names to WiredTiger table names. wt lists the key (recordId) and value (BSON):
root@72cf410c04cb:~# wt -h /data/db dump table:_mdb_catalog
WiredTiger Dump (WiredTiger Version 12.0.0)
Format=print
Header
table:_mdb_catalog
access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=1),assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=snappy,block_manager=default,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,disaggregated=(page_log=),encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,import=(compare_timestamp=oldest_timestamp,enabled=false,file_metadata=,metadata_file=,panic_corrupt=true,repair=false),in_memory=false,ingest=,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=true),lsm=(auto_throttle=,bloom=,bloom_bit_count=,bloom_config=,bloom_hash_count=,bloom_oldest=,chunk_count_limit=,chunk_max=,chunk_size=,merge_max=,merge_min=),memory_page_image_max=0,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,source="file:_mdb_catalog.wt",split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,stable=,tiered_storage=(auth_token=,bucket=,bucket_prefix=,cache_directory=,local_retention=300,name=,object_target_size=0),type=file,value_format=u,verbose=[],write_timestamp_usage=none
Data
\81
r\01\00\00\03md\00\eb\00\00\00\02ns\00\15\00\00\00admin.system.version\00\03options\00 \00\00\00\05uuid\00\10\00\00\00\04\ba\fc\c2\a9;EC\94\9d\a1\df(\c9\87\eaW\00\04indexes\00\97\00\00\00\030\00\8f\00\00\00\03spec\00.\00\00\00\10v\00\02\00\00\00\03key\00\0e\00\00\00\10_id\00\01\00\00\00\00\02name\00\05\00\00\00_id_\00\00\08ready\00\01\08multikey\00\00\03multikeyPaths\00\10\00\00\00\05_id\00\01\00\00\00\00\00\00\12head\00\00\00\00\00\00\00\00\00\08backgroundSecondary\00\00\00\00\00\03idxIdent\00+\00\00\00\02_id_\00\1c\00\00\00index-1-3767590060964183367\00\00\02ns\00\15\00\00\00admin.system.version\00\02ident\00!\00\00\00collection-0-3767590060964183367\00\00
\82
\7f\01\00\00\03md\00\fb\00\00\00\02ns\00\12\00\00\00local.startup_log\00\03options\003\00\00\00\05uuid\00\10\00\00\00\042}_\a9\16,L\13\aa*\09\b5<\ea\aa\d6\08capped\00\01\10size\00\00\00\a0\00\00\04indexes\00\97\00\00\00\030\00\8f\00\00\00\03spec\00.\00\00\00\10v\00\02\00\00\00\03key\00\0e\00\00\00\10_id\00\01\00\00\00\00\02name\00\05\00\00\00_id_\00\00\08ready\00\01\08multikey\00\00\03multikeyPaths\00\10\00\00\00\05_id\00\01\00\00\00\00\00\00\12head\00\00\00\00\00\00\00\00\00\08backgroundSecondary\00\00\00\00\00\03idxIdent\00+\00\00\00\02_id_\00\1c\00\00\00index-3-3767590060964183367\00\00\02ns\00\12\00\00\00local.startup_log\00\02ident\00!\00\00\00collection-2-3767590060964183367\00\00
\83
^\02\00\00\03md\00\a7\01\00\00\02ns\00\17\00\00\00config.system.sessions\00\03options\00 \00\00\00\05uuid\00\10\00\00\00\04D\09],\c6\15FG\b6\e2m!\ba\c4j<\00\04indexes\00Q\01\00\00\030\00\8f\00\00\00\03spec\00.\00\00\00\10v\00\02\00\00\00\03key\00\0e\00\00\00\10_id\00\01\00\00\00\00\02name\00\05\00\00\00_id_\00\00\08ready\00\01\08multikey\00\00\03multikeyPaths\00\10\00\00\00\05_id\00\01\00\00\00\00\00\00\12head\00\00\00\00\00\00\00\00\00\08backgroundSecondary\00\00\00\031\00\b7\00\00\00\03spec\00R\00\00\00\10v\00\02\00\00\00\03key\00\12\00\00\00\10lastUse\00\01\00\00\00\00\02name\00\0d\00\00\00lsidTTLIndex\00\10expireAfterSeconds\00\08\07\00\00\00\08ready\00\01\08multikey\00\00\03multikeyPaths\00\14\00\00\00\05lastUse\00\01\00\00\00\00\00\00\12head\00\00\00\00\00\00\00\00\00\08backgroundSecondary\00\00\00\00\00\03idxIdent\00Y\00\00\00\02_id_\00\1c\00\00\00index-5-3767590060964183367\00\02lsidTTLIndex\00\1c\00\00\00index-6-3767590060964183367\00\00\02ns\00\17\00\00\00config.system.sessions\00\02ident\00!\00\00\00collection-4-3767590060964183367\00\00
\84
\a6\02\00\00\03md\00\e6\01\00\00\02ns\00\0c\00\00\00test.franck\00\03options\00 \00\00\00\05uuid\00\10\00\00\00\04>\04\ec\e2SUK\ca\98\e8\bf\fe\0eu\81L\00\04indexes\00\9b\01\00\00\030\00\8f\00\00\00\03spec\00.\00\00\00\10v\00\02\00\00\00\03key\00\0e\00\00\00\10_id\00\01\00\00\00\00\02name\00\05\00\00\00_id_\00\00\08ready\00\01\08multikey\00\00\03multikeyPaths\00\10\00\00\00\05_id\00\01\00\00\00\00\00\00\12head\00\00\00\00\00\00\00\00\00\08backgroundSecondary\00\00\00\031\00\01\01\00\00\03spec\00q\00\00\00\10v\00\02\00\00\00\03key\005\00\00\00\10_id\00\01\00\00\00\10val1\00\01\00\00\00\10val2\00\01\00\00\00\10val3\00\01\00\00\00\10msg\00\01\00\00\00\00\02name\00!\00\00\00_id_1_val1_1_val2_1_val3_1_msg_1\00\00\08ready\00\01\08multikey\00\01\03multikeyPaths\00?\00\00\00\05_id\00\01\00\00\00\00\00\05val1\00\01\00\00\00\00\00\05val2\00\01\00\00\00\00\00\05val3\00\01\00\00\00\00\00\05msg\00\01\00\00\00\00\01\00\12head\00\00\00\00\00\00\00\00\00\08backgroundSecondary\00\00\00\00\00\03idxIdent\00m\00\00\00\02_id_\00\1c\00\00\00index-1-6917019827977430149\00\02_id_1_val1_1_val2_1_val3_1_msg_1\00\1c\00\00\00index-2-6917019827977430149\00\00\02ns\00\0c\00\00\00test.franck\00\02ident\00!\00\00\00collection-0-6917019827977430149\00\00
I can decode the BSON value with wt_to_mdb_bson.py to display it as JSON, and use jq to filter the file information about the collection I've created:
wt -h /data/db dump -x table:_mdb_catalog |
wt_to_mdb_bson.py -m dump -j |
jq 'select(.value.ns == "test.franck") |
{ns: .value.ns, ident: .value.ident, idxIdent: .value.idxIdent}
'
{
"ns": "test.franck",
"ident": "collection-0-6917019827977430149",
"idxIdent": {
"_id_": "index-1-6917019827977430149",
"_id_1_val1_1_val2_1_val3_1_msg_1": "index-2-6917019827977430149"
}
}
ident is the WiredTiger table name (collection-...) for the collection documents. Every collection has an "_id" index and may have additional secondary indexes, each stored in its own WiredTiger table (index-...). Like the collections, these tables are stored as .wt files in the data directory.
Collection
Using the collection's WiredTiger table name, I dump its keys and values and decode them as JSON:
wt -h /data/db dump -x table:collection-0-6917019827977430149 |
wt_to_mdb_bson.py -m dump -j
{"key": "81", "value": {"_id": "aaa", "val1": "xxx", "val2": "yyy", "val3": "zzz", "msg": "hello world"}}
{"key": "82", "value": {"_id": "bbb", "val1": "xxx", "val2": "yyy", "val3": "zzz", "msg": ["hello", "world"]}}
{"key": "83", "value": {"_id": "ccc", "val1": "xxx", "val2": "yyy", "val3": "zzz", "msg": ["hello", "world", "hello", "again"]}}
The "key" here is the recordId, an internal signed 64-bit integer (key_format=q) that MongoDB uses (when not using clustered collections) to order documents in the collection table. It appears as 0x81 rather than 0x01 because WiredTiger stores it with its order-preserving packed integer encoding, in which values from 0 to 63 are stored as a single byte, 0x80 plus the value.
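To see where 0x81 comes from, here is a minimal Python sketch of WiredTiger's packed encoding for non-negative integers. It is written from my reading of WiredTiger's intpack.i, not taken from the tools used in this post: the marker bytes (0x80, 0xc0, 0xe0) and the 64 and 8256 offsets are assumptions from that source, negative values are left out, and wt_pack_uint is a hypothetical name.

```python
# Sketch of WiredTiger's order-preserving packing for non-negative integers
# (based on my reading of intpack.i; negative values are not handled).
POS_1BYTE_MAX = (1 << 6) - 1                 # 63: largest one-byte value
POS_2BYTE_MAX = (1 << 13) + POS_1BYTE_MAX    # 8255: largest two-byte value


def wt_pack_uint(x: int) -> bytes:
    """Pack x so that byte-wise comparison preserves numeric order."""
    if x < 0:
        raise ValueError("this sketch only packs non-negative values")
    if x <= POS_1BYTE_MAX:
        # 10xxxxxx: marker 0x80 plus the 6-bit value
        return bytes([0x80 | x])
    if x <= POS_2BYTE_MAX:
        # 110xxxxx xxxxxxxx: 13 bits, offset by 64
        x -= POS_1BYTE_MAX + 1
        return bytes([0xC0 | (x >> 8), x & 0xFF])
    # 1110llll followed by l big-endian bytes, offset by 8256
    x -= POS_2BYTE_MAX + 1
    length = max(1, (x.bit_length() + 7) // 8)
    return bytes([0xE0 | length]) + x.to_bytes(length, "big")


for record_id in (1, 2, 3, 4):
    print(record_id, wt_pack_uint(record_id).hex())   # 81, 82, 83, 84
```

Because larger values get a larger marker byte, sorting the packed bytes sorts the numbers, so keys can be compared with a plain byte comparison. The same function gives e4 4b eb ea 24 for the checksum 0x4bec0a64 that appears in the branch page further down.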
I can also use wt_binary_decode.py to look at the file blocks. Here is the leaf page (page type: 7 (WT_PAGE_ROW_LEAF)) that contains my three documents as six key and value cells (ncells (oflow len): 6):
wt_binary_decode.py --offset 4096 --page 1 --verbose --split --bson /data/db/collection-0-6917019827977430149.wt
/data/db/collection-0-6917019827977430149.wt, position 0x1000/0x5000, pagelimit 1
Decode at 4096 (0x1000)
0: 00 00 00 00 00 00 00 00 1f 0f 00 00 00 00 00 00 5f 01 00 00
06 00 00 00 07 04 00 01 00 10 00 00 64 0a ec 4b 01 00 00 00
Page Header:
recno: 0
writegen: 3871
memsize: 351
ncells (oflow len): 6
page type: 7 (WT_PAGE_ROW_LEAF)
page flags: 0x4
version: 1
Block Header:
disk_size: 4096
checksum: 0x4bec0a64
block flags: 0x1
0: 28: 05 81
desc: 0x5 short key 1 bytes:
<packed 1 (0x1)>
1: 2a: 80 91 51 00 00 00 02 5f 69 64 00 04 00 00 00 61 61 61 00 02
76 61 6c 31 00 04 00 00 00 78 78 78 00 02 76 61 6c 32 00 04
00 00 00 79 79 79 00 02 76 61 6c 33 00 04 00 00 00 7a 7a 7a
00 02 6d 73 67 00 0c 00 00 00 68 65 6c 6c 6f 20 77 6f 72 6c
64 00 00
cell is valid BSON
{ '_id': 'aaa',
'msg': 'hello world',
'val1': 'xxx',
'val2': 'yyy',
'val3': 'zzz'}
2: 7d: 05 82
desc: 0x5 short key 1 bytes:
<packed 2 (0x2)>
3: 7f: 80 a0 60 00 00 00 02 5f 69 64 00 04 00 00 00 62 62 62 00 02
76 61 6c 31 00 04 00 00 00 78 78 78 00 02 76 61 6c 32 00 04
00 00 00 79 79 79 00 02 76 61 6c 33 00 04 00 00 00 7a 7a 7a
00 04 6d 73 67 00 1f 00 00 00 02 30 00 06 00 00 00 68 65 6c
6c 6f 00 02 31 00 06 00 00 00 77 6f 72 6c 64 00 00 00
cell is valid BSON
{ '_id': 'bbb',
'msg': ['hello', 'world'],
'val1': 'xxx',
'val2': 'yyy',
'val3': 'zzz'}
4: e1: 05 83
desc: 0x5 short key 1 bytes:
<packed 3 (0x3)>
5: e3: 80 ba 7a 00 00 00 02 5f 69 64 00 04 00 00 00 63 63 63 00 02
76 61 6c 31 00 04 00 00 00 78 78 78 00 02 76 61 6c 32 00 04
00 00 00 79 79 79 00 02 76 61 6c 33 00 04 00 00 00 7a 7a 7a
00 04 6d 73 67 00 39 00 00 00 02 30 00 06 00 00 00 68 65 6c
6c 6f 00 02 31 00 06 00 00 00 77 6f 72 6c 64 00 02 32 00 06
00 00 00 68 65 6c 6c 6f 00 02 33 00 06 00 00 00 61 67 61 69
6e 00 00 00
cell is valid BSON
{ '_id': 'ccc',
'msg': ['hello', 'world', 'hello', 'again'],
'val1': 'xxx',
'val2': 'yyy',
'val3': 'zzz'}
The script shows the raw hexadecimal bytes of each cell, a description of the cell type, and the logical value decoded from WiredTiger’s order‑preserving packed integer encoding. In this example, the key byte 0x81 decodes to recordId 1:
0: 28: 05 81
desc: 0x5 short key 1 bytes:
<packed 1 (0x1)>
Here is the branch page (page type: 6 (WT_PAGE_ROW_INT)) that references it:
wt_binary_decode.py --offset 8192 --page 1 --verbose --split --bson /data/db/collection-0-6917019827977430149.wt
/data/db/collection-0-6917019827977430149.wt, position 0x2000/0x5000, pagelimit 1
Decode at 8192 (0x2000)
0: 00 00 00 00 00 00 00 00 20 0f 00 00 00 00 00 00 34 00 00 00
02 00 00 00 06 00 00 01 00 10 00 00 21 df 20 d6 01 00 00 00
Page Header:
recno: 0
writegen: 3872
memsize: 52
ncells (oflow len): 2
page type: 6 (WT_PAGE_ROW_INT)
page flags: 0x0
version: 1
Block Header:
disk_size: 4096
checksum: 0xd620df21
block flags: 0x1
0: 28: 05 00
desc: 0x5 short key 1 bytes:
""
1: 2a: 38 00 87 80 81 e4 4b eb ea 24
desc: 0x38 addr (leaf no-overflow) 7 bytes:
<packed 0 (0x0)> <packed 1 (0x1)> <packed 1273760356 (0x4bec0a64)>
As we saw in the previous blog post, the pointer includes the checksum of the page it references (0x4bec0a64), which is used to detect disk corruption.
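The last seven bytes of the addr cell above (80 81 e4 4b eb ea 24) can be decoded by hand. This Python sketch unpacks them as three WiredTiger packed integers and converts them to a byte offset and size. The conversion is an assumption from my reading of WiredTiger's block manager, not something the tools print: the offset is stored as offset / allocation_size - 1 and the size as size / allocation_size, using the allocation_size=4KB from the table configuration. The function names are hypothetical.

```python
# Sketch: decode a WiredTiger address cookie into offset, size, and checksum.
ALLOCATION_SIZE = 4096                         # allocation_size=4KB
POS_1BYTE_MAX = (1 << 6) - 1                   # 63
POS_2BYTE_MAX = (1 << 13) + POS_1BYTE_MAX      # 8255


def wt_unpack_uint(buf: bytes, pos: int) -> tuple[int, int]:
    """Unpack one non-negative packed integer; return (value, next position)."""
    marker = buf[pos]
    if marker & 0xC0 == 0x80:                  # 10xxxxxx: one byte
        return marker & 0x3F, pos + 1
    if marker & 0xE0 == 0xC0:                  # 110xxxxx: two bytes
        value = ((marker & 0x1F) << 8) | buf[pos + 1]
        return value + POS_1BYTE_MAX + 1, pos + 2
    if marker & 0xF0 == 0xE0:                  # 1110llll: l more bytes
        length = marker & 0x0F
        value = int.from_bytes(buf[pos + 1:pos + 1 + length], "big")
        return value + POS_2BYTE_MAX + 1, pos + 1 + length
    raise ValueError(f"marker {marker:#x} is not a non-negative packed integer")


def decode_addr_cookie(cookie: bytes) -> dict:
    offset_units, pos = wt_unpack_uint(cookie, 0)
    size_units, pos = wt_unpack_uint(cookie, pos)
    checksum, _ = wt_unpack_uint(cookie, pos)
    return {
        "offset": (offset_units + 1) * ALLOCATION_SIZE,  # assumed: stored as offset/alloc - 1
        "size": size_units * ALLOCATION_SIZE,
        "checksum": hex(checksum),
    }


print(decode_addr_cookie(bytes.fromhex("80 81 e4 4b eb ea 24")))
# {'offset': 4096, 'size': 4096, 'checksum': '0x4bec0a64'}
```

That is the leaf page decoded earlier: offset 4096 (0x1000), disk_size 4096, checksum 0x4bec0a64. As a further check, if the checkpoint addr string in the WiredTiger.wt metadata shown later starts with a one-byte version (01, an assumption), its next seven bytes, 81 81 e4 d6 20 be e1, decode to offset 8192, size 4096 and checksum 0xd620df21, which is the branch page above.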
Another utility, bsondump, can be used to display the output of wt dump -x as JSON, like wt_to_mdb_bson.py, but requires some filtering to get the BSON content:
wt -h /data/db dump -x table:collection-0-6917019827977430149 | # dump in hex
grep '025f696400' |   # keep documents with a string "_id" field (0x02 "_id" 0x00)
xxd -r -p |           # convert the hex back to plain binary data
bsondump --type=json  # display BSON as JSON
{"_id":"aaa","val1":"xxx","val2":"yyy","val3":"zzz","msg":"hello world"}
{"_id":"bbb","val1":"xxx","val2":"yyy","val3":"zzz","msg":["hello","world"]}
{"_id":"ccc","val1":"xxx","val2":"yyy","val3":"zzz","msg":["hello","world","hello","again"]}
2025-09-14T08:57:36.182+0000 3 objects found
bsondump also provides a debug output type that gives more insight into how each document is stored internally, especially documents with arrays:
wt -h /data/db dump -x table:collection-0-6917019827977430149 | # dump in hex
grep '025f696400' |   # keep documents with a string "_id" field (0x02 "_id" 0x00)
xxd -r -p |           # convert the hex back to plain binary data
bsondump --type=debug # display BSON as it is stored
--- new object ---
size : 81
_id
type: 2 size: 13
val1
type: 2 size: 14
val2
type: 2 size: 14
val3
type: 2 size: 14
msg
type: 2 size: 21
--- new object ---
size : 96
_id
type: 2 size: 13
val1
type: 2 size: 14
val2
type: 2 size: 14
val3
type: 2 size: 14
msg
type: 4 size: 36
--- new object ---
size : 31
0
type: 2 size: 13
1
type: 2 size: 13
--- new object ---
size : 122
_id
type: 2 size: 13
val1
type: 2 size: 14
val2
type: 2 size: 14
val3
type: 2 size: 14
msg
type: 4 size: 62
--- new object ---
size : 57
0
type: 2 size: 13
1
type: 2 size: 13
2
type: 2 size: 13
3
type: 2 size: 13
2025-09-14T08:59:15.268+0000 3 objects found
Arrays in BSON are just sub-objects with the array position as a field name.
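To check this, here is a minimal hand-written BSON encoder in Python that supports only the two types used in this collection: UTF-8 string (0x02) and array (0x04). It encodes an array by turning it into a document keyed "0", "1", and so on. It is a sketch for illustration, not a substitute for a BSON library, and the function names are hypothetical.

```python
import struct


def encode_cstring(name: str) -> bytes:
    return name.encode("utf-8") + b"\x00"


def encode_element(name: str, value) -> bytes:
    if isinstance(value, str):
        # type 0x02, name, int32 length (including the trailing NUL), bytes, NUL
        data = value.encode("utf-8") + b"\x00"
        return b"\x02" + encode_cstring(name) + struct.pack("<i", len(data)) + data
    if isinstance(value, list):
        # type 0x04: an array is a document whose field names are "0", "1", ...
        as_document = {str(i): item for i, item in enumerate(value)}
        return b"\x04" + encode_cstring(name) + encode_document(as_document)
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def encode_document(doc: dict) -> bytes:
    body = b"".join(encode_element(k, v) for k, v in doc.items())
    # int32 total size (4-byte size + elements + trailing NUL), little-endian
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


doc = {"_id": "bbb", "val1": "xxx", "val2": "yyy", "val3": "zzz", "msg": ["hello", "world"]}
encoded = encode_document(doc)
print(len(encoded))                                               # 96
print(encode_document({"0": "hello", "1": "world"}).hex(" "))    # the msg array bytes
```

The sizes agree with the bsondump debug output: 96 bytes for this document, 36 for the msg element (1 type byte, 4 bytes for "msg\0", and the 31-byte embedded document), and 13 for the _id element.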
Primary index
RecordId is an internal, logical key used in the BTree that stores the collection. All indexes reference documents by recordId rather than by physical location, so a document can be moved on disk when it is updated (for example, when its page is rewritten) without fragmentation and without changing any index. Access by "_id" goes through a unique index that is created automatically with the collection and stored in another WiredTiger table. Here is its content:
wt -h /data/db dump -p table:index-1-6917019827977430149
WiredTiger Dump (WiredTiger Version 12.0.0)
Format=print
Header
table:index-1-6917019827977430149
access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=8),assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=,block_manager=default,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,disaggregated=(page_log=),encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,import=(compare_timestamp=oldest_timestamp,enabled=false,file_metadata=,metadata_file=,panic_corrupt=true,repair=false),in_memory=false,ingest=,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=16k,key_format=u,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=16k,leaf_value_max=0,log=(enabled=true),lsm=(auto_throttle=,bloom=,bloom_bit_count=,bloom_config=,bloom_hash_count=,bloom_oldest=,chunk_count_limit=,chunk_max=,chunk_size=,merge_max=,merge_min=),memory_page_image_max=0,memory_page_max=5MB,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=true,prefix_compression_min=4,source="file:index-1-6917019827977430149.wt",split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,stable=,tiered_storage=(auth_token=,bucket=,bucket_prefix=,cache_directory=,local_retention=300,name=,object_target_size=0),type=file,value_format=u,verbose=[],write_timestamp_usage=none
Data
<aaa\00\04
\00\08
<bbb\00\04
\00\10
<ccc\00\04
\00\18
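There are three entries, one for each document, with the "_id" value (aaa, bbb, ccc) as the key and the recordId as the value. Both are in MongoDB's KeyString format (see documentation), not BSON: for example, < (0x3C) is the type byte that precedes a string value, \00 terminates the string, and \04 marks the end of the key.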
In MongoDB’s KeyString format, the recordId uses a special encoding that adds three bits to the right of the big-endian value, so that its length can be read from the end of the key. The same encoding is used when the recordId is in the value part of an index entry, as in this unique index. For small recordIds like these, which fit in two bytes, decoding amounts to shifting the last byte right by three bits: 0x08 >> 3 = 1, 0x10 >> 3 = 2, and 0x18 >> 3 = 3, which are the recordIds of my documents.
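The shift works only because these recordIds are small. As a sketch of the general case, here is a Python encoder written from my reading of the RecordId encoding in MongoDB's KeyString source, not from official documentation: a count N of extra bytes is stored in the top three bits of the first byte and in the low three bits of the last byte, and the value bits sit big-endian in the remaining 5 + 8N + 5 bits. encode_record_id is a hypothetical name.

```python
def encode_record_id(rid: int) -> bytes:
    """Encode a positive recordId the way KeyString appends it (sketch)."""
    if not 0 < rid < 2**63:
        raise ValueError("only positive 64-bit recordIds are handled")
    bits = rid.bit_length()
    # First and last bytes carry 5 value bits each; more bits need extra bytes.
    extra = 0 if bits <= 10 else (bits - 10 + 7) // 8
    first = (extra << 5) | (rid >> (5 + 8 * extra))          # N in the top 3 bits
    middle = ((rid >> 5) & ((1 << (8 * extra)) - 1)).to_bytes(extra, "big")
    last = ((rid & 0x1F) << 3) | extra                        # N in the low 3 bits
    return bytes([first]) + middle + bytes([last])


for rid in (1, 2, 3, 32, 1024):
    print(rid, encode_record_id(rid).hex(" "))
```

For 1, 2, and 3 this yields 00 08, 00 10, and 00 18, the bytes in the dump; from 1024 onward, a middle byte appears and the length bits become non-zero.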
I decode the page that contains those index entries:
wt_binary_decode.py --offset 4096 --page 1 --verbose --split /data/db/index-1-6917019827977430149.wt
/data/db/index-1-6917019827977430149.wt, position 0x1000/0x5000, pagelimit 1
Decode at 4096 (0x1000)
0: 00 00 00 00 00 00 00 00 1f 0f 00 00 00 00 00 00 46 00 00 00
06 00 00 00 07 04 00 01 00 10 00 00 7c d3 87 60 01 00 00 00
Page Header:
recno: 0
writegen: 3871
memsize: 70
ncells (oflow len): 6
page type: 7 (WT_PAGE_ROW_LEAF)
page flags: 0x4
version: 1
Block Header:
disk_size: 4096
checksum: 0x6087d37c
block flags: 0x1
0: 28: 19 3c 61 61 61 00 04
desc: 0x19 short key 6 bytes:
"<aaa"
1: 2f: 0b 00 08
desc: 0xb short val 2 bytes:
"
2: 32: 19 3c 62 62 62 00 04
desc: 0x19 short key 6 bytes:
"<bbb"
3: 39: 0b 00 10
desc: 0xb short val 2 bytes:
""
4: 3c: 19 3c 63 63 63 00 04
desc: 0x19 short key 6 bytes:
"<ccc"
5: 43: 0b 00 18
desc: 0xb short val 2 bytes:
""
This utility doesn't decode the recordId, so we need to shift it ourselves. There's no BSON to decode in the indexes.
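The cell descriptors in this output (0x19, 0x0b) follow a simple pattern. Here is a Python sketch that walks the six cells of this page, assuming the short-cell layout from WiredTiger's cell.h: the low two bits of the descriptor give the type (0x01 short key, 0x03 short value) and the upper six bits give the payload length, so 0x19 is a 6-byte key and 0x0b a 2-byte value. Other cell types, such as the long value cells on the collection page, are not handled, and the function name is hypothetical.

```python
WT_CELL_KEY_SHORT = 0x01     # assumed from WiredTiger's cell.h
WT_CELL_VALUE_SHORT = 0x03


def parse_short_cells(data: bytes) -> list:
    """Split a run of short key/value cells into (kind, payload) pairs."""
    cells, pos = [], 0
    while pos < len(data):
        desc = data[pos]
        kind = desc & 0x03          # low two bits: short cell type
        length = desc >> 2          # upper six bits: payload length (0..63)
        if kind == WT_CELL_KEY_SHORT:
            name = "key"
        elif kind == WT_CELL_VALUE_SHORT:
            name = "value"
        else:
            raise NotImplementedError(f"descriptor {desc:#x} is not a short key/value cell")
        cells.append((name, data[pos + 1:pos + 1 + length]))
        pos += 1 + length
    return cells


# Cell bytes of the _id index leaf page decoded above
page = bytes.fromhex("19 3c 61 61 61 00 04 0b 00 08 19 3c 62 62 62 00 04 0b 00 10"
                     " 19 3c 63 63 63 00 04 0b 00 18")
cells = parse_short_cells(page)
for (_, key), (_, value) in zip(cells[0::2], cells[1::2]):
    # Shifting the last byte only works for small (two-byte) recordIds.
    print(key, "-> recordId", value[-1] >> 3)
```

The same rule explains 0x05 (a 1-byte key) on the collection page and 0x49 (an 18-byte key) in sizeStorer.wt further down.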
Secondary index
Secondary indexes are similar, except that they can be composed of multiple fields, and any indexed field can contain an array, which may result in multiple index entries for a single document, like an inverted index.
MongoDB tracks which indexed fields contain arrays to improve query planning. A multikey index creates an entry for each distinct array element. In a compound index, different documents can have arrays in different fields, but a single document can have at most one indexed field that is an array (MongoDB rejects "parallel arrays"). By knowing exactly which fields are multikey, the query planner can use tighter index bounds on the fields that never contain arrays. This information is stored in the catalog as a "multikey" flag along with the specific "multikeyPaths":
wt -h /data/db dump -x table:_mdb_catalog |
wt_to_mdb_bson.py -m dump -j |
jq 'select(.value.ns == "test.franck") |
.value.md.indexes[] |
{name: .spec.name, key: .spec.key, multikey: .multikey, multikeyPaths: .multikeyPaths | keys}
'
{
"name": "_id_",
"key": {
"_id": { "$numberInt": "1" },
},
"multikey": false,
"multikeyPaths": [
"_id"
]
}
{
"name": "_id_1_val1_1_val2_1_val3_1_msg_1",
"key": {
"_id": { "$numberInt": "1" },
"val1": { "$numberInt": "1" },
"val2": { "$numberInt": "1" },
"val3": { "$numberInt": "1" },
"msg": { "$numberInt": "1" },
},
"multikey": true,
"multikeyPaths": [
"_id",
"msg",
"val1",
"val2",
"val3"
]
}
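The jq keys filter lists every indexed path, multikey or not. Which paths actually contain arrays is recorded in each path's BinData value, visible in the raw catalog dump above: \01 for msg and \00 for the other fields. Here is the dump of my index on {_id:1,val1:1,val2:1,val3:1,msg:1}: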
wt -h /data/db dump -p table:index-2-6917019827977430149
WiredTiger Dump (WiredTiger Version 12.0.0)
Format=print
Header
table:index-2-6917019827977430149
access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=8),assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=,block_manager=default,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,disaggregated=(page_log=),encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,import=(compare_timestamp=oldest_timestamp,enabled=false,file_metadata=,metadata_file=,panic_corrupt=true,repair=false),in_memory=false,ingest=,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=16k,key_format=u,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=16k,leaf_value_max=0,log=(enabled=true),lsm=(auto_throttle=,bloom=,bloom_bit_count=,bloom_config=,bloom_hash_count=,bloom_oldest=,chunk_count_limit=,chunk_max=,chunk_size=,merge_max=,merge_min=),memory_page_image_max=0,memory_page_max=5MB,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=true,prefix_compression_min=4,source="file:index-2-6917019827977430149.wt",split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,stable=,tiered_storage=(auth_token=,bucket=,bucket_prefix=,cache_directory=,local_retention=300,name=,object_target_size=0),type=file,value_format=u,verbose=[],write_timestamp_usage=none
Data
<aaa\00<xxx\00<yyy\00<zzz\00<hello world\00\04\00\08
(null)
<bbb\00<xxx\00<yyy\00<zzz\00<hello\00\04\00\10
(null)
<bbb\00<xxx\00<yyy\00<zzz\00<world\00\04\00\10
(null)
<ccc\00<xxx\00<yyy\00<zzz\00<again\00\04\00\18
(null)
<ccc\00<xxx\00<yyy\00<zzz\00<hello\00\04\00\18
(null)
<ccc\00<xxx\00<yyy\00<zzz\00<world\00\04\00\18
(null)
Values are encoded in KeyString format (as described earlier), with each string terminated by 0x00. When an array is indexed, each distinct element becomes a separate entry, deduplicated within the document: in the entries for recordId 3 (ending with \00\04\00\18), <hello\00 appears only once although "hello" is in the array twice. The entries are also sorted (again, hello, world for this document, rather than the array order), which makes it quick to find the minimum and maximum values in an ascending or descending scan.
The encoded recordId is similar to what was discussed before, but since this is not a unique index, it's placed at the end of the key, rather than as a value, to ensure each key remains unique.
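The following Python sketch rebuilds the six keys of this index from the three documents, to show how array expansion, deduplication, and the trailing recordId combine. It assumes only what the dump shows plus my reading of MongoDB's KeyString code: 0x3C ('<') as the string type byte, 0x04 as the end-of-key byte, and the two-byte recordId form seen above. It handles only string values, non-empty arrays, and recordIds below 1024, and all function names are hypothetical.

```python
from itertools import product


def keystring_string(s: str) -> bytes:
    # 0x3C ('<') type byte, then the string terminated by 0x00.
    # Embedded NULs are escaped as 00 FF (assumption from the KeyString source).
    return b"\x3c" + s.encode("utf-8").replace(b"\x00", b"\x00\xff") + b"\x00"


def keystring_small_record_id(rid: int) -> bytes:
    # Two-byte recordId form: 10 value bits, length bits 0 (1 <= rid < 1024).
    if not 0 < rid < 1024:
        raise ValueError("sketch only handles recordIds 1..1023")
    return bytes([rid >> 5, (rid & 0x1F) << 3])


def multikey_index_keys(docs, fields):
    keys = set()                                   # the set removes duplicate entries
    for rid, doc in docs:
        values = [doc[f] for f in fields]
        if sum(isinstance(v, list) for v in values) > 1:
            raise ValueError("cannot index parallel arrays")
        # One choice per scalar field, one per element for the array field
        # (an empty array would produce no key here, unlike MongoDB).
        choices = [v if isinstance(v, list) else [v] for v in values]
        for combo in product(*choices):
            key = b"".join(keystring_string(v) for v in combo)
            # 0x04 ends the indexed values; the recordId keeps each key unique.
            keys.add(key + b"\x04" + keystring_small_record_id(rid))
    return sorted(keys)                            # byte-wise order, as in the B-Tree


common = {"val1": "xxx", "val2": "yyy", "val3": "zzz"}
docs = [
    (1, {"_id": "aaa", **common, "msg": "hello world"}),
    (2, {"_id": "bbb", **common, "msg": ["hello", "world"]}),
    (3, {"_id": "ccc", **common, "msg": ["hello", "world", "hello", "again"]}),
]
for key in multikey_index_keys(docs, ["_id", "val1", "val2", "val3", "msg"]):
    print(key)
```

The printed keys are the six entries of the wt dump, in the same order.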
The recordId uses a special encoding that stores three “length” bits in the top three bits of the first byte and the bottom three bits of the last byte. These bits let MongoDB determine the length and decode the recordId from the end without reading the entire key.
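Reading the recordId from the end can be sketched in Python as follows, under the same assumption about the layout (N in the low three bits of the last byte and the top three bits of the first byte, with 5 + 8N + 5 value bits in big-endian order). It takes any byte string that ends with an encoded recordId, such as these index keys; the function name is hypothetical.

```python
def decode_record_id_at_end(key: bytes) -> int:
    """Decode the recordId stored at the end of a KeyString-encoded key."""
    last = key[-1]
    extra = last & 0x07                      # N, from the low 3 bits of the last byte
    encoded = key[-(extra + 2):]             # the recordId takes N + 2 bytes
    first = encoded[0]
    if first >> 5 != extra:                  # N is repeated in the top 3 bits
        raise ValueError("length bits of the first and last bytes disagree")
    value = first & 0x1F                     # 5 value bits from the first byte
    for byte in encoded[1:-1]:               # N middle bytes, big-endian
        value = (value << 8) | byte
    return (value << 5) | (last >> 3)        # 5 value bits from the last byte


key = b"<ccc\x00<xxx\x00<yyy\x00<zzz\x00<again\x00\x04\x00\x18"
print(decode_record_id_at_end(key))          # 3
```

Only the last byte and the N + 2 bytes it points to are read, so the indexed values before them never need to be parsed.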
Additional metadata
The MongoDB metadata is contained in _mdb_catalog.wt and maps to the WiredTiger files. The WiredTiger metadata is stored in WiredTiger.wt. For example, for my collection:
wt -h /data/db dump file:WiredTiger.wt |
grep -A1 collection-0-6917019827977430149
colgroup:collection-0-6917019827977430149\00
app_metadata=(formatVersion=1),assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),collator=,columns=,source="file:collection-0-6917019827977430149.wt",type=file,verbose=[],write_timestamp_usage=none\00
colgroup:collection-2-3767590060964183367\00
--
file:collection-0-6917019827977430149.wt\00
access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=1),assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=snappy,cache_resident=false,checksum=on,collator=,columns=,dictionary=0,encryption=(keyid=,name=),format=btree,huffman_key=,huffman_value=,id=11,ignore_in_memory_cache_size=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=true),memory_page_image_max=0,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,readonly=false,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,tiered_object=false,tiered_storage=(auth_token=,bucket=,bucket_prefix=,cache_directory=,local_retention=300,name=,object_target_size=0),value_format=u,verbose=[],version=(major=1,minor=1),write_timestamp_usage=none,checkpoint=(WiredTigerCheckpoint.1=(addr="018181e4d620bee18281e41546bd168381e4745f6da6808080e22fc0cfc0",order=1,time=1757794673,size=8192,newest_start_durable_ts=0,oldest_start_ts=0,newest_txn=0,newest_stop_durable_ts=0,newest_stop_ts=-1,newest_stop_txn=-11,prepare=0,write_gen=3872,run_write_gen=3870)),checkpoint_backup_info=,checkpoint_lsn=(2,19456)\00
--
table:collection-0-6917019827977430149\00
app_metadata=(formatVersion=1),assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),colgroups=,collator=,columns=,key_format=q,value_format=u,verbose=[],write_timestamp_usage=none\00
The metadata for WiredTiger.wt is in WiredTiger.turtle as simple text:
cat /data/db/WiredTiger.turtle
WiredTiger version string
WiredTiger 12.0.0: (November 15, 2024)
WiredTiger version
major=12,minor=0,patch=0
file:WiredTiger.wt
access_pattern_hint=none,allocation_size=4KB,app_metadata=,assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=,cache_resident=false,checksum=on,collator=,columns=,dictionary=0,encryption=(keyid=,name=),format=btree,huffman_key=,huffman_value=,id=0,ignore_in_memory_cache_size=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=S,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=0,log=(enabled=true),memory_page_image_max=0,memory_page_max=5MB,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,readonly=false,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,tiered_object=false,tiered_storage=(auth_token=,bucket=,bucket_prefix=,cache_directory=,local_retention=300,name=,object_target_size=0),value_format=S,verbose=[],version=(major=1,minor=1),write_timestamp_usage=none,checkpoint=(WiredTigerCheckpoint.1616=(addr="018081e49ce334508181e453b31e788281e4c5e974cf808080e3012fc0e24fc0",order=1616,time=1757864587,size=32768,newest_start_durable_ts=0,oldest_start_ts=0,newest_txn=2,newest_stop_durable_ts=0,newest_stop_ts=-1,newest_stop_txn=-11,prepare=0,write_gen=4742,run_write_gen=4736,next_page_id=0)),checkpoint_backup_info=,checkpoint_lsn=(4294967295,2147483647)
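Both the turtle file and the metadata values are WiredTiger configuration strings. This Python sketch reads WiredTiger.turtle as alternating key and value lines, as in the output above, and splits a configuration string into a nested dict by tracking parentheses and double quotes. It is my own simplified parser, not WiredTiger's: it keeps all values as strings, ignores escapes inside quotes, and maps bare list items (like the two numbers in checkpoint_lsn) to True. The function names are hypothetical.

```python
from pathlib import Path


def read_turtle(path) -> dict:
    lines = Path(path).read_text().splitlines()
    return dict(zip(lines[0::2], lines[1::2]))   # key line, then value line


def split_top_level(config: str) -> list:
    """Split on commas that are outside parentheses, brackets, and quotes."""
    parts, depth, quoted, start = [], 0, False, 0
    for i, ch in enumerate(config):
        if ch == '"':
            quoted = not quoted
        elif quoted:
            continue
        elif ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(config[start:i])
            start = i + 1
    parts.append(config[start:])
    return [p for p in parts if p]


def parse_config(config: str) -> dict:
    result = {}
    for item in split_top_level(config):
        key, sep, value = item.partition("=")
        if not sep:
            result[key] = True                       # bare list item
        elif value.startswith("(") and value.endswith(")"):
            result[key] = parse_config(value[1:-1])  # nested group
        else:
            result[key] = value.strip('"')
    return result


turtle_path = Path("/data/db/WiredTiger.turtle")
if turtle_path.exists():
    meta = parse_config(read_turtle(turtle_path)["file:WiredTiger.wt"])
    print(meta["key_format"], meta["value_format"])
    for name, ckpt in meta["checkpoint"].items():
        print(name, ckpt["addr"], ckpt["size"])
```

Run against the file above, this should list the WiredTigerCheckpoint.1616 entry with its addr and size; the same parse_config works on the metadata values dumped from WiredTiger.wt.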
In addition to _mdb_catalog.wt, MongoDB tracks the size of collections in sizeStorer.wt:
wt_binary_decode.py -v -o 4096 -p 1 --bson /data/db/sizeStorer.wt
/data/db/sizeStorer.wt, position 0x1000/0x8000, pagelimit 1
Decode at 4096 (0x1000)
Page Header:
recno: 0
writegen: 4555
memsize: 435
ncells (oflow len): 10
page type: 7 (WT_PAGE_ROW_LEAF)
page flags: 0x4
version: 1
Block Header:
disk_size: 4096
checksum: 0x6447ad0c
block flags: 0x1
0: desc: 0x49 short key 18 bytes:
"table:_mdb_catalog"
1: cell is valid BSON
{'dataSize': 2037, 'numRecords': 4}
2: desc: 0x99 short key 38 bytes:
"table:collection-0-3767590060964183367"
3: cell is valid BSON
{'dataSize': 59, 'numRecords': 1}
4: desc: 0x99 short key 38 bytes:
"table:collection-0-6917019827977430149"
5: cell is valid BSON
{'dataSize': 299, 'numRecords': 3}
6: desc: 0x99 short key 38 bytes:
"table:collection-2-3767590060964183367"
7: cell is valid BSON
{'dataSize': 25928, 'numRecords': 4}
8: desc: 0x99 short key 38 bytes:
"table:collection-4-3767590060964183367"
9: cell is valid BSON
{'dataSize': 0, 'numRecords': 0}
Conclusion
By exploring MongoDB’s WiredTiger files with low-level tools, we can see precisely how high‑level collections, documents, and indexes map down to on‑disk structures.
At the core is the _mdb_catalog table, a WiredTiger BTree that acts as MongoDB’s internal namespace directory. It tells MongoDB which WiredTiger table holds the actual documents for each collection, and which tables hold the associated indexes.
A collection’s data table is itself a BTree, where the key is the internal RecordId and the value is the document in BSON format. Leaf pages hold these (key, BSON) pairs, while branch pages store key ranges and child pointers with checksums to protect against corruption.
Every collection has at least a primary "_id" index, stored in a separate BTree table. Here, the index key is the "_id" field value, and the value is the encoded RecordId pointing back to the collection’s document.
Additional secondary indexes work the same way, with index keys built from one or more fields (in compound indexes) and, in the case of array fields, multiple index entries per document. For non‑unique indexes, the RecordId is appended to the key so that each entry remains unique.
To explore the internals, I used:
- wt to dump the WiredTiger tables (keys/values, hex output). I compiled it from the WiredTiger sources.
- wt_to_mdb_bson.py to decode MongoDB BSON from wt dump -x output into JSON, and wt_binary_decode.py to inspect WiredTiger BTree page internals. I got them from the WiredTiger repo.
- bsondump to display BSON as JSON or in a detailed debug format. It is included in the MongoDB Database Tools.
Something that might surprise you if you're familiar with other databases is that MongoDB's on-disk storage only holds persistent data: the fields and their values in a clean state for future queries. Many relational databases also store transient metadata on disk, such as transaction information, locks, undo records, and dead tuples, which are used in ongoing transactions and need to be cleaned up later (through processes like garbage collection, vacuuming, delayed block cleanout, ghost cleanup, purging, and compaction). In contrast, MongoDB was designed for short transactions on modern infrastructure, so it keeps transient information in memory and stores durable data on disk to optimize performance and avoid resource-intensive background tasks. This is known as "No-Steal / No-Force" cache management.