Types of Databases
There are two main types of databases used in applications: SQL databases and NoSQL databases. SQL databases are relational and structured, and usually consist of tables of rows and columns. NoSQL databases come in several types, including Key-Value, Wide-Column, Graph, and Document. We will mainly look at document databases, specifically MongoDB, which store data in documents that are grouped together in collections.
SQL databases are traditionally built to scale vertically: they run on a single server, and to handle more load you increase the CPU and memory on that server. NoSQL databases, on the other hand, are built to scale horizontally and are typically deployed across multiple servers in a replica set, so handling more load is largely a matter of adding servers.
Databases are also evaluated on their ACID compliance (Atomicity, Consistency, Isolation, Durability), a standard set of properties that guarantee database transactions are processed reliably. Most SQL databases are ACID compliant by default, whereas NoSQL databases can sacrifice ACID compliance for performance and scalability, although most NoSQL databases offer a way to achieve ACID compliance when it is needed.
https://starship-knowledge.com/when-to-choose-nosql-over-sql
Local Setup
MongoDB is an open source NoSQL document database that scales horizontally using clustered servers in replica sets. A great option for testing and local development with MongoDB is Docker and Docker Compose. The following Docker Compose spec creates a local Docker container running MongoDB, with the data persisted to a Docker volume.
📝 docker-compose.yml
services:
  db:
    image: mongo:7.0.1
    container_name: myDataBase
    restart: always
    ports:
      - 27017:27017
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: mySecureDbPassword1!
    volumes:
      - type: volume
        source: my_db_data
        target: /data/db

volumes:
  my_db_data:
💡 Refer to Containers Demystified 🐳🤔 for a full Docker container guide
Start the container as a detached background process.
$ docker-compose up -d
[+] Building 0.0s (0/0) docker-container:unruffled_shockley
[+] Running 3/3
✔ Network mongo_default Created
✔ Volume "mongo_my_db_data" Created
✔ Container myDataBase Started
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
11bce7e4ab08 mongo:7.0.1 "docker-entrypoint.s…" 2 minutes ago Up 2 minutes 0.0.0.0:27017->27017/tcp myDataBase
💡 MongoDB uses collections of BSON documents, and you can loosely think of these as multiple lists of dictionaries or JSON objects. The difference between BSON and JSON is that BSON supports additional types such as DateTime objects.
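To make the BSON versus JSON difference concrete, here is a minimal sketch that uses only Python's standard library. It shows that the json module rejects a datetime value unless it is first converted to a string, which is the gap a native date type fills. The todo dictionary and the to_json_safe helper are illustrative names, not part of MongoDB or PyMongo.

```python
import json
from datetime import datetime, timezone

# A todo shaped like the documents used in this article.
todo = {
    "title": "Do a thing",
    "completed": False,
    "created_date": datetime(2023, 10, 6, 13, 20, 45, tzinfo=timezone.utc),
}


def to_json_safe(document: dict) -> str:
    """Serialize a document to JSON, turning datetimes into ISO 8601 strings."""
    return json.dumps(
        document,
        default=lambda value: value.isoformat() if isinstance(value, datetime) else str(value),
    )


try:
    json.dumps(todo)  # plain JSON has no date type, so this raises TypeError
    plain_json_ok = True
except TypeError:
    plain_json_ok = False

print(plain_json_ok)
print(to_json_safe(todo))
```

With JSON the date survives only as a string that every reader has to parse back, whereas a BSON document stores it as a date that the driver returns as a datetime.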
Connect to the container with the mongo shell utility mongosh then create our first database (myAppDb), collection (todos) and insert a document.
$ mongosh 'mongodb://root:mySecureDbPassword1!@localhost:27017/'
Current Mongosh Log ID: 651abecd3a009f9bf2b8999c
Connecting to: mongodb://<credentials>@localhost:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+2.0.1
Using MongoDB: 7.0.1
Using Mongosh: 2.0.1
test> use myAppDb
switched to db myAppDb
myAppDb> db.todos.insertOne({"title": "Do a thing", "completed": false, "created_date": new Date()})
{
  acknowledged: true,
  insertedId: ObjectId("652009addda6ead56a87d0e5")
}
💡 mongosh is written in JavaScript (Node.js), so we can insert a DateTime object with new Date()
We can verify the document has been created by listing out the documents in the collection with the find command.
myAppDb> db.todos.find()
[
  {
    _id: ObjectId("652009addda6ead56a87d0e5"),
    title: 'Do a thing',
    completed: false,
    created_date: ISODate("2023-10-06T13:20:45.434Z")
  }
]
If you prefer to use a visual tool with a GUI instead of the command line, MongoDB Compass can be used.
Define your Data Model
Mongo is great because it allows flexibility in the data you store in a collection: the documents do not all need to have the same fields. That said, it is still important to start by defining your data model when you create a new app or add a new feature, so that you do not run into complexity, organization, or performance issues later on. The two relationship schemas that I use the most are One-to-One and One-to-Few.
💡 Great article on MongoDB Schema Best Practices which covers more schema designs
One-to-One
In the One-to-One relationship each attribute has a single value, so the resulting document is flat and we have one document per item. This is the schema of the document we just inserted.
{
  "_id": ObjectId(),
  "title": "string",
  "completed": false,
  "created_date": ISODate()
}
One-to-Few
In the One-to-Few relationship we use a list of nested objects to group all associated items into a single document, such as a user record with multiple addresses.
💡 MongoDB has a 16MB max document size
{
  "_id": ObjectId('A'),
  "first_name": "John",
  "last_name": "Doe",
  "company": "Cisco",
  "addresses": [
    { "street": "7200-11 Kit Creek Rd", "city": "Durham", "state": "NC", "country": "US" },
    { "street": "7200-12 Kit Creek Rd", "city": "Durham", "state": "NC", "country": "US" }
  ]
}
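As a rough illustration of working with a One-to-Few document from Python, the sketch below builds a user document with an embedded address list and an update document that appends one more address with the $push operator. The helper names (new_user, add_address_update) and the users collection mentioned afterwards are assumptions made for this example.

```python
from typing import Any


def new_user(first_name: str, last_name: str, company: str) -> dict[str, Any]:
    """Build a user document with an empty list of embedded addresses."""
    return {
        "first_name": first_name,
        "last_name": last_name,
        "company": company,
        "addresses": [],
    }


def add_address_update(street: str, city: str, state: str, country: str) -> dict[str, Any]:
    """Build an update document that appends one address with $push."""
    return {
        "$push": {
            "addresses": {"street": street, "city": city, "state": state, "country": country}
        }
    }


user = new_user("John", "Doe", "Cisco")
update = add_address_update("7200-11 Kit Creek Rd", "Durham", "NC", "US")
print(update)
```

With PyMongo, the update document would be passed as the second argument of update_one, for example db.users.update_one({"_id": user_id}, update), where users and user_id are hypothetical. Keeping the list short is what makes embedding reasonable here, given the document size limit noted above.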
Common Operations using Python
This full example is available on GitHub
https://github.com/dpills/mongo-todos/tree/master
We have used the mongo shell and mentioned MongoDB Compass, but these tools are usually only used for manual operations or validation of data. MongoDB offers drivers (libraries) for many programming languages; we will look at common operations using the Python PyMongo driver for a more realistic application example.
💡 The majority of code is written to interact with the collection of the database and the full list of features can be referenced at the pymongo Collection level operations documentation.
Make sure to install pymongo with a Python package manager to get started.
# Pip
$ python3 -m pip install pymongo
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.4.2 pymongo-4.5.0
# Poetry
$ poetry add pymongo
The following packages are already present in the pyproject.toml and will be skipped:
• pymongo
Create a Python file, import pymongo, and set up the MongoDB connection, specifying the myAppDb database.
📝 todos.py
import argparse
from datetime import datetime

import pymongo
from bson import ObjectId
from bson.errors import InvalidId

db_client = pymongo.MongoClient("mongodb://root:mySecureDbPassword1!@localhost:27017/")
db = db_client["myAppDb"]
🛑 In a real application make sure to put the URI in an environment variable so it is not exposed in your source code
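One way to follow that advice is to read the URI from the environment at startup. This is a minimal sketch: the variable name MONGODB_URI and the local fallback value are assumptions chosen for the example, not names that MongoDB or PyMongo require.

```python
import os

# Fallback for local development only; it contains no credentials.
DEFAULT_URI = "mongodb://localhost:27017/"


def get_mongo_uri(env_var: str = "MONGODB_URI") -> str:
    """Return the connection URI from the environment, or a local default."""
    return os.environ.get(env_var, DEFAULT_URI)


uri = get_mongo_uri()
print(uri.startswith("mongodb"))
```

In todos.py the client could then be created with pymongo.MongoClient(get_mongo_uri()), and the real URI exported in the shell or injected by the deployment platform.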
Insert a Document
Insert a document with the same schema we used earlier using insert_one, including a datetime object for the created date.
📝 todos.py
def create_todo(title: str) -> None:
    """
    Create a todo
    """
    create_result = db.todos.insert_one(
        {"title": title, "completed": False, "created_date": datetime.utcnow()}
    )
    print(f"New Todo ID: {create_result.inserted_id}")
    return None
We can get the document ID by printing the inserted_id of the create result.
💡 MongoDB generates an ObjectId for each document
ObjectIds are small, likely unique, fast to generate, and ordered. ObjectId values are 12 bytes in length, consisting of:
- A 4-byte timestamp, representing the ObjectId's creation, measured in seconds since the Unix epoch.
- A 5-byte random value generated once per process. This random value is unique to the machine and process.
- A 3-byte incrementing counter, initialized to a random value.
$ python3 todos.py -o create -d 'Write a todo app'
New Todo ID: 65200e33a29c1e7244f7df59
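The ObjectId layout described above can be checked by hand. The following sketch splits the ID printed by the create command into its three parts using only the standard library; ObjectIdParts and split_object_id are names invented for this example.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ObjectIdParts(NamedTuple):
    created: datetime    # 4-byte timestamp, seconds since the Unix epoch
    random_value: bytes  # 5-byte per-process random value
    counter: int         # 3-byte incrementing counter


def split_object_id(hex_id: str) -> ObjectIdParts:
    """Split a 24-character hex ObjectId string into its three parts."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")
    return ObjectIdParts(
        created=datetime.fromtimestamp(seconds, tz=timezone.utc),
        random_value=raw[4:9],
        counter=int.from_bytes(raw[9:12], "big"),
    )


parts = split_object_id("65200e33a29c1e7244f7df59")
print(parts.created.isoformat())  # 2023-10-06T13:40:03+00:00
```

The decoded timestamp lines up with the created_date stored on the same todo. In application code there is no need to parse this manually, since PyMongo's ObjectId exposes the same value through its generation_time attribute.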
List Documents
We can now loop over the existing documents with the find operation and provide a filter so that only the items which are not completed are returned. If we wanted to return all of the items, even the completed ones, we could remove the filter: db.todos.find().
💡 If you need to find only a single document then find_one can be used
📝 todos.py
def get_todos() -> None:
    """
    Get Todos
    """
    print()
    for document in db.todos.find({"completed": False}):
        for key, value in document.items():
            print(f"{key}: {value}")
        print()
    return None
We see both of our To Do items printed out.
$ python3 todos.py -o read
_id: 652009addda6ead56a87d0e5
title: Do a thing
completed: False
created_date: 2023-10-06 13:20:45.434000
_id: 65200e33a29c1e7244f7df59
title: Write a todo app
completed: False
created_date: 2023-10-06 13:40:03.944000
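The same find call can also sort and cap the results. Below is a small sketch, assuming a collection object that behaves like a PyMongo collection such as db.todos; the function name newest_open_todos and its limit default are made up for illustration.

```python
from typing import Any


def newest_open_todos(collection: Any, limit: int = 5) -> list[dict[str, Any]]:
    """Return up to `limit` unfinished todos, newest first.

    `collection` is expected to behave like a PyMongo collection, e.g. db.todos.
    """
    cursor = collection.find(
        {"completed": False},         # same filter as get_todos
        sort=[("created_date", -1)],  # -1 = descending
        limit=limit,
    )
    return list(cursor)
```

Calling newest_open_todos(db.todos, limit=10) would return plain dictionaries that can be printed the same way as in get_todos. The collection is passed in as an argument so the function can be exercised with a stand-in object in tests.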
Update a Document
To update a document, update_one can be used. We pass a filter as the first argument to say which document to modify; the ObjectId of the todo is used in this case. The second argument is the operation we want to run; I tend to use $set the most, which modifies a field value. In this example we complete the todo item by setting completed to true.
📝 todos.py
def complete_todo(todo_id: str) -> None:
    """
    Complete a todo
    """
    try:
        todo_object_id = ObjectId(todo_id)
    except InvalidId:
        print("Invalid Todo Id")
        return None
    update_result = db.todos.update_one(
        {"_id": todo_object_id}, {"$set": {"completed": True}}
    )
    if update_result.matched_count == 0:
        print("Todo not found")
    else:
        print("Completed todo, nice work! 🎉")
    return None
Complete our initial todo item and verify that we no longer see it when listing out our unfinished todo items.
$ python3 todos.py -o complete -d 652009addda6ead56a87d0e5
Completed todo, nice work! 🎉
$ python3 todos.py -o read
_id: 65200e33a29c1e7244f7df59
title: Write a todo app
completed: False
created_date: 2023-10-06 13:40:03.944000
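The update result carries a modified_count next to matched_count, which lets the caller tell a missing todo apart from one that was already completed. This is a hedged variation on complete_todo rather than the article's own code: complete_by_filter is a hypothetical name, and it takes the collection and filter as arguments so it can run against any object with a PyMongo-style update_one.

```python
from typing import Any


def complete_by_filter(collection: Any, todo_filter: dict[str, Any]) -> str:
    """Mark one matching todo as completed and describe what happened."""
    result = collection.update_one(todo_filter, {"$set": {"completed": True}})
    if result.matched_count == 0:
        return "not found"
    if result.modified_count == 0:
        return "already completed"  # matched, but $set changed nothing
    return "completed"
```

For example, complete_by_filter(db.todos, {"_id": todo_object_id}) could replace the update_one call above, with the returned string deciding which message to print.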
Delete a Document
Deleting a document uses a similar syntax to finding a document: a filter is passed to delete_one to indicate which document to delete.
📝 todos.py
def delete_todo(todo_id: str) -> None:
    """
    Delete a todo
    """
    try:
        todo_object_id = ObjectId(todo_id)
    except InvalidId:
        print("Invalid Todo Id")
        return None
    delete_result = db.todos.delete_one({"_id": todo_object_id})
    if delete_result.deleted_count == 0:
        print("Todo not found")
    else:
        print("Deleted the todo")
    return None
Delete the todo we created and we can see that nothing gets listed when running our read command.
$ python3 todos.py -o delete -d 65200e33a29c1e7244f7df59
Deleted the todo
$ python3 todos.py -o read
Checking from the Mongo shell shows that the document was deleted, and that our initial document still exists but has been marked as completed.
myAppDb> db.todos.find()
[
  {
    _id: ObjectId("652009addda6ead56a87d0e5"),
    title: 'Do a thing',
    completed: true,
    created_date: ISODate("2023-10-06T13:20:45.434Z")
  }
]
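The commands in this section call todos.py with -o and -d flags, but the excerpts above do not show how those flags are parsed. The sketch below is one plausible way to wire them up with argparse. It is an assumption rather than the code from the linked repository: the long option names, the run function, and the handler dictionary are all invented for illustration.

```python
import argparse
from typing import Callable, Dict, Optional, Sequence

OPERATIONS = ["create", "read", "complete", "delete"]


def build_parser() -> argparse.ArgumentParser:
    """Parser for the -o/-d flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Manage todos stored in MongoDB")
    # The long option names are assumptions; the article only shows -o and -d.
    parser.add_argument("-o", "--operation", required=True, choices=OPERATIONS)
    parser.add_argument("-d", "--data", help="todo title (create) or todo id (complete, delete)")
    return parser


def run(handlers: Dict[str, Callable[..., None]], argv: Optional[Sequence[str]] = None) -> None:
    """Parse the arguments and call the handler registered for the operation."""
    args = build_parser().parse_args(argv)
    if args.operation == "read":
        handlers["read"]()
    elif args.data is None:
        print(f"-d is required for the {args.operation} operation")
    else:
        handlers[args.operation](args.data)


# Stand-in handlers so the sketch runs without a database.
demo_handlers = {name: (lambda *a, name=name: print(name, *a)) for name in OPERATIONS}
run(demo_handlers, ["-o", "create", "-d", "Write a todo app"])
run(demo_handlers, ["-o", "read"])
```

In todos.py the handler dictionary would map the four operations to create_todo, get_todos, complete_todo and delete_todo, and run(handlers) with no argv would read the flags from the command line.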
Indexes
Indexes are common across most databases, and without them database queries can end up scanning every item in a collection in order to find the document or documents required. Indexes are special data structures that store a small portion of the collection's data set in an easy-to-traverse form. Indexes should be created for fields that are matched in the filters of commands. These are the index types that I use the most, but a full list can be referenced in the Index Types documentation.
💡 1 = Ascending, -1 = Descending
Single Field Index
A single field index can be used when you need to match a single field. We can apply this one to our todo example since we match on the completed field.
myAppDb> db.todos.createIndex({completed: 1})
completed_1
Unique Index
A unique index ensures that the indexed fields do not store duplicate values, i.e. it enforces uniqueness for the indexed fields. By default, MongoDB creates a unique index on the _id field during the creation of a collection.
> db.collection.createIndex( { "unique_id": 1 }, { unique: true } )
TTL Index
TTL indexes are special single-field indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time. These work on fields that have Date (or, in Python, datetime) object types. This is useful for large amounts of data that may only need to be retained for a short or determined period of time.
> db.collection.createIndex( { "created_date": 1 }, { expireAfterSeconds: 3600 } )
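From Python, the same TTL index could be created by passing expireAfterSeconds as a keyword argument to create_index. The sketch below assumes a collection object that behaves like a PyMongo collection; create_ttl_index and eligible_for_removal_at are hypothetical helpers, and the second one only illustrates the arithmetic behind the expiry time.

```python
from datetime import datetime, timedelta, timezone
from typing import Any

ONE_HOUR = 3600


def create_ttl_index(collection: Any, field: str = "created_date", seconds: int = ONE_HOUR) -> str:
    """Create a TTL index; `collection` should behave like a PyMongo collection."""
    return collection.create_index([(field, 1)], expireAfterSeconds=seconds)


def eligible_for_removal_at(created: datetime, seconds: int = ONE_HOUR) -> datetime:
    """Earliest time at which a document becomes eligible for TTL removal."""
    return created + timedelta(seconds=seconds)


created = datetime(2023, 10, 6, 13, 20, 45, tzinfo=timezone.utc)
print(eligible_for_removal_at(created).isoformat())  # 2023-10-06T14:20:45+00:00
```

The computed time is a lower bound rather than an exact deletion time, because MongoDB removes expired documents with a background task that runs periodically.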
Compound Index
Compound indexes can be used when you need to match on multiple fields.
> db.collection.createIndex( { owner: 1, created_date: 1 } )
Get Collection Indexes
Collection Indexes can be viewed with the getIndexes method.
myAppDb> db.todos.getIndexes()
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { completed: 1 }, name: 'completed_1' }
]
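The shell commands in this section have direct counterparts in PyMongo's create_index. As a rough sketch, the function below creates the single field, unique, and compound indexes shown earlier on whatever collection it is given. ensure_indexes is a made-up name, and the unique_id and owner fields come from the generic shell examples rather than from the todos schema.

```python
from typing import Any

ASCENDING = 1  # same value as pymongo.ASCENDING


def ensure_indexes(collection: Any) -> list[str]:
    """Create the indexes from this section and return their names.

    `collection` is expected to behave like a PyMongo collection.
    """
    return [
        # Single field index on the field used in the todo filter.
        collection.create_index([("completed", ASCENDING)]),
        # Unique index: rejects documents that repeat an existing unique_id.
        collection.create_index([("unique_id", ASCENDING)], unique=True),
        # Compound index for filters that match on owner and created_date.
        collection.create_index([("owner", ASCENDING), ("created_date", ASCENDING)]),
    ]
```

create_index returns the name of the index, and calling it again with the same specification leaves the existing index in place, so a function like this can run at application startup. The Python counterpart of getIndexes is the collection's index_information() method.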
Top comments (4)
I think this is an excellent post that provides a great introduction to MongoDB. I would highly recommend it to anyone who is interested in learning more about this powerful NoSQL database.
@dpills if it supports TTL, can we use it for session/cache like Redis?
@onlinemsr Thanks! 🙂 And yep, for larger applications I still use Redis for caching, but for smaller applications or if you do not want to manage additional infrastructure you can use MongoDB for caching. A pattern I have used to allow flexible TTL caching is setting the expireAfterSeconds to 0 and then setting an expire_at key in the document, which makes it more obvious when a document will expire and allows different length TTLs in the same collection.
Very thorough article @dpills!
Nice guide, thanks.