Lawrence Murithi

Posted on May 26

Where Does the Data Go? A Comprehensive Guide to Databases

#database #dataengineering

Introduction

Whether you are building a website, a mobile app, or a complex business system, you will inevitably need a secure and efficient place to house your data. This data serves as the digital backbone of your application, and how you manage it directly impacts your system's reliability and performance. To handle this responsibility, developers rely on a database. A database is a specialized, organized system designed to store, manage, and retrieve information with precision and speed.
However, in today’s digital landscape, not all data is created equal. Storing a highly sensitive list of user passwords requires a fundamentally different architectural approach than processing thousands of temperature readings collected per second from a weather sensor. Because modern data comes in various shapes, sizes, and velocities, software engineers have developed different types of databases specifically tailored to handle these diverse requirements.
This article explores the primary types of databases, how they work, when to best utilize them, foundational database concepts, and the pros and cons of each.

1. Relational Databases (SQL)

Relational databases are the traditional filing cabinets of the software world. They have been around since the 1970s and store data in strict, organized tables containing rows and columns—very much like an Excel spreadsheet. The relational part means you can link these tables together. For example, a Customers table can be easily linked to an Orders table using a customer ID number. To talk to these databases, developers use a standard language called SQL (Structured Query Language).
Examples: MySQL, PostgreSQL, Oracle, Microsoft SQL Server.
Uses: Banking systems, inventory management, e-commerce stores, and anywhere you need absolute accuracy.

Pros

Reliability - They use a strict rule system (ACID), guaranteeing that data transactions are processed reliably. If you transfer $10 to a friend, the database ensures your account loses $10 and their account gains $10. If the system crashes halfway through, it safely cancels the whole transaction rather than losing the money.
Standardization - SQL is widely known, highly documented, and supported by almost all data tools.
Complex Querying - You can easily ask complicated questions that pull connected data from dozens of different tables at once.
Data Integrity - Built-in rules prevent orphaned data (like having an order in the system for a customer that has been deleted).

Cons

Rigid Structure - You must define exactly what your tables look like ahead of time. Adding a new column later can be a slow, system-locking process.
Hard to Scale Out - To handle more traffic, you usually have to buy a bigger, more expensive server (scaling up). It is very difficult to spread a relational database across dozens of cheap servers (scaling out).
Performance Bottlenecks - They can become slow when searching through massive, row datasets if not perfectly optimized.

2. NoSQL Databases

NoSQL stands for Not Only SQL. These databases were created in the 2000s when companies realized traditional relational databases could not handle the massive amounts of fast, messy, unstructured data they were collecting. Instead of strict tables and rows, NoSQL databases come in a few different types.

A. Document Databases
These store data as documents (usually JSON format), which look like nested lists. Instead of splitting a user's details across five different tables, you store all their information together in one single document.
Examples: MongoDB, CouchDB.
Uses: User profiles, content management systems (like blogs), and product catalogs.

Pros

Highly flexible - You can add new, unique pieces of information to one document without affecting the others.
Developer-friendly - the documents look exactly like the code objects developers write.

Cons

Not great for complex queries - Cross-referencing data across hundreds of documents is slow.
Data duplication - stores the same piece of info in multiple documents to keep reading fast.

B. Key-Value Stores
This is the simplest type of database. It works just like a dictionary. You have a Key (a unique word or ID) and a Value (the data attached to it). To retrieve the data, you must know the exact key.
Examples: Amazon DynamoDB, Redis.
Uses: Shopping carts, user session data (keeping you logged in), and user preferences.

Pros

Fast and simple to implement
Easy to scale across hundreds of servers.

Cons

You can only search by the exact key. You cannot easily ask the database complex questions like, Show me all users who live in New York.
They also offer no built-in way to link data together.

C. Wide-Column Stores
These look a bit like relational tables, but the names and formats of the columns can change from row to row. They are designed to hold massive amounts of data spread out over many servers.
Examples: Apache Cassandra, HBase.
Uses: Logging user activity, smart home (IoT) device data, massive music or video streaming platforms.

Pros

Can handle millions of data writes per second.
They are highly fault-tolerant and almost never go down, even if several servers in the cluster break. They also compress data very efficiently.

Cons

Complex to set up and manage.
They offer very limited querying options and are slower for reading individual, specific records compared to a Key-Value store.

D. Graph Databases
These databases care more about the relationships between data than the data itself. They store data in nodes (entities/people) and edges (the connections between them).
Examples: Neo4j, Amazon Neptune.
Uses: Social networks (finding friends of friends), fraud detection rings, and recommendation engines (users who bought X also bought Y).

Pros

Makes querying deeply connected, complex relationships incredibly fast.
They also provide highly intuitive, visual ways to look at your data.

Cons

Highly specialized. Its built to do one specific job and cant be used as a normal database.
They are terrible for keeping standard business records, doing simple counting tasks,
Require learning steep, specialized query languages.

3. NewSQL Databases

NewSQL combines the strict, reliable rules of a Relational (SQL) database, but scales and grows across hundreds of servers as effortlessly as a NoSQL database.
Examples: CockroachDB, Google Cloud Spanner.
Uses: Global financial platforms, multi-region software apps, and high-availability gaming backends.

Pros

You get the absolute safety of SQL, but the database automatically balances itself across many servers without manual work.
It survives data center crashes easily and prevents downtime.

Cons

They are highly complex to operate compared to standard SQL databases
Require more expensive hardware or cloud setups to run smoothly
Smaller community of developers if you need troubleshooting help.

4. Vector Databases

Vector databases are the storage brains behind modern Artificial Intelligence. Instead of searching for exact word matches, they turn data (like text, images, or audio) into complex arrays of numbers called embeddings. They find things based on underlying meaning and similarity.
Examples: Pinecone, Milvus, Weaviate.
Uses: AI chatbots (feeding relevant info to models like ChatGPT), facial recognition, and semantic search (searching by context, not just keywords).

Pros

Extremely fast at similarity searches, even when sifting through billions of complex records.
Effortlessly handle unstructured data like audio and video files.

Cons

Very specialized hence cannot be used to store user passwords, process credit card payments, or keep track of inventory.
Yield approximate results (trading 100% accuracy for speed)
Useless for standard business bookkeeping.
Generating the embeddings to put into the database can be computationally expensive.

5. Time-Series Databases

Sometimes, time is the most important piece of data you have. A time-series database is built specifically to handle data that is recorded over time, usually arriving in a constant, heavy stream.
Examples: InfluxDB, Prometheus, TimescaleDB.
Uses: Stock market tracking, server performance monitoring, weather station data.

Pros

Heavily optimized to write data quickly and compress it so it takes up minimal space.
They feature built-in functions to easily answer questions like, What was the average temperature between 2:00 PM and 4:00 PM?
Can automatically delete old data after a set period.

Cons

They are notoriously bad at handling random data updates or deleting specific old records.
Once a timestamped event happens, it is generally treated as permanent history.

6. Search Databases

If you search for red running shoes on a website, a standard database might struggle to quickly find all variations, misspellings, and relevant items. Search databases (search engines) are built specifically to read through massive blocks of text and return highly relevant results in milliseconds.
Examples: Elasticsearch, Meilisearch, Apache Solr.
Uses: E-commerce search bars, scanning millions of computer logs for errors, document/wiki search.

Pros

Lightning-fast full-text search.
They handle typos gracefully (fuzzy matching)
Feature built-in ranking systems so the most relevant results show up at the top.

Cons

They consume a massive amount of memory (RAM) and storage space because they index every single word.
They lack the strict safety rules of a relational database.

7. In-Memory Databases

Standard databases permanently save data onto a server's hard drive. In-memory databases save data directly into the computer's RAM (Main Memory).
Examples: Redis, Memcached.
Uses: Caching (saving a temporary copy of frequently requested data so the main database doesn't have to work as hard), real-time gaming leaderboards, instant messaging.

Pros

Very fast. They retrieve data in fractions of a millisecond, drastically reducing the load on your primary databases.

Cons

RAM is very expensive, meaning storage capacity is limited.
Because data is stored in memory, a sudden power loss or server crash will cause the data to disappear instantly (though modern versions offer background backup tricks to mitigate this).

8. Object-Oriented Databases

These store data exactly how software developers write it in modern programming languages (as objects). Instead of breaking a complex digital item apart to fit into rows and columns, it saves the whole object exactly as it is.
Examples: db4o, ObjectDB.
Uses: Complex engineering systems (CAD designs), 3D modeling, scientific simulations.

Pros

Excellent for highly complex, nested data.
Removes the tedious extra work developers usually have to do to translate their application code into database tables.

Cons

They have a very small developer community
No universally standard query language
Poor interoperability with other data tools
Have mostly been replaced by Document NoSQL databases.

9. Cloud-Native and Serverless Databases

These databases do not run on a physical server that you own. They live entirely in the cloud. The cloud provider handles all the heavy lifting—such as backups, security patching, and adding more compute power when traffic spikes.
Examples: Amazon Aurora, Firebase, Supabase, DynamoDB.
Uses: Fast-moving startups, mobile apps, and systems where user traffic spikes unpredictably.

Pros

Zero operational headaches.
Built-in disaster recovery.
If the database is truly serverless, it scales down to zero when not in use, meaning you only pay for exactly what you consume.

Cons

Vendor lock-in; once you build your app heavily around a specific cloud database, it is difficult to move to a competitor.
Troubleshooting is harder because you do not control the underlying server
Costs can skyrocket unexpectedly if your app suddenly goes viral.

10. Multi-Model Databases

Instead of setting up a document database for user profiles, a graph database for friend connections, and a key-value store for session data, a multi-model database allows you to do all of it in a single system.
Examples: ArangoDB, PostgreSQL (when utilizing extra add-ons).
Uses: Applications with diverse data needs, or small teams that want to keep their software architecture simple.

Pros

You only have one database to learn, monitor, secure, and back up.
All your data stays unified in one place, making agile development easier.

Cons

The jack of all trades, master of none dilemma. They might not be as incredibly fast or efficient at one specific task as a dedicated, single-purpose database would be.
They can also be complex to tune for performance.

A Few Important Database Rules to Know

When comparing these options, developers often use specific acronyms to describe how the database handles data.

ACID vs. BASE
ACID guarantees your data is always 100% correct, verified, and safely recorded before moving on (used by Relational databases).
BASE prioritizes keeping the database online and running fast, even if it means some users briefly see slightly outdated data for a few milliseconds (used by NoSQL databases).

The CAP Theorem
In a system spread across many servers, you can only guarantee two out of these three things;
Consistency - everyone sees the exact same data at the exact same time.
Availability - the system always responds and never goes down .
Partition Tolerance -the system survives even if the servers lose connection to each other.
NB: You must pick a database that makes the right trade-off for your specific business needs.

Sharding vs. Replication
Sharding means chopping your data into pieces and putting different pieces on different servers to make the database hold more total data.
Replication means making exact duplicate copies of your data and putting them on different servers so the data remains safe and readable if one server dies.

Tools Often Mistaken for Databases

Because modern software architecture is complicated, you will often hear names of technologies thrown around that sound like databases, but aren't. They work alongside databases, but serve totally different purposes. Let's clarify a few common ones.

1. Debezium
People often confuse Debezium for a database because it deals heavily with data. In reality, Debezium is a messenger.
It is a Change Data Capture (CDC) tool. Imagine you have a main database running your store, a separate search engine, and a data warehouse for accounting. When a user updates their address in your main database, how do the other systems know? Debezium sits next to your database and constantly reads its internal diary (transaction log). When a change happens, Debezium grabs it and broadcasts a message saying, Hey everyone, User 123 changed their address!
Debezium doesn't store your data permanently. It simply watches for changes and passes the news along.

2. Apache Kafka
Kafka is often mentioned alongside Debezium and is frequently mistaken for a database because it can hold data. However, Kafka is actually an Event Streaming Platform, or a Message Broker.
Think of Kafka as a giant, high-speed conveyor belt. Different parts of your software drop messages onto the conveyor belt, and other parts of your software pick them up.
While Kafka temporarily holds onto messages so they don't get lost, it is not designed for you to look up specific information. You cannot easily query Kafka. Its job is to move data reliably from Point A to Point B, not to act as a filing cabinet.

3. Elasticsearch
While we listed Elasticsearch as a Search Database in the categories above, developers often dump all their primary data into it and treat it like a traditional Relational database.
Elasticsearch is brilliant at reading blocks of text and returning search results in milliseconds. However, it lacks strict ACID compliance. If you use it to permanently store critical things, a server crash could cause you to lose data or get out of sync. It should always be used as a secondary copy of your data strictly for search purposes, functioning alongside a more reliable primary database.

Conclusion

Choosing the right database ultimately comes down to understanding the shape of your data and what you want to achieve with it.
No single database is perfect for everything. The best, most scalable software systems in the world use multiple databases together; a relational database for financial transactions, a search engine for product discovery, an in-memory cache for speed, and a wide-column data warehouse for analytics. This approach is known as polyglot persistence.