This article was written by Erin Doherty and was originally published at Educative.
Databases are universal - they underlie the technology you use every day. They're a crucial component of everything from telecommunications systems, banking systems, and video games to just about any other software system that maintains some amount of persistent data.
In this tutorial we will cover the following:
- What is a database?
- Where do databases fit into web apps
- Why do we need a database?
- What is cloud computing?
- Relational database - RDBMS
- SQL tutorial
- Non-relational database
- Benefits of NoSQL
- SQL vs NoSQL
A database is a collection of information. When you look at a database it isn’t pretty – it’s raw data that needs a lot of work to be displayed nicely in a user interface. Databases are important because they represent how information is modeled logically.
Data itself is even considered a valuable resource now – many companies make money selling users data to advertisers – like Google.
In this article, the audience are developers new to computer programming.
To go back to web development, in the Server- Client model, the database is the server.
- Data is stored on the server (database) in an ugly, raw, unformatted form. In general, information is grouped logically by data schema – not by what the user wants to see.
- Client is the browser, where the user interacts with the data. Data is displayed in an easy to read way for users with cool colors, buttons, pictures and more!
Databases come in two main flavors: Relational Databases which have a table schema, and NoSQL databases which vary.
Imagine if you were running a pet store and you were searching for a count of individual pets that had been in that year:
If you used receipts to store information, you’d have to:
- Go through hundreds of receipts by hand
- Group the receipts by pet so you don’t double count a pet that’s been to your sitters twice
- Create a list of all the pets to get the count
With a database, information is stored in an organized tabular format, so you can query the database to see the count of pets you’ve had in the year. The great thing about a database, is you can easily break the information down further: how many dogs, how many cats, and the count for how many times they visited your pet sitting business are all quickly accessible using a database.
A long time ago, in the 2000’s most companies would buy their own server, and store it in their own buildings. On these servers, they would have many databases, holding tables, storing information. This is known as on-prem. The databases live on your premises.
In this on-prem time, things were complicated. Companies had to employ their own DBA’s (database administrators) and figure out what to do if the power went out. Also, all databases were siloed and lonely. Each company had their own servers, databases, and table schemas holding information, but it was near impossible to share data between companies.
Then came Cloud Computing! Cloud computing is paying a database specialist to do the servers and databases for your company. What does this mean for businesses? Businesses can focus on their value proposition instead of computers! For example:
- Your petstore can employ more crazy cat ladies, and less database administrators, because you’re outsourcing servers to a large company.
- Better databases : Because the Petstore paying professionals to focus databases, you’re going to have a better product.
- Easier maintenance because giant corporations like Microsoft Azure, Amazon Web Services, or Google Cloud will run technical updates
- More on demand technical help. The professionals at your Cloud provider can help with some basic technical standards: like how to set up a database, what tools to use, what software to buy.
All this Cloud Computing leads into all the acronyms ending in aaS:
- Software as a Service
- Platform as a Service
- Functions as a Service
In the beginning there were relational databases. RDBMS stands for Relational Database Management System - a way to control your database system.
Relational databases model data logically using tables – often called tabular relations.
Spreadsheets in Excel are a good, easy metaphor to think about how a relational database works. Relational databases are like tabular - that means they are like tables in Excel. If you imagine a relational database as an Excel workbook:
- the spreadsheet is like the database
- each tab is like a database table
- tables are defined in different tabs
- Each table has a key
- data is linked between tables using keys
Relational databases have a schema defining table structures & how tables are related, and keys to give the row address of the information.
Let’s imagine you’re running the pet sitting business. How do you keep track of owner’s information? Here’s 2 pieces of information we’ll start with.
Both of these pieces of information can change – people get married, move. So how does a programmer keep track of information in a way that’s quick to update, and flexible for all the ways life changes?
The developer designs a Relational Databases would versioned name table, and a versioned address table. Each table has a primary key – a unique number that points to that row’s information. NameId and AddressId. Primary keys are not repeated - in the Excel screenshots - the primary key is how the developer can find the exact row of information.
The first step to designing a relational database is to define the schema. The schema is a map of where all the data lives in tables – table names, column names. The schema shows how tables relate to each other – from our example above, how to use the name table to look up the address associated with the person.
In a relational database, all information is sorted, structured, defined, and designed using schema. Relational Databases work well for when the developer knows what their data inputs are going to be – for example, if address information comes in on forms, the data has a defined structure already.
SQL, Structured Query Language, is the most common way to query a relational database. Querying is a way of pulling information back from the database. When querying the database, the developer focuses on efficiency. If you imagine a database with billions of addresses, how do we get to the one address to display to the user quickly?
The answer is using keys for efficiency. Primary keys, or clustered indexes, are the unique address pointing to only that data. Primary keys can’t be re-used. Non-clustered indexes are additional keys the developer & database administrator add to the database to make often used queries faster. Secondary keys are often added once the software goes into production, and through monitoring performance, the developers can identify the largest speed bottlenecks, and add secondary keys to alleviate traffic.
Learn database design without scrubbing through videos or documentation. Educative's text-based courses are easy to skim and feature live coding environments - making learning quick and efficient.
Database Design Fundamentals for Software Engineers
In the SQL code snippets above we created 2 SQL tables. The schema defines how the tables are created and linked.
Notice in the query, we joined the tables on AddressId. The star * means pull back all columns so this query pulls back all columns from both tables. We also told SQL to find the correct row of data by using the primary key NameId. For efficiency reasons, it's essential to use keys whenever possible. Imagine databases with billions of rows of data; finding the right information can take forever when it's a production amount of information.
What does a developer do when they don’t know what kind of information they are going to receive? Recently, lots of advancements have been made on Non-Relational databases, which can take unstructured information, and store it. Non-relational databases don’t require as much up-front design, and they are more flexible. The downside of non-relational database is they are generally harder to use – because the developer doesn’t know what kind of information they are going to receive – data could come a picture, or a movie, a .zip file, or plain text for example. After storage, once the developer has to use information from the Non-relational database, it’s harder to write coding logic to process that information because there are so many options.
NoSQL means the database is not SQL. It’s something other than the traditional tabular relations. NoSQL is great for big data, and real-time web applications. No SQL is a bit of an exaggeration. NoSQL can better be thought of as “not-only SQL”, many NoSQL databases use some table relationships, and some other relationships, for example, a picture storage database may take multiple kinds of files, and still have a key to file table relationship.
NoSQL compromises consistency – the developer doesn’t know what they’re going to get when they query their NoSQL database – for other benefits.
- Simple design
- Simpler horizontal scaling
- Control over availability
- Limiting object-relational impedance mismatch
- Availability, Partition Tolerance, and speed
- But…NoSQL compromises consistency to achieve these 3 benefits, leading to the idea that NoSQL has “eventual consistency”.
When developing a NoSQL database different data structures are used. We won't go into them in-depth here, but here's a list of common NoSQL data structures so you can get an idea:
- Wide Column
NoSQL databases have different query options, querying is asking the database for information.
NoSQL databases are often used to store unformatted information. The software can take in the data now, process it later. This is very helpful when you don't know what kind of information you're going to get up-front: like when a user can email in a picture, PDF, attachment, or text in an email.
If you are writing a stock trading, banking, or a Finance-based app or you need to store a lot of relationships, for instance, when writing a social networking app like Facebook, then you should pick a relational database. Here’s why:
Transactions & Data Consistency
If you are writing software that has anything to do with money or numbers, that makes transactions, ACID, data consistency super important to you. Relational DBs shine when it comes to transactions & data consistency. They comply with the ACID rule, have been around for ages & are battle-tested.
If your data has a lot of relationships like which friends of yours live in a particular city? Which of your friend already ate at the restaurant you plan to visit today? etc. There is nothing better than a relational database for storing this kind of data.
Relational databases are built to store relationships. They have been tried & tested & are used by big guns in the industry like Facebook as the main user-facing database.
Popular relational databases:
- Microsoft SQL Server
Here are a few reasons why you’d want to pick a NoSQL database:
Handling A Large Number Of Read Write Operations
Look towards NoSQL databases when you need to scale fast. For example, when there are a large number of read-write operations on your website and when dealing with a large amount of data, NoSQL databases fit best in these scenarios. Since they have the ability to add nodes on the fly, they can handle more concurrent traffic and large amounts of data with minimal latency.
Running data analytics
NoSQL databases also fit best for data analytics use cases, where we have to deal with an influx of massive amounts of data.
Popular NoSQL databases:
If you’re curious about trying a NoSQL database like MongoDB then I highly suggest checking out Nikola Zivkovic’s course, The Definitive Guide to MongoDB.
There has been a lot covered in this post, but we've barely scratched the surface. You should invest time learning about data modeling, normalization, functional dependencies, and SQL.
Database Design Fundamentals for Software Engineers is a great course for learning the key aspects of database design. In this course, you will:
- Have a grasp of the basics of database systems.
- Be exposed to different types of databases.
- Learn about entity relationship diagrams and their uses.
- Be able to normalize databases in order to increase query efficiency.
- Have learned basic SQL commands to query the database.
You can check out a free preview by clicking the link above.
Posted on Jun 1 by: