You as a developer may always use ORMs for CRUD operations, but I believe there is a need for every developer to have at least a basic knowledge about databases and queries as there may be some limitations with ORMs when you get into complicated queries.
Anyhow, I'm not talking about ORMs vs DB Queries in this post. so, if you are a DBA or you are interested in using SQL server, a relational Database Management System for your back-end, it's important to consider a few things to have a well-designed database and good performance. Also, if you are manipulating data as a data-scientist, data analyst and so, it's important to know how to write efficient queries.
Every piece of this document can be expanded and explained in more detail. Lots of good articles and posts are available in each. I wanted to be brief and just mention the important parts.
In this post, I'm going to mention some tips when designing databases for your project.
Normalization is a way of organizing data in database to reduce redundancy and improve data integrity and avoid maintenance problems. It applies for any relational database not only SQL.
please normalize your tables
It's done through obeying some simple rules so-called normal forms e.g. 1NF, 2NF, ...
The first 3 applicable forms are as follows:
Rule No. 1 (1NF) : Do not have multiple values in a cell
Rule No. 2 (2NF) : Remove partial dependency by breaking your table in two tables. All values must depend on the whole primary key if you have more than one column which makes the primary key together (composite primary key).
Rule No. 3 (3NF) : Every non-prime attribute should depend on the primary key. If so, put them in another table.
Sometimes you are allowed to disobey !!
I suggest to apply 3NF to data that changes frequently, because normalization has a negative impact on performance and it seems unusual to separate all the columns that are not describing primary key column to a new table.
There are other Normal Forms but are not really practical and used in real world because they affect functionality.
Normalization is a must if you want to look professional. Do it, although you may need to denormalize some already normalized tables due to performance issues.
For Denormalization, Normalize First
If and only if there are some performance issues which are not solvable by query optimization, indexing and so, you may think of a few denormalization.
For example, you may need to join 10 tables for a simple query, then denormalization worths being considered as a solution.
This action has pros and cons. the positive point is that you will have better performance in querying the tables but the negative point is that you will have duplicate data and bad performance in other operations like deletion, insertion and modification.
N stands for National language Character Set and is used to specify a Unicode string. when the application grows, supporting clients from different locals is important.
Nvarchar type takes twice space as varchar; 2 bytes and 1 byte respectively.
Because of that, it was always recommended to use varchar instead to keep tables smaller and use less space.
But, regarding the fact that storage and memory is less expensive nowadays, it might not be the choice anymore. Also, modern development platforms now use Unicode internally and by using nvarchar you don't have to do encoding conversion when writing or reading to the database.
Using nvarchar may cause a query take longer time to read large columns but take into account that conversions also take time and are prone to errors.
It's a good approach to follow some rules for naming everything in your database specially tables and columns. It makes life easier for you and your team and for future fellows working on the project. Keep in mind that your database may never die and remain forever so it's important to define proper names for the objects from the beginning.
And these are my suggestions:
- Name your tables in the singular e.g. city, employee, student
- Use lower case
- Use underscore for separating the words e.g. student_course, employee_role
- Don't use underscore at the beginning
- Don't use reserved keywords
- Use the same name for all the probable primary keys e.g. "id", "ID" unless you have composite primary key. I prefer not to use simple "id" as the primary key. Use something like [table name]+[id] e.g. id_student
- For Foreign Key columns use a combination of id column and table name e.g. customer_id, id_employee, id_student
- Don't use long names for tables and columns; as a standard, 26 characters for table and column names is recommended. You can use abbreviations if it's going to be long. You'd better to use standard abbreviations for database object naming.
- Don't use data types in naming columns
- Preferably, if you are going to have the same name for a column in different tables, change them not to be the same e.g. student_name, employee_fullname
- Name boolean columns as "is_name" e.g. is_deleted
- I myself prefer to add "sp_", "fn_" and "vw_" before stored procedures, functions and views only not other objects. It gives me the comfort of distinguishing the object when calling out of SQL management studio.
- Define your own schema if you need to create tables or any other objects that should be separated by concept rather that dbo. e.g. hr., fin.
- Use alias in views for computed data using As
- Use @ for defining a local variable or parameter.
Regarding the fact that the naming convention is a standard defined in organizations or projects, you may have to obey their standard.
That's enough for naming I suppose :)
There are many different types of indexes which you can create. The main purpose of creating indexes is improving the performance of a query among many other actions that should be done to improve the performance of a database.
BUT ... The main point is that Do Not create indexes at first when you are creating tables. Create them when you have queries which need to run faster and applying an indexing strategy may help. That is the time you should create a sort of indexes for columns which are called in Where clauses, Order by, group, etc. Therefore, another consideration you must take into account: Do Not create Indexes in every column
You should know that switching indexes is a bit complicated and may need downtime and dropping all other indexes on the table and recreating them. So, do not hurry to create them at the early stages. of course, during the application upgrade, you will define new queries. Then, review and index tuning come to action.
Choose your desired recovery option in database design process. Remember that Full recovery will help you restore data for any transaction but it also slows down the system.
I myself use simple as a choice most of the time which gives the benefit of re-usability of log files storage. However, it depends on your business requirements.
After selecting your recovery model, design a backup strategy for your databases which defines the type and frequency of backups.
Schedule your regular backups in database maintenance plan or alternatively, you can schedule a job for that.
After designing your backup strategy, test your backups by restoring them as a copy database in a test environment.
As a recommendation from Microsoft, for full database backups, choose a period that is off-pick. I usually have a full backup every weekend and differential backup and perhaps Transnational Log backup between the two full backups.
as a Best Practice, choose a different physical storage for database backups from the database files.
Important: if you make a full backup in another location than your plan/job is normally saving the backups, you will break the backup plan. To avoid this use Copy-Only backups
That happened to me once, bitter .. but then, I realized to do so.
A log record is written for every single operation in the database by the engine including starting/ending a SQL transaction when modifying a data, creating/dropping a table, after page allocation/deallocation and so. Therefore, you see the transaction log file always grows rapidly until it reaches the maximum size or if it's set to be auto-growth, you will soon run out of free space on your disk drive. Therefore, you must monitor and manage SQL Server Transaction Log growth.
There are some cases that prevent the transaction log file to be truncated automatically and free up space for reuse such as configuring only full backup plan with full recovery model, uncommitted transactions, SQL Server Agent Service being stopped, long running full and diff backups, etc.
When you encounter the problem of free space with Transaction Log file, it's recommended to check the file size setting and if it's possible extend it. If not possible, change your recovery model to simple (if it's not already) which forces log truncation. If it's not applicable, you have to check the reason of preventing the SQL Transaction log file from being truncated. the sys.database system catalog view under log_reuse_wait_desc column would be a help.
When the recovery model is other than simple, the Transaction Log file is not truncated automatically then you have to take Transaction Log backup otherwise the file will grow continuously. If you have set a backup plan, you can add Transaction Log file backup to your plan to be taken between full backups.
In order to monitor the growth of SQL Transaction Log file, tools such as the System Center Operation manager (SCOM), Performance Monitor counters or creating an alert that reads from one of the system catalog views would be good choices.
Truncating Transaction Log file frees up space to be reused but it doesn't decrease the size of the Log file. In this case you can perform Shrink operation which will be very useful after performing an operation that creates a large number of Transaction Logs.
I recommend you to read a full article about Transaction Log file here which discusses the subject in more detail.
Always have a document keeping the important data of your product resources. Mention the design specification e.g. recovery, backup and restore plan and files locations and a checklist to show what should be done before restoring the databases.
I hope you read this post and use it in your real life of designing databases.
Any suggestion, objection or question ?? comment below :)