Adnan Babakan (he/him)

Posted on Oct 18

How to Design a Multilingual Database Structure | A Practical Guide

#webdev #database #programming

Hi there, DEV.to community!

Designing a database is a pleasurable task. It is where you limit the computers in how they should manage data. What each data means and what can be done with the combinations of them.

I've enjoyed designing a multilingual database structure that can translate any entity in the database. And by translating, the action of storing translation records is meant, not translating them.

What is the problem?

The problem arises from wanting to have everything translatable in my back-end. And the number of languages varies as well. So there is a need to design such a structure that can accept many kinds of data as a translation target, like strings, numbers and even complex (non-primitive) objects like JSON data.

Relational or Non-relational

As you know, relational databases can have relations between their records, such as the relation of a foreign key to a primary key. And in non-relational databases, this cannot happen, not at least in their vanilla (bare) form.
Whilst I designed this database in MySQL (MariaDB), there is no limitation in which database software you use, regardless of whether it is RDBMS or non-relational.

What is required

Knowledge of how databases and ORM/ODM work
A database engine running

A background of what we want

What we want to achieve is a database that can store translatable data, regardless of its data type (at least as much as we can). As we know, foreign keys in relational databases can only relate to a single table. But we might have multiple tables, such as posts, products, couriers, and so many more. Here is when we are introduced to a thing called morphic relations. These kinds of relations are usually managed through a logical back-end.

How we may create such a thing, you might ask. The answer depends on the approach we choose and can vary based on the mindset of the programmer. But there are two ways we may do it:

Instead of having a simple foreign table ID relation, we can have two columns named morphic_table and morphic_id. Storing the name of the target table and the ID of the related record, respectively.
Store both the target table and target ID in a column named morphic_relation or something similar. Although this is against the best practices when designing a database, it proves to be super handy. The issue arises from the data we want to store not being atomic (meaning only one thing), but rather giving us two pieces of information. Where to look, and what to find. This structure is not unusual if you think of vector values. Thus, we may create our column as a text/string column and store our data with a simple delimiter. For instance, table:id. So our data would become something like posts:1243 or products:88214.

The Languages Table

The first thing we want is the languages table. This is where we will store the languages and their related data.
Here is how I designed my languages table:

Enum direction {
  "rtl"
  "ltr"
}

Table languages {
  id primary unique
  iso_code string unique
  local_name string
  latin_name string
  direction direction
  calendar_system string
  font string
  font_feature_settings string
  is_default boolean
  is_active boolean
  created_at datetime
  updated_at datetime
}

I am using dbdiagram.io to draw the database diagrams.

Here is the explanation:

id: Primary ID of the record. I used CUID2 as my ID generation method, but the type ID can be anything and is irrelevant.
iso_code: A unique code that represents a language. I found IETF BCP 47 to be a good reference to distinguish every language. For instance fa-IR, en-US, or es-ES.
local_name: Name of the language written in its local form. Such as فارسی, English or Español.
latin_name: Name of the language written in Latin form. For instance, Persian, English or Spanish.
direction: Indicating the direction of a language's written form. Having only two options, being RTL and LTR.
calendar_system: Indicating the calendar system associated with a language. Such as Jalali or Gregorian.
font: Font used for the interface in the given language.
font_feature_settings: Used to configure how the given font behaves in different conditions.
is_default: A boolean indicating whether the language is the default or not.
is_active: A boolean indicating if the language is active or not.
created_at: Creation timestamp
updated_at: Last update timestamp

Most of the columns indicated above are optional, as you can design your languages table with only two columns being id and iso_code. But having extra details can come in handy later on. The scope of using those additional data is beyond this article and you are at liberty to discover them on your own.

The language records table

We want to make our database translatable no matter what our data are and how they are stored. To demonstrate such a structure, let's have a table called language_records which will store data for each table and each piece of data we need. Here is the structure after designing that table:

Enum direction {
  "rtl"
  "ltr"
}

Table languages {
  id primary unique
  iso_code string unique
  local_name string
  latin_name string
  direction direction
  calendar_system string
  font string
  font_feature_settings string
  is_default boolean
  is_active boolean
  created_at datetime
  updated_at datetime
}

Table language_records {
  id primary unique
  language_id string [ref: > languages.id]
  morphic_table string
  morphic_id string
  key string
  value string
  created_at datetime
  updated_at datetime
}

Language Records Table — Demonstration of the language records table

As you can see, I have added two simple columns named morphic_table and morphic_id to store which table to refer to and what row is wanted, respectively.

Another approach to such a need, as mentioned before, is to have a single column to store both data and get out of being atomic. Here is the second kind of table:

Enum direction {
  "rtl"
  "ltr"
}

Table languages {
  id primary unique
  iso_code string unique
  local_name string
  latin_name string
  direction direction
  calendar_system string
  font string
  font_feature_settings string
  is_default boolean
  is_active boolean
  created_at datetime
  updated_at datetime
}

Table language_records {
  id primary unique
  language_id string [ref: > languages.id]
  morphic_relation string
  key string
  value string
  created_at datetime
  updated_at datetime
}

As you may have noticed, neither of the two structures above has any relation to any table other than the languages table. The relation to the entities themselves (such as posts or products) will be an abstract relation, meaning that we will handle them programmatically, and there is no need for the database to handle such a relation.

I will go forward with both approaches so you may choose what fits your needs the best.

Entities to be translated

Let's assume that we have two tables called posts and products that we want to have translatable fields in. A post has title and content, and a product has name and short_description.

Here is the table structure for both tables:

Table posts {
  id primary unique
  created_at datetime
  updated_at datetime
}

Table products {
  id primary unique
  price double
  created_at datetime
  updated_at datetime
}

I won't be digging down the structure to a proper post and product build. As it is non-relevant in this article.

You may notice that neither the posts table has columns named title and content, nor the products table has columns named name and short_description.

These data are abstract and will be controller and generated programmatically. You are free to implement how to store these data on your back-end. What I suggest to do is to have a mixin or a similar implementation based on your programming language.

Storing Language Records

Now that we have our structures ready we may simply create our language and their relevant data according to our entities (being posts and products).

Here is a simple data for our languages table (I have omitted extra data due to them being irrelevant here):

id	iso_code	local_name	latin_name
1	fa-IR	فارسی	Persian
2	en-US	English	English
3	es-ES	Español	Spanish

Now let's have a post added to our database:

id
1

And one product:

id	price
1	39.99

And that's it. It might seem strange but bear with me as it will make sense later on.

The key column in our language_records table will represent the data key related to our entity being post or product and the value will store its content.

Now let's populate the language records table to represent the data needed to complete our first post and product.

Here is the structure using morphic_table and morphic_id:

id	language_id	morphic_table	morphic_id	key	value
1	1	posts	1	title	سلام دنیا
2	2	posts	1	title	Hello World
3	3	posts	1	title	Hola!
4	1	posts	1	content	آزمایشی
5	2	posts	1	content	Testing
6	3	posts	1	content	Testing in Spanish!
7	1	products	1	name	محصول جدید
8	2	products	1	name	Product
9	3	products	1	name	Product in Spanish
10	1	products	1	content	توضیح محصول
11	2	products	1	content	Product Description
12	3	products	1	content	Product Description in Spanish

And here is the same table using only a colum named morphic_relation:

id	language_id	morphic_realtion	key	value
1	1	posts:1	title	سلام دنیا
2	2	posts:1	title	Hello World
3	3	posts:1	title	Hola!
4	1	posts:1	content	آزمایشی
5	2	posts:1	content	Testing
6	3	posts:1	content	Testing in Spanish!
7	1	products:1	name	محصول جدید
8	2	products:1	name	Product
9	3	products:1	name	Product in Spanish
10	1	products:1	content	توضیح محصول
11	2	products:1	content	Product Description
12	3	products:1	content	Product Description in Spanish

Seeing the table above, we can conclude that we can fetch each language record for all of our data. And access translatable data based on different entities and different languages.

BTW! Check out my free Node.js Essentials E-book here: