Josue Luzardo Gebrim

Posted on Dec 22, 2021

Vitess: Easy database deployment, clustering, and scaling!

#database #nosql #dataops #bigdata

A database clustering system for horizontal scaling of MySQL, Percona, or MariaDB

Vitess is a database solution for deploying, scaling, and managing large clusters of open source database instances. It currently supports MySQL, Percona, and MariaDB. It is architected to run as effectively in a public or private cloud as on dedicated hardware. It combines and extends many essential SQL features with the scalability of a NoSQL database. Vitess can help you with the following problems:

Scale a SQL database by sharding it, while keeping application changes to a minimum.
Migrate from bare metal to a private or public cloud.
Deploy and manage a large number of SQL database instances.

Vitess includes compatible JDBC and Go database drivers using a native query protocol. It also implements the MySQL server protocol, which makes it compatible with almost any other language.

Vitess has served all of YouTube’s database traffic for over five years. Many companies have already adopted Vitess for their production needs.

Architecture

The Vitess platform consists of several server processes, command-line utilities, and web-based utilities, supported by a consistent metadata store.

Depending on the current state of your application, you can arrive at a complete Vitess implementation through several different process flows. For example, if you are building a service from scratch, your first step with Vitess would be to define the database topology. However, if you need to scale your existing database, you’ll likely start by deploying a connection proxy.

Vitess tools and servers are designed to help you whether you start with a complete fleet of databases or start small and scale over time. For smaller implementations, vttablet features like connection pooling and query rewriting help you get more out of your existing hardware. Vitess automation tools provide additional benefits for larger deployments.

The diagram below illustrates the Vitess components:

These and other components are explained later in this publication, along with the secret behind this solution. :)

Supported databases:

MySQL
Percona
MariaDB

Example of Vitess Operator on Kubernetes:

In this example, we use Minikube installed locally on a Linux machine with at least 11 GB of free memory. Follow the steps below to deploy and use Vitess on Kubernetes (Minikube):

Start Minikube:

minikube start --kubernetes-version=v1.14.10 --cpus=8 --memory=11000 --disk-size=50g

Install kubectl:

curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.14.9/bin/linux/amd64/kubectl
chmod +x kubectl && sudo mv kubectl /usr/local/bin/

Install the client to access MySQL:

apt install mysql-client

Install vtctlclient:
NOTE: This step requires Go to be installed beforehand.

go get vitess.io/vitess/go/cmd/vtctlclient

Install Operator:

Clone the git project:

git clone git@github.com:vitessio/vitess.git
cd vitess/examples/operator

Perform the installation:

kubectl apply -f operator.yaml

Bring up an initial local cluster:

kubectl apply -f 101_initial_cluster.yaml

Check your cluster:

kubectl get pods

Set up port forwarding to the cluster:

./pf.sh &
alias vtctlclient="vtctlclient -server=localhost:15999"
alias mysql="mysql -h 127.0.0.1 -P 15306 -u user"
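
Before moving on, it can help to confirm that the forwards started by pf.sh are actually listening. The following is a minimal Python sketch, assuming only the two local ports used in the aliases above (15999 for vtctlclient and 15306 for mysql). The helper name port_open is hypothetical and not part of Vitess:

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all as "not open".
        return False


if __name__ == "__main__":
    # Ports taken from the aliases in this tutorial.
    for name, port in (("vtctld", 15999), ("vtgate MySQL", 15306)):
        state = "open" if port_open("127.0.0.1", port) else "closed"
        print(f"{name} on port {port}: {state}")
```

If either port reports closed, the port-forward script has probably not finished starting or has exited.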

Create a schema:

vtctlclient ApplySchema -sql="$(cat create_commerce_schema.sql)" commerce
vtctlclient ApplyVSchema -vschema="$(cat vschema_commerce_initial.json)" commerce

Connect to your cluster using the mysql alias defined above:

mysql

In this example, we deploy a single unsharded keyspace called “commerce”. Unsharded keyspaces have a single shard named “0”. The schema created by the create_commerce_schema.sql script reflects a common e-commerce scenario.

“Auto-sharding”: the real secret behind Vitess:

Usually, when you shard a database, all you think about is your sharding key, and you shard it that way, right? But what about everything else the database does: its features, which of them are essential, and how many of them still make sense in a sharded environment? So it is not just about the sharding key.

To begin with, just as databases have a schema, Vitess has a VSchema: where a schema describes how your tables are organized, the VSchema describes how your sharding is organized. Databases have primary keys; Vitess has the primary vindex. It is like a sharding key, but much more. In a typical sharded system, you only choose your sharding key, and the system decides where your data goes. With Vitess, you define both the sharding key and the mapping function.

The motivation for this design is that data is distributed differently in different scenarios. Sometimes it grows incrementally, and you do not want that incremental growth to cause hot shards; sometimes you want ranges of values to live in the same shard. All of these decisions mean that you want to control how your system is sharded. The primary vindex lets you choose the column and the mapping function that determine where each row goes.
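
To make the idea of "a column plus a mapping function" concrete, here is a minimal Python sketch. It is not how Vitess implements vindexes: the MD5-based keyspace_id function, the customer_id column, and the two-shard layout are assumptions chosen for illustration. It maps a column value to a keyspace ID and then finds the shard whose key range contains that ID:

```python
import hashlib


def keyspace_id(customer_id: int) -> bytes:
    """Map a sharding-column value to an 8-byte keyspace ID.

    Hypothetical mapping function (an MD5 prefix), for illustration only.
    """
    digest = hashlib.md5(str(customer_id).encode("utf-8")).digest()
    return digest[:8]


# Two shards split the keyspace ID space in half. The names follow the
# key-range style: "-80" covers IDs below 0x80..., "80-" covers the rest.
SHARDS = [
    ("-80", b"", b"\x80"),
    ("80-", b"\x80", None),  # None means "no upper bound"
]


def shard_for(ksid: bytes) -> str:
    """Return the name of the shard whose key range contains ksid."""
    for name, start, end in SHARDS:
        if ksid >= start and (end is None or ksid < end):
            return name
    raise ValueError("no shard covers this keyspace ID")


if __name__ == "__main__":
    for cid in (1, 2, 3, 4):
        ksid = keyspace_id(cid)
        print(cid, ksid.hex(), shard_for(ksid))
```

Swapping keyspace_id for a different function changes how rows spread across shards without touching shard_for, which is the kind of control the primary vindex is meant to give you.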

The secondary vindex is another crucial concept. Essentially, it lets you use a WHERE clause on a column other than the sharding key. It is a cross-shard index, but it is implemented as just another table in Vitess: a table that maps the column to what we call the keyspace ID, which is a kind of street address for a row. The advantage is that when you insert a row into your main table, Vitess automatically populates this lookup table, adding a reverse-lookup entry so the row can be found later when a WHERE clause uses that column. These vindexes can be unique or non-unique. It is an elegant extension of database concepts to a sharded environment.
Databases have foreign keys, and the parallel in Vitess is shared vindexes; MySQL has auto-increment columns, and Vitess has sequences. These parallels reduce the pain of migrating from an unsharded system to a sharded one.
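
The lookup-table idea behind the secondary vindex can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions, not Vitess code: the ShardedOrders class, its column names, and the MD5-based keyspace_id function are all hypothetical. It shows the lookup entry being written as part of the insert and then used to answer a query on a column that is not the sharding key:

```python
import hashlib


def keyspace_id(customer_id: int) -> bytes:
    # Hypothetical mapping function (an MD5 prefix), for illustration only.
    return hashlib.md5(str(customer_id).encode("utf-8")).digest()[:8]


class ShardedOrders:
    """Toy model: rows sharded by customer_id, plus a lookup on order_id."""

    def __init__(self) -> None:
        self.rows_by_ksid = {}   # keyspace ID -> list of rows (the main table)
        self.order_lookup = {}   # order_id -> keyspace ID (the lookup table)

    def insert(self, order_id: int, customer_id: int, sku: str) -> None:
        ksid = keyspace_id(customer_id)
        row = {"order_id": order_id, "customer_id": customer_id, "sku": sku}
        self.rows_by_ksid.setdefault(ksid, []).append(row)
        # The lookup entry is written as part of the insert, so the
        # application never has to maintain it by hand.
        self.order_lookup[order_id] = ksid

    def find_by_order_id(self, order_id: int) -> list:
        # WHERE order_id = ?: resolve the keyspace ID first, then read only
        # the rows stored under that ID instead of scanning everything.
        ksid = self.order_lookup.get(order_id)
        if ksid is None:
            return []
        return [r for r in self.rows_by_ksid[ksid] if r["order_id"] == order_id]


if __name__ == "__main__":
    orders = ShardedOrders()
    orders.insert(100, 1, "SKU-1001")
    orders.insert(101, 2, "SKU-1002")
    print(orders.find_by_order_id(101))
```

Without the lookup table, a query by order_id would have to be sent to every shard, because order_id says nothing about where the row lives.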

The best part of Vitess is that once you are sharded and your shards reach a certain size, so that you need to grow further, you can reshard in Vitess with no downtime. You can do splits and merges, and the application does not even know what is going on. Everything is done safely, with all kinds of data checks to ensure no data is lost. This is one of the most popular and best-loved features of Vitess.
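
One reason a split can stay invisible to the application is that rows are addressed by keyspace ID and shards are just ranges of those IDs, so splitting a shard divides a range without changing any row's ID. The Python sketch below illustrates only that idea. It is a toy model with hypothetical names (in_range, split_shard) and sample data, and it leaves out everything the real process does while serving traffic:

```python
def in_range(ksid: bytes, start: bytes, end: bytes) -> bool:
    """True if ksid is in [start, end); an empty end means 'no upper bound'."""
    return ksid >= start and (end == b"" or ksid < end)


def split_shard(rows: dict, start: bytes, mid: bytes, end: bytes) -> tuple:
    """Split rows keyed by keyspace ID into [start, mid) and [mid, end)."""
    lower = {k: v for k, v in rows.items() if in_range(k, start, mid)}
    upper = {k: v for k, v in rows.items() if in_range(k, mid, end)}
    # Toy data check: every row must land in exactly one of the new shards.
    if len(lower) + len(upper) != len(rows):
        raise ValueError("rows lost or duplicated during split")
    return lower, upper


if __name__ == "__main__":
    # Hypothetical contents of a shard "-80", keyed by 8-byte keyspace ID.
    shard = {
        bytes.fromhex("1000000000000000"): "row A",
        bytes.fromhex("3fffffffffffffff"): "row B",
        bytes.fromhex("4000000000000000"): "row C",
        bytes.fromhex("7a00000000000000"): "row D",
    }
    # Split "-80" into "-40" and "40-80".
    lower, upper = split_shard(shard, b"", b"\x40", b"\x80")
    print(sorted(lower.values()), sorted(upper.values()))
```

A merge is the reverse operation: two adjacent ranges are combined, and again no keyspace ID changes.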

This publication summarizes what I found interesting about this massive clustering solution for MySQL and its derivatives. Part of the text also appears in the official documentation and in the talk that Sugu Sougoumarane, co-creator of Vitess, gave for InfoQ. I recommend both to anyone who wants to delve deeper into the topic.
