Databend

Posted on Jul 8, 2022

Archiving and Analyzing MySQL Data Using Databend — a Cloud Native Data Warehouse

Requirement analysis on archiving MySQL data

MySQL is commonly used in OLTP to provide external services with qualified hardware resources, the amount of data provided can often reach TB level. In many scenarios, the storage data has a relatively long TTL. However, data records may lose business significance after being online for a period (examples below).

A certain service is deactivated
The life cycle of data exceeds service requirements. For example, a service only requires data within nearly 3 months
Archiving log data
Merging databases and tables to provide statistical query and analysis services
Regular backup, archiving, and audit services
......

Usually, archiving schemes are proposed by DBAs, and developers then analyze which data can be archived. The archiving process can be completed with the help of standardization and automated execution.

Common solutions for archiving MySQL data

Current archiving methods are generally divided into two categories: MySQL and MariaDB. The key tool is pt-archiver or obtaining archived data by parsing binlog.

Using MySQL for storage and archiving

This is the most common solution: the archiving, or even synchronous backup, of the online production repository is processed by a PC (usually with a large capacity -- about 50T, and large memory, to run instances). It's also possible to build a master/slave server configuration offline to archive PolarDB data and provide offline intranet queries.

Advantages

Based on the familiar MySQL environment, it's easy to manage
Highly compatible with the online environment
Archiving environments can be built with large, inexpensive disks

Disadvantages

In this kind of architecture, binlog is usually turned off in in the archive nodes for cost reduction. There will be a backup in the object storage, and no slave database. Therefore, once the data or the hard disk is damaged, it will take a long time to recover
There's not enough computing power, and not much ability to extend the computing node either. The data needs to be extracted and put into a big data environment when computing is required
Large amount of idle CPU and RAM resources

Using MariaDB for archiving

S3 engine is an experimental feature introduced in MariaDB, it has a better compression capability while retaining the usage habits of MySQL users. The complete archiving process includes writing to InnoDB first, then alter table tb_ name engine=s3;

Advantages

Maintaining MySQL compatibility
Supporting S3 class object storage
Supporting highly compressed storage

Disadvantages

S3 engine is write only
Append writing is not supported, so conversion to an InnoDB table is required
The conversion from InnoDB to S3 engine takes a very long time, adding complexity to the process

Archiving method provided by Databend

So, is there a more elegant solution? Here we recommend Databend, a cloud native data warehouse.

Introduction & Architecture

Databend is a modern, open-source data warehouse developed using Rust. It's completely cloud-oriented, provides extremely fast elastic expansion capabilities, and is committed to creating an on-demand and volume-based Data Cloud product experience.

Features are as follows：

A star project among open-source, cloud-based data warehouses
Vectorized execution, pull&push-based processor model
Real storage/computing separation architecture, high performance, low cost, on-demand use
Complete database support, compatible with MySQL, Clickhouse protocol, SQL over HTTP, etc.
Complete transaction,
Ensured transaction integrity, support time travel, database clone, data share and other functions
Support multi-tenant read/write and sharing operations on the same data

Databend's design principles：

No Partition
No index（Auto Index）
Support Transaction
Data Time travel/Data Zero copy clone/Data Share
Enough Performance/Low Cost

Deployment

Three processing methods are supported，including MySQL, Clickhouse and SQL Over Http.

Please refer to https://databend.rs/doc/deploy for installation instructions.

For more support during installation or usage, please contact us via wechat (wechat number: 82565387).

Writing methods

Insert into

Insert writing operations using JDBC, python and golang are supported. Here is a recommended guidance: https://databend.rs/doc/develop.

It's suggested to use Bulk insert to achieve batch writing operations, which has similar usage with MySQL.

Streaming load

Please refer to https://databend.rs/doc/load-data/local to see more of streaming load.

It only takes about 3 minutes to load an 81G file with 200 million rows of data into Databend (As seen in the picture above).

Besides, Databend now supports reading directly from compressed files. For example:

ls ./dataset/*.csv.gz|xargs -P 8 -I{} curl -H "insert_sql:insert into ontime format CSV" -H "skip_header:1"   -H "compression:gzip" -F "upload=@{}" -XPUT http://root:@localhost:8000/v1/streaming_load

It should be noted that the scheme for reading compressed files has not been optimized, it takes about 13 minutes to load the same data using this method. There is much room for performance improvement in the future.

Using Stage

Stage can be considered as an online storage manager of Databend, please refer to https://databend.rs/doc/load-data/stage for detailed syntax。

The above PPT page shows the process of creating Stage, uploading files, and viewing files online. Files in Stage can be loaded into Databend by "copy into" command.

Advantages of archiving MySQL data with Databend

We recommend to use Databend combined with object storage for MySQL data archiving.

The advantages are as follows:

Object storage breaks the limit of storage capacity
Databend has a relatively high data compression ratio of 10:1, which saves a lot of storage resource
Databend manages data based on MySQL protocols, so users don't need to change their usage habits
The storage/computing separated architecture makes it easier to scale up, when facing computing resources insufficiency, and there is no need to worry about the high availability of storage
Most original MySQL tools can be reused

Databend now supports object storage: AWS S3, Azure, Alicloud, Tencent Cloud, QingCloud, Kingsoft Cloud and equipments like minio and ceph. Meanwhile the surprising computing ability of Databend supports data computation services if needed.

People can make better use of cloud resources and obtain qualified performance with a relatively low cost using Databend.

Please contact us via wechat (wechat number: 82565387) to get more information!

About Databend

Databend is an open-source modern data warehouse with elasticity and low cost. It can do real-time data analysis on object-based storage.

We look forward to your attention and hope to explore the cloud native data warehouse solution, and create a new generation of open-source data cloud together.

Databend docs：https://databend.rs/
Twitter：https://twitter.com/Datafuse_Labs
Slack：https://datafusecloud.slack.com/
WeChat：Databend
GitHub ：https://github.com/datafuselabs/databend

DEV Community

Archiving and Analyzing MySQL Data Using Databend — a Cloud Native Data Warehouse

Requirement analysis on archiving MySQL data

Common solutions for archiving MySQL data

Using MySQL for storage and archiving

Advantages

Disadvantages

Using MariaDB for archiving

Advantages

Disadvantages

Archiving method provided by Databend

Introduction & Architecture

Deployment

Writing methods

Insert into

Streaming load

Using Stage

Advantages of archiving MySQL data with Databend

About Databend

Top comments (0)

Read next

Voxel51 Filtered Views Newsletter - May 24, 2024

6 Profitable Micro SaaS Ideas in 2024

Basic CRUD Operations Using Golang, Gin Gonic, and GORM

Construtores semânticos