DEV Community

Databend
Databend

Posted on

Archiving and Analyzing MySQL Data Using Databend — a Cloud Native Data Warehouse

Requirement analysis on archiving MySQL data

MySQL is commonly used in OLTP to provide external services with qualified hardware resources, the amount of data provided can often reach TB level. In many scenarios, the storage data has a relatively long TTL. However, data records may lose business significance after being online for a period (examples below).

  • A certain service is deactivated
  • The life cycle of data exceeds service requirements. For example, a service only requires data within nearly 3 months
  • Archiving log data
  • Merging databases and tables to provide statistical query and analysis services
  • Regular backup, archiving, and audit services
  • ......

Usually, archiving schemes are proposed by DBAs, and developers then analyze which data can be archived. The archiving process can be completed with the help of standardization and automated execution.

Common solutions for archiving MySQL data

Current archiving methods are generally divided into two categories: MySQL and MariaDB. The key tool is pt-archiver or obtaining archived data by parsing binlog.

Using MySQL for storage and archiving

This is the most common solution: the archiving, or even synchronous backup, of the online production repository is processed by a PC (usually with a large capacity -- about 50T, and large memory, to run instances). It's also possible to build a master/slave server configuration offline to archive PolarDB data and provide offline intranet queries.

Image description

Advantages
  1. Based on the familiar MySQL environment, it's easy to manage

  2. Highly compatible with the online environment

  3. Archiving environments can be built with large, inexpensive disks

Disadvantages
  1. In this kind of architecture, binlog is usually turned off in in the archive nodes for cost reduction. There will be a backup in the object storage, and no slave database. Therefore, once the data or the hard disk is damaged, it will take a long time to recover
  2. There's not enough computing power, and not much ability to extend the computing node either. The data needs to be extracted and put into a big data environment when computing is required
  3. Large amount of idle CPU and RAM resources

Using MariaDB for archiving

S3 engine is an experimental feature introduced in MariaDB, it has a better compression capability while retaining the usage habits of MySQL users. The complete archiving process includes writing to InnoDB first, then alter table tb_ name engine=s3;

Advantages
  1. Maintaining MySQL compatibility
  2. Supporting S3 class object storage
  3. Supporting highly compressed storage
Disadvantages
  1. S3 engine is write only

  2. Append writing is not supported, so conversion to an InnoDB table is required

  3. The conversion from InnoDB to S3 engine takes a very long time, adding complexity to the process

Archiving method provided by Databend

So, is there a more elegant solution? Here we recommend Databend, a cloud native data warehouse.

Introduction & Architecture

图片

Databend is a modern, open-source data warehouse developed using Rust. It's completely cloud-oriented, provides extremely fast elastic expansion capabilities, and is committed to creating an on-demand and volume-based Data Cloud product experience.

Features are as follows:

  • A star project among open-source, cloud-based data warehouses
  • Vectorized execution, pull&push-based processor model
  • Real storage/computing separation architecture, high performance, low cost, on-demand use
  • Complete database support, compatible with MySQL, Clickhouse protocol, SQL over HTTP, etc.
  • Complete transaction,
  • Ensured transaction integrity, support time travel, database clone, data share and other functions
  • Support multi-tenant read/write and sharing operations on the same data

Databend's design principles:

  • No Partition

  • No index(Auto Index)

  • Support Transaction

  • Data Time travel/Data Zero copy clone/Data Share

  • Enough Performance/Low Cost

Deployment

Image description

Three processing methods are supported,including MySQL, Clickhouse and SQL Over Http.

Please refer to https://databend.rs/doc/deploy for installation instructions.

For more support during installation or usage, please contact us via wechat (wechat number: 82565387).

Writing methods

Insert into

Insert writing operations using JDBC, python and golang are supported. Here is a recommended guidance: https://databend.rs/doc/develop.

It's suggested to use Bulk insert to achieve batch writing operations, which has similar usage with MySQL.

Streaming load

Please refer to https://databend.rs/doc/load-data/local to see more of streaming load.

Image description

It only takes about 3 minutes to load an 81G file with 200 million rows of data into Databend (As seen in the picture above).

Besides, Databend now supports reading directly from compressed files. For example:

ls ./dataset/*.csv.gz|xargs -P 8 -I{} curl -H "insert_sql:insert into ontime format CSV" -H "skip_header:1"   -H "compression:gzip" -F "upload=@{}" -XPUT http://root:@localhost:8000/v1/streaming_load
Enter fullscreen mode Exit fullscreen mode

It should be noted that the scheme for reading compressed files has not been optimized, it takes about 13 minutes to load the same data using this method. There is much room for performance improvement in the future.

Using Stage

Stage can be considered as an online storage manager of Databend, please refer to https://databend.rs/doc/load-data/stage for detailed syntax。

Image description
The above PPT page shows the process of creating Stage, uploading files, and viewing files online. Files in Stage can be loaded into Databend by "copy into" command.

Advantages of archiving MySQL data with Databend

We recommend to use Databend combined with object storage for MySQL data archiving.

The advantages are as follows:

  1. Object storage breaks the limit of storage capacity
  2. Databend has a relatively high data compression ratio of 10:1, which saves a lot of storage resource
  3. Databend manages data based on MySQL protocols, so users don't need to change their usage habits
  4. The storage/computing separated architecture makes it easier to scale up, when facing computing resources insufficiency, and there is no need to worry about the high availability of storage
  5. Most original MySQL tools can be reused

Databend now supports object storage: AWS S3, Azure, Alicloud, Tencent Cloud, QingCloud, Kingsoft Cloud and equipments like minio and ceph. Meanwhile the surprising computing ability of Databend supports data computation services if needed.

People can make better use of cloud resources and obtain qualified performance with a relatively low cost using Databend.

Please contact us via wechat (wechat number: 82565387) to get more information!

About Databend

Databend is an open-source modern data warehouse with elasticity and low cost. It can do real-time data analysis on object-based storage.

We look forward to your attention and hope to explore the cloud native data warehouse solution, and create a new generation of open-source data cloud together.

Top comments (0)