Excellent Performance of Databend in Data Archiving Analysis

What's Databend?

Databend, developed in Rust, is a new open-source, cloud-native data warehouse. It offers fast elastic scaling and is committed to delivering an on-demand, pay-by-volume cloud data experience:

Famous open-source cloud data warehouse project.

Vectorized Execution and Pull & Push-Based Processor Model.

Separation of compute and storage: Available on-demand with high performance and low cost.

Support for multiple protocols: compatible with the MySQL and ClickHouse protocols, SQL over HTTP, etc.

Strong transaction handling capabilities: support for Time Travel, Database Clone, Data Share, etc.

Support for reading, writing, and sharing the same data by multiple tenants.

Databend tutorials: https://databend.rs/doc/deploy

Deploying Databend to Work with Ceph


  • Databend Architecture:

Query node: Processes protocol analysis and SQL push-down.

Meta node: Stores metadata on the local disk.

  • Supported storage solutions:

Public cloud: object storage services from AWS, Alibaba Cloud, Tencent Cloud, etc.

Self-hosted: S3-compatible products such as MinIO and Ceph.

  • Deployment steps:

1. Download the latest binary package.

2. Unzip the package and create folders.

3. Modify the configuration files.

4. Start up Databend.

  • Deployment environment:

Operating system: CentOS 7
Ceph version: 12.2.13
Databend version: v0.7.65

Overall, Databend deployment is fairly straightforward.

Step 1: Download Databend.

[root@testsrv ~]#
wget https://github.com/datafuselabs/databend/releases/download/v0.7.65-nightly/databend-v0.7.65-nightly-x86_64-unknown-linux-musl.tar.gz

Step 2: Unzip the package and create folders.

[root@testsrv ~]#
tar -zxvf databend-v0.7.65-nightly-x86_64-unknown-linux-musl.tar.gz
mkdir /usr/local/databend/{bin,data,etc,logs} -p
mv databend-meta /usr/local/databend/bin/ 
mv databend-query /usr/local/databend/bin/

Step 3: Modify the configuration files for startup.

[root@testsrv ~]# # Modify the config file for the meta node
cd /usr/local/databend/etc/
cat databend-meta.toml
log_dir            = "/usr/local/databend/logs/_logs1"
metric_api_address = "0.0.0.0:28100"
admin_api_address  = "0.0.0.0:28101"
grpc_api_address   = "0.0.0.0:9191"
[raft_config]
id            = 1
raft_dir ="/usr/local/databend/data/_meta1"
raft_api_port = 28103
raft_listen_host = "172.16.16.12"
raft_advertise_host = "172.16.16.12"
# Startup mode: single node or cluster
single        = true
[root@testsrv ~]# # Modify the config file for the query node
cat databend-query-node-1.toml
[query]
max_active_sessions = 256
wait_timeout_mills = 5000
# For flight RPC. Use the IP and port of the current machine.
flight_api_address = "<IP of this machine>:9091"
# Databend Query HTTP address.
# For the admin REST API.
http_api_address = "0.0.0.0:8081"
# Databend Query metrics REST API.
metric_api_address = "0.0.0.0:7071"
# Databend Query MySQL Handler.
mysql_handler_host = "0.0.0.0"
mysql_handler_port = 3307
# Databend Query ClickHouse Handler.
clickhouse_handler_host = "0.0.0.0"
clickhouse_handler_port = 9001
# Databend Query HTTP Handler.
http_handler_host = "0.0.0.0"
http_handler_port = 8000
tenant_id = "test_tenant"
cluster_id = "test_cluster"
table_engine_memory_enabled = true
table_engine_csv_enabled = true
table_engine_parquet_enabled = true
database_engine_github_enabled = true
table_cache_enabled = true
table_memory_cache_mb_size = 1024
table_disk_cache_root = "/usr/local/databend/data/_cache"
table_disk_cache_mb_size = 10240
[log]
log_level = "DEBUG"
log_dir = "/usr/local/databend/logs/_logs"
[meta]
# To enable embedded meta-store, set meta_address to ""
meta_embedded_dir = "/usr/local/databend/data/_meta_embedded_1"
meta_address = "0.0.0.0:9191"
meta_username = "root"
meta_password = "root"
meta_client_timeout_in_second = 60
# Storage config.
[storage]
storage_type = "s3"
# DISK storage.
[storage.disk]
data_path = "/usr/local/databend/data/stateless_test_data"
# S3 storage. To use S3, set storage_type to "s3".
[storage.s3]
bucket="databend"
region="region"
endpoint_url="<Your Ceph S3 address>"
access_key_id="<Your Ceph S3 key id>"
secret_access_key="<Your Ceph S3 access key>"
# Azure storage
[storage.azure_storage_blob]
[root@testsrv ~]# # Modify start.sh
cd /usr/local/databend/bin
[root@testsrv ~]# cat start.sh
ulimit -n 65535
cd /usr/local/databend/
nohup /usr/local/databend/bin/databend-meta --config-file=/usr/local/databend/etc/databend-meta.toml >meta.log 2>&1 &
sleep 3
nohup /usr/local/databend/bin/databend-query --config-file=/usr/local/databend/etc/databend-query-node-1.toml >query.log 2>&1 &
cd -
echo "Usage: mysql -h127.0.0.1 -P3307 -uroot"
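For a production setup, you may prefer to run the two daemons under systemd instead of a nohup script. Below is a sketch of a unit file for the meta node (the file path and unit name are hypothetical; the binary and config paths match the layout above, and `LimitNOFILE` mirrors the `ulimit -n 65535` in start.sh). A similar unit would cover databend-query, ordered after the meta service:

```ini
# /etc/systemd/system/databend-meta.service (hypothetical path)
[Unit]
Description=Databend meta service
After=network.target

[Service]
Type=simple
LimitNOFILE=65535
ExecStart=/usr/local/databend/bin/databend-meta --config-file=/usr/local/databend/etc/databend-meta.toml
Restart=on-failure

[Install]
WantedBy=multi-user.target
```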

Step 4: Start Databend

[root@testsrv ~]#
bash start.sh &

Step 5: Verify that Databend was deployed successfully to work with Ceph

[root@testsrv ~]# mysql -h127.0.0.1 -P3307 -uroot    # No password by default
'root'@127.0.0.1 18:59:  [(none)]> select * from system.configs;
-- The deployment succeeded if the Ceph address and keys appear in the output:
s3.region              
s3.endpoint_url        
s3.access_key_id       
s3.secret_access_key   
s3.bucket

Excellent Performance of Databend in Data Archiving Analysis

  • Archiving options

A large amount of historical data (such as logs and transaction flows) kept in MySQL occupies a lot of storage space and can hurt business performance (for example, by causing jitter). However, you cannot simply delete the historical data permanently, because you may need to read it later for analysis. For example, you might want to count records matching a certain condition in a certain month of the year 2000. It is therefore necessary to archive the data regularly. There are many ways to do this: you can use the pt-archiver tool, or an archiving applet developed by your DBA. The archiving destination you choose must meet the following conditions:

Compatible with the MySQL protocol, to minimize business logic changes; other protocols might require major changes.

A high compression ratio, to save storage costs.

Support for data calculation and analysis.

Here are some options for you: MySQL (separate archive cluster), Databend, etc.
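To make the archiving flow concrete, here is a minimal sketch of the batching logic, independent of any particular driver. The table and column names are hypothetical; in practice you would read the historical rows from MySQL and execute the generated statements through a MySQL-protocol client against Databend's handler port (3307 in the configuration above):

```python
def archive_statements(rows, table, columns, batch_size=1000):
    """Yield batched INSERT statements that move `rows` into an archive table."""
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        values = ", ".join(
            "(" + ", ".join(repr(v) for v in row) + ")" for row in batch
        )
        yield f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values}"

# Example: three historical rows split into two batches (hypothetical schema).
rows = [(1, "2000-01-05"), (2, "2000-01-06"), (3, "2000-02-01")]
stmts = list(archive_statements(rows, "ontime_archive",
                                ["id", "flight_date"], batch_size=2))
```

Batching keeps each INSERT at a bounded size, which matters when archiving hundreds of millions of rows.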

  • Comparing data compressions

We generated 200 million rows of data and imported them into MySQL and Databend respectively to compare the physical sizes after compression. The table below shows that you can have a better compression ratio when using Databend.

| Format   | Physical Size |
| -------- | ------------- |
| SQL dump | 88 G          |
| CSV      | 84 G          |
| Databend | 8 G           |
| MySQL    | 47 G          |
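The compression ratios implied by the sizes above, relative to the raw 88 G SQL dump, can be worked out directly:

```python
raw_sql_gb = 88
sizes_gb = {"CSV": 84, "Databend": 8, "MySQL": 47}

# Ratio of raw dump size to stored size: higher means better compression.
ratios = {name: raw_sql_gb / size for name, size in sizes_gb.items()}
# Databend stores the 88 G dataset in 8 G (11x), versus roughly 1.9x for MySQL.
```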
  • Data query tests

This section compares the response time of MySQL and Databend for SQL queries.

Test environments:
Server hardware: 40 cores, 256 G RAM, SSD.
MySQL: InnoDB buffer pool set to 100 G; SSD.
Databend: default configuration; the S3 service runs on a server with HDD.

select count(*) from ontime;

|              | Execution Time |
| ------------ | -------------- |
| Databend-hdd | 0.02 sec       |
| Databend-ssd | 0.04 sec       |
| MySQL        | 4 min 9.05 sec |

select count(*),Year from ontime group by Year; (no index)

|          | Execution Time  |
| -------- | --------------- |
| Databend | 1.89 sec        |
| MySQL    | 5 min 19.20 sec |

select count(*),Year from ontime group by Year; (indexed by Year)

|              | Execution Time  |
| ------------ | --------------- |
| Databend-hdd | 0.56 sec        |
| Databend-ssd | 1.89 sec        |
| MySQL        | 2 min 46.72 sec |

Conclusion: Compared with MySQL, Databend has a significant advantage in SQL query response time for data analysis.

  • Compatible with MySQL protocol

Databend supports the MySQL protocol, the ClickHouse protocol, and the HTTP protocol. Applications that speak the MySQL protocol are largely compatible with Databend.

Summary

  • Archiving
    Databend has advantages over MySQL in data archiving, such as data compression and SQL queries for data analysis. Databend works with mechanical hard drives, so it doesn't need very good hardware to get good results. I would recommend Databend for data archiving analysis. If you only need data archiving now, Databend can help lower your cost. If you need data analysis at a later time, remember that Databend is easy to scale up and down.

  • Deployment
    Deploying Databend to work with Ceph is easy. Databend has more advantages over traditional databases in the cloud-native computing scenarios.

  • High availability
    The Query node is a stateless node. Metadata must be kept properly, and you can create copies of metadata to keep it from being lost. In my opinion, metadata can be saved to the storage layer for real-time backup, which is only used for emergency recovery.

  • Storage layer
Public cloud scenarios ensure high availability at the object storage layer. In private cloud environments, you need to ensure the high availability of Ceph or MinIO yourself. We're improving Databend to support Kubernetes out of the box and provide a real pay-as-you-go experience.

Author information: shuaimeng Tian, senior DBA
