
Apache Doris


Deploying Apache Doris with Storage-Compute Separation Using MinIO: A Practical Guide

Modern data processing faces multiple challenges. The ever-growing volume of data drives up traditional storage costs, especially with unstructured data becoming more prevalent. Data quality issues further increase the burden of storage and cleansing. Additionally, enterprises often struggle with data integration across multiple internal systems, which raises the bar for efficient and cost-effective data analytics.

Apache Doris, a high-performance real-time analytics database with lakehouse capabilities, combined with MinIO, a high-performance S3-compatible object storage system, offers a powerful solution. Together, they enable an efficient, low-cost data analytics platform. This article explores the strengths of Apache Doris and MinIO and provides a step-by-step deployment guide.

Why Choose Apache Doris and MinIO?

Apache Doris: High-Performance Real-Time Analytics Database

Apache Doris is built on an MPP (Massively Parallel Processing) architecture, known for its efficiency, simplicity, and versatility—delivering sub-second query results on massive datasets. Key advantages:

  • High Performance: Sub-second responses for large datasets, supporting high-concurrency point queries and complex analytics.

  • Real-Time Analytics: Enables real-time data ingestion and querying for instant insights.

  • Ease of Use: Streamlined design with low operational and maintenance costs.

  • Scalability: Horizontal scaling via MPP to handle large-scale data and high-concurrency workloads.

  • Multi-Scenario Support: Ideal for reports, ad-hoc queries, user profiling, log retrieval, etc.

  • Robust Integration: Seamlessly works with MySQL, PostgreSQL, Hive, Flink, and other tools.

  • Active Community: Backed by 600+ contributors, deployed in production by 5,000+ organizations (including TikTok, Baidu).

Doris supports two deployment modes:

  • Integrated storage-compute (data stored internally)

  • Separate storage-compute (uses third-party storage like MinIO)

MinIO: High-Performance Object Storage

MinIO is an open-source, distributed object storage system optimized for cloud-native workloads. Core strengths:

  • High Performance: Fast data access to meet real-time analytics demands.

  • Scalability: Horizontal scaling for growing data volumes.

  • Cost-Effectiveness: Open-source, on-premises deployable (avoids cloud storage premiums).

  • S3 Compatibility: Fully compatible with Amazon S3 API for easy tool integration.

  • High Availability: Uses erasure coding for data redundancy.

  • Flexible Deployment: Supports bare-metal, Kubernetes, or cloud environments.

These features make MinIO an ideal storage backend for Doris in a storage-compute separation architecture.

Deployment Guide

Planning

Software Versions

Software        Version   Description
MinIO           latest    High-performance object storage
Apache Doris    3.0.6     Real-time analytics database
Doris Manager   25.0.0    Visual tool for Doris installation/deployment

Server Layout

Node IP      Doris Manager  MinIO  MetaService  FE  BE
172.20.1.2   ✔️             ✔️     ✔️           ✔️  ✔️
172.20.1.3   -              ✔️     ✔️           ✔️  ✔️
172.20.1.4   -              ✔️     ✔️           ✔️  ✔️
172.20.1.5   -              ✔️     -            -   -

For production environments: Use higher-spec machines and isolate components for optimal performance.

Preparation

1. Modify OS Parameters

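# Disable swap for the current boot
swapoff -a

# Raise the memory-map limit and persist it across reboots
cat >> /etc/sysctl.conf << EOF
vm.max_map_count = 2000000
EOF

# Take effect immediately
sysctl -p

# Raise the open-file limit (applies to new login sessions)
cat >> /etc/security/limits.conf << EOF
* soft nofile 1000000
* hard nofile 1000000
EOF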
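
Before moving on, it can help to confirm that the new limits are actually in effect. The following is a minimal sketch, not part of the original guide: the function names `meets_minimum` and `report_os_params` are hypothetical, and the thresholds are simply the values set above.

```shell
# Succeeds when the actual value is "unlimited" or at least the required minimum.
meets_minimum() {
  [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

# Prints one OK/LOW line per setting, using the thresholds from this guide.
report_os_params() {
  map_count=$(cat /proc/sys/vm/max_map_count)
  open_files=$(ulimit -n)
  if meets_minimum "$map_count" 2000000; then
    echo "OK  vm.max_map_count=$map_count"
  else
    echo "LOW vm.max_map_count=$map_count (guide uses 2000000)"
  fi
  if meets_minimum "$open_files" 1000000; then
    echo "OK  nofile=$open_files"
  else
    echo "LOW nofile=$open_files (guide uses 1000000)"
  fi
}
```

Call `report_os_params` from a fresh login session, since changes to limits.conf only apply to sessions started after the edit.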


2. Install Required Tools

apt update
apt install -y net-tools
apt install -y cron
apt install -y iputils-ping


Deploying MinIO

1. Download MinIO

wget https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio


2. Start MinIO on Each Node

export MINIO_REGION_NAME=us-east-1
export MINIO_ROOT_USER=minio
export MINIO_ROOT_PASSWORD=minioadmin
mkdir -p /mnt/disk{1..4}/minio
nohup ./minio server --address :9000 --console-address :9001 http://172.20.1.{2...5}:9000/mnt/disk{1...4}/minio 2>&1 &
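
The three-dot ranges in the server command are MinIO's own expansion syntax, which MinIO interprets itself; they are different from the two-dot bash brace expansion used in the mkdir line. As a rough illustration, the sketch below prints the drive URLs that the pattern covers for this guide's layout (4 nodes with 4 drives each). The function name `list_minio_drives` is hypothetical and the 172.20.1 prefix is hard-coded to match the addresses above.

```shell
# Prints every drive URL covered by the pattern
# http://172.20.1.{2...5}:9000/mnt/disk{1...4}/minio
# Arguments: first host octet, last host octet, first disk, last disk.
list_minio_drives() (
  host=$1
  while [ "$host" -le "$2" ]; do
    disk=$3
    while [ "$disk" -le "$4" ]; do
      echo "http://172.20.1.${host}:9000/mnt/disk${disk}/minio"
      disk=$((disk + 1))
    done
    host=$((host + 1))
  done
)

list_minio_drives 2 5 1 4
```

Each of the 16 printed paths must exist on its node before MinIO starts, which is what the mkdir line prepares.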

3. Configure MinIO Client

wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
./mc alias set myminio http://127.0.0.1:9000 minio minioadmin
./mc mb myminio/doris


Note: If MinIO is deployed on a local network without TLS, explicitly include http:// in the endpoint.
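
If the endpoint is assembled in a script, a small helper can make sure the scheme is always present. This is only a sketch: the function name `ensure_http_scheme` is hypothetical, and it assumes a plain-HTTP deployment like the one in this guide.

```shell
# Prefixes http:// when the endpoint has no scheme; leaves http:// and https:// endpoints untouched.
ensure_http_scheme() {
  case "$1" in
    http://*|https://*) printf '%s\n' "$1" ;;
    *) printf 'http://%s\n' "$1" ;;
  esac
}

ensure_http_scheme "172.20.1.2:9000"
```

The last line prints the endpoint with the scheme added, which is the form to paste wherever a MinIO endpoint is requested.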

Deploying Doris Manager

1. Download Doris Manager

wget https://enterprise-doris-releases.oss-accelerate.aliyuncs.com/doris-manager/velodb-manager-25.0.0-x64-bin.tar.gz

2. Extract and Start Service

tar -zxf velodb-manager-25.0.0-x64-bin.tar.gz
cd velodb-manager-25.0.0-x64-bin/webserver/bin
bash start.sh


3. Access Web Interface

Open your browser and navigate to http://<Doris Manager IP>:8004. Follow the prompts to create an admin account.

Deploying Apache Doris

1. Download Doris

wget https://apache-doris-releases.oss-accelerate.aliyuncs.com/apache-doris-3.0.6.2-bin-x64.tar.gz
mkdir -p /opt/downloads/doris
mv apache-doris-3.0.6.2-bin-x64.tar.gz /opt/downloads/doris/


2. Create Cluster via Doris Manager

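  1. Select Doris version (3.0.6) and set root password

  2. Enter the details of the MinIO deployment set up earlier (the endpoint including http://, the region, the credentials, and the doris bucket)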

3. Configure Nodes

  1. Run this script on all nodes to deploy the agent:

    wget http://172.20.1.2:8004/api/download/deploy.sh -O deploy_agent.sh && chmod +x deploy_agent.sh && ./deploy_agent.sh

  2. Input node IPs in the Doris Manager interface

  3. Configure FE nodes (specify roles and resources)

  4. Configure BE nodes (specify storage paths and resources)

4. Deploy Cluster

Click "Deploy" and wait for the process to complete (10-15 minutes). Verify cluster status in Doris Manager.

Querying Data

Data Preparation

1. Access Query Interface

2. Create Doris Table

CREATE DATABASE IF NOT EXISTS `test`;
USE `test`;
CREATE TABLE `amazon_reviews` (  
  `review_date` int(11) NULL,  
  `marketplace` varchar(20) NULL,  
  `customer_id` bigint(20) NULL,  
  `review_id` varchar(40) NULL,
  `product_id` varchar(10) NULL,
  `product_parent` bigint(20) NULL,
  `product_title` varchar(500) NULL,
  `product_category` varchar(50) NULL,
  `star_rating` smallint(6) NULL,
  `helpful_votes` int(11) NULL,
  `total_votes` int(11) NULL,
  `vine` boolean NULL,
  `verified_purchase` boolean NULL,
  `review_headline` varchar(500) NULL,
  `review_body` string NULL
) ENGINE=OLAP
DUPLICATE KEY(`review_date`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`review_date`) BUCKETS 16
PROPERTIES (
  "compression" = "ZSTD"
);


3. Download Sample Data

wget https://datasets-documentation.s3.eu-west-3.amazonaws.com/amazon_reviews/amazon_reviews_2010.snappy.parquet


4. Load Data into Doris

curl --location-trusted -u root:<your password> \
-T amazon_reviews_2010.snappy.parquet \
-H "format:parquet" \
http://127.0.0.1:8030/api/test/amazon_reviews/_stream_load
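
Stream Load replies with a JSON document whose Status field indicates the outcome, so a script can check the result instead of relying on a human reading the output. The sketch below is one way to do that with grep; the function name `stream_load_succeeded` and the sample response are hypothetical, and the response is trimmed to a few fields for illustration.

```shell
# Succeeds when a Stream Load JSON response reports "Status": "Success".
stream_load_succeeded() {
  printf '%s' "$1" | grep -Eq '"Status"[[:space:]]*:[[:space:]]*"Success"'
}

# Hypothetical sample response, trimmed to a few fields for illustration.
sample_response='{"TxnId": 1003, "Label": "demo", "Status": "Success", "NumberLoadedRows": 100}'

if stream_load_succeeded "$sample_response"; then
  echo "load succeeded"
else
  echo "load failed; inspect the Message and ErrorURL fields" >&2
fi
```

In practice the response would come from capturing the output of the curl command above in a variable and passing it to the function.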


5. Verify Data in MinIO

Log into MinIO Console (http://<MinIO IP>:9001) → Check doris bucket for data files.

Sample Query

SELECT
    product_id,
    ANY_VALUE(product_title),
    AVG(star_rating) AS rating,
    COUNT() AS count
FROM
    amazon_reviews
WHERE
    review_body LIKE '%is super awesome%'
GROUP BY
    product_id
ORDER BY
    count DESC,
    rating DESC,
    product_id
LIMIT 5;


Summary

This setup is ideal for enterprises looking to balance performance and cost in real-time analytics scenarios. Try it out with the guide above and share your experience!
