Modern data processing faces multiple challenges. The ever-growing volume of data drives up traditional storage costs, especially with unstructured data becoming more prevalent. Data quality issues further increase the burden of storage and cleansing. Additionally, enterprises often struggle with data integration across multiple internal systems, which raises the bar for efficient and cost-effective data analytics.
Apache Doris, a high-performance real-time analytics database with lakehouse capabilities, combined with MinIO, a high-performance S3-compatible object storage system, offers a powerful solution. Together, they enable an efficient, low-cost data analytics platform. This article explores the strengths of Apache Doris and MinIO and provides a step-by-step deployment guide.
Why Choose Apache Doris and MinIO?
Apache Doris: High-Performance Real-Time Analytics Database
Apache Doris is built on an MPP (Massively Parallel Processing) architecture, known for its efficiency, simplicity, and versatility—delivering sub-second query results on massive datasets. Key advantages:
High Performance: Sub-second responses for large datasets, supporting high-concurrency point queries and complex analytics.
Real-Time Analytics: Enables real-time data ingestion and querying for instant insights.
Ease of Use: Streamlined design with low operational and maintenance costs.
Scalability: Horizontal scaling via MPP to handle large-scale data and high-concurrency workloads.
Multi-Scenario Support: Ideal for reports, ad-hoc queries, user profiling, log retrieval, etc.
Robust Integration: Seamlessly works with MySQL, PostgreSQL, Hive, Flink, and other tools.
Active Community: Backed by 600+ contributors, deployed in production by 5,000+ organizations (including TikTok, Baidu).
Doris supports two deployment modes:
Integrated storage-compute (data stored internally)
Separate storage-compute (uses third-party storage like MinIO)
MinIO: High-Performance Object Storage
MinIO is an open-source, distributed object storage system optimized for cloud-native workloads. Core strengths:
High Performance: Fast data access to meet real-time analytics demands.
Scalability: Horizontal scaling for growing data volumes.
Cost-Effectiveness: Open-source, on-premises deployable (avoids cloud storage premiums).
S3 Compatibility: Fully compatible with Amazon S3 API for easy tool integration.
High Availability: Uses erasure coding for data redundancy.
Flexible Deployment: Supports bare-metal, Kubernetes, or cloud environments.
These features make MinIO an ideal storage backend for Doris in a storage-compute separation architecture.
Deployment Guide
Planning
Software Versions
| Software | Version | Description |
|---|---|---|
| MinIO | latest | High-performance object storage |
| Apache Doris | 3.0.6 | Real-time analytics database |
| Doris Manager | 25.0.0 | Visual tool for Doris installation/deployment |
Server Layout
| Node IP | Doris Manager | MinIO | MetaService | FE | BE |
|---|---|---|---|---|---|
| 172.20.1.2 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| 172.20.1.3 | ✔️ | ✔️ | ✔️ | ✔️ | |
| 172.20.1.4 | ✔️ | ✔️ | ✔️ | ✔️ | |
| 172.20.1.5 | ✔️ |
For production environments: Use higher-spec machines and isolate components for optimal performance.
Preparation
1. Modify OS Parameters
swapoff -a
cat >> /etc/sysctl.conf << EOF
vm.max_map_count = 2000000
EOF
# Take effect immediately
sysctl -p
vi /etc/security/limits.conf
* soft nofile 1000000
* hard nofile 1000000
2. Install Required Tools
apt update
apt install -y net-tools
apt install -y cron
apt install -y iputils-ping
Deploying MinIO
1. Download MinIO
wget https://dl.min.io/server/minio/release/linux-amd64/minio
chmod +x minio
2. Start MinIO on Each Node
export MINIO_REGION_NAME=us-east-1
export MINIO_ROOT_USER=minio
export MINIO_ROOT_PASSWORD=minioadmin
mkdir -p /mnt/disk{1..4}/minio
nohup minio server --address :9000 --console-address :9001 http://172.20.1.{2...5}:9000/mnt/disk{1...4}/minio 2>&1 &
3. Configure MinIO Client
wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
./mc alias set myminio http://127.0.0.1:9000 minio minioadmin
./mc mb myminio/doris
Note: If MinIO is deployed on a local network without TLS, explicitly include http:// in the endpoint.
Deploying Doris Manager
1. Download Doris Manager
wget https://enterprise-doris-releases.oss-accelerate.aliyuncs.com/doris-manager/velodb-manager-25.0.0-x64-bin.tar.gz
2. Extract and Start Service
tar -zxf velodb-manager-25.0.0-x64-bin.tar.gz
cd velodb-manager-25.0.0-x64-bin/webserver/bin
bash start.sh
3. Access Web Interface
Open your browser and navigate to http://<Doris Manager IP>:8004. Follow the prompts to create an admin account.
Deploying Apache Doris
1. Download Doris
wget https://apache-doris-releases.oss-accelerate.aliyuncs.com/apache-doris-3.0.6.2-bin-x64.tar.gz
mv apache-doris-3.0.6.2-bin-x64.tar.gz /opt/downloads/doris
2. Create Cluster via Doris Manager
- Select Doris version (3.0.6) and set root password
- Enter MinIO details:
3. Configure Nodes
-
Run this script on all nodes to deploy agent:
wget http://172.20.1.2:8004/api/download/deploy.sh -O deploy_agent.sh && chmod +x deploy_agent.sh && ./deploy_agent.sh Input node IPs in the Doris Manager interface
- Configure FE nodes (specify roles and resources)
- Configure BE nodes (specify storage paths and resources)
4. Deploy Cluster
Click "Deploy" and wait for the process to complete (10-15 minutes). Verify cluster status in Doris Manager.
Querying Data
Data Preparation
1. Access Query Interface
2. Create Doris Table
CREATE DATABASE IF NOT EXISTS `test`;
USE `test`;
CREATE TABLE `amazon_reviews` (
`review_date` int(11) NULL,
`marketplace` varchar(20) NULL,
`customer_id` bigint(20) NULL,
`review_id` varchar(40) NULL,
`product_id` varchar(10) NULL,
`product_parent` bigint(20) NULL,
`product_title` varchar(500) NULL,
`product_category` varchar(50) NULL,
`star_rating` smallint(6) NULL,
`helpful_votes` int(11) NULL,
`total_votes` int(11) NULL,
`vine` boolean NULL,
`verified_purchase` boolean NULL,
`review_headline` varchar(500) NULL,
`review_body` string NULL
) ENGINE=OLAP
DUPLICATE KEY(`review_date`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`review_date`) BUCKETS 16
PROPERTIES (
"compression" = "ZSTD"
);
3. Download Sample Data
wget https://datasets-documentation.s3.eu-west-3.amazonaws.com/amazon_reviews/amazon_reviews_2010.snappy.parquet
4. Load Data into Doris
curl --location-trusted -u root:<your password> \
-T amazon_reviews_2010.snappy.parquet \
-H "format:parquet" \
http://127.0.0.1:8030/api/test/amazon_reviews/_stream_load
5. Verify Data in MinIO
Log into MinIO Console (http://<MinIO IP>:9001) → Check doris bucket for data files.
Sample Query
SELECT
product_id,
AVG(product_title),
AVG(star_rating) AS rating,
COUNT() AS count
FROM
amazon_reviews
WHERE
review_body LIKE '%is super awesome%'
GROUP BY
product_id
ORDER BY
count DESC,
rating DESC,
product_id
LIMIT 5;
Summary
This setup is ideal for enterprises looking to balance performance and cost in real-time analytics scenarios. Try it out with the guide above and share your experience!












Top comments (0)