DEV Community

Apache Doris

Hands-on Guide to Enable Compute Nodes for Data Lake Analytics in Apache Doris

Compute Node is a type of backend node in Apache Doris designed for remote federated query workloads, such as data lake analytics. Unlike normal backend nodes, Compute Nodes are stateless and do not store any data, which makes them easy to add to a cluster for scaling.

How to enable Compute Nodes in Apache Doris

First, follow the official documentation to install Apache Doris.

Note:

FE configuration

Add the following configurations to fe.conf:

prefer_compute_node_for_external_table=true
min_backend_num_for_external_table=1
  1. prefer_compute_node_for_external_table=true: External table queries are preferentially assigned to Compute Nodes. If this is set to false, external table queries can be assigned to any backend node. The maximum number of Compute Nodes is determined by min_backend_num_for_external_table.

  2. min_backend_num_for_external_table: This takes effect only when prefer_compute_node_for_external_table is true. If the cluster has fewer Compute Nodes than the value of min_backend_num_for_external_table, some external table queries will be executed on mixed nodes; otherwise, all external table queries will be executed on Compute Nodes. The default value of this parameter is 3.
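The interaction between the two parameters can be sketched in a few lines of Python. This is only a model of the behavior described above, not Doris source code: the Backend class and the candidates_for_external_query function are hypothetical names, and the sketch assumes that mixed nodes are added only until the total reaches min_backend_num_for_external_table.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Backend:
    host: str
    role: str  # "computation" or "mix", as reported in the NodeRole field


def candidates_for_external_query(
    backends: List[Backend],
    prefer_compute_node: bool,
    min_backend_num: int,
) -> List[Backend]:
    """Return the backends eligible to run an external table query.

    Hypothetical model of the rule described in the article.
    """
    if not prefer_compute_node:
        # Preference disabled: any backend node is eligible.
        return list(backends)

    compute = [b for b in backends if b.role == "computation"]
    mixed = [b for b in backends if b.role == "mix"]

    if len(compute) >= min_backend_num:
        # Enough Compute Nodes: external table queries stay on them.
        return compute

    # Too few Compute Nodes: top up with mixed nodes until the minimum is
    # reached (assumption) or the cluster runs out of mixed nodes.
    shortfall = min_backend_num - len(compute)
    return compute + mixed[:shortfall]


cluster = [
    Backend("192.168.0.114", "mix"),
    Backend("192.168.0.128", "computation"),
    Backend("192.168.0.129", "computation"),
]

# min_backend_num=1 (the value set in fe.conf above): Compute Nodes only.
print([b.host for b in candidates_for_external_query(cluster, True, 1)])
# min_backend_num=3 (the default): the mixed node is pulled in as well.
print([b.host for b in candidates_for_external_query(cluster, True, 3)])
```

With two Compute Nodes in the cluster, a minimum of 1 keeps external table queries off the mixed node, which is why this guide lowers the value from its default.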

BE configuration

Add the following configurations to be.conf:

be_node_role=computation

By default, this parameter is set to mix, which means the normal mixed backend nodes.
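Before starting a node, it can be handy to double-check which role its be.conf will give it. The helper below is a minimal sketch, not part of Doris: node_role_from_conf is a hypothetical name, and it assumes that be.conf uses plain key=value lines with # comments and that mix and computation are the only valid roles.

```python
def node_role_from_conf(conf_text: str) -> str:
    """Return the role a backend would start with, given be.conf contents."""
    role = "mix"  # default when be_node_role is not set, per the article
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "be_node_role":
            role = value.strip()  # a later setting overrides an earlier one
    if role not in ("mix", "computation"):
        # Assumption: only these two roles exist.
        raise ValueError(f"unexpected be_node_role: {role!r}")
    return role


print(node_role_from_conf("be_node_role=computation"))
print(node_role_from_conf("# no role configured"))
```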

After you have added the nodes to your cluster and started them, you can check the backend node information. In the NodeRole field, mix indicates a mixed node and computation indicates a Compute Node.

In the following example, 192.168.0.128 and 192.168.0.129 are set to be Compute Nodes.

mysql> show backends\G
*************************** 1. row ***************************
            BackendId: 11007
              Cluster: default_cluster
                    IP: 192.168.0.114
              HostName: 192.168.0.114
        HeartbeatPort: 9050
                BePort: 9060
              HttpPort: 8040
              BrpcPort: 8060
        LastStartTime: 2023-06-03 21:51:24
        LastHeartbeat: 2023-06-03 21:51:40
                Alive: true
  SystemDecommissioned: false
ClusterDecommissioned: false
            TabletNum: 21
      DataUsedCapacity: 0.000
        AvailCapacity: 177.323 GB
        TotalCapacity: 196.735 GB
              UsedPct: 9.87 %
        MaxDiskUsedPct: 9.87 %
    RemoteUsedCapacity: 0.000
                  Tag: {"location" : "default"}
                ErrMsg:
              Version: doris-2.0.0-alpha-a925ec9
                Status: {"lastSuccessReportTabletsTime":"2023-06-03 21:51:26","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false}
HeartbeatFailureCounter: 0
              NodeRole: mix
*************************** 2. row ***************************
            BackendId: 11026
              Cluster: default_cluster
                    IP: 192.168.0.128
              HostName: 192.168.0.128
        HeartbeatPort: 9050
                BePort: 9060
              HttpPort: 8040
              BrpcPort: 8060
        LastStartTime: 2023-06-03 21:50:34
        LastHeartbeat: 2023-06-03 21:51:40
                Alive: true
  SystemDecommissioned: false
ClusterDecommissioned: false
            TabletNum: 0
      DataUsedCapacity: 0.000
        AvailCapacity: 177.323 GB
        TotalCapacity: 196.735 GB
              UsedPct: 9.87 %
        MaxDiskUsedPct: 9.87 %
    RemoteUsedCapacity: 0.000
                  Tag: {"location" : "default"}
                ErrMsg:
              Version: doris-2.0.0-alpha-a925ec9
                Status: {"lastSuccessReportTabletsTime":"2023-06-03 21:51:38","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false}
HeartbeatFailureCounter: 0
              NodeRole: computation
*************************** 3. row ***************************
            BackendId: 11045
              Cluster: default_cluster
                    IP: 192.168.0.129
              HostName: 192.168.0.129
        HeartbeatPort: 9050
                BePort: 9060
              HttpPort: 8040
              BrpcPort: 8060
        LastStartTime: 2023-06-03 21:49:52
        LastHeartbeat: 2023-06-03 21:51:40
                Alive: true
  SystemDecommissioned: false
ClusterDecommissioned: false
            TabletNum: 0
      DataUsedCapacity: 0.000
        AvailCapacity: 177.319 GB
        TotalCapacity: 196.735 GB
              UsedPct: 9.87 %
        MaxDiskUsedPct: 9.87 %
    RemoteUsedCapacity: 0.000
                  Tag: {"location" : "default"}
                ErrMsg:
              Version: doris-2.0.0-alpha-a925ec9
                Status: {"lastSuccessReportTabletsTime":"2023-06-03 21:51:02","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false}
HeartbeatFailureCounter: 0
              NodeRole: computation
3 rows in set (0.00 sec)
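
On a larger cluster, reading this output by eye gets tedious. As a rough sketch, the vertical output can be parsed and grouped by role with standard-library Python. The function names parse_vertical_output and hosts_by_role are hypothetical, and the sample text is an abbreviated copy of the output above.

```python
import re
from typing import Dict, List

# Matches separators such as "*************************** 1. row ***...".
ROW_SEPARATOR = re.compile(r"^\*+ \d+\. row \*+$")


def parse_vertical_output(text: str) -> List[Dict[str, str]]:
    """Turn the vertical (backslash-G) client output into one dict per row."""
    rows: List[Dict[str, str]] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if ROW_SEPARATOR.match(line):
            rows.append({})  # a separator starts a new row
        elif rows and ":" in line:
            # Split on the first colon only, so timestamps stay intact.
            key, _, value = line.partition(":")
            rows[-1][key.strip()] = value.strip()
    return rows


def hosts_by_role(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group backend IPs by their NodeRole value."""
    grouped: Dict[str, List[str]] = {}
    for row in rows:
        role = row.get("NodeRole", "unknown")
        grouped.setdefault(role, []).append(row.get("IP", ""))
    return grouped


SAMPLE = """
*************************** 1. row ***************************
            BackendId: 11007
                   IP: 192.168.0.114
        LastStartTime: 2023-06-03 21:51:24
            TabletNum: 21
             NodeRole: mix
*************************** 2. row ***************************
            BackendId: 11026
                   IP: 192.168.0.128
        LastStartTime: 2023-06-03 21:50:34
            TabletNum: 0
             NodeRole: computation
2 rows in set (0.00 sec)
"""

print(hosts_by_role(parse_vertical_output(SAMPLE)))
```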

Test

The following uses MySQL external tables as an example.

Create MySQL Catalog:

CREATE CATALOG mysql properties (
  "type"="jdbc",
  "jdbc.user"="root",
  "jdbc.password"="NewPass4321!",
  "jdbc.jdbc_url"="jdbc:mysql://192.168.0.250:3306/test",
  "jdbc.driver_url"="mysql-connector-java-8.0.25.jar",
  "jdbc.driver_class"="com.mysql.cj.jdbc.Driver"
)

Query external tables from the Catalog:

mysql> set enable_profile = true;
Query OK, 0 rows affected (0.00 sec)

mysql> select date,user_src,new_order,payed_order from mysql.test.order_analysis limit 2;
+--------------------+------------+-----------+------------+
| date               | user_src   | new_order | payed_order|
+--------------------+------------+-----------+------------+
| 2015-10-12 00:00:00| QR code    |     15253 |      13210 |
| 2015-10-14 00:00:00| H5 page    |     17134 |      11270 |
+--------------------+------------+-----------+------------+
2 rows in set (0.03 sec)

Check FE WebUI QueryProfile

From the SQL Profile, you can see that the query on this MySQL Catalog external table is executed by a Compute Node, instead of a mixed node.

SQL Profile-1

Ingest data from the external table into Doris:

mysql> create table test_01 as select * from mysql.test.order_analysis;
ERROR 1105 (HY000): Unexpected exception: errCode = 2, detailMessage = Failed to execute CTAS Reason: errCode = 2, detailMessage = Failed to find 3 backend(s) for policy: cluster=default_cluster | query=false | load=false | schedule=true | tags=[{"location" : "default"}] | medium=HDD
mysql>

As shown above, the create table as select operation fails. In this case, there is only one mixed node, while tables are created with 3 replicas by default. This also demonstrates that Compute Nodes are stateless and do not store any tablet replicas.
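
The failure comes down to simple counting, which the sketch below illustrates. It is a simplified model rather than Doris's actual placement logic: the function names are hypothetical, and it assumes that each replica of a tablet must be placed on a different backend and that only mixed nodes can hold replicas.

```python
from typing import Dict, List


def storage_backends(roles_by_host: Dict[str, str]) -> List[str]:
    """Hosts that can hold tablet replicas: mixed nodes only."""
    return [host for host, role in roles_by_host.items() if role == "mix"]


def can_place_replicas(roles_by_host: Dict[str, str], replication_num: int) -> bool:
    """True if there are enough storage-capable backends for every replica."""
    # Assumption: replicas of the same tablet never share a backend.
    return len(storage_backends(roles_by_host)) >= replication_num


cluster = {
    "192.168.0.114": "mix",
    "192.168.0.128": "computation",
    "192.168.0.129": "computation",
}

print(can_place_replicas(cluster, 3))  # default of 3 replicas: False
print(can_place_replicas(cluster, 1))  # a single replica: True
```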

You can fix the above problem by setting the number of replicas to 1 (replication_num) upon table creation. Again, in this case, we have two Compute Nodes and one mixed node (192.168.0.114).

mysql> create table test_01 PROPERTIES("replication_num" = "1") as select * from mysql.test.order_analysis;
Query OK, 5061 rows affected (0.29 sec)
{'label':'insert_9c013d7ccf064a16_a7ca128d72869a35', 'status':'VISIBLE', 'txnId':'1'}

mysql> select count(*) from test_01;
+----------+
| count(*) |
+----------+
|     5061 |
+----------+
1 row in set (0.07 sec)

mysql>

Then, run a query on the Doris internal table. The WebUI shows that the query is executed by a mixed node:

SQL Profile-2

Offlining the Compute Nodes

Taking a Compute Node offline works the same way as for a mixed node, except that it is faster. Because Compute Nodes do not store any data, no tablet balancing is needed.

alter system DECOMMISSION backend "192.168.0.128:9050";
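
When several Compute Nodes need to go offline, the statements can be generated rather than typed by hand. The sketch below is only an illustration: decommission_statements is a hypothetical helper that builds the SQL text from the IP, HeartbeatPort, and NodeRole fields shown by show backends, and it does not connect to Doris or execute anything.

```python
from typing import Dict, Iterable, List


def decommission_statements(backends: Iterable[Dict[str, str]]) -> List[str]:
    """Build one DECOMMISSION statement per Compute Node.

    Each backend is a dict with the IP, HeartbeatPort and NodeRole fields.
    The statements are only generated here, never executed.
    """
    statements: List[str] = []
    for backend in backends:
        if backend.get("NodeRole") != "computation":
            continue  # skip mixed nodes, which hold tablet replicas
        target = f'{backend["IP"]}:{backend["HeartbeatPort"]}'
        statements.append(f'ALTER SYSTEM DECOMMISSION BACKEND "{target}";')
    return statements


backends = [
    {"IP": "192.168.0.114", "HeartbeatPort": "9050", "NodeRole": "mix"},
    {"IP": "192.168.0.128", "HeartbeatPort": "9050", "NodeRole": "computation"},
    {"IP": "192.168.0.129", "HeartbeatPort": "9050", "NodeRole": "computation"},
]

for statement in decommission_statements(backends):
    print(statement)
```

Note that the address in the statement uses the heartbeat port (9050 here), not the BE or HTTP port.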

See? The Compute Nodes in Apache Doris are quick and easy to use. If you need any help, come join the Apache Doris community on Slack.
