<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Databend</title>
    <description>The latest articles on DEV Community by Databend (@databend).</description>
    <link>https://dev.to/databend</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F868100%2F0d46af0d-ac0a-4dbe-a38c-30bbdb28acb4.png</url>
      <title>DEV Community: Databend</title>
      <link>https://dev.to/databend</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/databend"/>
    <language>en</language>
    <item>
      <title>Feature Preview: Iceberg Integration with Databend</title>
      <dc:creator>Databend</dc:creator>
      <pubDate>Fri, 04 Aug 2023 00:43:16 +0000</pubDate>
      <link>https://dev.to/databend/feature-preview-iceberg-integration-with-databend-184c</link>
      <guid>https://dev.to/databend/feature-preview-iceberg-integration-with-databend-184c</guid>
      <description>&lt;p&gt;A few weeks ago, during the yearly conferences of Databricks and Snowflake, AI was getting a lot of attention, but the progress in data lakes and data warehouses was also significant because data is fundamental. Apache Iceberg emerged as a prominent solution for data lakes, and Databricks unveiled UniForm to better handle Apache Iceberg and Hudi table formats from Delta data. Meanwhile, Snowflake made timely updates to Iceberg Tables, aiming to eliminate data silos.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One of the significant new features that &lt;a href="https://github.com/datafuselabs/databend"&gt;Databend&lt;/a&gt; has been working on in recent months is support for reading data in the Apache Iceberg table format.&lt;/strong&gt; Though it is still a work in progress, it has already taken good shape.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This article is intended to give you a preview of this new capability by demonstrating how to use Databend to mount and query an Iceberg Catalog.&lt;/strong&gt; We will cover the core concepts of Iceberg and table formats while also introducing Databend's solutions, including its ability to handle multiple data catalogs and the implementation of IceLake in Rust from scratch. As part of the demonstration, a comprehensive workshop will be provided, so you can try it out yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Apache Iceberg?
&lt;/h2&gt;

&lt;p&gt;An increasing amount of data is now moving to the cloud and being stored in object storage. However, this setup alone may not fully meet the demands of modern analytics. There are two key issues to address: first, how to organize data in a more structured manner; second, how to provide users with broader consistency guarantees, the necessary schema information, and the advanced features that modern analytics workloads require.&lt;/p&gt;

&lt;p&gt;Data lakes often focus on addressing and resolving the first issue, while table formats are dedicated to providing solutions for the second one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://iceberg.apache.org/"&gt;Apache Iceberg&lt;/a&gt; is a high-performance open table format designed for large-scale analytics workloads. It is known for its simplicity and reliability. It supports various query engines like Spark, Trino, Flink, Presto, Hive, and Impala. One of its killer features includes full schema evolution, time travel, and rollback capabilities. Additionally, Apache Iceberg's data partitioning and well-defined data structures make concurrent access to data sources more secure, reliable, and convenient.&lt;/p&gt;

&lt;p&gt;If you're interested in Iceberg, we recommend reading &lt;a href="https://tabular.io/blog/docker-spark-and-iceberg/"&gt;&lt;em&gt;Docker Spark And Iceberg: The Fastest Way To Try Iceberg!&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table Format
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Table Format&lt;/strong&gt; is a specification for storing data using a collection of files. It consists of definitions for the following three parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to store data in files&lt;/li&gt;
&lt;li&gt;How to store metadata for related files&lt;/li&gt;
&lt;li&gt;How to store metadata for the table itself&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Table format files are usually stored in an underlying storage service such as HDFS, S3, or GCS, and are accessed by upper-level data warehouses such as Databend and Snowflake. Compared to bare CSV or Parquet files, a table format offers a standardized, structured data definition in tabular form, so the data can be used without first loading it into a data warehouse.&lt;/p&gt;

&lt;p&gt;Although there are strong competitors like Delta Lake and Apache Hudi in the field of table formats, this article focuses on Apache Iceberg. Let's take a look at its underlying file organization structure together.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zn5f01Tk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://iceberg.apache.org/img/iceberg-metadata.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zn5f01Tk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://iceberg.apache.org/img/iceberg-metadata.png" alt="stack of Apache Iceberg" width="800" height="827"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The figure above shows "s0" and "s1", which represent snapshots of the table. A snapshot captures the table's state at a specific point in time. Each commit produces a snapshot, and each snapshot is associated with a manifest list. The manifest list holds the addresses of multiple manifest files, along with statistics such as path and partition range. Each manifest file records the addresses and statistics of the data files produced by the corresponding operations, such as per-column maximum/minimum values and the number of rows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multiple Catalog
&lt;/h2&gt;

&lt;p&gt;To integrate Databend with Iceberg, the first step is to add the Multiple Catalog capability to Databend. Multiple Catalog support enables data previously managed by other data analysis systems to be mounted onto Databend.&lt;/p&gt;

&lt;p&gt;From the very beginning, Databend's objective has been to function as a cloud-native OLAP data warehouse, with a focus on addressing the complexities of handling multiple data sources. In Databend, data is structured into three layers: catalog -&amp;gt; database -&amp;gt; table. The catalog represents the highest level and encompasses all databases and tables.&lt;/p&gt;

&lt;p&gt;Based on this foundation, our team designed and implemented support for Hive and Iceberg data catalogs, providing various mounting forms such as configuration files and &lt;code&gt;CREATE CATALOG&lt;/code&gt; statements.&lt;/p&gt;

&lt;p&gt;To mount Iceberg Catalog located in S3, simply execute the following SQL statement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;CATALOG&lt;/span&gt; &lt;span class="n"&gt;iceberg_ctl&lt;/span&gt;
&lt;span class="k"&gt;TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ICEBERG&lt;/span&gt;
&lt;span class="k"&gt;CONNECTION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'s3://warehouse/path/to/db'&lt;/span&gt;
    &lt;span class="n"&gt;AWS_KEY_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'admin'&lt;/span&gt;
    &lt;span class="n"&gt;AWS_SECRET_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'password'&lt;/span&gt;
    &lt;span class="n"&gt;ENDPOINT_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'your-endpoint-url'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
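&lt;p&gt;Once the catalog is mounted, its tables are addressed with the three-level catalog -&amp;gt; database -&amp;gt; table hierarchy described above. Here is a minimal sketch; &lt;code&gt;iceberg_ctl&lt;/code&gt; comes from the statement above, while the database and table names are hypothetical placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Fully qualified reference: catalog.database.table
-- (my_db and my_table are hypothetical names under the mounted catalog)
SELECT count(*) FROM iceberg_ctl.my_db.my_table;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;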



&lt;h2&gt;
  
  
  IceLake - A Pure Rust Implementation of Apache Iceberg
&lt;/h2&gt;

&lt;p&gt;Although the Rust ecosystem has seen the rise of many new projects related to databases and big data analysis in recent years, there is still a notable absence of mature Apache Iceberg bindings in Rust. This has presented significant challenges for Databend when it comes to integrating with Iceberg.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/icelake-io/icelake"&gt;IceLake&lt;/a&gt;, supported and initiated by Databend Labs, aims to overcome the challenges  and establish an open ecosystem where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Users can read/write iceberg tables from &lt;strong&gt;ANY&lt;/strong&gt; storage service, such as s3, gcs, azblob, hdfs, and so on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ANY&lt;/strong&gt; database can integrate with &lt;code&gt;icelake&lt;/code&gt; to facilitate reading and writing of iceberg tables.&lt;/li&gt;
&lt;li&gt;Provides &lt;strong&gt;NATIVE&lt;/strong&gt; support for transmuting between &lt;code&gt;arrow&lt;/code&gt; representations.&lt;/li&gt;
&lt;li&gt;Provides bindings so that other languages can work with iceberg tables powered by Rust core.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Currently, IceLake only supports reading data (in Parquet format) from Apache Iceberg storage services. Databend's Iceberg catalog is designed and implemented on top of IceLake, and the design has been validated through this integration.&lt;/p&gt;

&lt;p&gt;In addition, we have collaborated with the Iceberg community to initiate and participate in the &lt;a href="https://github.com/apache/iceberg-rust"&gt;iceberg-rust&lt;/a&gt; project. The project aims to contribute IceLake's Iceberg-related implementations upstream, and the first version is currently under active development.&lt;/p&gt;

&lt;h2&gt;
  
  
  Workshop: Experience Iceberg with Databend
&lt;/h2&gt;

&lt;p&gt;In this workshop, we will demonstrate how to prepare data in Iceberg table format and mount it onto Databend as a Catalog, and perform some basic queries. Relevant files and configurations can be found at &lt;a href="https://github.com/psiace/databend-workshop"&gt;PsiACE/databend-workshop&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you already have data that conforms to the Iceberg table format stored in a storage service supported by OpenDAL, we recommend using Databend Cloud so that you can skip the tedious process of service deployment and data preparation, and easily get started with the Iceberg Catalog.&lt;/p&gt;

&lt;h2&gt;
  
  
  Starting Services
&lt;/h2&gt;

&lt;p&gt;To simplify the service deployment and data preparation issues of Iceberg, we will be using Docker and Docker Compose. You need to install these components first, and then write the &lt;code&gt;docker-compose.yml&lt;/code&gt; file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3"&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;spark-iceberg&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tabulario/spark-iceberg&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;spark-iceberg&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;spark/&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;iceberg_net&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;rest&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;minio&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./warehouse:/home/iceberg/warehouse&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./notebooks:/home/iceberg/notebooks/notebooks&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;AWS_ACCESS_KEY_ID=admin&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;AWS_SECRET_ACCESS_KEY=password&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;AWS_REGION=us-east-1&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;8888:8888&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;8080:8080&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;10000:10000&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;10001:10001&lt;/span&gt;
  &lt;span class="na"&gt;rest&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tabulario/iceberg-rest&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;iceberg-rest&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;iceberg_net&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;8181:8181&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;AWS_ACCESS_KEY_ID=admin&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;AWS_SECRET_ACCESS_KEY=password&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;AWS_REGION=us-east-1&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;CATALOG_WAREHOUSE=s3://warehouse/&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;CATALOG_S3_ENDPOINT=http://minio:9000&lt;/span&gt;
  &lt;span class="na"&gt;minio&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;minio/minio&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;minio&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;MINIO_ROOT_USER=admin&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;MINIO_ROOT_PASSWORD=password&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;MINIO_DOMAIN=minio&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;iceberg_net&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;aliases&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;warehouse.minio&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;9001:9001&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;9000:9000&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;server"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/data"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--console-address"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:9001"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;mc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;minio&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;minio/mc&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mc&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;iceberg_net&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;AWS_ACCESS_KEY_ID=admin&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;AWS_SECRET_ACCESS_KEY=password&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;AWS_REGION=us-east-1&lt;/span&gt;
    &lt;span class="na"&gt;entrypoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;/bin/sh -c "&lt;/span&gt;
      &lt;span class="s"&gt;until (/usr/bin/mc config host add minio http://minio:9000 admin password) do echo '...waiting...' &amp;amp;&amp;amp; sleep 1; done;&lt;/span&gt;
      &lt;span class="s"&gt;/usr/bin/mc rm -r --force minio/warehouse;&lt;/span&gt;
      &lt;span class="s"&gt;/usr/bin/mc mb minio/warehouse;&lt;/span&gt;
      &lt;span class="s"&gt;/usr/bin/mc policy set public minio/warehouse;&lt;/span&gt;
      &lt;span class="s"&gt;tail -f /dev/null&lt;/span&gt;
      &lt;span class="s"&gt;"      &lt;/span&gt;
&lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;iceberg_net&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above configuration file, MinIO serves as the underlying storage, the Iceberg REST catalog service provides the table format capabilities, and &lt;code&gt;spark-iceberg&lt;/code&gt; helps us prepare some preset data and perform the conversion.&lt;/p&gt;

&lt;p&gt;Next, we start all services in the directory corresponding to the &lt;code&gt;docker-compose.yml&lt;/code&gt; file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Data Preparation
&lt;/h2&gt;

&lt;p&gt;In this workshop, we plan to use the NYC Taxis dataset (data on taxi rides in New York City), which is already built into &lt;code&gt;spark-iceberg&lt;/code&gt; in Parquet format. We just need to convert it to Iceberg format.&lt;/p&gt;

&lt;p&gt;First, enable &lt;code&gt;pyspark-notebook&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; spark-iceberg pyspark-notebook
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we can open Jupyter Notebook at &lt;a href="http://localhost:8888"&gt;http://localhost:8888&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eMnrFG3s--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rsv6kz7od79nugsqgwas.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eMnrFG3s--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rsv6kz7od79nugsqgwas.png" alt="jupyter notebook with spark-iceberg" width="800" height="303"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here we need to run the following code to implement the data conversion operation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;spark&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parquet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/home/iceberg/data/yellow_tripdata_2021-04.parquet"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;saveAsTable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"nyc.taxis"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"iceberg"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first line reads the Parquet data, and the second line writes it out as an Iceberg table.&lt;/p&gt;
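&lt;p&gt;As an optional sanity check, you can run a simple SQL query from the same notebook (for example via spark.sql()) to confirm that the Iceberg table now exists and holds data; the table name below is the one created in the previous step:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Hypothetical sanity check inside the Spark notebook:
-- nyc.taxis is the Iceberg table created by saveAsTable above.
SELECT count(*) FROM nyc.taxis;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;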

&lt;p&gt;To verify that the data has been successfully converted, we can open the MinIO instance at &lt;a href="http://localhost:9001"&gt;http://localhost:9001&lt;/a&gt; and see that the data is organized according to the Iceberg file layout described earlier.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--LoFkLU-8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rvk0nqv9vvtlqu5suazi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--LoFkLU-8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rvk0nqv9vvtlqu5suazi.png" alt="minio dashboard" width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploying Databend
&lt;/h2&gt;

&lt;p&gt;Here we will manually deploy a single-node Databend service. For the overall deployment process, refer to &lt;a href="https://databend.rs/doc/deploy/deploying-databend"&gt;Docs | Deploying a Standalone Databend&lt;/a&gt;; a few details need attention:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;First, prepare the relevant directories for logs and metadata.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo mkdir&lt;/span&gt; /var/log/databend
&lt;span class="nb"&gt;sudo mkdir&lt;/span&gt; /var/lib/databend
&lt;span class="nb"&gt;sudo chown&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; &lt;span class="nv"&gt;$USER&lt;/span&gt; /var/log/databend
&lt;span class="nb"&gt;sudo chown&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; &lt;span class="nv"&gt;$USER&lt;/span&gt; /var/lib/databend
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Second, because the default &lt;code&gt;admin_api_address&lt;/code&gt; port is already occupied by &lt;code&gt;spark-iceberg&lt;/code&gt;, edit &lt;code&gt;databend-query.toml&lt;/code&gt; to avoid the conflict:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;admin_api_address&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.0.0.0:8088"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In addition, according to &lt;a href="https://databend.rs/doc/sql-clients/admin-users"&gt;Docs | Configuring Admin Users&lt;/a&gt;, we also need to configure an administrator user. Since this is just a workshop, we take the simplest route and uncomment the &lt;code&gt;[[query.users]]&lt;/code&gt; section and the root user:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[[query.users]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"root"&lt;/span&gt;
&lt;span class="py"&gt;auth_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"no_password"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Because our local MinIO deployment is not configured with TLS, we need to load data over the insecure HTTP protocol. Therefore, &lt;code&gt;databend-query.toml&lt;/code&gt; must be changed to allow this behavior. Please avoid enabling it in production services:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="err"&gt;...&lt;/span&gt;

&lt;span class="nn"&gt;[storage]&lt;/span&gt;

&lt;span class="err"&gt;...&lt;/span&gt;

&lt;span class="py"&gt;allow_insecure&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="err"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The next step is to start up Databend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```bash
./scripts/start.sh
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6zq0Ezwe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6e1rscdb5wu6ny0ubebp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6zq0Ezwe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6e1rscdb5wu6ny0ubebp.png" alt="start databend in terminal" width="800" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We strongly recommend using BendSQL as the client tool for accessing Databend. Of course, various other access methods, such as the MySQL client and the HTTP API, are also supported.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mounting Iceberg Catalog
&lt;/h2&gt;

&lt;p&gt;According to the previous configuration file, you only need to execute the following SQL statement to mount the Iceberg Catalog.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;CATALOG&lt;/span&gt; &lt;span class="n"&gt;iceberg_ctl&lt;/span&gt;
&lt;span class="k"&gt;TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ICEBERG&lt;/span&gt;
&lt;span class="k"&gt;CONNECTION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'s3://warehouse/'&lt;/span&gt;
    &lt;span class="n"&gt;AWS_KEY_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'admin'&lt;/span&gt;
    &lt;span class="n"&gt;AWS_SECRET_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'password'&lt;/span&gt;
    &lt;span class="n"&gt;ENDPOINT_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'http://localhost:9000'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To verify the mounting, we can execute &lt;code&gt;SHOW CATALOGS&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---Ozo6HKB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/v8d0i7lwnloe7jinymse.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---Ozo6HKB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/v8d0i7lwnloe7jinymse.png" alt="result of show catalogs" width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of course, Databend also supports the &lt;code&gt;SHOW DATABASES&lt;/code&gt; and &lt;code&gt;SHOW TABLES&lt;/code&gt; statements. The &lt;code&gt;nyc.taxis&lt;/code&gt; table created during the data conversion corresponds to a two-level directory in MinIO and is mapped to a database and a table in Databend.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DspQnzoS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bqinf1ohoqjyhjb4a4fi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DspQnzoS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bqinf1ohoqjyhjb4a4fi.png" alt="result of show databases" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xPSBPtYo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/e5ajxev741tz2xq4dowk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xPSBPtYo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/e5ajxev741tz2xq4dowk.png" alt="result of show tables" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Running Queries
&lt;/h2&gt;

&lt;p&gt;Now that the data has been mounted, let's try some simple queries:&lt;/p&gt;

&lt;p&gt;Firstly, let's count the number of rows in the data. We can see that a total of 2 million rows have been mounted to Databend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;iceberg_ctl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nyc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;taxis&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--aNxTZqh5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6tdq4yerz807eemk6hp4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--aNxTZqh5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6tdq4yerz807eemk6hp4.png" alt="count data size" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's try to retrieve some data from a few columns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;tpep_pickup_datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tpep_dropoff_datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;passenger_count&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;iceberg_ctl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nyc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;taxis&lt;/span&gt; &lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6-PyhbFR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/flh7m63wugzjlyf4azpz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6-PyhbFR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/flh7m63wugzjlyf4azpz.png" alt="get some data from table" width="800" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The following query can help us explore the correlation between passenger count and travel distance. Here we only take 10 results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;passenger_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;to_year&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tpep_pickup_datetime&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trip_distance&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
  &lt;span class="n"&gt;iceberg_ctl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nyc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;taxis&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
  &lt;span class="n"&gt;passenger_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nb"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt;
  &lt;span class="nb"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt;
  &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rafqcPVO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/q02psuf5ezajbd65b00n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rafqcPVO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/q02psuf5ezajbd65b00n.png" alt="explore the correlation" width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this article, we introduced the Apache Iceberg table format and Databend's solution for it, and provided a workshop so everyone can gain some hands-on experience.&lt;/p&gt;

&lt;p&gt;Currently, Databend only provides catalog mounting for the Iceberg integration, but it can already handle basic query processing tasks. We welcome everyone to try it out on data they are interested in and send us feedback.&lt;/p&gt;

</description>
      <category>database</category>
      <category>opensource</category>
      <category>bigdata</category>
    </item>
    <item>
      <title>Designing and Querying JSON in Databend</title>
      <dc:creator>Databend</dc:creator>
      <pubDate>Thu, 29 Sep 2022 07:37:20 +0000</pubDate>
      <link>https://dev.to/databend/designing-and-querying-json-in-databend-1j07</link>
      <guid>https://dev.to/databend/designing-and-querying-json-in-databend-1j07</guid>
      <description>&lt;p&gt;JSON (JavaScript Object Notation) is a commonly used semi-structured data type. With the self-describing schema structure, JSON can hold all data types, including multi-level nested data types, such as Array, Object, etc. JSON takes advantage of high flexibility and easy dynamic expansion compared with the structured data types that must strictly follow the fields in a tabular data structure.   &lt;/p&gt;

&lt;p&gt;As data volumes have grown rapidly in recent years, many platforms have started to use and get the most out of semi-structured data types such as JSON: for example, JSON data shared by various platforms through open interfaces, and public datasets and application logs stored in JSON format.&lt;/p&gt;

&lt;p&gt;Databend supports structured data types, as well as JSON. This post dives deeply into the JSON data type in Databend.   &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--G9PjKsWz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/882z405kvnph8d5ty325.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--G9PjKsWz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/882z405kvnph8d5ty325.png" alt="Image description" width="880" height="278"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Working with JSON in Databend​
&lt;/h1&gt;

&lt;p&gt;Databend stores semi-structured data as the VARIANT (also called JSON) data type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE test 
  ( 
     id INT32, 
     v1 VARIANT, 
     v2 JSON 
  );
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The JSON data needs to be generated by calling the “parse_json” or “try_parse_json” function. The input string must be in the standard JSON format, including Null, Boolean, Number, String, Array, and Object. If parsing fails because the string is invalid, the “parse_json” function returns an error, while the “try_parse_json” function returns a NULL value.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO test VALUES
  (1, parse_json('{"a":{"b":1,"c":[1,2]}}'), parse_json('[["a","b"],{"k":"a"}]')),
  (2, parse_json('{"a":{"b":2,"c":[3,4]}}'), parse_json('[["c","d"],{"k":"b"}]'));
SELECT * FROM test;
+----+-------------------------+-----------------------+
| id | v1                      | v2                    |
+----+-------------------------+-----------------------+
| 1  | {"a":{"b":1,"c":[1,2]}} | [["a","b"],{"k":"a"}] |
| 2  | {"a":{"b":2,"c":[3,4]}} | [["c","d"],{"k":"b"}] |
+----+-------------------------+-----------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;JSON usually holds data of Array or Object type. Due to the nested hierarchical structure, the internal elements can be accessed through JSON PATH. The syntax supports the following delimiters:&lt;/p&gt;

&lt;p&gt;“:”: Colon can be used to obtain the elements in an object by the key.&lt;br&gt;
“.”: Dot can be used to obtain the elements in an object by the key. Do NOT use a dot as the first delimiter in a statement, or Databend will treat the dot as the delimiter separating the table name from the column name.&lt;br&gt;
“[]”: Brackets can be used to obtain the elements in an object by the key or the elements in an array by the index.&lt;br&gt;
You can mix the three types of delimiters above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT v1:a.c, v1:a['b'], v1['a']:c, v2[0][1], v2[1].k FROM test;
+--------+-----------+-----------+----------+---------+
| v1:a.c | v1:a['b'] | v1['a']:c | v2[0][1] | v2[1].k |
+--------+-----------+-----------+----------+---------+
| [1,2]  | 1         | [1,2]     | "b"      | "a"     |
| [3,4]  | 2         | [3,4]     | "d"      | "b"     |
+--------+-----------+-----------+----------+---------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The internal elements extracted through JSON PATH are also of JSON type, and they can be converted to basic types through the cast function or using the conversion operator “::”.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT cast(v1:a.c[0], int64), v1:a.b::int32, v2[0][1]::string FROM test;
+--------------------------+---------------+------------------+
| cast(v1:a.c[0] as int64) | v1:a.b::int32 | v2[0][1]::string |
+--------------------------+---------------+------------------+
| 1                        | 1             | b                |
| 3                        | 2             | d                |
+--------------------------+---------------+------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Parsing JSON from GitHub​
&lt;/h1&gt;

&lt;p&gt;Many public datasets are stored in JSON format. We can import these data into Databend for parsing. The following introduction uses the GitHub events dataset as an example.&lt;/p&gt;

&lt;p&gt;The GitHub events dataset (downloaded from GH Archive) uses the following JSON format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "id":"23929425917",
  "type":"PushEvent",
  "actor":{
    "id":109853386,
    "login":"teeckyar-bot",
    "display_login":"teeckyar-bot",
    "gravatar_id":"",
    "url":"https://api.github.com/users/teeckyar-bot",
    "avatar_url":"https://avatars.githubusercontent.com/u/109853386?"
  },
  "repo":{
    "id":531248561,
    "name":"teeckyar/Times",
    "url":"https://api.github.com/repos/teeckyar/Times"
  },
  "payload":{
    "push_id":10982315959,
    "size":1,
    "distinct_size":1,
    "ref":"refs/heads/main",
    "head":"670e7ca4085e5faa75c8856ece0f362e56f55f09",
    "before":"0a2871cb7e61ce47a6790adaf09facb6e1ef56ba",
    "commits":[
      {
        "sha":"670e7ca4085e5faa75c8856ece0f362e56f55f09",
        "author":{
          "email":"support@teeckyar.ir",
          "name":"teeckyar-bot"
        },
        "message":"1662804002 Timehash!",
        "distinct":true,
        "url":"https://api.github.com/repos/teeckyar/Times/commits/670e7ca4085e5faa75c8856ece0f362e56f55f09"
      }
    ]
  },
  "public":true,
  "created_at":"2022-09-10T10:00:00Z",
  "org":{
    "id":106163581,
    "login":"teeckyar",
    "gravatar_id":"",
    "url":"https://api.github.com/orgs/teeckyar",
    "avatar_url":"https://avatars.githubusercontent.com/u/106163581?"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From the data above, we can see that the “actor”, “repo”, “payload”, and “org” fields have a nested structure and can be stored as JSON, while the others can be stored as basic data types. So we can create a table like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE `github_data` 
             ( 
                          `id`   VARCHAR, 
                          `type` VARCHAR, 
                          `actor` JSON, 
                          `repo` JSON, 
                          `payload` JSON, 
                          `public` BOOLEAN, 
                          `created_at` timestamp(0), 
                          `org` json 
             );
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use the COPY INTO command to load the data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;COPY INTO github_data
FROM 'https://data.gharchive.org/2022-09-10-10.json.gz'
FILE_FORMAT = (
  compression = auto
  type = NDJSON
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
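&lt;p&gt;As a quick, optional check after the load completes, a simple count confirms that rows arrived (the exact number depends on the hour of data downloaded):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Optional check that the COPY INTO above loaded data.
SELECT count(*) FROM github_data;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;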



&lt;p&gt;The following code returns the top 10 projects with the most commits:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT   repo:name, 
         count(id) 
FROM     github_data 
WHERE    type = 'PushEvent' 
GROUP BY repo:name 
ORDER BY count(id) DESC 
LIMIT    10;
+----------------------------------------------------------+-----------+
| repo:name                                                | count(id) |
+----------------------------------------------------------+-----------+
| "Lombiq/Orchard"                                         | 1384      |
| "maique/microdotblog"                                    | 970       |
| "Vladikasik/statistic"                                   | 738       |
| "brokjad/got_config"                                     | 592       |
| "yanonono/booth-update"                                  | 537       |
| "networkoperator/demo-cluster-manifests"                 | 433       |
| "kn469/web-clipper-bed"                                  | 312       |
| "ufapg/jojo"                                             | 306       |
| "bj5nj7oh/bj5nj7oh"                                      | 291       |
| "appseed-projects2/500f32d3-8019-43ee-8f2a-a273163233fb" | 247       |
+----------------------------------------------------------+-----------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The following code returns the top 10 users with the most forks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT   actor:login, 
         count(id) 
FROM     github_data 
WHERE    type='ForkEvent' 
GROUP BY actor:login 
ORDER BY count(id) DESC 
LIMIT    10;
+-----------------------------------+-----------+
| actor:login                       | count(id) |
+-----------------------------------+-----------+
| "actions-marketplace-validations" | 191       |
| "alveraboquet"                    | 59        |
| "ajunlonglive"                    | 50        |
| "Shutch420"                       | 13        |
| "JusticeNX"                       | 13        |
| "RyK-eR"                          | 12        |
| "DroneMad"                        | 10        |
| "UnqulifiedEngineer"              | 9         |
| "PeterZs"                         | 8         |
| "lgq2015"                         | 8         |
+-----------------------------------+-----------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Performance Optimization​
&lt;/h1&gt;

&lt;p&gt;JSON data is generally stored in plain text and must be parsed into a serde_json::Value enumeration every time it is read. Compared to other basic data types, handling JSON data takes more parsing time and more memory.&lt;/p&gt;

&lt;p&gt;Databend has improved the read performance of JSON data using the following methods:   &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To speed up parsing and reduce memory usage, Databend stores JSON data in the binary JSONB format and uses the built-in j_entry structure to hold the data type and offset of each element.&lt;/li&gt;
&lt;li&gt;Virtual columns speed up queries. Databend extracts the frequently queried fields of the same data type and stores them as separate virtual columns. Data is read directly from the virtual columns when querying, which lets Databend achieve the same performance as querying other basic data types (see the example below).&lt;/li&gt;
&lt;/ul&gt;
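&lt;p&gt;As a rough illustration of the second point, a repeated JSON-path query like the GitHub example above is exactly the access pattern that virtual columns accelerate; the SQL stays the same, but the engine can read the extracted field as if it were an ordinary column instead of re-parsing the JSON on every row:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Same shape of query as the GitHub example above; with a virtual column
-- for repo:name, the engine avoids parsing the repo JSON for every row.
SELECT repo:name, count(id)
FROM github_data
WHERE type = 'PushEvent'
GROUP BY repo:name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;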

</description>
    </item>
    <item>
      <title>Learn Databend's New SQL Type System in Five Minutes</title>
      <dc:creator>Databend</dc:creator>
      <pubDate>Fri, 23 Sep 2022 08:27:48 +0000</pubDate>
      <link>https://dev.to/databend/learn-databends-new-sql-type-system-in-five-minutes-194m</link>
      <guid>https://dev.to/databend/learn-databends-new-sql-type-system-in-five-minutes-194m</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;The type system is an important part of a database: it provides a consistent way to determine data types in SQL. The design of the type system greatly affects the usability and robustness of the database, as a well-designed and consistent type system makes it easy for users to reason about SQL behavior. A poorly designed type system, on the contrary, may produce errors and inconsistent behavior, bringing a series of potential inconveniences to users. Take programming languages as an example: the type system of JavaScript has long been criticized.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9phlxESG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t06hydmgpvnayl985s6j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9phlxESG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t06hydmgpvnayl985s6j.png" alt="Image description" width="602" height="915"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Therefore, we wanted to implement a type inference system in Databend that is powerful yet easy to understand. To achieve this goal, we learned from the compiler designs of many excellent programming languages and selected a subset that is suitable for SQL. The design principles of this system are described below.&lt;/p&gt;

&lt;h1&gt;
  
  
  Interface Design
&lt;/h1&gt;

&lt;p&gt;The "low coupling and high cohesion" principle, which we often refer to, is to combine the code that does the same thing, and then define a simple interface for external use. Since the type inference system has relatively complex functions, its interfaces need to be defined from the very beginning, that is, what functions can be provided and how to call them.&lt;/p&gt;

&lt;p&gt;In short, there are three functions in the type inference system we designed:&lt;/p&gt;

&lt;p&gt;1. Check whether the input SQL expression (RawExpr) conforms to the type rules, select appropriate overloads for the functions called, and return an executable expression (Expr)&lt;/p&gt;

&lt;p&gt;2. Return the expression value corresponding to the input data&lt;/p&gt;

&lt;p&gt;3. Return the expression value range corresponding to the input data range (stored in metadata)&lt;/p&gt;

&lt;p&gt;Callers only need to do the following:&lt;/p&gt;

&lt;p&gt;1. Define type signatures, the mappings from definition domains to function domains, and the bodies of all available functions&lt;/p&gt;

&lt;p&gt;2. Call the executor when executing SQL or constant folding&lt;/p&gt;

&lt;p&gt;Take the “and” function as an example. Its definition is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;registry.register_2_arg::&amp;lt;BooleanType, BooleanType, BooleanType, _, _&amp;gt;(
    "and",
    FunctionProperty::default(),
    |lhs, rhs| {
        Some(BooleanDomain {
            has_false: lhs.has_false || rhs.has_false,
            has_true: lhs.has_true &amp;amp;&amp;amp; rhs.has_true,
        })
    },
    |lhs, rhs| lhs &amp;amp;&amp;amp; rhs,
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A complete example of execution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Parse the SQL expressions into structured AST
let raw_expr = parse_raw_expr("and(true, false)");

// Get built-in functions, like the 'and' function mentioned before
let fn_registry = builtin_functions();

// Check type validity
let expr = type_check::check(&amp;amp;raw_expr, &amp;amp;fn_registry).unwrap();

// Execute
let evaluator = Evaluator {
    input_columns: Chunk::new(vec![]),
    context: FunctionContext::default(),
};
let result: Value&amp;lt;AnyType&amp;gt; = evaluator.run(&amp;amp;expr).unwrap();

assert_eq!(result, Value::Scalar(Scalar::Boolean(false)));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Principles of Type inference
&lt;/h1&gt;

&lt;p&gt;The new type system supports the following data types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Null&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Boolean&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;String&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;UInt8&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;UInt16&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;UInt32&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;UInt64&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Int8&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Int16&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Int32&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Int64&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Float32&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Float64&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Date&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Interval&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Timestamp&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Array&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Nullable&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Variant&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's learn how the type inference system works through an example. Suppose this expression is the input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 + 'foo'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The type inferencer first converts the expression input to a function call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plus(1, 'foo')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then the type checker can simply infer the type of the constant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 :: Int8
'foo' :: String
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The type checker knows that there are six overloads for the function plus after querying FunctionRegistry.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plus(Null, Null) :: Null
plus(Int8, Int8) :: Int8
plus(Int16, Int16) :: Int16
plus(Int32, Int32) :: Int32
plus(Float32, Float32) :: Float32
plus(Timestamp, Timestamp) :: Timestamp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since the argument types Int8 and String don't match any of the overloads, an error is raised by the type checker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 + 'foo'
  ^ function `plus` has no overload for parameters `(Int8, String)`

  available overloads:
    plus(Int8, Int8) :: Int8
    plus(Int16, Int16) :: Int16
    plus(Int32, Int32) :: Int32
    plus(Float32, Float32) :: Float32
    plus(Timestamp, Timestamp) :: Timestamp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One exception in type checking is that a subtype can be converted to its supertype (CAST), so that functions can take parameters of the subtypes. Here's an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plus(1, 2.0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The type inferencer infers the constants' types according to rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; 1 :: Int8
 2.0 :: Float32
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By querying the FunctionRegistry, we find that there are two overloads for the function plus that seem to fit, but neither one matches perfectly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(Int8, Int8) :: Int8
plus(Float32, Float32) :: Float32
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The type checker will try to select an overload according to the CAST rule. Since values of type Int8 can be losslessly converted to type Float32, the type checker rewrites the expression and rechecks the types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plus(CAST(1 AS Float32), 2.0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The type check passes in this way.&lt;/p&gt;

&lt;h1&gt;
  
  
  Genericity
&lt;/h1&gt;

&lt;p&gt;The new type checker supports generics in function signature definitions to reduce the workload of manually defining overloaded functions. For example, we can define a function with the signature “array_get(Array&amp;lt;T0&amp;gt;, UInt64) :: T0”, which accepts an array and a subscript, and returns the element at that subscript in the array.&lt;/p&gt;
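
&lt;p&gt;When such a function is called, the type checker substitutes a concrete type for T0 on a per-call basis, so a single generic signature covers what would otherwise be many overloads. The instantiations below are only illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;array_get([1, 2, 3], 0)      -- uses array_get(Array&amp;lt;Int8&amp;gt;, UInt64) :: Int8
array_get(['a', 'b'], 1)     -- uses array_get(Array&amp;lt;String&amp;gt;, UInt64) :: String
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;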

&lt;p&gt;Compared with the type check process mentioned in the previous section, checking functions with generic signatures requires one additional step: select an appropriate specific type to replace the generic type. The replaced types should conform to the type check rules, and explanations (such as conflicting constraints) are required when there is no available type. This step is generally called Unification, and we also have an example to illustrate it:&lt;br&gt;
Suppose there are two expressions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ty1 :: (X, Boolean) 
ty2 :: (Int32, Y)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we want “ty1” and “ty2” to have the same type (for example, when “ty1” is the type of the actual arguments and “ty2” is the parameter type in the function signature), “unify” will try to replace “X” and “Y” with specific types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let subst: Substitution = unify(ty1, ty2).unwrap();

assert_eq!(subst["X"], DataType::Int32);
assert_eq!(subst["Y"], DataType::Boolean);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For readers interested in “unify”, the source code of “type_check.rs” is highly recommended. We also recommend the book Types and Programming Languages, which traces the development of type inference for programming languages and discusses in detail the principles and trade-offs of various inference theories. A supporting toy implementation is provided for every major concept. You'll get great pleasure from this book, especially on sleepless nights.&lt;/p&gt;

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;p&gt;In this article, we introduced the design background and working principles of our new type system, and explained how to use the executor. We look forward to covering the details of defining SQL functions and some related Rust tricks in another article, as that topic is just as exciting as type inference.&lt;/p&gt;

&lt;h1&gt;
  
  
  About Databend
&lt;/h1&gt;

&lt;p&gt;Databend is an open source modern data warehouse with elasticity and low cost. It can do real-time data analysis on object-based storage. We look forward to your attention and hope to explore the cloud native data warehouse solution, and create a new generation of open source data cloud together.&lt;/p&gt;

&lt;p&gt;Databend documentation：&lt;a href="https://databend.rs/"&gt;https://databend.rs/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Twitter：&lt;a href="https://twitter.com/Datafuse_Labs"&gt;https://twitter.com/Datafuse_Labs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Slack：&lt;a href="https://datafusecloud.slack.com/"&gt;https://datafusecloud.slack.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Wechat：Databend&lt;/p&gt;

&lt;p&gt;GitHub ：&lt;a href="https://github.com/datafuselabs/databend"&gt;https://github.com/datafuselabs/databend&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Sqllogictest Illustrated</title>
      <dc:creator>Databend</dc:creator>
      <pubDate>Fri, 16 Sep 2022 09:13:01 +0000</pubDate>
      <link>https://dev.to/databend/sqllogictest-illustrated-4c4g</link>
      <guid>https://dev.to/databend/sqllogictest-illustrated-4c4g</guid>
      <description>&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;You might want to know that the Databend team is rolling out a sqllogictest to replace the stateless tests. In this post, we share our experience of designing, developing, and using the sqllogictest.&lt;/p&gt;

&lt;h2&gt;
  
  
  About sqllogictest
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Database quality assurance
&lt;/h3&gt;

&lt;p&gt;Test dimensions and test coverage are the keys to ensuring database quality. Test dimensions include unit tests, fuzz tests, functional tests (sqllogictest belongs to this category), end-to-end (e2e) tests, performance tests, etc.&lt;br&gt;&lt;br&gt;
The basic idea of functional tests for a database is to compare the execution results with expectations. Generally, the following issues need to be considered in advance:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; How to prescribe the test-script format?&lt;/li&gt;
&lt;li&gt; How to compare the results? In most solutions, the returns are saved as files, making detailed execution results invisible. We had to add additional output between cases to determine approximately where the cases were failing.&lt;/li&gt;
&lt;li&gt; How to bridge the differences between clients and databases? In most cases, different clients have different return formats; similarly, different databases have different outputs for certain types.&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;
  
  
  What is sqllogictest?
&lt;/h4&gt;

&lt;p&gt;Developed by SQLite's author D. Richard Hipp, sqllogictest was originally designed to test SQLite. See more about its design concepts at &lt;a href="https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki"&gt;https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki&lt;/a&gt;. The sqllogictest program seeks to answer just one question: does the database engine compute the correct answer? No attention is paid to performance, optimal use of indexes, disk and memory, transactional behavior, or concurrency and locking issues.&lt;br&gt;&lt;br&gt;
At present, mainstream databases have all implemented their own sqllogictest test tools and test cases, yet the syntaxes of test cases are slightly different and not compatible with each other. The implementation methods of test tools are also different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  [&lt;a href="https://github.com/ydb-platform/ydb/tree/main/ydb/tests/functional/suite_tests"&gt;YDB&lt;/a&gt;] implemented by Python&lt;/li&gt;
&lt;li&gt;  [&lt;a href="https://github.com/cockroachdb/cockroach/tree/master/pkg/sql/logictest"&gt;CockroachDB&lt;/a&gt;] implemented by Go&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Why does Databend need sqllogictest?
&lt;/h3&gt;

&lt;p&gt;There is already a set of functional test tools implemented in Databend, which divides the functional test cases into stateless tests and stateful tests, using Clickhouse as a reference. The use cases are written as scripts (or SQL files), and the expected results are saved as files with the same name but different suffixes. We can run Databend-test (written in Python) to execute tests, and then use diff to compare the results.&lt;br&gt;&lt;br&gt;
This method, however, is not friendly for writing and modifying failing cases. Moreover, Databend supports multiple sets of handlers (such as MySQL, HTTP, and Clickhouse), yet this method cannot run the tests against every handler (which is a bit like testing different databases). Therefore, we looked for methods and tools that could solve these problems.&lt;/p&gt;
&lt;h3&gt;
  
  
  How to implement sqllogictest in Databend?
&lt;/h3&gt;

&lt;p&gt;The implementation versions of sqllogictest differ greatly, which is not only reflected in the supported use case syntaxes, but also in the technical stacks used and the functions implemented. As a result, it is difficult to use the existing test cases or the tools directly.&lt;br&gt;&lt;br&gt;
After analyzing and comparing different implementation schemes, we found that the core functional requirements of sqllogictest are in fact not that many. The existing versions in the community differ greatly from each other and can't be used directly. Moreover, as testing practice advances, many new demands will come up, so the workload of customizing an existing implementation would certainly be quite arduous. Considering all this, we chose to start from scratch with Python.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://preview.redd.it/q1wgrwz296o91.png?width=596&amp;amp;format=png&amp;amp;auto=webp&amp;amp;s=8702811907715221d61891bb0a875f6bbb622ff5"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fs9-sCsT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/c9de619ceff94522b6853ad3cf89e282%257Etplv-k3u1fbpfcp-zoom-1.image" alt="r/DatafuseLabs - Sqllogictest Illustrated" width="596" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In sqllogictest, there are multiple Runners interacting with different databases or handlers. Each runner should implement all the methods in the base class SuiteRunner, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  execute_ok&lt;/li&gt;
&lt;li&gt;  execute_error&lt;/li&gt;
&lt;li&gt;  execute_query&lt;/li&gt;
&lt;li&gt;  batch_execute&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These methods are the core of sqllogictest execution. Besides, the SuiteRunner class also stores some of the state and control variables during execution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://preview.redd.it/ule03kw496o91.png?width=408&amp;amp;format=png&amp;amp;auto=webp&amp;amp;s=6031bfb4c1fb51231ed30a08c515434f8834a77c"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zZRD_eO2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/6eaca01a349d4c6f813ddc1e19b8d6c0%257Etplv-k3u1fbpfcp-zoom-1.image" alt="r/DatafuseLabs - Sqllogictest Illustrated" width="408" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Take the HTTP Runner type as an example. In addition to the necessary interfaces like “execute_ok”, “execute_error”, “execute_query”, and “batch_execute”, two functions are also implemented for resetting connections and sessions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://preview.redd.it/sgcsjtgf96o91.png?width=440&amp;amp;format=png&amp;amp;auto=webp&amp;amp;s=68184b03a108dc8f50e93b87e079e8fa79e08a54"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bs5KYNfY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/4845ba71487f4cf0ac4e5da288e4afe0%257Etplv-k3u1fbpfcp-zoom-1.image" alt="r/DatafuseLabs - Sqllogictest Illustrated" width="440" height="213"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The use case file is parsed by the Statement class. At present, a simple method is used for syntax parsing, that is, reading the file line by line and using regular expressions for matching. Compared with implementing an interpreter, the advantage of this approach is that it can be implemented quickly, yet the disadvantage is that it would be cumbersome to extend the syntax support.&lt;/p&gt;

&lt;p&gt;Error information is output through logicError, including the name of the runner where the error occurred, the error message (including the details of the error statement) and the error type.&lt;/p&gt;

&lt;p&gt;A LogicTestStatistics class is also implemented to record the time cost of each SQL execution. The final statistical output is relatively simple now but can be supplemented in the future.&lt;/p&gt;
&lt;h3&gt;
  
  
  How to write test scripts for sqllogictest?
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Basic functions
&lt;/h4&gt;

&lt;p&gt;Here is a quick guide: &lt;a href="https://github.com/datafuselabs/databend/blob/main/tests/logictest/suites/select_0"&gt;https://github.com/datafuselabs/databend/blob/main/tests/logictest/suites/select_0&lt;/a&gt;. The supported handlers are the MySQL handler, the HTTP handler, and the Clickhouse handler. Annotation: the use of '--' to annotate specific lines is supported. Statement types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;ok&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The statement is executed correctly and no error is returned.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;error &lt;code&gt;&amp;lt;error regex&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  An error occurs when the statement is executed, and the returned error info contains the expected content. Usually return codes are used; text messages are also feasible (but less intuitive). See the example after this list.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;query &lt;code&gt;&amp;lt;options&amp;gt;&lt;/code&gt; &lt;code&gt;&amp;lt;labels&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The execution return includes a result set, and the way result sets are compared is specified by the options and labels parameters.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;options: Composed of characters, and each character represents a column in the result set.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The supported characters are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;B&lt;/code&gt; Boolean&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;T&lt;/code&gt; text&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;F&lt;/code&gt; floating point&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;I&lt;/code&gt; integer&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;labels: The differences in the processing of results by different databases (handlers) are distinguished by labels. Commas are used when there are multiple differences.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
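
&lt;p&gt;For illustration, a minimal ok statement and a minimal error statement could look like the following (the SQL and the expected error pattern here are made up for the example):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;statement ok
create table t1(a int);

statement error .*already exists.*
create table t1(a int);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;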

&lt;p&gt;Compared to ok and error statements, query statements are more complicated. Here is a use case of the query type (for reference only; it may be inconsistent with the actual results):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;statement query III label(mysql)
select number, number + 1, number + 999 from numbers(10);
----
0     1   999
1     2  1000
2     3  1001
3     4  1002
4     5  1003
5     6  1004
6     7  1005
7     8  1006
8     9  1007
9    10  1008.0
----  mysql
0     1   999
1     2  1000
2     3  1001
3     4  1002
4     5  1003
5     6  1004
6     7  1005
7     8  1006
8     9  1007
9    10  1008
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Control syntax in the testing process:&lt;/p&gt;

&lt;p&gt;1. Use “skipif” to skip the specified runner:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;skipif clickhouse 
statement query I 
select 1; 
---- 
1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;2. Use “onlyif” to execute only against the specified runner:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;onlyif mysql 
statement query I 
select 1; 
---- 
1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;3. When encountering occasional test failures that cannot be solved easily, we can use skipped to skip the use case, or comment it out for now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;statement query skipped I 
select 1; 
---- 
1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Execution output
&lt;/h4&gt;

&lt;p&gt;A &lt;strong&gt;SUCCESS&lt;/strong&gt; example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://preview.redd.it/8w1ibxa4a6o91.png?width=1280&amp;amp;format=png&amp;amp;auto=webp&amp;amp;s=eedca9c0673501a8a41cc04cdae098df73b8c0e6"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6lgbz2Jr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/4b1d6d2605a442318acc1763c55c51f8%257Etplv-k3u1fbpfcp-zoom-1.image" alt="r/DatafuseLabs - Sqllogictest Illustrated" width="880" height="385"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The current summary contains simple statistics on the test execution process, including the number of case files executed, the number of statements contained in each case file, the average time of each statement execution and the average time of case execution.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;FAILURE&lt;/strong&gt; example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CCPFlRex--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/08fcbe6cf154417f8c56042d757d9f26%257Etplv-k3u1fbpfcp-zoom-1.image" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CCPFlRex--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/08fcbe6cf154417f8c56042d757d9f26%257Etplv-k3u1fbpfcp-zoom-1.image" alt="" width="880" height="519"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The failed case is in the 4th line of “base/15_query/alias/having_with_alias.test”. The expected return is 1, but null was returned instead.&lt;/p&gt;

&lt;p&gt;Another &lt;strong&gt;FAILURE&lt;/strong&gt; example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--x0nsrWnH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/4e678c9bb83a45c48e1601affd17ee8b%257Etplv-k3u1fbpfcp-zoom-1.image" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--x0nsrWnH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/4e678c9bb83a45c48e1601affd17ee8b%257Etplv-k3u1fbpfcp-zoom-1.image" alt="" width="880" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The failed case is in the 1st line of “base/02_function/02_0017_function_strings_oct”. The reported error is that the table already exists.&lt;/p&gt;

&lt;p&gt;As can be seen from the above examples, the output makes it easy to locate the specific use case file, and even the line or SQL statement involved. For cases that compare results, the expected and actual return values are also printed out, so users can easily spot the error. This greatly improves the developer experience and the efficiency of debugging.&lt;/p&gt;

&lt;h4&gt;
  
  
  Using sqllogictest in a pipeline
&lt;/h4&gt;

&lt;p&gt;When a PR (Pull Request) is submitted to the Databend repository, a pipeline is triggered. In the testing stage, the build artifact is run in a fresh environment and various tests are executed against it. Sqllogictest is an important part of this process.&lt;br&gt;&lt;br&gt;
As shown in the figure:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--urH_pNEe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/162df5f41bd44e78b60d1f2db86ec7a2%257Etplv-k3u1fbpfcp-zoom-1.image" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--urH_pNEe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://p3-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/162df5f41bd44e78b60d1f2db86ec7a2%257Etplv-k3u1fbpfcp-zoom-1.image" alt="" width="710" height="546"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Only after all the tests pass can the submission be merged into the branch, ensuring that no revision will break the expected functionality. All we need to do is extend the use cases and improve the test coverage.&lt;/p&gt;
&lt;h4&gt;
  
  
  Running sqllogictest
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;For contributors&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Directly execute “make sqllogic-test” in the Databend directory after cloning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For users&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;1. Deploy and run Databend; refer to &lt;a href="https://databend.rs/doc/deploy/deploying-databend"&gt;https://databend.rs/doc/deploy/deploying-databend&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;2. Get the Databend code of the corresponding version, then change the directory to tests/logictest.&lt;/p&gt;

&lt;p&gt;3. Install Python 3 (&amp;gt;= 3.8).&lt;/p&gt;

&lt;p&gt;4. Install the dependencies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip3 install -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;5. Run “python3 main.py”.&lt;/p&gt;

&lt;h4&gt;
  
  
  Run parameters
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Command-line parameters&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; “--suites other_dir” will run all the case files in "./other_dir"&lt;/li&gt;
&lt;li&gt; “--run-dir ydb” will run all the case files from directories under "./suites/" with "ydb" contained in their names&lt;/li&gt;
&lt;li&gt; "--skip-dir ydb" will skip all the case files from directories under "./suites/" with "ydb" contained in their names (see the example after this list)&lt;/li&gt;
&lt;li&gt; "python3 main.py 03_0001" will run the specific case file with "03_0001" contained in its name&lt;/li&gt;
&lt;/ol&gt;
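
&lt;p&gt;For example, the following invocations run only the cases under "./suites/base" and skip the cases imported from YDB, respectively (the directory names refer to the base and ydb modules described later):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 main.py --run-dir base
python3 main.py --skip-dir ydb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;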

&lt;p&gt;&lt;strong&gt;Environment variables parameters&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variable&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SKIP_TEST_FILES&lt;/td&gt;
&lt;td&gt;Cases containing the specified filename are skipped, separated by commas&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DISABLE_MYSQL_LOGIC_TEST&lt;/td&gt;
&lt;td&gt;Disable the MySQL handler tests (set to any value)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DISABLE_HTTP_LOGIC_TEST&lt;/td&gt;
&lt;td&gt;Disable the HTTP handler tests (set to any value)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DISABLE_CLICKHOUSE_LOGIC_TEST&lt;/td&gt;
&lt;td&gt;Disable the ClickHouse handler tests (set to any value)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QUERY_MYSQL_HANDLER_HOST&lt;/td&gt;
&lt;td&gt;mysql handler address&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QUERY_MYSQL_HANDLER_PORT&lt;/td&gt;
&lt;td&gt;mysql handler port&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QUERY_HTTP_HANDLER_HOST&lt;/td&gt;
&lt;td&gt;http handler address&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QUERY_HTTP_HANDLER_PORT&lt;/td&gt;
&lt;td&gt;http handler port&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QUERY_CLICKHOUSE_HANDLER_HOST&lt;/td&gt;
&lt;td&gt;clickhouse handler address&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;QUERY_CLICKHOUSE_HANDLER_PORT&lt;/td&gt;
&lt;td&gt;clickhouse handler port&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MYSQL_DATABASE&lt;/td&gt;
&lt;td&gt;Default database, usually default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MYSQL_USER&lt;/td&gt;
&lt;td&gt;Default user, usually root&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ADDITIONAL_HEADERS&lt;/td&gt;
&lt;td&gt;Usually used for the extension requirements of HTTP protocol, such as identity authentication&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These parameters cover customized running conditions, such as a Databend instance that is not deployed locally, or testing the MySQL and Clickhouse handlers (only HTTP is supported; the Clickhouse native protocol is not). Note: due to SQL dialect differences, our use cases may contain statements that are not supported by other databases, and the use cases of other databases may have similar issues.&lt;/p&gt;
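
&lt;p&gt;For example, to run only the HTTP handler tests against a Databend instance that is not deployed on the local machine, a setup along these lines could be used (the host and port values are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export DISABLE_MYSQL_LOGIC_TEST=1
export DISABLE_CLICKHOUSE_LOGIC_TEST=1
export QUERY_HTTP_HANDLER_HOST=192.168.1.10
export QUERY_HTTP_HANDLER_PORT=8000
python3 main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;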

&lt;h4&gt;
  
  
  Tips
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;  Use ok statements for use cases where the results are less important.&lt;/li&gt;
&lt;li&gt;  Prefer to use error codes in error statements, as messages are unstable.&lt;/li&gt;
&lt;li&gt;  The spaces in the result sets of query statements are only used to distinguish different columns. Additional spaces will not affect the test results.&lt;/li&gt;
&lt;li&gt;  When using query statements, a \t tab character is needed if there is a blank value in the returned result.&lt;/li&gt;
&lt;li&gt;  Since we dropped the support for sorting and retry syntaxes in use cases (now implemented in the test tool), it is necessary to add an order by statement to ensure that the result order is always consistent (see the example after this list).&lt;/li&gt;
&lt;/ul&gt;
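
&lt;p&gt;As an illustration of the last tip, adding an order by keeps the expected result deterministic (the case below is only a sketch):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;statement query I
select number from numbers(3) order by number desc;
----
2
1
0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;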

&lt;h3&gt;
  
  
  How to organize test cases?
&lt;/h3&gt;

&lt;p&gt;Test modules are under the first-level directory. For example, we currently have two modules: &lt;strong&gt;base&lt;/strong&gt; and &lt;strong&gt;ydb&lt;/strong&gt;, where &lt;strong&gt;base&lt;/strong&gt; stores our own use cases, and &lt;strong&gt;ydb&lt;/strong&gt; represents cases imported from YDB. As for the organization within each module, there is no clear specification yet. These approaches are generally followed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Organize cases by statements like cockroachdb&lt;/li&gt;
&lt;li&gt;  Organize cases by statement types or related modules, such as DML, DDL and planner_v2, according to function development process&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Extension
&lt;/h3&gt;

&lt;p&gt;Regex matching of returned columns is needed: currently only exact matching is supported in query statements, which cannot cover results that require fuzzy matching, for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Time format matching, so that use cases whose returns do not contain a fixed time can be supported.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Future plans
&lt;/h2&gt;

&lt;p&gt;Improve the user experience and toolchain of sqllogictest&lt;/p&gt;

&lt;p&gt;Improving the user experience of sqllogictest includes meeting more functional requirements, friendlier log output, use case migration tools (from SQL files or third-party sqllogictest use case files), etc.&lt;/p&gt;

&lt;p&gt;Extend test cases and raise test coverage&lt;/p&gt;

&lt;p&gt;Test case sets are valuable assets that often take a lot of time to design and perfect. Improving test coverage by migrating existing cases is therefore very worthwhile. We also need to improve our own test scenarios and functional test coverage at the same time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open-SQL-Logictest?
&lt;/h2&gt;

&lt;p&gt;Many database projects include a sqllogictest, but each one is usually implemented for a specific project and cannot work with another. We need a set of standardized, practical methods for implementing sqllogictest that covers all of its requirements.&lt;br&gt;&lt;br&gt;
If one day we can sort out all the requirements of sqllogictest and define certain standards, this could become an option.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki"&gt;https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/datafuselabs/databend/tree/main/tests/logictest"&gt;https://github.com/datafuselabs/databend/tree/main/tests/logictest&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Reading Source Code of Databend (2) ：Query Server Startup, Session Management &amp; Request Processing</title>
      <dc:creator>Databend</dc:creator>
      <pubDate>Fri, 09 Sep 2022 06:54:21 +0000</pubDate>
      <link>https://dev.to/databend/reading-source-code-of-databend-2-query-server-startup-session-management-request-processing-4ki</link>
      <guid>https://dev.to/databend/reading-source-code-of-databend-2-query-server-startup-session-management-request-processing-4ki</guid>
      <description>&lt;h1&gt;
  
  
  Entrypoint of query server
&lt;/h1&gt;

&lt;p&gt;The entrypoint of the query server is at databend/src/binaries/query/main.rs. After the initial configuration is completed, a “GlobalServices” and a “shutdown_handle” are created. The latter handles the shutdown logic when the server is closed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GlobalServices::init(conf.clone()).await?;
let mut shutdown_handle = ShutdownHandle::create()?;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  GlobalServices
&lt;/h2&gt;

&lt;p&gt;“GlobalServices” is responsible for starting all the global services for databend-query, and each of them follows the single responsibility principle.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pub struct GlobalServices {
    global_runtime: UnsafeCell&amp;lt;Option&amp;lt;Arc&amp;lt;Runtime&amp;gt;&amp;gt;&amp;gt;,
    // Process query logs
    query_logger: UnsafeCell&amp;lt;Option&amp;lt;Arc&amp;lt;QueryLogger&amp;gt;&amp;gt;&amp;gt;,
    // Implement cluster discovery mechanism for databend-query
    cluster_discovery: UnsafeCell&amp;lt;Option&amp;lt;Arc&amp;lt;ClusterDiscovery&amp;gt;&amp;gt;&amp;gt;,
    // Interact with the storage layer to read/write data
    storage_operator: UnsafeCell&amp;lt;Option&amp;lt;Operator&amp;gt;&amp;gt;,
    async_insert_manager: UnsafeCell&amp;lt;Option&amp;lt;Arc&amp;lt;AsyncInsertManager&amp;gt;&amp;gt;&amp;gt;,
    cache_manager: UnsafeCell&amp;lt;Option&amp;lt;Arc&amp;lt;CacheManager&amp;gt;&amp;gt;&amp;gt;,
    catalog_manager: UnsafeCell&amp;lt;Option&amp;lt;Arc&amp;lt;CatalogManager&amp;gt;&amp;gt;&amp;gt;,
    http_query_manager: UnsafeCell&amp;lt;Option&amp;lt;Arc&amp;lt;HttpQueryManager&amp;gt;&amp;gt;&amp;gt;,
    data_exchange_manager: UnsafeCell&amp;lt;Option&amp;lt;Arc&amp;lt;DataExchangeManager&amp;gt;&amp;gt;&amp;gt;,
    session_manager: UnsafeCell&amp;lt;Option&amp;lt;Arc&amp;lt;SessionManager&amp;gt;&amp;gt;&amp;gt;,
    users_manager: UnsafeCell&amp;lt;Option&amp;lt;Arc&amp;lt;UserApiProvider&amp;gt;&amp;gt;&amp;gt;,
    users_role_manager: UnsafeCell&amp;lt;Option&amp;lt;Arc&amp;lt;RoleCacheManager&amp;gt;&amp;gt;&amp;gt;,
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All the global services in “GlobalServices” implement the Singleton trait. This article focuses on the logic of session processing; the global managers will be introduced in subsequent articles.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pub trait SingletonImpl&amp;lt;T&amp;gt;: Send + Sync {
    fn get(&amp;amp;self) -&amp;gt; T;

    fn init(&amp;amp;self, value: T) -&amp;gt; Result&amp;lt;()&amp;gt;;
}

pub type Singleton&amp;lt;T&amp;gt; = Arc&amp;lt;dyn SingletonImpl&amp;lt;T&amp;gt;&amp;gt;;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
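
&lt;p&gt;To make the pattern concrete, here is a minimal, self-contained sketch of a slot type that satisfies this trait. It is not Databend's actual implementation: the GlobalSlot name is made up, the Result alias stands in for Databend's own Result type, and a Mutex is used for simplicity where the real GlobalServices struct stores each service in an UnsafeCell as shown above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;use std::sync::Mutex;

// Stand-in for the Result type used in Databend.
type Result&amp;lt;T&amp;gt; = std::result::Result&amp;lt;T, String&amp;gt;;

pub trait SingletonImpl&amp;lt;T&amp;gt;: Send + Sync {
    fn get(&amp;amp;self) -&amp;gt; T;

    fn init(&amp;amp;self, value: T) -&amp;gt; Result&amp;lt;()&amp;gt;;
}

// Hypothetical slot holding one global service instance.
struct GlobalSlot&amp;lt;T&amp;gt; {
    inner: Mutex&amp;lt;Option&amp;lt;T&amp;gt;&amp;gt;,
}

impl&amp;lt;T: Clone + Send&amp;gt; SingletonImpl&amp;lt;T&amp;gt; for GlobalSlot&amp;lt;T&amp;gt; {
    // Return a clone of the initialized service, panicking if init was never called.
    fn get(&amp;amp;self) -&amp;gt; T {
        self.inner
            .lock()
            .unwrap()
            .clone()
            .expect("service not initialized")
    }

    // Fill the slot once during startup.
    fn init(&amp;amp;self, value: T) -&amp;gt; Result&amp;lt;()&amp;gt; {
        *self.inner.lock().unwrap() = Some(value);
        Ok(())
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Conceptually, each field of “GlobalServices” plays the role of such a slot: it is filled once through init at startup and read through get afterwards.&lt;/p&gt;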



&lt;h2&gt;
  
  
  ShutdownHandler
&lt;/h2&gt;

&lt;p&gt;Next, the handlers are initialized according to their network protocols and registered with the “shutdown_handler” service; any type that implements the Server trait can be added to the services.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--H3Orqd-_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ig4xxb1fnhmtanlvs0cr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--H3Orqd-_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ig4xxb1fnhmtanlvs0cr.png" alt="Image description" width="880" height="564"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#[async_trait::async_trait]
pub trait Server: Send {
    async fn shutdown(&amp;amp;mut self, graceful: bool);
    async fn start(&amp;amp;mut self, listening: SocketAddr) -&amp;gt; Result&amp;lt;SocketAddr&amp;gt;;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Currently, Databend supports three request protocols (MySQL, Clickhouse HTTP, and raw HTTP).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// MySQL handler.
{
    let hostname = conf.query.mysql_handler_host.clone();
    let listening = format!("{}:{}", hostname, conf.query.mysql_handler_port);
    let mut handler = MySQLHandler::create(session_manager.clone());
    let listening = handler.start(listening.parse()?).await?;
    // register the service in shutdown_handle to process server shutdown, same as below
    shutdown_handle.add_service(handler);
}

// ClickHouse HTTP handler.
{
    let hostname = conf.query.clickhouse_http_handler_host.clone();
    let listening = format!("{}:{}", hostname, conf.query.clickhouse_http_handler_port);

    let mut srv = HttpHandler::create(session_manager.clone(), HttpHandlerKind::Clickhouse);
    let listening = srv.start(listening.parse()?).await?;
    shutdown_handle.add_service(srv);
}

// Databend HTTP handler.
{
    let hostname = conf.query.http_handler_host.clone();
    let listening = format!("{}:{}", hostname, conf.query.http_handler_port);

    let mut srv = HttpHandler::create(session_manager.clone(), HttpHandlerKind::Query);
    let listening = srv.start(listening.parse()?).await?;
    shutdown_handle.add_service(srv);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then some other services are created:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Metric service: Metrics related services&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Admin service: Administration related services&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;RPC service: RPC service for query nodes, which handles the communications between query nodes using arrow flight protocol&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Metric API service.
{
    let address = conf.query.metric_api_address.clone();
    let mut srv = MetricService::create(session_manager.clone());
    let listening = srv.start(address.parse()?).await?;
    shutdown_handle.add_service(srv);
    info!("Listening for Metric API: {}/metrics", listening);
}

// Admin HTTP API service.
{
    let address = conf.query.admin_api_address.clone();
    let mut srv = HttpService::create(session_manager.clone());
    let listening = srv.start(address.parse()?).await?;
    shutdown_handle.add_service(srv);
    info!("Listening for Admin HTTP API: {}", listening);
}

// RPC API service.
{
    let address = conf.query.flight_api_address.clone();
    let mut srv = RpcService::create(session_manager.clone());
    let listening = srv.start(address.parse()?).await?;
    shutdown_handle.add_service(srv);
    info!("Listening for RPC API (interserver): {}", listening);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, this query node is registered with the meta server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Cluster register.
{
    let cluster_discovery = session_manager.get_cluster_discovery();
    let register_to_metastore = cluster_discovery.register_to_metastore(&amp;amp;conf);
    register_to_metastore.await?;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  About sessions
&lt;/h1&gt;

&lt;p&gt;There are four parts in session management:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;session_manager: Globally unique, manages client sessions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;session: Every time a new client connects to the server, a session is created and registered with the session_manager&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;query_ctx: Each query creates a query_ctx, which is used to store the context information&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;query_ctx_shared: Context information shared by subqueries&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5cjfWk8j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rhl8extssh0z1gu98p84.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5cjfWk8j--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rhl8extssh0z1gu98p84.png" alt="Image description" width="880" height="385"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's look at them one by one.&lt;/p&gt;

&lt;h2&gt;
  
  
  SessionManager (query/src/sessions/session_mgr.rs)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pub struct SessionManager {
    pub(in crate::sessions) conf: Config,
    pub(in crate::sessions) max_sessions: usize,
    pub(in crate::sessions) active_sessions: Arc&amp;lt;RwLock&amp;lt;HashMap&amp;lt;String, Arc&amp;lt;Session&amp;gt;&amp;gt;&amp;gt;&amp;gt;,
    pub status: Arc&amp;lt;RwLock&amp;lt;SessionManagerStatus&amp;gt;&amp;gt;,

    // When session type is MySQL, insert into this map, key is id, val is MySQL connection id.
    pub(crate) mysql_conn_map: Arc&amp;lt;RwLock&amp;lt;HashMap&amp;lt;Option&amp;lt;u32&amp;gt;, String&amp;gt;&amp;gt;&amp;gt;,
    pub(in crate::sessions) mysql_basic_conn_id: AtomicU32,
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;“SessionManager” is mainly used to create and destroy sessions; the specific methods are as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Create a session according to the client protocol
pub async fn create_session(self: &amp;amp;Arc&amp;lt;Self&amp;gt;, typ: SessionType) -&amp;gt; Result&amp;lt;SessionRef&amp;gt;

// Destroy a session by session_ids
pub fn destroy_session(self: &amp;amp;Arc&amp;lt;Self&amp;gt;, session_id: &amp;amp;String)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Session (query/src/sessions/session.rs)
&lt;/h2&gt;

&lt;p&gt;Context information of the client connection is stored in the session. We won't describe it in more detail, as the code logic is already clear.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pub struct Session {
    pub(in crate::sessions) id: String,
    pub(in crate::sessions) typ: RwLock&amp;lt;SessionType&amp;gt;,
    pub(in crate::sessions) session_ctx: Arc&amp;lt;SessionContext&amp;gt;,
    status: Arc&amp;lt;RwLock&amp;lt;SessionStatus&amp;gt;&amp;gt;,
    pub(in crate::sessions) mysql_connection_id: Option&amp;lt;u32&amp;gt;,
}
pub struct SessionContext {
    conf: Config,
    abort: AtomicBool,
    current_catalog: RwLock&amp;lt;String&amp;gt;,
    current_database: RwLock&amp;lt;String&amp;gt;,
    current_tenant: RwLock&amp;lt;String&amp;gt;,
    current_user: RwLock&amp;lt;Option&amp;lt;UserInfo&amp;gt;&amp;gt;,
    auth_role: RwLock&amp;lt;Option&amp;lt;String&amp;gt;&amp;gt;,
    client_host: RwLock&amp;lt;Option&amp;lt;SocketAddr&amp;gt;&amp;gt;,
    io_shutdown_tx: RwLock&amp;lt;Option&amp;lt;Sender&amp;lt;Sender&amp;lt;()&amp;gt;&amp;gt;&amp;gt;&amp;gt;,
    query_context_shared: RwLock&amp;lt;Option&amp;lt;Arc&amp;lt;QueryContextShared&amp;gt;&amp;gt;&amp;gt;,
}

pub struct SessionStatus {
    pub session_started_at: Instant,
    pub last_query_finished_at: Option&amp;lt;Instant&amp;gt;,
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Another major function of “Session” is to create and obtain “QueryContext”s. Each time a query request is received, a “QueryContext” is created and bound to the corresponding query statement.&lt;/p&gt;

&lt;h2&gt;
  
  
  QueryContext (query/src/sessions/query_ctx.rs)
&lt;/h2&gt;

&lt;p&gt;QueryContexts are used to maintain the context information of individual queries. They're created by “QueryContext::create_from_shared(query_ctx_shared)”.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#[derive(Clone)]
pub struct QueryContext {
    version: String,
    statistics: Arc&amp;lt;RwLock&amp;lt;Statistics&amp;gt;&amp;gt;,
    partition_queue: Arc&amp;lt;RwLock&amp;lt;VecDeque&amp;lt;PartInfoPtr&amp;gt;&amp;gt;&amp;gt;,
    shared: Arc&amp;lt;QueryContextShared&amp;gt;,
    precommit_blocks: Arc&amp;lt;RwLock&amp;lt;Vec&amp;lt;DataBlock&amp;gt;&amp;gt;&amp;gt;,
    fragment_id: Arc&amp;lt;AtomicUsize&amp;gt;,
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Among the members, “partition_queue” stores the corresponding PartInfo items, including the address and version of each part, the number of rows involved, the compression algorithm used, and the column-related meta information. Partitions are set when building the pipeline (there will be subsequent articles on the pipeline). “precommit_blocks” stores metadata that has been written to storage by an insert operation but has not yet been committed. “DataBlock” contains references to the column meta information and the Arrow schema.&lt;/p&gt;

&lt;h2&gt;
  
  
  QueryContextShared (query/src/sessions/query_ctx_shared.rs)
&lt;/h2&gt;

&lt;p&gt;For queries containing subqueries, much context information needs to be shared. This is why we need “QueryContextShared”.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/// It is important that data is shared among query context, for example:
///     USE database_1;
///     SELECT
///         (SELECT scalar FROM table_name_1) AS scalar_1,
///         (SELECT scalar FROM table_name_2) AS scalar_2,
///         (SELECT scalar FROM table_name_3) AS scalar_3
///     FROM table_name_4;
/// runtime, session, progress, init_query_id are shared among the subqueries
pub struct QueryContextShared {
    /// scan_progress for scan metrics of datablocks (uncompressed)
    pub(in crate::sessions) scan_progress: Arc&amp;lt;Progress&amp;gt;,
    /// write_progress for write/commit metrics of datablocks (uncompressed)
    pub(in crate::sessions) write_progress: Arc&amp;lt;Progress&amp;gt;,
    /// result_progress for metrics of result datablocks (uncompressed)
    pub(in crate::sessions) result_progress: Arc&amp;lt;Progress&amp;gt;,
    pub(in crate::sessions) error: Arc&amp;lt;Mutex&amp;lt;Option&amp;lt;ErrorCode&amp;gt;&amp;gt;&amp;gt;,
    pub(in crate::sessions) session: Arc&amp;lt;Session&amp;gt;,
    pub(in crate::sessions) runtime: Arc&amp;lt;RwLock&amp;lt;Option&amp;lt;Arc&amp;lt;Runtime&amp;gt;&amp;gt;&amp;gt;&amp;gt;,
    pub(in crate::sessions) init_query_id: Arc&amp;lt;RwLock&amp;lt;String&amp;gt;&amp;gt;,
    ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It provides all the basic information required by the query context.&lt;/p&gt;

&lt;h1&gt;
  
  
  Handler
&lt;/h1&gt;

&lt;p&gt;As mentioned earlier, Databend supports multiple handlers. Let's take the MySQL handler as an example to see how handlers process requests and how they interact with sessions. First, here is the definition of “MySQLHandler”:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pub struct MySQLHandler {
    abort_handle: AbortHandle,
    abort_registration: Option&amp;lt;AbortRegistration&amp;gt;,
    join_handle: Option&amp;lt;JoinHandle&amp;lt;()&amp;gt;&amp;gt;,
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After it's started, the “MySQLHandler” spawns a tokio task that continuously listens for TCP connections; for each connection it creates a session and then starts a task to execute the subsequent query requests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fn accept_socket(session_mgr: Arc&amp;lt;SessionManager&amp;gt;, executor: Arc&amp;lt;Runtime&amp;gt;, socket: TcpStream) {
    executor.spawn(async move {
        // create a session
        match session_mgr.create_session(SessionType::MySQL).await {
            Err(error) =&amp;gt; Self::reject_session(socket, error).await,
            Ok(session) =&amp;gt; {
                info!("MySQL connection coming: {:?}", socket.peer_addr());
                // execute queries
                if let Err(error) = MySQLConnection::run_on_stream(session, socket) {
                    error!("Unexpected error occurred during query: {:?}", error);
                };
            }
        }
    });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the function of “MySQLConnection::run_on_stream”, the session first attaches to the corresponding client host and registers a shutdown closure to handle related cleanups when the connection is closed.&lt;/p&gt;

&lt;p&gt;The related code is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// mysql_session.rs
pub fn run_on_stream(session: SessionRef, stream: TcpStream) -&amp;gt; Result&amp;lt;()&amp;gt; {
    let blocking_stream = Self::convert_stream(stream)?;
    MySQLConnection::attach_session(&amp;amp;session, &amp;amp;blocking_stream)?;

    ...
}

fn attach_session(session: &amp;amp;SessionRef, blocking_stream: &amp;amp;std::net::TcpStream) -&amp;gt; Result&amp;lt;()&amp;gt; {
    let host = blocking_stream.peer_addr().ok();
    let blocking_stream_ref = blocking_stream.try_clone()?;
    session.attach(host, move || {
        // register shutdown 
        if let Err(error) = blocking_stream_ref.shutdown(Shutdown::Both) {
            error!("Cannot shutdown MySQL session io {}", error);
        }
    });

    Ok(())
}

// session.rs
pub fn attach&amp;lt;F&amp;gt;(self: &amp;amp;Arc&amp;lt;Self&amp;gt;, host: Option&amp;lt;SocketAddr&amp;gt;, io_shutdown: F)
where F: FnOnce() + Send + 'static {
    let (tx, rx) = oneshot::channel();
    self.session_ctx.set_client_host(host);
    self.session_ctx.set_io_shutdown_tx(Some(tx));

    common_base::base::tokio::spawn(async move {
        // trigger cleanups when the session quits 
        if let Ok(tx) = rx.await {
            (io_shutdown)();
            tx.send(()).ok();
        }
    });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then a MySQL InteractiveWorker is started to handle subsequent queries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let join_handle = query_executor.spawn(async move {
    let client_addr = non_blocking_stream.peer_addr().unwrap().to_string();
    let interactive_worker = InteractiveWorker::create(session, client_addr);
    let opts = IntermediaryOptions {
        process_use_statement_on_query: true,
    };
    let (r, w) = non_blocking_stream.into_split();
    let w = BufWriter::with_capacity(DEFAULT_RESULT_SET_WRITE_BUFFER_SIZE, w);
    AsyncMysqlIntermediary::run_with_options(interactive_worker, r, w, &amp;amp;opts).await
});
let _ = futures::executor::block_on(join_handle);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;“InteractiveWorker” implements the AsyncMysqlShim trait methods, such as “on_execute”, “on_query”, and so on. When a query arrives, these methods are called to execute it. Take “on_query”&lt;/p&gt;

&lt;p&gt;for example; the core code is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async fn on_query&amp;lt;'a&amp;gt;(
    &amp;amp;'a mut self,
    query: &amp;amp;'a str,
    writer: QueryResultWriter&amp;lt;'a, W&amp;gt;,
) -&amp;gt; Result&amp;lt;()&amp;gt; {
    ...

    // response writer
    let mut writer = DFQueryResultWriter::create(writer);

    let instant = Instant::now();
    // execute queries
    let blocks = self.base.do_query(query).await;

    // write results
    let format = self.base.session.get_format_settings()?;
    let mut write_result = writer.write(blocks, &amp;amp;format);

    ...

    // metrics info
    histogram!(
        super::mysql_metrics::METRIC_MYSQL_PROCESSOR_REQUEST_DURATION,
        instant.elapsed()
    );

    write_result
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In “do_query”, a “QueryContext” is created, the SQL is parsed, and the query is then planned and executed. The related code is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// create a QueryContext
let context = self.session.create_query_context().await?;
// attach it to the query statement
context.attach_query_str(query);

let settings = context.get_settings();

// parse sql
let stmts_hints = DfParser::parse_sql(query, context.get_current_session().get_type());
...

// Define and generate a query plan
let mut planner = Planner::new(context.clone());
let interpreter = planner.plan_sql(query).await.and_then(|v| {
    has_result_set = has_result_set_by_plan(&amp;amp;v.0);
    InterpreterFactoryV2::get(context.clone(), &amp;amp;v.0)
})

// Execute queries and return the results
Self::exec_query(interpreter.clone(), &amp;amp;context).await?;
let schema = interpreter.schema();
Ok(QueryResult::create(
    blocks,
    extra_info,
    has_result_set,
    schema,
))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Epilogue
&lt;/h1&gt;

&lt;p&gt;This article described the whole process from starting Databend to accepting SQL requests and beginning to process them. Recently, we removed the Clickhouse native TCP handler (the Clickhouse TCP protocol is tightly tied to Clickhouse internals; with heavy historical baggage and no public documentation, debugging it became too exhausting; see &lt;a href="https://github.com/datafuselabs/databend/pull/7012"&gt;https://github.com/datafuselabs/databend/pull/7012&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Please feel free to discuss your ideas with us. In addition, if you find any relevant problem, you can always submit an issue to help improve Databend's stability. The Databend community welcomes all well-intentioned comments and suggestions :)&lt;/p&gt;

&lt;h1&gt;
  
  
  About Databend
&lt;/h1&gt;

&lt;p&gt;Databend is an open source modern data warehouse with elasticity and low cost. It can do real-time data analysis on object-based storage. We look forward to your attention and hope to explore the cloud native data warehouse solution, and create a new generation of open source data cloud together.&lt;/p&gt;

&lt;p&gt;Databend documentation：&lt;a href="https://databend.rs/"&gt;https://databend.rs/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Twitter：&lt;a href="https://twitter.com/Datafuse_Labs"&gt;https://twitter.com/Datafuse_Labs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Slack：&lt;a href="https://datafusecloud.slack.com/"&gt;https://datafusecloud.slack.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Wechat：Databend&lt;/p&gt;

&lt;p&gt;GitHub ：&lt;a href="https://github.com/datafuselabs/databend"&gt;https://github.com/datafuselabs/databend&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How to Read Code</title>
      <dc:creator>Databend</dc:creator>
      <pubDate>Fri, 02 Sep 2022 09:04:19 +0000</pubDate>
      <link>https://dev.to/databend/how-to-read-code-32pc</link>
      <guid>https://dev.to/databend/how-to-read-code-32pc</guid>
      <description>&lt;p&gt;The ability to read source code is considered to be one of the underlying fundamental programmer skills, and the reason why this ability is important is that   &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;We inevitably need to read or take over other people’s projects, for example when researching an open source project or taking over a project from someone else.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reading the source code of good projects is an important way to learn from other people’s experience, as I know from my own practice.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reading code and writing code are not quite the same skill, because “&lt;a href="https://www.zhihu.com/question/21820752/answer/19427754"&gt;writing code is expressing yourself, reading code is understanding others&lt;/a&gt;”. Since every project's authors have their own style, it takes a lot of energy to understand someone else's code.&lt;/p&gt;

&lt;p&gt;Over the years I’ve read a lot of project source code, both in overview and in detail, and have written a number of code analysis articles along the way, so I’ll briefly summarize my approach in this article.&lt;/p&gt;

&lt;h2&gt;
  
  
  The first thing: run it!
&lt;/h2&gt;

&lt;p&gt;The first step in reading a project's source code is to get it to compile and run smoothly on your own machine. This is especially important.&lt;br&gt;&lt;br&gt;
Some projects are complex and depend on many components, so setting up a debugging environment is not easy, and not every project can be made to run smoothly. But if you can compile and run it yourself, then scenario analysis, adding debugging code, stepping through with a debugger, and so on all have a foundation to build on.&lt;br&gt;&lt;br&gt;
In my experience, whether or not a debugging environment can be built makes a huge difference in efficiency.&lt;br&gt;&lt;br&gt;
Once it runs, try to streamline the environment to reduce noise during debugging. For example, Nginx uses multiple processes to handle requests; in order to debug Nginx's behavior, I often set the number of workers to 1, so that I know which process to track while debugging.&lt;br&gt;&lt;br&gt;
Another example: many projects are compiled with optimization options or without debugging information by default, which can be a problem when debugging, so I modify the makefile to compile with “-O0 -g”, generating a build with debugging information and no optimization.&lt;br&gt;&lt;br&gt;
All in all, debugging efficiency improves a lot once the project runs, provided you also streamline the environment to exclude disturbing factors.&lt;/p&gt;
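
&lt;p&gt;For instance, the Nginx tweak mentioned above is a one-line change in nginx.conf:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# nginx.conf: a single worker process is much easier to follow in a debugger
worker_processes  1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
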
&lt;h2&gt;
  
  
  Clearly define your purpose
&lt;/h2&gt;

&lt;p&gt;Although it is important to read project source code, not all projects need to be read from start to finish. Before you start reading, you need to be clear about your purpose: whether you need to understand the implementation of a particular module, the general structure of the framework, the implementation of a particular algorithm, and so on.&lt;br&gt;&lt;br&gt;
For example, many people look at the code of Nginx. The project has many modules, including the basic core modules (epoll, network sending and receiving, memory pooling, etc.) and modules that extend specific functions, and not all of them need to be understood in depth. Typical goals are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understanding the underlying processes at the core of Nginx and the data structures.&lt;/li&gt;
&lt;li&gt;Understanding how Nginx implements a module.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this general understanding of the project, all that remains is to look at the specific code implementation when you encounter a specific problem. All in all, it is not recommended to start reading a project's code without a purpose; reading it aimlessly will only consume your time and enthusiasm.&lt;/p&gt;
&lt;h2&gt;
  
  
  Distinguish between main and branch storylines
&lt;/h2&gt;

&lt;p&gt;With a clear purpose in mind, you can distinguish between the main storyline and side storylines as you read. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you want to understand the implementation of a piece of business logic, and one of its functions uses a dictionary to store data, then “how the dictionary data structure is implemented” is a side storyline, and you don’t need to look deeper into its implementation.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Guided by this principle, for code on a side storyline the reader only needs to understand its external interfaces. For a class, for instance, there is no need to understand its implementation; it is enough to know the input and output parameters and the role of these interfaces, treating this part as a “black box”.&lt;br&gt;&lt;br&gt;
By the way, years ago I saw a way of writing C++ in which the header file contains only the external interface declaration of a class, and the implementation is moved into the C++ file through an internal impl class. For example:&lt;br&gt;&lt;br&gt;
Header file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// test.h
class Test {
public:
  void fun();
private:
  class Impl;
  Impl *impl_;
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;C++ file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// test.cpp
// The impl class must be defined before Test::fun() can call through impl_.
class Test::Impl {
public:
  void fun() {
    // Concrete implementation
  }
};

void Test::fun() {
  impl_-&gt;fun();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This way of writing makes the header file a lot cleaner: there are no private members or functions associated with the implementation, only the exposed interface, so the user can know at a glance what the class offers to the public.    &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zT0_iQfD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8im06b34d0xvuzwk9z8u.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zT0_iQfD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8im06b34d0xvuzwk9z8u.jpg" alt="Image description" width="880" height="587"&gt;&lt;/a&gt;&lt;br&gt;
The “main” and “branch” storylines switch frequently throughout the code reading process, requiring the reader to have some experience in knowing which part of the code they are reading is the main storyline.&lt;/p&gt;
&lt;h2&gt;
  
  
  Vertical and horizontal
&lt;/h2&gt;

&lt;p&gt;Code reading proceeds in two different directions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vertical: Read in the order of the code. Vertical reading is often required when a specific understanding of a process or algorithm is needed.
&lt;/li&gt;
&lt;li&gt;Horizontal: Read across different modules. Horizontal reading is often needed when you first want to figure out the overall framework.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The two directions of reading should alternate, which requires some experience on the reader's part to grasp which direction is appropriate at the moment. My advice: put the whole picture first, and do not go too deep into any particular detail before understanding the overall structure. Treat a function or data structure as a black box, know its inputs and outputs, and as long as that does not block your understanding of the whole, set it aside and move on.&lt;/p&gt;
&lt;h2&gt;
  
  
  Scenario analysis
&lt;/h2&gt;

&lt;p&gt;If you have the foundations above in place, that is, the project runs smoothly in your own debugging environment and you have clarified which functionality you want to understand, then you can do scenario analysis on the project code.&lt;br&gt;&lt;br&gt;
The so-called “scenario analysis” is to construct some scenarios by yourself, and then analyze the behavior in these scenarios by adding breakpoints and debugging statements.&lt;br&gt;&lt;br&gt;
For example, when I wrote &lt;a href="https://book.douban.com/subject/27108476/"&gt;Lua Design and Implementation&lt;/a&gt;, I had to explain how the Lua virtual machine interprets and executes its instructions, which meant analyzing each instruction, so I used scenario analysis: I would write a small Lua script that uses the instruction, then set breakpoints in the interpreter and debug its behavior in that scenario.&lt;br&gt;&lt;br&gt;
My usual approach is to add a breakpoint to an important entry function, then construct debugging code that triggers the scenario, and when the code stops at the breakpoint, observe the behavior of the code by looking at the stack, variable values, and so on.&lt;br&gt;&lt;br&gt;
For example, in the Lua interpreter, generating an opcode eventually calls the function luaK_code, so I set a breakpoint on this function and construct the scenario I want to debug; as soon as execution stops at the breakpoint, I can see the complete call flow from the function stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(lldb) bt
* thread #1: tid = 0xb1dd2, 0x00000001000071b0 lua`luaK_code, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
* frame #0: 0x00000001000071b0 lua`luaK_code
frame #1: 0x000000010000753e lua`discharge2reg + 238
frame #2: 0x000000010000588f lua`exp2reg + 31
frame #3: 0x000000010000f15b lua`statement + 3131
frame #4: 0x000000010000e0b6 lua`luaY_parser + 182
frame #5: 0x0000000100009de9 lua`f_parser + 89
frame #6: 0x0000000100008ba5 lua`luaD_rawrunprotected + 85
frame #7: 0x0000000100009bf4 lua`luaD_pcall + 68
frame #8: 0x0000000100009d65 lua`luaD_protectedparser + 69
frame #9: 0x00000001000047e1 lua`lua_load + 65
frame #10: 0x0000000100018071 lua`luaL_loadfile + 433
frame #11: 0x0000000100000eb9 lua`pmain + 1545
frame #12: 0x00000001000090cd lua`luaD_precall + 589
frame #13: 0x00000001000098c1 lua`luaD_call + 81
frame #14: 0x0000000100008ba5 lua`luaD_rawrunprotected + 85
frame #15: 0x0000000100009bf4 lua`luaD_pcall + 68
frame #16: 0x00000001000046fb lua`lua_cpcall + 43
frame #17: 0x00000001000007af lua`main + 63
frame #18: 0x00007fff6468708d libdyld.dylib`start + 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The advantage of scenario analysis is that instead of looking for a needle in a haystack, you can narrow down the problem to a scope and understand it.&lt;br&gt;&lt;br&gt;
The term “scenario analysis” is not something I came up with; several code-analysis books use it, such as &lt;a href="https://book.douban.com/subject/1231584/"&gt;Linux Kernel Source Code Scenario Analysis&lt;/a&gt; and &lt;a href="https://book.douban.com/subject/3715700/"&gt;Windows Kernel Scenario Analysis&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Make use of good test cases
&lt;/h2&gt;

&lt;p&gt;Good projects come with plenty of test cases; examples include etcd and several of Google's open source projects.&lt;/p&gt;

&lt;p&gt;If the test cases are written carefully, they are worth studying. The reason is that a test case usually targets a single scenario and constructs its own data to verify one path through the program. So, like the “scenario analysis” above, tests are a way to move from a big project down to a specific scenario, as the sketch below illustrates.&lt;/p&gt;
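
&lt;p&gt;As a rough illustration (not taken from any of the projects above), here is a tiny, hypothetical Rust example: the test constructs the data for exactly one scenario, so you can run just that test, and set breakpoints inside it, instead of driving the whole program.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Hypothetical example: a small function plus a test that exercises exactly one
// scenario. Running `cargo test splits_on_commas` (optionally under a debugger)
// narrows debugging down to this single path through the code.

fn split_csv_line(line: &amp;amp;str) -&amp;gt; Vec&amp;lt;String&amp;gt; {
    line.split(',').map(|field| field.trim().to_string()).collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn splits_on_commas() {
        // The test constructs the scenario's input data by itself.
        let fields = split_csv_line("a, b ,c");
        assert_eq!(fields, vec!["a", "b", "c"]);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;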

&lt;h2&gt;
  
  
  Clarify the relationship between core data structures
&lt;/h2&gt;

&lt;p&gt;Although it is said that “programming = algorithm + data structure”, my experience in practice is that data structure is more important.&lt;br&gt;&lt;br&gt;
Data structures define the skeleton of a program; there is no concrete implementation until they are set. It is like building a house: the data structures are the house's framework, and if the house is very large and you do not know its structure, you will get lost inside it. As for the algorithms, if they are details you do not need to delve into for now, refer back to “Distinguish between main and branch storylines” and just understand their inputs, outputs, and roles first.&lt;br&gt;&lt;br&gt;
Linus put it this way: “Bad programmers worry about the code. Good programmers worry about data structures and their relationships.”&lt;br&gt;&lt;br&gt;
Therefore, when reading a piece of code, it is especially important to clarify the relationships between the core data structures, and it helps to use tools to draw those relationships. There are many such examples in my source code analysis blog posts, such as &lt;a href="https://www.codedump.info/post/20190215-leveldb/"&gt;Notes on Reading Leveldb Code&lt;/a&gt;, &lt;a href="https://www.codedump.info/post/20181125-etcd-server/"&gt;Implementation of Etcd Storage&lt;/a&gt;, and so on.&lt;br&gt;&lt;br&gt;
Note that there is no strict ordering between scenario analysis and clarifying the core data structures; they do not have to happen one after the other, but rather interactively.&lt;br&gt;&lt;br&gt;
For example, if you have just taken over a project and need to understand the project briefly, you can first read the code to understand what core data structures are available. Once you understand it, if you are not sure about the process in certain scenarios, you can use scenario analysis. In short, alternate until your questions are answered.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ask yourself more questions
&lt;/h2&gt;

&lt;p&gt;The learning process cannot be separated from interaction.&lt;br&gt;&lt;br&gt;
If reading code is just input, there also needs to be output. Input alone is like being fed; only what you digest becomes your own nourishment, and output is an important means of digesting knowledge.&lt;br&gt;&lt;br&gt;
This idea is very common: students attend lectures (input) and do practice assignments (output); when learning algorithms (input) you need to practice coding them yourself (output); and so on. In short, output is a kind of timely feedback in the learning process, and the higher its quality, the more efficient the learning.&lt;br&gt;&lt;br&gt;
There are many forms of output. While reading code, I especially recommend asking yourself questions, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why was this data structure chosen to model this problem? How do other projects design for similar scenarios? What other data structures could do the job?
&lt;/li&gt;
&lt;li&gt;If I were to design such a project, what would I do?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And so on. The more actively and critically you think, the better your output will be, and the quality of your output is directly proportional to the quality of your learning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Write your own code reading notes
&lt;/h2&gt;

&lt;p&gt;Since I started blogging, I have been writing code reading articles for various projects, and my screen name “codedump” comes from the idea of trying to “dump the code's internal implementation principles”.&lt;br&gt;&lt;br&gt;
As mentioned earlier, the quality of learning is directly proportional to the quality of output, which is my own deep experience. Because of this, I insist on writing my own analysis notes after reading the source code.&lt;br&gt;&lt;br&gt;
Here are a few things to keep in mind when writing these kinds of notes.&lt;br&gt;&lt;br&gt;
Although they are notes, imagine that you are explaining the principles to someone who is less familiar with the project, or imagine that you are looking back at the article months or even years later. In this case, try to organize the language as well as possible and explain it in a step-by-step manner.&lt;br&gt;&lt;br&gt;
Try to avoid posting large blocks of code. I think it is a bit self-defeating to paste large blocks of code in such articles: it just makes it look like you understand it when you don't. If you really want to explain a piece of code, use pseudo-code or a reduced version of it. Remember: don't kid yourself; really get it. If you want to add your own comments to the code, one suggestion is to fork a copy of some version of the project to your own GitHub account, where you can keep adding comments and committing them. For example, my annotated copy of etcd 3.1.10 is etcd-3.1.10-codedump; likewise, for other projects I read I fork a repository on GitHub with a codedump suffix.&lt;br&gt;&lt;br&gt;
Draw more diagrams: a picture is worth a thousand words, so use graphics to show code flow and the relationships between data structures. I only recently realized that drawing diagrams is itself an important skill, and I am learning from scratch how to express my ideas with images.&lt;br&gt;&lt;br&gt;
Writing is an important foundational skill. A friend of mine recently made the point that if you are strong in some area, adding good writing and good English greatly amplifies your ability in that area. Like writing, English is a foundational skill that cannot be picked up overnight; it needs long, continuous practice. For technical people, keeping a blog is a good way to exercise writing.&lt;br&gt;&lt;br&gt;
PS: For many things, if you keep in mind that the person who will eventually face your output is yourself, for example the code you write is code you will later maintain and the articles you write are articles you will later reread, things turn out much better. When writing a technical blog post I assume the future reader may well be me, so I try to write as clearly as possible, hoping that when I look back at my own article after some time I can immediately recall the details. This is also why I rarely paste large blocks of code in posts and try to use diagrams instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The above is a brief summary of the methods and points of attention I rely on when reading source code. Roughly speaking, they come down to the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better output leads to better digestion of knowledge: building a debugging environment, scenario analysis, asking yourself questions, writing code reading notes, and so on all revolve around producing output. In short, you cannot passively stare at code and expect to fully understand its principles; you need to find ways to interact with it.
&lt;/li&gt;
&lt;li&gt;Writing is a basic hard skill: it not only exercises your ability to express yourself but also helps organize your thoughts. One way for programmers to exercise writing is to keep a blog, and the sooner you start, the better.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Finally, as with any skill that can be acquired, the ability to read code requires long hours and lots of repetition, so go and start on a project that interests you.&lt;/p&gt;

&lt;h2&gt;
  
  
  About Databend
&lt;/h2&gt;

&lt;p&gt;Databend is an open source modern data warehouse with elasticity and low cost. It can do real-time data analysis on object-based storage. We look forward to your attention and hope to explore the cloud native data warehouse solution, and create a new generation of open source data cloud together.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Databend documentation：&lt;a href="https://databend.rs/"&gt;https://databend.rs/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Twitter：&lt;a href="https://twitter.com/Datafuse_Labs"&gt;https://twitter.com/Datafuse_Labs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Slack：&lt;a href="https://datafusecloud.slack.com/"&gt;https://datafusecloud.slack.com/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Wechat：Databend&lt;/li&gt;
&lt;li&gt;GitHub ：&lt;a href="https://github.com/datafuselabs/databend"&gt;https://github.com/datafuselabs/databend&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Reading Source Code of Databend (1) ：Introduction</title>
      <dc:creator>Databend</dc:creator>
      <pubDate>Fri, 26 Aug 2022 07:51:31 +0000</pubDate>
      <link>https://dev.to/databend/reading-source-code-of-databend-1-introduction-2bpe</link>
      <guid>https://dev.to/databend/reading-source-code-of-databend-1-introduction-2bpe</guid>
      <description>&lt;h2&gt;
  
  
  Preface
&lt;/h2&gt;

&lt;p&gt;Databend has gained the attention of many community members since it was open sourced in 2021. Databend is developed with Rust, therefore we designed Rust related courses and established several Rust interest groups in order to attract more developers, especially those with zero Rust development experience.&lt;/p&gt;

&lt;p&gt;Databend also introduced “Good First Issue” labels to encourage newcomers of the community to make their first contributions. So far, there are more than 100 contributors, which is quite something.&lt;/p&gt;

&lt;p&gt;However, after several iterations in the past year, the code of Databend has become increasingly complex. With 260,000 lines of Rust code and 46 crates in the master branch at present, even developers familiar with Rust can be confused after cloning the repository, not to mention the increasing difficulty for newcomers. A column of articles on reading Databend's source code has been requested many times in community groups to help make the code more accessible.&lt;/p&gt;

&lt;p&gt;In response, we are launching a series of articles on reading the source code of Databend. We hope these articles can help strengthen communication between developers and the community, and serve as a source of inspiration.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Story of Databend
&lt;/h2&gt;

&lt;p&gt;A question that many developers have asked me is: why use Rust to build a database from scratch? In fact, this question can be divided into two sub-questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Why choose Rust?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Answer: Most of our early members were contributors to well-known databases such as ClickHouse, TiDB, and TokuDB. In terms of technology stacks, we were more familiar with C++ and Go. BohuTANG (u/bohutang) had also used Go to implement a small database prototype, vectorsql, during the pandemic, and some developers felt that the vectorsql architecture was elegant and worth learning from.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--G3p4ReS5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ebtwpbv3qqk60v6uusrg.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--G3p4ReS5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ebtwpbv3qqk60v6uusrg.PNG" alt="Image description" width="880" height="231"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;All languages have advantages and disadvantages and should be chosen according to the scenario. At present, most DBMSs are written in C++ or Java, while newer NewSQL systems more often use Go. In our experience, C and C++ stand for high performance, because it is easy to write C/C++ code that runs efficiently; however, C++ development efficiency is unbearably low, and with its deficient tool chain it is hard to write memory-safe and concurrency-safe code on the first try. Go, by contrast, better fits the standard of elegance and simplicity, with a sufficient tool chain and high development efficiency; yet progress on Go's generics was too slow, a database's memory cannot be controlled flexibly enough, and its runtime performance cannot compare with C/C++, especially where SIMD instructions require interacting with assembly code. What we needed was a language with both development efficiency (memory safety, concurrency safety, a sufficient tool chain) and runtime efficiency. At the time Rust seemed to be our only choice, and we have never regretted it: Rust meets our needs perfectly, and it is also cool!&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Why build a database system from scratch?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In general, there are only two routes：&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Route 1: Secondary development and optimization based on a well-known open-source database&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most people may choose this route, because with a good database as the base there is no need to repeat existing work, and one can focus on optimization, improvement, and restructuring. This way, releases can be shipped and commercialized earlier. The disadvantage is that a forked version is effectively another independent system whose changes cannot easily be fed back to the community, like the various forks derived from PostgreSQL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Route 2: Building a new database system from scratch&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This route is rather hard to follow, because the whole concept of database system is too large, and each sub direction is worthy of ten years of study or more. On the other hand, since there is no existing foundation, the designer can adjust and design more flexibly without paying too much attention to the historical problems. At the very beginning, Databend was designed for the scenario of cloud native data warehouse, which is very different from the traditional database system. The cost of code transformation may be the same as the cost of doing it from scratch. Therefore, we chose the second route to create a new cloud data warehouse from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;The architecture determines the superstructure, so let's start with the architecture of Databend.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Qc6Eq4_L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/a51ibxtzkilsg3090o14.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Qc6Eq4_L--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/a51ibxtzkilsg3090o14.PNG" alt="Image description" width="880" height="662"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Although we started from scratch with Rust, we have also integrated some excellent open-source components and ecosystems to avoid duplicating work. For example, Databend is compatible with the ANSI SQL standard, supports mainstream protocols such as MySQL / ClickHouse, embraces the Apache Arrow ecosystem, and bases its storage format on Parquet from the big data world. We not only actively contribute to upstream communities such as arrow2 / Tokio and other open-source libraries, but have also open sourced some common components as independent projects on GitHub (openraft, opendal, opencache, opensrv, etc.).&lt;/p&gt;

&lt;p&gt;Databend is a cloud native elastic database. We not only separated computing and storage, but also carefully designed each layer to obtain extreme elasticity. Databend can be divided into three layers: meta-service layer, query layer and storage layer. These three layers can be flexibly expanded, which means that users can choose the most suitable cluster size for their business and scale the cluster according to the development of business.&lt;/p&gt;

&lt;p&gt;Next, we will introduce the main code modules of Databend from the perspective of these three layers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Modules
&lt;/h2&gt;

&lt;h3&gt;
  
  
  MetaService Layer
&lt;/h3&gt;

&lt;p&gt;MetaService layer is mainly used to store and read persistent meta-data information, such as Catalogs / Users.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;th&gt;Usage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;meta&lt;/td&gt;
&lt;td&gt;The MetaService service is deployed as an independent process and can be deployed as a multi-node cluster. The bottom layer uses Raft for distributed consensus, and query nodes communicate with MetaService over gRPC.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;meta/types&lt;/td&gt;
&lt;td&gt;Definitions of the various structures stored in the MetaService layer. Since these structures eventually need to be persisted, their serialization also needs to be considered. Currently Protobuf is used for serialization and deserialization; the conversion code between the related Rust structs and Protobuf is defined in the common/proto-conv subdirectory.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;meta/sled-store&lt;/td&gt;
&lt;td&gt;Currently, sled is used in MetaService layer to save persistent data. The interfaces related to sled are encapsulated in this subdirectory.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;meta/raft-store&lt;/td&gt;
&lt;td&gt;The openraft user layer needs to implement a storage interface to persist Raft data. This subdirectory is the MetaService implementation of that openraft storage layer, built on sled storage; it also implements the state machine that the openraft user layer needs to customize.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;meta/api&lt;/td&gt;
&lt;td&gt;The user-level API exposed to query nodes, implemented on top of kvapi.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;common/meta/grpc&lt;/td&gt;
&lt;td&gt;A client module encapsulated by grpc, the MetaService client uses this to communicate with the MetaService.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;raft&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/datafuselabs/openraft"&gt;https://github.com/datafuselabs/openraft&lt;/a&gt;, a full asynchronous Raft library derived from the async-raft project.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
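
&lt;p&gt;To make the idea of “metadata as persistent key-value state” concrete, here is a deliberately simplified, hypothetical Rust sketch. It is not Databend's actual kvapi (which is asynchronous, gRPC-backed, and versioned); it only illustrates storing and reading serialized metadata, such as a database definition, under structured keys.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;use std::collections::BTreeMap;

// A toy, in-memory stand-in for a metadata key-value store. The key layout and
// type names are illustrative only, not Databend's actual scheme.
#[derive(Default)]
struct MetaKv {
    kv: BTreeMap&amp;lt;String, Vec&amp;lt;u8&amp;gt;&amp;gt;,
}

impl MetaKv {
    fn upsert(&amp;amp;mut self, key: &amp;amp;str, value: Vec&amp;lt;u8&amp;gt;) {
        self.kv.insert(key.to_string(), value);
    }

    fn get(&amp;amp;self, key: &amp;amp;str) -&amp;gt; Option&amp;lt;&amp;amp;Vec&amp;lt;u8&amp;gt;&amp;gt; {
        self.kv.get(key)
    }
}

fn main() {
    let mut meta = MetaKv::default();
    // A catalog entry (for example a database definition) is stored as a
    // serialized value under a structured key.
    meta.upsert("database/default/db1", b"serialized DatabaseMeta".to_vec());
    assert!(meta.get("database/default/db1").is_some());
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;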

&lt;h3&gt;
  
  
  Query Layer
&lt;/h3&gt;

&lt;p&gt;Query nodes are mainly responsible for computation. Multiple query nodes can form an MPP cluster, and performance theoretically scales horizontally with the number of query nodes. A SQL statement goes through the following conversions inside query:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--YwjX-4b0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/co3klwer1vgclzqib7v6.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--YwjX-4b0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/co3klwer1vgclzqib7v6.PNG" alt="Image description" width="880" height="137"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A SQL string is first parsed into an AST by the parser, then bound with catalog and other information and turned into a logical plan by the Binder. Next, the logical plan is converted into a physical plan through a series of optimizer passes. Finally, the physical plan is traversed to build the corresponding execution pipeline. The modules involved in the query layer are as follows (a simplified sketch of this flow appears after the table):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Package&lt;/th&gt;
&lt;th&gt;Usage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;query&lt;/td&gt;
&lt;td&gt;The query service; the entry point of the whole service is in bin/databend-query.rs, and it contains a number of sub-modules. Some important ones: (1) api - external HTTP/RPC interfaces, (2) catalogs - catalog management; the default catalog (stored in MetaService) and the hive catalog (stored in the Hive metastore) are supported at present, (3) clusters - query cluster management, (4) config - query configuration, (5) databases - database engines supported by query, (6) evaluator - expression evaluation tools, (7) interpreters - the SQL executors, which perform physical execution after the plan is built, (8) pipelines - the scheduling framework for physical operators, (9) servers - exposed services, including ClickHouse/MySQL/HTTP, etc., (10) sessions - session management, (11) sql - the new planner design, new binder logic, and new optimizer design, (12) storages - table engines, the most common being the fuse engine, (13) table_functions - table functions, such as numbers.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;common/ast&lt;/td&gt;
&lt;td&gt;New SQL parser implemented based on nom_rule.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;common/datavalues&lt;/td&gt;
&lt;td&gt;The definition of various columns, representing the layout of data in memory. This part will be gradually migrated to common/expressions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;common/datablocks&lt;/td&gt;
&lt;td&gt;DataBlock represents a set of columns and encapsulates some common methods. This part will be gradually migrated to common/expressions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;common/functions&lt;/td&gt;
&lt;td&gt;Declarations of scalar functions and aggregate functions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;common/hashtable&lt;/td&gt;
&lt;td&gt;The implementation of a linear-probing hash table, mainly used for GROUP BY aggregation and joins.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;common/formats&lt;/td&gt;
&lt;td&gt;Serialization and deserialization of external data in various formats, such as CSV/TSV/Json formats.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;opensrv&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/datafuselabs/opensrv"&gt;https://github.com/datafuselabs/opensrv&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
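
&lt;p&gt;As a back-of-the-envelope illustration of the parse, bind, optimize, and execute flow described above, here is a minimal Rust sketch. Every type and function name in it is a placeholder; Databend's real planner and pipeline framework are far richer.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// A deliberately simplified sketch of the SQL processing flow: parse the text,
// bind it against a catalog, optimize the plan, then build an executable pipeline.
// All types here are stubs; none of this is Databend's actual code.

#[derive(Debug)]
struct Ast(String); // parsed syntax tree (stubbed)
#[derive(Debug)]
struct LogicalPlan(String); // plan bound against the catalog
#[derive(Debug)]
struct PhysicalPlan(String); // plan after optimizer rewrites
#[derive(Debug)]
struct Pipeline(String); // executable operator pipeline

fn parse(sql: &amp;amp;str) -&amp;gt; Ast {
    Ast(format!("ast({sql})"))
}

fn bind(ast: Ast) -&amp;gt; LogicalPlan {
    // In a real system this step resolves tables and columns via the catalog.
    LogicalPlan(format!("logical({:?})", ast))
}

fn optimize(plan: LogicalPlan) -&amp;gt; PhysicalPlan {
    // A sequence of rule-based / cost-based rewrites would run here.
    PhysicalPlan(format!("physical({:?})", plan))
}

fn build_pipeline(plan: PhysicalPlan) -&amp;gt; Pipeline {
    Pipeline(format!("pipeline({:?})", plan))
}

fn main() {
    let sql = "SELECT number FROM numbers(10)";
    let pipeline = build_pipeline(optimize(bind(parse(sql))));
    println!("{:?}", pipeline);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;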

&lt;h3&gt;
  
  
  Storage Layer
&lt;/h3&gt;

&lt;p&gt;The storage layer mainly involves the management of table Snapshots, Segments, and index information, and the interaction with the underlying IO. One of the highlights of the storage layer is an Iceberg-like incremental view implemented on top of snapshot isolation, which gives us time-travel access to a table in any of its historical states (see the sketch below).&lt;/p&gt;
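
&lt;p&gt;The following is an illustrative Rust sketch of that snapshot idea (it does not use Databend's actual storage types): each commit produces an immutable snapshot that points at data segments, and a time-travel query simply picks the latest snapshot at or before the requested timestamp.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Illustrative only: a minimal model of snapshot-based time travel.
#[derive(Debug, Clone)]
struct Snapshot {
    timestamp_ms: u64,     // commit time
    segments: Vec&amp;lt;String&amp;gt;, // immutable segment files referenced by this snapshot
}

#[derive(Default)]
struct TableHistory {
    snapshots: Vec&amp;lt;Snapshot&amp;gt;, // kept in commit order
}

impl TableHistory {
    fn commit(&amp;amp;mut self, timestamp_ms: u64, segments: Vec&amp;lt;String&amp;gt;) {
        self.snapshots.push(Snapshot { timestamp_ms, segments });
    }

    // The snapshot visible "as of" the given time, if any.
    fn snapshot_at(&amp;amp;self, timestamp_ms: u64) -&amp;gt; Option&amp;lt;&amp;amp;Snapshot&amp;gt; {
        self.snapshots.iter().rev().find(|s| s.timestamp_ms &amp;lt;= timestamp_ms)
    }
}

fn main() {
    let mut history = TableHistory::default();
    history.commit(1_000, vec!["seg-1".into()]);
    history.commit(2_000, vec!["seg-1".into(), "seg-2".into()]);

    // Querying "at" t = 1_500 sees only the data referenced by the first snapshot.
    assert_eq!(history.snapshot_at(1_500).unwrap().segments, vec!["seg-1"]);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;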

&lt;h2&gt;
  
  
  Future work
&lt;/h2&gt;

&lt;p&gt;This source code reading series has just started. The following tutorials will explain the source code of each module step by step, in the order introduced above. Most of the tutorials will be articles, and for some important and interesting module designs we may use live video to encourage communication. This is only a preliminary plan; we will take your suggestions and adjust the schedule or content along the way. In any case, we hope this activity attracts more like-minded people to participate in the development of Databend, and to learn, communicate, and grow together.&lt;/p&gt;

&lt;p&gt;About Databend&lt;br&gt;
Databend is an open source modern data warehouse with elasticity and low cost. It can do real-time data analysis on object-based storage. We look forward to your attention and hope to explore the cloud native data warehouse solution, and create a new generation of open source data cloud together.&lt;/p&gt;

&lt;p&gt;Databend documentation：&lt;a href="https://databend.rs/"&gt;https://databend.rs/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Twitter：&lt;a href="https://twitter.com/Datafuse_Labs"&gt;https://twitter.com/Datafuse_Labs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Slack：&lt;a href="https://datafusecloud.slack.com/"&gt;https://datafusecloud.slack.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Wechat：Databend&lt;/p&gt;

&lt;p&gt;GitHub ：&lt;a href="https://github.com/datafuselabs/databend"&gt;https://github.com/datafuselabs/databend&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Databend v0.8</title>
      <dc:creator>Databend</dc:creator>
      <pubDate>Thu, 25 Aug 2022 03:13:55 +0000</pubDate>
      <link>https://dev.to/databend/databend-v08-3986</link>
      <guid>https://dev.to/databend/databend-v08-3986</guid>
      <description>&lt;p&gt;Hello, everyone! I’m Xuanwo. Today, on behalf of the Databend community, I would like to announce the official release of v0.8.&lt;br&gt;
Development of Databend v0.8 started on March 28th, with 5000+ commits and 4600+ file changes. In the last 5 months, the community of 120+ contributors added 420K lines of code and removed 160K lines, equivalent to rewriting Databend once. In this release, the community made significant improvements to the SQL Planner framework and migrated all SQL statements to the new Planner, providing full JOIN and subquery support.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/datafuselabs/databend/releases/tag/v0.8.0-nightly"&gt;Download Latest Version Now&lt;/a&gt; (link to release here)&lt;/p&gt;

&lt;p&gt;Let’s see what has been done in v0.8.&lt;/p&gt;
&lt;h1&gt;
  
  
  What’s Databend?
&lt;/h1&gt;

&lt;p&gt;Databend is a modern cloud data warehouse based on Rust that enables high-performance, elastic and scalable real-time data analysis and activates the data potential of users.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--j8pfoBF1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/w8xskefmbec0h8exfdar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--j8pfoBF1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/w8xskefmbec0h8exfdar.png" alt="Image description" width="880" height="661"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Significant improvements
&lt;/h1&gt;
&lt;h2&gt;
  
  
  New Planner: JOIN! JOIN! JOIN!
&lt;/h2&gt;

&lt;p&gt;To better support complex SQL queries and improve user experience, Databend v0.8 is designed with a new Planner framework.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sXD_onUH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ohn7c9229ho1xi0a0fq4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sXD_onUH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ohn7c9229ho1xi0a0fq4.png" alt="Image description" width="880" height="231"&gt;&lt;/a&gt;&lt;br&gt;
Databend has added JOIN and proper subquery support, driven by New Planner.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_gw_28Yv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/419qtojrkry2q1poz6pl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_gw_28Yv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/419qtojrkry2q1poz6pl.png" alt="Image description" width="880" height="142"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select vip_info.Client_ID, vip_info.Region
      from vip_info right
      join purchase_records
      on vip_info.Client_ID = purchase_records.Client_ID;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  New Parser: The Best Parser！
&lt;/h2&gt;

&lt;p&gt;While refactoring the Planner, the Databend community also implemented a new nom-based Parser that balances development efficiency with user experience. The new Parser makes it easy for developers to design, develop, and test complex SQL syntax in an intuitive way.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xmIW-ph3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/behjaj7oufni8nwbd8gk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xmIW-ph3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/behjaj7oufni8nwbd8gk.png" alt="Image description" width="880" height="223"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;COPY
     ~ INTO ~ #copy_unit
     ~ FROM ~ #copy_unit
     ~ ( FILES ~ "=" ~ "(" ~ #comma_separated_list0(literal_string) ~ ")")?
     ~ ( PATTERN ~ "=" ~ #literal_string)?
     ~ ( FILE_FORMAT ~ "=" ~ #options)?
     ~ ( VALIDATION_MODE ~ "=" ~ #literal_string)?
     ~ ( SIZE_LIMIT ~ "=" ~ #literal_u64)?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It also gives the user specific and precise information about the error.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nSnwUH9i--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/et8iayq720ts4obyqrt2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nSnwUH9i--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/et8iayq720ts4obyqrt2.png" alt="Image description" width="880" height="203"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MySQL [(none)]&amp;gt; select number from numbers(10) as t inner join numbers(30) as t1 using(number); 
ERROR 1105 (HY000): Code: 1065, displayText = error:
   --&amp;gt; SQL:1:8
   |
 1 | select number from numbers(10) as t inner join numbers(30) as t1 using(number)
   |        ^^^^^^ column reference is ambiguous
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No more worrying about not knowing what’s wrong with SQL.&lt;/p&gt;

&lt;p&gt;Visit &lt;a href="https://databend.rs/blog/new-planner"&gt;The New Databend SQL Planner&lt;/a&gt; for more information.&lt;/p&gt;

&lt;h1&gt;
  
  
  New Features
&lt;/h1&gt;

&lt;p&gt;In addition to the newly designed Planner, the Databend community has implemented a number of new features.&lt;/p&gt;

&lt;h2&gt;
  
  
  COPY Enhancement
&lt;/h2&gt;

&lt;p&gt;COPY capabilities have been greatly enhanced, and Databend can now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Copy data from any supported storage service (even https!)
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rEcg5MlG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kwty26bve4cdrhkyzddd.png" alt="Image description" width="880" height="142"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;COPY
      INTO ontime200
      FROM 'https://repo.databend.rs/dataset/stateful/ontime_2006_[200-300].csv'      
  FILE_FORMAT = (TYPE = 'CSV')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Support for copying compressed files
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--vCFJDD0K--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4dgdns1ohyaftbngjqw9.png" alt="Image description" width="880" height="142"&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;COPY
      INTO ontime200
      FROM 's3://bucket/dataset/stateful/ontime.csv.gz'
      FILE_FORMAT = (TYPE = 'CSV' COMPRESSION=AUTO)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;UNLOAD data to any supported storage service&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--NBupD92n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jeexkccadvy2xl33zhbp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NBupD92n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jeexkccadvy2xl33zhbp.png" alt="Image description" width="880" height="142"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;COPY
      INTO 'azblob://bucket/'
       FROM ontime200
       FILE_FORMAT = (TYPE = 'PARQUET')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Hive Support
&lt;/h2&gt;

&lt;p&gt;Databend v0.8 designed and developed the Multi Catalog and implemented Hive Metastore support on top of it!&lt;br&gt;
Databend can now interface directly to Hive and read data from HDFS.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gv2CGe-I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sim6ajs5oek4sc5pj965.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gv2CGe-I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sim6ajs5oek4sc5pj965.png" alt="Image description" width="880" height="80"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select * from hive.default.customer_p2 order by c_nation;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Time Travel
&lt;/h2&gt;

&lt;p&gt;A long time ago, the Databend community shared an implementation of the underlying FUSE Engine, &lt;a href="https://databend.rs/blog/databend-engine"&gt;From Git to Fuse Engine&lt;/a&gt;, where one of the most important features was the support for time travel, allowing us to query data tables at any point in time.&lt;br&gt;
Starting from v0.8, this feature is officially available, and we can now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Query the data table for a specified time
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--alQbjcg9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yddbu6kmrzggtr5kh4bo.png" alt="Image description" width="880" height="243"&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Travel to the time when the last row was inserted 
select * from demo at (TIMESTAMP =&amp;gt; '2022-06-22 08:58:54.509008'::TIMESTAMP);  
+----------+ 
| c        | 
+----------+ 
| batch1.1 | 
| batch1.2 | 
| batch2.1 | 
+----------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Recover mistakenly deleted data tables
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--inwtAth2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/avjjre9rlzypfccotzyz.png" alt="Image description" width="880" height="366"&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DROP TABLE test;

SELECT * FROM test; 
ERROR 1105 (HY000): Code: 1025, displayText = Unknown table 'test'.  

-- un-drop table 
UNDROP TABLE test; 

-- check 
SELECT * FROM test; 
+------+------+ 
| a    | b    | 
+------+------+ 
|    1 | a    | 
+------+------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This gives business data an extra layer of safety!&lt;/p&gt;
&lt;h2&gt;
  
  
  CTE Support
&lt;/h2&gt;

&lt;p&gt;CTEs (Common Table Expressions) are a frequently used feature in OLAP workloads: they define a temporary result set that is valid only within the execution of a single statement, enabling reuse of query fragments, improving readability, and making complex queries easier to express.&lt;br&gt;
Databend v0.8 re-implements CTEs on top of the new Planner, and users can now happily use WITH to declare them.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cyoDMSMn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/subge7rfg89fmx8l0mo2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cyoDMSMn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/subge7rfg89fmx8l0mo2.png" alt="Image description" width="880" height="243"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH customers_in_quebec
       AS (SELECT customername,
                  city
           FROM   customers
           WHERE  province = 'Québec')  
SELECT customername  
FROM   customers_in_quebec 
WHERE  city = 'Montréal'  
ORDER  BY customername;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In addition to these features mentioned above, Databend v0.8 also supports UDFs, adds DELETE statements, further enhances support for semi-structured data types, not to mention the numerous SQL statement improvements and new methods added. Thanks to all the contributors to the Databend community, without you all the new features mentioned here would not have been possible!&lt;/p&gt;

&lt;h1&gt;
  
  
  Quality Enhancement
&lt;/h1&gt;

&lt;p&gt;Feature implementation is just the first part of product delivery. In Databend v0.8, the community introduced the concept of engineering quality, which evaluates the quality of Databend development in three dimensions: users, contributors, and community.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reassuring users
&lt;/h2&gt;

&lt;p&gt;So that users can use Databend with confidence, the community has added a lot of tests over the last three months: fetching stateless test sets from YDB and others, adding stateful tests for the ontime, hits, and other datasets, putting SQL Logic Tests online to cover all interfaces, and enabling SQL fuzz testing to cover boundary cases. Furthermore, the community has also gone live with Databend Perf to do continuous performance testing of Databend in production environments and catch unexpected performance regressions in time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Make contributors comfortable
&lt;/h2&gt;

&lt;p&gt;Databend is a large Rust project that has been criticized by the community for its build time. To improve this and make contributors comfortable, the community went live with a highly configurable, specially tuned self-hosted runner to run integration tests for PRs, and enabled several services and tools such as Mergify, mold, and dev-tools to optimize the CI process. We also initiated a plan to restructure the Databend project, splitting the original huge query crate into multiple sub-crates to avoid, as much as possible, changing one line of code and then waiting five minutes for checks to run.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keeping the community happy
&lt;/h2&gt;

&lt;p&gt;Databend is a contributor and participant in the open source community. During the development of v0.8, the Databend community established the principle of Upstream First, actively following and adopting the latest upstream releases, giving feedback on known bugs, contributing their own patches, and starting Tracking issues of upstream first violation to keep up with the latest developments.&lt;/p&gt;

&lt;p&gt;The Databend community is actively exploring integration with other open source projects and has already implemented integration and support for third-party drivers such as Vector, sqlalchemy, clickhouse-driver, etc.&lt;/p&gt;

&lt;h1&gt;
  
  
  Next Steps
&lt;/h1&gt;

&lt;p&gt;Databend v0.8 is a solid foundation release with a new Planner that makes it easier to implement features and make optimizations. In version 0.9, we expect improvements in the following areas.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Query Result Cache&lt;/li&gt;
&lt;li&gt;JSON Optimization&lt;/li&gt;
&lt;li&gt;Table Share&lt;/li&gt;
&lt;li&gt;Processor Profiling&lt;/li&gt;
&lt;li&gt;Resource Quota&lt;/li&gt;
&lt;li&gt;Data Caching&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Please check the Release proposal: Nightly v0.9 for the latest news~&lt;/p&gt;

&lt;h1&gt;
  
  
  Get going now!
&lt;/h1&gt;

&lt;p&gt;Visit the release log (link) and download the latest version (link) to learn more, and feel free to submit feedback using Github Issues if you encounter problems!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How to write RFCs for open source projects</title>
      <dc:creator>Databend</dc:creator>
      <pubDate>Fri, 05 Aug 2022 08:29:49 +0000</pubDate>
      <link>https://dev.to/databend/how-to-write-rfcs-for-open-source-projects-lnd</link>
      <guid>https://dev.to/databend/how-to-write-rfcs-for-open-source-projects-lnd</guid>
      <description>&lt;h2&gt;
  
  
  About RFCs
&lt;/h2&gt;

&lt;p&gt;The importance of RFCs has been emphasized by many people. As &lt;a href="https://github.com/tisonkun/_"&gt;@tison&lt;/a&gt; said in &lt;a href="https://zhuanlan.zhihu.com/p/93334196_"&gt;How to Participate in the Apache project community&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;A description is certainly needed for any non-trivial change to explain the motivation. For major changes, design documentation becomes even more necessary, since no one has a permanent memory and people always forget why they did something in the first place. The accumulation of design documentation plays a vital role in freeing the community from the uncertainty that comes with the ever-changing nature of human activity.&lt;/p&gt;

&lt;p&gt;I myself also elaborated my understanding of RFC in &lt;a href="https://xuanwo.io/2022/01-refactor-in-open-source-project/"&gt;How to Refactor in Open Source Projects？&lt;/a&gt;：&lt;/p&gt;

&lt;p&gt;It takes more than qualified code to make a good open source project; talking only about abstract technology and code while leaving the open source community aside would be meaningless. Therefore, we must clarify our ideas and explain our motives before submitting large-scale changes to open source projects, so that the community understands what contributions we want to make and how we plan to make them.&lt;/p&gt;

&lt;p&gt;These archived documents can help supplement information, improve ideas, and build better designs during discussion. From a long-term perspective, these documents can also help latecomers understand why such a design was proposed at that time, and help promote the community more efficiently. Moreover, good design documentation can often influence and inspire the design of other open source projects, thus promoting the progress of the entire industry.&lt;/p&gt;

&lt;p&gt;The following examples can prove that open source projects that work well often have sound RFC processes, and the relationship between them is complementary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/rust-lang/rfcs"&gt;Rust RFCs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://forum.cockroachlabs.com/c/open-source-contributors/rfc/6"&gt;CockroachDB RFCs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ethereum/EIPs"&gt;thereum EIPs&lt;/a&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So here comes the question:&lt;/p&gt;

&lt;h2&gt;
  
  
  How to write an RFC
&lt;/h2&gt;

&lt;p&gt;In my opinion, writing RFCs is a very natural thing to do. The essential reason why some people find it hard is often a lack of sufficient preparation: it is easy to put forward a new idea, but transforming the idea into a feasible solution takes hard work, and RFCs are the embodiment of this transformation process. I usually go through the following steps when writing an RFC:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Collect background information&lt;/li&gt;
&lt;li&gt;Analyze feasible schemes&lt;/li&gt;
&lt;li&gt;Write an RFC&lt;/li&gt;
&lt;li&gt;Discuss among the community&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Collect background information
&lt;/h2&gt;

&lt;p&gt;The most laborious and often overlooked step is to collect background information.&lt;/p&gt;

&lt;p&gt;After coming up with a good idea, we need to refer to the historical RFCs and relevant issues/PR to make sure whether this idea is feasible or not. We need to collect enough information to answer the following questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How do the existing related modules work? What is the problem to be solved now? Is it really necessary to make further changes?&lt;/li&gt;
&lt;li&gt;Is there any similar work in related fields, and how’s that going? Is there any similar experience for reference in other projects?&lt;/li&gt;
&lt;li&gt;Has this idea been considered before, and why was it shelved? What has changed now?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This process of collecting background information allows us to get more context information and have a better understanding about relevant modules, and thus avoid proposing improvements which are practically infeasible. Besides, collecting background information can help prevent us from carrying out repeated work, after all there is nothing new under the sun.&lt;/p&gt;

&lt;p&gt;I made the following preparations before proposing the RFC “Config Backward Compatibility”:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Communicate one-on-one with users who raised relevant questions to understand their demands and needs&lt;/li&gt;
&lt;li&gt;Understand the core logic of how Databend implements config processing&lt;/li&gt;
&lt;li&gt;Find out how similar projects implement config compatibility&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Analyze feasible schemes
&lt;/h2&gt;

&lt;p&gt;After fully understanding the background information, what to do next is to analyze feasible schemes.&lt;/p&gt;

&lt;p&gt;There are often many possible solutions for a technical problem, so rather than trying to find a definite answer, we need to analyze and investigate among various schemes and select a relatively better one. Sometimes a simple demo is necessary to help verify your ideas.&lt;/p&gt;

&lt;p&gt;Here are several mistakes I often make in this process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Laziness: Knowing that there are other possible solutions and not taking the initiative to investigate them. This usually leads to being called out by the community at the discussion stage and having to make additional explanations; going back and forth costs much more time than thinking things through at the beginning.&lt;/li&gt;
&lt;li&gt;Path dependence: Focusing too much on how to fix the problem that appeared, and failing to step outside the box to find better solutions.&lt;/li&gt;
&lt;li&gt;Self-defense: The moment we come up with an idea, a rough design also forms in our minds, and we lean toward that existing design when doing the research. Even if it has serious defects, we are often unwilling to give it up and instead bolt on unreasonable workarounds.&lt;/li&gt;
&lt;li&gt;Implementation before design: The implementation is already finished before the investigation, so the analysis of alternatives degenerates into a discussion and justification of the specific implementation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These mistakes often introduce obvious bias into the RFC that follows, and thus biased conclusions: either the RFC ends up being heavily revised, or, in more serious cases, there are major disagreements and even verbal conflicts with the maintainers. To avoid such problems as much as possible, we should strive for a fair and objective evaluation of the different implementation schemes. Of course everyone has their own technical preferences, so there is no need to be overly strict about fairness, as long as the advantages and disadvantages are clearly analyzed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Write an RFC
&lt;/h2&gt;

&lt;p&gt;Now that the preparatory work has been completed, we enter the stage of writing RFCs.&lt;/p&gt;

&lt;p&gt;Usually every project has its own format and requirements, so the way to write an RFC varies accordingly.&lt;/p&gt;

&lt;p&gt;Most RFCs include the following chapters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Background/ Motivation&lt;/li&gt;
&lt;li&gt;Detailed description&lt;/li&gt;
&lt;li&gt;Basic principles&lt;/li&gt;
&lt;li&gt;Unsolved problems&lt;/li&gt;
&lt;li&gt;Future possibilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Background / Motivation&lt;/strong&gt;: In this chapter we need to explain the background and motivation of the change and why it should be made. The specific description varies with the situation. For example, when adding a new feature we should explain why the existing functionality cannot meet the requirements, what usage scenarios it will support, and what the expected outcome is; when refactoring or modifying existing functionality, we should focus on what problems exist in the current implementation and what attempts have been made before.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Detailed description&lt;/strong&gt;: This chapter specifies how the project should be changed, and it too should be adapted to the specific proposal. Taking Rust as an example, this chapter has two main parts: a guide-level explanation and a reference-level explanation.&lt;/p&gt;

&lt;p&gt;Guide-level explanation: This part explains what changes this change will bring from the perspective of users, what new concepts have been introduced and what new syntax has been added.&lt;/p&gt;

&lt;p&gt;Reference-level explanation: This part explains how to implement the change from a technical point of view, how it interacts with other features, and which edge cases need to be considered. Remember not to paste too many implementation details in this chapter; a brief explanation of the ideas plus some core code is enough. This way reviewers can grasp the macro design rather than get caught up in implementation details.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Basic principles&lt;/strong&gt; This chapter should explain why the change is implemented in this particular way.&lt;/p&gt;

&lt;p&gt;Are there any other implementation schemes? What are the advantages and disadvantages of each, and what reasons drove us to adopt scheme A instead of scheme B? What prior art exists? How do other projects implement this, and what were their respective starting points?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unsolved problems&lt;/strong&gt; This chapter explains the limitations of this change. For example, problems raised in the review that cannot be solved at present can be recorded here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Future possibilities&lt;/strong&gt; This chapter records the future possibilities of this change that have been considered but are not implemented this time.&lt;/p&gt;

&lt;p&gt;For example, how a newly introduced feature could be combined with other features in the future. Or, when someone puts forward ideas beyond the scope of the current RFC during review, those ideas can be recorded in this chapter for future reference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discuss among the community
&lt;/h2&gt;

&lt;p&gt;After completing the RFC, we can propose it to the community for discussion.&lt;/p&gt;

&lt;p&gt;Of course we don’t want our RFC to be rejected after all this work, so we need to participate in community discussions and actively respond to the concerns of community members. It’s necessary to adjust the implementation plan according to the feedback we receive; common adjustments include supplementing what was not considered, adjusting the implementation approach, splitting the RFC when necessary, implementing only part of the original plan, and so on. Keep in mind that the interests of community members tend to be aligned, and every member’s ultimate goal is to help the project develop. Negative behaviors like taking others’ objections as an attack or passively confronting community members are not conducive to moving the RFC forward.&lt;/p&gt;

&lt;p&gt;The RFC I proposed for Databend, &lt;a href="https://github.com/datafuselabs/databend/pull/5324"&gt;RFC: Config Backward Compatibility&lt;/a&gt;, was originally larger in scope: Versioned Config. In the original RFC, I planned to introduce the concept of versions into the Databend config so that developers could make breaking changes with more confidence. However, community members were concerned about the complexity of the design: they generally believed it was too complex and would introduce a lot of redundant code. Moreover, they pointed out that Databend had not released a stable version yet, so there was no need to introduce a versioned config at that point. Based on the feedback, I made my own &lt;a href="https://github.com/datafuselabs/databend/pull/5324#issuecomment-1125794431"&gt;response&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Comment: Our community members have great concern about the complexity of introducing a versioned config at this stage (no stable release, no production users). It seems better to only split outer and inner configs and leave the decision of versioned config till the future.&lt;/p&gt;

&lt;p&gt;Response: I have updated the RFC and moved content about the versioned config to the Future possibilities part. This change only introduces a reconfiguration, which splits outer and inner configs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After this adjustment, the community gave it an LGTM, the RFC was merged, and it moved on to the implementation stage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Writing RFCs is an effective way to communicate and exchange ideas with the community. By going through the steps of collecting background information, analyzing feasible schemes, writing the RFC, and discussing it with the community, adjusted to the particular project, we can all write a reliable proposal.&lt;/p&gt;

&lt;p&gt;Go submit a proposal for your favorite open source project!&lt;/p&gt;

&lt;h2&gt;
  
  
  About Databend
&lt;/h2&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/datafuselabs/databend"&gt;https://github.com/datafuselabs/databend&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Docs: &lt;a href="https://databend.rs/"&gt;https://databend.rs/&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Excellent Performance of Databend in Data Archiving Analysis</title>
      <dc:creator>Databend</dc:creator>
      <pubDate>Fri, 29 Jul 2022 08:51:54 +0000</pubDate>
      <link>https://dev.to/databend/excellent-performance-of-databend-in-data-archiving-analysis-2clh</link>
      <guid>https://dev.to/databend/excellent-performance-of-databend-in-data-archiving-analysis-2clh</guid>
      <description>&lt;h1&gt;
  
  
  What's Databend?
&lt;/h1&gt;

&lt;p&gt;Databend, developed with Rust, is a new open-source and cloud-native data warehouse. It offers high-speed elastic expansion capabilities and is committed to building an on-demand and volume-based data cloud product experience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Famous open-source cloud data warehouse project.&lt;/li&gt;
&lt;li&gt;Vectorized execution and a pull &amp;amp; push-based processor model.&lt;/li&gt;
&lt;li&gt;Separation of compute and storage: available on-demand with high performance and low cost.&lt;/li&gt;
&lt;li&gt;Support for various protocols: compatible with MySQL, the ClickHouse protocol, SQL over HTTP, etc.&lt;/li&gt;
&lt;li&gt;Complete transaction capabilities: support for Time Travel, Database Clone, Data Share, etc.&lt;/li&gt;
&lt;li&gt;Support for reading, writing, and sharing the same data by multiple tenants.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Databend tutorials: &lt;a href="https://databend.rs/doc/deploy"&gt;https://databend.rs/doc/deploy&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Deploying Databend to Work with Ceph
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TaXJv57N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/na2mkspvs9uj4u1ukjze.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TaXJv57N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/na2mkspvs9uj4u1ukjze.png" alt="Image description" width="880" height="662"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Databend Architecture:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Query node: Processes protocol analysis and SQL push-down.   &lt;/p&gt;

&lt;p&gt;Meta node: Stores metadata on the local disk.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supported storage solutions:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Public cloud: object storage services from AWS, Alibaba Cloud, Tencent Cloud, etc. Self-hosted: S3-compatible products such as MinIO and Ceph.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deployment steps:&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Download the latest binary package.&lt;/li&gt;
&lt;li&gt;Unzip the package and create folders.&lt;/li&gt;
&lt;li&gt;Modify the configuration file.&lt;/li&gt;
&lt;li&gt;Start up Databend.&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Deployment environment:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Operating system: CentOS 7&lt;br&gt;
Ceph version: 12.2.13&lt;br&gt;
Databend version: v0.7.65&lt;/p&gt;

&lt;p&gt;Overall, Databend deployment is fairly straightforward.&lt;/p&gt;

&lt;p&gt;Step 1: Download Databend.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[root@testsrv ~]#
wget https://github.com/datafuselabs/databend/releases/download/v0.7.65-nightly/databend-v0.7.65-nightly-x86_64-unknown-linux-musl.tar.gz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 2: Unzip the package and create folders.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[root@testsrv ~]#
tar -zxvf databend-v0.7.65-nightly-x86_64-unknown-linux-musl.tar.gz
mkdir /usr/local/databend/{bin,data,etc,logs} -p
mv databend-meta /usr/local/databend/bin/ 
mv databend-query /usr/local/databend/bin/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 3: Modify the configuration files for startup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[root@testsrv ~]#  Modify config file for meta node
cd /usr/local/databend/etc/
cat databend-meta.toml
log_dir            = "/usr/local/databend/logs/_logs1"
metric_api_address = "0.0.0.0:28100"
admin_api_address  = "0.0.0.0:28101"
grpc_api_address   = "0.0.0.0:9191"
[raft_config]
id            = 1
raft_dir ="/usr/local/databend/data/_meta1"
raft_api_port = 28103
# Use the IP of this machine
raft_listen_host = "172.16.16.12"
raft_advertise_host = "172.16.16.12"
# Start up mode: single node or cluster
single        = true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[root@testsrv ~]# Modify config file for query node
cat databend-query-node-1.toml
[query]
max_active_sessions = 256
wait_timeout_mills = 5000
# For flight rpc. Use the IP and ports of the current machine
flight_api_address = "&amp;lt;IP of this machine&amp;gt;:9091"
# Databend Query http address.
# For admin REST API.
http_api_address = "0.0.0.0:8081"
# Databend Query metrics REST API.
metric_api_address = "0.0.0.0:7071"
# Databend Query MySQL Handler.
mysql_handler_host = "0.0.0.0"
mysql_handler_port = 3307
# Databend Query ClickHouse Handler.
clickhouse_handler_host = "0.0.0.0"
clickhouse_handler_port = 9001
# Databend Query HTTP Handler.
http_handler_host = "0.0.0.0"
http_handler_port = 8000
tenant_id = "test_tenant"
cluster_id = "test_cluster"
table_engine_memory_enabled = true
table_engine_csv_enabled = true
table_engine_parquet_enabled = true
database_engine_github_enabled = true
table_cache_enabled = true
table_memory_cache_mb_size = 1024
table_disk_cache_root = "/usr/local/databend/data/_cache"
table_disk_cache_mb_size = 10240
[log]
log_level = "DEBUG"
log_dir = "/usr/local/databend/logs/_logs"
[meta]
# To enable embedded meta-store, set meta_address to ""
meta_embedded_dir = "/usr/local/databend/data/_meta_embedded_1"
meta_address = "0.0.0.0:9191"
meta_username = "root"
meta_password = "root"
meta_client_timeout_in_second = 60
# Storage config.
[storage]
storage_type = "s3"
# DISK storage.
[storage.disk]
data_path = "/usr/local/databend/data/stateless_test_data"
# S3 storage. To use S3, set storage_type to "s3".
[storage.s3]
bucket="databend"
region="region"
endpoint_url="&amp;lt;Your Ceph S3 address&amp;gt;"
access_key_id="&amp;lt;Your Ceph S3 key id&amp;gt;"
secret_access_key="&amp;lt;Your Ceph S3 access key&amp;gt;"
# Azure storage
[storage.azure_storage_blob]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[root@testsrv ~]# # Modify start.sh
cd /usr/local/databend/bin
[root@testsrv ~]# cat start.sh
ulimit -n 65535
cd /usr/local/databend/
nohup /usr/local/databend/bin/databend-meta --config-file=/usr/local/databend/etc/databend-meta.toml 2&amp;gt;&amp;amp;1 &amp;gt;meta.log &amp;amp;
sleep 3
nohup /usr/local/databend/bin/databend-query --config-file=/usr/local/databend/etc/databend-query-node-1.toml 2&amp;gt;&amp;amp;1 &amp;gt;query.log &amp;amp;
cd -
echo "Usage: mysql -h127.0.0.1 -P3307 -uroot"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 4: Start Databend&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[root@testsrv ~]#
 bash start.sh&amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 5: Verify that Databend was deployed successfully to work with Ceph&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[root@testsrv ~]# mysql -h127.0.0.1 -P3307 -uroot   # No password by default
-- Execute SQL statements
'root'@127.0.0.1 18:59:  [(none)]&amp;gt; select * from system.configs;
-- Deployment is successful if the Ceph address and keys are displayed
s3.region              
s3.endpoint_url        
s3.access_key_id       
s3.secret_access_key   
s3.bucket
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Excellent Performance of Databend in Data Archiving Analysis
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Archiving options
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A large amount of historical data (such as logs and transaction flows) persisting in MySQL will occupy a lot of storage space and can affect your business performance (for example, it might cause jitter). However, you cannot permanently delete the historical data because you might want to read it for analysis purposes at a later time. &lt;br&gt;
You might want, for example, to calculate a total under a certain condition for a certain month of the year 2000, so it is necessary to consider regularly archiving the data. There are many ways to do it: you can use the pt-archiver tool or an archiving script developed by a DBA. The archiving target you choose must meet the following conditions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compatible with the MySQL protocol, to minimize business logic changes. Other protocols might require big changes.&lt;/li&gt;
&lt;li&gt;A high compression ratio, to save storage costs.&lt;/li&gt;
&lt;li&gt;Support for data calculation and analysis.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here are some options for you: MySQL (separate archive cluster), Databend, etc.&lt;/p&gt;
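
&lt;p&gt;Whichever option you choose, the goal is to keep queries like the following possible on the archived data. This is only an illustrative sketch: the table and column names (orders_archive, amount, order_date, status) are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Hypothetical archive-analysis query: total paid amount for March 2000.
SELECT COUNT(*) AS order_count, SUM(amount) AS total_amount
FROM orders_archive
WHERE order_date &amp;gt;= '2000-03-01'
  AND order_date &amp;lt; '2000-04-01'
  AND status = 'PAID';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;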

&lt;ul&gt;
&lt;li&gt;Comparing data compressions
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We generated 200 million rows of data and imported them into MySQL and Databend respectively to compare the physical sizes after compression. The table below shows that you can have a better compression ratio when using Databend.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Physical Size&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SQL&lt;/td&gt;
&lt;td&gt;88 G&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CSV&lt;/td&gt;
&lt;td&gt;84 G&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Databend&lt;/td&gt;
&lt;td&gt;8 G&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MySQL&lt;/td&gt;
&lt;td&gt;47 G&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
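
&lt;p&gt;For reference, one standard way to check the physical size on the MySQL side is an information_schema query like the sketch below; the database and table names here are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Check the on-disk size of a table in MySQL (names are hypothetical).
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb
FROM information_schema.tables
WHERE table_schema = 'archive_db'
  AND table_name   = 'ontime';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;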

&lt;ul&gt;
&lt;li&gt;Data query tests
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This section compares the response time of MySQL and Databend for SQL queries.   &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test environments:&lt;/strong&gt;&lt;br&gt;
Server hardware: 40 cores, 256 GB RAM, SSD hard disk.&lt;br&gt;
MySQL: InnoDB buffer pool set to 100 GB, on SSD.&lt;br&gt;
Databend: default configuration; the S3 service runs on a server with HDD.&lt;/p&gt;

&lt;p&gt;select count(*) from ontime;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Execution Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Databend-hdd&lt;/td&gt;
&lt;td&gt;0.02 sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Databend-ssd&lt;/td&gt;
&lt;td&gt;0.04 sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MySQL&lt;/td&gt;
&lt;td&gt;4 min 9.05 sec&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;select count(*),Year from ontime group by Year; (No Indexes)&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Execution Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Databend&lt;/td&gt;
&lt;td&gt;1.89 sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MySQL&lt;/td&gt;
&lt;td&gt;5 min 19.20 sec&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;select count(*),Year from ontime group by Year; (indexed by Year)&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Execution Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Databend-hdd&lt;/td&gt;
&lt;td&gt;0.56 sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Databend-ssd&lt;/td&gt;
&lt;td&gt;1.89 sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MySQL&lt;/td&gt;
&lt;td&gt;2 min 46.72 sec&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Conclusion: Compared with MySQL, Databend has a clear advantage in response time for analytical SQL queries.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compatible with MySQL protocol
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Databend supports the MySQL protocol, the ClickHouse protocol, and an HTTP protocol. Programs that use MySQL are basically compatible with Databend.&lt;/p&gt;

&lt;h1&gt;
  
  
  Summary
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Archiving&lt;br&gt;
Databend has advantages over MySQL in data archiving, such as data compression and SQL queries for data analysis. Databend works with mechanical hard drives, so it doesn't need very good hardware to get good results. I would recommend Databend for data archiving analysis. If you only need data archiving now, Databend can help lower your cost. If you need data analysis at a later time, remember that Databend is easy to scale up and down.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deployment&lt;br&gt;
Deploying Databend to work with Ceph is easy. Databend has more advantages over traditional databases in the cloud-native computing scenarios.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;High availability&lt;br&gt;
The Query node is a stateless node. Metadata must be kept properly, and you can create copies of metadata to keep it from being lost. In my opinion, metadata can be saved to the storage layer for real-time backup, which is only used for emergency recovery.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Storage layer&lt;br&gt;
Cloud scenarios ensure high availability of the object storage layer. For private cloud environments, you need to ensure the high availability of Ceph or MinIO yourself. We're improving Databend to support Kubernetes out of the box and provide a real pay-as-you-go experience.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Author: Shuaimeng Tian, Senior DBA&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The New Databend SQL Planner</title>
      <dc:creator>Databend</dc:creator>
      <pubDate>Fri, 15 Jul 2022 10:39:12 +0000</pubDate>
      <link>https://dev.to/databend/the-new-databend-sql-planner-1kcj</link>
      <guid>https://dev.to/databend/the-new-databend-sql-planner-1kcj</guid>
      <description>&lt;p&gt;To support complex SQL queries and improve user experience, a large-scale refactoring work for Databend's SQL planner was started several months ago. At present, the refactoring is coming to an end. You can now modify the Session settings of Databend as follows to enable the new planner for early access:&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ar9zHb_I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1ogkpix8ro9w6f7jed53.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ar9zHb_I--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1ogkpix8ro9w6f7jed53.png" alt="Image description" width="880" height="77"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;h2&gt;
  
  
  Feature Highlights
&lt;/h2&gt;

&lt;h3&gt;
  
  
  A more friendly query experience
&lt;/h3&gt;

&lt;p&gt;Data analysts and developers usually get various errors when coding SQL queries, and troubleshooting can be a nightmare when the queries are complex. I hate MySQL's error prompts, having once coded a query with dozens of JOIN clauses. The new planner now includes a step for strict semantic checking so that most errors can be intercepted during compilation. A new error prompt algorithm was also introduced to help users locate the errors. When there is invalid syntax in your SQL query (for example, misspelled keywords or missing clauses), you will receive an error message that is more instructive.&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iMRe2LEn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/q7lhafrzqrtgl8coyvaz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iMRe2LEn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/q7lhafrzqrtgl8coyvaz.png" alt="Image description" width="880" height="234"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If your SQL query has a semantic error (for example, you reference a column that is ambiguous, or a column does not exist at all), Databend can help you locate it.&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1dD4-XjQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nidzewiab779csb7lcw3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1dD4-XjQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nidzewiab779csb7lcw3.png" alt="Image description" width="880" height="314"&gt;&lt;/a&gt;&lt;/p&gt;
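
&lt;p&gt;As a textual version of the screenshot above, a query against a hypothetical table like the one below is rejected during compilation, and the error message points at the unknown column:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Hypothetical table; "ag" is a typo for "age".
CREATE TABLE t_user (id INT, age INT);

-- The semantic check rejects this query and highlights the column "ag".
SELECT id, ag FROM t_user;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;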

&lt;p&gt;You can also get a better experience when coding complex queries:&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QDtbhqj---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7tc93x27ft4x5ld5k1di.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QDtbhqj---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7tc93x27ft4x5ld5k1di.png" alt="Image description" width="880" height="724"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Support for JOIN queries and correlated subqueries
&lt;/h2&gt;

&lt;p&gt;The new SQL planner supports JOIN queries (INNER JOIN, OUTER JOIN, CROSS JOIN) and correlated subqueries, and provides a Hash Join algorithm to execute JOIN queries. For more information about how to use JOIN in Databend, go to &lt;a href="https://databend.rs/doc/reference/sql/query-syntax/dml-join"&gt;https://databend.rs/doc/reference/sql/query-syntax/dml-join&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;JOIN is a very important part of an OLAP query. In traditional star and snowflake schemas, we join dimension tables with fact tables through JOIN queries to generate the resulting reports. The TPC-H Benchmark is a set of OLAP query benchmarks developed by the TPC committee to evaluate the OLAP capabilities of database systems. It contains the following eight tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lineitem: Holds order line item information. &lt;/li&gt;
&lt;li&gt;Orders: Holds order information. &lt;/li&gt;
&lt;li&gt;Customer: Holds customer information. &lt;/li&gt;
&lt;li&gt;Part: Holds parts information.&lt;/li&gt;
&lt;li&gt;Supplier: Holds supplier information.&lt;/li&gt;
&lt;li&gt;Partsupp: Parts-Supplier Relationship Table&lt;/li&gt;
&lt;li&gt;Nation: Holds nation information.&lt;/li&gt;
&lt;li&gt;Region: Holds region information. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;TPC-H includes 22 complex queries, corresponding to different business needs. The new SQL planner now supports the Q9 query that calculates the profit amount for a specified year and region using a large number of JOIN calculations:&lt;br&gt;
 &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Z6b8KdKo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/m966qcnoibbdwg17ampi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Z6b8KdKo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/m966qcnoibbdwg17ampi.png" alt="Image description" width="880" height="1177"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Correlated subqueries are also an essential part of SQL for coding complex queries. The Q4 query of TPC-H shows the order delivery status of various priority levels over a period of time and uses a correlated subquery with the EXISTS clause to filter overdue orders:&lt;br&gt;
 &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--X6S7vvbb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ve5b94qewf1sdlw8zfgj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--X6S7vvbb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ve5b94qewf1sdlw8zfgj.png" alt="Image description" width="880" height="645"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Brand new architecture
&lt;/h2&gt;

&lt;p&gt;We redesigned the process of SQL parsing for the new SQL planner to support more complex semantic analysis and SQL optimization. After the client sends a SQL statement to the databend-query server, the components in the new SQL planner process the SQL statement in the order shown in the flowchart below before returning the query result to the client:&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gYIdgdF4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/47xwo2u1rh5r8ugbe2h8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gYIdgdF4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/47xwo2u1rh5r8ugbe2h8.png" alt="Image description" width="880" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Parser starts to parse a SQL query after receiving it. If a syntax error is found during parsing, the error information is returned directly to the client; if the parsing succeeds, an AST (Abstract Syntax Tree) for the query is constructed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Parser
&lt;/h3&gt;

&lt;p&gt;To provide more powerful syntax analysis functions and a better development experience, we have developed a DSL (Domain Specific Language) nom-rule based on the nom parser combinator and rewritten the SQL Parser based on this framework. With this framework, we can easily define the syntax for a statement. Taking the CREATE TABLE statement as an example, we can use the DSL to describe it as follows:&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--M09Na1fz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5njpdllc4otyapswf625.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--M09Na1fz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5njpdllc4otyapswf625.png" alt="Image description" width="880" height="77"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The elegant syntax brings more fun to the work of coding a parser. Try it out if you’re interested.&lt;/p&gt;

&lt;h3&gt;
  
  
  Binder
&lt;/h3&gt;

&lt;p&gt;After the AST is successfully parsed by the Parser, we will semantically analyze it through Binder and generate an initial logical plan. During this process, we perform different types of semantic analysis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Name resolution: Check the validity of the variables referenced in the SQL query by querying the relevant table and column object information in the Databend Catalog and bind the valid variables to their corresponding objects for subsequent analysis.&lt;/li&gt;
&lt;li&gt;Type check: Check the validity of the expression according to the information obtained in the name resolution, and find a proper return type for the expression.&lt;/li&gt;
&lt;li&gt;Subquery unnesting: Extract the subquery from the expression and translate it into relational algebra.&lt;/li&gt;
&lt;li&gt;Grouping check: For queries with aggregate calculations, check whether non-aggregate columns are referenced.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With semantic analysis, we can eliminate most semantic errors and return them to the user during compilation to provide the best troubleshooting experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  Optimizer
&lt;/h3&gt;

&lt;p&gt;After getting the initial logical plan, the optimizer will rewrite and optimize it and, finally, generate an executable physical plan. The new planner introduced a set of Transformer Rule-based optimizer frameworks (Volcano/Cascades). An independent rule can be implemented by defining a relational algebra subtree pattern with its related transform logic. Take Predicate Push Down as a simple example:&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--n5RNmjSr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/n347gvsobne36horz74l.png" alt="Image description" width="786" height="606"&gt;&lt;/p&gt;

&lt;p&gt;We only need to define the pattern of the input plan:&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7h24czfz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/16l6547c5tojnetr0gaq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7h24czfz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/16l6547c5tojnetr0gaq.png" alt="Image description" width="880" height="487"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And then implement a conversion function:&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AhV8ShL8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1mbvdwdlwbhks3mrj15z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AhV8ShL8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1mbvdwdlwbhks3mrj15z.png" alt="Image description" width="880" height="298"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Interpreter
&lt;/h3&gt;

&lt;p&gt;After the physical plan is generated by the Optimizer, we will translate it into an executable pipeline and hand it over to Databend's processor execution framework for calculation. &lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Building a SQL planner from the ground up is a very challenging job, but the redesign and development let us find the most suitable architecture and functionalities for the system. In the future, we will continue to improve and consolidate the new SQL planner on these functions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cost-based Optimization (CBO)&lt;/li&gt;
&lt;li&gt;Distributed query optimization&lt;/li&gt;
&lt;li&gt;More optimization rules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Currently, we’re in the middle of migrating to the new SQL planner. We will release an announcement when the migration is complete (around July 2022). Stay tuned.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Archiving and Analyzing MySQL Data Using Databend — a Cloud Native Data Warehouse</title>
      <dc:creator>Databend</dc:creator>
      <pubDate>Fri, 08 Jul 2022 10:23:53 +0000</pubDate>
      <link>https://dev.to/databend/archiving-and-analyzing-mysql-data-using-databend-a-cloud-native-data-warehouse-1p3d</link>
      <guid>https://dev.to/databend/archiving-and-analyzing-mysql-data-using-databend-a-cloud-native-data-warehouse-1p3d</guid>
      <description>&lt;h2&gt;
  
  
  Requirement analysis on archiving MySQL data
&lt;/h2&gt;

&lt;p&gt;MySQL is commonly used for OLTP to provide external services. With adequate hardware resources, the amount of data served can often reach the TB level. In many scenarios, the stored data has a relatively long TTL. However, data records may lose their business significance after being online for a period of time (examples below).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A certain service is deactivated&lt;/li&gt;
&lt;li&gt;The life cycle of the data exceeds service requirements. For example, a service only needs data from roughly the last 3 months&lt;/li&gt;
&lt;li&gt;Archiving log data&lt;/li&gt;
&lt;li&gt;Merging databases and tables to provide statistical query and analysis services&lt;/li&gt;
&lt;li&gt;Regular backup, archiving, and audit services&lt;/li&gt;
&lt;li&gt;......&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Usually, archiving schemes are proposed by DBAs, and developers then analyze which data can be archived. The archiving process can be completed with the help of standardization and automated execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common solutions for archiving MySQL data
&lt;/h3&gt;

&lt;p&gt;Current archiving methods generally fall into two categories: MySQL and MariaDB. The key tool is pt-archiver, or archived data can be obtained by parsing the binlog.&lt;/p&gt;

&lt;h4&gt;
  
  
  Using MySQL for storage and archiving
&lt;/h4&gt;

&lt;p&gt;This is the most common solution: the archiving, or even synchronous backup, of the online production database is handled by a PC server (usually with a large disk -- about 50T -- and large memory to run the instances). It's also possible to build a master/slave configuration offline to archive PolarDB data and provide offline intranet queries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--krH5JFzN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bl0grqnrszgsj9ag8zo2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--krH5JFzN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bl0grqnrszgsj9ag8zo2.png" alt="Image description" width="880" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  Advantages
&lt;/h5&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Based on the familiar MySQL environment, it's easy to manage&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Highly compatible with the online environment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Archiving environments can be built with large, inexpensive disks&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h5&gt;
  
  
  Disadvantages
&lt;/h5&gt;

&lt;ol&gt;
&lt;li&gt;In this kind of architecture, binlog is usually turned off in the archive nodes for cost reduction. There will be a backup in the object storage, and no slave database. Therefore, once the data or the hard disk is damaged, it will take a long time to recover&lt;/li&gt;
&lt;li&gt;There's not enough computing power, and not much ability to extend the computing node either. The data needs to be extracted and put into a big data environment when computing is required&lt;/li&gt;
&lt;li&gt;Large amount of idle CPU and RAM resources&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Using MariaDB for archiving
&lt;/h4&gt;

&lt;p&gt;The S3 engine is an experimental feature introduced in MariaDB. It has better compression capability while retaining the usage habits of MySQL users. The complete archiving process is to write to InnoDB first, then run &lt;code&gt;alter table tb_name engine=s3;&lt;/code&gt;&lt;/p&gt;
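
&lt;p&gt;A minimal sketch of that flow, assuming the S3 engine is enabled in your MariaDB build and using a hypothetical table name:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Write hot data into a regular InnoDB table first (hypothetical schema).
CREATE TABLE logs_2021 (
    id BIGINT PRIMARY KEY,
    msg TEXT,
    created_at DATETIME
) ENGINE=InnoDB;

-- When the data goes cold, convert the table to the read-only S3 engine,
-- which moves the data into object storage in compressed form.
ALTER TABLE logs_2021 ENGINE=S3;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;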

&lt;h5&gt;
  
  
  Advantages
&lt;/h5&gt;

&lt;ol&gt;
&lt;li&gt;Maintaining MySQL compatibility&lt;/li&gt;
&lt;li&gt; Supporting S3 class object storage&lt;/li&gt;
&lt;li&gt;Supporting highly compressed storage&lt;/li&gt;
&lt;/ol&gt;

&lt;h5&gt;
  
  
  Disadvantages
&lt;/h5&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The S3 engine is read-only&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Appending data is not supported, so the table must be converted back to InnoDB to modify it&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Archiving method provided by Databend
&lt;/h3&gt;

&lt;p&gt;So, is there a more elegant solution? Here we recommend Databend, a cloud native data warehouse.&lt;/p&gt;

&lt;h4&gt;
  
  
  Introduction &amp;amp; Architecture
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iDFueCB1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mmbiz.qpic.cn/mmbiz_png/8H8TlwVsFtr0hDibViaXuLHme1EcN7xhiaVgLV1Cr5MhoHIuIsAuDorcNwkQGrYnyWEhRBiaWYicibsO4mO8Vek2dQJQ/640%3Fwx_fmt%3Dpng%26wxfrom%3D5%26wx_lazy%3D1%26wx_co%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iDFueCB1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://mmbiz.qpic.cn/mmbiz_png/8H8TlwVsFtr0hDibViaXuLHme1EcN7xhiaVgLV1Cr5MhoHIuIsAuDorcNwkQGrYnyWEhRBiaWYicibsO4mO8Vek2dQJQ/640%3Fwx_fmt%3Dpng%26wxfrom%3D5%26wx_lazy%3D1%26wx_co%3D1" alt="图片" width="880" height="395"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Databend is a modern, open-source data warehouse developed using Rust. It's completely cloud-oriented, provides extremely fast elastic expansion capabilities, and is committed to creating an on-demand and volume-based Data Cloud product experience.&lt;/p&gt;

&lt;p&gt;Features are as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A star project among open-source, cloud-based data warehouses &lt;/li&gt;
&lt;li&gt;Vectorized execution, pull&amp;amp;push-based processor model&lt;/li&gt;
&lt;li&gt;Real storage/computing separation architecture, high performance, low cost, on-demand use&lt;/li&gt;
&lt;li&gt;Complete database support, compatible with MySQL, Clickhouse protocol, SQL over HTTP, etc.&lt;/li&gt;
&lt;li&gt;Complete transactions: ensured transaction integrity, with support for time travel, database clone, data share, and other functions&lt;/li&gt;
&lt;li&gt;Support multi-tenant read/write and sharing operations on the same data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Databend's design principles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;No Partition&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;No index (Auto Index)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support Transaction&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Time travel/Data Zero copy clone/Data Share&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Enough Performance/Low Cost&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Deployment
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--aEJBTYy3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0sk09r9maf8cib34qsl8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--aEJBTYy3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0sk09r9maf8cib34qsl8.png" alt="Image description" width="880" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Three access methods are supported, including MySQL, ClickHouse, and SQL over HTTP.&lt;/p&gt;

&lt;p&gt;Please refer to &lt;a href="https://databend.rs/doc/deploy"&gt;https://databend.rs/doc/deploy&lt;/a&gt; for installation instructions.&lt;/p&gt;

&lt;p&gt;For more support during installation or usage, please contact us via wechat (wechat number: 82565387).&lt;/p&gt;

&lt;h4&gt;
  
  
  Writing methods
&lt;/h4&gt;

&lt;h5&gt;
  
  
  Insert into
&lt;/h5&gt;

&lt;p&gt;Insert operations using JDBC, Python, and Golang are supported. Here is the recommended guide: &lt;a href="https://databend.rs/doc/develop"&gt;https://databend.rs/doc/develop&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It's suggested to use bulk inserts for batch writing, which works much like it does in MySQL.&lt;/p&gt;
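
&lt;p&gt;A minimal sketch of a bulk insert over the MySQL protocol, with a hypothetical table; batching many rows into one statement is much faster than inserting them one by one:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE IF NOT EXISTS orders_archive (
    id BIGINT,
    amount DOUBLE,
    order_date DATE
);

-- One INSERT statement carrying a batch of rows.
INSERT INTO orders_archive VALUES
    (1, 19.90, '2000-03-01'),
    (2, 35.50, '2000-03-02'),
    (3,  8.75, '2000-03-03');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;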

&lt;h5&gt;
  
  
  Streaming load
&lt;/h5&gt;

&lt;p&gt;Please refer to &lt;a href="https://databend.rs/doc/load-data/local"&gt;https://databend.rs/doc/load-data/local&lt;/a&gt; to see more of streaming load.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--83bqAyLS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6naqtdidsnkilmritob7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--83bqAyLS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6naqtdidsnkilmritob7.png" alt="Image description" width="880" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It only takes about 3 minutes to load an 81G file with 200 million rows of data into Databend (As seen in the picture above).&lt;/p&gt;

&lt;p&gt;Besides, Databend now supports reading directly from compressed files. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ls ./dataset/*.csv.gz|xargs -P 8 -I{} curl -H "insert_sql:insert into ontime format CSV" -H "skip_header:1"   -H "compression:gzip" -F "upload=@{}" -XPUT http://root:@localhost:8000/v1/streaming_load
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;It should be noted that the scheme for reading compressed files has not been optimized yet; it takes about 13 minutes to load the same data using this method. There is much room for performance improvement in the future.&lt;/strong&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  Using Stage
&lt;/h5&gt;

&lt;p&gt;A stage can be considered an online storage manager for Databend. Please refer to &lt;a href="https://databend.rs/doc/load-data/stage"&gt;https://databend.rs/doc/load-data/stage&lt;/a&gt; for the detailed syntax.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QC58RNzE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uviejqebsq3tapo5skp6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QC58RNzE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/uviejqebsq3tapo5skp6.png" alt="Image description" width="880" height="490"&gt;&lt;/a&gt;&lt;br&gt;
The slide above shows the process of creating a stage, uploading files, and viewing files online. Files in a stage can be loaded into Databend with the COPY INTO command (see the sketch below).&lt;/p&gt;
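
&lt;p&gt;A rough, text-only version of that flow is sketched here; the stage name, file name, and COPY options are illustrative and should be checked against the stage documentation linked above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Create an internal stage, list the files uploaded to it, then load one of them.
CREATE STAGE IF NOT EXISTS archive_stage;

LIST @archive_stage;

COPY INTO ontime
FROM @archive_stage
FILES = ('ontime_2000.csv')
FILE_FORMAT = (type = 'CSV' skip_header = 1);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;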

&lt;h3&gt;
  
  
  Advantages of archiving MySQL data with Databend
&lt;/h3&gt;

&lt;p&gt;We recommend using Databend combined with object storage for MySQL data archiving.&lt;/p&gt;

&lt;p&gt;The advantages are as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Object storage breaks the limit of storage capacity&lt;/li&gt;
&lt;li&gt;Databend has a relatively high data compression ratio of about 10:1, which saves a lot of storage resources&lt;/li&gt;
&lt;li&gt;Databend is accessed via the MySQL protocol, so users don't need to change their usage habits&lt;/li&gt;
&lt;li&gt;The storage/compute separation architecture makes it easy to scale out when computing resources are insufficient, and there is no need to worry about the high availability of storage&lt;/li&gt;
&lt;li&gt; Most original MySQL tools can be reused &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Databend now supports object storage from AWS S3, Azure, Alibaba Cloud, Tencent Cloud, QingCloud, and Kingsoft Cloud, as well as self-hosted solutions like MinIO and Ceph. Meanwhile, Databend's strong computing capability can provide data computation services when needed.&lt;/p&gt;

&lt;p&gt;With Databend, people can make better use of cloud resources and obtain solid performance at a relatively low cost. &lt;/p&gt;

&lt;p&gt;Please contact us via wechat (wechat number: 82565387) to get more information!&lt;/p&gt;

&lt;h3&gt;
  
  
  About Databend
&lt;/h3&gt;

&lt;p&gt;Databend is an open-source modern data warehouse with elasticity and low cost. It can do real-time data analysis on object-based storage. &lt;/p&gt;

&lt;p&gt;We look forward to your attention and hope to explore cloud-native data warehouse solutions and build a new generation of the open-source data cloud together.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Databend docs: &lt;a href="https://databend.rs/"&gt;https://databend.rs/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Twitter: &lt;a href="https://twitter.com/Datafuse_Labs"&gt;https://twitter.com/Datafuse_Labs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Slack: &lt;a href="https://datafusecloud.slack.com/"&gt;https://datafusecloud.slack.com/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;WeChat: Databend&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/datafuselabs/databend"&gt;https://github.com/datafuselabs/databend&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
  </channel>
</rss>
