DEV Community

Databend
Databend

Posted on

Learn Databend's New SQL Type System in Five Minutes

Introduction

Type system is an important part of database, which provides a consistent way to determine the data type in SQL. The design of the type system greatly affects the usability and robustness of the database, as a well-designed and consistent type system makes it easy for users to check SQL behaviors. A poorly designed type system, on the contrary, may raise various errors and inconsistent behaviors, bringing a series of potential inconvenience to users. Let's take programming languages as an example, the type system of JavaScript has always been criticized.

Image description

Therefore, we hope to implement a type inference system that is powerful yet easy to understand in Databend. To achieve this goal, we learned from the compiler designs of many excellent programming languages, and selected a subset that is suitable for SQL. The design principles of this system is described below.

Interface Design

The "low coupling and high cohesion" principle, which we often refer to, is to combine the code that does the same thing, and then define a simple interface for external use. Since the type inference system has relatively complex functions, its interfaces need to be defined from the very beginning, that is, what functions can be provided and how to call them.

In short, there are three functions in the type inference system we designed:

1.Check whether the input SQL statements (RawExpr) conform to the type rules and select appropriate overloads for the functions called, and return executable expressions (Expr)

2.Return the expression value corresponding to the input data

3.Return the expression value range corresponding to the input data range (stored in meta data)

Callers only need to finish these operations:

4.Define type signatures, mappings of definition domains to function domains, and the bodies of all available functions

5.Call the executor when executing SQL or constant folding

Use the “and”function as an example. The function definition is as follows:

registry.register_2_arg::<BooleanType, BooleanType, BooleanType, _, _>(
    "and",
    FunctionProperty::default(),
    |lhs, rhs| {
        Some(BooleanDomain {
            has_false: lhs.has_false || rhs.has_false,
            has_true: lhs.has_true && rhs.has_true,
        })
    },
    |lhs, rhs| lhs && rhs,
);
Enter fullscreen mode Exit fullscreen mode

A complete example of execution:

// Parse the SQL expressions into structured AST
let raw_expr = parse_raw_expr("and(true, false)");

// Get built-in functions, like the 'and' function mentioned before
let fn_registry = builtin_functions();

// Check type validity
let expr = type_check::check(&raw_expr, &fn_registry).unwrap();

// Execute
let evaluator = Evaluator {
    input_columns: Chunk::new(vec![]),
    context: FunctionContext::default(),
};
let result: Value<AnyType> = evaluator.run(&raw_expr).unwrap();

assert_eq!(result, Value::Scalar(Scalar::Boolean(false)));
Enter fullscreen mode Exit fullscreen mode

Principles of Type inference

The new type system supports the following data types:

  • Null

  • Boolean

  • String

  • UInt8

  • UInt16

  • UInt32

  • UInt64

  • Int8

  • Int16

  • Int32

  • Int64

  • Float32

  • Float64

  • Date

  • Interval

  • Timestamp

  • Array

  • Nullalbe

  • Variant

Let's us learn about how the type inference system works through an example. Suppose this expression is input:

1 + 'foo'
Enter fullscreen mode Exit fullscreen mode

The type inferencer first converts the expression input to a function call:

plus(1, 'foo')
Enter fullscreen mode Exit fullscreen mode

Then the type checker can simply infer the type of the constant:

1 :: Int8
'foo' :: String
Enter fullscreen mode Exit fullscreen mode

The type checker knows that there are six overloads for the function plus after querying FunctionRegistry.

plus(Null, Null) :: Null
plus(Int8, Int8) :: Int8
plus(Int16, Int16) :: Int16
plus(Int32, Int32) :: Int32
plus(Float32, Float32) :: Float32
plus(Timestamp, Timestamp) :: Timestamp
Enter fullscreen mode Exit fullscreen mode

Since the argument types Int8 and String don't match any of the overloads, an error is raised by the type checker:

1 + 'foo'
  ^ function `plus` has no overload for parameters `(Int8, String)`

  available overloads:
    plus(Int8, Int8) :: Int8
    plus(Int16, Int16) :: Int16
    plus(Int32, Int32) :: Int32
    plus(Float32, Float32) :: Float32
    plus(Timestamp, Timestamp) :: Timestamp
Enter fullscreen mode Exit fullscreen mode

One exception in type checking is that a subtype can be converted to its super type (CAST), so that functions can takes parameters of the subtypes. Here's an example:

plus(1, 2.0)
Enter fullscreen mode Exit fullscreen mode

The type inferencer infers the constants' types according to rules:

 1 :: Int8
 2.0 :: Float32
Enter fullscreen mode Exit fullscreen mode

By querying FunctionRegistry, we can find that there are two overloads for function plus that seems to fit but neither one matches perfectly.

(Int8, Int8) :: Int8
plus(Float32, Float32) :: Float32
Enter fullscreen mode Exit fullscreen mode

The type checker will try to select an overload according to the CAST rule. Since values of type Int8 can be lossless converted to type Float 32, the type checker then overwrites the expression and recheck the types:

plus(CAST(1 AS Float32), 2.0)
Enter fullscreen mode Exit fullscreen mode

The type check is passed in this way.

Genericity

The new type checker supports the use of genericity in function signature definitions to reduce the workload of manually defining overloaded functions. For example, we can define a function by stating “array_ Get(Array, UInt64):: T0”, which accepts an array and a subscript, and returns the element corresponding to the subscript in the array.

Compared with the type check process mentioned in the previous section, checking functions with generic signatures requires one additional step: select an appropriate specific type to replace the generic type. The replaced types should conform to the type check rules, and explanations (such as conflicting constraints) are required when there is no available type. This step is generally called Unification, and we also have an example to illustrate it:
Suppose there are two expressions:

ty1 :: (X, Boolean) 
ty2 :: (Int32, Y)
Enter fullscreen mode Exit fullscreen mode

If we want “ty1”and“ty2” to have the same type (for example, when “ty1”is used as the input parameter type and “ty2”is used as the input parameter signature), “unify”will try to replace “X”and “Y”with specific types:

let subst: Subsitution = unify(ty1, ty2).unwrap();

assert_eq!(subst['X'], DataType::Int32]);
assert_eq!(subst['Y'], DataType::Boolean]);
Enter fullscreen mode Exit fullscreen mode

For readers interested in “unify”, the source code of “type_check.rs”is highly recommended. We also recommend a book: Types and Programming Languages, which introduces the development of type inferences for programming languages and discusses in details the principles and trade-offs of various inference theories. A supporting toy implementation is provided for every major concept. You'll obtain great pleasure from this book especially on sleepless nights.

Summary

In this article, we introduce the design background and operation principles of our new type system, and explain how to use the executor. We look forward to introduce you the detailed methods of defining SQL functions and some related Rust tricks in another article, for this topic is as exciting as type inference.

About Databend

Databend is an open source modern data warehouse with elasticity and low cost. It can do real-time data analysis on object-based storage.We look forward to your attention and hope to explore the cloud native data warehouse solution, and create a new generation of open source data cloud together.

Databend documentation:https://databend.rs/

Twitter:https://twitter.com/Datafuse_Labs

Slack:https://datafusecloud.slack.com/

Wechat:Databend

GitHub :https://github.com/datafuselabs/databend

Top comments (0)