Apache ShardingSphere 5.0.0 Kernel: API Adjustments

#database #sql #opensource #tutorial

To help users minimize related costs, the 5.0.0 GA version has made many optimizations at the API level as well. According to some community feedback, Data Sharding API was too complex and difficult to understand. After a community-level discussion, we decided to provide a brand-new data sharding API in the new GA version.

With Apache ShardingSphere project positioning changed from a database middleware to a distributed database ecosystem, we had to develop a transparent data sharding function. To be precise, in the 5.0.0 GA, we provide users with Auto Sharding Strategy, so they don't need to worry about the details of the databases and tables because they can use auto sharding to specify the number of shards. Due to the new pluggable architecture and some enhanced functions such as shadow database stress testing, kernel function APIs have been adjusted accordingly. In this section, we introduce the adjustments made in different APIs.

Data Sharding API

Following the previous 4.x version was released, users often reached out to us in the community and complained that the API for data sharding was too complex and hard to use. The code block below shows you the data sharding configuration in the 4.1.1 GA version. In the old version, there were five sharding strategies, namely standard, complex, inline, 'hint', and none. It was difficult for users to understand and use different parameters of different sharding strategies.

shardingRule:
  tables: 
    t_order:  
      databaseStrategy: 
        standard:  
          shardingColumn: order_id
          preciseAlgorithmClassName: xxx
          rangeAlgorithmClassName: xxx
        complex:  
          shardingColumns: year, month
          algorithmClassName: xxx
        hint:
          algorithmClassName: xxx
        inline:  
          shardingColumn: order_id
          algorithmExpression: ds_${order_id % 2}
        none:
      tableStrategy:
        ...

In the 5.0.0 GA version, we simplify the sharding strategies in Data Sharding API. First, the original inline, strategy is now removed, and we retain the remaining four sharding strategies i.e. standard, complex, hint and none.

At the same time, the Sharding Algorithm is extracted from Sharding Strategy. Now users can configure it under the property shardingAlgorithms and shardingAlgorithmNameas a reference in Sharding Strategy.

  - !SHARDING
  tables: 
    t_order: 
      databaseStrategy: 
        standard: 
          shardingColumn: order_id
          shardingAlgorithmName: database_inline   
        complex: 
          shardingColumns: year, month
          shardingAlgorithmName: database_complex
        hint: 
          shardingAlgorithmName: database_hint
        none:
      tableStrategy:
        ...

  shardingAlgorithms:
    database_inline:
      type: INLINE
      props:
        algorithm-expression: ds_${order_id % 2}
    database_complex:
      type: CLASS_BASED
      props:
        strategy: COMPLEX
        algorithmClassName: xxx
    database_hint:
      type: CLASS_BASED
      props:
        strategy: HINT
        algorithmClassName: xxx

The code block above is the new configuration, which differs from the Sharding configuration in the 4.1.1 GA version. The new sharding API is more concise and clear.

To help users reduce configuration workload, Apache ShardingSphere provides many built-in sharding algorithms, and they can also choose custom settings via the sharding algorithm CLASS_BASED.

To implement transparent data sharding, we add Automated Sharding Strategy into the 5.0.0 GA version. The code block below shows you the difference between Automated Sharding Strategy configuration and manual sharding strategy configuration:

   rules:
- !SHARDING
  autoTables:
    # Automated Sharding Strategy
    t_order:
      actualDataSources: ds_0, ds_1
      shardingStrategy:
        standard:
          shardingColumn: order_id
          shardingAlgorithmName: auto_mod
      keyGenerateStrategy:
        column: order_id
        keyGeneratorName: snowflake
  shardingAlgorithms:
    auto_mod:
      type: MOD
      props:
        sharding-count: 4

  tables:
    # Manual Sharding Strategy
    t_order: 
      actualDataNodes: ds_${0..1}.t_order_${0..1}
      tableStrategy:
        standard:
          shardingColumn: order_id
          shardingAlgorithmName: table_inline
      dataBaseStrategy:
        standard:
          shardingColumn: user_id
          shardingAlgorithmName: database_inline

Automated Sharding Strategy must be configured under autoTablesattribute. Users only need to specify the data source for data storage as well as the number of shards via Automated Sharding Algorithm. They no longer need to manually set data distribution through actualDataNodes, or to pay extra attention to setting database sharding strategy and table sharding strategy, as Apache ShardingSphere automatically helps users manage data sharding.

We also remove defaultDataSourceName from Data Sharding API. We have repeatedly highlighted that Apache ShardingSphere is a distributed database ecosystem now. The message we want to send to users is that you can directly use the services provided by Apache ShardingSphere but when you use the services, you'll probably feel like you are just using a traditional database. You don't have to perceive underlying database storage. Apache ShardingSphere's built-in SingleTableRule can manage single tables beyond data sharding, aiming to help users implement single table automatic loading & routing.

Additionally, to further simplify configuration, in conjunction with the defaultDatabaseStrategy and defaultTableStrategy sharding strategies in Data Sharding API, defaultShardingColumn as the default sharding key is added as well.
When multiple tables have the same sharding key, the user only needs to use the default defaultShardingColumn configuration rather than shardingColumn. The sharding strategy of the t_order table is set via the default defaultShardingColumn configuration(see the code below).

rules:
- !SHARDING
  tables:
    t_order: 
      actualDataNodes: ds_${0..1}.t_order_${0..1}
      tableStrategy: 
        standard:
          shardingAlgorithmName: table_inline
  defaultShardingColumn: order_id
  defaultDatabaseStrategy:
    standard:
      shardingAlgorithmName: database_inline
  defaultTableStrategy:
    none:

Read/Write Splitting API

We didn't make a lot of changes to the Read/write Splitting API in the 5.0.0 GA version. We only adjusted from MasterSlave to ReadWriteSplitting while other usages are unchanged. The following code block shows you the differences between the Read/write Splitting API of the 4.1.1 GA version and that of the 5.0.0 GA version.

# 4.1.1 GA Read/Write Splitting API
masterSlaveRule:
  name: ms_ds
  masterDataSourceName: master_ds
  slaveDataSourceNames:
    - slave_ds_0
    - slave_ds_1

# 5.0.0 GA Read/Write Splitting API
rules:
- !READWRITE_SPLITTING
  dataSources:
    pr_ds:
      writeDataSourceName: write_ds
      readDataSourceNames:
        - read_ds_0
        - read_ds_1

Additionally, the High Availability function developed in the pluggable architecture plus Read/write Splitting can provide an automated switch between master and slave, producing a high availability version of read-write splitting. If you are interested in the high-availability function, keep an eye on our GitHub repo or socials. We will soon publish related documents and technical blogs.

Encryption & Decryption API

We add queryWithCipherColumn property at the table level into Encryption & Decryption API, making it convenient for users to switch plaintext and ciphertext of encrypted/decrypted fields in a table. There are no other changes in the 5.0.0 version API.

- !ENCRYPT
  encryptors:
    aes_encryptor:
      type: AES
      props:
        aes-key-value: 123456abc
    md5_encryptor:
      type: MD5
  tables:
    t_encrypt:
      columns:
        user_id:
          plainColumn: user_plain
          cipherColumn: user_cipher
          encryptorName: aes_encryptor
        order_id:
          cipherColumn: order_cipher
          encryptorName: md5_encryptor
      queryWithCipherColumn: true
  queryWithCipherColumn: false

Shadow Database Stress Testing API

We completely adjust the Shadow Database Stress Testing API in version 5.0.0 GA. The first adjustment is the deletion of logical columns in Shadow Database, and the creation of Shadow Database Matching Algorithm to help users flexibly control routing.
The code block below is the Shadow Database Stress Testing API of the old 4.1.1 GA version. Honestly, the function is quite simple: according to the logic column value, users can judge whether the shadow database stress test is enabled or not.

shadowRule:
  column: shadow
  shadowMappings:
    ds: shadow_ds

In the 5.0.0 GA version, Shadow Database Stress Testing API is much more powerful. Users can enable the test via enable attribute. At the same time, fine-grained control of production tables is implemented.
The new API also supports a variety of matching algorithms, such as column value matching algorithm, column regular expression matching algorithm, and SQL comment matching algorithm.

rules:
- !SHADOW
  enable: true
  dataSources:
    shadowDataSource:
      sourceDataSourceName: ds
      shadowDataSourceName: shadow_ds
  tables:
    t_order:
      dataSourceNames:
        - shadowDataSource
      shadowAlgorithmNames:
        - user-id-insert-match-algorithm
        - simple-hint-algorithm
  shadowAlgorithms:
    user-id-insert-match-algorithm:
      type: COLUMN_REGEX_MATCH
      props:
        operation: insert
        column: user_id
        regex: "[1]"
    simple-hint-algorithm:
      type: SIMPLE_NOTE
      props:
        shadow: true
        foo: bar

Due to the word limit of the article, we cannot introduce the shadow database stress testing function in detail - but we will share more related technical content soon. If you're interested in shadow database matching algorithms, please read Shadow Algorithm

Conclusion

In the future, we will continue to develop more new features of the pluggable kernel to expand the Apache ShardingSphere ecosystem with amazing functions. We will make more efforts to optimize Apache ShardingSphere. ** As always, you're welcome to join us in developing the Apache ShardingSphere project.**