DEV Community

yedf2

Use workflow to handle distributed transactions

In a microservice architecture, a single business transaction is often spread across multiple services that are called in sequence to complete it.

Each service usually owns its own database, so a transaction may now span several databases and the ACID guarantees of a single database no longer cover it.

To address the problems of distributed transactions, the following approaches have been described:

  • Two-Phase Commit / XA
  • Eventual Consistency and Compensation / SAGA
  • Try, Confirm, Cancel / TCC.

In classic solutions, developers have to choose one of these three patterns to handle a distributed transaction.

In this article, we introduce the workflow pattern in github.com/dtm-labs/dtm. Under this pattern, a mixture of XA, SAGA and TCC can be applied to different branches of a single distributed transaction. Users can customize most of what happens inside the transaction, which provides great flexibility.

Workflow example

In Workflow mode of DTM, both HTTP and gRPC protocols can be used. The following is an example of the gRPC protocol, which is divided into the following steps.

  • Initialize the SDK
  • Register workflow
  • Execute workflow

Initialize the SDK

Before you can use workflow, you need to initialize the workflow SDK.

import "github.com/dtm-labs/dtmgrpc/workflow"

// Initialize the workflow SDK with three parameters:
// the first parameter is the dtm server address,
// the second parameter is the business server address,
// the third parameter is the grpcServer.
// Workflow receives callbacks from the dtm server at the business server address, served by grpcServer.
workflow.InitGrpc(dtmGrpcServer, busi.BusiGrpc, gsvr)

Register workflow

Then you need to register the workflow's handler function. The following is a saga workflow that performs a cross-bank transfer.

wfName := "wf_saga"
err := workflow.Register(wfName, func(wf *workflow.Workflow, data []byte) error {
  req := MustUnmarshalReqGrpc(data)
  wf.NewBranch().OnRollback(func(bb *dtmcli.BranchBarrier) error {
    _, err := busi.BusiCli.TransOutRevert(wf.Context, req)
    return err
  })
  _, err := busi.BusiCli.TransOut(wf.Context, req)
  if err != nil {
    return err
  }
  wf.NewBranch().OnRollback(func(bb *dtmcli.BranchBarrier) error {
    _, err := busi.BusiCli.TransInRevert(wf.Context, req)
    return err
  })
  _, err = busi.BusiCli.TransIn(wf.Context, req)
  return err
})
  • This registration needs to be executed after the business service has started, because when the process crashes, dtm will call back to the business server to continue the unfinished task.
  • In the code above, NewBranch creates a transaction branch, which consists of a forward action and the callbacks to run on global transaction commit/rollback.
  • OnRollback/OnCommit registers a callback on global transaction rollback/commit for the current transaction branch. In the code above only OnRollback is specified, so the branch is in Saga mode.
  • The busi.BusiCli in the code above needs a workflow interceptor, which automatically records the results of each rpc request to dtm, as follows:
conn1, err := grpc.Dial(busi.BusiGrpc, grpc.WithUnaryInterceptor(workflow.Interceptor), nossl)
busi.BusiCli = busi.NewBusiClient(conn1)

You can of course add workflow.Interceptor to all gRPC clients. This middleware only handles requests made under wf.Context and wf.NewBranchContext().

  • When the workflow function returns nil, the global transaction enters the Commit phase; when it returns ErrFailure, it enters the Rollback phase. In either phase, the callbacks registered with OnCommit or OnRollback inside the function are called in reverse order.

Execute workflow

Finally, execute the workflow.

req := &busi.ReqGrpc{Amount: 30}
err = workflow.Execute(wfName, shortuuid.New(), dtmgimp.MustProtoMarshal(req))
  • When the result of Execute is nil, the global transaction has succeeded; when it is ErrFailure, the global transaction has been rolled back.
  • When the result of Execute is any other value, the dtm server will later call back this workflow task to retry it.

Principle of workflow

How does workflow ensure data consistency in distributed transactions? When a business process crashes or hits another problem, the dtm server finds that the workflow global transaction has timed out without completing. The dtm server then retries the workflow transaction using an exponential backoff algorithm. When the retry request reaches the business service, the SDK queries the progress of the global transaction from the dtm server. For each branch that has already completed, the gRPC/HTTP interceptor returns the previously saved result directly instead of calling the branch again. Eventually the workflow runs to completion.

Workflow functions need to be idempotent, i.e. the first call and any subsequent retries should produce the same result.
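The retry-and-replay behaviour described above can be sketched without the dtm SDK. The following is a minimal in-memory simulation, not dtm's implementation: progressStore, callBranch, backoff and simulate are hypothetical names, and the map stands in for the branch results that the dtm server saves.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// progressStore stands in for the branch results kept by the dtm server.
// It is a hypothetical in-memory type, not part of the dtm SDK.
type progressStore map[string]string

// callBranch returns the saved result when branch id has already completed;
// otherwise it runs op and saves the result on success.
func (p progressStore) callBranch(id string, op func() (string, error)) (string, error) {
	if res, ok := p[id]; ok {
		return res, nil
	}
	res, err := op()
	if err != nil {
		return "", err
	}
	p[id] = res
	return res, nil
}

// backoff returns the delay before retry n (0-based): base doubled n times, capped at limit.
func backoff(n int, base, limit time.Duration) time.Duration {
	d := base
	for i := 0; i < n && d < limit; i++ {
		d *= 2
	}
	if d > limit {
		d = limit
	}
	return d
}

// simulate runs a two-branch workflow whose second branch fails on the first
// attempt, and reports how often each branch really executed.
func simulate() (outCalls, inCalls int, delays []time.Duration) {
	store := progressStore{}
	crashed := false
	workflow := func() error {
		if _, err := store.callBranch("01", func() (string, error) {
			outCalls++
			return "ok", nil
		}); err != nil {
			return err
		}
		_, err := store.callBranch("02", func() (string, error) {
			inCalls++
			if !crashed {
				crashed = true
				return "", errors.New("simulated crash")
			}
			return "ok", nil
		})
		return err
	}
	for attempt := 0; workflow() != nil; attempt++ {
		// A real coordinator would wait this long before retrying.
		delays = append(delays, backoff(attempt, time.Second, time.Minute))
	}
	return outCalls, inCalls, delays
}

func main() {
	out, in, delays := simulate()
	fmt.Println(out, in, delays)
}
```

In this simulation the first branch executes only once even though the workflow function runs twice, because the second run replays the saved result. This is why the workflow function itself must be deterministic and idempotent.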

Saga in Workflow

The core idea of the Saga pattern, derived from the paper SAGAS, is that a long transaction is split into short transactions coordinated by the Saga transaction coordinator. If every short transaction completes successfully, the global transaction completes normally; if a step fails, the compensating operations are invoked one at a time in reverse order.

In Workflow mode, you call the forward operation directly in the workflow function and register the compensation operation with OnRollback on the branch. The compensation is then called automatically on rollback, achieving the effect of Saga mode.
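To make the reverse-order compensation concrete, here is a minimal self-contained sketch of the Saga idea. It does not use the dtm SDK: step, runSaga, transfer and errFailure are hypothetical names, and balances live in a map instead of a database. As in the article's example, a compensation is registered before its forward action runs, so each compensation must tolerate a forward action that never took effect.

```go
package main

import (
	"errors"
	"fmt"
)

// step pairs a forward action with its compensation (hypothetical type, not part of dtm).
type step struct {
	name       string
	action     func() error
	compensate func() error
}

// errFailure marks a business failure that must trigger a rollback.
var errFailure = errors.New("business failure")

// runSaga runs the steps in order. Each compensation is registered before its
// action runs; on failure, all registered compensations run in reverse order.
func runSaga(steps []step) (log []string, err error) {
	var registered []step
	for _, s := range steps {
		registered = append(registered, s)
		if err = s.action(); err != nil {
			log = append(log, s.name+": failed")
			for i := len(registered) - 1; i >= 0; i-- {
				if cerr := registered[i].compensate(); cerr != nil {
					return log, fmt.Errorf("compensating %s: %w", registered[i].name, cerr)
				}
				log = append(log, registered[i].name+": compensated")
			}
			return log, err
		}
		log = append(log, s.name+": done")
	}
	return log, nil
}

// transfer moves amount from A to B; failIn forces the TransIn step to fail.
func transfer(balances map[string]int, amount int, failIn bool) ([]string, error) {
	outDone, inDone := false, false
	return runSaga([]step{
		{
			name:   "TransOut",
			action: func() error { balances["A"] -= amount; outDone = true; return nil },
			compensate: func() error {
				if outDone { // only revert what actually happened
					balances["A"] += amount
				}
				return nil
			},
		},
		{
			name: "TransIn",
			action: func() error {
				if failIn {
					return errFailure
				}
				balances["B"] += amount
				inDone = true
				return nil
			},
			compensate: func() error {
				if inDone {
					balances["B"] -= amount
				}
				return nil
			},
		},
	})
}

func main() {
	balances := map[string]int{"A": 100, "B": 0}
	log, err := transfer(balances, 30, true)
	fmt.Println(log, err, balances)
}
```

When TransIn fails, the sketch compensates TransIn first and TransOut second, and both balances return to their starting values.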

Tcc under Workflow

The Tcc pattern is derived from the paper Life beyond Distributed Transactions: an Apostate's Opinion. It divides a large transaction into multiple smaller transactions, each of which has three operations.

  • Try phase: attempts to execute, completes all business checks, and sets aside enough business resources
  • Confirm phase: if the Try operation succeeds on all branches, the transaction goes to the Confirm phase, which actually executes the transaction without any business checks, using only the business resources set aside in the Try phase
  • Cancel phase: if the Try operation fails on any branch, the transaction goes to the Cancel phase, which releases the business resources reserved in the Try phase.

For our scenario of an inter-bank transfer from A to B, if SAGA is used, with the balance adjusted in the forward operation and adjusted back in the compensating operation, then the following scenario could occur.

  • A deducts the money successfully
  • A sees the balance decrease and tells B
  • The transfer of the amount to B fails and the whole transaction is rolled back
  • B never receives the funds

This causes great distress to both A and B. This situation cannot be avoided in SAGA, but it can be resolved by TCC with the following design technique.

  • Introduce a trading_balance field in addition to the balance field in the account
  • Try phase: check whether the account is frozen, check whether balance - trading_balance is sufficient, and then adjust trading_balance (i.e. freeze the funds for the business)
  • Confirm phase: adjust balance and adjust trading_balance (i.e. unfreeze the funds for the business)
  • Cancel phase: adjust trading_balance (i.e. unfreeze the funds for the business)

In this case, once end user A sees his balance deducted, B is guaranteed to receive the funds.

In Workflow mode, you call the Try operation directly in the workflow function, then register the Confirm operation with OnCommit and the Cancel operation with OnRollback on the branch, achieving the effect of the Tcc mode.
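The balance/trading_balance design can be sketched as follows. This is an in-memory illustration under stated assumptions, not dtm's sample code: the account type and its methods are hypothetical, and tradingBalance here counts outgoing funds that are currently frozen (a real schema may use a different sign convention).

```go
package main

import (
	"errors"
	"fmt"
)

// account mirrors the balance / trading_balance design described above.
// In this sketch tradingBalance counts outgoing funds that are currently frozen.
type account struct {
	balance        int
	tradingBalance int
	frozen         bool
}

var errRejected = errors.New("account frozen or available balance insufficient")

// available is what the account holder may still spend.
func (a *account) available() int { return a.balance - a.tradingBalance }

// tryOut freezes amount without changing balance.
func (a *account) tryOut(amount int) error {
	if a.frozen || a.available() < amount {
		return errRejected
	}
	a.tradingBalance += amount
	return nil
}

// confirmOut deducts the frozen amount for real and releases the freeze.
func (a *account) confirmOut(amount int) {
	a.balance -= amount
	a.tradingBalance -= amount
}

// cancelOut only releases the freeze; balance was never touched.
func (a *account) cancelOut(amount int) { a.tradingBalance -= amount }

// tryIn checks that the receiving account can accept funds.
func (a *account) tryIn() error {
	if a.frozen {
		return errRejected
	}
	return nil
}

// confirmIn credits the receiving account.
func (a *account) confirmIn(amount int) { a.balance += amount }

// transfer runs Try on both accounts, then Confirm on both or Cancel on the sender.
func transfer(from, to *account, amount int) error {
	if err := from.tryOut(amount); err != nil {
		return err // nothing was frozen, so there is nothing to cancel
	}
	if err := to.tryIn(); err != nil {
		from.cancelOut(amount)
		return err
	}
	from.confirmOut(amount)
	to.confirmIn(amount)
	return nil
}

func main() {
	a, b := &account{balance: 100}, &account{frozen: true}
	fmt.Println(transfer(a, b, 30), a.balance, a.tradingBalance)
	b.frozen = false
	fmt.Println(transfer(a, b, 30), a.balance, b.balance)
}
```

Because balance only changes in the Confirm step, A never observes a deduction that is later undone, which is the property the Saga version cannot offer.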

XA under Workflow

XA is a specification for distributed transactions proposed by the X/Open organization. The XA specification essentially defines the interface between a (global) Transaction Manager (TM) and a (local) Resource Manager (RM). Local databases such as MySQL play the RM role in XA.

XA is divided into two phases.

  • Phase 1 (prepare): all participants (RMs) prepare to execute the transaction and lock the required resources. When a participant is ready, it reports this to the TM.
  • Phase 2 (commit/rollback): when the transaction manager (TM) confirms that all participants (RMs) are ready, it sends a commit command to all participants. If any participant fails to prepare, the TM sends a rollback command instead.
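The two phases can be sketched as a small coordinator loop. This is a simplified illustration, not the XA interface itself: resourceManager, fakeRM and twoPhaseCommit are hypothetical names, the resource managers are in-memory fakes, and a participant whose prepare fails is assumed to have cleaned up after itself.

```go
package main

import (
	"errors"
	"fmt"
)

// resourceManager is a hypothetical stand-in for an XA-capable database.
type resourceManager interface {
	prepare() error
	commit()
	rollback()
}

// fakeRM records the last state it reached.
type fakeRM struct {
	failPrepare bool
	state       string
}

func (r *fakeRM) prepare() error {
	if r.failPrepare {
		r.state = "prepare failed"
		return errors.New("cannot prepare")
	}
	r.state = "prepared"
	return nil
}

func (r *fakeRM) commit()   { r.state = "committed" }
func (r *fakeRM) rollback() { r.state = "rolled back" }

// twoPhaseCommit plays the TM role. Phase 1 asks every RM to prepare;
// phase 2 commits all of them, or rolls back the ones that had prepared.
// It reports whether the global transaction committed.
func twoPhaseCommit(rms []resourceManager) bool {
	var prepared []resourceManager
	for _, rm := range rms {
		if err := rm.prepare(); err != nil {
			for _, p := range prepared {
				p.rollback()
			}
			return false
		}
		prepared = append(prepared, rm)
	}
	for _, rm := range rms {
		rm.commit()
	}
	return true
}

func main() {
	a, b := &fakeRM{}, &fakeRM{failPrepare: true}
	fmt.Println(twoPhaseCommit([]resourceManager{a, b}), a.state, b.state)
}
```

The sketch leaves out what makes real XA hard, such as persisting the TM's decision and recovering participants that crash between the two phases.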

Most major databases currently support XA transactions, including MySQL, Oracle, SQL Server and PostgreSQL.

In Workflow mode, you can call NewBranch().DoXa in the workflow function to open your XA transaction branch.

Mixing multiple modes

In Workflow mode, Saga, Tcc and XA as described above are all patterns for transaction branches, so you can use one pattern for some branches and another pattern for others. This lets you choose a pattern according to the characteristics of each branch, and we recommend the following.

  • XA: if the business has no row lock contention and the global transaction will not last long, XA can be used. This pattern requires less additional development, and Commit/Rollback is done automatically by the database. For example, it is suitable for order creation, where different orders lock different order rows and do not affect each other's concurrency. It is not suitable for deducting inventory, because orders involving the same item all compete for that item's row lock, which leads to low concurrency.
  • Saga: common business that is not suitable for XA can use this pattern. It needs less extra development than Tcc: only the forward operation and the compensation operation have to be written.
  • Tcc: suitable for high consistency requirements, such as the transfer described earlier. This pattern needs the most additional development, since the Try, Confirm and Cancel operations all have to be written.

gRPC support

Support for gRPC consists of two aspects:

  • Communicating with the dtm server over the gRPC protocol. In the example above, after workflow.InitGrpc is called to initialize the SDK, the dtm SDK uses the gRPC interface to interact with the dtm server.
  • Accessing the transaction branch over the gRPC protocol. In the example above, for reply, err = busi.BusiCli.TransIn(wf.Context, req), the gRPC interceptor automatically saves the result of the call to the dtm server. When the workflow is executed again, the result saved by the dtm server is returned directly to the caller.

HTTP support

Support for HTTP consists of two aspects:

  • Communicating with the dtm server over the HTTP protocol. After initialization with workflow.InitHTTP, the dtm SDK uses the HTTP interface to interact with the dtm server.
  • Accessing the transaction branch over the HTTP protocol, e.g. resp, err := wf.NewBranch().NewRequest().SetBody(req).Post(Busi + "/TransOut"). The result of this HTTP call is automatically saved to the dtm server, and when the workflow is executed again, the saved result is returned directly to the caller.

Local Operation Support

In Workflow mode, a transaction branch does not have to be a remote HTTP/gRPC call; it can also be a local operation. The following code demonstrates a local transactional operation.

wf.Do(func(bb *dtmcli.BranchBarrier) ([]byte, error) {
  return nil, bb.CallWithDB(dbGet(), func(tx *sql.Tx) error {
    return busi.SagaAdjustBalance(tx, busi.TransOutUID, -req.Amount, "")
  })
})

Inside wf.Do you can run local database transactions as well as any other local operation.

gRPC/HTTP/Local Mixed Usage

Within a single distributed transaction, you can also mix gRPC, HTTP and local branches. This gives you a great deal of flexibility, and it makes it practical to introduce distributed transactions across multiple technology stacks, in systems where several protocols coexist, and in existing legacy systems.

Idempotency Requirements

In the Workflow pattern, a retry is performed when a crash occurs, so the individual operations must be idempotent, i.e. the first call and any later retry return the same result. In practice, a unique key in the database is usually used to achieve idempotency, typically with an insert ignore on a table that has a unique key. If the row already exists, the operation has already been completed, and the call returns directly; if the insert succeeds, this is the first call, and the subsequent business operations continue.

If your business is already idempotent, you can call it directly. If it does not provide idempotency, dtm provides a BranchBarrier helper class, based on the unique-key principle above, which helps developers implement idempotent operations for Mysql/Mongo/Redis.
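The unique-key principle can be illustrated with a small in-memory sketch. It is not BranchBarrier itself: ledger, newLedger and adjustOnce are hypothetical names, and the map plays the role of a table with a unique key on the operation ID. In a real database, the insert and the balance change would run in the same local transaction.

```go
package main

import (
	"fmt"
	"sync"
)

// ledger is a hypothetical in-memory stand-in for a database: the done map
// plays the role of a table with a unique key on the operation ID.
type ledger struct {
	mu      sync.Mutex
	done    map[string]bool
	balance int
}

func newLedger(balance int) *ledger {
	return &ledger{done: map[string]bool{}, balance: balance}
}

// adjustOnce applies delta at most once per opID and reports whether this
// call was the one that applied it. The lock makes the check and the update
// atomic, as a database transaction would.
func (l *ledger) adjustOnce(opID string, delta int) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.done[opID] {
		return false // the "insert" hit the unique key: already applied
	}
	l.done[opID] = true
	l.balance += delta
	return true
}

func main() {
	l := newLedger(100)
	fmt.Println(l.adjustOnce("gid1-branch01", -30), l.balance)
	fmt.Println(l.adjustOnce("gid1-branch01", -30), l.balance) // retry: no second deduction
}
```

The operation ID should identify one branch of one global transaction, so that a retry of the same branch maps to the same key while other branches are unaffected.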

Please note that the following two are typical non-idempotent operations.

  • Timeout rollback: suppose your business has an operation that may take a long time, and you want the global transaction to roll back when the wait times out and the call returns a failure. This is not idempotent, because in the extreme case of two processes calling the operation at the same time, one may get a timeout failure and the other a success, which are different results.
  • Rollback after reaching the retry limit: the analysis is the same as above.

Workflow mode does not currently support timeout rollback or retry limit rollback. If you have a relevant scenario, please send us the details, and we will actively consider whether to add this kind of support.

Branch Operation Results

A branch operation returns one of the following results.

  • Success: the branch operation returns HTTP-200/gRPC-nil
  • Business failure: the branch operation returns HTTP-409/gRPC-Aborted. There are no more retries, and the global transaction needs to be rolled back
  • In progress: the branch operation returns HTTP-425/gRPC-FailedPrecondition. This result indicates that the transaction is progressing normally and asks dtm to retry at a fixed interval instead of using the exponential backoff algorithm
  • Unknown error: the branch operation returns any other result, indicating an unknown error. dtm will retry this workflow using the exponential backoff algorithm
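The HTTP half of this mapping can be written down as a small classifier. This is an illustrative sketch, not dtm's code: outcome, classifyHTTP and retryDelay are hypothetical names, and the doubling rule is just one simple form of exponential backoff.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// outcome is a hypothetical enum for the four branch results listed above.
type outcome int

const (
	success outcome = iota
	failure
	ongoing
	unknown
)

// classifyHTTP maps an HTTP status code to a branch outcome.
func classifyHTTP(status int) outcome {
	switch status {
	case http.StatusOK: // 200
		return success
	case http.StatusConflict: // 409
		return failure
	case http.StatusTooEarly: // 425
		return ongoing
	default:
		return unknown
	}
}

// retryDelay returns the wait before the next retry and whether to retry at all.
// In-progress results retry at a fixed interval; unknown errors back off
// exponentially; success and business failure are not retried.
func retryDelay(o outcome, attempt int, interval time.Duration) (time.Duration, bool) {
	switch o {
	case ongoing:
		return interval, true
	case unknown:
		return interval << attempt, true // doubles with each attempt
	default:
		return 0, false
	}
}

func main() {
	for _, status := range []int{200, 409, 425, 500} {
		o := classifyHTTP(status)
		d, retry := retryDelay(o, 3, time.Second)
		fmt.Println(status, o, d, retry)
	}
}
```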

If your existing service returns results that differ from the above, you can customise this mapping with workflow.Options.HTTPResp2DtmError/GRPCError2DtmError.

According to the Saga and Tcc protocols, Saga's compensation operation and Tcc's Confirm/Cancel operations are not allowed to return a business failure. In the second stage of the workflow (Commit/Rollback), an operation that neither succeeds nor may be retried leaves the global transaction unable to complete, so take care to avoid this when designing your operations.

Transaction Completion Notification

In some business scenarios you want to be notified when a transaction completes. This can be achieved by setting an OnFinish callback on the first transaction branch. By the time the callback is called, all business operations have been performed, so the global transaction is substantially complete. The callback can tell whether the global transaction was finally committed or rolled back from the isCommit argument passed in.

Note that when the OnFinish callback is called, the state of the transaction has not yet been updated to its final state on the dtm server. If you combine completion notifications with queries of the global transaction result, the two may disagree, so we recommend using only one of these methods rather than a mixture.

Conclusion

We have introduced a workflow pattern that supports the mixed usage of Saga, XA and Tcc. This pattern also supports HTTP, gRPC and local transactions.

We have also illustrated the usage of workflow in Golang with examples.

You are welcome to visit https://github.com/dtm-labs/dtm