RisingWave Labs

95% Cost Reduction: How RisingWave Powers Metabit Trading’s Real-Time Monitoring Platform

Company Background

Metabit Trading is a technology-driven quantitative investment firm founded on artificial intelligence and machine learning, dedicated to combining rigorous theories of mathematical statistics with cutting-edge computer technology to yield long-term sustainable returns for investors. The core team members hail from top-tier institutions and companies such as Stanford, CMU, Facebook, and Google. Currently, the firm manages assets exceeding 1 billion USD.

Metabit Trading has established a series of alert rules for its trading activities, covering account activity, cash reserve sufficiency, and compliance with relevant regulations, to ensure the health and legality of its trading system. The alert system takes real-time business information from the trading machines as input and outputs a judgment on whether current trading activity meets expectations. The system demands stability, low latency, and accurate computation results, and Metabit built it on RisingWave. Currently, RisingWave processes input messages at tens of thousands of QPS and outputs alert messages with a delay of only seconds.

Why Replace the Existing Database System?

Before adopting RisingWave, Metabit used a self-built service to write live data into a mainstream OLAP database (referred to as System X hereafter) in real time, issuing alerts with periodic queries. The reasons for choosing System X are as follows.

  1. System X exhibits strong write performance, with a write rate of approximately 50-200 MB/s. Coupled with high-throughput middleware, it enables low-latency data ingestion and suits workloads that write continuously without modifying existing data.
  2. Materialized views in System X could preprocess data to optimize query performance, especially for aggregation operations involved in alert rules.

However, as the development and use of System X progressed, Metabit faced new challenges:

  1. System X is not well suited to high-concurrency queries. It is designed to handle a small number of computationally intensive requests, which means a single query can potentially exhaust all its resources. The officially recommended QPS is 100, which limited further growth of the monitoring business. To mitigate this, Metabit redesigned partitioning and index granularity, limited the resources available to a single query, and used materialized views to preprocess data for the aggregation operations involved in alert rules.
  2. System X lacks sufficient support for distributed consistency. When data is changed, System X creates a temporary partition without modifying the original data files. Changes to the data files are delayed until data merging occurs, so the system guarantees only eventual consistency rather than strong consistency. This forced Metabit to process data with extreme caution and to avoid join operations in order to prevent inconsistent table results, which complicated data stream processing.
  3. System X offers limited support for horizontal scaling. Inter-shard communication affects query performance, so scaling out directly results in slower queries and forces developers to balance sharding against query performance. Additionally, the asynchronous nature of most System X processes introduces data inconsistency, which poses significant scalability challenges for the system.

So, is there a system that could solve all these business pain points at once? In seeking a new database solution, Metabit encountered RisingWave. RisingWave is dedicated to building a high-performance, highly available, and highly scalable streaming database. After thorough research, Metabit chose RisingWave to replace System X for its risk control system. Using RisingWave, Metabit has achieved a threefold improvement in data timeliness even with a halved number of compute nodes, significantly enhancing the cluster's cost-effectiveness.

Why Choose RisingWave?

Efficient Incremental Model for Ultimate Performance

As a streaming database, RisingWave lets users quickly set up real-time stream processing tasks by creating materialized views with Postgres-compatible SQL. RisingWave's materialized views support real-time incremental updates: each batch of new data triggers a calculation that updates the materialized view results. Under the hood, RisingWave maintains the internal state of every operator involved in a materialized view, which is what makes efficient incremental updates possible. This incremental model delivers performance surpassing that of System X. Previously, Metabit implemented minute-level monitoring on System X, with a P90 data entry latency measured in seconds. After deploying RisingWave, that latency dropped severalfold, to sub-second data entry.
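To make the incremental model concrete, here is a minimal Python sketch of the idea, not RisingWave's actual implementation. The class `IncrementalSumView`, the helper `full_recompute`, and the sample rows are all hypothetical. The view keeps per-account state between batches, so a query reads a stored result instead of rescanning history.

```python
from collections import defaultdict


class IncrementalSumView:
    """Toy stand-in for a materialized view: SUM(amount) grouped by account."""

    def __init__(self):
        # Operator state kept between batches, so old rows are never rescanned.
        self.totals = defaultdict(float)

    def apply_batch(self, rows):
        """Fold a batch of (account, amount) rows into the stored result."""
        for account, amount in rows:
            self.totals[account] += amount

    def query(self, account):
        """Read the already-computed result; no aggregation happens here."""
        return self.totals.get(account, 0.0)


def full_recompute(all_rows, account):
    """Baseline for comparison: rescan every row on every query."""
    return sum(amount for acc, amount in all_rows if acc == account)


# Made-up batches of (account, amount) rows.
batches = [
    [("A", 100.0), ("B", 50.0)],
    [("A", -30.0)],
    [("B", 25.0), ("A", 5.0)],
]

view = IncrementalSumView()
history = []
for batch in batches:
    history.extend(batch)
    view.apply_batch(batch)

# Both approaches agree, but only the baseline touched the full history.
print(view.query("A"), full_recompute(history, "A"))
```

The cost of `apply_batch` depends only on the size of the new batch, while `full_recompute` grows with the total history, which is the gap the paragraph above describes.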

Rich Incremental Computations for Ultimate Resource Costs Savings

In data processing, full computations process all data, while incremental computations process only newly added or modified data. Full computations therefore incur significant performance overhead and drive up resource usage and cost. Metabit aimed to avoid full computations as much as possible, but System X's materialized views did not support some common syntax. For example, window functions, which are very common in trade analysis for calculating aggregate results within a specific range around each row, are better supported by RisingWave through its OverWindow operator. Replacing full computations with incremental ones also reduced Metabit's resource costs. In one of Metabit's monitoring clusters, supporting the same monitoring workload required hundreds of cores on System X without materialized views, and tens of cores (13%) on System X with materialized views and fine-tuning. In contrast, RisingWave needed fewer than ten cores to complete the same task. RisingWave reduced Metabit's computing resources by over 95% compared to System X clusters without materialized views, and by at least 70% compared to System X clusters with materialized views.
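The difference between full and incremental computation for a window aggregate can be sketched in a few lines of Python. This is an illustration of the general technique only, not RisingWave's OverWindow operator; `RollingSum`, `rolling_sum_full`, and the sample data are hypothetical. The aggregate is a sum over the current row and the two rows before it.

```python
from collections import deque


class RollingSum:
    """Sum over the most recent `size` rows, updated in O(1) per new row."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.window = deque()
        self.total = 0

    def push(self, value):
        """Add one row and return the aggregate for the frame ending at it."""
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            # Evict the row that just left the frame instead of re-summing.
            self.total -= self.window.popleft()
        return self.total


def rolling_sum_full(values, size):
    """Full computation: re-sum the whole frame for every row."""
    return [sum(values[max(0, i - size + 1): i + 1]) for i in range(len(values))]


trade_sizes = [5, 3, 8, 2, 7]  # made-up data

roller = RollingSum(size=3)
incremental = [roller.push(v) for v in trade_sizes]
full = rolling_sum_full(trade_sizes, 3)

print(incremental == full)
```

Both versions produce the same per-row results. The incremental one does a constant amount of work per new row, whereas the full one re-reads the frame each time.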

Hybrid Architecture for Superior Horizontal Scaling

Balancing horizontal scaling against query performance has always been a challenge for developers, but RisingWave's architecture avoids this trade-off. RisingWave maintains a hybrid state that spans both remote and local storage: it uses cloud storage for stronger elastic scaling while keeping query performance high through a local cache. This design frees developers from having to compromise between horizontal scaling and query performance.

Strong Distributed Consistency

RisingWave introduces the Barrier mechanism to ensure consistency across nodes in the computation graph, which appears externally as consistency among materialized views (MViews). A barrier is a synchronization method used in distributed systems to guarantee the atomicity and consistency of data processing. In RisingWave, this mechanism ensures that, even in a distributed environment, reading multiple materialized views yields strongly consistent results. Users therefore obtain accurate and consistent results from their SQL operations regardless of how data is distributed or updated. This is particularly important for complex distributed queries and real-time data analysis, because it eliminates the confusion and errors that inconsistent data would otherwise cause.
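The following single-process Python sketch illustrates the general idea of barrier-aligned commits. It is a deliberately simplified toy under stated assumptions, not how RisingWave implements barriers across a distributed dataflow. `BARRIER`, `View`, and `run` are hypothetical names. Each view accumulates in-flight state and only exposes it when a barrier closes the epoch, so readers always see all views as of the same point in the stream.

```python
BARRIER = object()  # hypothetical marker injected into the stream between epochs


class View:
    """Toy view that only exposes results as of the last barrier."""

    def __init__(self, fold):
        self.fold = fold      # function (state, event) -> new state
        self.pending = 0      # in-flight state, not visible to readers
        self.committed = 0    # state as of the last completed epoch

    def on_event(self, event):
        self.pending = self.fold(self.pending, event)

    def on_barrier(self):
        self.committed = self.pending


def run(stream, views):
    """Feed events to every view; commit all views together at each barrier."""
    epochs = 0
    for item in stream:
        if item is BARRIER:
            for view in views:
                view.on_barrier()
            epochs += 1
        else:
            for view in views:
                view.on_event(item)
    return epochs


order_count = View(lambda state, amount: state + 1)
order_total = View(lambda state, amount: state + amount)

# The trailing 5 belongs to an epoch that has not been closed by a barrier yet.
stream = [10, 20, BARRIER, 5]
epochs = run(stream, [order_count, order_total])

# Readers see count and total from the same epoch, never a mix of the two.
print(order_count.committed, order_total.committed)
```

Without the barrier, a reader could observe the count after the third event but the total after only two, which is the kind of cross-view inconsistency the mechanism is meant to rule out.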

Extensive Sink Interface Support

RisingWave offers a wide range of source and sink connectors, which greatly simplifies data integration and transmission. These connectors link various sources to sinks and manage diverse data streams efficiently. RisingWave also supports multiple encoding formats such as JSON, Avro, and Protobuf, which eases system integration and improves data compatibility. For a real-time monitoring system in particular, these features let RisingWave process and analyze large volumes of streaming data in real time. Combined with materialized views, they also make it possible to feed processed results into Kafka, turning passive, query-based alerting into proactive alerts and freeing Metabit from the pain points of high-concurrency queries.

UDF Custom Service Support

Similar to MySQL's custom functions, RisingWave allows users to add custom logic to its stream computing through User-Defined Functions (UDFs). SQL is already very powerful, but UDFs offer more flexibility, especially for sensitive logic or for logic that is not fully compatible when migrating from an existing system. With UDFs, Metabit can migrate business logic more conveniently without changing the overall architecture.

Observability

RisingWave ships with its own observability tooling: it uses Grafana to provide metrics and dashboards at different granularities, and it supports integration with internal or user-maintained Prometheus systems. This allowed RisingWave to fit into Metabit's existing Prometheus monitoring system, which greatly simplified monitoring management.

Technical Support

In the POC stage of the project, RisingWave provided Metabit with strong technical support, helping developers better understand the usage of the RisingWave streaming database. When business logic migration incompatibilities occurred, timely feedback and suggestions were also provided.

Solution Based on RisingWave

After selecting RisingWave as the database and computing engine for its real-time monitoring system, Metabit built the following real-time monitoring architecture: trading machines transfer business data through Kafka. RisingWave creates materialized views on top of Kafka source tables to calculate trading data under different aggregation conditions, compare them with thresholds, and generate alert results. RisingWave also offers the functionality of writing data back to Kafka. The alert service listens to the Kafka stream and sends alert notifications to monitoring personnel.

Metabit’s real-time monitoring architecture. Image created by the author.
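The data flow in this architecture can be mimicked with a small, self-contained Python sketch. It is an in-memory analogy only: the deques stand in for Kafka topics, `process` stands in for the materialized view plus sink, and every name, field, and threshold here is hypothetical rather than taken from Metabit's system.

```python
from collections import defaultdict, deque

# In-memory stand-ins for the Kafka topics; names and fields are hypothetical.
trades_topic = deque([
    {"account": "acct-1", "notional": 400},
    {"account": "acct-2", "notional": 100},
    {"account": "acct-1", "notional": 700},
])
alerts_topic = deque()

NOTIONAL_LIMIT = 1000  # illustrative threshold, not a real alert rule


def process(trades, alerts, limit):
    """Aggregate per account and push an alert when a total first exceeds the limit."""
    totals = defaultdict(int)
    alerted = set()
    while trades:
        trade = trades.popleft()
        account = trade["account"]
        totals[account] += trade["notional"]
        if totals[account] > limit and account not in alerted:
            alerted.add(account)
            alerts.append({"account": account, "total": totals[account]})
    return dict(totals)


def alert_service(alerts):
    """Consumer side: drain the alert topic and format notifications."""
    messages = []
    while alerts:
        alert = alerts.popleft()
        messages.append(f"ALERT {alert['account']}: notional {alert['total']} exceeds limit")
    return messages


totals = process(trades_topic, alerts_topic, NOTIONAL_LIMIT)
messages = alert_service(alerts_topic)
print(messages)
```

The point of the shape is that the alert service never queries the aggregation step. It only consumes what is pushed to it, so alert delivery is not bounded by query QPS.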

The current monitoring solution runs stably and has the following characteristics:

  1. All-in-one storage and computation, with comprehensive data. The RisingWave database centralizes all transaction-related information, including original data such as trade orders and account information, which enables alert rules to be customized in one place. It also stores the results of monitoring rule calculations. Once an alert is triggered, monitoring personnel can use SQL queries to access original data, intermediate states, and processed data, which makes locating issues more convenient.
  2. All-in-one stream and table. RisingWave presents stream computing as materialized views, which are created with SQL. As a declarative language, SQL can express most of the logic that would otherwise be written in application code. When alert rules change, editing SQL is more flexible than modifying code on machines. Beyond these development advantages, RisingWave delivers true real-time data warehousing: it consumes and processes data in a streaming manner, ensuring low latency and accurate, continuously updated results.
  3. Data flow. RisingWave supports feeding processed data into Kafka. This replaces the usual passive query mode of databases with a push model, so alerting is no longer limited by database query QPS.

Conclusion

After studying and experimenting with RisingWave in depth, Metabit has successfully deployed it in its live trading production environment, where it runs stably. RisingWave provides strong reliability, scalability, efficient connectivity, excellent observability, and outstanding customer support. It addresses several pain points Metabit previously experienced and offers a better solution for the real-time monitoring platform. Moving forward, RisingWave will continue to focus on integrating streaming and batch processing. In the future, RisingWave will be able to process real-time data from various sources and efficiently analyze large-scale data stored in data lakes, enhancing users’ data analysis and processing capabilities and helping enterprises adapt to changing market demands and technological challenges.

About RisingWave Labs

RisingWave is an open-source distributed SQL database for stream processing. It is designed to reduce the complexity and cost of building real-time applications. RisingWave offers users a PostgreSQL-like experience specifically tailored for distributed stream processing.

Official Website: https://www.risingwave.com/

Documentation: https://docs.risingwave.com/docs/current/intro/

GitHub: https://github.com/risingwavelabs/risingwave

LinkedIn: linkedin.com/company/risingwave-labs

Slack: https://risingwave.com/slack
