As the company's business scales, its systems become increasingly complex. The R&D testing team faces several challenges: high business complexity, a heavy test-data construction workload, full regression testing, high communication costs, costly test-data maintenance, and the management of automated test cases. Each of these affects the efficiency and quality of the testing team and, in turn, the software development process.
In summary, there are two difficulties: cost and complexity.
Cost: there is always a trade-off between cost and quality. It is a real challenge to ensure quality while iterating quickly with limited resources.
Complexity: as business rules accumulate over time, the combined complexity of business processes, rules, scenarios, and data processing can grow exponentially, which makes it much harder to test thoroughly.
Despite the availability of many automation testing tools on the market today, there are still two problems:
Automated testing mainly automates execution. Maintenance is still done manually, so the return on the effort invested is lower than it could be.
When a large amount of test data must be constructed, regression testing covers a wide scope, releases are frequent, and written data must be validated, both manual and automated testing still carry a huge workload for maintaining test cases and data. The real pain points of testing have not been effectively addressed.
Replay Traffic Testing
Traffic replay means capturing production traffic and replaying it in a test environment, so that updated systems are exercised under realistic production conditions.
We used this concept to build AREX (https://github.com/arextest), an automated testing platform that combines recording, replay, and comparison.
Record: record the production requests as well as the data involved in the request processing.
Replay: replay the requests and mock the data involved in the invocation chain.
Comparative Analysis and Reporting: compare and analyze the responses from recording and replay, and get a comprehensive report for the analysis.
What is AREX?
AREX is an open-source automation test platform. It records real traffic in the production environment and mocks all invocations to third-party dependencies with Java Agent bytecode injection technology, then replays the request and mock data in the test environment. Furthermore, it compares the recorded and replayed response messages, and lists the differences, allowing developers and testers to quickly troubleshoot.
The process of replay traffic testing
Record
AOP (Aspect-Oriented Programming) is a technique for building common, reusable routines that can be applied application-wide. A Java Agent can apply AOP principles by intercepting method calls and weaving in additional behavior, which gives it the infrastructure needed to implement AOP in Java applications without changing their source code.
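The idea of weaving extra behavior around a method call can be illustrated without an agent. The following is only a minimal sketch that uses the JDK's `java.lang.reflect.Proxy` as a stand-in for bytecode injection; it is not how the AREX agent is implemented, and the `Greeter` interface and `InterceptionSketch` class are hypothetical names chosen for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface standing in for any application service.
interface Greeter {
    String greet(String name);
}

class InterceptionSketch {
    // Calls observed by the interceptor, in order.
    static final List<String> LOG = new ArrayList<>();

    // Wraps a target so that every call is logged before and after it runs.
    static Greeter intercept(Greeter target) {
        InvocationHandler handler = (proxy, method, args) -> {
            LOG.add("before " + method.getName());
            Object result = method.invoke(target, args);
            LOG.add("after " + method.getName() + " -> " + result);
            return result;
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(), new Class<?>[] {Greeter.class}, handler);
    }

    public static void main(String[] args) {
        Greeter greeter = intercept(name -> "hello " + name);
        System.out.println(greeter.greet("arex"));
        System.out.println(LOG);
    }
}
```

A Java Agent achieves a similar effect at class-load time by rewriting bytecode, so the application does not have to route its calls through a proxy explicitly.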
In the flow of a request, the AREX Java Agent records the request information of Java applications in the production environment. It sends this information to the AREX Storage Service, which imports and stores the data in a MongoDB database.
Replay
During replay, the AREX Schedule Service retrieves the recorded data (requests) of the tested application from the database according to the user's specifications. It then sends interface requests to the target verification service. At the same time, the Java Agent returns the recorded responses of external dependencies (external requests/DB) to the tested application. After the target service completes the request logic, it returns the response message.
Comparative Analysis and Reporting
After replay testing, AREX compares the recorded response with the playback response to verify the correctness of the system logic. It then utilizes the comparison results to generate a testing report to review. Throughout this process, the AREX Cache Service (Redis) is responsible for caching mock data and comparison results during the replay process, improving the efficiency of the comparisons.
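To make the comparison step concrete, here is a minimal sketch that assumes responses have already been flattened into field/value maps. The `ResponseDiffSketch` class and its `diff` method are hypothetical and much simpler than a real comparison engine, which would also need to handle nested structures, ordering, and fields that are expected to change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

class ResponseDiffSketch {
    // Returns one readable line per field whose value differs between
    // the recorded and the replayed response. A missing field is
    // treated the same as a null value in this simplified version.
    static List<String> diff(Map<String, Object> recorded, Map<String, Object> replayed) {
        TreeSet<String> keys = new TreeSet<>(recorded.keySet());
        keys.addAll(replayed.keySet());
        List<String> differences = new ArrayList<>();
        for (String key : keys) {
            Object before = recorded.get(key);
            Object after = replayed.get(key);
            if (!Objects.equals(before, after)) {
                differences.add(key + ": recorded=" + before + ", replayed=" + after);
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        Map<String, Object> recorded = Map.of("status", 200, "total", 42);
        Map<String, Object> replayed = Map.of("status", 200, "total", 40);
        System.out.println(diff(recorded, replayed));
    }
}
```

An empty list means the replayed response matched the recording; any entries would be the lines shown in a report.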
Technical challenges
Technology Stack of AREX Java Agent
We chose the Byte Buddy library to modify bytecode because it performs well and keeps the instrumentation code readable.
In addition, we use SPI (Service Provider Interface), an API that is intended to be implemented or extended by third parties. It allows components to be added as extensions or swapped for alternative implementations. Our injection components are implemented through this plug-in model.
Tracing
While recording, AREX captures the request and response messages of the main servlet and of its third-party dependencies as one complete test case. The challenge is to link all of this data to the same request.
To address this problem, the AREX Java Agent generates a unique Record ID and saves it in a ThreadLocal variable. Agent code injected at the beginning and end of the intercepted functions reads the value from the ThreadLocal variable and stores it together with the intercepted data.
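The Record ID mechanism can be sketched roughly as follows. This is an illustrative assumption of how a ThreadLocal could carry the ID, not AREX's actual code; `TraceContextSketch`, `start`, `tag`, and the `|`-separated format are all hypothetical.

```java
import java.util.UUID;

class TraceContextSketch {
    // Holds the Record ID of the request handled by the current thread.
    private static final ThreadLocal<String> RECORD_ID = new ThreadLocal<>();

    // Called when a request enters the application.
    static String start() {
        String id = UUID.randomUUID().toString();
        RECORD_ID.set(id);
        return id;
    }

    static String current() {
        return RECORD_ID.get();
    }

    // Called when the request finishes, so pooled threads do not leak the ID.
    static void clear() {
        RECORD_ID.remove();
    }

    // Tags intercepted data with the current Record ID before it is stored.
    static String tag(String operation, String payload) {
        return current() + "|" + operation + "|" + payload;
    }

    public static void main(String[] args) {
        start();
        System.out.println(tag("db.query", "select 1"));
        clear();
    }
}
```

A plain ThreadLocal only covers work done on the same thread; requests that hop across thread pools need the ID to be propagated explicitly, which this sketch does not attempt.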
Record and Replay Solution
Let's start with a simple function as an example to understand how to record and replay. Suppose we have the following function that converts a given IP string to an integer value:
public Integer parseIp(String ip) {
    int result = 0;
    if (checkFormat(ip)) { // Check whether the IP address is valid
        String[] ipArray = ip.split("\\.");
        for (int i = 0; i < ipArray.length; i++) {
            // Shift the previous octets left and append the next one
            result = result << 8;
            result += Integer.parseInt(ipArray[i]);
        }
    }
    return result;
}
- Record (traffic collection)
When this function is called, we save the corresponding request parameters and the return result for later use in traffic replay. Here's the code:
if (needRecord()) {
    // Data collection: save the parameters and result in the DB
    DataService.save("parseIp", ip, result);
}
- Replay (traffic replay)
While replaying the traffic, we can use the previously collected data to automatically mock this function, without actually sending the request:
if (needReplay()) {
    return DataService.query("parseIp", ip);
}
By examining the complete code, we can better understand the logic:
public Integer parseIp(String ip) {
    if (needReplay()) {
        // Replay: return the collected data as the result, which is essentially a mock.
        return DataService.query("parseIp", ip);
    }
    int result = 0;
    if (checkFormat(ip)) {
        String[] ipArray = ip.split("\\.");
        for (int i = 0; i < ipArray.length; i++) {
            result = result << 8;
            result += Integer.parseInt(ipArray[i]);
        }
    }
    if (needRecord()) {
        // Record: save the parameters and result to the database.
        DataService.save("parseIp", ip, result);
    }
    return result;
}
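For readers who want to run the idea end to end, here is a self-contained sketch of the same logic. It makes several assumptions that are not in the original: an in-memory map replaces `DataService`, a `Mode` enum replaces `needRecord()` and `needReplay()`, and `checkFormat` is a simplistic regular-expression check.

```java
import java.util.HashMap;
import java.util.Map;

class RecordReplaySketch {
    enum Mode { OFF, RECORD, REPLAY }

    static Mode mode = Mode.OFF;

    // In-memory stand-in for the article's DataService (assumption).
    static final Map<String, Integer> STORE = new HashMap<>();

    // Simplistic format check (assumption): four dot-separated groups of 1-3 digits.
    static boolean checkFormat(String ip) {
        return ip != null && ip.matches("(\\d{1,3}\\.){3}\\d{1,3}");
    }

    static Integer parseIp(String ip) {
        if (mode == Mode.REPLAY) {
            // Replay: skip the real logic and return the recorded result.
            return STORE.get("parseIp:" + ip);
        }
        int result = 0;
        if (checkFormat(ip)) {
            for (String part : ip.split("\\.")) {
                result = (result << 8) + Integer.parseInt(part);
            }
        }
        if (mode == Mode.RECORD) {
            // Record: keep the argument and the result for later replay.
            STORE.put("parseIp:" + ip, result);
        }
        return result;
    }

    public static void main(String[] args) {
        mode = Mode.RECORD;
        parseIp("10.0.0.1");
        mode = Mode.REPLAY;
        System.out.println(parseIp("10.0.0.1")); // prints 167772161
    }
}
```

In replay mode an input that was never recorded returns `null` here; a real system would have to decide how to treat such a miss.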
Compatible Component Versions
Popular components often have multiple versions that are used on different systems at the same time, and the implementations of the different versions may be very different or even incompatible.
To address this problem, AREX supports multiple versions of the same component. At application startup, the AREX Agent collects information about all dependency packages, such as each JAR's MANIFEST.MF file, and reads the library version from the manifest. It then activates the AREX injection code that corresponds to that version, which is how multi-version compatibility is achieved.
As shown in the following figure, each injection script declares the version range it supports. AREX identifies the versions of the components the application depends on before their classes are loaded, and then matches those versions when the classes are loaded to ensure that the correct code is injected.
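The version lookup and range match described above might look something like the sketch below. It rests on assumptions: that the version is taken from the standard `Implementation-Version` manifest attribute, that versions are purely numeric and dot-separated, and that ranges are half-open. `VersionMatchSketch` and its methods are hypothetical names, not AREX APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

class VersionMatchSketch {
    // Reads Implementation-Version from the text of a MANIFEST.MF file.
    static String readVersion(String manifestText) {
        try {
            Manifest manifest = new Manifest(
                    new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
            return manifest.getMainAttributes().getValue(Attributes.Name.IMPLEMENTATION_VERSION);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Compares numeric dot-separated versions; missing parts count as 0.
    static int compare(String a, String b) {
        String[] left = a.split("\\.");
        String[] right = b.split("\\.");
        for (int i = 0; i < Math.max(left.length, right.length); i++) {
            int l = i < left.length ? Integer.parseInt(left[i]) : 0;
            int r = i < right.length ? Integer.parseInt(right[i]) : 0;
            if (l != r) {
                return Integer.compare(l, r);
            }
        }
        return 0;
    }

    // True if minInclusive <= version < maxExclusive.
    static boolean inRange(String version, String minInclusive, String maxExclusive) {
        return compare(version, minInclusive) >= 0 && compare(version, maxExclusive) < 0;
    }

    public static void main(String[] args) {
        String version = readVersion("Manifest-Version: 1.0\nImplementation-Version: 4.5.13\n\n");
        System.out.println(version + " in [4.0, 5.0): " + inRange(version, "4.0", "5.0"));
    }
}
```

Versions with qualifiers such as `-RELEASE` would need extra parsing that this sketch leaves out.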
Code Isolation
Since most scenarios are recorded in the production environment and replayed in the test environment, stability is of paramount importance. For the stability of the system and to prevent the Agent's code from affecting the code execution of the application under test, AREX has realized code isolation and interoperability.
The AREX core JAR is loaded by an independent ClassLoader, so it is isolated from the user's application code. To ensure that the injected code can still be accessed correctly at runtime, a simple modification is made to the ClassLoader.
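The isolation half of this design can be demonstrated with plain JDK classes. The sketch below only shows that a loader without the application loader as its parent cannot see application classes; it does not reproduce the ClassLoader modification AREX makes for interoperability, and `IsolationSketch` is a hypothetical name.

```java
import java.net.URL;
import java.net.URLClassLoader;

class IsolationSketch {
    // Returns true if the given loader can resolve the class name.
    static boolean canSee(ClassLoader loader, String className) {
        try {
            loader.loadClass(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // A loader with a null parent delegates only to the bootstrap loader,
    // so it sees core JDK classes but nothing from the application.
    static ClassLoader isolatedLoader() {
        return new URLClassLoader(new URL[0], null);
    }

    public static void main(String[] args) {
        ClassLoader isolated = isolatedLoader();
        String appClass = IsolationSketch.class.getName();
        System.out.println("JDK class visible: " + canSee(isolated, "java.lang.String"));
        System.out.println("App class visible: " + canSee(isolated, appClass));
    }
}
```

In a real agent the isolated loader would be given the agent's own JAR URLs, so agent dependencies cannot clash with different versions of the same libraries used by the application.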
Mock Time
Many business scenarios are time-sensitive. Imagine that during the replay of requests, the recorded time has expired, resulting in failed testing.
To handle this, AREX implements its own currentTimeMillis() to proxy the original call to Java's System.currentTimeMillis(). While recording traffic, it captures the current time for each test case. During replay, the proxy returns a simulated time that matches the time of recording, so the application sees the same time it saw in production.
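A simplified picture of the time proxy is sketched below. It assumes the application calls the proxy explicitly, whereas an agent would redirect existing `System.currentTimeMillis()` calls through bytecode injection; `MockClockSketch`, `startReplay`, and `endReplay` are hypothetical names.

```java
class MockClockSketch {
    // Recorded time for the test case being replayed on this thread;
    // null means no replay is active and the real clock is used.
    private static final ThreadLocal<Long> REPLAY_TIME = new ThreadLocal<>();

    // Drop-in replacement for System.currentTimeMillis().
    static long currentTimeMillis() {
        Long mocked = REPLAY_TIME.get();
        return mocked != null ? mocked : System.currentTimeMillis();
    }

    // Called before replaying a test case, with the time captured at recording.
    static void startReplay(long recordedMillis) {
        REPLAY_TIME.set(recordedMillis);
    }

    // Called after the replay finishes to restore the real clock.
    static void endReplay() {
        REPLAY_TIME.remove();
    }

    public static void main(String[] args) {
        startReplay(1_700_000_000_000L);
        System.out.println(currentTimeMillis()); // prints 1700000000000
        endReplay();
    }
}
```

This version freezes time for the whole replay; whether the simulated clock should also advance during a replay is a design decision the sketch does not address.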
Mock Caches
Applications in real production environments often utilize different types of caches to enhance runtime performance. However, the variations in cached data can lead to inconsistent execution results.
AREX handles this with dynamic classes. You configure the method that accesses the local cache as a dynamic class, which amounts to marking that method as a custom mock point. In the production environment, AREX records the data returned by the configured method; during replay, it matches the call and returns the recorded data.
What can you do with AREX?
Diversity validation
To verify the correctness of business logic after modifying the system, merely checking the return results is not enough. Usually, it is also necessary to verify the correctness of intermediate process data, such as whether the data content written to the database by the business system is correct.
For this reason, AREX also supports validating the data written to third-party dependencies.
During recording and replay, AREX captures the database requests sent by both the old and the new version of the system and compares them. Any differences are displayed in the report.
Since AREX mocks all requests to third-party dependencies, it supports the verification of data in databases, message queues, Redis, and even runtime memory data. Moreover, during the playback process, it does not generate calls to the database, so there is no dirty data.
Reproduce the Production Issues
In practice, AREX can also be used to quickly locate production issues.
After a production issue occurs, version differences, data differences, and other factors can make it difficult for developers to reproduce it on their local machines, so debugging is costly and time-consuming.
With AREX, you can force the recording of a problematic case in the production environment (the response message will carry a unique Record ID). Then start your local development environment and add this Record ID to the request message header. The playback function restores the recorded request and data on your local machine, so you can debug the production issue directly against local code.
Community⤵️
🐦 Follow us on Twitter
📝 Join AREX Slack
📧 Join the Mailing List