gvison

Posted on Oct 14

No More Complex Scripts! perftest Distributed Load Testing Uses a Single YAML File to Orchestrate Massive Cluster Tests Gracefully

#test #performance #go #tooling

Preface

In the previous article, we detailed the single-machine stress testing capabilities of perftest, showcasing how it achieves high-performance testing for HTTP/1.1, HTTP/2, HTTP/3, and WebSocket with a minimalist command-line interface. However, when dealing with large-scale business systems, distributed service deployments, and complex network links, a single machine simply cannot generate enough load to simulate real-world production traffic.

Fortunately, perftest goes beyond a single machine: it also supports distributed cluster stress testing. Through a Collector + Agent architecture, you can launch tests from multiple machines simultaneously, so the total load grows with the number of Agents and, given enough nodes, can reach the scale of millions of concurrent users.

Why Choose Distributed Cluster Stress Testing?

1. Replicate Real-World Traffic Scenarios
Single-machine testing cannot simulate user access behavior from different geographical regions, network environments, and latency conditions. Distributed cluster testing allows requests to be sent from multiple geographical nodes at the same time, more closely resembling a real production environment.

2. Break Through Performance Bottlenecks
When a single machine's CPU, memory, or network bandwidth reaches its limit, the load generator itself becomes the bottleneck and the results no longer reflect the capacity of the system under test. perftest removes this bottleneck by distributing requests across multiple machines.

3. Validate the System's Horizontal Scaling Capabilities
When your application runs in Kubernetes, a Service Mesh, or a microservices cluster, distributed stress testing can help you verify how well your load balancing strategies and elastic scaling hold up under high concurrency.

4. Provide More Comprehensive Data Observation
Real-time monitoring in cluster mode shows not only the overall QPS and latency curves but also the performance of each individual Agent node, which helps you pinpoint where a bottleneck lies.
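Point 2 above reduces to simple arithmetic: if one load generator tops out at some request rate, the number of Agents needed is the target rate divided by what each Agent can comfortably sustain, rounded up. The Go sketch below assumes you have measured a per-Agent ceiling yourself; the function name, the 70% utilization factor, and the numbers in `main` are hypothetical, not perftest defaults or benchmark results.

```go
package main

import (
	"fmt"
	"math"
)

// agentsNeeded returns how many load-generating Agents are required to reach
// targetQPS when a single Agent can sustain at most perAgentMaxQPS.
// utilization (0 < u <= 1) keeps each Agent below its ceiling so that the
// generator itself does not become the bottleneck being measured.
// Hypothetical helper for capacity planning, not part of perftest.
func agentsNeeded(targetQPS, perAgentMaxQPS, utilization float64) int {
	if targetQPS <= 0 || perAgentMaxQPS <= 0 || utilization <= 0 || utilization > 1 {
		return 0 // invalid input
	}
	usable := perAgentMaxQPS * utilization
	return int(math.Ceil(targetQPS / usable))
}

func main() {
	// Hypothetical figures: 200k QPS target, 60k QPS per Agent, Agents run at 70%.
	fmt.Println(agentsNeeded(200000, 60000, 0.7)) // prints 5
}
```

Keeping Agents below full utilization leaves headroom for CPU spikes and garbage-collection pauses on the load generators, so measured latency reflects the target system rather than an overloaded Agent.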

Architecture Design: Collector and Agent

The distributed stress testing system of perftest adopts a minimalist and stable Master-Agent (Collector-Agent) architecture, as shown in the diagram below:

Collector (Master Node)
Responsible for creating test sessions, scheduling all Agents, aggregating performance metrics, and displaying overall data in real-time on a Web UI.
Agent (Execution Node)
Executes the actual stress testing tasks and periodically pushes performance metrics to the Collector. The Agent supports automatic registration, hot-reloading of its configuration, and automatic reconnection, which keeps it running reliably in large-scale distributed scenarios.

You can think of it as a "commander" (Collector) directing a stress testing "army" made up of multiple "soldiers" (Agents): with a single command, all nodes open fire simultaneously.

How It Works: From Registration to Aggregation

The stress testing workflow is straightforward:

Start the Collector: The Collector service starts and provides a web management interface.
Create a Test Session: In the web interface, specify the desired number of agent nodes to participate in the test, then start a new session.
Start the Agents: After each agent starts, it reads its configuration file and automatically registers with the Collector.
Start the Test: Once the preset number of agents have completed registration, the Collector notifies all agents to start the stress test simultaneously.
Aggregate Results: Agents push performance data (throughput, latency, etc.) to the Collector in real time, and the Collector aggregates it and displays it on a live dashboard.
Test End and Report Export: After the test is finished, the Collector automatically generates a complete report in Markdown format, which can be downloaded and saved with one click.
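The registration-then-start handshake is essentially a barrier: the Collector counts registrations for the session and, once the configured number is reached, releases every Agent at once. The minimal in-process Go sketch below models that barrier; the `Session` type and its methods are hypothetical and only illustrate the behavior described above, not perftest's internal API or wire protocol.

```go
package main

import (
	"fmt"
	"sync"
)

// Session models one test session on the Collector (hypothetical type).
type Session struct {
	mu       sync.Mutex
	expected int             // number of Agents configured in the web UI
	agents   map[string]bool // registered Agent addresses
	start    chan struct{}   // closed exactly once to release all Agents together
}

func NewSession(expected int) *Session {
	return &Session{expected: expected, agents: map[string]bool{}, start: make(chan struct{})}
}

// Register records an Agent and returns a channel that is closed when the test
// starts. Registering the same address again (e.g. after a reconnect) does not
// count twice, and extra Agents beyond the expected number never re-close it.
func (s *Session) Register(addr string) <-chan struct{} {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.agents[addr] {
		s.agents[addr] = true
		if len(s.agents) == s.expected {
			close(s.start) // closing a channel wakes every waiting receiver at once
		}
	}
	return s.start
}

func main() {
	s := NewSession(3)
	var wg sync.WaitGroup
	for _, addr := range []string{"192.168.1.101:6601", "192.168.1.102:6601", "192.168.1.103:6601"} {
		wg.Add(1)
		go func(a string) {
			defer wg.Done()
			<-s.Register(a) // blocks until all 3 Agents have registered
			fmt.Println(a, "started")
		}(addr)
	}
	wg.Wait()
}
```

Closing a channel is a convenient broadcast in Go: every goroutine blocked on it is released together, which mirrors the "all nodes start simultaneously" behavior.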

Tip: The Agent watches its agent.yml configuration file. Changes are hot-reloaded automatically without restarting the agent process, which makes adjusting test parameters very convenient.

Quick Start: A Simple Cluster Example

Here is a basic cluster example with 1 Collector and 3 Agents:

Role	IP Address	Description
Collector	192.168.1.20	Management Node
Agent 1	192.168.1.101	Execution Node
Agent 2	192.168.1.102	Execution Node
Agent 3	192.168.1.103	Execution Node

Step 1. Start the Collector

On machine 1 (192.168.1.20), start the Collector service:

sponge perftest collector

Access http://192.168.1.20:8888 in your browser, create a new test session, and set the number of Agents to 3.

Step 2. Start the Agents

Example agent.yml configuration file for each Agent:

# 1. Protocol Configuration (supports: http | http2 | http3)
protocol: http

# 2. Target API Configuration
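testURL: "http://localhost:8080/get"   # Replace localhost with the address of the service under test; it must be reachable from every agent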
method: "GET"        # Supported methods: GET | POST | PUT | PATCH | DELETE
body: ""             # Supported data types: JSON, Form-data, Text. e.g.: "{\"key\": \"value\"}"
headers:
  - "Authorization: Bearer <token>"
  #- "Content-Type: application/json"

# 3. Stress Test Strategy (choose one: fixed duration or fixed number of requests)
duration: 10s        # e.g.: 10s, 1m, 2h
# total: 500000      # Total number of requests

# 4. Service discovery (make sure the collector and agents can reach each other over the network)
collectorHost: "http://192.168.1.20:8888"      # Address of the Collector service
agentHost: "http://<agent-host-ip>:6601"       # The accessible IP and port of the current agent
agentPushInterval: 1s                          # Metric push frequency

Important Note:

Please replace <agent-host-ip> in agentHost with the actual IP address of each agent machine.

For the same test session, the testURL and method in all agent configuration files must be identical.
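Because every Agent in a session must target the same testURL and method, a mismatch can be caught by comparing each Agent's target with the first Agent's. The Go sketch below shows one way to express that rule; `agentTarget` and `checkConsistent` are hypothetical names, and whether perftest enforces this check automatically is not covered here.

```go
package main

import "fmt"

// agentTarget is the part of an Agent's configuration that must match
// across a session (hypothetical type).
type agentTarget struct {
	Agent   string
	TestURL string
	Method  string
}

// checkConsistent returns an error naming the first Agent whose testURL or
// method differs from the first Agent in the list.
func checkConsistent(agents []agentTarget) error {
	if len(agents) == 0 {
		return nil
	}
	ref := agents[0]
	for _, a := range agents[1:] {
		if a.TestURL != ref.TestURL || a.Method != ref.Method {
			return fmt.Errorf("agent %s targets %s %s, but %s targets %s %s",
				a.Agent, a.Method, a.TestURL, ref.Agent, ref.Method, ref.TestURL)
		}
	}
	return nil
}

func main() {
	agents := []agentTarget{
		{"192.168.1.101:6601", "http://target.example:8080/get", "GET"},
		{"192.168.1.102:6601", "http://target.example:8080/get", "GET"},
		{"192.168.1.103:6601", "http://target.example:8080/get", "POST"}, // mismatch
	}
	fmt.Println(checkConsistent(agents)) // reports the third Agent
}
```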

Start the agent process on machines 2, 3, and 4 (192.168.1.101 to 192.168.1.103), each with its own agent.yml:

sponge perftest agent -c agent.yml

Once all 3 Agents have successfully registered, the Collector will automatically synchronize and start the stress test, displaying real-time graphs as shown below:

After the test is complete, you can also click "Download Test Report" to get a detailed Markdown report, including all statistical metrics and chart data, for subsequent analysis or performance regression comparisons.

Automated Deployment in Kubernetes Scenarios

For large-scale testing, managing nodes manually is impractical. perftest can be deployed on Kubernetes, which makes it easy to scale the number of agents up or down.

A complete YAML deployment manifest and step-by-step examples are provided in the documentation "Deploying perftest on Kubernetes."

Horizontal Comparison: The "Players" in Distributed Stress Testing

perftest also takes a distinctive approach to distributed testing. Let's see how it compares to other mainstream distributed stress testing tools.

Feature / Tool	JMeter	Locust	k6 (Distributed)	perftest (Distributed)
Core Architecture	GUI Master + Agents	Code-based Master/Worker	k6 Operator / CRD	Web-UI Collector + Agents
Test Definition	GUI (XML)	Python script	JavaScript script	YAML Configuration
Resource Consumption	High (Java/GUI)	Medium (Python/gevent)	Medium (Go)	Low (Go)
K8s/Cloud-Native	Complex (requires self-containerization)	Good (provides Helm Chart)	Excellent (native Operator)	Natively Friendly (provides deployment manifest)
Learning Curve	High	Medium	Medium	Extremely Low
Dynamic Configuration	Requires restart	Requires restart	Requires restart	✅ (Hot-reloading)

JMeter (The Veteran): One of the most powerful and comprehensive stress testing tools, with a vast plugin ecosystem. However, its Java- and GUI-based architecture makes it resource-intensive, and deploying and automating it in cloud-native environments is relatively cumbersome.
Locust (The Coder's Choice): Defines user behavior in Python, making it very friendly for programmers and easy to write complex business logic. Its Master/Worker architecture is clear, but large-scale deployment still requires some DevOps experience.
k6 (The Modern Contender): k6 integrates deeply with Kubernetes for distributed testing, managing tests through the k6 Operator and a Custom Resource Definition (CRD), which makes it a powerful choice for cloud-native scenarios. However, the load itself must still be defined in JavaScript scripts.
perftest (The Pragmatist): perftest carries its "simple and pure" philosophy into distributed scenarios. It replaces scripts with configuration, greatly lowering the barrier to entry for distributed stress testing. Its Collector + Agent architecture is easy to understand, tests are started and monitored from a Web UI, and out-of-the-box Kubernetes deployment files are provided. Hot reloading lets you adjust test parameters without restarting or redeploying Agents, which suits fast, iterative testing.

Conclusion: From Single-Point to Cluster, Simple but Not Simplistic

perftest does not intend to replace comprehensive "Swiss Army knife" tools like k6 or JMeter. Instead, it offers developers and SRE engineers a more modern, focused, and cloud-native-friendly option.

It covers two core scenarios of performance testing:

For single-machine stress testing, it is a sharp "scalpel," allowing you to quickly validate and compare the latest network protocols, including HTTP/3 and WebSocket, with a minimal learning curve. Through seamless integration with Prometheus, it incorporates performance data into your observability system.
For distributed stress testing, it is a lightweight "command center." Through a concise Collector-Agent architecture and a configuration-driven model, it minimizes the complexity of large-scale stress testing, and its Kubernetes-friendly deployment makes it straightforward to scale out to many load-generating nodes when you need to simulate massive concurrency.

When your needs are:

Single-machine or cluster stress testing of services or systems over HTTP/1.1, HTTP/2, or HTTP/3.
Quickly validating the performance of the latest network protocols (HTTP/3, WebSocket).
Seamlessly integrating performance testing with the Prometheus monitoring system for visualized stress testing.
Performing lightweight performance regression testing in CI/CD.
Quickly setting up and executing large-scale distributed stress tests without writing complex scripts.
Easily achieving elastic scaling of stress testing capabilities in a Kubernetes environment.

Then perftest is well worth a try. In an era of increasingly complex tools, one that scales smoothly from a "small and beautiful" single-machine utility to a "broad and strong" distributed platform may be exactly what you need to tackle modern web performance challenges.

perftest is a component of the Sponge ecosystem. Sponge is a powerful and easy-to-use Go development framework that adheres to the core philosophy of "Definition is Code." It enables the easy construction of stable, reliable, and high-performance backend services through a low-code approach, supporting various service types including RESTful API, gRPC, HTTP+gRPC, and gRPC Gateway. Sponge's built-in AI assistant can perceive service code files and their context, generating more suitable business logic code under precise AI constraints, significantly improving development efficiency.

Sponge GitHub Address: https://github.com/go-dev-frame/sponge

DEV Community