💡 Why Do We Need to Limit Concurrent Connections?
First, we need to understand a harsh reality: server resources are limited! Just like a small restaurant, if too many customers rush in at the same time, the service quality will inevitably decline and may even lead to chaos. The same applies to web servers:
- 🎯 Limited memory resources
- 🎯 Limited CPU processing power
- 🎯 Limited network bandwidth
- 🎯 Limited database connections
If we don't limit the number of concurrent connections, it may lead to:
- 😱 Slower server response
- 😱 Memory overflow
- 😱 Complete service downtime
- 😱 Other users unable to access
Let's take an example of a service we wrote:
- Allocated 2GB memory
- No request limit
- Each request consumes some memory
Put these three conditions together: when too many requests arrive and memory exceeds the limit, the service simply crashes. Let's simulate this:
const http = require('http');
const { promisify } = require('util');
const fs = require('fs');

const readFileAsync = promisify(fs.readFile);

// Read the entire large file into memory on every request
async function loadLargeFileIntoMemory() {
  try {
    const data = await readFileAsync('./largefile');
    return data;
  } catch (error) {
    console.error('Error reading large file:', error);
    return null;
  }
}

const server = http.createServer(async (req, res) => {
  const largeFileData = await loadLargeFileIntoMemory();
  if (largeFileData) {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('Request processed');
  } else {
    res.writeHead(500, { 'Content-Type': 'text/plain' });
    res.end('Internal Server Error');
  }
});

const PORT = process.env.PORT || 3000;
server.listen(PORT, () => {
  console.log(`Server listening on port ${PORT}`);
});
Before sending any requests, our program occupied only 35.1 MB of memory. We then simulated 200 concurrent connections using ab.
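A command along these lines does the job (the total request count is our assumption; the text only specifies 200 concurrent connections):

ab -n 2000 -c 200 http://127.0.0.1:3000/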
Under this load, our Node.js program hit its peak memory usage: 27 GB! Let's cap the memory and make it crash:
node --max-old-space-size=2048 server.js
Can this flag actually cap the Node.js process's total memory usage? Let's briefly go over the difference between V8 heap memory and total process memory: --max-old-space-size only limits the V8 engine's heap, while a Node.js process's total memory also includes:
- Off-heap memory (Buffers, the thread pool, etc.)
- System memory (system calls, file operations, etc.)
- Native memory (C++-level allocations)
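You can see this breakdown yourself with process.memoryUsage(); a quick illustration (the values printed will vary by machine):

const toMB = (bytes) => `${(bytes / 1024 / 1024).toFixed(1)} MB`;

// rss is the whole process; heapTotal/heapUsed are the V8 heap;
// external covers Buffers and other off-heap allocations
const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
console.log({
  rss: toMB(rss),
  heapTotal: toMB(heapTotal),
  heapUsed: toMB(heapUsed),
  external: toMB(external)
});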
For example, Buffers are allocated outside the V8 heap:

const crypto = require('crypto');
const randomString = crypto.randomBytes(1024).toString('hex');

The Buffer created by crypto.randomBytes() is not limited by --max-old-space-size.
The same applies to fs.readFile in our program: it asynchronously reads the entire contents of a file into a Buffer object, so that memory is not subject to the heap limit either.
So how should we limit total process memory?
- System-level memory limits (ulimit or Docker)
- Process management tools (PM2)
- Code-level memory monitoring and control (see the sketch below)
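As an illustration of the third option, here is a minimal sketch (not from the original article; the 1.5 GB threshold is an arbitrary example) that rejects new requests once the process's resident memory crosses a threshold:

const http = require('http');

// Arbitrary example threshold: 1.5 GB of resident memory
const MEMORY_LIMIT_BYTES = 1.5 * 1024 * 1024 * 1024;

const server = http.createServer((req, res) => {
  // rss covers the whole process, including off-heap allocations
  const { rss } = process.memoryUsage();
  if (rss > MEMORY_LIMIT_BYTES) {
    res.writeHead(503, { 'Content-Type': 'text/plain' });
    res.end('Server under memory pressure, please retry later');
    return;
  }
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('OK');
});

server.listen(3000);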
Interested readers can try these out themselves. The example above shows that failing to control request concurrency can have disastrous effects on a program, so after development we should estimate expected traffic and provision deployment resources accordingly.
🛠️ Implementation Solutions
Let's first implement the basic functionality of limiting concurrency.
1. Using a Queue to Control Concurrency
Let's look at a simple but practical implementation solution using a queue to control concurrent requests:
const express = require('express');
const app = express();

// A simple queue class
class RequestQueue {
  constructor(maxConcurrent) {
    this.maxConcurrent = maxConcurrent;
    this.currentRequests = 0;
    this.queue = [];
  }

  // Add a request to the queue
  enqueue(req, res, next) {
    if (this.currentRequests < this.maxConcurrent) {
      this.currentRequests++;
      next();
    } else {
      this.queue.push({ req, res, next });
    }
  }

  // Release the slot after processing and admit the next queued request
  dequeue() {
    this.currentRequests--;
    if (this.queue.length > 0) {
      const { req, res, next } = this.queue.shift();
      this.currentRequests++;
      next();
    }
  }
}

// Create a queue instance with max concurrency set to 10
const requestQueue = new RequestQueue(10);

// Middleware: limit concurrency
const limitConcurrent = (req, res, next) => {
  requestQueue.enqueue(req, res, next);
};

// Use the middleware
app.use(limitConcurrent);

// Release the slot when the response finishes
app.use((req, res, next) => {
  res.on('finish', () => {
    requestQueue.dequeue();
  });
  next();
});

// Example route
app.get('/api/test', async (req, res) => {
  // Simulate a time-consuming operation
  await new Promise(resolve => setTimeout(resolve, 1000));
  res.json({ message: 'Request processed successfully!' });
});

app.listen(3000, () => {
  console.log('Server started on port 3000');
});
We limit concurrent request handling to 10; let's test it with ab:
ab -n 100 -c 20 http://127.0.0.1:3000/api/test
The effect is very good: the results show about 9 requests per second, which is what we'd expect with at most 10 concurrent requests that each take about one second.
Let's walk through the code briefly. First, we create a queue to hold waiting requests. If the number of in-flight requests has already reached the limit (10 in our case), new requests are pushed onto the queue:
if (this.currentRequests < this.maxConcurrent) {
  this.currentRequests++;
  next();
} else {
  this.queue.push({ req, res, next });
}
When an earlier request finishes, we pull a queued request off the queue and let it continue:
res.on('finish', () => {
  requestQueue.dequeue();
});
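One caveat the original code doesn't handle: if a client aborts the connection, 'finish' may never fire and the slot would leak. A defensive variant (a sketch) also listens for 'close' and guards against releasing twice:

app.use((req, res, next) => {
  let released = false;
  const release = () => {
    if (!released) {
      released = true;
      requestQueue.dequeue();
    }
  };
  res.on('finish', release); // normal completion
  res.on('close', release);  // client aborted or connection dropped
  next();
});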
2. Implementation Using Third-Party Libraries
With the basic principles covered above, we can look at how a mature library like bottleneck implements concurrency limiting; the underlying approach is quite similar. Note that a Bottleneck limiter is not Express middleware, so instead of app.use(), you schedule jobs through it:
const Bottleneck = require('bottleneck');

// Create a limiter instance
const limiter = new Bottleneck({
  maxConcurrent: 100, // maximum number of concurrently running jobs
  minTime: 100        // minimum time (ms) between job starts
});

// Run the request handling as a job scheduled through the limiter
app.get('/api/test', (req, res) => {
  limiter.schedule(async () => {
    await new Promise(resolve => setTimeout(resolve, 1000)); // simulate work
    res.json({ message: 'Request processed successfully!' });
  });
});
🎨 Optimizing Our Program's Concurrency Control
1. Implementing Graceful Degradation 🎯
Our current implementation makes users wait in line, and the waiting itself consumes server resources. An alternative is service degradation: when the concurrency limit is reached, return immediately so users aren't left hanging. This gives a better user experience and reduces server load. We just need to return early once the limit is hit:
// In the limitConcurrent middleware: return a friendly response
// instead of queueing once the concurrency limit is reached
const limitConcurrent = (req, res, next) => {
  if (requestQueue.currentRequests >= requestQueue.maxConcurrent) {
    return res.status(503).json({
      message: 'The server is busy. Please try again later.'
    });
  }
  requestQueue.enqueue(req, res, next);
};
After this modification, the program's concurrent processing capacity improves: it can handle about 50 requests per second, with only 10 requests at a time actually executing the business logic while the rest return immediately.
2. Monitoring and Alert Mechanism 📊
After the program is deployed to production, we need a complete monitoring and alerting mechanism, for example to achieve the following:
- Have data to review the concurrency situation over, say, the past two days.
- Be notified when concurrency exceeds a certain threshold.
To do that, we need to:
- Expose the program's current connection metrics.
- Collect that connection information and define custom alert rules.
In the Node.js ecosystem, prom-client is a commonly used library for creating and exposing monitoring metrics. It works well with monitoring systems such as Prometheus, making it easy to collect and display an application's metrics. One detail worth noting: an in-flight request count goes both up and down, so the right metric type is a Gauge rather than a Counter:
const prometheus = require('prom-client');

// A Gauge, because prom-client Counters can only increase,
// while the number of in-flight requests rises and falls
const concurrentRequests = new prometheus.Gauge({
  name: 'concurrent_requests',
  help: 'Current number of concurrent requests'
});

// Track the number of in-flight requests
app.use((req, res, next) => {
  concurrentRequests.inc();
  res.on('finish', () => concurrentRequests.dec());
  next();
});

// Expose all collected metrics for Prometheus to scrape
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', prometheus.register.contentType);
  res.end(await prometheus.register.metrics());
});
Through this integration, we have exposed the program's connection count at the /metrics path. The next step is to configure a collector to scrape the data, then observe it and define alert rules on the Prometheus side.
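For reference, a minimal Prometheus scrape configuration might look like this (the job name and target are assumptions for illustration):

scrape_configs:
  - job_name: 'node-app'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:3000'] # the host exposing /metrics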
3. Dynamically Adjusting Concurrency Limits 🔄
The concurrency limits we set earlier, such as 10 or 100, are not very intelligent. Although we can dynamically adjust them through environment variables and deployment resources to cope with unknown traffic volumes, is there a smarter way to help us determine what the limit should be set to? Yes, there is!
Dynamic concurrency limiting intelligently adjusts the maximum number of concurrent requests based on the system's real-time load. When the load is light, it raises the limit to make full use of idle resources and improve the application's processing capacity; when the load is heavy, it lowers the limit promptly to avoid resource contention and exhaustion, preserving the service's basic responsiveness and stability.
To obtain the system's current load, we can use the built-in os module. The os.loadavg() method returns an array containing the 1-minute, 5-minute, and 15-minute load averages:

[ 3.521484375, 3.57373046875, 3.6845703125 ]

These values are relative to the number of CPU cores: on a single-core system, a value of 1 means fully loaded. Taking a single-core system as an example, we can dynamically adjust our program's concurrency limit:
const http = require('http');
const os = require('os');

// Every minute, adjust the limit on the RequestQueue instance from earlier
// based on the 1-minute load average
function startMonitoring() {
  setInterval(() => {
    const load = os.loadavg()[0];
    if (load > 0.7) {
      // Heavy load: shrink the limit, but not below 50
      requestQueue.maxConcurrent = Math.max(50, requestQueue.maxConcurrent - 10);
    } else if (load < 0.3) {
      // Light load: grow the limit, but not above 200
      requestQueue.maxConcurrent = Math.min(200, requestQueue.maxConcurrent + 10);
    }
  }, 60000);
}

const server = http.createServer((req, res) => {
  handleRequest(req, res); // your application's request handler
});

server.listen(3000, () => {
  startMonitoring();
  console.log('Server listening on port 3000');
});
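Note that on a multi-core machine you would normally divide the load average by the core count before comparing it with thresholds like 0.7; a one-line sketch:

const normalizedLoad = os.loadavg()[0] / os.cpus().length;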
By monitoring system load in real time and adjusting the maximum number of concurrent requests accordingly, we can avoid both resource waste and service failures. However, this dynamic limiting mechanism needs thorough testing against your actual scenario. In this example, loadavg gives us a good reference, but with a caveat: a Node.js application may perform frequent disk I/O (reading or writing files) or network I/O (sending HTTP requests and waiting for responses). Processes waiting on I/O count towards the system load, so even when CPU usage is low, a large number of processes waiting for I/O can keep loadavg high.
📝 Practical Suggestions
- 🎯 Set a reasonable concurrency limit based on your server configuration, and have a clear picture of your program's capabilities.
- 🎯 Consider the characteristics of your business: different APIs may need different limits (see the sketch after this list). This article only sets a global maximum and does not consider fairness.
- 🎯 Regularly monitor and analyze performance data so you're never in the dark about what's happening in production.
- 🎯 Establish a comprehensive degradation plan.
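As an illustration of the second point, each route could get its own queue instance. A minimal sketch reusing the RequestQueue class from earlier (handleUpload and handleData are hypothetical handlers):

// Separate limits per route: expensive endpoints get fewer slots
const uploadQueue = new RequestQueue(5);
const dataQueue = new RequestQueue(50);

// Factory that builds a per-route limiting middleware
const perRouteLimit = (queue) => (req, res, next) => {
  res.on('finish', () => queue.dequeue()); // release this queue's slot
  queue.enqueue(req, res, next);
};

app.post('/upload', perRouteLimit(uploadQueue), handleUpload);
app.get('/api/data', perRouteLimit(dataQueue), handleData);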
📌 Summary
Through reasonable concurrency control, we can:
- 🚀 Protect the server from being overloaded.
- 🚀 Provide more stable services.
- 🚀 Optimize resource utilization.
- 🚀 Improve the user experience.
Rather than waiting for the server to be overwhelmed and then trying to rescue it, it's better to take preventive measures in advance! I hope this article is helpful to everyone! If you find it useful, don't forget to like and follow me! 😊