I've been pondering the possibility of improving the performance of the server used for Angular Universal applications, specifically in terms of requests per second. Without delving too deep into Angular's internals and its change detection processes, we typically encounter scenarios with a mix of pre-rendered pages and pages that always undergo the Angular rendering process on the server. A more efficient origin server would naturally be capable of handling a higher volume of requests. Let's explore various strategies to improve the speed of our origin servers. It's important to note that we won't be considering infrastructure-level optimizations or anything extending beyond the origin server, such as Cloudflare. Our focus will be on code-level improvements.
⚠️ The link to the GitHub repository will be provided at the end.
For this article, I have a simple Angular application that consists solely of an app.component.ts. The template is as follows:
<form>
<mat-form-field>
<mat-label>First name</mat-label>
<input matInput />
</mat-form-field>
<mat-form-field>
<mat-label>Last name</mat-label>
<input matInput />
</mat-form-field>
<button mat-raised-button>Submit</button>
</form>
I intentionally used Angular Material so that its CSS would be included, thereby increasing the final HTML size.
After building the app, I started the server with node and used the autocannon tool to benchmark it:
$ autocannon -c 100 -d 3 http://localhost:4200
Running 3s test @ http://localhost:4200
100 connections
┌─────────┬───────┬───────┬───────┬────────┬─────────┬──────────┬────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼───────┼───────┼───────┼────────┼─────────┼──────────┼────────┤
│ Latency │ 29 ms │ 38 ms │ 62 ms │ 126 ms │ 40.8 ms │ 13.48 ms │ 153 ms │
└─────────┴───────┴───────┴───────┴────────┴─────────┴──────────┴────────┘
┌───────────┬────────┬────────┬────────┬────────┬─────────┬─────────┬────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼────────┼────────┼────────┼────────┼─────────┼─────────┼────────┤
│ Req/Sec │ 1879 │ 1879 │ 2433 │ 2901 │ 2404.34 │ 417.73 │ 1879 │
├───────────┼────────┼────────┼────────┼────────┼─────────┼─────────┼────────┤
│ Bytes/Sec │ 138 MB │ 138 MB │ 179 MB │ 213 MB │ 176 MB │ 30.6 MB │ 138 MB │
└───────────┴────────┴────────┴────────┴────────┴─────────┴─────────┴────────┘
Req/Bytes counts sampled once per second.
# of samples: 3
7k requests in 3.02s, 529 MB read
Please note that these results may vary between different operating systems and computers.
Performance tends to degrade as the complexity of any app increases. Often, we overlook performance until later in the project life cycle (me too).
In the next steps, we will inspect what's happening behind the scenes and attempt to increase the number of requests within those 3 seconds.
Inspecting the Flame Graph
We can inspect the flame graph by using the 0x tool. All we need to do is run the server, replacing the node command with 0x.
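In my case, the command was roughly the following (the output path matches this project's build):
$ npx 0x dist/domino-perf/server/server.mjs
Then, we can view the generated flame graph: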
We can observe the etag function, which has the following label:
*etag dist/domino-perf/server/server.mjs:26153:18
Top of Stack: 15.1% (1355 of 8964 samples)
On Stack: 15.1% (1355 of 8964 samples)
The 'Top of Stack' metric, at 15.1%, indicates that this function was at the top of the call stack for 15.1% of the time during the profiler's sample recording.
The etag function, part of the express package, is executed whenever res.send is called, as it contains the following in its implementation:
res.send = function send2(body) {
// ...
var etagFn = app2.get("etag fn");
var generateETag = !this.get("ETag") && typeof etagFn === "function";
};
this.get indicates that it's retrieving something from the app settings. Therefore, we can disable the etag setting by setting it to false before starting the server:
const commonEngine = new CommonEngine();
server.set('etag', false);
server.set('view engine', 'html');
server.set('views', browserDistFolder);
Simply by disabling the etag setting, I reran the server and the autocannon tool:
$ autocannon -c 100 -d 3 http://localhost:4200
Running 3s test @ http://localhost:4200
100 connections
┌─────────┬───────┬───────┬───────┬───────┬──────────┬──────────┬────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼────────┤
│ Latency │ 19 ms │ 24 ms │ 45 ms │ 57 ms │ 27.02 ms │ 12.09 ms │ 151 ms │
└─────────┴───────┴───────┴───────┴───────┴──────────┴──────────┴────────┘
┌───────────┬────────┬────────┬────────┬────────┬────────┬─────────┬────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼────────┼────────┼────────┼────────┼────────┼─────────┼────────┤
│ Req/Sec │ 2583 │ 2583 │ 4017 │ 4291 │ 3630 │ 748.69 │ 2583 │
├───────────┼────────┼────────┼────────┼────────┼────────┼─────────┼────────┤
│ Bytes/Sec │ 189 MB │ 189 MB │ 295 MB │ 315 MB │ 266 MB │ 54.9 MB │ 189 MB │
└───────────┴────────┴────────┴────────┴────────┴────────┴─────────┴────────┘
Req/Bytes counts sampled once per second.
# of samples: 3
11k requests in 3.02s, 798 MB read
Just by skipping ETag generation, the number of requests served within those 3 seconds increased from 7k to 11k.
The issue is that disabling the ETag setting is not a proper solution in our case, since the ETag is used to identify the resource version. Express generates weak ETags (prefixed with W/) for resources by default (refer to the defaultConfiguration function that runs when the application is created). If the ETag hasn't changed for the server-side rendered content, the Express server can reply with just a 304 status code instead of sending the content over the network again.
Now, let's examine the etag implementation used by Express:
// node_modules/etag/index.js
function entitytag (entity) {
if (entity.length === 0) {
// fast-path empty
return '"0-2jmj7l5rSw0yVb/vlWAYkK/YBwk"'
}
// compute hash of entity
var hash = crypto
.createHash('sha1')
.update(entity, 'utf8')
.digest('base64')
.substring(0, 27)
// compute length of entity
var len = typeof entity === 'string'
? Buffer.byteLength(entity, 'utf8')
: entity.length
return '"' + len.toString(16) + '-' + hash + '"'
}
So, it essentially computes a SHA-1 hash of the entity, encodes it as base64, and returns only the first 27 characters of that hash. SHA-1 might seem like overkill for generating ETags, and since security is not a concern here, we could have opted for the MD5 algorithm instead. However, after checking, MD5 doesn't appear to be any faster. I then explored the CRC-32 algorithm, which is commonly used for calculating checksums but can also serve to generate ETags for resources. The output of CRC-32 is a 32-bit unsigned integer. The zlib library contains a CRC-32 implementation, allowing us to create a C++ addon that generates the 32-bit checksum:
#include <js_native_api.h>
#include <node_api.h>
#include <zlib.h>
#include <cstring>
#include <string>
const char* fast_path = "W/\"0\"";
napi_value etag(napi_env env, napi_callback_info info) {
size_t argc = 1;
napi_value argv[1];
napi_get_cb_info(env, info, &argc, argv, NULL, NULL);
size_t buffer_length;
void* buffer;
napi_get_buffer_info(env, argv[0], &buffer, &buffer_length);
napi_value result;
if (buffer_length == 0) {
// Fast-path for empty buffer.
napi_create_string_utf8(env, fast_path, 5, &result);
return result;
}
char* buffer_content = new char[buffer_length + 1]();
// Faster than `memcpy` from `string.h`.
std::memcpy(buffer_content, buffer, buffer_length);
buffer_content[buffer_length] = '\n';
// Calculate CRC-32 over the entire content.
uint32_t crc = crc32(0L, Z_NULL, 0);
crc = crc32(crc, reinterpret_cast<Bytef*>(buffer_content), buffer_length);
delete[] buffer_content;
// Create the final string.
// `std::to_string` is faster than `std::format` and `std::stringstream`.
std::string final =
"W/\"" + std::to_string(buffer_length) + "-" + std::to_string(crc) + "\"";
napi_create_string_utf8(env, final.c_str(), final.size(), &result);
return result;
}
napi_value init(napi_env env, napi_value exports) {
napi_value etag_fn;
napi_create_function(env, NULL, NAPI_AUTO_LENGTH, etag, NULL, &etag_fn);
napi_set_named_property(env, exports, "etag", etag_fn);
return exports;
}
NAPI_MODULE(NODE_GYP_MODULE_NAME, init)
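To compile this addon with node-gyp, a binding.gyp along the lines of the one shown later for the content-type addon should work. This is a sketch with an assumed source file name, explicitly linking against zlib:
{
  'targets': [
    {
      'target_name': 'etag',
      'defines': ['NDEBUG', 'NAPI_DISABLE_CPP_EXCEPTIONS'],
      'sources': ['native/etag.cc'],
      'libraries': ['-lz'],
      'cflags_cc': ['-std=c++17', '-fexceptions', '-O3', '-Wall', '-Wextra']
    }
  ]
}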
This is a mix of C and C++ code.
I had previously examined OpenSSL's implementations of SHA1 and MD5:
#include <openssl/sha.h>
unsigned char hash[SHA_DIGEST_LENGTH];
SHA1(reinterpret_cast<const unsigned char*>(etag_string),
buffer_length,
hash);
However, they're slower compared to CRC-32.
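If you want to sanity-check the relative cost of these algorithms from Node itself (without a native addon), here's a rough micro-benchmark sketch. Note that zlib.crc32 is only available in recent Node.js versions, which is an assumption here, hence the guard:
// hash-bench.mjs: compare the cost of the hashing options for ETag generation.
import crypto from 'node:crypto';
import zlib from 'node:zlib';

const body = Buffer.alloc(64 * 1024, 'a'); // roughly the size of a rendered page

for (const algo of ['sha1', 'md5']) {
  const t0 = performance.now();
  for (let i = 0; i < 1e4; i++) {
    crypto.createHash(algo).update(body).digest('base64');
  }
  console.log(`${algo}: ${(performance.now() - t0).toFixed(2)}ms`);
}

if (typeof zlib.crc32 === 'function') {
  const t0 = performance.now();
  for (let i = 0; i < 1e4; i++) zlib.crc32(body);
  console.log(`crc32: ${(performance.now() - t0).toFixed(2)}ms`);
}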
To set the custom ETag function, it's necessary to override the etag setting:
const { etag } = require(`${process.cwd()}/build/Release/etag.node`);
const etagFn = server.get('etag fn');
server.set('etag', (body, encoding) => {
// Faster than `Buffer.isBuffer`.
if (body?.buffer) {
return etag(body);
} else {
return etagFn(body, encoding);
}
});
Now, let's run the server with 0x and execute the autocannon tool again:
$ autocannon -c 100 -d 3 http://localhost:4200
Running 3s test @ http://localhost:4200
100 connections
┌─────────┬───────┬───────┬───────┬────────┬──────────┬──────────┬────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼───────┼───────┼───────┼────────┼──────────┼──────────┼────────┤
│ Latency │ 23 ms │ 27 ms │ 47 ms │ 120 ms │ 30.53 ms │ 12.62 ms │ 159 ms │
└─────────┴───────┴───────┴───────┴────────┴──────────┴──────────┴────────┘
┌───────────┬────────┬────────┬────────┬────────┬─────────┬────────┬────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼────────┼────────┼────────┼────────┼─────────┼────────┼────────┤
│ Req/Sec │ 2299 │ 2299 │ 3501 │ 3859 │ 3219.67 │ 667.22 │ 2298 │
├───────────┼────────┼────────┼────────┼────────┼─────────┼────────┼────────┤
│ Bytes/Sec │ 169 MB │ 169 MB │ 257 MB │ 283 MB │ 236 MB │ 49 MB │ 169 MB │
└───────────┴────────┴────────┴────────┴────────┴─────────┴────────┴────────┘
Req/Bytes counts sampled once per second.
# of samples: 3
10k requests in 3.02s, 708 MB read
Let's look at the flamegraph:
The etag function is no longer at the top of the stack. Now, I will generate a flame graph using the perf tool that comes with Linux. This should provide additional information, especially about internal and system calls.
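The rough workflow looks like this, assuming Brendan Gregg's FlameGraph scripts are checked out locally (the exact paths and durations are assumptions):
$ node --perf-basic-prof dist/domino-perf/server/server.mjs &
$ perf record -F 99 -g -p <server PID> -- sleep 10   # while autocannon hammers the server
$ perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > perf-flame.svg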
We can observe node:internal/fs/promises on the stack. Line 37052 contains the retrieveSSGPage method, which is part of the CommonEngine. It has the following:
const pagePath = join(publicPath, pathname, "index.html");
if (this.pageIsSSG.get(pagePath)) {
return fs.promises.readFile(pagePath, "utf-8");
}
The pagePath is an absolute path to the browser/index.html file. The content of the page is not cached, and the file is read for every request. Let's update the CommonEngine constructor by adding a cache property and then place the content into the cache inside the retrieveSSGPage method. I will make this update directly within the dist/{app}/server/server.mjs file.
var CommonEngine = class {
constructor(options) {
this.options = options;
this.inlineCriticalCssProcessor = new InlineCriticalCssProcessor({
minify: false,
});
this.cache = new Map(); // 👈
}
retrieveSSGPage() {
// Note: in the built output this body is wrapped in `__async(..., function* () { ... })`,
// which is why `yield` is used below instead of `await`.
// ...
if (this.pageIsSSG.get(pagePath)) {
if (this.cache.has(pagePath)) {
return this.cache.get(pagePath); // 👈
} else {
const content = yield fs.promises.readFile(pagePath, 'utf-8');
this.cache.set(pagePath, content);
return content;
}
}
}
};
Now, let's run the server and benchmark again:
$ autocannon -c 100 -d 3 http://localhost:4200
Running 3s test @ http://localhost:4200
100 connections
┌─────────┬───────┬───────┬───────┬───────┬──────────┬──────────┬────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼────────┤
│ Latency │ 16 ms │ 21 ms │ 41 ms │ 61 ms │ 23.96 ms │ 11.37 ms │ 136 ms │
└─────────┴───────┴───────┴───────┴───────┴──────────┴──────────┴────────┘
┌───────────┬────────┬────────┬────────┬────────┬─────────┬─────────┬────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼────────┼────────┼────────┼────────┼─────────┼─────────┼────────┤
│ Req/Sec │ 2771 │ 2771 │ 4491 │ 4975 │ 4078.34 │ 945.31 │ 2770 │
├───────────┼────────┼────────┼────────┼────────┼─────────┼─────────┼────────┤
│ Bytes/Sec │ 203 MB │ 203 MB │ 330 MB │ 365 MB │ 299 MB │ 69.4 MB │ 203 MB │
└───────────┴────────┴────────┴────────┴────────┴─────────┴─────────┴────────┘
Req/Bytes counts sampled once per second.
# of samples: 3
12k requests in 3.02s, 897 MB read
Now, on the perf flame graph, we can observe that regular expressions also consume a considerable amount of time:
We can patch RegExp.prototype functions to track all of the invocations per request:
server.listen(port, () => {
console.log(`Node Express server listening on http://localhost:${port}`);
['test', 'exec'].forEach(fn => {
const proto = RegExp.prototype as any;
const originalFn = proto[fn];
proto[fn] = function (...args: any[]) {
console.error('fn is called: ', fn);
console.trace();
return originalFn.apply(this, args);
};
});
});
If we build, run the server, and hit the index page once, we can observe all of the prototype logs. The exec function is called 11 times, and test is called 7 times per request. Since regular expressions deal with strings, we can cache results for strings that have already matched the specified pattern. One of the packages that actively uses regular expressions is content-type, which has the following patterns:
// node_modules/content-type/index.js
var PARAM_REGEXP =
/; *([!#$%&'*+.^_`|~0-9A-Za-z-]+) *= *("(?:[\u000b\u0020\u0021\u0023-\u005b\u005d-\u007e\u0080-\u00ff]|\\[\u000b\u0020-\u00ff])*"|[!#$%&'*+.^_`|~0-9A-Za-z-]+) */g;
var TEXT_REGEXP = /^[\u000b\u0020-\u007e\u0080-\u00ff]+$/;
var TOKEN_REGEXP = /^[!#$%&'*+.^_`|~0-9A-Za-z-]+$/;
var QESC_REGEXP = /\\([\u000b\u0020-\u00ff])/g;
var QUOTE_REGEXP = /([\\"])/g;
var TYPE_REGEXP = /^[!#$%&'*+.^_`|~0-9A-Za-z-]+\/[!#$%&'*+.^_`|~0-9A-Za-z-]+$/;
While these patterns may seem complicated and extensive, the V8 engine boasts an exceptionally fast regular expression engine compared to the C++ standard regular expressions library (std::regex), RE-flex, and re2.
The content-type package exports the parse and format functions, which call test and exec multiple times, including inside a while loop:
function format(obj) {
// ...
for (var i = 0; i < params.length; i++) {
param = params[i];
if (!TOKEN_REGEXP.test(param)) {
throw new TypeError('invalid parameter name');
}
string += '; ' + param + '=' + qstring(parameters[param]);
}
// ...
}
function parse(string) {
// ...
while ((match = PARAM_REGEXP.exec(header))) {
// ...
}
// ...
}
I also checked the regex benchmark at https://github.com/mariomka/regex-benchmark?tab=readme-ov-file#performance, which lists the Nim language at the top, stating it's significantly faster than other implementations. Before making any JavaScript changes, I decided to assess whether it would bring any benefits. The necessary steps include:
- writing Nim code, which needs to be compiled into a static library
- developing a C++ addon to act as a caller proxy to the Nim function, as we'll link against the static library
- writing the JavaScript code to call the addon
I will place the Nim code into the nim folder:
# nim/matches.nim
import regex
const
TYPE_REGEXP = re2"^[!#$%&'*+.^_`|~0-9A-Za-z-]+\/[!#$%&'*+.^_`|~0-9A-Za-z-]+$"
proc matchesTypeRegexp(source: cstring): bool {.cdecl, exportc.} =
return match($source, TYPE_REGEXP)
We now need to compile it into a static library by running the following command:
$ nim c --app:staticlib --nimblePath:. --passC:-fPIC -d:release matches.nim
The above command will generate the libmatches.a file. Now, we need to write the C++ code that will call the matchesTypeRegexp function with the string provided by JavaScript:
#include <js_native_api.h>
#include <node_api.h>
extern "C" {
bool matchesTypeRegexp(char*);
}
napi_value matches_type_regexp(napi_env env, napi_callback_info info) {
size_t argc = 1;
napi_value argv[1];
napi_get_cb_info(env, info, &argc, argv, NULL, NULL);
size_t source_length;
napi_get_value_string_utf8(env, argv[0], NULL, 0, &source_length);
char* source = new char[source_length + 1]();
napi_get_value_string_utf8(env, argv[0], source, source_length + 1, NULL);
napi_value result;
napi_get_boolean(env, matchesTypeRegexp(source), &result);
delete[] source;
return result;
}
napi_value init(napi_env env, napi_value exports) {
napi_value matches_type_regexp_fn;
napi_create_function(env, NULL, NAPI_AUTO_LENGTH, matches_type_regexp, NULL,
&matches_type_regexp_fn);
napi_set_named_property(env, exports, "matchesTypeRegexp",
matches_type_regexp_fn);
return exports;
}
NAPI_MODULE(NODE_GYP_MODULE_NAME, init)
The binding.gyp file should also be updated to include the .a file:
{
'targets': [
{
'target_name': 'content_type',
'defines': ['NDEBUG', 'NAPI_DISABLE_CPP_EXCEPTIONS'],
'sources': ['native/content-type.cc'],
'libraries': ['<(module_root_dir)/nim/libmatches.a'],
'cflags_cc': ['-std=c++17', '-fexceptions', '-O3', '-Wall', '-Wextra']
}
]
}
Now, after running node-gyp build and obtaining the compiled addon within the build folder, we can test whether the addon is faster or not:
// benchmark.js
const { matchesTypeRegexp } = require('./build/Release/content_type.node');
const { performance } = require('perf_hooks');
let t0, t1;
const TYPE_REGEXP =
/^[!#$%&'*+.^_`|~0-9A-Za-z-]+\/[!#$%&'*+.^_`|~0-9A-Za-z-]+$/;
t0 = performance.now();
for (let i = 0; i < 1e6; i++) {
let v = 'text/html'.match(TYPE_REGEXP);
}
t1 = performance.now();
console.log(`JS: ${t1 - t0}ms`);
t0 = performance.now();
for (let i = 0; i < 1e6; i++) {
let v = matchesTypeRegexp('text/html');
}
t1 = performance.now();
console.log(`Nim: ${t1 - t0}ms`);
If we execute the above file, we'll get the following results:
$ node benchmark.js
JS: 25.362835999578238ms
Nim: 886.7281489986926ms
The benchmark indicates that the JS implementation is significantly faster than the Nim one. However, it's worth noting that much of the overhead comes not from the regex itself but from crossing the JS-to-native boundary: every time we call the C++ addon function, V8 has to go through its built-in machinery for invoking external native functions. If we debug the Node process running the file with a single call to matchesTypeRegexp('text/html'), we'll observe the following:
#0 0x00007ffff7e34442 in matches_type_regexp(napi_env__*, napi_callback_info__*) ()
#1 0x0000000000b10d7d in v8impl::(anonymous namespace)::FunctionCallbackWrapper::Invoke(v8::FunctionCallbackInfo<v8::Value> const&) ()
#2 0x0000000000db0230 in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) ()
#3 0x0000000000db176f in v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) ()
#4 0x00000000016ef579 in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit ()
Improving those regular expressions themselves would require being a regex expert; it may be easier, and sufficient, to cache results and avoid executing the regular expressions against the same arguments. We can create a local cache and refrain from running test and exec against an argument that has already been formatted or parsed:
const formatCache = new Map();
function format(obj) {
if (!obj || typeof obj !== 'object') {
throw new TypeError('argument obj is required');
}
const cacheKey = JSON.stringify(obj);
if (formatCache.has(cacheKey)) {
return formatCache.get(cacheKey);
}
// ...
formatCache.set(cacheKey, string);
return string;
}
const parseCache = new Map();
function parse(string) {
// ...
var header = typeof string === 'object' ? getcontenttype(string) : string;
if (typeof header !== 'string') {
throw new TypeError('argument string is required to be a string');
}
if (parseCache.has(header)) {
return JSON.parse(parseCache.get(header));
}
// ...
parseCache.set(header, JSON.stringify(obj));
return obj;
}
After these changes I ran the build and the benchmark again and got the following results:
$ autocannon -c 100 -d 3 http://localhost:4200
Running 3s test @ http://localhost:4200
100 connections
┌─────────┬───────┬───────┬───────┬───────┬──────────┬──────────┬────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼────────┤
│ Latency │ 12 ms │ 19 ms │ 42 ms │ 50 ms │ 21.61 ms │ 11.93 ms │ 144 ms │
└─────────┴───────┴───────┴───────┴───────┴──────────┴──────────┴────────┘
┌───────────┬────────┬────────┬────────┬────────┬────────┬─────────┬────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼────────┼────────┼────────┼────────┼────────┼─────────┼────────┤
│ Req/Sec │ 2885 │ 2885 │ 4951 │ 5723 │ 4519 │ 1197.63 │ 2884 │
├───────────┼────────┼────────┼────────┼────────┼────────┼─────────┼────────┤
│ Bytes/Sec │ 212 MB │ 212 MB │ 363 MB │ 420 MB │ 331 MB │ 87.8 MB │ 212 MB │
└───────────┴────────┴────────┴────────┴────────┴────────┴─────────┴────────┘
Req/Bytes counts sampled once per second.
# of samples: 3
14k requests in 3.02s, 994 MB read
The framework itself may also slow things down: using the plain Node.js createServer in conjunction with engine.render allows handling more requests within those 3 seconds:
const server = createServer(async (req, res) => {
const html = await commonEngine.render({
bootstrap,
documentFilePath: indexHtml,
url: `http://${req.headers.host}${req.url}`,
publicPath: browserDistFolder,
providers: [{ provide: APP_BASE_HREF, useValue: '' }],
});
res.end(html);
});
Results after running autocannon look as follows:
$ autocannon -c 100 -d 3 http://localhost:4200
Running 3s test @ http://localhost:4200
100 connections
┌─────────┬──────┬───────┬───────┬───────┬──────────┬─────────┬────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼──────┼───────┼───────┼───────┼──────────┼─────────┼────────┤
│ Latency │ 9 ms │ 11 ms │ 24 ms │ 41 ms │ 12.93 ms │ 8.83 ms │ 137 ms │
└─────────┴──────┴───────┴───────┴───────┴──────────┴─────────┴────────┘
┌───────────┬────────┬────────┬────────┬────────┬─────────┬─────────┬────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼────────┼────────┼────────┼────────┼─────────┼─────────┼────────┤
│ Req/Sec │ 5011 │ 5011 │ 8503 │ 8815 │ 7440.67 │ 1723.46 │ 5008 │
├───────────┼────────┼────────┼────────┼────────┼─────────┼─────────┼────────┤
│ Bytes/Sec │ 367 MB │ 367 MB │ 623 MB │ 646 MB │ 545 MB │ 126 MB │ 367 MB │
└───────────┴────────┴────────┴────────┴────────┴─────────┴─────────┴────────┘
Req/Bytes counts sampled once per second.
# of samples: 3
22k requests in 3.02s, 1.64 GB read
Since Angular SSR is framework-agnostic and no longer directly relies on the Express engine (as it did with @nguniversal/express-engine), we're free to use any framework. We only have to run engine.render when necessary to return HTML for a specific route. Initially, I installed and benchmarked Fastify, but it didn't yield significant benefits; it handled a comparable number of requests to Express with the ETag setting disabled (since Fastify doesn't set the ETag header by default). Afterwards, I installed Koa along with middleware such as koa-static to serve static files. Koa outperformed both Fastify and Express in terms of speed. Here's the code I used for Koa; it resembles the Express code we had previously:
const commonEngine = new CommonEngine();
const app = new Koa();
const router = new Router();
router.get(/.*/, async ctx => {
const { protocol, headers, originalUrl } = ctx;
const html = await commonEngine.render({
bootstrap,
documentFilePath: indexHtml,
url: `${protocol}://${headers.host}${originalUrl}`,
publicPath: browserDistFolder,
providers: [{ provide: APP_BASE_HREF, useValue: '' }],
});
ctx.body = html;
});
app.use(serveStatic(browserDistFolder));
app.use(router.routes());
These are the results after running autocannon:
$ autocannon -c 100 -d 3 http://localhost:4200
Running 3s test @ http://localhost:4200
100 connections
┌─────────┬───────┬───────┬───────┬───────┬──────────┬──────────┬────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼────────┤
│ Latency │ 12 ms │ 15 ms │ 39 ms │ 52 ms │ 18.68 ms │ 12.23 ms │ 163 ms │
└─────────┴───────┴───────┴───────┴───────┴──────────┴──────────┴────────┘
┌───────────┬────────┬────────┬────────┬────────┬─────────┬─────────┬────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼────────┼────────┼────────┼────────┼─────────┼─────────┼────────┤
│ Req/Sec │ 3145 │ 3145 │ 5935 │ 6547 │ 5208.34 │ 1480.24 │ 3145 │
├───────────┼────────┼────────┼────────┼────────┼─────────┼─────────┼────────┤
│ Bytes/Sec │ 231 MB │ 231 MB │ 435 MB │ 480 MB │ 382 MB │ 108 MB │ 230 MB │
└───────────┴────────┴────────┴────────┴────────┴─────────┴─────────┴────────┘
Req/Bytes counts sampled once per second.
# of samples: 3
16k requests in 3.02s, 1.15 GB read
So, this is only slightly faster (9%) than what we achieved previously with the Express manipulations. However, we want to maintain the previous behavior of generating an ETag for the content so that we can send a 304 response when the content hasn't changed. We need to install koa-conditional-get and koa-etag, since they work in conjunction:
const conditional = require('koa-conditional-get');
const etag = require('koa-etag');
app.use(conditional());
app.use(etag());
app.use(serveStatic(browserDistFolder));
app.use(router.routes());
Now, let's build and run autocannon again:
$ autocannon -c 100 -d 3 http://localhost:4200
Running 3s test @ http://localhost:4200
100 connections
┌─────────┬───────┬───────┬───────┬───────┬──────────┬──────────┬────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼────────┤
│ Latency │ 13 ms │ 19 ms │ 48 ms │ 67 ms │ 21.81 ms │ 14.25 ms │ 164 ms │
└─────────┴───────┴───────┴───────┴───────┴──────────┴──────────┴────────┘
┌───────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Req/Sec │ 2601 │ 2601 │ 5103 │ 5699 │ 4467 │ 1341.71 │ 2600 │
├───────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Bytes/Sec │ 2.34 MB │ 2.34 MB │ 4.59 MB │ 5.12 MB │ 4.02 MB │ 1.21 MB │ 2.34 MB │
└───────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
Req/Bytes counts sampled once per second.
# of samples: 3
13k requests in 3.02s, 12 MB read
Okay, so the ETag calculation slowed things down by 18%. The koa-etag middleware also uses the etag package, but it does the following:
// node_modules/koa-etag/index.js
const calculate = require('etag')
module.exports = function etag (options) {
return async function etag (ctx, next) {
await next()
const entity = await getResponseEntity(ctx)
setEtag(ctx, entity, options)
}
}
async function getResponseEntity (ctx) {
// no body
const body = ctx.body
if (!body || ctx.response.get('etag')) return
// type
const status = ctx.status / 100 | 0
// 2xx
if (status !== 2) return
if (body instanceof Stream) {
if (!body.path) return
return await stat(body.path)
} else if ((typeof body === 'string') || Buffer.isBuffer(body)) {
return body
} else {
return JSON.stringify(body)
}
}
function setEtag (ctx, entity, options) {
if (!entity) return
ctx.response.etag = calculate(entity, options)
}
We may notice that it also checks whether the ctx.body is a stream. For pre-rendered files, it is actually a stream, because koa-static uses koa-send, which can handle index files automatically when visiting the root location. For example, if Angular pre-renders the /home route, it places the output content within browser/home/index.html. This file is then picked up by koa-send before the request ever reaches engine.render. koa-send sets ctx.body to fs.createReadStream('path-to-index.html') each time the matching URL (in our app, /) is hit. Subsequently, koa-etag checks that it's a stream (body instanceof Stream) and runs fs.stat on body.path to retrieve the stats object.
We can update the koa-send code by implementing a small caching mechanism instead of executing fs.createReadStream each time the index.html needs to be read:
// This could also be an LRU cache.
const cache = new Map()
async function send(ctx, path, opts = {}) {
// ...
if (!cache.has(path)) {
const buffer = await fs.promises.readFile(path)
cache.set(path, buffer)
}
ctx.body = cache.get(path)
}
Even though this would cache every file that is attempted to be read, we could extend the opts to specify a list of files that are never updated and should be cached forever, as sketched below.
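For example, a hypothetical cacheable option could restrict the cache to such files (the option name and shape are assumptions, not part of koa-send's API):
// This could also be an LRU cache.
const cache = new Map()

async function send(ctx, path, opts = {}) {
  // ...
  // `opts.cacheable` is a hypothetical allow-list of files that never change.
  const cacheable = Array.isArray(opts.cacheable) && opts.cacheable.includes(path)
  if (cacheable && cache.has(path)) {
    ctx.body = cache.get(path)
    return path
  }
  const buffer = await fs.promises.readFile(path)
  if (cacheable) cache.set(path, buffer)
  ctx.body = buffer
  // ...
}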
We can then update koa-etag to use our C++ addon in cases where the provided argument is a buffer:
const calculate = require('etag')
const { etag } = require(`${process.cwd()}/build/Release/etag.node`)
function setEtag (ctx, entity, options) {
if (!entity) return
// Faster than `Buffer.isBuffer`.
if (entity.buffer) {
ctx.response.etag = etag(entity)
} else {
ctx.response.etag = calculate(entity, options)
}
}
Let's build and run the benchmark again:
$ autocannon -c 100 -d 3 http://localhost:4200
Running 3s test @ http://localhost:4200
100 connections
┌─────────┬───────┬───────┬───────┬───────┬──────────┬──────────┬────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼────────┤
│ Latency │ 11 ms │ 12 ms │ 30 ms │ 41 ms │ 14.43 ms │ 10.17 ms │ 154 ms │
└─────────┴───────┴───────┴───────┴───────┴──────────┴──────────┴────────┘
┌───────────┬────────┬────────┬────────┬────────┬────────┬─────────┬────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼────────┼────────┼────────┼────────┼────────┼─────────┼────────┤
│ Req/Sec │ 4191 │ 4191 │ 7743 │ 8255 │ 6728 │ 1806.68 │ 4189 │
├───────────┼────────┼────────┼────────┼────────┼────────┼─────────┼────────┤
│ Bytes/Sec │ 307 MB │ 307 MB │ 568 MB │ 606 MB │ 494 MB │ 133 MB │ 307 MB │
└───────────┴────────┴────────┴────────┴────────┴────────┴─────────┴────────┘
Req/Bytes counts sampled once per second.
# of samples: 3
20k requests in 3.02s, 1.48 GB read
Let's revisit the flamegraph and observe the frequent occurrence of the __async function at the top of the stack:
The __async function replaces async/await code because zone.js is unable to track native async/await. While zone.js tracks promises by overriding the promise constructor (Promise -> ZoneAwarePromise), this approach falls short when dealing with async/await. We're not going to delve deeply into zone.js complexities, but it's worth noting that downleveling is only necessary for the Angular code. Third-party packages such as koa-etag don't require downleveling; however, ESBuild applies it to every package. Downleveling transforms this:
module.exports = function etag (options) {
return async function etag (ctx, next) {
await next()
const entity = await getResponseEntity(ctx)
setEtag(ctx, entity, options)
}
}
To this:
module.exports = function etag(options) {
return function etag(ctx, next) {
return __async(this, null, function* () {
yield next();
const entity = yield getResponseEntity(ctx);
setEtag(ctx, entity, options);
});
};
};
For curiosity's sake, I moved the Koa packages into a separate file called koa.ts (at the same level as server.ts) and re-exported the packages we're using in our example:
import Koa from 'koa';
import Router from '@koa/router';
import serveStatic from 'koa-static';
const conditional = require('koa-conditional-get');
const etag = require('koa-etag');
export { Koa, Router, serveStatic, conditional, etag };
And then I used await import('./koa') within server.ts inside the app function (I had to mark it as async) so that ESBuild would create a separate chunk specifically for the Koa third-party packages. It's then easier to modify the output file directly where async/await has been replaced with __async.
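Roughly, the relevant part of server.ts then looks like this (a sketch; the rest of the setup stays as shown earlier):
// server.ts (sketch)
export async function app() {
  // The dynamic import makes ESBuild emit a separate chunk for the Koa packages.
  const { Koa, Router, serveStatic, conditional, etag } = await import('./koa');

  const server = new Koa();
  // ... the same middleware and router setup as before ...
  return server;
}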
After running the build, I went to the server/chunk-HASH.mjs file that contained the Koa packages and replaced all the downleveling transformations back from __async to async/await. After that, I reran the benchmark:
$ autocannon -c 100 -d 3 http://localhost:4200
Running 3s test @ http://localhost:4200
100 connections
┌─────────┬──────┬──────┬───────┬───────┬──────────┬────────┬────────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼──────┼──────┼───────┼───────┼──────────┼────────┼────────┤
│ Latency │ 8 ms │ 8 ms │ 23 ms │ 31 ms │ 10.06 ms │ 7.4 ms │ 125 ms │
└─────────┴──────┴──────┴───────┴───────┴──────────┴────────┴────────┘
┌───────────┬────────┬────────┬────────┬────────┬────────┬─────────┬────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼────────┼────────┼────────┼────────┼────────┼─────────┼────────┤
│ Req/Sec │ 5367 │ 5367 │ 11215 │ 11607 │ 9394 │ 2852.72 │ 5365 │
├───────────┼────────┼────────┼────────┼────────┼────────┼─────────┼────────┤
│ Bytes/Sec │ 394 MB │ 394 MB │ 823 MB │ 851 MB │ 689 MB │ 209 MB │ 394 MB │
└───────────┴────────┴────────┴────────┴────────┴────────┴─────────┴────────┘
Req/Bytes counts sampled once per second.
# of samples: 3
28k requests in 3.02s, 2.07 GB read
Now, let's also try running this benchmark with Bun. Bun isn't friendly with N-API yet at the time this article is being written (January 2024), so we have to revert the koa-etag changes and use the etag package instead of our C++ addon. We also need to remove zone.js from the angular.json polyfills, because Bun isn't friendly with its patches: no matter what, it just doesn't return any response (and reports the response as empty) when zone.js is used. We should also add ɵprovideZonelessChangeDetection() to our app.config.ts; this function has been exported from @angular/core since 17.1.0-rc.0 (please note that it may already be exported without the theta prefix by the time you're reading this).
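Here's a sketch of what that looks like in app.config.ts (other providers from the default scaffold are omitted):
// app.config.ts (sketch)
import { ApplicationConfig, ɵprovideZonelessChangeDetection } from '@angular/core';

export const appConfig: ApplicationConfig = {
  providers: [ɵprovideZonelessChangeDetection()],
};
With those changes in place, here are the results of running the benchmark against the Bun server: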
$ autocannon -c 100 -d 3 http://localhost:4200
Running 3s test @ http://localhost:4200
100 connections
┌─────────┬───────┬───────┬───────┬───────┬──────────┬─────────┬───────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼───────┼───────┼───────┼───────┼──────────┼─────────┼───────┤
│ Latency │ 16 ms │ 20 ms │ 34 ms │ 41 ms │ 20.55 ms │ 6.76 ms │ 95 ms │
└─────────┴───────┴───────┴───────┴───────┴──────────┴─────────┴───────┘
┌───────────┬────────┬────────┬────────┬────────┬────────┬─────────┬────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼────────┼────────┼────────┼────────┼────────┼─────────┼────────┤
│ Req/Sec │ 4379 │ 4379 │ 4811 │ 5051 │ 4746 │ 278.06 │ 4377 │
├───────────┼────────┼────────┼────────┼────────┼────────┼─────────┼────────┤
│ Bytes/Sec │ 321 MB │ 321 MB │ 353 MB │ 370 MB │ 348 MB │ 20.4 MB │ 321 MB │
└───────────┴────────┴────────┴────────┴────────┴────────┴─────────┴────────┘
Req/Bytes counts sampled once per second.
# of samples: 3
14k requests in 3.02s, 1.04 GB read
I had also heard about the Elysia framework (https://elysiajs.com), which is developed to run on Bun. Since we would need to update the ESBuild configuration with the external property to exclude elysia and @elysiajs/static from the compilation, I opted to directly update the dist/**/server.mjs file:
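What I added to server.mjs looks roughly like the following (a sketch based on Elysia's documented API; the static-plugin options and the route pattern are assumptions, and browserDistFolder, indexHtml, commonEngine, and bootstrap are the same values used earlier):
// Patched directly into dist/**/server.mjs (sketch).
import { Elysia } from 'elysia';
import { staticPlugin } from '@elysiajs/static';

const app = new Elysia()
  // Serve pre-rendered and static files from the browser output folder.
  .use(staticPlugin({ assets: browserDistFolder, prefix: '/' }))
  // Fall back to server-side rendering for everything else.
  .get('/*', async ({ request }) => {
    const html = await commonEngine.render({
      bootstrap,
      documentFilePath: indexHtml,
      url: request.url,
      publicPath: browserDistFolder,
      providers: [{ provide: APP_BASE_HREF, useValue: '' }],
    });
    return new Response(html, { headers: { 'Content-Type': 'text/html' } });
  })
  .listen(4200);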
Running the benchmark:
$ autocannon -c 100 -d 3 http://localhost:4200
Running 3s test @ http://localhost:4200
100 connections
┌─────────┬──────┬──────┬───────┬───────┬─────────┬─────────┬───────┐
│ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │
├─────────┼──────┼──────┼───────┼───────┼─────────┼─────────┼───────┤
│ Latency │ 9 ms │ 9 ms │ 13 ms │ 20 ms │ 9.97 ms │ 3.87 ms │ 81 ms │
└─────────┴──────┴──────┴───────┴───────┴─────────┴─────────┴───────┘
┌───────────┬────────┬────────┬────────┬────────┬─────────┬─────────┬────────┐
│ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │
├───────────┼────────┼────────┼────────┼────────┼─────────┼─────────┼────────┤
│ Req/Sec │ 8983 │ 8983 │ 9783 │ 9831 │ 9529.34 │ 388.94 │ 8981 │
├───────────┼────────┼────────┼────────┼────────┼─────────┼─────────┼────────┤
│ Bytes/Sec │ 678 MB │ 678 MB │ 738 MB │ 742 MB │ 719 MB │ 29.3 MB │ 678 MB │
└───────────┴────────┴────────┴────────┴────────┴─────────┴─────────┴────────┘
Req/Bytes counts sampled once per second.
# of samples: 3
29k requests in 3.02s, 2.16 GB read
This is faster than our previous benchmarks, even those where we made all the changes to external packages.
Conclusion
The Express framework has been a "protagonist" in the Node.js ecosystem for many years. Node.js itself, as a runtime, is a reliable and robust technology, especially compared to emerging runtimes like Bun. While Node.js is tied to the V8 engine and Bun to the JavaScriptCore (JSC) engine, Bun may appear to be a less consistent runtime, because external libraries might use the V8 API directly (rather than N-API), tying them exclusively to the Node.js execution environment. Despite this, I see great potential for Bun in the future. Whether to use Bun or not ultimately depends on your preference.
The code can be found at: https://github.com/arturovt/ssr-perf.