Context
It all started two years ago. I was working on a new PWA, written from scratch, for a big social network. It needed an i18n module to handle different languages. The module had to:
- handle interpolation.
- handle PLURAL and SELECT expressions.
- be lightweight (it’s a PWA, must run with limited bandwidth).
- run fast (some users had low-end devices).
And that's where things got tricky: the only library that fit was Google Closure MessageFormat. It was not fast enough on low-end devices and it weighed heavily on our bundle, so I decided to write my own with performance in mind.
Fast forward to today: i18n libraries still have the same problem, so I open-sourced 💋Frenchkiss.js, a 1kB i18n library that is 5 to 1000 times faster than others.
Stay with me for a journey through performance optimization.
👉 Time to speed up your webapp for mobile devices!
🤷 How do i18n modules work?
Under the hood, it's not pretty: some i18n modules re-process the translation on every single call, resulting in poor performance.
Here is an example of what can happen inside the translate function (a really simplified, naive version of Polyglot.js).
// Simplified English-only stand-in (an assumption, not Polyglot's real rules):
// index 0 for a count of 1, index 1 otherwise
const getPluralIndex = count => (count === 1 ? 0 : 1);
const applyParams = (text, params = {}) => {
  // Apply plural if exists
  const list = text.split('||||');
  const pluralIndex = getPluralIndex(params.count);
  const output = list[pluralIndex] || list[0];
  // Replace interpolation
  return output.replace(/%\{\s*(\w+)\s*\}/g, ($0, $1) => params[$1] || '');
};
applyParams('Hello %{name} !', {
  name: 'John'
});
// => Hello John !
In short, on each translation call we split the text, compute the plural index, run a RegExp replacement that substitutes every placeholder with the given parameter if it exists, and return the result.
It's not that big of a deal, but are you fine with doing it multiple times on every render/filter/directive call?
👉 It's one of the first things we learn when building an app in React, Angular, Vue.js or any other framework: avoid intensive operations inside render methods, filters and directives, or it will kill your app!
Some i18n libraries are doing better!
Others optimize things quite a bit: Angular, VueJs-i18n and Google Closure, for example.
How do they do it? They parse the string only once and cache a list of opcodes, which they process on subsequent calls.
If you aren't familiar with opcodes, an opcode list is basically a list of instructions to process, in this case just to build a translation. Here's a possible example of opcodes generated from a translation:
[{
"type": "text",
"value": "Hello "
}, {
"type": "variable",
"value": "name"
}, {
"type": "text",
"value": " !"
}]
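To make the idea concrete, here is a minimal sketch of how such a list could be produced from a string. The helper name parseToOpcodes is made up for this example, and it only handles plain text and {variable} placeholders, not SELECT or PLURAL expressions:

```javascript
// Hypothetical helper: turns "Hello {name} !" into a list of opcodes.
// Only plain text and {variable} placeholders are handled here.
const parseToOpcodes = (text) => {
  const opcodes = [];
  const pattern = /\{\s*(\w+)\s*\}/g;
  let lastIndex = 0;
  let match;

  while ((match = pattern.exec(text)) !== null) {
    // Text sitting between the previous placeholder and this one
    if (match.index > lastIndex) {
      opcodes.push({ type: 'text', value: text.slice(lastIndex, match.index) });
    }
    opcodes.push({ type: 'variable', value: match[1] });
    lastIndex = pattern.lastIndex;
  }

  // Trailing text after the last placeholder
  if (lastIndex < text.length) {
    opcodes.push({ type: 'text', value: text.slice(lastIndex) });
  }

  return opcodes;
};

const helloOpcodes = parseToOpcodes('Hello {name} !');
// helloOpcodes holds three entries: text "Hello ", variable "name", text " !"
```

The RegExp work happens here, once, at parse time; the print step never has to look at the raw string again.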
And here is how we print the result:
// Assumes select/plural opcodes carry their branches in `data` / `list`
const printOpcode = (opcodes, params = {}) => opcodes.map(code => (
  (code.type === 'text') ? code.value :
  (code.type === 'variable') ? (params[code.value] || '') :
  (code.type === 'select') ? printOpcode( // recursive
    code.data[params[code.value]] || code.data.other, params
  ) :
  (code.type === 'plural') ? printOpcode( // recursive
    code.list[getPluralIndex(params[code.value])] || code.list[0], params
  ) :
  '' // TODO not supported ?
)).join('');
With this type of algorithm, more time is spent on the first call, which generates the opcodes, but we store them and reuse them to make the next calls faster:
- It doesn't split the string.
- It doesn't run expensive regex operations.
- It just reads the opcodes and merges the results together.
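The caching part can be sketched in a few lines. This is only an illustration under my own assumptions: the names compileToOpcodes, render and translate are hypothetical, the cache is a plain Map keyed by the source string, and only text and {variable} opcodes are supported. The parseCount counter is there just to show that the parse step runs once per string:

```javascript
// Hypothetical names throughout: compileToOpcodes, render, translate.
let parseCount = 0;
const cache = new Map();

const compileToOpcodes = (text) => {
  parseCount += 1;
  // split with a capturing group keeps the variable names in the result:
  // 'Hello {name} !' -> ['Hello ', 'name', ' !']
  return text.split(/\{\s*(\w+)\s*\}/).map((value, index) => ({
    type: index % 2 === 0 ? 'text' : 'variable',
    value,
  }));
};

// Rendering only reads the opcodes: no split, no RegExp
const render = (opcodes, params) => opcodes.map(code => (
  code.type === 'text' ? code.value : (params[code.value] || '')
)).join('');

const translate = (text, params = {}) => {
  let opcodes = cache.get(text);
  if (!opcodes) {
    opcodes = compileToOpcodes(text);
    cache.set(text, opcodes);
  }
  return render(opcodes, params);
};

translate('Hello {name} !', { name: 'John' }); // parses, then renders
translate('Hello {name} !', { name: 'Jane' }); // renders from the cached opcodes
```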
Well, that rocks! But is it possible to go further?
🤔 How can we speed things up?
💋Frenchkiss.js goes one step further: it compiles the translation into a native function, so light and pure that the JavaScript engine can easily JIT-compile it.
How does it work?
Quite simply, you can build a function from a string by doing the following:
const sum = new Function('a', 'b', 'return a + b');
sum(5, 3);
// => 8
For further information, take a look at Function Constructor (MDN).
The main logic is still to generate an opcode list, but instead of using it to generate a translation, we use it to generate an optimized function that will return the translation without further processing.
This is possible because of the simple structure of interpolation and SELECT/PLURAL expressions: the function is basically a return statement with a few ternaries.
const opCodeToFunction = (opcodes) => {
  const output = opcodes.map(code => (
    (code.type === 'text') ? escapeText(code.value) :
    // Quote the key so it is read as a string, not as an identifier
    (code.type === 'variable') ? `params[${JSON.stringify(code.value)}]` :
    (code.type === 'select') ? ... :
    (code.type === 'plural') ? ... :
    '""' // TODO Something wrong happened (invalid opcode)
  ));
  // Fall back to an empty string literal if there is no data
  const result = output.join('+') || '""';
  // Generate the function
  return new Function(
    'arg0',
    'arg1',
    `
    var params = arg0 || {};
    return ${result};
    `);
};
⚠️ Note: when building a dynamic function, make sure to escape every piece of text that ends up in the function body, otherwise a translation string could inject code (XSS)!
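One simple way to do that escaping is JSON.stringify, which turns any string into a quoted, escaped literal. The sketch below is my own assumption of what an escapeText helper could look like, not necessarily what the library does, and the compile function only covers text and variable opcodes:

```javascript
// Hypothetical helper: JSON.stringify produces a quoted, escaped string
// literal, so the text cannot break out of the generated source.
const escapeText = (text) => JSON.stringify(text);

const compile = (opcodes) => {
  const parts = opcodes.map(code => (
    code.type === 'text' ? escapeText(code.value) :
    code.type === 'variable' ? `(params[${escapeText(code.value)}] || '')` :
    '""' // unknown opcode: contribute nothing
  ));
  return new Function('arg0', `var params = arg0 || {}; return ${parts.join('+') || '""'};`);
};

// A translation that tries to close the string and run its own code
const hostile = compile([
  { type: 'text', value: `"; globalThis.injected = true; "` },
  { type: 'variable', value: 'name' },
]);

const printed = hostile({ name: 'John' });
// The hostile text comes back as plain characters; nothing was executed.
```

Without the escaping, the quote inside the translation would end the string literal early and the rest would run as code.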
Without further ado, let's see the generated functions (note: the real generated functions are a little more complex, but you will get the idea).
Interpolation generated function
// "Hello {name} !"
function generated (params = {}) {
return 'Hello ' + (params.name || '') + ' !';
}
By default, we still fall back to an empty string to avoid printing "undefined" as plain text.
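One detail worth knowing: the simplified `|| ''` fallback above also swallows the number 0. In the comment thread, the real output is said to use (a||(a=="0"?0:"")) for that reason. Here is a small sketch comparing the two; the helper names simple and keepZero are mine:

```javascript
// Simplified fallback used in the example above
const simple = value => value || '';

// Expression quoted in the comment thread: (a||(a=="0"?0:""))
const keepZero = a => (a || (a == '0' ? 0 : ''));

// Compare both fallbacks on a missing value, null, zero and a regular string
const results = [undefined, null, 0, 'John'].map(value => ({
  simple: 'Count: ' + simple(value),
  keepZero: 'Count: ' + keepZero(value),
}));
// Both agree except for 0, which only keepZero prints.
```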
Select expression generated function
// "Check my {pet, select, cat{evil cat} dog{good boy} other{{pet}}} :D"
function generated (params = {}) {
return 'Check my ' + (
(params.pet == 'cat') ? 'evil cat' :
(params.pet == 'dog') ? 'good boy' :
(params.pet || '')
) + ' :D';
}
We don't use strict equality, so that numbers keep working.
Plural expression generated function
// "Here {N, plural, =0{nothing} few{few} other{some}} things !"
function generated (params = {}, plural) {
const safePlural = plural ? { N: plural(params.N) } : {};
return 'Here ' + (
(params.N == '0') ? 'nothing' :
(safePlural.N == 'few') ? 'few' :
'some'
) + ' things !';
}
We cache the plural category to avoid re-fetching it in case of multiple checks.
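To see the plural function in action, here is a usage sketch. The generated function is copied from above so the snippet runs on its own; pluralCategory is a made-up category function (real rules depend on the locale), and the counter only exists to show that the category is computed once per call:

```javascript
// Generated function from the example above, reproduced so this runs alone
function generated(params = {}, plural) {
  const safePlural = plural ? { N: plural(params.N) } : {};
  return 'Here ' + (
    (params.N == '0') ? 'nothing' :
    (safePlural.N == 'few') ? 'few' :
    'some'
  ) + ' things !';
}

// Hypothetical category function: real rules depend on the locale
let pluralCalls = 0;
const pluralCategory = (n) => {
  pluralCalls += 1;
  return n >= 2 && n <= 4 ? 'few' : 'other';
};

// One call per count: exact match (=0), "few" category, and the fallback
const outputs = [0, 3, 10].map(n => generated({ N: n }, pluralCategory));
```

If no plural function is passed, safePlural stays empty and only the exact matches (like =0) and the other branch can be selected.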
🚀 Conclusion
With generated functions, we were able to run code 5 to 1000 times faster than other libraries, by keeping RegExp, split and map operations out of the critical rendering path and by avoiding garbage collector pauses.
One last piece of good news: it's only 1kB gzipped!
If you're looking for an i18n JavaScript library to speed up your PWA or your SSR, you should probably give 💋Frenchkiss.js a try!
Top comments (14)
I assume it would be more code on the wire if you server-rendered these functions?
Have you considered rendering them inside a Service Worker and returning the .js file?
Interesting idea to do it on the server side, in a service worker or in a build tool!
Yet I'm not sure there is much to gain: the generated function weighs more than the string representation. It would also mean that, for every update of the library, you'd have to regenerate all your functions in case of a signature mismatch (and to pick up possible fixes to the generated function).
But yeah, with this method it would also be possible to remove the compiler from the code and save some extra bytes :)
By the way, here is how to extract the function if needed:
With a little more work of the compiler, there are things that could make the function shorter:
Can you clarify to me what the (a||(a=="0"?0:"")) is doing?
Yeah, some optimizations can definitely clean up the generated function. I just wanted to avoid spending much time (and file size) just to prettify the output.
The var p=a||{}; for example can be removed in the case of a raw string (that isn't done at the moment).
About the (a||(a=="0"?0:"")), it's actually to avoid printing "undefined" or "null" in a translation, but keep "0" working:
I'm not well-versed in i18n, is that the expected behavior of pet, select, other{{pet}}? Empty string for an entirely missing key?
Can I suggest this way of inlining selects, or will it hit performance?
If you do manage to get rid of new Function, you could do things like this though (n being a helper shared by all functions for all locales):
It depends on the i18n lib: some print undefined, others keep "{variable}" in the translation.
As for me, I think it's a better user experience to have an empty string than a variable name (otherwise the website seems broken).
Should probably report an error in prod (in addition to whatever behavior you suggest) / fail in dev?
Imo the key name is less bad than an empty string, I have been on broken sites/apps where I could complete my flow in part thanks to the variable names in the template. But that's a nitpick.
OK, I've noted the suggestion and I'll try to implement it this weekend, something like the way onMissingKey works:
Just saw your comment about SELECT optimization.
I already did some tests with it working using an object mapping to values, but it doesn’t work well with nested expressions.
With nested expressions, you can't really pre-cache objects, and the code of every possible branch gets executed before the resolution, leading to performance issues.
With nested expressions you can have functions for some branches and strings for some. :v
Don't know about performance, but it is usually very compact, and you just need one helper:
Not really so compact if you transpile it to ES5.
Here is an example of a complete solution (if I haven't missed anything?).
The translation:
The global functions:
The generated function demo:
I'll probably do a branch to see if it's a good candidate.
should probably be an object, passed by reference instead of individually through arguments?
It's more about the fact that the browser can store the parsed and even potentially optimized function in cache, not just the string form, when you go through normal pathways.
new Function() is rather esoteric, and means it will definitely do a parse per instantiation, as well as cause some deoptimization around the instantiation.
Furthermore, using the library as is requires the 'unsafe-eval' CSP directive on your entire page, which you otherwise might be able to avoid.