I love the replaceAll string API in JavaScript, as it makes replacing a string far more intuitive than the "good old" global regular expression. This week I had to replace strings with the results of async calls, and no standard JavaScript API supports that.
The idea
So, let's create a replaceAllAsync that looks like this:
export async function replaceAllAsync(
  input: string,
  regex: RegExp,
  replacement: (match: RegExpMatchArray) => Promise<string>,
  mode: PromiseExecutionMode = PromiseExecutionMode.All
): Promise<string> {
  // implementation
}
export enum PromiseExecutionMode {
  // Uses Promise.all -- which is the fastest, but may overwhelm the service being called
  All,
  // Uses an await on each replacement, so they run sequentially (one by one)
  ForEach,
}
The input is the input string that we'll work on. The regex is the regular expression that will be used to find matches that need to be replaced. Every match is fed to the replacement function, which will be awaited. The mode indicates whether we should process all the promises at once or one by one.
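To make the difference between the two modes concrete, here is a small sketch. The makeTracker helper and the 5 ms delay are made up for this illustration; they just count how many calls are in flight at the same time.

```typescript
// Hypothetical helper: tracks the highest number of concurrently running calls.
function makeTracker() {
  let inFlight = 0;
  const tracker = {
    max: 0,
    async run(value: string): Promise<string> {
      inFlight++;
      tracker.max = Math.max(tracker.max, inFlight);
      // Stand-in for a real async call (assumed delay of 5 ms).
      await new Promise<void>(resolve => setTimeout(resolve, 5));
      inFlight--;
      return value.toUpperCase();
    },
  };
  return tracker;
}

async function demo(): Promise<[number, number]> {
  const items = ["a", "b", "c"];

  // "All" style: every call is started before any of them is awaited.
  const all = makeTracker();
  await Promise.all(items.map(item => all.run(item)));

  // "ForEach" style: each call finishes before the next one starts.
  const oneByOne = makeTracker();
  for (const item of items) {
    await oneByOne.run(item);
  }

  return [all.max, oneByOne.max];
}

demo().then(([allMax, oneByOneMax]) => console.log(allMax, oneByOneMax)); // 3 1
```

With Promise.all all three calls are running at once; with the loop there is never more than one.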
In the body of the function, we need to:
- Capture all the matches and feed them to the replacement function.
- Split the input on the regular expression.
- Stitch all the components back together into a string.
- Return that string.
Easy, right? Well... it turns out there are a few caveats when it comes to regular expressions.
Caveats of reusing a global regex
Consider the following code:
const str = "Numb3r!1"
const regex = /\d+/g
console.log("Test 1", regex.test(str)) // true
console.log("Test 2", regex.test(str)) // true
console.log("Test 3", regex.test(str)) // false
console.log("Test 4", regex.test(str)) // true
Why? It turns out the RegExp object is very stateful when the global flag is set:
JavaScript RegExp objects are stateful when they have the global or sticky flags set (e.g., /foo/g or /foo/y). They store a lastIndex from the previous match. Using this internally, test() can be used to iterate over multiple matches in a string of text (with capture groups).
The lastIndex is changed per test! So when you reuse a global regex, you might not get what you expect. We'll create a new regular expression based on the given one.
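A minimal sketch of that idea: copying the source and flags into a new RegExp gives us an object whose lastIndex starts at 0, no matter what happened to the original. The cloneRegex name is just something I made up for this example.

```typescript
// Hypothetical helper: copy a regex so the caller's lastIndex is never touched.
function cloneRegex(regex: RegExp): RegExp {
  return new RegExp(regex.source, regex.flags);
}

const shared = /\d+/g;
shared.test("Numb3r!1"); // matches "3" and moves shared.lastIndex past it

const copy = cloneRegex(shared);
console.log(shared.lastIndex); // 5
console.log(copy.lastIndex); // 0
```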
Caveats of split with capture groups
Consider the following code:
const str = "I have 12 bananas and 3 apples!"
// no groups:
const regex1 = /\d+/g
console.log("Test 1", str.split(regex1))
// [ 'I have ', ' bananas and ', ' apples!' ]
// capture groups:
const regex2 = /(\d+)/g
console.log("Test 2", str.split(regex2))
// [ 'I have ', '12', ' bananas and ', '3', ' apples!' ]
// non-capture groups:
const regex3 = /(?:\d+)/g
console.log("Test 3", str.split(regex3))
// [ 'I have ', ' bananas and ', ' apples!' ]
Conclusion: if you use capture groups in your regular expression, the captured text will end up in the split result.
Caveats of matchAll
So, when you want to capture all the matches of a string you can do a str.matchAll(regex). But this only works for global regular expressions: without the g flag, matchAll throws a TypeError.
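Here is a quick sketch of that behaviour, plus a way around it. The toGlobal helper is a name I picked for the example; it is not a built-in.

```typescript
// Hypothetical helper: return a copy of the regex that always has the g flag.
function toGlobal(regex: RegExp): RegExp {
  const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g";
  return new RegExp(regex.source, flags);
}

const text = "I have 12 bananas and 3 apples!";

// Calling matchAll with a non-global regex throws right away.
let threwTypeError = false;
try {
  text.matchAll(/\d+/);
} catch (e) {
  threwTypeError = e instanceof TypeError;
}
console.log(threwTypeError); // true

// With the g flag added, we get every match.
const found = Array.from(text.matchAll(toGlobal(/\d+/)), m => m[0]);
console.log(found); // [ '12', '3' ]
```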
Implications
Based on the caveats, we have to do a few things:
- It is better not to reuse the given regular expression, but to create a new one from it. This prevents problems with lastIndex.
- Convert the regular expression to a global one. The end user should already know, but this might make it a bit easier, and it prevents us from having to debug all the time.
- Use 2 regular expressions: one for matching and one for splitting. The regular expression used for splitting should have its capturing groups replaced by non-capturing groups.
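The last point can be sketched as a small helper. The name toNonCapturing is mine, and this is a simplification: it only rewrites plain (...) groups and assumes the pattern has no named groups, no backreferences, no literal parentheses inside character classes, and no escaped backslash directly before a parenthesis.

```typescript
// Hypothetical helper: turn plain capturing groups into non-capturing ones.
function toNonCapturing(source: string): string {
  // A "(" that is not escaped and not followed by "?" opens a plain capturing group.
  // Groups that already start with "(?" (non-capturing, lookarounds) are left alone.
  return source.replace(/(?<!\\)\((?!\?)/g, "(?:");
}

const rangePattern = /(\d+)-(\d+)/;
const rangeSplitter = new RegExp(toNonCapturing(rangePattern.source), "g");

console.log("a 1-2 b 3-4 c".split(rangeSplitter)); // [ 'a ', ' b ', ' c' ]
console.log(toNonCapturing(/(?=x)(y)/.source)); // (?=x)(?:y)
```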
The code
Now, let's implement the function using what we know:
export async function replaceAllAsync(
  input: string,
  regex: RegExp,
  replacement: (match: RegExpMatchArray) => Promise<string>,
  mode: PromiseExecutionMode = PromiseExecutionMode.All
): Promise<string> {
  // replace all implies global, so append the flag if it is missing
  const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g"

  // get matches with a fresh regex, so the caller's lastIndex is untouched
  const matcher = new RegExp(regex.source, flags)
  const matches = Array.from(input.matchAll(matcher))
  if (matches.length === 0) return input

  // construct all replacements
  let replacements: Array<string>
  if (mode === PromiseExecutionMode.All) {
    replacements = await Promise.all(matches.map(match => replacement(match)))
  } else {
    // PromiseExecutionMode.ForEach: await each replacement before starting the next
    replacements = []
    for (const m of matches) {
      replacements.push(await replacement(m))
    }
  }

  // change plain capturing groups into non-capturing groups for split
  // (because capturing groups are added to the parts array);
  // groups that already start with "(?" (non-capturing, lookarounds) are left alone,
  // named groups and backreferences are not handled
  const source = regex.source.replace(/(?<!\\)\((?!\?)/g, "(?:")
  const splitter = new RegExp(source, flags)
  const parts = input.split(splitter)

  // stitch everything back together
  let result = parts[0]
  for (let i = 0; i < replacements.length; i++) {
    result += replacements[i] + parts[i + 1]
  }
  return result
}
export enum PromiseExecutionMode {
  // Uses Promise.all -- which is the fastest, but may overwhelm the service being called
  All,
  // Uses an await on each replacement, so they run sequentially (one by one)
  ForEach,
}
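If rewriting the groups for split feels fragile, a different approach is to skip split entirely and stitch the result using the index of each match. This is a sketch of that alternative, not the implementation above; replaceAllAsyncByIndex and the double callback are names invented for the example, and it only covers the Promise.all mode.

```typescript
// Sketch of an alternative: stitch with match.index instead of split().
async function replaceAllAsyncByIndex(
  input: string,
  regex: RegExp,
  replacement: (match: RegExpMatchArray) => Promise<string>
): Promise<string> {
  // Work on a fresh, global copy of the regex.
  const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g";
  const matches = Array.from(input.matchAll(new RegExp(regex.source, flags)));
  const replacements = await Promise.all(matches.map(m => replacement(m)));

  let result = "";
  let cursor = 0;
  matches.forEach((m, i) => {
    const start = m.index ?? 0; // matchAll results carry the match position
    // Copy the untouched text before the match, then the replacement.
    result += input.slice(cursor, start) + replacements[i];
    cursor = start + m[0].length;
  });
  // Append whatever is left after the last match.
  return result + input.slice(cursor);
}

// Hypothetical async callback standing in for a real lookup.
async function double(match: RegExpMatchArray): Promise<string> {
  return String(Number(match[1]) * 2);
}

replaceAllAsyncByIndex("I have 12 bananas and 3 apples!", /(\d+)/, double)
  .then(result => console.log(result)); // I have 24 bananas and 6 apples!
```

Because the capture groups are never removed, the callback can still read match[1], and named groups or lookarounds in the pattern need no special handling.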
Conclusion
JavaScript is not always as intuitive as I would like it to be. But with regular expressions, you can build some pretty powerful async replacements.