The Monkey Dev

Posted on Feb 25, 2023

The evolution of Javascript Strings over the years

#javascript #typescript #programming #webdev

In JavaScript, the String object is used to represent and manipulate a sequence of characters. Strings are useful for holding data that can be represented in text form.

JavaScript's string handling has evolved considerably over time.

In this article, we cover six major changes to JavaScript strings.

1. Unicode support

Prior to ES6 (ECMAScript 2015), JavaScript had limited support for Unicode characters, especially those with code points above U+FFFF.

One of the most significant additions to Unicode support in ES6 was the introduction of Unicode code point escapes.

This feature allows developers to write any Unicode character by its code point value, rather than splitting it into a pair of UTF-16 escapes.

For example, prior to ES6, if you wanted to represent the character "💩" (a popular Unicode emoji), you would need to use a UTF-16 surrogate pair like this: "\uD83D\uDCA9". With code point escapes in ES6, you can represent the same character using its code point value, like this: "\u{1F4A9}".

Here's an example of how code point escapes can be used in practice:

// Using UTF-16 surrogate pairs:
const poop1 = "\uD83D\uDCA9";

// Using Unicode code point escapes:
const poop2 = "\u{1F4A9}";

console.log(poop1 === poop2); // true

This code creates two string variables, poop1 and poop2, that both represent the "💩" character.

The first variable uses UTF-16 surrogate pairs to encode the character, while the second variable uses a code point escape.

The console.log() statement then compares the two variables, which returns true because they both represent the same character.

Code point escapes do not expand the set of characters a string can hold, since any code point above U+FFFF can also be written as a UTF-16 surrogate pair. What they remove is the need to compute those pairs by hand: you write the code point exactly as it appears in the Unicode charts. This makes it easier to work with text in different languages and writing systems, and reduces the chance of mistakes in string literals.
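ES6 also added a few code point aware tools that pair naturally with these escapes. The following is a minimal sketch (the variable names are only illustrative) showing that length counts UTF-16 code units, while codePointAt(), String.fromCodePoint(), and string iteration work on whole code points:

```javascript
const emoji = "\u{1F4A9}";

// length counts UTF-16 code units, so a surrogate pair counts as 2
const codeUnitCount = emoji.length;

// Iterating a string (here via spread) walks code points, not code units
const codePointCount = [...emoji].length;

// codePointAt() reads the full code point starting at a given index
const codePointHex = emoji.codePointAt(0).toString(16);

// String.fromCodePoint() builds a string from a code point value
const rebuilt = String.fromCodePoint(0x1f4a9);

console.log(codeUnitCount); // 2
console.log(codePointCount); // 1
console.log(codePointHex); // "1f4a9"
console.log(rebuilt === emoji); // true
```

The gap between the two counts is worth keeping in mind whenever you slice or measure strings that may contain emoji.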

2. Template literals

Template literals were introduced in ES6 as a new way to create strings in JavaScript. They allow developers to embed expressions and variables directly into string literals using interpolation, instead of using concatenation or string manipulation methods.

Here's an example of how template literals can be used in practice:

const name = "Alice";
const age = 27;
const message = `Hello, my name is ${name} and I am ${age} years old.`;

console.log(message);

In this code, we define two variables name and age that contain a person's name and age. We then create a message variable using a template literal that includes the name and age variables using interpolation. The resulting string is "Hello, my name is Alice and I am 27 years old.".
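Interpolation is not limited to plain variables: any expression can appear inside ${...}. As a small sketch (the item object and its fields are made up for illustration), here is the same string built with concatenation and with a template literal:

```javascript
// Hypothetical data, used only for this example
const item = { name: "notebook", price: 4.5, quantity: 3 };

// Concatenation: the expression needs its own parentheses and separators
const viaConcat =
  "Total for " + item.name + ": " + (item.price * item.quantity).toFixed(2);

// Template literal: the expression is embedded directly
const viaTemplate = `Total for ${item.name}: ${(item.price * item.quantity).toFixed(2)}`;

console.log(viaTemplate); // "Total for notebook: 13.50"
console.log(viaConcat === viaTemplate); // true
```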

Template literals can also be used to create multi-line strings, which is much simpler than using escape characters or concatenation with regular string literals. Here's an example:

const poem = `
  I wandered lonely as a cloud
  That floats on high o'er vales and hills,
  When all at once I saw a crowd,
  A host, of golden daffodils;
  Beside the lake, beneath the trees,
  Fluttering and dancing in the breeze.
`;

console.log(poem);

In this code, we define a poem variable using a multi-line template literal. The resulting string includes line breaks and whitespace that are preserved in the output. The console.log() statement then displays the poem as formatted text.
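One consequence of preserving whitespace is that the string also keeps the newline right after the opening backtick and the indentation of every line. If that is not wanted, it has to be stripped explicitly. The sketch below uses a hypothetical dedent() helper that assumes a fixed two-space indent; it is not a built-in function:

```javascript
const indented = `
  line one
  line two
`;

// The literal starts with a newline and each line keeps its two leading spaces
console.log(indented.startsWith("\n")); // true

// Hypothetical helper: removes a two-space indent from each line,
// then trims the leading and trailing newlines
function dedent(text) {
  return text
    .split("\n")
    .map((line) => line.replace(/^ {2}/, ""))
    .join("\n")
    .trim();
}

const cleaned = dedent(indented);
console.log(cleaned === "line one\nline two"); // true
```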

Overall, template literals make it easier to create and manipulate strings in JavaScript, especially when working with dynamic content and multi-line text. They offer a more concise and readable syntax compared to traditional string manipulation methods.

3. String methods

JavaScript has always had a rich set of string manipulation methods, but new methods have been added in recent versions of the language.

For example, ES2019 added two new methods to the String.prototype object: trimStart() and trimEnd(). These methods are similar to the existing trim() method, but they only remove whitespace characters from the beginning or end of a string, respectively.

Here's an example of how these methods can be used in practice:

const greeting = "   Hello, world!   ";

console.log(greeting.trim()); // "Hello, world!"
console.log(greeting.trimStart()); // "Hello, world!   "
console.log(greeting.trimEnd()); // "   Hello, world!"

In this code, we define a greeting variable that contains a string with leading and trailing whitespace. We then call the trim(), trimStart(), and trimEnd() methods on the greeting variable, respectively.

The trim() method removes all leading and trailing whitespace characters from the string, resulting in the trimmed string "Hello, world!".

The trimStart() method removes only the leading whitespace characters from the string, resulting in the trimmed string "Hello, world!   ".

The trimEnd() method removes only the trailing whitespace characters from the string, resulting in the trimmed string "   Hello, world!".
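The trim variants are not the only additions. As a brief sketch, here are a few other String.prototype methods added in ES2015 and ES2017 that are not covered above (the file name and ID values are made up for illustration):

```javascript
const fileName = "report-2023.csv";

// ES2015: substring checks without resorting to indexOf()
const isCsv = fileName.endsWith(".csv");
const isReport = fileName.startsWith("report");
const hasYear = fileName.includes("2023");

// ES2015: repeat a string a given number of times
const divider = "-".repeat(10);

// ES2017: pad a string to a target length
const invoiceId = "42".padStart(6, "0");

console.log(isCsv, isReport, hasYear); // true true true
console.log(divider); // "----------"
console.log(invoiceId); // "000042"
```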

4. Unicode normalization

ES6 also introduced the String.prototype.normalize() method, which converts a string to a Unicode normalization form. This reduces the risk of errors when the same text can be encoded as different sequences of code points.

The Unicode standard defines four normalization forms. The two canonical forms are NFC (Normalization Form Canonical Composition) and NFD (Normalization Form Canonical Decomposition); normalize() also accepts the compatibility forms NFKC and NFKD.

Here's an example of how to use the normalize() method in ES6 to normalize a string to its NFC form:

const str = "\u1E9B\u0323"; // Latin small letter long s with dot above (U+1E9B) + combining dot below (U+0323)

console.log(str); // "ẛ̣"
console.log(str.normalize("NFC")); // "ẛ̣" (unchanged: U+1E9B U+0323)

In this code, we define a str variable that contains a Latin small letter long s with dot above followed by a combining dot below. We then call the normalize() method with the argument "NFC" to normalize the string to its composed form. In this case the result is unchanged, because Unicode has no single precomposed character for this combination and the string is already in NFC. The character "ṩ" (U+1E69) is produced only by the compatibility form, str.normalize("NFKC").

Similarly, we can use the normalize() method with the argument "NFD" to normalize a string to its NFD form:

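const str = "\u1E9B\u0323"; // Latin small letter long s with dot above (U+1E9B) + combining dot below (U+0323)

console.log(str); // "ẛ̣"
console.log(str.normalize("NFD")); // "ẛ̣" (U+017F U+0323 U+0307)

In this example, normalize() is called with the argument "NFD", which decomposes the string into the base letter long s (U+017F) followed by the combining dot below (U+0323) and the combining dot above (U+0307).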

Unicode normalization can be important for accurate text processing, sorting, and searching, especially when working with multilingual text or text that contains combining characters or other diacritics.
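A common practical case is comparing strings that look identical but are encoded differently. The sketch below uses "é", which can be written as one precomposed code point or as "e" plus a combining accent; equalsNormalized() is a hypothetical helper name, not a built-in:

```javascript
const composed = "\u00E9"; // "é" as a single precomposed code point
const decomposed = "e\u0301"; // "e" followed by a combining acute accent

// The two strings render the same but are different code unit sequences
console.log(composed === decomposed); // false
console.log(composed.length, decomposed.length); // 1 2

// Hypothetical helper: compares two strings after normalizing both to NFC
function equalsNormalized(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

console.log(equalsNormalized(composed, decomposed)); // true
```

Normalizing both sides to the same form (NFC here, though NFD would work equally well) before comparing or storing text avoids this kind of mismatch.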

5. Raw strings

ES6 also added the ability to create raw strings using template literals.

A raw string is a string in which backslashes are kept as literal backslashes instead of being processed as escape sequences. To create one, we use String.raw as a tag function in front of a template literal.

Here's an example of how to use the String.raw() method to create a raw string:

const path = String.raw`C:\Users\Username\Documents\file.txt`;
console.log(path); // C:\Users\Username\Documents\file.txt

In this code, we define a path variable that contains a Windows file path as a raw string using String.raw.
The resulting string is C:\Users\Username\Documents\file.txt, with each backslash kept as a literal character. Written as a regular string literal, the same value would need every backslash doubled: "C:\\Users\\Username\\Documents\\file.txt".

Raw strings can be particularly useful when working with regular expressions, where backslashes are commonly used as escape characters. They can also be useful when working with file paths or other strings that contain a large number of backslashes.
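To illustrate the regular expression case, here is a minimal sketch that builds the same pattern from a regular string literal and from a raw string (the variable names are only illustrative):

```javascript
// With a regular string literal, every backslash must be doubled
const escapedSource = "\\d+\\.\\d+";

// With String.raw, the pattern reads the way it would in a regex literal
const rawSource = String.raw`\d+\.\d+`;

console.log(rawSource === escapedSource); // true

const decimalPattern = new RegExp(rawSource);
console.log(decimalPattern.test("3.14")); // true
console.log(decimalPattern.test("abc")); // false

// Interpolation still works in raw strings; only escapes are left unprocessed
const folder = "Documents";
const rawRelative = String.raw`${folder}\file.txt`;
console.log(rawRelative === "Documents\\file.txt"); // true
```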

6. Performance improvements

Finally, string operations have become considerably faster over the years.

These gains come from JavaScript engines (such as V8, SpiderMonkey, and JavaScriptCore) rather than from the ECMAScript specification itself, which defines behavior but not performance.

Here are a few areas where engines have improved string performance:

Faster String concatenation: Older engines could be slow at building large strings with the + operator, because each concatenation produced a new intermediate string.

Modern engines optimize concatenation heavily, so the + operator, template literals, and String.prototype.concat() are all fast for typical workloads. Template literals are not inherently faster than +; the choice between them is mainly a matter of readability.

const firstName = "John";
const lastName = "Doe";
const fullName = `${firstName} ${lastName}`; // Often more readable than '+'; measure before assuming either is faster

console.log(fullName); // "John Doe"

Faster String indexing: Bracket notation (str[index]) was standardized in ES5; before that, charAt() was the portable way to read a single character. In modern engines, indexing into strings is heavily optimized, making it a viable option for high-performance string processing.

Improved String matching: In older engines, regular expressions used for string matching could be slow. Engines have since optimized their regular expression implementations (some compile patterns to native code), making string matching a faster and more efficient operation.

const str = "The quick brown fox jumps over the lazy dog";
const regex = /quick/;

console.log(regex.test(str)); // true

These are just a few examples of the performance improvements that JavaScript engines have made to string handling. As engines continue to evolve, developers can expect further optimizations to string processing performance.
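Because these gains depend on the engine and its version, the safest way to compare two approaches is to measure them in your own environment. The following is a rough sketch of such a measurement; timeIt() is a hypothetical helper, the iteration count is arbitrary, and micro-benchmarks like this one are only indicative because engines may optimize away work whose result is unused:

```javascript
// Hypothetical helper: runs fn `iterations` times and returns elapsed milliseconds
function timeIt(fn, iterations) {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    fn(i);
  }
  return Date.now() - start;
}

const firstName = "John";
const lastName = "Doe";

const withPlus = (i) => firstName + " " + lastName + " " + i;
const withTemplate = (i) => `${firstName} ${lastName} ${i}`;

// Both approaches produce the same string
console.log(withPlus(1) === withTemplate(1)); // true

const iterations = 1000000;
const plusMs = timeIt(withPlus, iterations);
const templateMs = timeIt(withTemplate, iterations);

// Timings vary by engine, version, and machine
console.log(`+ operator: ${plusMs} ms`);
console.log(`template literal: ${templateMs} ms`);
```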

Conclusion

Overall, the evolution of JavaScript's string handling capabilities has made it easier for developers to work with strings in a wide range of contexts, including Unicode support, interpolation, and manipulation. The addition of raw strings and performance improvements have also made JavaScript code more robust and efficient.