Rutkat

Posted on Jul 16, 2021 • Edited on Jul 22, 2021

How to manipulate text strings like a boss!

Have you wondered how censoring words on the internet occurs? Perhaps you want to know why your username on apps has to conform to specific rules? This is done through string manipulation using code written in languages such as JavaScript, among many others.

A string is just a specific name used to label a piece of data which contains text. It can consist of letters, numbers, and symbols. Why is this important? Every software application with a presentation layer applies a form of string manipulation, and it is the foundation of many algorithms. Think about how it applies to business ideas as well. Grammarly is an excellent example of a business that is all about string manipulation.

By the end of this article, you will understand how text manipulation works. I will explain the ins and outs of string manipulation so you can think about how code processes it. Don't worry, we won't discuss the bits and bytes of strings but rather the ways you can manipulate strings and how it is done. Also, I will use the terms string and text interchangeably without getting scientific. Remember that a practical example or analogy will help you memorize the specific ways needed to manipulate strings.

TEXT AND STRINGS

The first thing to consider is how to approach text manipulation from a visual perspective. For example, if you're a non-coder or just a human being, you know you can write text on paper, on your smartphone, computer, and even rice. Okay, maybe not rice. The writing can occur from left-to-right, top-to-bottom, right-handed, left-handed, etc. Afterwards, you can manipulate what you wrote by using an eraser, scratching it out, or tapping the backspace key.

From a coder's perspective, it doesn't work the same way, except when writing the actual code. The code instructions for manipulating strings have restrictions and specific methods. You will learn these methods here but let's start with more of a visual approach to envision how code will do the magical transformations.

WHICH DIRECTION

Like writing, strings can be manipulated from left-to-right and right-to-left. The length of a string can be as little as a single space to pages of text, but most commonly in code, a string will not be longer than a sentence. A string can be a username, phone number, a snippet of code, a poem etc.

When working with a specific coding language, there are built-in methods to use or you can create your own custom method. A combination of these methods can manipulate text to do virtually whatever you want. You can become a string master with the force of practice.

Besides processing a string from left-to-right or right-to-left, it can be broken down and manipulated to individual characters using the number representing the position of any character. This is known as the index value of the string. For example, the string "Hello!" contains 6 characters, so your code can directly access any letter by indicating a corresponding index number.

"Hello!"
 123456 (number represents position)

TRAVERSING

Several coding methods will process the string in this ascending numerical order. However, since computers count from zero, the position of the first character is always 0. To be more accurate, I should state that the computer is traversing, not processing strings. The difference is that "processing" indicates an effect happens whereas "traversing" indicates a passage or travel across something.

When dealing with code instructions, you should be conscious about the computing resources utilized so you may not need to process every character in a string but rather traverse to the individual character you need to change.

For example, say your objective is to remove punctuation. You have several approaches to remove the "!" from "Hello!". You can use a method to find the position of "!" or you can access the last character of the string. These approaches include getting the length of the string, getting the index of "!" or traversing the string in reverse.

If you use the length property, you have to remember to subtract 1 since counting starts at zero. Also, spaces count as part of the string and have an index position, so they increase the length of the string.

The INDEX number represents the position of a character in a string.

"Hello!"
 012345 character positions

"Hello!".length - 1
Length is a property of a string.

Here are methods to get the position of a character in a string:

"Hello!".indexOf("!")
Find the first position of a character searching from left-to-right.

"Hello!".lastIndexOf("!") 
Find the last position of a character searching from right-to-left.

"Hello!".length - 1
Find the last character in a string.

All give 5 as the result. You can do the opposite with the charAt() method which returns the character from a string specified by the position.

"Hello!".charAt(5)
Result is "!"
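
To tie these together, here is a minimal sketch of the "remove the punctuation" objective from earlier. The helper name removeTrailingBang is made up for this example, and it borrows the slice method which is covered later in this article.

```javascript
// Hypothetical helper: drop a trailing "!" if there is one.
function removeTrailingBang(text) {
  const lastIndex = text.length - 1;    // index of the last character
  if (text.charAt(lastIndex) === "!") {
    return text.slice(0, lastIndex);    // keep everything before the last character
  }
  return text;                          // nothing to remove
}

console.log(removeTrailingBang("Hello!")); // "Hello"
console.log("Hello!".indexOf("?"));        // -1, which means "not found"
```

Note that indexOf returns -1 when the character is not in the string, so it is worth checking for that value before using the result as a position.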

ONE CHARACTER

Now you know the basics of traversing a string one character at a time, which are from the left, from the right, and from the end using index numbers. However, not all methods return the position of the character you seek. You may prefer a result as a boolean data type instead, meaning your search is a test which returns true or false.

Boolean test methods: includes, startsWith, endsWith

"Hello!".includes("!")
true

"Hello!".startsWith("!")
false

"Hello!".endsWith("!")
true
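
As a rough sketch of how these tests could enforce a username rule like the ones mentioned in the introduction, consider the following. The rule itself and the function name isValidUsername are invented for illustration.

```javascript
// Hypothetical rule: a username may not start with "_" or contain a space.
function isValidUsername(username) {
  return !username.startsWith("_") && !username.includes(" ");
}

console.log(isValidUsername("string_boss")); // true
console.log(isValidUsername("_boss"));       // false
console.log(isValidUsername("string boss")); // false
```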

These character checks are not as useful as finding the position of a character, because you cannot proceed with your algorithm if your purpose is to modify the string with the same search query. Besides, there are more powerful methods for true/false checks which will be described later. Up to this point we have learned to traverse a string left-to-right and right-to-left, so what's the next step? Modification!

We can use several built-in methods or create our own for changing the text in a string. Let's start with the methods which don't require indicating a search query or index position. Since humans care more about uppercase and lowercase letters than computers, we can instantly transform an entire string using these two methods:

"Hello!".toUpperCase()
Result "HELLO!"

"Hello!".toLowerCase()
Result "hello!"

If you have seen a camel then you know they have humps, and in programming when code looksLikeThis it is called camel case. This is because it has humps and no spaces. You will have to traverse and recognize this type some day. We change letter cases to make text easier to read for humans because who likes to read "a sEnTEnCe liKE ThiS!?" Actually, the toLowerCase method is also useful for web apps like blogs which take an article title and create a URL known as a slug.

Example:
Article name "Mastering String Manipulation"
Slug url "domain.com/mastering-string-manipulation/"
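
Here is one possible way to build such a slug, as a sketch only. The function name slugify is my own, and the replace call uses a regular expression, which is covered later in this article.

```javascript
// Hypothetical helper: turn an article title into a URL slug.
function slugify(title) {
  return title
    .trim()                 // drop spaces at both ends
    .toLowerCase()          // "Mastering" becomes "mastering"
    .replace(/\s+/g, "-");  // each run of whitespace becomes one dash
}

console.log(slugify("Mastering String Manipulation")); // "mastering-string-manipulation"
```

A real blog would likely also strip punctuation, which this sketch does not attempt.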

Since there are multiple methods to get the same result, let's begin with this example of combining strings into one. This is known as concatenation. You can use the "+" symbol or the concat method. Please note that JavaScript does not automatically enforce data types, so you should ensure that the values are strings as opposed to arrays or booleans when using "+". This topic deserves an entire article of its own. With the lack of data type enforcement, erroneous output can occur as a result of type coercion, meaning the + sign can accidentally change a number to a string.

"Hello" + "World"
Result "HelloWorld"

"Hello".concat("World")
Result "HelloWorld"

"12" + 12
Result "1212", not 24.
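
If you actually wanted 24, one way to avoid the surprise is to convert the text to a number first. This is a small sketch with made-up variable names, using the built-in Number function.

```javascript
const quantityText = "12"; // e.g. a value typed into a form field
const extra = 12;

const joined = quantityText + extra;         // "1212" (string concatenation)
const summed = Number(quantityText) + extra; // 24 (numeric addition)

console.log(joined, typeof joined); // 1212 string
console.log(summed, typeof summed); // 24 number
```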

A newer way to concatenate strings is using template strings, which utilize the back-tick symbol and curly braces {} after the $ symbol. Yes, using those three symbols is required. You will see this in emails as well as websites to customize the writing output based on the user's information.

var myString = "Hello"
var string2 = "World"
`${myString} ${string2}`
Result "Hello World"
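
To illustrate the email use case, here is a short sketch. The user object and its fields are hypothetical and only serve the example.

```javascript
// Hypothetical user record, for illustration only.
const user = { firstName: "Ada", unreadCount: 3 };

// Any expression inside ${ } is evaluated and inserted into the text.
const greeting = `Hello ${user.firstName}, you have ${user.unreadCount} new messages.`;

console.log(greeting); // "Hello Ada, you have 3 new messages."
```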

Previously I stated that empty spaces count towards the length of a string, in other words they occupy a space in a string and can be manipulated as well. Since we want to be efficient in saving data as well as making text easy to read, we want to prevent unnecessary blank space and this can be done with the trim method.

It removes empty spaces at the beginning and end of a string, but not in the middle. If you want to remove empty space in the middle of a string, you have to utilize a more powerful method known as a "regular expression" which will be described later.

" Hello World. ".trim()
Result "Hello World."

There are also methods to do the opposite. You can pad a string at the end or beginning with any character. Let's say your web app deals with sensitive information like credit cards or you have ID numbers which have to conform to a specific length. You can use the padStart and padEnd methods for this. The first parameter is the total length you want the string to have, not the number of characters to add. For example, a credit card number is saved in the app but you only want to show the last four digits prefixed with the * symbol.

"4444".padStart(8, "*")
Result "****4444"

"1234".padStart(8, "0")
Result "00001234"
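
The credit card scenario could be sketched like this. The helper name maskCardNumber and the card number are made up, and the sketch borrows the slice method which is covered later in this article.

```javascript
// Hypothetical helper: show only the last four characters of a card number.
function maskCardNumber(cardNumber) {
  const lastFour = cardNumber.slice(-4);            // e.g. "4444"
  return lastFour.padStart(cardNumber.length, "*"); // pad back to the original length
}

console.log(maskCardNumber("1111222233334444")); // "************4444"
console.log("42".padEnd(6, "0"));                // "420000"
```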

Besides concatenating strings, you can also repeat them with a multiplier. It's uncommon to repeat text so the method will be more useful for symbols such as periods. For example, when you need to truncate a string and indicate to the reader that the string continues, you can use ellipses like this... It could also be useful for songs where lyrics are repeated. Actually, it's rare to see this method in code.

"Hello-".repeat(3)
Result "Hello-Hello-Hello-"
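
As a sketch of the truncation idea, the following shortens a string and marks the cut with three periods. The function name truncate and its parameters are assumptions for this example.

```javascript
// Hypothetical helper: shorten text and mark the cut with three periods.
function truncate(text, maxLength) {
  if (text.length <= maxLength) {
    return text; // short enough already
  }
  return text.slice(0, maxLength) + ".".repeat(3);
}

console.log(truncate("Hello World", 5)); // "Hello..."
console.log(truncate("Hello", 5));       // "Hello"
```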

PIZZA SLICE

Let's expand our character searches!
Using the previous search methods we are only able to retrieve one character at a time from a string. What if we want to select a word or a section of a string using an index range? Well, we can do that by slicing a pizza and eating the slice we want. Almost!

The string method is called slice so a pizza slice is a good metaphor. For this, you pass in the start position and, optionally, the end position of your search query. The character at the end position itself is not included in the result. The start position can be a negative number, which counts from the end of the string. You may think, wouldn't it be easier to just match a word inside a string?

Well yes, but in some cases, coders may not be able to predict what strings they will encounter or the string will be a pre-determined length.

"Hello World".slice(6)
Result "World"

"Hello World".slice(6, 8)
Result "Wo"

"Hello World".slice(-3)
Result "rld"
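
Slicing becomes more useful when the positions come from another method such as indexOf. Here is a small sketch that splits a made-up email address around the "@" symbol.

```javascript
const email = "boss@example.com"; // made-up address for illustration

const atIndex = email.indexOf("@");        // 4
const localPart = email.slice(0, atIndex); // everything before the "@"
const domain = email.slice(atIndex + 1);   // everything after the "@"

console.log(localPart);       // "boss"
console.log(domain);          // "example.com"
console.log(email.slice(-3)); // "com"
```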

Up to this point you have learned to traverse strings from the left and from the right, get character positions, do boolean tests, transform character cases, concatenate strings, remove empty space, pad, repeat strings and extract sub-strings. How about we learn how to revise our strings with the replace method? Scenarios for this can be removing explicit words, swapping first name with last name, or swapping "-" for empty space " ".

The difference with the replace method compared to the previous methods in this article is that replace accepts strings and regular expressions as search queries. It also accepts a function as a second parameter but we won't go into custom functions at this time. With replace, you don't need to rely on using index positions but you need to be familiar with regular expressions (regexp or regex for short) because it is how you can replace multiple instances of the search query. Note the usage of a regular expression with the forward slashes surrounding the search term.

"Very bad word".replace("bad", "good")
Result "Very good word"

"Very bad bad word".replace("bad", "good")
Result "Very good bad word"

"Very bad bad word".replace(/bad/, "good")
Result "Very good bad word"

"Very bad bad word".replace(/bad/g, "good")
Result "Very good good word"
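
Two of the scenarios mentioned above, censoring a word and swapping "-" for a space, might look like the sketch below. The sample strings are made up, and the "i" flag (ignore letter case) is described further down in this article.

```javascript
const comment = "Very bad BAD word";

// "g" replaces every match, "i" ignores uppercase vs lowercase.
const censored = comment.replace(/bad/gi, "***");
console.log(censored); // "Very *** *** word"

// Turn a slug back into readable words.
const readable = "mastering-string-manipulation".replace(/-/g, " ");
console.log(readable); // "mastering string manipulation"
```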

CRYPTIC PATTERNS

Are you beginning to feel the power of string manipulation? You are slowly becoming an expert. A regexp can be denoted using the forward slash / on both sides of the search word, and the letter g after the second slash / indicates a global search which will replace multiple instances of the word inside the string. Generally, when searching for one instance of a word, it's better to use indexOf() and replace() with a plain string for faster function execution speed.

Otherwise, to understand regular expressions you have to memorize the symbols on your keyboard. Many symbols including letter cases. In fact, there's nothing regular about "regular expressions". It should be called "cryptic patterns" because no human being can read them without finding the meaning of the symbols used. To simplify the meaning for human language consumption, you can also say they are string-searching algorithms.

MAGIC WAND

Before I show you some of the characters used, I would like to paint you a picture of the traversing that happens using regexp. First imagine a magic wand in your hand. Waving the magic wand releases magical stars onto the string which modify it to the desired string you want. Each star represents a symbol in the regular expression and that is what you have to come up with as a search pattern.

Regular expressions are truly powerful search techniques. You can find a needle in a haystack instantly, well more like in micro-seconds. Many input forms on the web use regular expressions to conform text into specific formats such as zip codes, phone numbers, domain names, currency values and the list goes on. Do note that there are different regular expression engines depending on the programming language and the following is specific to JavaScript.

We can get more specific and describe those symbols and their purpose. To try out regular expressions yourself you can visit a website such as https://regexr.com/
Practice on this website by copying and pasting examples from this article or invent your own patterns.

/term/ A regexp written this way always has to be contained inside two forward slashes. "A/B/C" is not a regexp. Characters and symbols between the slashes can represent something other than the symbol itself.

/abc/ Alphabetical characters without symbols are matched as written, so this is equivalent to a regular search for the consecutive string "abc".

/\$/ An explicit search for a symbol has to be prefixed with a backward slash \, in this case it's the dollar symbol. It's called escaping even though none of them will run away. The symbols still need to escape from the wrath of your cryptic search desires.

/^abc/ and /abc$/ These symbols don't have to be escaped. They are the caret ^ and dollar sign $. Their purpose is to restrict the search to the beginning and ending of a string respectively. This is also known as anchoring so they can be called anchors. In this case, it means if "abc" is in the middle of "xyzabczyx", it will be ignored. ^ means the string must start with "abc" and $ means that the string must end with "abc". You can apply one or both.

What if you don't want to search for an alphabetical character nor a symbol, but a formatting change in the string? Since I mentioned an empty space has meaning in code, so does a tab, a new line, and a carriage return. These can be searched using a combination of backslash and one letter. For brevity, we've excluded the surrounding slashes.

\n Find a newline
\t Find a tab
\r Find a carriage return

This is mind-blowing right? You can manipulate empty space and look for invisible metacharacters which control formatting using regexp. Let's try a regexp example based on what we know so far. We want a specific dollar amount at the beginning of a string, $10.xx, where xx can be any cent amount.

/^\$10\.\d\d/

We are using ^ to match the start
then a backslash \ to escape the dollar $ sign
the number 10 followed by an escaped period .
the escaped \d represents any digit 0-9 so we have it twice
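
To see the pattern in action, here is a quick sketch using the test method of a regular expression, which returns true or false. The sample strings are made up.

```javascript
const tenDollarPattern = /^\$10\.\d\d/;

console.log(tenDollarPattern.test("$10.99 per month")); // true
console.log(tenDollarPattern.test("Only $10.99"));      // false, not at the start
console.log(tenDollarPattern.test("$10.9"));            // false, only one cent digit
```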

As previously mentioned, adding a backslash to certain letters changes the search pattern. Here are some search patterns with the backslash and letter combination.

\w Matches any word character (a letter, a digit, or an underscore)
\d Matches any digit
\s Matches empty space such as a space, tab, or new line

In addition to that you can match the negation or the opposite with the capital letter equivalents.

\W Matches any character that is not a word character
\D Matches any character that is not a digit
\S Matches any character that is not empty space
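
These combinations pair nicely with the replace method. The sketch below cleans up extra spaces in the middle of a string and strips everything except digits from a made-up phone number. It uses the + quantifier and the g flag, both explained further down in this article.

```javascript
const messy = "Hello    World";
const tidy = messy.replace(/\s+/g, " "); // collapse each run of empty space into one space
console.log(tidy); // "Hello World"

const phone = "(555) 123-4567"; // made-up phone number
const digitsOnly = phone.replace(/\D/g, ""); // remove every non-digit
console.log(digitsOnly); // "5551234567"
```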

GLOBALLY INSENSITIVE

Now that you are getting more comfortable with the possibilities of regular expressions, you need to be aware of the letters "g" and "i" at the ending of the regexp term, right after the second forward slash. These are known as flags which modify your search. The "g" means global so it will return more than one result match if available while the "i" means insensitive in regards to text case. Uppercase or lowercase will not matter using this flag.

/term/g Finds multiple instances, not just the first
/term/i Finds uppercase and lowercase characters
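
A short sketch of the flags at work, using the match method of a string, which returns an array of all matches when the "g" flag is set. The sample text is made up.

```javascript
const greetings = "Hello hello HELLO";

const caseSensitive = greetings.match(/hello/g); // only the lowercase one
const anyCase = greetings.match(/hello/gi);      // all three spellings

console.log(caseSensitive); // ["hello"]
console.log(anyCase);       // ["Hello", "hello", "HELLO"]
```

The flags can be combined in any order, so /hello/gi and /hello/ig behave the same.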

To expand on your searches, here's the next addition of complexity. You may want to find a combination of letters, numbers or symbols. You can do this by grouping inside parentheses () and brackets []. The brackets are specific to character ranges such as 0-9 or A-Z uppercase, a-z lowercase. You can use multiple dashes for multiple ranges inside a single set of brackets.

Parentheses are not useful alone, but they are when you have additional search terms in one regexp. To throw in a monkey wrench, the caret ^ symbol at the start of a bracket set will negate the search.

/[abc]/ Matches any one of the letters a, b, or c.
/[0-7]/ Matches any digit 0-7 anywhere in the string.
/[^0-7]/ Matches any character that is not a digit 0-7 anywhere in the string.

[0-9] is identical to \d for digits while \w is identical to [A-Za-z0-9_] for word characters.

Using parentheses () is useful when you want to search more than one pattern such as international phone numbers while brackets [] are for searching sets. When using parentheses in your search, you may also include the pipe symbol | as an OR operator. This means your result can be the search pattern on either side of the pipe. This is known as alternation. Here are examples:

/[abc](123)/ Matches a, b, or c, followed by 123
/gr[ae]y/ Matches gray or grey
/(gray|grey)/ Matches gray or grey
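
Here is a small sketch showing that the bracket version and the alternation version find the same words. The sentence is made up for the example.

```javascript
const sentence = "The grey cat sat on a gray mat.";

const withBrackets = sentence.match(/gr[ae]y/g);
console.log(withBrackets); // ["grey", "gray"]

// Alternation finds the same two words; here both are replaced.
const unified = sentence.replace(/(gray|grey)/g, "GRAY");
console.log(unified); // "The GRAY cat sat on a GRAY mat."
```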

QUANTIFIERS

Do you want to match a specific amount of letters or numbers? Perhaps 0 or 1, 1 or many, only 4. It's all possible with quantifiers. Here are the quantifier symbols and how you can use them. We will use the letter "a" as part of the example.

/a*/ Matches 0 or more letter a
/a+/ Matches 1 or more letter a
/a?/ Matches 0 or 1 letter a
/a{4}/ Matches exactly 4 consecutive letters a.
/a{2,3}/ Matches between 2-3 letters a.
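
Quantifiers are handy for the input formats mentioned earlier, such as zip codes. The following is only a sketch: the rule "exactly five digits" is an assumption for the example and ignores other formats.

```javascript
// Hypothetical rule: a zip code is exactly five digits, nothing before or after.
const zipPattern = /^\d{5}$/;

console.log(zipPattern.test("90210"));  // true
console.log(zipPattern.test("9021"));   // false, too short
console.log(zipPattern.test("902101")); // false, too long

// + grabs the whole run of consecutive letters.
const run = "Goooal".match(/o+/)[0];
console.log(run); // "ooo"
```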

The possibilities don't stop here. This is why algorithms utilize regular expressions regularly, so becoming an expert in them is going to take you a long way. In total, there are 11 metacharacters available for regular expressions. They are: \ ^ $ . | ? * + () [] {} Each has a purpose.

Another practical example is to find HTML tags because they are the foundation of websites. Let's think this through before typing out the expression. First, we need at least one letter because all tags start with a letter and while it should be lowercase, we may encounter legacy HTML that is capitalized.

Next, we shall expect more letters or a number such as h1 tags. The * will get zero or more of those characters, though we could limit the amount using {} instead. The following will capture opening HTML tags without attributes:

/<[A-Za-z][A-Za-z0-9]*>/g Matches html tags
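
Trying the pattern on a made-up snippet of markup shows what it does and does not pick up. This is a sketch for learning purposes, not a full HTML parser.

```javascript
const html = "<h1>Title</h1><P>Legacy paragraph</P>";

const openingTags = html.match(/<[A-Za-z][A-Za-z0-9]*>/g);
console.log(openingTags); // ["<h1>", "<P>"]
```

The closing tags are skipped because the character right after "<" is a slash rather than a letter.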

LOOK AROUND

Finally, there is another advanced concept, in case regular expressions weren't advanced enough. It is called the lookahead. There's a positive and a negative lookahead. It must be placed inside parentheses and begin with a question mark ?. Essentially, a lookahead checks what comes right after the match without including it in the result. The positive lookahead matches something only when it is followed by something else, while the negative lookahead matches something only when it is not followed by something else. This is useful when doing a combined search pattern by grouping. To demonstrate, let's search for a dollar value in a string that is followed by "USD" but we don't want to capture the "USD". We will use the positive lookahead using (?= and the negative lookahead using (?!

/\$30(?=USD)/ Matches $30 from "The product costs $30USD"
/\$30(?!USD)/ Matches $30 from "The USD value is $30"
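
Running both patterns against both example strings makes the difference visible. The variable names in this sketch are my own.

```javascript
const withUnit = "The product costs $30USD";
const withoutUnit = "The USD value is $30";

const positive = /\$30(?=USD)/; // $30 only when USD follows
const negative = /\$30(?!USD)/; // $30 only when USD does not follow

const found = withUnit.match(positive)[0];
console.log(found);                      // "$30", the USD is checked but not captured
console.log(positive.test(withoutUnit)); // false
console.log(negative.test(withUnit));    // false
console.log(negative.test(withoutUnit)); // true
```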

SUMMARY

There we have it! A comprehensive usage of JavaScript for manipulating strings to your desires featuring built-in methods and powerful regular expressions. Now you may approach your coding algorithms more easily using all these methods learned and create your own search patterns which can be greedy, fuzzy, lazy, dirty, empty, crazy, etc. This is just the beginning of your algorithm adventures and wielding a superpower called coding.

I hope you will familiarize yourself with all the methods available and try to memorize them through practice, practice, and more practice.