DEV Community

Cover image for Cleaning Extra and Repeated Spaces in String Values with RegExp
JSDev Space
JSDev Space

Posted on

Cleaning Extra and Repeated Spaces in String Values with RegExp

Sometimes the smallest data-cleaning problem becomes one of the most annoying: a string looks normal, but it contains extra spaces, tabs, line breaks, non-breaking spaces, or other whitespace characters.

This is especially common when text comes from users, spreadsheets, Word documents, copied web pages, exports, forms, or manually maintained directories.

The goal is simple:

"  Eat   more   soft\tFrench\nbread  "
Enter fullscreen mode Exit fullscreen mode

should become:

"Eat more soft French bread"
Enter fullscreen mode Exit fullscreen mode

In other words:

  • replace repeated whitespace with a single space;
  • remove spaces from the beginning and end of the string.

The Excel Analogy: TRIM()

Excel has a well-known function called TRIM().

It removes extra spaces from text, leaving only single spaces between words. This is useful when data is imported from other systems and may contain unnecessary spaces.

For example:

"  Eat more of these soft French rolls,  and drink some tea  "
Enter fullscreen mode Exit fullscreen mode

becomes:

"Eat more of these soft French rolls, and drink some tea"
Enter fullscreen mode Exit fullscreen mode

That is usually exactly what we want before comparing values, doing lookups, joining datasets, or saving user-entered text.

Why This Matters

When working with user data, string values are often used for matching records.

For example:

  • matching names from two exports;
  • joining rows from different systems;
  • comparing manually entered values;
  • cleaning reference data;
  • preparing text before saving it to a database.

A small invisible character can break the match.

These two strings may look almost identical:

"John Smith"
"John   Smith"
Enter fullscreen mode Exit fullscreen mode

or:

"John Smith"
"John\tSmith"
Enter fullscreen mode Exit fullscreen mode

but for code, databases, and lookup operations, they are different values.

That is why basic string normalization is often worth doing before comparison or storage.

The Solution

The compact solution is based on a regular expression:

\s+
Enter fullscreen mode Exit fullscreen mode

It means: “find one or more whitespace characters.”

Then we replace every such sequence with a single ordinary space and trim the result.

JavaScript

function purge(value) {
  return value.replace(/\s+/g, " ").trim();
}
Enter fullscreen mode Exit fullscreen mode

Example:

purge("  Hello\t\tworld\nfrom   JavaScript  ");
// "Hello world from JavaScript"
Enter fullscreen mode Exit fullscreen mode

Python

import re

def purge(value: str) -> str:
    if isinstance(value, str):
        return re.sub(r"\s+", " ", value).strip()
    return value
Enter fullscreen mode Exit fullscreen mode

Example:

purge("  Hello\t\tworld\nfrom   Python  ")
# "Hello world from Python"
Enter fullscreen mode Exit fullscreen mode

VBA / VBScript

Function Purge(str_in)
  Set objRegExp = CreateObject("VBScript.RegExp")

  With objRegExp
    .Pattern = "\s+"
    .Multiline = True
    .Global = True
  End With

  Purge = Trim(objRegExp.Replace(str_in, " "))
End Function
Enter fullscreen mode Exit fullscreen mode

What Does \s Match?

In many regular expression engines, \s matches common whitespace characters, including:

Character Meaning
\f form feed
\n line feed
\r carriage return
\t tab
\v vertical tab
space regular space

Depending on the language and regex engine, it may also match additional Unicode whitespace characters.

Where This Should Be Used

This kind of cleanup is useful for short text fields:

  • names;
  • titles;
  • labels;
  • categories;
  • search values;
  • addresses;
  • reference data;
  • one-line form inputs.

It is not always appropriate for long-form text.

For example, if a user writes an article, comment, essay, Markdown document, or text with intentional line breaks, blindly replacing all whitespace may destroy formatting.

So the rule is simple:

Use this for compact string values, not for rich multi-paragraph content.

If you're just getting started with regular expressions—or simply need a quick reference while writing patterns—take a look at this comprehensive Regex Cheat Sheet. It covers the most common character classes, quantifiers, groups, lookarounds, flags, and practical examples that work across modern JavaScript and many other regex engines.

Frontend or Backend?

Ideally, both.

Frontend cleanup improves user experience and gives immediate feedback. Backend cleanup protects the database and keeps data consistent even if the frontend changes later.

Applications evolve. Forms get replaced. APIs get reused. Admin panels appear. Imports are added.

Normalizing important string fields on the backend is usually a good safety net.

Final Thought

This is a tiny helper, but it solves a very real problem.

A single line of RegExp can prevent broken lookups, duplicated records, inconsistent labels, and annoying invisible whitespace bugs.

For compact user-entered string values, this is one of those small utilities that quietly makes the whole system cleaner.

Top comments (0)