When to use the “sed” command efficiently in Shell scripts

#linux #performance #bash

The "sed" command (short for "stream editor") is a powerful and versatile tool that plays a fundamental role in the world of shell programming. However, its use is not always the best choice, especially when it comes to simple string replacements. Let's explore the efficient use of "sed" and its alternatives, highlighting the reasons to avoid excessive use of this command.

Inefficient Way of Writing (Depending on External Commands)

The following approach to string replacement is common in shell scripts:

ret=$(echo "$line" | sed "s/SearchedString/ReplacementString/")

This works, but it is neither the most efficient nor the most elegant way to do it: every call starts new processes, including an external "sed" process, just to edit one string. (Information and references about the "sed" command were obtained from the official GNU "sed" manual, available at https://www.gnu.org/software/sed/manual/sed.html.)

Preferred Way of Writing (Native Shell Language Way)

Instead of relying on "sed" and external commands, the native way of the shell language is preferable:

ret=${line/SearchedString/ReplacementString}

This approach is more readable, more efficient, and utilizes the native resources of the shell.

Variable Expansion (Parameter Expansion)

A fundamental technique when working with strings in shell scripts is variable expansion, also known as parameter expansion. Variable expansion allows you to manipulate strings effectively using the native features of the shell.

# Example of variable expansion to get the length of a string
v=Varbarbaz
echo "${#v}"      # Result: 9 (length of the string)

# Example of variable expansion for string replacement
v=Varbarbaz
echo "${v/a/A}"   # Result: VArbarbaz (replaces only the first occurrence of 'a' with 'A')
echo "${v//a/A}"  # Result: VArbArbAz (replaces all occurrences of 'a' with 'A')

Variable expansion is a powerful technique that allows you to perform various operations on strings, such as replacements, getting length, and more, all using the native features of the shell language.
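Beyond length and replacement, a few other expansion forms cover tasks that are often handed to external commands. The following is a minimal sketch, assuming bash; the path is a made-up value used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample value, chosen only for illustration.
path=/var/log/app.log.gz

base=${path##*/}           # remove the longest prefix matching '*/'
dir=${path%/*}             # remove the shortest suffix matching '/*'
stem=${base%%.*}           # remove the longest suffix matching '.*'
swapped=${base/%.gz/.bz2}  # replace '.gz' only when it matches at the end

printf '%s\n' "$base" "$dir" "$stem" "$swapped"
# Prints:
# app.log.gz
# /var/log
# app
# app.log.bz2
```

The first two forms cover the typical uses of "basename" and "dirname" without starting a process.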

The Weight of "sed" in Execution

Using "sed" to replace text in a single string is not efficient. Each invocation has to create a new process and load the "sed" executable, which is costly compared with work done inside the shell itself.

For example, comparing replacement with "sed" against native replacement in a loop of 100,000 iterations gives a striking difference:

# Using sed
$ time bash -c 'for((i=0; i<100000; i++)); do ret=$(echo "$i" | sed "s/SearchedString/ReplacementString/"); done'

real    2m 37.269s
user    1m 23.415s
sys     1m 40.930s

# Using native substitution
$ time bash -c 'for((i=0; i<100000; i++)); do ret=${i/SearchedString/ReplacementString}; done'

real    0m 0.429s
user    0m 0.425s
sys     0m 0.001s

In this measurement, native substitution is roughly 370 times faster than using "sed" (157.269 s versus 0.429 s of real time). This clearly demonstrates the performance impact of repeatedly calling external commands in a loop.

The Native Shell Alternative

The native method of the shell language for string substitution originated as an extension in ksh93 and was later incorporated into bash and zsh. It is not part of POSIX sh, so scripts that use it need one of those shells. You can learn more in the bash manual at https://www.gnu.org/software/bash/manual/bash.html in section 3.5.3, Shell Parameter Expansion. Note that the pattern is a shell glob pattern, not a regular expression as in "sed".

ret=${line/SearchedString/ReplacementString}

This approach is preferable because it uses native resources of the shell language and avoids the overhead of calling external commands.
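One caveat when switching from "sed" to the native form: the pattern in ${var/pattern/replacement} is a shell glob, not a regular expression. The sketch below, assuming bash and using a made-up value, shows how the same character behaves differently in each.

```shell
#!/usr/bin/env bash
# Hypothetical sample value.
line='a.b.c'

# In sed, an unescaped '.' is a regex metacharacter that matches any character.
with_sed=$(printf '%s\n' "$line" | sed 's/./-/g')

# In parameter expansion, '.' is literal; '?' is the glob for "any one character".
native_dot=${line//./-}
native_any=${line//?/-}

printf '%s\n' "$with_sed" "$native_dot" "$native_any"
# Prints:
# -----
# a-b-c
# -----
```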

When to Use "sed"

While it is advisable to avoid excessive use of "sed" for simple string replacements in shell scripts, it still plays an important role when you need to perform replacements in multiple lines of text based on regular expressions. In such cases, "sed" is the most appropriate tool.
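As one illustration of a replacement that glob patterns cannot express, the sketch below reorders captured groups with a regular expression. The input dates are invented for the example, and the -E option (extended regular expressions) is assumed to be available, as it is in GNU and BSD "sed".

```shell
#!/usr/bin/env bash
# Hypothetical input: dates in DD/MM/YYYY form, one per line.
input=$'25/12/2023\n01/02/2024'

# A single sed process handles every line; \1, \2 and \3 refer to the captured groups.
iso=$(printf '%s\n' "$input" | sed -E 's#([0-9]{2})/([0-9]{2})/([0-9]{4})#\3-\2-\1#')

printf '%s\n' "$iso"
# Prints:
# 2023-12-25
# 2024-02-01
```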

To replace text across many lines, invoke "sed" only once and let it process the whole input, either by passing it the file name or by piping the text into it. This is far more efficient than calling an external command once per line in a loop.

# Example of replacement with "sed" across multiple lines of text
sed 's/SearchedString/ReplacementString/' file.txt > new_file.txt
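
If a script already has to loop over the lines for other reasons, the native substitution can be applied inside the loop without starting a process per line. The following is a rough sketch, assuming bash; the temporary file and its contents are created only for the demonstration. It checks that both variants produce the same text.

```shell
#!/usr/bin/env bash
# Hypothetical input file created just for this demonstration.
tmp=$(mktemp)
printf '%s\n' 'SearchedString one' 'two' 'three SearchedString' > "$tmp"

# Variant 1: a single sed process for the whole file.
by_sed=$(sed 's/SearchedString/ReplacementString/' "$tmp")

# Variant 2: a read loop with native substitution, no external command per line.
by_loop=''
while IFS= read -r line; do
    by_loop+="${line/SearchedString/ReplacementString}"$'\n'
done < "$tmp"
by_loop=${by_loop%$'\n'}   # drop the final newline, as $(...) does for by_sed

rm -f "$tmp"

if [ "$by_sed" = "$by_loop" ]; then
    echo "both variants agree"
fi
```

For plain whole-file replacement the single "sed" call remains the simpler choice; the loop only pays off when each line needs additional shell logic anyway.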

Conclusion

The "sed" command is a powerful tool but should not be used excessively for simple string replacements in shell scripts. Instead, it is preferable to use the native shell language substitution and leverage variable expansion to manipulate strings effectively.

When writing shell scripts, it is important to be aware of the native language features and use best practices to avoid unnecessary use of external commands. This way, you can create faster scripts with fewer external dependencies, keeping in mind that the substitution syntax shown here requires bash, ksh93, or zsh rather than a plain POSIX sh.