
Processing Text with Linux Shell - Part 1

Shamil ・3 min read

Into the world of sed

If you use any *nix system on a daily basis, chances are you are already familiar with, or have at least heard about, the sed command.

sed, short for Stream Editor, is a text transformation tool that comes bundled with virtually every Unix-like system. What distinguishes sed from interactive text editors is the speed at which it manipulates text: sed makes only one pass over the input, which keeps processing fast.

# Replace that ugly text

sed is a very powerful tool to replace a piece of text with another. The text can be matched using regular expressions.

sed 's/text_to_be_replaced/replacement_text/' file_name

However, this only prints the substituted text to the console; it won't change the file itself. If we want to save the changes to the file, we can use the -i flag.

sed -i 's/text_to_be_replaced/replacement_text/' file_name

The above replaces only the first occurrence of the given pattern in each line. If we want to replace every occurrence of the pattern, we can append the g flag to the end.

sed 's/text_to_be_replaced/replacement_text/g' file_name
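
To see the difference the g flag makes, here is a small sketch. The sample line is made up, and it is fed to sed through a pipe instead of being read from a file.

```shell
# Sample input, made up for illustration.
line='one fish, two fish, red fish'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/fish/cat/')

# With g: every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/fish/cat/g')

printf '%s\n%s\n' "$first_only" "$all_matches"
```

The first line printed still contains two "fish", while the second has none left.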

Note that the delimiter character / we used in the above commands is not fixed; we can use almost any character as the delimiter in sed. For example,

sed 's:text_to_be_replaced:replacement_text:g' file_name


sed 's|text_to_be_replaced|replacement_text|g' file_name

Okay, but what if the delimiter character is itself a part of the pattern to be replaced? ¿ⓧ_ⓧﮌ

Well, we can escape that character with a backslash. For example, to replace the word following: with below - , we can do this:

sed 's:following\::below - :' file_name

Notice the use of \: to escape the colon inside the pattern, so that sed does not mistake it for the delimiter : that separates the pattern and its replacement.
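
Choosing a different delimiter is often easier than escaping. A typical case is file paths, which are full of slashes. The sketch below uses a hypothetical path purely for illustration and runs the same substitution both ways.

```shell
# Hypothetical path, used only for illustration.
path='/usr/local/bin/tool'

# With / as the delimiter, every slash in the pattern needs a backslash.
escaped=$(printf '%s\n' "$path" | sed 's/\/usr\/local/\/opt/')

# With | as the delimiter, the slashes need no escaping at all.
readable=$(printf '%s\n' "$path" | sed 's|/usr/local|/opt|')

printf '%s\n%s\n' "$escaped" "$readable"
```

Both commands produce the same result; the second is simply easier to read.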

# Delete that scrap

sed also allows us to delete lines from a file. The d command indicates a delete operation. The generic syntax to delete a line is

sed 'Nd' file_name

Here N is the line number that we want to delete. If we want to delete the 10th line from a file, N would be 10.

One of the most common uses of this command is deleting all blank lines in a file.

sed '/^$/d' file_name

The above removes all the blank lines from the output. The regular expression ^$ matches an empty line, and the d command specifies that the line should be deleted.
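
One caveat: ^$ only matches lines that are completely empty, so a line holding just spaces or tabs survives. If you also want to drop those, one option is the POSIX character class [[:space:]]. A small sketch with made-up input:

```shell
# Three lines of made-up input: text, a line with two spaces, text.
input=$(printf 'first\n  \nsecond\n')

# ^$ leaves the whitespace-only line in place.
strict=$(printf '%s\n' "$input" | sed '/^$/d')

# ^[[:space:]]*$ also removes lines made up only of spaces or tabs.
loose=$(printf '%s\n' "$input" | sed '/^[[:space:]]*$/d')

printf 'strict:\n%s\nloose:\n%s\n' "$strict" "$loose"
```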

That's not all. We can also specify a range of lines that should be deleted.

sed 'm,nd' file_name

The above command will delete all the lines from the mth up to and including the nth.
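
Here is a short sketch of line-number and range deletion on five made-up lines. It also uses $, which sed accepts as an address meaning "the last line".

```shell
# Five numbered lines of made-up input.
input=$(printf 'line1\nline2\nline3\nline4\nline5\n')

# Delete only the 2nd line.
without_second=$(printf '%s\n' "$input" | sed '2d')

# Delete lines 2 through 4.
without_range=$(printf '%s\n' "$input" | sed '2,4d')

# $ stands for the last line, so this deletes from line 4 to the end.
without_tail=$(printf '%s\n' "$input" | sed '4,$d')

printf '%s\n---\n%s\n---\n%s\n' "$without_second" "$without_range" "$without_tail"
```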

# Pipelining is important

Now what about pipelining multiple sed commands?

We can chain as many sed commands as we wish, and they are processed in order, each one reading the output of the previous one. Consider the following example.

echo Linux | sed 's/L/l/' | sed 's/n/N/' | sed 's/l/L/' | sed 's/x/X/'

This will output LiNuX.

Finally, let's take a look at how we can use shell variables within a sed command. So far we have wrapped our commands in ' ' (single quotes), which stop the shell from expanding variables. When we need a variable to be expanded, we can use " " (double quotes) instead. Take a look at the following example.

greet=hello

echo hello shamil | sed "s/$greet/hi/"

The shell will evaluate $greet and pass its value to sed, which then replaces hello with hi.
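
The quoting difference is easy to check for yourself. The sketch below runs the same substitution with single and with double quotes. It also shows one way to cope with a variable whose value contains a slash, namely switching the delimiter. The dir variable and its value are hypothetical.

```shell
greet=hello

# Single quotes: the shell leaves $greet alone, so sed searches for the
# literal text "$greet", which does not occur in the input.
unchanged=$(echo 'hello shamil' | sed 's/$greet/hi/')

# Double quotes: the shell expands $greet to hello before sed runs.
replaced=$(echo 'hello shamil' | sed "s/$greet/hi/")

# If the value contains a slash, use another delimiter such as |.
dir=/var/log   # hypothetical value
moved=$(echo '/var/log/app.log' | sed "s|$dir|/tmp|")

printf '%s\n%s\n%s\n' "$unchanged" "$replaced" "$moved"
```

Keep in mind that the expanded value becomes part of the sed expression, so characters that are special in regular expressions keep their special meaning.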

# Better safe than sorry

When using -i in the sed command, we need to be careful, as it replaces the actual content in the file. (Trust me, I have done this many times)

Therefore, it is good practice to first run the command without the -i flag and check that the replacements are correct. However, if the file is too long to check that way, you can use the following command to create a backup copy of the file before modifying its content.

sed -i.bak '12,30d' file_name

This will delete all lines from 12 to 30, but most importantly it will create a file_name.bak in the same directory before modifying the actual file.
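
If you want to try the preview-then-backup routine safely, here is a minimal sketch that works in a throwaway directory. The file name notes.txt and its contents are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
file="$workdir/notes.txt"   # hypothetical file name
printf 'keep\ndrop\nkeep\n' > "$file"

# Preview first: without -i the file itself is left untouched.
preview=$(sed '2d' "$file")

# Then edit in place, keeping the original as notes.txt.bak.
sed -i.bak '2d' "$file"

edited=$(cat "$file")
backup=$(cat "$file.bak")
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"

# Clean up the throwaway directory.
rm -rf "$workdir"
```

The preview and the edited file should match, and the .bak file still holds the original three lines.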

Who knows, this might just end up saving your job (◠﹏◠)

(EDIT: See this comment for more info on -i usages)

Discussion

Rémi Lavedrine

Good guide.
I love using sed for modifying text. It is very useful for Security testing to modify files very quickly.
You must definitely learn how to use sed and grep commands.
It will save you a lot of time.

kip

Good guide!

Mathieu PATUREL

sed -i.bak '12,30d' file_name

this is a very nice trick! Thanks!

kip

If you ever use vi / vim, this command also works and other commands too.

p-mcgowan

IIRC on mac sed -i behaves differently - The default is not GNU sed. I believe you can brew it to get the good one

Shamil Author

Hi. I have never used any mac. Would you kindly provide details of sed -i behavior so I can update this article accordingly?

Ben Sinclair

BSD sed expects -i to take a file extension so it can save a backup.

Instead of using this:

sed -i YOUR_COMMANDS_HERE foo.txt

you can use something like the following, which will make a backup file and then immediately delete it if the command succeeded:

sed -i.bak YOUR_COMMANDS_HERE -- foo.txt && rm -- "foo.txt.bak"

That looks a little nasty, but it's portable; GNU sed's -i also takes a file extension as an optional argument.

sotondolphin

nice article. many tricks learnt

Andrey Ramnikov

Very nice tutorial.

Will

Great post!
I love to use this funky site for more fun with sed: grymoire.com/Unix/Sed.html

Bill Lee

Why rescan the input 4-times to change the l,i,n,u,x? Instead, use sed 's/L/l/;s/n/N/;s/l/L/;s/x/X/'?

I’ll have to play w/-i … it feels dangerous 😟 -i.bak makes me feel more comfortable, though. 😊

I spent a couple of hours yesterday writing a 1-liner listing a java project’s exposed endpoints and their HTTP methods, based on @.*Mapping annotations.

Ben Sinclair

It's just a demonstration of piping, I don't think it's intended to be optimal. You could just do echo Linux | tr 'nx' 'NX' if you wanted that.

Thomas H Jones II

Similarly, have single-invocation choices like:

echo Linux | sed -e 's/L/l/' -e 's/n/N/' -e 's/l/L/' -e 's/x/X/'

Which, to me, is more readable than having everything all bunched up and lends itself to uniform line-breaking in a long, complex script. Similarly, if you go this route, you also improve readability while retaining the "single invocation" efficiencies.

echo Linux | sed '{
   s/L/l/
   s/n/N/
   s/l/L/
   s/x/X/
}'

...Though, if you're transforming the "L" to an "l" and then back to an "L", efficiency likely isn't your goal.

Whether using multiple, discrete sed invocations or a single, multi-transform sed, always remember that "order counts".

Shamil Author

if you're transforming the "L" to an "l" and then back to an "L", efficiency likely isn't your goal.

intention was just to demonstrate piping :)

Shamil Author

Why rescan the input 4-times to change the l,i,n,u,x? Instead, use sed 's/L/l/;s/n/N/;s/l/L/;s/x/X/'?

Guess I just missed that one.

I’ll have to play w/-i … it feels dangerous 😟 -i.bak makes me feel more comfortable, though.

It is indeed. Once I messed up some log files that I was going through.

Darkø Tasevski

What's this dark magic? O.o