Victoria Drake

Posted on Nov 21, 2017 • Edited on Mar 16, 2019

Batch renaming images, including image resolution, with awk

#bash #programming #showdev #productivity

The most recent item on my list of "Geeky things I did that made me feel pretty awesome" is an hour's adventure that culminated in this code:

$ file IMG* | awk 'BEGIN{a=0} {print substr($1, 1, length($1)-5),a++"_"substr($8,1, length($8)-1)}' | while read fn fr; do echo $(rename -v "s/$fn/img_$fr/g" *); done
IMG_20170808_172653_425.jpg renamed as img_0_4032x3024.jpg
IMG_20170808_173020_267.jpg renamed as img_1_3024x3506.jpg
IMG_20170808_173130_616.jpg renamed as img_2_3024x3779.jpg
IMG_20170808_173221_425.jpg renamed as img_3_3024x3780.jpg
IMG_20170808_173417_059.jpg renamed as img_4_2956x2980.jpg
IMG_20170808_173450_971.jpg renamed as img_5_3024x3024.jpg
IMG_20170808_173536_034.jpg renamed as img_6_4032x3024.jpg
IMG_20170808_173602_732.jpg renamed as img_7_1617x1617.jpg
IMG_20170808_173645_339.jpg renamed as img_8_3024x3780.jpg
IMG_20170909_170146_585.jpg renamed as img_9_3036x3036.jpg
IMG_20170911_211522_543.jpg renamed as img_10_3036x3036.jpg
IMG_20170913_071608_288.jpg renamed as img_11_2760x2760.jpg
IMG_20170913_073205_522.jpg renamed as img_12_2738x2738.jpg
// ... etc etc

The last item on the aforementioned list is "TODO: come up with a shorter title for this list."

I previously wrote about the power of command line tools like sed. This post expands on how to string all this magical functionality into one big, long, rainbow-coloured, viscous stream of awesome.

Rename files

The tool that actually handles the renaming of our files is, appropriately enough, rename. The syntax is: rename -n "s/original_filename/new_filename/g" * where -n does a dry-run, and substituting -v would rename the files and print each change. The s indicates our substitution string, and g for "global" finds all occurrences of the string. The * is a shell wildcard: it expands to every file name in the current directory, so rename checks each one against our search-and-replace parameters.
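To get a feel for what the * and the s/…/…/g part each contribute without renaming anything, here is a small sketch that previews a substitution with sed, which uses the same s/old/new/g shape. The throwaway directory and the file names IMG_one.jpg and IMG_two.jpg are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
tmpdir=$(mktemp -d)
touch "$tmpdir/IMG_one.jpg" "$tmpdir/IMG_two.jpg"

# The shell expands * to every file name in the directory before the command runs.
# sed then applies the substitution to each name, as a preview only.
preview=$(cd "$tmpdir" && printf '%s\n' * | sed 's/IMG/img/g')
echo "$preview"

# Clean up the throwaway directory.
rm -r "$tmpdir"
```

Nothing is renamed here; it only prints what the new names would look like.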

We'll come back to this later.

Get file information

When I run $ file IMG_20170808_172653_425.jpg in the image directory, I get this output:

IMG_20170808_172653_425.jpg: JPEG image data, baseline, precision 8, 4032x3024, frames 3

Since we can get the image resolution ("4032x3024" above), we know that we'll be able to use it in our new filename.

Isolate the information we want

I love awk for its simplicity. It takes lines of text and makes individual bits of information available to us through built-in variables, which we refer to by column number: $1, $2, etc. By default, awk splits up columns on whitespace. To take the example above:

1	2	3	4	5	6	7	8	9	10
IMG_20170808_172653_425.jpg:	JPEG	image	data,	baseline,	precision	8,	4032x3024,	frames	3

We can denote different values to use as a splitter with, for example, -F',' if we wanted to use commas as the column divisions. For our current project, spaces are fine.
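As a quick sketch of the difference, we can feed the sample line from earlier through awk twice, once with the default whitespace splitting and once with -F','. The variable names line, by_space and by_comma are just placeholders for this example.

```shell
# Sample line copied from the file output shown earlier in the post.
line='IMG_20170808_172653_425.jpg: JPEG image data, baseline, precision 8, 4032x3024, frames 3'

# Default splitting on whitespace: column 1 is the name, column 8 the resolution.
# Both still carry their trailing punctuation.
by_space=$(echo "$line" | awk '{print $1, $8}')
echo "$by_space"

# Splitting on commas instead: the resolution becomes column 4,
# and it keeps the space that followed the comma.
by_comma=$(echo "$line" | awk -F',' '{print $4}')
echo "$by_comma"
```

With commas as the divider, the column numbers shift and the leading space comes along for the ride, which is one reason whitespace is the easier choice here.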

There are a couple of issues we need to solve before we can plug the information into our new filenames. Column $1 has the original filename we want, but there's an extra ":" character on the end. We don't need the ".jpg" either. Column $8 has an extra "," that we don't want as well. To get just the information we need, we'll take a substring of the column with substr():

substr($1, 1, length($1)-5) - This gives us the file name from the beginning of the string to the end of the string, minus the last 5 characters (".jpg:"), hence "length minus 5".
substr($8,1, length($8)-1) - This gives us the image size, without the extra comma ("length minus 1").
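Here is a minimal sketch of both substr() calls applied to the sample line, so the trimming is visible on its own. It assumes a three-letter extension like ".jpg"; a ".jpeg" file would need "length minus 6" instead.

```shell
# Sample line copied from the file output shown earlier in the post.
line='IMG_20170808_172653_425.jpg: JPEG image data, baseline, precision 8, 4032x3024, frames 3'

# Drop ".jpg:" (5 characters) from column 1 and the trailing "," from column 8.
trimmed=$(echo "$line" | awk '{print substr($1, 1, length($1)-5), substr($8, 1, length($8)-1)}')
echo "$trimmed"
```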

Avoid duplicate file names

To ensure that two images with the same resolutions don't create identical, competing file names, we'll append a unique incrementing number to the filename.

BEGIN{a=0} - Using BEGIN tells awk to run the following code only once, at the (drumroll) beginning. Here, we're declaring the variable a to be 0.
a++ - Later in our code, at the appropriate spot for our file name, we call a and increment it.

When awk prints, it concatenates everything that isn't separated by a comma, and puts a space wherever there is a comma. {print "a" "b" "c"} would create "abc" and {print "a","b","c"} would create "a b c", for example.

We can add additional characters to our file name, such as an underscore, by inserting it in quotations: "_".
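To see the counter and the concatenation rules in isolation, here is a short sketch. The two resolution fields are taken from the output at the top of the post; the variable names joined and numbered are just for this example.

```shell
# Adjacent values are glued together; commas in print become spaces.
joined=$(awk 'BEGIN { print "a" "b" "c"; print "a", "b", "c" }')
echo "$joined"

# Two resolution fields, one per line, each with its trailing comma.
# a starts at 0 and goes up by one for every line awk reads.
numbered=$(printf '%s\n' '4032x3024,' '3024x3506,' |
  awk 'BEGIN { a = 0 } { print a++ "_" substr($1, 1, length($1)-1) }')
echo "$numbered"
```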

String it all together

To feed the output of one command into another command, we use "pipe," written as |.

If we only used pipe in this instance, all our data from file and awk would get fed into rename all at once, making for one very, very long and probably invalid file name. To run the rename command line by line, we can use while and read. Similarly to awk, read takes a line of input and splits it on whitespace into variables we can assign and use. In our code, it takes the first bit of output from awk (the original file name) and assigns it to the variable fn, which we then refer to as $fn. It takes the rest of the line (our incrementing number and the image resolution) and assigns that to fr. The variable names are arbitrary; you can call them whatever you want.

To run our rename commands as if we'd manually entered them in the terminal one by one, we can use echo $(some command): the $( ) part runs the command, and echo prints whatever that command output. Finally, done ends our while loop.
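Before letting the loop loose on real files, it can help to print the commands it would build instead of running them. This is a sketch of that idea, not part of the original one-liner: the input lines stand in for awk's output, the names IMG_one and IMG_two are made up, and the -r flag on read is an addition that keeps any backslashes in the input literal.

```shell
# Two lines shaped like the awk output: original name, then number_resolution.
commands=$(printf '%s\n' 'IMG_one 0_4032x3024' 'IMG_two 1_3024x3506' |
  while read -r fn fr; do
    # Print the rename command for this line rather than executing it.
    echo "rename -v \"s/$fn/img_$fr/g\" *"
  done)
echo "$commands"
```

Each line of input produces exactly one rename command, which is the line-by-line behaviour we were after.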

Bonus round: rainbow output!

I wasn't kidding with that "rainbow-coloured" bit...

$ pip install lolcat

Here's our full code:

$ file IMG* | awk 'BEGIN{a=0} {print substr($1, 1, length($1)-5),a++"_"substr($8,1, length($8)-1)}' | while read fn fr; do echo $(rename -v "s/$fn/img_$fr/g" *); done | lolcat

Enjoy!

Top comments (5)

Vinay Pai • Nov 21 '17 • Edited

Nice post, Vicky.

You can actually take advantage of a couple of awk features and other shell tools to simplify your one-liner a bit, especially if you're willing to assume that there are no colons or commas in the original filenames. You should also be aware that file will produce additional output if there is an EXIF tag in your JPEG image, which will cause the resolution to no longer be in the 8th field.

IMG_4299.JPG: JPEG image data, Exif standard: [TIFF image data, little-endian, direntries=12, manufacturer=Canon, model=Canon EOS 5D Mark II, orientation=upper-left, xresolution=196, yresolution=204, resolutionunit=2, datetime=2017:04:07 18:27:16], baseline, precision 8, 5616x3744, frames 3

Okay, so AWK gives you a couple of variables for free. You get NR which is the number of records (lines) seen so far, so you don't actually need your a variable. It also gives you NF which is the number of fields. This is useful here because regardless of the output format, the resolution is always in the third field from the end, i.e. $(NF-2).

You can use printf to produce what is almost the final filenames (it still has the extra : and ,), and assuming no colons or commas in the original filename, you can just use tr -d :, to delete all the colons and commas.

Putting it all together:

file IMG* | awk '{ printf "%s img_%d_%s.jpg\n",$1,NR,$(NF-2) }' | tr -d :, | xargs -L1 mv

Of course, file produces a pretty different output for other image formats so things will go horribly wrong if some PNGs or other file formats sneak in there.
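As a rough check of the NR and $(NF-2) idea, here is a sketch that runs the awk and tr stages over two sample lines, one plain and one with an EXIF block, and prints the old/new name pairs instead of passing them to mv. The EXIF line is a shortened version of the one above, and the variable names are placeholders for this example.

```shell
# One plain line and one (shortened) EXIF line, as file might print them.
plain_line='IMG_20170808_172653_425.jpg: JPEG image data, baseline, precision 8, 4032x3024, frames 3'
exif_line='IMG_4299.JPG: JPEG image data, Exif standard: [TIFF image data, little-endian, direntries=12], baseline, precision 8, 5616x3744, frames 3'

# NR numbers the lines from 1; $(NF-2) picks the resolution in both formats.
# tr then strips the leftover colon and comma.
pairs=$(printf '%s\n' "$plain_line" "$exif_line" |
  awk '{ printf "%s img_%d_%s.jpg\n", $1, NR, $(NF-2) }' |
  tr -d ':,')
echo "$pairs"
```

Note that the numbering starts at 1 here rather than 0, since NR counts the first line as record 1.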

Victoria Drake • Nov 21 '17

Great notes! Thanks Vinay!

Yeah, definitely not a copy-paste solution, and needs adjustment for different formats and data output from file. Thanks for telling me about NR and tr -d!

Olivier “Ölbaum” Scherler • Nov 21 '17

If you have ImageMagick installed, you can also use identify -format '%Wx%H' filename.png to give you the resolution (and a lot of other image properties).