typedcode

Posted on Jan 16, 2022 • Edited on Jan 21, 2022

Scanning multipage documents without duplex mode (but with feeder scanner) on linux (crossmerge pdf files)

#linux #bash #tutorial #scanning

If one wants to digitize multi-page documents but only has a feeder scanner that does not support duplex scanning, this is for you.

Documents that are printed on both sides are a pain to scan into one single file, but it is possible.
E.g. one could scan each individual page and combine everything using pdfunite, pdftk or any other tool that supports joining pdf documents.
If the document has 20 or more pages, this method is very time consuming.

If one has a feeder scanner, here is what one could do:

Terminology

In this tutorial A refers to the front side of a sheet and B to the back side. In addition, numbers refer to a specific sheet number.
E.g. 4B is the back side of the 4th sheet (= page 8).
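
The mapping from sheet and side to page number can be written down as a small bash function. This is only an illustration of the notation; `page_number` is a made-up helper name and is not used by the script later in this post.

```shell
#!/bin/bash
# Hypothetical helper: maps a sheet number and a side (A = front, B = back)
# to the page number in the finished document.
page_number() {
    local sheet=$1 side=$2
    case "$side" in
        A) echo $((2 * sheet - 1)) ;;   # front sides are the odd pages
        B) echo $((2 * sheet)) ;;       # back sides are the even pages
        *) echo "side must be A or B" >&2; return 1 ;;
    esac
}

page_number 4 B   # prints 8
page_number 4 A   # prints 7
```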

HowTo

One puts all the sheets into the feeder scanner and scans the A side of each sheet. The result will be a file with the pages 1A, 2A, 3A, ..., nA. We will call this file front_pages.pdf
Now one turns the stack of sheets over so that the last sheet is on top. One puts the stack back into the feeder scanner and scans the B side of each sheet. The result will be a document containing every B side in reverse order: nB, (n-1)B, (n-2)B, ..., 3B, 2B, 1B. We will call this file back_pages_reverse.pdf

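The last thing to do is to merge the pages. Because back_pages_reverse.pdf is in reverse order, one must merge the documents as follows, where the index is the position of a page within its file (so back_pages_reverse[n], the last page of that file, is 1B):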

front_pages[1]
back_pages_reverse[n]
front_pages[2]
back_pages_reverse[n-1]

...

front_pages[n-1]
back_pages_reverse[2]
front_pages[n]
back_pages_reverse[1]
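
To make the pattern above concrete, here is a minimal sketch that only prints the merge order for a given number of sheets. `merge_order` is a hypothetical name for illustration; A stands for front_pages.pdf, B for back_pages_reverse.pdf, and the number is the position of the page within that file.

```shell
#!/bin/bash
# Illustration only: print the merge order for n sheets.
# merge_order is a made-up helper name.
merge_order() {
    local n=$1 i
    local order=()
    for ((i = 1; i <= n; i++)); do
        # page i of the front scan is followed by page n-i+1 of the reversed back scan
        order+=("A$i" "B$((n - i + 1))")
    done
    echo "${order[*]}"
}

merge_order 3   # prints: A1 B3 A2 B2 A3 B1
```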

This can be done with a simple bash script:
Prerequisites: The script uses pdftk. So to use it one must install pdftk first.

Example of installing pdftk on Fedora or Debian based systems

#fedora
sudo dnf install pdftk

#debian
sudo apt-get install pdftk

Script for creating the merged pdf file

crossmergereverse.sh

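#!/bin/bash
# Usage: crossmergereverse.sh front_pages.pdf back_pages_reverse.pdf output.pdf

# Number of pages in the front scan (the back scan is expected to have the same number)
numPages=$(pdftk "$1" dump_data | awk '/^NumberOfPages/ {print $2}')

# Page list for pdftk: A1 Bn A2 B(n-1) ... An B1
params=()

for ((i = 1; i <= numPages; i++)); do
    bindex=$((numPages - i + 1))
    params+=("A$i" "B$bindex")
done

# Quoting keeps file names with spaces intact
pdftk A="$1" B="$2" cat "${params[@]}" output "$3"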

Running the script with bash (it uses bash-specific syntax, so plain sh may not work) like

bash crossmergereverse.sh front_pages.pdf back_pages_reverse.pdf completeDocument.pdf

will result in a newly created document completeDocument.pdf with all the pages in the correct order: 1A, 1B, 2A, 2B, ..., nA, nB
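
The merge only works if both scans contain the same number of pages, which can go wrong if, for example, the feeder pulls two sheets at once. A minimal sketch of a check one could run before merging, assuming pdftk's dump_data output contains a `NumberOfPages:` line as used in the script above; `page_count` and `same_page_count` are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical pre-check, not part of crossmergereverse.sh.

# Print the number of pages of a pdf file, read from pdftk's dump_data output.
page_count() {
    pdftk "$1" dump_data | awk '/^NumberOfPages/ {print $2}'
}

# Succeed only if both files have the same number of pages.
same_page_count() {
    local front back
    front=$(page_count "$1")
    back=$(page_count "$2")
    if [ "$front" != "$back" ]; then
        echo "page count mismatch: $1 has $front, $2 has $back" >&2
        return 1
    fi
}
```

After loading these functions into the shell, one could call same_page_count front_pages.pdf back_pages_reverse.pdf and only start the merge if it succeeds.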

The script crossmergereverse.sh can be found here. The repository also contains a script for a simple cross-merge of pdf documents.

Why `back_pages_reverse.pdf`?

The script for merging the documents would be simpler if there were two documents in the same order, i.e. 1A, 2A, ..., nA and 1B, 2B, ..., nB. But to achieve that, one would have to reorder the entire stack by hand before scanning the back sides. Scanning the back sides in reverse order and letting the script sort it out means less manual work.
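
As a side note, pdftk also has a shuffle operation that takes one page at a time from each given page range. The following is a sketch of how both cases could be expressed with it, assuming the installed pdftk version provides shuffle. It is an alternative to, not a copy of, the script from this post, and the function names are made up.

```shell
#!/bin/bash
# Sketch, assuming a pdftk version that provides the shuffle operation.

# Both scans in forward order (1A..nA and 1B..nB).
crossmerge_forward() {
    pdftk A="$1" B="$2" shuffle A B output "$3"
}

# Back scan in reverse order: the page range "Bend-1" walks through B
# from its last page to its first.
crossmerge_reverse() {
    pdftk A="$1" B="$2" shuffle A Bend-1 output "$3"
}
```

With the functions loaded, crossmerge_reverse front_pages.pdf back_pages_reverse.pdf completeDocument.pdf should produce the same page order as the loop-based script.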
