DEV Community

Mustafa Anas


Convert any .pdf file 📚 into an audio 🔈 book with Python

(edit: I am glad you all liked this project! It got to be the top Python article of the week!)

A while ago I was messing around with the Google Text-to-Speech (gTTS) Python library.
This library basically reads out any piece of text and saves it as an .mp3 file. Then I started thinking about making something useful out of it.

My installed, saved, and unread PDF books 😕

I like reading books. I really do. I think sharing language and ideas is fascinating. I have a directory where I store PDF books that I plan on reading, but I never do. So I thought, hey, why don't I make them into audiobooks and listen to them while I do something else 😄!

So I started planning what the script should look like.

  • Allow user to pick a .pdf file
  • Convert the file into one string
  • Output .mp3 file.

Without further needless words, let's get to it.

Allow user to pick a .pdf file

Python can read files easily. I just need to call open("filelocation", "rb") to open the file in binary read mode. I don't want to copy files into the code's directory every time I use the script, though. So to make it easier, we will use the tkinter library to open a dialog that lets us choose the file.

from tkinter import Tk
from tkinter.filedialog import askopenfilename

Tk().withdraw() # we don't want a full GUI, so keep the root window from appearing
filelocation = askopenfilename() # open the dialog GUI

Great. Now we have the file location stored in a filelocation variable.
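One thing the snippet above doesn't handle: if you close the dialog without picking anything, filelocation ends up empty and open() fails later with a confusing error. A minimal sketch of a check you could add right after the dialog (validate_selection is a hypothetical helper name, not part of tkinter):

```python
from pathlib import Path


def validate_selection(filelocation):
    """Return the chosen file as a Path, or stop the script with a message.

    askopenfilename() returns an empty string when the dialog is cancelled
    (some Tk builds reportedly return an empty tuple), so any falsy value
    is treated as "nothing selected".
    """
    if not filelocation:
        raise SystemExit("No file selected.")
    path = Path(filelocation)
    if path.suffix.lower() != ".pdf":
        raise SystemExit(f"Not a PDF file: {path.name}")
    return path


# Usage with the dialog from above (filetypes only filters what is shown):
# filelocation = askopenfilename(filetypes=[("PDF files", "*.pdf")])
# pdf_path = validate_selection(filelocation)
```

The filetypes option only pre-filters the dialog, so the extension check is still a cheap safety net.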

Allow user to pick a .pdf file ✔️

Convert the file into one string

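As I said before, to open a file in Python we just need the open() function. But we also want to convert the PDF file into plain text, so we might as well do that now.
To do that we will use a library called pdftotext. It is built on the Poppler PDF library, so depending on your OS you may need to install some system packages before pip can build it (pypi.org/project/pdftotext/ lists them per OS).
Let's install it: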

pip install pdftotext

Then:

from tkinter import Tk
from tkinter.filedialog import askopenfilename
import pdftotext

Tk().withdraw() # we don't want a full GUI, so keep the root window from appearing
filelocation = askopenfilename() # open the dialog GUI

with open(filelocation, "rb") as f:  # open the file in reading (rb) mode and call it f
    pdf = pdftotext.PDF(f)  # store a text version of the pdf file f in pdf variable

Great. Now the text of the PDF is stored in the variable pdf.
This object behaves like a list of strings: each string is the text of one page (so len(pdf) is the page count and pdf[0] is the first page). To get everything into one .mp3 file, we have to make sure it is all stored as one string. So let's loop through the pages and add them all to one string.

from tkinter import Tk
from tkinter.filedialog import askopenfilename
import pdftotext

Tk().withdraw() # we don't want a full GUI, so keep the root window from appearing
filelocation = askopenfilename() # open the dialog GUI

with open(filelocation, "rb") as f:  # open the file in reading (rb) mode and call it f
    pdf = pdftotext.PDF(f)  # store a text version of the pdf file f in pdf variable

string_of_text = ''
for text in pdf:
    string_of_text += text

Sweet 😄. Now we have it all as one string.
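The loop works, but as one of the comments below suggests, str.join() is the more idiomatic way to glue a sequence of strings together, and it avoids building a new string on every iteration. A small sketch, assuming a newline between pages is acceptable (pages_to_text is just an illustrative name, and a plain list stands in for the pdftotext.PDF object so it runs without a PDF):

```python
def pages_to_text(pages, separator="\n"):
    """Join the text of every page into one string.

    `pages` can be any iterable of strings, such as the pdf object above.
    The separator keeps the last word of one page from running into the
    first word of the next.
    """
    return separator.join(pages)


# Stand-in for a pdftotext.PDF object.
pages = ["First page text.", "Second page text."]
string_of_text = pages_to_text(pages)
print(string_of_text)
```

With the real PDF you would write string_of_text = pages_to_text(pdf). If a page's text already ends with a newline, the separator only adds a blank line, which should be harmless for speech.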

Convert the file into one string ✔️

Output .mp3 file 🔈

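Now we are ready to use the gTTS (Google Text-to-Speech) library. All we need to do is pass it the string we made, store the resulting gTTS object in a variable, and then call its save() method to write the .mp3 file to disk. gTTS sends the text to Google Translate's text-to-speech service, so it needs an internet connection.
Let's install it: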

pip install gtts

Then:

from tkinter import Tk
from tkinter.filedialog import askopenfilename
import pdftotext
from gtts import gTTS

Tk().withdraw() # we don't want a full GUI, so keep the root window from appearing
filelocation = askopenfilename() # open the dialog GUI

with open(filelocation, "rb") as f:  # open the file in reading (rb) mode and call it f
    pdf = pdftotext.PDF(f)  # store a text version of the pdf file f in pdf variable

string_of_text = ''
for text in pdf:
    string_of_text += text

final_file = gTTS(text=string_of_text, lang='en')  # create a gTTS object for the text, in English
final_file.save("Generated Speech.mp3")  # save file to computer

As simple as that! We are done 🎇
(edit: I am glad you all liked this article! The intention of all my writings is to be as simple as possible so readers of all levels can understand. If you want to know more about customizing this API, please check this page: https://gtts.readthedocs.io/en/latest/)
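For a whole book, a single gTTS call can be fragile: gTTS already breaks long text into many small requests internally, so one save() may make thousands of requests, and an error part-way through stops the whole conversion. A sketch of one workaround, assuming you're fine with one .mp3 per chunk: split the text on whitespace into pieces of at most max_chars characters (5000 is an arbitrary assumption) and save each piece separately. split_into_chunks and save_chunks_as_mp3 are hypothetical helpers, not part of gTTS.

```python
import re


def split_into_chunks(text, max_chars=5000):
    """Split text into pieces of at most max_chars, breaking on whitespace.

    A single word longer than max_chars becomes its own (oversized) chunk
    instead of being cut in half.
    """
    chunks, current = [], ""
    for word in re.split(r"\s+", text.strip()):
        if not word:
            continue  # happens only when the text is empty or all whitespace
        candidate = f"{current} {word}" if current else word
        if len(candidate) > max_chars and current:
            chunks.append(current)  # current chunk is full, start a new one
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def save_chunks_as_mp3(text, prefix="Generated Speech", max_chars=5000):
    """Save each chunk as '<prefix> 001.mp3', '<prefix> 002.mp3', ...

    Needs gTTS installed and an internet connection.
    """
    from gtts import gTTS  # imported here so split_into_chunks works without gTTS

    for number, chunk in enumerate(split_into_chunks(text, max_chars), start=1):
        gTTS(text=chunk, lang="en").save(f"{prefix} {number:03d}.mp3")


# Usage with the string from the article:
# save_chunks_as_mp3(string_of_text)
```

Splitting on whitespace drops the paragraph breaks and may cut a sentence between two files; splitting on sentence boundaries would sound more natural at the cost of a slightly more complex splitter.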


I am on a lifetime mission to support and contribute to the general knowledge of the web community as much as possible. Some of my writings might sound too silly, or too difficult, but no knowledge is ever useless. If you like my articles, feel free to help me keep writing by getting me a coffee :)

Comments (52)

Kristina Gocheva

My favorite part is (if I am not mistaken) that this would work for a PDF in any language, as long as Google Text-to-Speech supports the language.

Mustafa Anas

hahaha omg, how could I not think of doing the research.
You're right.
check this out
cloud.google.com/text-to-speech/

Belkin

Do you have any demo audio files? I'm really interested to hear it. :)

Mustafa Anas

Run this code and hear the result

from gtts import gTTS
final_file = gTTS(text='Demo String', lang='en')  # store file in variable
final_file.save("Generated Speech.mp3")  # save file to computer
Rishabh Aggarwal

Hey, this is really cool.

Mustafa Anas

hey thanks buddy!
glad you liked it

Cristian Carvajal 👽

Great!!
Does it work in any language?

Mustafa Anas
Blake Stansell

Awesome, awesome, awesome! I'm guessing they're ok to listen to?

Mustafa Anas

Yea they get the job done

bga

Really useful article.

Mustafa Anas

thanks buddy!

SURAJ BRANWAL

Thanks a lot for the article. I tried hard to find something like this, and now I am able to read (listen) to all my untouched PDFs.

Mustafa Anas

That was my intention.
Glad you liked it :)

SURAJ BRANWAL

I tried this on Win10, but was unable to install the pdftotext package in Python 3.8.
So I did it another way:

github.com/suryabranwal/TIL/blob/m...

schwepmo

Really cool and quick project! One thing I would suggest is to use Python's str.join() method instead of looping over the list of strings. I think that's the more "pythonic" way, and it should also perform a little better.

Mustafa Anas

Thanks for the tip!
I sure will start using that

Ashwanth

I am really intrigued by this article. I tried everything to install the pdftotext library on my Mac but was unsuccessful. I keep getting this error: "error: command 'gcc' failed with exit status 1"
I installed the OS dependencies and Poppler using brew, but it didn't work. Can anyone help me?

Mustafa Anas

make sure you have these two installed:
python-dev
libevent-dev

Ashwanth

Yup, I installed them. No matter what I do, I keep getting this error: "ERROR: Command errored out with exit status 1"
And I installed gcc too!

Kelvin Thompson

I just started getting the same thing on my system (Ubuntu). After a lot of Google/StackExchange, this worked (copy from my annotations):

For whatever reason, in order to install the following two, I had to install some packages system-wide on my Ubuntu MATE system to get rid of compile errors:

sudo apt-get install python3-setuptools python3-dev libpython3-dev
sudo apt-get update
sudo apt-get install build-essential libpoppler-cpp-dev pkg-config python-dev

I'm using PyCharm CE. After the above, I could use this in the PyCharm terminal:

pip3 install pdftotext
pip3 install gtts

After I did all of that, successful! Program works like a charm (hehe).

Cheers!

Mustafa Anas

Thanks for sharing your solution!

Kelvin Thompson

A pleasure to finally be able to give back a little!

Ashwanth

I have a Mac, brother. I can't use apt-get. What should I do now?

David Souza

Are you using the default Python 2.7?? You may need to use Python 3.x

David Souza

I got this working on the Mac using Python 3.7.4 using virtual env and brew. Works fine.

Jogesh

I am using Docker on my MacBook without any issues. It is a great alternative for working with any environment, stack, etc.

Rohit Prasad

They list what has to be installed on the various OSes here: pypi.org/project/pdftotext/

Harald Nezbeda

Have you tried to install the OS dependencies as specified in the docs? github.com/jalan/pdftotext#macos

Narendra Kumar Vadapalli

I am on Fedora and had to install the following dependencies before I could pip install pdftotext.

The sequence is:

sudo dnf install gcc-c++ pkgconfig poppler-cpp-devel python-devel redhat-rpm-config
pip install pdftotext gtts
Steve (Gadget) Barnes

I would suggest adding two lines to save the MP3 file to the same location and name as the PDF file.

from os.path import splitext

outname = splitext(filelocation)[0] + '.mp3'

then use:

final_file.save(outname)
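The same idea with pathlib, if you prefer it over os.path; a small sketch using the filelocation variable from the article (the path below is a made-up example):

```python
from pathlib import Path

filelocation = "/home/me/books/Moby Dick.pdf"  # example path; normally from askopenfilename()
outname = str(Path(filelocation).with_suffix(".mp3"))  # same folder and name, .mp3 extension
print(outname)
```

with_suffix() only replaces the last extension, so a name like notes.v2.pdf becomes notes.v2.mp3.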

Mustafa Anas

That would be a nice add!

sadorect

Oh, fantastic! I was looking to add this by myself but I don't know python coding. Thanks for bringing it up!

Usman Kamal

Nice one Mustafa!

I'm curious what would happen if the PDF has images or mathematical equations?

sadorect

This is a life-saving procedure you shared. I tried it and it works like a charm. Thank you so very much.

I have a question though...
I know this is a simplistic approach just to explain the basics (and it's awesome). Is it possible to change the reader's voice and reading speed?

Mustafa Anas

I am glad you liked it!
The intention of all my writings is to be as simple as possible so all-levels readers can understand.
If you wish to know more about customizing this API, please check this page:
gtts.readthedocs.io/en/latest/

sadorect

An observation here (I'm sure this has to do with the gTTS engine, though):

The reader would spell out some words rather than pronounce them, and it's a bit strange. I did a conversion where the word "first" was spelt out rather than pronounced. Initially, I thought this happens when words are not properly written and the text recognition engine is affected. "Five" was pronounced fai-vee-e, and there were other spellings like that.

Overall though, it is manageable and one can make good sense out of the readings. Now I can "read" my e-books faster with this ingenious solution.

Thanks again, @mustapha
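A guess at what might be going on (not confirmed): many PDFs typeset "fi" as a single ligature character (ﬁ, U+FB01), so the extracted text contains "ﬁrst" and "ﬁve" instead of "first" and "five", which the speech engine may not treat as ordinary words. Both examples above start with "fi". If that is the cause, Unicode NFKC normalization from Python's standard library replaces such ligatures with plain letters; a sketch (normalize_ligatures is a hypothetical helper name):

```python
import unicodedata


def normalize_ligatures(text):
    """Replace compatibility characters such as the 'ﬁ' ligature (U+FB01)
    with their plain-letter equivalents using NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


print(normalize_ligatures("\ufb01rst and \ufb01ve"))  # first and five
```

You would apply it right before creating the gTTS object, e.g. string_of_text = normalize_ligatures(string_of_text). NFKC also rewrites other compatibility characters (for example, superscript ² becomes 2), which is usually fine for speech.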

Abhinav Kumar Srivastava

Really cool!
However, when I tried to convert a decent-sized PDF file (3.0 MB), I got the following error:

"gtts.tts.gTTSError: 500 (Internal Server Error) from TTS API. Probable
cause: Uptream API error. Try again later."

Is gTTS blocking me from using their API? How can I resolve this?
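A 500 status is a server-side error from the Google Translate text-to-speech endpoint that gTTS talks to, and a long book means a lot of requests to it. There is no guaranteed fix, but retrying after a growing pause (and converting the text in smaller pieces) may get past temporary failures. A generic sketch of retry with exponential backoff; call_with_retries is a hypothetical helper, and the attempt count and delays are assumptions:

```python
import time


def call_with_retries(func, attempts=4, base_delay=2.0, retry_on=(Exception,)):
    """Call func(); on failure wait base_delay, 2*base_delay, 4*base_delay, ...
    and try again. Re-raises the last error once all attempts are used up."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the real error
            time.sleep(base_delay * 2 ** attempt)


# Hypothetical usage with gTTS (the error class is gtts.tts.gTTSError):
# from gtts import gTTS
# from gtts.tts import gTTSError
# call_with_retries(lambda: gTTS(text=chunk, lang="en").save(path),
#                   retry_on=(gTTSError,))
```

If the requests are being throttled rather than failing at random, retries alone won't fix it; waiting a while before trying again is the safer option.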