Agenda:
Hey, welcome back! Today we are going to automate our reading with Python. We'll build a GUI program that lets you select a PDF file and then plays it as audio inside the app. No more reading with your eyes: it'll read to you, and all you need to do is sit back and enjoy.
Prerequisites:
- A happy relationship with basic Python and `tkinter`.

Yeah, that's it, I'll explain the rest.
Analysing:
Now that we know what we're building, let's break it down into smaller chunks and focus on each one individually.
First, we'll create a window with a dialog box for opening the desired PDF file. We'll also add a text box for displaying the PDF's text content and a Play button for playing it as audio.
Modules used:
- `tkinter` (for building the GUI)
- `gTTS` (for converting text into speech)
- `playsound` (for playing the audio file)
- `PyMuPDF` (for reading PDF files)
Before moving ahead, I want to point out that most online tutorials use `PyPDF2` for working with PDF files. We're not using it because it doesn't always work: as of writing this post, `PyPDF2` can't extract the text from PDFs generated by Google, such as those exported from Google Docs.
`gTTS` stands for Google Text-to-Speech. It's both a Python library and a CLI tool for converting text into speech. It sends the text to Google's text-to-speech service, so it needs an internet connection to work.
`playsound` is also a Python library, used for playing audio files such as `.mp3` or `.wav`.
We are using `playsound` just to play the audio file created by `gTTS`. You can use any Python library for that, such as `pydub`, or use the `os` module to play the file with a command-line audio player installed on your system, but I believe that approach only works on macOS and Linux.
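If you go the command-line route instead of `playsound`, the idea is to hand the MP3 to whatever player the system already has. Here's a minimal sketch of that idea. It uses `subprocess` rather than `os.system`, and the player names are assumptions: `afplay` ships with macOS, while `mpg123` on Linux usually has to be installed first, so adjust the table for your machine.

```python
import platform
import subprocess
from typing import List, Optional

# Assumed command-line players per operating system; adjust for your machine.
# afplay ships with macOS, mpg123 usually needs to be installed on Linux.
PLAYERS = {
    "Darwin": "afplay",
    "Linux": "mpg123",
}


def native_player_command(path: str, system: Optional[str] = None) -> List[str]:
    """Build the argument list that plays `path` with the system's audio player."""
    system = system or platform.system()
    if system not in PLAYERS:
        raise RuntimeError(f"no command-line player configured for {system!r}")
    return [PLAYERS[system], path]


def play_with_native_player(path: str) -> None:
    """Play `path` and wait until the player process exits."""
    subprocess.run(native_player_command(path), check=True)
```

You would call `play_with_native_player("audio.mp3")` where the tutorial calls `playsound("audio.mp3")`. Passing a list of arguments to `subprocess.run` also avoids shell quoting problems with file names that contain spaces.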
Let's dive into the code now!
Step 1 :
In this step we'll create our GUI. Open your favourite code editor, create a file named `main.py`, and import `tkinter`.
`tkinter` comes preinstalled with Python, so there's no need to install it from `pip` (on some Linux distributions you may need your system's `python3-tk` package).
```python
from tkinter import *
from tkinter import filedialog

# creating main window instance from Tk class defined in tkinter
window = Tk()
window.title("convert pdf to audiobook")
window.geometry("500x500")  # setting default size of the window

# creating text box for displaying pdf content
text_box = Text(window, height=30, width=60)
text_box.pack(pady=10)

# creating menu instance from Menu class
menu = Menu(window)
window.config(menu=menu)

# adding `File` tab into menu defined above
file_menu = Menu(menu, tearoff=False)
menu.add_cascade(label="File", menu=file_menu)

# adding drop-downs to `file_menu`
file_menu.add_command(label="Open")
file_menu.add_command(label="clear")
file_menu.add_separator()
file_menu.add_command(label="Exit")

# adding play button for playing audio
play_btn = Button(text="Play")
play_btn.pack(pady=20)

# for keeping window open till we don't close it manually
window.mainloop()
```
Now if you run it, you should see a window with a large text box, a File menu, and a Play button.
Step 2 :
In this step we'll create a function called `open_pdf`. It opens a dialog box for selecting a PDF file, reads all of its text, and shows it inside the text box we created earlier. Then it uses `gTTS` to create an audio file from all the text in the text box.
```python
import fitz  # fitz is actually PyMuPDF
from gtts import gTTS

def open_pdf():
    # creating dialog box (change initialdir to a folder on your machine)
    open_file = filedialog.askopenfilename(
        initialdir="/Users/swayam/Downloads/",
        title="Open PDF file",
        filetypes=(
            ("PDF Files", "*.pdf"),
            ("All Files", "*.*")
        )
    )
    if open_file:
        # reading pdf file and creating instance of Document class from fitz
        doc = fitz.Document(open_file)
        # getting total number of pages
        total_pages = doc.page_count
        # looping through all the pages, collecting text from each page and showing it in the text box
        for n in range(total_pages):
            page = doc.load_page(n)
            page_content = page.get_textpage()
            content = page_content.extractText()
            text_box.insert(END, content)
        # after the whole pdf content is stored, retrieving it from the text box and storing it in a variable
        text = text_box.get(1.0, END)
        # using gTTS to convert that text into audio and storing it in a file named audio.mp3
        tts = gTTS(text, lang='en')
        tts.save("audio.mp3")
```
You need to install `gTTS` and `PyMuPDF`, so run `pip install PyMuPDF` and `pip install gTTS` in your terminal.
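If you'd rather not read the text back out of the text box, you can also build the string directly and pass the same string to both the text box and gTTS. Here's a minimal sketch of that idea. It assumes a reasonably recent PyMuPDF, where a `Document` can be looped over page by page and each `Page` has a `get_text()` method; the `start_page` parameter is a hypothetical extra, not part of the tutorial's code, for skipping pages you don't want read aloud.

```python
from itertools import islice
from typing import Iterable, Protocol


class HasText(Protocol):
    """Anything with a get_text() method, such as a PyMuPDF Page."""

    def get_text(self) -> str: ...


def pdf_to_text(pages: Iterable[HasText], start_page: int = 0) -> str:
    """Join the text of every page from start_page (0-based) to the end."""
    if start_page < 0:
        raise ValueError("start_page must be 0 or greater")
    # islice skips the first start_page pages without loading them into a list
    return "".join(page.get_text() for page in islice(pages, start_page, None))
```

Inside `open_pdf` you could then write `text = pdf_to_text(fitz.open(open_file))`, insert `text` into the text box, and hand the same `text` to `gTTS`.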
The `open_pdf` code is mostly self-explanatory, but I still want to highlight a few points. First, look at the line `text_box.insert(END, content)`. `END` is a constant defined in `tkinter` that refers to the position just after the last character in the text box, while `1.0` refers to the very beginning of the text (line 1, character 0).
When we insert the first page, the text box is still empty, so the beginning and `END` are the same position. After that, each page's text is inserted at the end of the previously stored text, so the pages stay in order.
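Text widget positions are written as "line.column", with lines counted from 1 and columns from 0. The helper below is hypothetical and not part of tkinter (Tk parses indexes itself); it just splits an index the same way, to show why a string like "1.0" is safer than the float `1.0`: a float such as `1.10` becomes "1.1", which means column 1, not column 10.

```python
from typing import Tuple, Union


def parse_text_index(index: Union[str, float]) -> Tuple[int, int]:
    """Split a Text-widget style index like "3.14" into (line, column).

    Hypothetical helper for illustration only.
    """
    # str(1.10) == "1.1", so a float silently loses its trailing zero
    line, column = str(index).split(".")
    return int(line), int(column)
```

For the beginning of the text both forms happen to work, but writing `text_box.get("1.0", END)` avoids the surprise at other positions.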
Step 3 :
Now that we have the function, it's time to give each widget its own functionality, so that pressing the button and clicking the menu items actually do something.
Go back to the code and add a `command` attribute to all the `file_menu` drop-downs and to `play_btn`, as shown below:
```python
from playsound import playsound

file_menu.add_command(label="Open", command=open_pdf)
file_menu.add_command(label="clear", command=lambda: text_box.delete(1.0, END))
file_menu.add_command(label="Exit", command=window.quit)

play_btn = Button(text="Play", command=lambda: playsound("audio.mp3"))
```
Install `playsound` with `pip install playsound`. On macOS, `playsound` also needs `pyobjc` as a dependency, so install it with `pip install pyobjc`.
The function passed to `command` runs when you click the widget. For short actions like clearing the text box and playing the audio, we used `lambda` functions.
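The reason for the `lambda` is that `command` expects a function to call later, not the result of calling it. Writing `command=playsound("audio.mp3")` would try to play the file once, while the button is being created, instead of on each click. The sketch below uses a hypothetical stand-in for a button and for `playsound` (neither is the real tkinter or playsound code) to show the difference, with `functools.partial` as an alternative to `lambda`.

```python
from functools import partial
from typing import Callable, List, Optional

played: List[str] = []


def fake_playsound(path: str) -> None:
    """Hypothetical stand-in for playsound that just records what it was asked to play."""
    played.append(path)


class FakeButton:
    """Hypothetical stand-in for tkinter.Button: stores `command`, runs it on click."""

    def __init__(self, command: Optional[Callable[[], None]] = None) -> None:
        self.command = command

    def click(self) -> None:
        if self.command is not None:
            self.command()


# Deferred: nothing is played until the button is clicked.
lazy = FakeButton(command=lambda: fake_playsound("audio.mp3"))
also_lazy = FakeButton(command=partial(fake_playsound, "audio.mp3"))

# Eager: fake_playsound runs right here, and its return value (None) becomes the command.
eager = FakeButton(command=fake_playsound("audio.mp3"))
```

`partial(playsound, "audio.mp3")` does the same job as the `lambda` in the tutorial; which one you use is mostly a matter of taste.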
`window.quit` closes the window, and clear is self-explanatory. Once `audio.mp3` has been saved, clicking the button runs `playsound("audio.mp3")` to play it.
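One thing to be aware of: by default `playsound` blocks until the file has finished playing, so the window stops responding while the audiobook plays (the gTTS step behaves similarly, since it waits on Google's service). A common workaround is to run the slow call on a background thread. Here's a minimal sketch using the standard `threading` module; `run_in_background` is a hypothetical helper name, and any widget updates should still happen on the main thread, for example via `window.after`.

```python
import threading
from typing import Any, Callable


def run_in_background(task: Callable[..., Any], *args: Any) -> threading.Thread:
    """Start task(*args) on a separate thread and return immediately.

    daemon=True means the thread won't keep the program running after the
    window has been closed.
    """
    worker = threading.Thread(target=task, args=args, daemon=True)
    worker.start()
    return worker
```

In the GUI you would write `command=lambda: run_in_background(playsound, "audio.mp3")`, so clicking Play returns control to the window right away.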
If you followed along, your final code should look something like this:
```python
from tkinter import *
from tkinter import filedialog
import fitz
from gtts import gTTS
from playsound import playsound

window = Tk()
window.title("convert pdf to audiobook")
window.geometry("500x500")

def open_pdf():
    open_file = filedialog.askopenfilename(
        initialdir="/Users/swayam/Downloads/",
        title="Open PDF file",
        filetypes=(
            ("PDF Files", "*.pdf"),
            ("All Files", "*.*")
        )
    )
    if open_file:
        doc = fitz.Document(open_file)
        total_pages = doc.page_count
        for n in range(total_pages):
            page = doc.load_page(n)
            page_content = page.get_textpage()
            content = page_content.extractText()
            text_box.insert(END, content)
        text = text_box.get(1.0, END)
        tts = gTTS(text, lang='en')
        tts.save("audio.mp3")

text_box = Text(window, height=30, width=60)
text_box.pack(pady=10)

menu = Menu(window)
window.config(menu=menu)

file_menu = Menu(menu, tearoff=False)
menu.add_cascade(label="File", menu=file_menu)
file_menu.add_command(label="Open", command=open_pdf)
file_menu.add_command(label="clear", command=lambda: text_box.delete(1.0, END))
file_menu.add_separator()
file_menu.add_command(label="Exit", command=window.quit)

play_btn = Button(text="Play", command=lambda: playsound("audio.mp3"))
play_btn.pack(pady=20)

window.mainloop()
```
Let's test it
Now it's time to run our code and check whether everything works. Take a sample PDF file with some text and open it.
YAYYYYYY...... WE DID IT, GUYS!
We just created our own PDF-to-audiobook converter! If you want to go a few steps further, I recommend reading the official `gTTS` documentation. You can also convert this Python script into an `.exe` file and share it with your friends so they can have fun too.
Let's keep converting Python scripts to `.exe` files for the next tutorial.
What's next?
If you are still reading, make sure to follow me on Twitter, where I share some cool projects and updates. And don't forget, I have some exciting stuff coming up every weekend. See y'all next time and stay safe ^^
Top comments (5)
Is there any way to pop up an option for choosing the page from which the reading will start, the way there is an option for choosing the PDF file? Here is my code:
```python
import pyttsx3 as py
import PyPDF2 as pd
pdfReader = pd.PdfFileReader(open('Excel-eBook.pdf', 'rb'))
from tkinter.filedialog import *
speaker = py.init()
voices = speaker.getProperty('voices')
for voice in voices:
    speaker.setProperty('voice', voice.id)
book = askopenfilename()
pdfreader = pd.PdfFileReader(book)
pages = pdfreader.numPages
for num in range(0, pages):  # 0 is the page number from where the reading will start
    page = pdfreader.getPage(num)
    text = page.extractText()
    player = py.init()
    player.say(text)
    player.runAndWait()
```
Also, if you want a more realistic effect, check out the gTTS documentation (linked in the article), especially the sections on localized accents, pre-processing, and tokenizing. With some extra touches you can make it sound much more natural.
"you'll find people using PyPDF2 for working with pdf files but the reason we are not using it is because it does not always work, like till the date I'm writing this post if you use PyPDF2 for reading pdf generated by google like from google docs, it's not able to read text from it" - do you think this is still true after all the updates to PyPDF2 this year (51 releases)?
Good Article!
Personally, I still prefer wxPython, but tkinter works.
You need to use a thread to keep the GUI from freezing when opening files and playing audio.
That's a nice idea, thanks!