Astra Bertelli

AI enthusiasm #10 - Summarize PDFs with AI 🗎

We live in a fast society...

... And everyone is always on the run! It always seems like we have too little time to read, inform ourselves, and enjoy some quality content. In this climate, we can only hope that something comes to help, and that something can be AI. 🦾

Build a chat interface to summarize text and PDFs

We can quickly leverage our Python knowledge to implement and deploy a text summarization chatbot, using a pretrained AI model and Gradio, the frontend framework I have already covered several times.

The summarization pipeline

First of all, we want to define a helper, within the Python script, that takes charge of loading and calling the AI model:

from transformers import pipeline

model_checkpoint = "FalconsAI/text_summarization"
summarizer = pipeline("summarization", model=model_checkpoint)

To start, we chose a fairly small model: it is good for getting to know the subject, although it is now outperformed by state-of-the-art summarizers.

Define the PDF preprocessing functions

We need some important and useful functions, such as one that merges PDFs when they are uploaded in bulk...

from pypdf import PdfMerger  # in the newest pypdf versions, PdfWriter plays this role

def merge_pdfs(pdfs: list):
    """Merge the uploaded PDFs into a single file and return its path."""
    merger = PdfMerger()
    for pdf in pdfs:
        merger.append(pdf)
    outpath = f"{pdfs[-1].split('.')[0]}_results.pdf"
    merger.write(outpath)
    merger.close()
    return outpath
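One caveat worth noting: deriving the output name with split('.') misbehaves when the path contains extra dots (a versioned filename, for instance). A safer sketch using os.path.splitext (our suggestion, not part of the original snippet):

```python
import os

def output_path(last_pdf: str) -> str:
    """Build the merged-file name next to the last uploaded PDF."""
    stem, _ = os.path.splitext(last_pdf)
    return f"{stem}_results.pdf"

print(output_path("/tmp/report.v2.pdf"))  # /tmp/report.v2_results.pdf
# The naive split('.')[0] would instead produce /tmp/report_results.pdf
```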

...And one that turns the merged PDF into a text string suitable for the summarizer:

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter

def pdf2string(pdfpath):
    """Load a PDF and return its full text as one string."""
    loader = PyPDFLoader(pdfpath)
    documents = loader.load()

    # Split the documents into smaller chunks for processing
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    texts = text_splitter.split_documents(documents)
    fulltext = ""
    for text in texts:
        fulltext += text.page_content + "\n\n\n"
    return fulltext
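If you prefer not to pull in LangChain for this step, the same idea can be approximated with a plain fixed-size splitter (a rough stand-in for CharacterTextSplitter, not a reproduction of its exact behavior):

```python
def split_into_chunks(text: str, chunk_size: int = 1000):
    """Split text into consecutive chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

chunks = split_into_chunks("a" * 2500, chunk_size=1000)
print([len(c) for c in chunks])  # [1000, 1000, 500]
```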

Now that we have the preprocessing functions set, let's design our chatbot with Gradio: we want it to be multimodal (supporting both direct text messages and uploaded documents), so building it will be a little more involved.

Build the chatbot

We want a function that manages our chat history, separating text messages from PDF documents, so we write this piece of code:

def add_message(history, message):
    # Append the uploaded file paths as a tuple, so that bot() can tell them apart from text
    if len(message["files"]) > 0:
        history.append((tuple(message["files"]), None))
    if message["text"] is not None and message["text"] != "":
        history.append((message["text"], None))
    return history, gr.MultimodalTextbox(value=None, interactive=False)

This function returns history as a list of tuples containing:

  • A tuple of the paths of the uploaded files (like this: ("/path/to/file1.pdf", "/path/to/file2.pdf", ...)) paired with None (which represents the message from the chatbot, which has not been written yet)
  • A text string with our message (like: "In this article, we will see why cats are so overwhelmingly cute...") paired with None (again, the chatbot's reply to come)
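To make the branching concrete, here is a minimal, Gradio-free sketch (the file paths and messages are made up) of how the bot can tell the two kinds of history entries apart:

```python
# Hypothetical history, mirroring what add_message produces
history = [
    (("/path/to/file1.pdf", "/path/to/file2.pdf"), None),          # uploaded files
    ("In this article, we will see why cats are cute...", None),   # text message
]

for entry, reply in history:
    if isinstance(entry, tuple):
        print(f"files: {len(entry)} PDF(s) to merge")
    else:
        print(f"text: {entry[:15]}...")
```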

Let's see how we can use the history to generate our text:

import time

def bot(history):
    if history is not None:
        last = history[-1][0]
        if type(last) == tuple:
            # Uploaded PDFs: merge them, then extract the text
            finalpdf = merge_pdfs(list(last))
            text = pdf2string(finalpdf)
        else:
            # Plain text message: summarize it directly
            text = last
        # Keep the summary between 5% and 50% of the original word count
        words = len(text.split(" "))
        response = summarizer(text, max_length=int(words * 0.5), min_length=int(words * 0.05), do_sample=False)[0]
        response = response["summary_text"]
        # Stream the reply one character at a time
        history[-1][1] = ""
        for character in response:
            history[-1][1] += character
            time.sleep(0.05)
            yield history
    else:
        # Fall back to a previously saved history (histr is assumed to be a global)
        yield from bot(histr)

As you can see, we check whether the first element of the last pair in history (history[-1][0]) is a tuple (which means it consists of uploaded PDFs) or not:

  • If it is a tuple, we merge all the PDFs it contains, turn them into a string and pipe the text to the summarizer, getting back a summary that contains fewer than 50% but more than 5% of the words in the original document
  • If it is a text string, we summarize it directly.
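The length bounds passed to the summarizer can be isolated into a tiny helper (the name summary_bounds is ours, not part of the original script):

```python
def summary_bounds(text: str, min_frac: float = 0.05, max_frac: float = 0.5):
    """Return (min_length, max_length) for the summarizer as fractions of the word count."""
    words = len(text.split(" "))
    return int(words * min_frac), int(words * max_frac)

# A 100-word input (plus a trailing empty token from the final space)
min_len, max_len = summary_bounds("one two three four five six seven eight nine ten " * 10)
print(min_len, max_len)  # 5 50
```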

After that, we stream the output summary as the chatbot response (history[-1][1], which previously was None) with yield.
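The character-by-character streaming can be sketched in isolation (without the sleep) as a plain generator:

```python
def stream_reply(response: str):
    """Yield progressively longer prefixes of the response, like the chatbot does."""
    reply = ""
    for character in response:
        reply += character
        yield reply

chunks = list(stream_reply("Hi!"))
print(chunks)  # ['H', 'Hi', 'Hi!']
```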

And now we only need to build the multimodal chatbot!

with gr.Blocks() as demo:
    chatbot = gr.Chatbot(
        [[None, "Hi, I'm **ai-summarizer** 🤖, your personal summarization assistant 😊"]],
        label="ai-summarizer",
        elem_id="chatbot",
        bubble_full_width=False,
    )

    chat_input = gr.MultimodalTextbox(interactive=True, file_types=["pdf"], placeholder="Enter message or upload file...", show_label=False)

    chat_msg = chat_input.submit(add_message, [chatbot, chat_input], [chatbot, chat_input])
    bot_msg = chat_msg.then(bot, chatbot, chatbot, api_name="bot_response")
    bot_msg.then(lambda: gr.MultimodalTextbox(interactive=True), None, [chat_input])

We then launch our app like this:

demo.queue()

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", share=False)

And run it (assuming we saved the script as app.py):

python3 app.py

Once everything is loaded, you will be able to see the chatbot at localhost:7860.

And we're done: we can just sit back and relax while our summarization assistant works! 😎

Conclusion

This is the last post in the AI enthusiasm series: I don't rule out a season 2, but for now I'll be embarking on a new, exciting educational blog series about Docker and devcontainers.

HUGE THANKS TO THE DEV COMMUNITY FOR THE 2.5K+ FOLLOWERS! 🎉🎉

It is an incredible achievement, and I hope my community here on DEV continues to grow! 📈

Let me know in the comments if you would like me to start a YouTube channel as well (it's a project I would be eager to work on if I knew I would have some support from my community here on DEV!). 🔥🚀

Thank you so much to everyone, let's keep it up! ❤️
