DEV Community

CyprianTinasheAarons

Building a Document QA with Streamlit & OpenAI

What is Streamlit?

Streamlit is an open-source Python framework for data scientists and AI/ML engineers to deliver dynamic data apps with only a few lines of code.

Streamlit is exciting for AI engineers who want to quickly demo an idea or build proof-of-concept projects.

Streamlit's documentation is clear and easy for any developer to pick up.


Some Fundamentals before we dive into our project

Installation

To install Streamlit, we can run the following command:

pip install streamlit

To test if we have installed it successfully, we run the following:

streamlit hello
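
The install can also be checked from Python itself. The following is a minimal sketch, not part of the original project: installed_version is a hypothetical helper name, and it only looks up package metadata through the standard library, so it does not launch Streamlit.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report whether Streamlit is available in the current environment.
    found = installed_version("streamlit")
    print(f"streamlit {found}" if found else "streamlit is not installed")
```

If this reports that Streamlit is missing inside a virtual environment, the usual cause is that pip installed it into a different interpreter.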

Once we have written our application script, e.g., streamlit_script.py, we can run it using the following command:

streamlit run streamlit_script.py

Displaying Text or Diagrams

Using st.write we can display information in our app:

st.write("hello world")

Text Elements

We can display strings in different formats, e.g., markdown, title, header, and subheader:

st.markdown("*Streamlit* is **really** ***cool***.")

Widgets

Streamlit has many widgets that include buttons, select boxes, checkboxes, etc.:

st.button("Click me")

Layout

We can work with sidebars, columns, and expanders. For example, st.sidebar will show a sidebar on our app interface:

st.sidebar.write("I am a sidebar")

Going through the Streamlit docs and cheat sheet will quickly bring you up to speed on the rest of the syntax.

Hosting a Streamlit App

Hosting a Streamlit app is very easy when working with Streamlit Cloud.


Prerequisites

  1. You are a Python developer.
  2. You have a basic understanding of generative AI and LLMs such as OpenAI's GPT models.
  3. You love learning and upskilling.
  4. You have your preferred IDE set up, e.g., VS Code.

A Breakdown of our Document Question & Answer Streamlit application

We start by importing Streamlit and OpenAI into our streamlit_app.py file:

import streamlit as st
from openai import OpenAI

Next, we make use of st.title and st.write to display the title and description:

st.title("📄 Document Question Answering")
st.write(
    "Upload a document below and ask a question about it – GPT will answer! "
    "To use this app, you need to provide an OpenAI API key, which you can get [here](https://platform.openai.com/account/api-keys). "
)

Next, we use Streamlit's st.text_input to collect the user's OpenAI API key, which gives our application its AI capabilities:

openai_api_key = st.text_input("OpenAI API Key", type="password")

Lastly, we implement the core logic for the app. We start with an if not condition that checks whether the key is missing; if it is, we use st.info to ask the user to add it:

if not openai_api_key:
    st.info("Please add your OpenAI API key to continue.", icon="🗝️")
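
For local runs, it can be convenient to fall back to an environment variable when the text input is left empty. The following is a minimal sketch under stated assumptions: resolve_api_key is a hypothetical helper that is not in the original app, and OPENAI_API_KEY is the variable name the OpenAI Python client reads by default.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(user_input: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the key typed by the user, else the OPENAI_API_KEY variable, else None."""
    typed = user_input.strip()
    if typed:
        return typed
    # Assumption for illustration: the original app only uses the text input.
    fallback = env.get("OPENAI_API_KEY", "").strip()
    return fallback or None


# A typed key wins over the environment; an empty input falls back to it.
print(resolve_api_key("sk-typed", {"OPENAI_API_KEY": "sk-env"}))
print(resolve_api_key("", {"OPENAI_API_KEY": "sk-env"}))
```

The result could then replace openai_api_key in the if not check, so the info message only appears when neither source provides a key.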

The else branch contains our fully functional document QA:

else:

    # Create an OpenAI client.
    client = OpenAI(api_key=openai_api_key)

    # Let the user upload a file via `st.file_uploader`.
    uploaded_file = st.file_uploader(
        "Upload a document (.txt or .md)", type=("txt", "md")
    )

    # Ask the user for a question via `st.text_area`.
    question = st.text_area(
        "Now ask a question about the document!",
        placeholder="Can you give me a short summary?",
        disabled=not uploaded_file,
    )

    if uploaded_file and question:

        # Process the uploaded file and question.
        document = uploaded_file.read().decode()
        messages = [
            {
                "role": "user",
                "content": f"Here's a document: {document} \n\n---\n\n {question}",
            }
        ]

        # Generate an answer using the OpenAI API.
        stream = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=messages,
            stream=True,
        )

        # Stream the response to the app using `st.write_stream`.
        st.write_stream(stream)
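
To make the streaming step more concrete, here is a rough sketch of how the text could be pulled out of the streamed chunks by hand. It assumes each chunk exposes its text at chunk.choices[0].delta.content, as the OpenAI Python client's chat-completion chunks do. iter_text and fake_chunk are hypothetical names, and the fake chunks exist only so the sketch runs without an API key. This is an illustration of the data flowing through the stream, not a description of how st.write_stream is implemented.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def iter_text(chunks: Iterable) -> Iterator[str]:
    """Yield the text fragment carried by each streamed chat-completion chunk."""
    for chunk in chunks:
        if not chunk.choices:  # some chunks may carry no choices at all
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # skip chunks whose delta has no text (None or empty)
            yield piece


def fake_chunk(text):
    """Hypothetical stand-in that mimics the shape of a streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk(None), fake_chunk("Hello"), fake_chunk(", world"), fake_chunk(None)]
answer = "".join(iter_text(demo))
print(answer)
```

Because st.write_stream also accepts a generator of strings, iter_text(stream) could be passed to it in place of the raw stream, for instance to filter or log the text on the way.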

A Breakdown of the Code

  1. Initializing our OpenAI client using the added OpenAI key:

    client = OpenAI(api_key=openai_api_key)
    
  2. Using file_uploader from Streamlit, we let the user upload a .txt or .md file:

    uploaded_file = st.file_uploader(
        "Upload a document (.txt or .md)", type=("txt", "md")
    )
    
  3. Using text_area, we take the input from the user:

    question = st.text_area(
        "Now ask a question about the document!",
        placeholder="Can you give me a short summary?",
        disabled=not uploaded_file,
    )
    
  4. We implement a condition to check if the user has uploaded a file and inputted a question:

    if uploaded_file and question:
    
  5. We read the uploaded file and decode its bytes into a string:

    document = uploaded_file.read().decode()
    
  6. We initialize the messages and pass them to our OpenAI chat completions endpoint:

    messages = [
        {
            "role": "user",
            "content": f"Here's a document: {document} \n\n---\n\n {question}",
        }
    ]
    
    # Generate an answer using the OpenAI API.
    stream = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        stream=True,
    )
    
  7. Finally, using write_stream, we stream the output:

    st.write_stream(stream)
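
Steps 5 and 6 above can be factored into two small functions, which makes them easy to try without running the app. This is a minimal sketch: decode_document and build_messages are hypothetical helper names, and errors="replace" is an assumption added here so that a file that is not valid UTF-8 does not raise (the original calls .decode() with its defaults).

```python
from typing import Dict, List


def decode_document(raw: bytes) -> str:
    """Decode uploaded bytes as UTF-8, replacing undecodable bytes instead of raising."""
    # Assumption for illustration: the original app uses .decode() with defaults.
    return raw.decode("utf-8", errors="replace")


def build_messages(document: str, question: str) -> List[Dict[str, str]]:
    """Build the single user message that carries both the document and the question."""
    return [
        {
            "role": "user",
            "content": f"Here's a document: {document} \n\n---\n\n {question}",
        }
    ]


messages = build_messages(decode_document(b"Streamlit is fun."), "What is fun?")
print(messages[0]["content"])
```

In the app, the bytes would come from uploaded_file.read() and the resulting list would be passed as messages to client.chat.completions.create.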
    

Setting Up the Project Locally on Your Machine

Clone the repository:

git clone git@github.com:CyprianTinasheAarons/document-qa.git
cd document-qa/

Create a virtual environment:

python3 -m venv venv

Activate the environment:

source venv/bin/activate

Install the requirements found in requirements.txt:

pip install -r requirements.txt

Yay!! Now we can run our code:

streamlit run streamlit_app.py


We navigate to our local URL and add our OpenAI key:

Get your API key from the OpenAI API Keys page: https://platform.openai.com/account/api-keys



Conclusion

Congratulations on getting this far! Now you can go and launch great AI solutions that will make the world better!

Feel free to follow me on Twitter for more updates and projects. Also, check out my website here.




Ready to Level Up Your Tech Insights?

Hi there!

I'm Cyprian Aarons, a Senior Software Engineer specializing in AI.

Join me on Twitter for daily deep-dives into cutting-edge tech, behind-the-scenes looks at innovative projects, and strategies to help you thrive in the digital era.
