Simple Wikipedia Search App with Streamlit πŸπŸ•ΈοΈπŸ’»

Hey there! πŸ‘‹ I recently worked on a small project where I created a simple web app that lets you search for Wikipedia articles and display them in a chat-like interface. I used Streamlit to build the app and BeautifulSoup for web scraping. I wanted to share how I did it so you can try it out too!



What You Need

Before we dive in, make sure you have these Python libraries installed:

  • streamlit: To build the web app.
  • requests: To send requests to websites and get data.
  • beautifulsoup4: To scrape and parse the HTML content.

You can install them using pip:

pip install streamlit requests beautifulsoup4
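If you plan to deploy the app later (mine runs on Streamlit Community Cloud), it also helps to keep a requirements.txt with the same three packages:

streamlit
requests
beautifulsoup4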

The Code Explained

1. Setting Up the App

First, I imported the necessary libraries and set up the basic configuration for the Streamlit app.

import streamlit as st
import requests
from bs4 import BeautifulSoup
import time
import random

st.set_page_config(page_title="WikiStream", page_icon="β„Ή")
st.title("Wiki-Fetch")
st.sidebar.title("Options")
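To try it locally, save the script (I'm assuming app.py here, but any filename works) and run:

streamlit run app.py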

2. Adding Themes and Chat Interface

I added an option in the sidebar for users to switch between Light and Dark themes. I also set up a basic chat interface where the user can enter a topic and see the responses. Since Streamlit reruns the script on every interaction, the chat history lives in st.session_state so it survives those reruns and can be re-rendered at the top of the page.

theme = st.sidebar.selectbox("Choose a theme", ["Light", "Dark"])
if theme == "Dark":
    st.markdown("""
    <style>
    .stApp {
        background-color: #2b2b2b;
        color: white;
    }
    </style>
    """, unsafe_allow_html=True)

if 'messages' not in st.session_state:
    st.session_state.messages = []

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

3. Generating and Fetching Wikipedia Links

Next, I created a function that turns the user's input into a Google search URL (with "wiki" appended to the query). Then I scraped the search results for the first link pointing at en.wikipedia.org and fetched the content from that page.

def generate_link(prompt):
    if prompt:
        return "https://www.google.com/search?q=" + prompt.replace(" ", "+") + "+wiki"
    else:
        return None

def generating_wiki_link(link):
    res = requests.get(link)
    soup = BeautifulSoup(res.text, 'html.parser')
    for sp in soup.find_all("div"):
        a_tag = sp.find('a')
        if a_tag is None:
            continue
        href = a_tag.get('href')
        if href and 'en.wikipedia.org' in href:
            # Google result links look like "/url?q=<real URL>&...", so strip
            # the "/url?q=" prefix and drop the tracking parameters after "&".
            actual_link = href[7:].split('&')[0]
            return scraping_data(actual_link)
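To make that slicing a little less magic: a Google result href typically looks something like the string below (the exact format can vary), and href[7:].split('&')[0] peels off the "/url?q=" prefix and the tracking parameters:

href = "/url?q=https://en.wikipedia.org/wiki/Python_(programming_language)&sa=U&ved=..."
wiki_url = href[7:].split('&')[0]
print(wiki_url)  # https://en.wikipedia.org/wiki/Python_(programming_language)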

4. Scraping Wikipedia Content

This is where the content gets extracted from Wikipedia. I used BeautifulSoup to grab the text of every paragraph on the page, strip out citation markers like [1] and [2], and then yield it word by word so it streams into the chat at a speed chosen by the user.

def scraping_data(link):
    res = requests.get(link)
    soup = BeautifulSoup(res.text, 'html.parser')

    # Collect the text of every paragraph on the page.
    corpus = ""
    for p in soup.find_all('p'):
        corpus += p.text + '\n'
    corpus = corpus.strip()

    # Strip citation markers like [1], [2], ... up to [499].
    for i in range(1, 500):
        corpus = corpus.replace('[' + str(i) + ']', " ")

    # Let the reader control how fast the text streams in.
    speed = st.sidebar.slider("Text Speed", 0.1, 1.0, 0.2, 0.1)

    # Yield one word at a time so the response looks like it's being typed.
    for word in corpus.split():
        yield word + " "
        time.sleep(speed)
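A side note, not something the app above relies on: since scraping_data is a generator, newer Streamlit versions (1.31+) can render it directly with st.write_stream instead of the manual placeholder loop you'll see in step 6. A minimal sketch, assuming the link variable from the surrounding code:

with st.chat_message("assistant"):
    # st.write_stream consumes the generator and returns the full streamed text.
    full_response = st.write_stream(generating_wiki_link(link))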

5. Getting a Random Wikipedia Topic

I added a fun feature that lets you fetch a random Wikipedia article. It’s great for those moments when you just want to learn something new without having to think of a topic.

def get_random_wikipedia_topic():
    # Special:Random redirects to a random article; requests follows the redirect.
    url = "https://en.wikipedia.org/wiki/Special:Random"
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    # The article title sits in the <h1 id="firstHeading"> element.
    return soup.find('h1', {'id': 'firstHeading'}).text

6. Handling User Input and Displaying Content

Finally, I read the user's topic from a chat input, streamed the fetched content into a chat-like interface, and added sidebar buttons to clear the chat history and summarize the last response.

if st.sidebar.button("Get Random Wikipedia Topic"):
    random_topic = get_random_wikipedia_topic()
    st.sidebar.write(f"Random Topic: {random_topic}")
    prompt = random_topic

if prompt:
    link = generate_link(prompt)
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        full_response = ""
        # generating_wiki_link returns None if no Wikipedia link was found,
        # so fall back to an empty iterable instead of crashing.
        for chunk in generating_wiki_link(link) or []:
            full_response += chunk
            message_placeholder.markdown(full_response + "β–Œ")
        message_placeholder.markdown(full_response)
    st.session_state.messages.append({"role": "assistant", "content": full_response})

if st.sidebar.button("Clear Chat History"):
    st.session_state.messages = []
    st.rerun()

if st.sidebar.button("Summarize Last Response"):
    if st.session_state.messages and st.session_state.messages[-1]["role"] == "assistant":
        last_response = st.session_state.messages[-1]["content"]
        summary = " ".join(last_response.split()[:50]) + "..."
        st.sidebar.markdown("### Summary")
        st.sidebar.write(summary)
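One caveat worth mentioning: both Google and Wikipedia are fetched with bare requests.get calls, which can hang or get blocked. A small tweak I'd suggest (not part of the code above, just a sketch) is to send a browser-like User-Agent and a timeout:

import requests

# Hypothetical helper, not in the app above: a friendlier User-Agent plus a timeout
# makes the scraping requests less likely to hang or be served an error page.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; WikiFetch/1.0)"}

def fetch(url):
    res = requests.get(url, headers=HEADERS, timeout=10)
    res.raise_for_status()  # fail fast on 4xx/5xx responses
    return res.text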

Click the link below to start exploring:
https://wiki-verse.streamlit.app/

Check out the code behind Wiki-Fetch on GitHub!

Happy browsing! πŸ“š


Conclusion

And that’s it! πŸŽ‰ I’ve built a simple yet functional Wikipedia search app using Streamlit and BeautifulSoup. This was a fun project to work on, and I hope you find it just as enjoyable to try out. If you have any questions or feedback, feel free to reach out. Happy coding! πŸš€


About Me:
πŸ–‡οΈLinkedIn
πŸ§‘β€πŸ’»GitHub
