
Will Taner for LLMWare


Create Phi-3 Chatbot with 20 Lines of Code (Runs Without Wifi) 🚀 🤖

Made in collaboration with Andrew Dang

The Power of Local Chatbots

Chatbots are fun and powerful tools. They excel at organization, summarization, and conversation, and that versatility is what makes platforms like ChatGPT popular. However, hosted platforms may use user conversations to train their models, which can come at the cost of user privacy. Hence the appeal of local large language models (LLMs): once a model has been downloaded, users can run it on consumer hardware without an internet connection. Stick around to learn how to deploy a local chatbot in just 20 lines of code!


Let's get started 🔥

This video will introduce you to the capabilities of the Chatbot and provide an overview of how it is built.


Framework 🖼️

LLMWare

For our new readers, LLMWare is a comprehensive, open-source framework that provides a unified platform for application patterns based on LLMs, including Retrieval Augmented Generation (RAG).

Streamlit

Streamlit is an open-source Python library that allows developers to create interactive web applications quickly and easily. It's designed for machine learning and data science projects, enabling users to visualize data, run models, and share results in a user-friendly interface.

Please run pip install llmware and pip install streamlit in the command line to download these packages.
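If you want to confirm that both installs succeeded before going further, a small check along these lines can help. This is a minimal sketch that uses only the Python standard library; the helper name installed_version is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the status of the two packages this tutorial relies on.
    for name in ("llmware", "streamlit"):
        found = installed_version(name)
        status = found if found is not None else "not installed"
        print(f"{name}: {status}")
```

If either package is reported as not installed, rerun the matching pip install command.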


Importing Libraries and Configurations 📚

import streamlit as st
from llmware.models import ModelCatalog
from llmware.gguf_configs import GGUFConfigs

GGUFConfigs().set_config("max_output_tokens", 500)

Streamlit (st): Used for creating web applications. It provides functions to display content and handle user interactions.

ModelCatalog: A component of llmware that selects, loads, and configures the desired model using the parameters described in the Model Loading section below.

GGUFConfigs: A component of llmware that handles global (applies to all currently loaded models) configurations for models, such as output token limits. You may find the full list of global configurations here in the variable _conf_libs under the GGUFConfigs class.


Model Loading 🪫🔋

model = ModelCatalog().load_model(model_name, temperature=0.3, sample=True, max_output=450)

Model Name: The name of the model you want to load from ModelCatalog.

Temperature: This controls the randomness of the output. Valid values range between 0 and 1, where lower values make the model more deterministic, and higher values make the model more random and creative.

Sample: Determines if the output is generated deterministically or probabilistically. False generates deterministic output. True generates probabilistic output.

Max Output: Specifies the maximum length of the generated output, measured in tokens rather than words (ex. if this is set to 100, there will be at most 100 tokens in the output).
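To build some intuition for how Temperature and Sample interact, here is a toy sketch in plain Python. It illustrates the general technique (temperature-scaled softmax followed by either greedy selection or random sampling) and is not llmware's internal implementation. The function names and the scores in toy_logits are invented for the example.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; temperature must be greater than 0."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, temperature=0.3, sample=True, rng=None):
    """Return the index of the next token.

    sample=False: always take the highest-scoring token (deterministic).
    sample=True: draw at random, weighted by the temperature-scaled probabilities.
    """
    if not sample:
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
low = softmax_with_temperature(toy_logits, 0.3)   # sharper distribution
high = softmax_with_temperature(toy_logits, 1.0)  # flatter distribution
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

With the lower temperature, more of the probability mass lands on the top-scoring token, which is why low values feel more deterministic even when sampling is enabled.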


Session State Management 🧮

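This part of the app manages chat history in Streamlit's session state, so previous messages are preserved across reruns of the app. Streamlit reruns the whole script on every user interaction, so anything not stored in session state is lost.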

Ensure there is a list called "messages" in session state

if "messages" not in st.session_state:
    st.session_state.messages = []

Display Chat History

Each stored message is a dictionary with a role, either "user" or "assistant", and its content. The loop redraws every message in a chat bubble that matches its role.

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

Prompt the User and Display Responses

If the user enters text, display their text

prompt = st.chat_input("Say something")
if prompt:

    with st.chat_message("user"):
        st.markdown(prompt)

Generate and Display Responses

    with st.chat_message("assistant"):
        bot_response = st.write_stream(model.stream(prompt))
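The streaming step can be pictured without Streamlit or a model at all. In the rough sketch below, fake_stream is a hypothetical stand-in for model.stream(prompt), and render_stream imitates what st.write_stream does for text: it shows each chunk as it arrives and returns the full concatenated string.

```python
def fake_stream(text):
    """Hypothetical stand-in for model.stream(prompt): yield the reply piece by piece."""
    for word in text.split(" "):
        yield word + " "


def render_stream(chunks, show=print):
    """Show each chunk as it arrives, then return the complete text."""
    pieces = []
    for chunk in chunks:
        show(chunk)            # in the real app, Streamlit updates the chat bubble here
        pieces.append(chunk)
    return "".join(pieces)


bot_response = render_stream(fake_stream("Local models can run offline"))
```

The returned string is what matters for the next step: it is the value stored in the chat history as the assistant's message.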

Update Session State

This step appends the new user input and the assistant's response to the chat history in session state, so both are redrawn on the next rerun.

    st.session_state.messages.append({"role": "user", "content": prompt})
    st.session_state.messages.append({"role": "assistant", "content": bot_response})
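To see why the history has to live in session state, it can help to simulate a couple of reruns by hand. The sketch below uses a plain dict in place of st.session_state and a made-up echo function in place of the model; run_script is a hypothetical name for one top-to-bottom pass through the app.

```python
def run_script(session_state, user_input, reply_fn):
    """Simulate one top-to-bottom rerun of the app with a plain dict as session state."""
    # Create the history list only on the very first run.
    if "messages" not in session_state:
        session_state["messages"] = []

    # Snapshot of the history as it would be redrawn at the top of this rerun.
    shown = list(session_state["messages"])

    if user_input:
        bot_response = reply_fn(user_input)
        session_state["messages"].append({"role": "user", "content": user_input})
        session_state["messages"].append({"role": "assistant", "content": bot_response})

    return shown


def echo(prompt):
    """Hypothetical stand-in for the model call."""
    return f"You said: {prompt}"


state = {}  # plays the part of st.session_state
first = run_script(state, "hello", echo)          # nothing to redraw yet
second = run_script(state, "how are you?", echo)  # redraws the first exchange
```

On the second pass the first exchange is already in the history, which mirrors how the chat stays on screen after each new message.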

Main Block ⚙️

The if __name__ == "__main__": check is the typical Python idiom for ensuring that a block runs only when the script is executed as the main program, not when it is imported as a module.

The block defines a list of model names and starts the chat application with the first model in the list.

This script is a straightforward example of integrating AI models into a web application for interactive purposes, showcasing the use of Streamlit for UI and llmware for backend model handling.

if __name__ == "__main__":

    chat_models = ["phi-3-gguf",
                   "llama-2-7b-chat-gguf",
                   "llama-3-instruct-bartowski-gguf",
                   "openhermes-mistral-7b-gguf",
                   "zephyr-7b-gguf",
                   "tiny-llama-chat-gguf"]

    model_name = chat_models[0]
    simple_chat_ui_app(model_name)

Model List: This list contains the identifiers of various chat models available through the llmware library. These models are pre-trained and configured for generating conversational responses.

Purpose: One of these models is selected to power the chat interface. The script uses the first entry, chat_models[0]; changing the index swaps in a different model, which gives you the flexibility to choose based on capabilities or performance characteristics.
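If you would rather pick the model when launching the app than edit the index, one possible approach is to read the name from the command line. This is only a sketch: choose_model is a hypothetical helper, and it assumes that arguments placed after a -- separator in the streamlit run command are passed through to the script in sys.argv.

```python
import sys

CHAT_MODELS = [
    "phi-3-gguf",
    "llama-2-7b-chat-gguf",
    "llama-3-instruct-bartowski-gguf",
    "openhermes-mistral-7b-gguf",
    "zephyr-7b-gguf",
    "tiny-llama-chat-gguf",
]


def choose_model(args, available=CHAT_MODELS):
    """Return the model named in args[0] if it is in the list, else the first model."""
    if args and args[0] in available:
        return args[0]
    return available[0]


# Falls back to "phi-3-gguf" when no recognized model name is supplied.
model_name = choose_model(sys.argv[1:])
```

Falling back to the first entry keeps the default behavior identical to the original script when no argument is given.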


Fully Integrated Code 📄

To run the app, enter the following in the command line:
streamlit run "path/to/gguf_streaming_chatbot.py"

import streamlit as st
from llmware.models import ModelCatalog
from llmware.gguf_configs import GGUFConfigs

GGUFConfigs().set_config("max_output_tokens", 500)


def simple_chat_ui_app(model_name):

    st.title(f"Simple Chat with {model_name}")

    # Load the selected model with the sampling settings described above.
    model = ModelCatalog().load_model(model_name, temperature=0.3, sample=True, max_output=450)

    # Create the chat history on the first run of the session.
    if "messages" not in st.session_state:
        st.session_state.messages = []

    # Redraw the conversation so far on every rerun.
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

    prompt = st.chat_input("Say something")
    if prompt:

        # Show the user's message.
        with st.chat_message("user"):
            st.markdown(prompt)

        # Stream the model's reply and keep the full text.
        with st.chat_message("assistant"):

            bot_response = st.write_stream(model.stream(prompt))

        # Save both sides of the exchange to the history.
        st.session_state.messages.append({"role": "user", "content": prompt})
        st.session_state.messages.append({"role": "assistant", "content": bot_response})

    return 0


if __name__ == "__main__":

    chat_models = ["phi-3-gguf",
                   "llama-2-7b-chat-gguf",
                   "llama-3-instruct-bartowski-gguf",
                   "openhermes-mistral-7b-gguf",
                   "zephyr-7b-gguf",
                   "tiny-llama-chat-gguf"]

    model_name = chat_models[0]

    simple_chat_ui_app(model_name)

You can also find the fully integrated code in our GitHub repo.


Conclusion 🔒

As we have demonstrated, deploying a local chatbot can be achieved with just 20 lines of code. This simplicity, combined with the enhanced privacy and control, makes local LLMs an attractive option for users seeking powerful and secure chatbot solutions.

Thank you for exploring this topic with us. We hope you are now equipped with the knowledge to deploy your own local chatbot and harness its full potential.

Please check out our GitHub and leave a star! https://github.com/llmware-ai/llmware

Follow us on Discord here: https://discord.gg/MgRaZz2VAB
