DEV Community

Faiq Ahsan


Build your own AI ChatBot on your machine

By now everyone knows and loves ChatGPT, and generative AI has taken the world by storm. But did you know that you can now build and run your own custom AI chatbot on your machine?


YES! Let's take a look at the ingredients for this recipe.

Python

If you are someone looking to dig deep into AI/ML, you need to learn Python, the go-to programming language in this space. If you already know it, you are all set; otherwise, I would suggest going through a Python crash course or whatever suits you best. Also make sure that you have python3 installed on your system.

Ollama

Ollama is an awesome open source package that provides a handy and easy way to run large language models locally. We'll use it to download and run the 8B version of Llama3.

Gradio

Gradio is the fastest way to demo your machine learning model with a friendly web interface so that anyone can use it.

Okay, now let's start!

Step 1: Installing Ollama

Download and install the Ollama package on your machine. Once installed, run the command below to pull the Llama3 8B version.

ollama pull llama3

By default it downloads the 8B version. If you want to run another version, such as 70B, simply append the tag after the name, e.g. llama3:70b. Check out the complete list here.

Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.

Step 2: Creating a custom model from Llama3

Open up a code editor, create a file named Modelfile, and paste the content below into it.

FROM llama3

## Set the Temperature

PARAMETER temperature 1

PARAMETER top_p 0.5

PARAMETER top_k 10

PARAMETER mirostat_tau 4.0

## Set the system prompt

SYSTEM """
You are a personal AI assistant named as Ultron created by Tony Stark. Answer and help around all the questions being asked.
"""


Parameters

Parameters dictate how your model responds.

temperature: The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8)

top_p: Works together with top_k. A higher value (e.g. 0.95) leads to more diverse text, while a lower value (e.g. 0.5) generates more focused and conservative text. (Default: 0.9)

top_k: Reduces the probability of generating nonsense. A higher value (e.g. 100) gives more diverse answers, while a lower value (e.g. 10) is more conservative. (Default: 40)

mirostat_tau: Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0)

Check out all the available parameters and their purpose here.
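To build intuition for what temperature does, here is a small illustrative sketch (plain Python, not Ollama code) of how dividing raw token scores by a temperature reshapes a sampling distribution:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature.

    A low temperature sharpens the distribution (the top token dominates,
    so answers are more predictable); a high temperature flattens it
    (more randomness, so answers feel more creative).
    """
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.5))  # sharp: top token dominates
print(softmax_with_temperature(logits, 2.0))  # flat: tokens are more even
```

The same intuition applies to top_k and top_p, which instead cut off the low-probability tail before sampling.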

System prompt

Here you can play around and give any name and personality to your chatbot.

Now let's create the custom model from the Modelfile by running the commands below. Provide a name of your choice, e.g. ultron.

ollama create ultron -f ./Modelfile
ollama run ultron

You should see ultron running and ready to accept an input prompt. Ollama also exposes a REST API for running and managing models, so once your model is running it is also available at the endpoint below:

http://localhost:11434/api/generate

We will be using this api to integrate with our Gradio chatbot UI.
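If you want a quick sanity check of the API before wiring up the UI, here is a minimal sketch using only the standard library (it assumes your custom model is named ultron, as created above):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_body(model, prompt):
    """Build the JSON body the /api/generate endpoint expects."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask(model, prompt):
    """POST a prompt to the local Ollama server and return the answer text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_body(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

# With the Ollama server running, try:
# print(ask("ultron", "Who created you?"))
```

Setting "stream" to False returns a single JSON object instead of a stream of partial responses, which keeps the client code simple.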

Step 3: Creating the chatbot UI

Initialize a python virtual environment by running the below commands.

python3 -m venv env
source env/bin/activate

Now install the required packages

pip install requests gradio

Now create a Python file app.py and paste the code below into it.

import requests
import json
import gradio as gr

model_api = "http://localhost:11434/api/generate"

headers = {"Content-Type": "application/json"}

history = []


def generate_response(prompt):
    # Keep a running history so the model sees earlier prompts as context
    history.append(prompt)
    final_prompt = "\n".join(history)
    data = {
        "model": "ultron",
        "prompt": final_prompt,
        "stream": False,
    }
    response = requests.post(model_api, headers=headers, data=json.dumps(data))
    if response.status_code == 200:  # request succeeded
        return response.json()["response"]
    else:
        print("error:", response.text)
        return f"Error: {response.text}"

interface = gr.Interface(
    title="Ultron: Your personal assistant",
    fn=generate_response,
    inputs=gr.Textbox(lines=4, placeholder="How can I help you today?"),
    outputs="text",
)
interface.launch(share=True)

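One limitation of the simple history in app.py is that it stores only your prompts, not the model's replies. A small illustrative sketch (not tied to any library) of a fuller transcript format:

```python
def format_transcript(turns):
    """Join (speaker, text) pairs into one prompt string so the model
    also sees its own earlier answers as context."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)

turns = [
    ("User", "What's your name?"),
    ("Assistant", "I am Ultron."),
    ("User", "Who created you?"),
]
print(format_transcript(turns))
```

Feeding the assistant's earlier answers back in helps keep multi-turn conversations coherent.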

Now let's launch the app: run your Python file with python3 app.py and your chatbot will be live at the endpoint below (or similar). Please note that the response time may vary depending on your system's computing power.

http://127.0.0.1:7860/


There you have it! Your own chatbot running locally on your machine; it will keep working even if you turn off the internet. Please share in the comments what other cool apps you are building with AI models.
