Stefan Alfbo
First steps with llama

Time to try out some coding with generative AI and a large language model (LLM).

I had one requirement for this: it had to run locally on my machine, not through a web API like the one provided by OpenAI.

While looking for a way to do that, I stumbled over the llama.cpp project by Georgi Gerganov, which also has bindings to a couple of other programming languages.

I went with the Python binding, llama-cpp-python, since my goal is just to get a small project up and running locally.

Set up the project.

# Create a folder for the project
mkdir test-llama-cpp-python && cd $_

# Create a virtual environment
pyenv virtualenv 3.9.0 llm
pyenv activate llm

# Install the llama-cpp-python package
python -m pip install llama-cpp-python
python -m pip freeze > requirements.txt

# Create an empty main.py
touch main.py

# Open up the project in VS Code
code .

In the main file, add a simple skeleton prompt loop.

import os

def get_reply(prompt):
    """Local inference with llama-cpp-python"""
    return ""

def clear():
    """Clears the terminal screen."""
    os.system('cls' if os.name == 'nt' else 'clear')

def main():
    """The prompt loop."""
    clear()

    while True:
        cli_prompt = input("You: ")

        if cli_prompt == "exit":
            break
        else:
            answer = get_reply(cli_prompt)

            print(f"""Llama: {answer}""")


if __name__ == '__main__':
    main()

From the examples on GitHub we can see that we need to import the Llama class into our main file, and that we will also need a model.

Hugging Face, a popular AI community, is a good place to find models to use. There is one requirement on the model file: it has to be in the GGML file format. The llama.cpp GitHub project provides a converter for models that are not, as sketched below.
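For reference, the conversion is roughly this (a sketch based on the llama.cpp tooling at the time; the convert.py script and the quantize binary live in the llama.cpp repository, and the models/7B path is just a placeholder):

# Convert an original (PyTorch/Hugging Face) LLaMA model to GGML in f16
python convert.py models/7B/

# Optionally quantize the result down to 4 bits to save memory
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_1.bin q4_1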

However, I searched for a model that was already in that format and took the first one I found, TheBloke/Llama-2-7B-Chat-GGML. In the end I downloaded this model file: llama-2-7b-chat.ggmlv3.q4_1.bin.
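If you want to script the download, the file can be fetched directly from the model's Hugging Face repository (the URL follows Hugging Face's resolve/main pattern; verify it against the model card):

# Download the quantized chat model into the project folder
wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q4_1.bin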

Picking the correct/best model is a topic of its own and out of scope for this post.

When the model has been downloaded to the project folder, we can update our main.py file to start using the Llama class and the model.

from llama_cpp import Llama

llama = Llama(model_path='llama-2-7b-chat.ggmlv3.q4_1.bin', verbose=False)

def get_reply(prompt):
    """Local inference with llama-cpp-python"""
    response = llama(
        f"""Q: {prompt} A:""", max_tokens=64, stop=["Q:", "\n"], echo=False
    )

    return response["choices"].pop()["text"].strip()

We first import the Llama class and initialize a Llama object. The constructor needs the path to our model file, which is given with model_path. I'm also setting the verbose flag to False to suppress noisy log messages from the llama.cpp library.
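The constructor accepts more knobs than model_path and verbose. Here are a few from the llama-cpp-python API that can be worth knowing about (the values below are just examples, not recommendations):

llama = Llama(
    model_path="llama-2-7b-chat.ggmlv3.q4_1.bin",
    n_ctx=512,      # size of the context window, in tokens
    n_threads=4,    # number of CPU threads used for inference
    seed=42,        # RNG seed, for reproducible sampling
    verbose=False,  # suppress log output from llama.cpp
)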

The get_reply function does all the local inference with the llama-cpp-python package. The prompt to generate text from needs to be formatted in a specific way, which is why the Q: and A: markers are added.
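The object returned by calling llama(...) mimics the OpenAI completion format, which is why the text is dug out of the choices list. Roughly, it looks like this (a sketch; the field values are made up):

response = {
    "id": "cmpl-...",
    "object": "text_completion",
    "created": 1693000000,
    "model": "llama-2-7b-chat.ggmlv3.q4_1.bin",
    "choices": [
        {
            "text": " The planets are ...",
            "index": 0,
            "logprobs": None,
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 16, "completion_tokens": 24, "total_tokens": 40},
}

Note that with max_tokens=64 and "\n" in the stop list, the answer is cut off after 64 tokens or at the first newline, whichever comes first.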

Here is the final result of the code.

import os

from llama_cpp import Llama

llama = Llama(model_path="llama-2-7b-chat.ggmlv3.q4_1.bin", verbose=False)


def get_reply(prompt):
    """Local inference with llama-cpp-python"""
    response = llama(
        f"""Q: {prompt} A:""", max_tokens=64, stop=["Q:", "\n"], echo=False
    )

    return response["choices"].pop()["text"].strip()


def clear():
    """Clears the terminal screen."""
    os.system("cls" if os.name == "nt" else "clear")


def main():
    """The prompt loop."""
    clear()

    while True:
        cli_prompt = input("You: ")

        if cli_prompt == "exit":
            break
        else:
            answer = get_reply(cli_prompt)

            print(f"""Llama: {answer}""")


if __name__ == "__main__":
    main()


Test run it by executing the following in your CLI.

python main.py

And ask a question; remember that typing exit will close the prompt.

You: What are the names of the planets in the solar system?
Llama: The planets in our solar system, in order from closest to farthest from the Sun, are: Mercury Venus Earth Mars Jupiter Saturn Uranus Neptune
You: exit

Until next time!
