Debapriya Das

How to Run Llama-3.1🦙 Locally Using Python🐍 and Hugging Face 🤗

Introduction

The latest Llama🦙 (Large Language Model Meta AI) 3.1 is a powerful AI model developed by Meta AI that has gained significant attention in the natural language processing (NLP) community. It is one of the most capable open-source LLMs to date. In this blog, I will guide you through the process of cloning the Llama 3.1 model from Hugging Face🤗 and running it on your local machine using Python. Afterwards, you can integrate it into any AI project.


Prerequisites

  • Python 3.8 or higher installed on your local machine
  • Hugging Face Transformers library installed (pip install transformers)
  • Git installed on your local machine
  • A Hugging Face account

Step 1: Get access to the model

Meta-llama-3.1-8b-Instruct hugging face model

  • At the beginning, you should see this:

Meta-llama-3.1-8b-Instruct model

  • Submit the form below to get access to the model

access to meta llama 3.1 model

  • Once you see "You have been granted access to this model", you are good to go...

gated model in hugging face

Step 2: Create an ACCESS_TOKEN

  • Go to "Settings" (Bottom right corner of the below image):

hugging face settings

  • Go to "Access Tokens" and click "Create new token" (upper right corner of the image):

create hugging face token

  • Give read and write permissions and select the repo as shown:

create hugging face token

  • Copy the token and store it somewhere safe and secure, as it will be needed later. (Note: the token is shown only once; if you lose it, you will have to create a new one.)

huggingface token
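A safer way to handle the token than pasting it into commands or scripts is to export it as an environment variable and read it from Python. (The variable name HF_TOKEN below is just a convention I chose for this sketch, not something the tutorial's code requires.)

```python
import os

# Assumes you exported the token beforehand, e.g.:
#   export HF_TOKEN=hf_xxxxxxxx    (Linux/macOS)
#   setx HF_TOKEN hf_xxxxxxxx      (Windows)
access_token = os.environ.get("HF_TOKEN", "")

if access_token:
    print("Token loaded from environment.")
else:
    print("HF_TOKEN is not set; export it before cloning or logging in.")
```

This keeps the token out of your shell history and source files.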


Step 3: Clone the LLaMA 3.1 Model

Now run the following command in your favorite terminal.
Replace <ACCESS_TOKEN> with the token you copied and <huggingface-user-name> with the username of your Hugging Face account.

git clone https://<huggingface-user-name>:<ACCESS_TOKEN>@huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct

This can take a lot of time depending on your internet speed.
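Note: Hugging Face model repositories store the large weight files with Git LFS. If the clone finishes suspiciously fast or leaves behind only tiny pointer files instead of multi-gigabyte weights, install Git LFS and clone again:

```shell
# Enable Git LFS once per machine, then re-run the clone
git lfs install
git clone https://<huggingface-user-name>:<ACCESS_TOKEN>@huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct
```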

Step 4: Install Required Libraries

Once the cloning is done, go to the cloned folder and install all the dependencies from the requirements.txt. (You can create a virtual environment using conda (recommended) or virtualenv.)
You can find the requirements file in my GitHub repository, linked in the resources section below.

Using conda:

cd Meta-Llama-3.1-8B-Instruct
conda install --yes --file requirements.txt

Using pip:

cd Meta-Llama-3.1-8B-Instruct
pip install -r requirements.txt

Step 5: Run the Llama 3.1 Model

Create a new Python file (e.g., test.py) and paste in the location of the model repository you just cloned as the model_id (e.g., "D:\\Codes\\NLP\\Meta-Llama-3.1-8B-Instruct"). Here is an example:

import transformers
import torch

# Path to the model repository you just cloned
model_id = "D:\\Codes\\NLP\\Meta-Llama-3.1-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",  # place the model on GPU/CPU automatically
)

messages = [
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)

# The last element of generated_text is the assistant's reply
print(outputs[0]["generated_text"][-1])

You can set device_map="cuda" if you want to run the model explicitly on the GPU ("auto" will already use a GPU when one is available).
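Under the hood, the pipeline applies Llama 3.1's chat template to the messages list before generation. As a rough illustration of what that prompt looks like (a hand-rolled sketch; in real code prefer the tokenizer's apply_chat_template method, which handles this for you):

```python
# Sketch of Llama 3.1's chat prompt format, built by hand for illustration.
def format_llama31_prompt(messages):
    prompt = "<|begin_of_text|>"
    for msg in messages:
        prompt += f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
        prompt += f"{msg['content']}<|eot_id|>"
    # Trailing assistant header cues the model to generate its reply
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

messages = [{"role": "user", "content": "Who are you?"}]
print(format_llama31_prompt(messages))
```

Seeing the raw format helps when debugging odd outputs: if the special tokens are missing or malformed, the model tends to ramble instead of answering as an assistant.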

Step 6: Run the Python Script

python test.py

Output

llama-3.1 output


Issues you can face

  • OSError: [WinError 126] fbgemm.dll
    • To solve this error make sure you have Visual Studio installed.
      • In case you don't have it, click here and install it.
      • Then restart the computer.
  • If there are still errors related to PyTorch versions, use Anaconda or Miniconda to configure a new environment with a suitable Python version and dependencies.
  • If you are facing any other issue or error feel free to comment below.
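For the environment fix above, a typical recipe looks like this (the environment name llama31 and Python 3.10 are arbitrary choices of mine; adjust to your setup):

```shell
conda create -n llama31 python=3.10 -y
conda activate llama31
pip install -r requirements.txt
```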

Resources

For more details on Llama 3.1, check out: https://ai.meta.com/blog/meta-llama-3-1/

My implementation: https://github.com/Debapriya-source/llama-3.1-8B-Instruct.git


Conclusion

In this blog, we have successfully cloned the LLaMA-3.1-8B-Instruct model from Hugging Face and run it on our local machine using Python. You can now experiment with the model by modifying the prompt, adjusting hyperparameters, or integrating it with your upcoming projects. Happy coding!

Top comments (9)

Meir Meir

Great article, Mr. Das. Can you please explain how to import/load my data into the model so I can query the model based on my data?

Debapriya Das

If I have understood your question properly, you are asking how to fine-tune the model using your custom dataset. In that case, you can explore Hugging Face AutoTrain.
Hopefully I will also write an article on this topic. Best of luck.

Martin Baun

Love this! Still can't believe it's happening for real 🤗

Shrijal Acharya

This is really great, buddy. 👏 I am planning to work on a project on LLaMA using Golang and hopefully write an article. Let's see how it goes.

Debapriya Das

Best of luck!

Meir Meir

Mr. Das, I just cloned the repo and there is no requirements file...
Can you confirm?

Debapriya Das

It should be in the cloned folder
You can use "cd Meta-Llama-3.1-8B-Instruct" to get into the folder.

Meir Meir

I did that of course and the file is not there...

Debapriya Das

Hey, I checked it, and for some reason it is not there.
No worries, you can check out my repository; here is the "requirements.txt" that I have used: github.com/Debapriya-source/llama-...