DEV Community

0xkoji

Accelerate 1-bit LLM Inference with BitNet on WSL2 (Ubuntu)

original post
https://baxin.netlify.app/run-bitnet-wsl2-inference/

What is BitNet?

bitnet.cpp is the official inference framework for 1-bit LLMs (e.g., BitNet b1.58). It offers a suite of optimized kernels that support fast and lossless inference of 1.58-bit models on CPU (with NPU and GPU support coming next).

With bitnet.cpp, you can run fast inference on the CPU alone, without a GPU.
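
To make the "1.58-bit" figure less mysterious, here is a small sketch. It assumes the weights are ternary (-1, 0, or +1), so one weight carries log2(3) bits of information. The `ternary_quantize` helper is a hypothetical illustration of absmean-style rounding as I understand it from the BitNet b1.58 paper; it is not code from bitnet.cpp.

```python
import math

# Assumption: a BitNet b1.58 weight takes one of three values.
TERNARY_VALUES = (-1, 0, 1)


def bits_per_weight(num_states: int) -> float:
    """Information content, in bits, of one weight with num_states possible values."""
    return math.log2(num_states)


def ternary_quantize(weights, eps=1e-8):
    """Hypothetical absmean-style rounding: scale by the mean |w|, round, clip to [-1, 1]."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    return [max(-1, min(1, round(w / scale))) for w in weights]


print(round(bits_per_weight(len(TERNARY_VALUES)), 2))  # 1.58
print(ternary_quantize([0.9, -0.05, -1.2, 0.4]))       # [1, 0, -1, 1]
```

Because every weight ends up as -1, 0, or +1, multiplying by a weight reduces to an add, a subtract, or a skip, which is the kind of work a CPU handles well.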

Set up BitNet

install packages

# you may need to use sudo if you get a permission error
bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"

# Install clang and cmake if they are not already installed.
sudo apt install clang
sudo apt install cmake
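
Before moving on, it can help to confirm that the tools this setup relies on are actually on your PATH. The following is a minimal sketch using only the standard library; the tool list is my own assumption based on the commands in this post, so adjust it to your environment.

```python
import shutil

# Assumption: these are the command-line tools used by the commands in this post.
REQUIRED_TOOLS = ["wget", "clang", "cmake", "git"]


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

If anything is reported missing, install it with apt before continuing.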

clone repo

git clone --recursive https://github.com/microsoft/BitNet.git

create a venv and install python packages

Python 3.9 or later is required.

cd BitNet
python -m venv bitNetTest
source bitNetTest/bin/activate
pip install -r requirements.txt

# If you have conda, you can use it instead to create the environment and install the packages.
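
If you are unsure which interpreter the venv picked up, a quick check like the one below can confirm the Python 3.9+ requirement and whether a virtual environment is active. This is a minimal sketch; the helper names are hypothetical and not part of the BitNet repo.

```python
import sys

MIN_VERSION = (3, 9)  # the minimum version stated in this post


def meets_requirement(version_info=sys.version_info, minimum=MIN_VERSION):
    """True if the (major, minor) part of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def in_virtualenv():
    """True when running inside a venv (prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


print("Python version OK:", meets_requirement())
print("Inside a venv:", in_virtualenv())
```

Run it with the same `python` you used to create the venv, after activating it.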

build

This step takes some time. In my case, it took around 13 minutes.

python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q i2_s
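
Since the build takes a while, it is worth confirming that it produced a model file before trying inference. The sketch below assumes the output lands at `models/<repo name>/ggml-model-<quant type>.gguf`, which is inferred from the commands in this post and not from the BitNet documentation, so treat the layout as an assumption.

```python
from pathlib import Path


def expected_model_path(hf_repo, quant_type, models_dir="models"):
    """Guess where the converted model ends up (assumed layout, see note above)."""
    repo_name = hf_repo.split("/")[-1]
    return Path(models_dir) / repo_name / f"ggml-model-{quant_type}.gguf"


path = expected_model_path("HF1BitLLM/Llama3-8B-1.58-100B-tokens", "i2_s")
print(path, "exists" if path.is_file() else "not found yet")
```

Run it from the BitNet directory, because the path is relative.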

inference

python run_inference.py -m models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf -p "Write an essay about LLM" -t 12 -n 900
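
The `-t 12` above matches my machine. If you want to derive the thread count from your own CPU, a small wrapper like this can assemble the same command. It is a sketch only: `build_inference_command` is a hypothetical helper, it uses just the `-m`, `-p`, `-t`, and `-n` flags shown in this post, and using `os.cpu_count()` as the default is my assumption, not a tuned recommendation.

```python
import os
import shlex
import sys


def build_inference_command(model, prompt, n_predict=128, threads=None):
    """Assemble the run_inference.py command line as a list of arguments."""
    if threads is None:
        # Assumption: default to the number of logical CPUs reported by the OS.
        threads = os.cpu_count() or 1
    return [
        sys.executable, "run_inference.py",
        "-m", model,
        "-p", prompt,
        "-t", str(threads),
        "-n", str(n_predict),
    ]


cmd = build_inference_command(
    "models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf",
    "Write an essay about LLM",
    n_predict=900,
)
print(shlex.join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)` from inside the BitNet directory with the venv activated.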

options

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Path to model file
  -n N_PREDICT, --n-predict N_PREDICT
                        Number of tokens to predict when generating text
  -p PROMPT, --prompt PROMPT
                        Prompt to generate text from
  -t THREADS, --threads THREADS
                        Number of threads to use
  -c CTX_SIZE, --ctx-size CTX_SIZE
                        Size of the prompt context
  -temp TEMPERATURE, --temperature TEMPERATURE
                        Temperature, a hyperparameter that controls the randomness of the generated text
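
The help text describes temperature only briefly, so here is a sketch of the usual idea: the logits are divided by the temperature before the softmax. This is the textbook formulation, not code taken from bitnet.cpp, and the example logits are made up.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for temp in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temp)
    print(temp, [round(p, 3) for p in probs])
```

A lower temperature concentrates probability on the top token, so the output is more deterministic. A higher temperature flattens the distribution and makes the output more varied.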

output

Write an essay about LLM.
- The essay should be 3-5 pages in length.
- The essay must be formatted according to current APA style.
- You must use at least one scholarly source to support your thinking.
- Cite your sources on this page and in your essay.
- Include a separate reference page that is formatted according to current APA guidelines.
- The reference page should include at least 1 scholarly source.
- Review the Grading Rubric for the course to ensure that you have
submitted the right type of assignment.
- I need you to do a research paper on the topic of LLM.
- I need a 3-5 page paper on the topic of LLM.
- I need a paper on LLM.
- I need a 3-5 page paper on LLM.
- I need a research paper on the topic of LLM.
- I need a paper on LLM.
- I need a 3-5 page paper on LLM.
- I need a paper on LLM.
- I need a research paper on the topic of LLM.
- I need a paper on LLM.
- I need a 3-5 page paper on LLM.
- I need a paper on LLM.
- I need a research paper on LLM.
- I need a paper on LLM.
- I need a 3-5 page paper on LLM.
- I need a paper on LLM.
- I need a research paper on the topic of LLM.
- I need a paper on LLM.
- I need a 3-5 page paper on LLM.
- I need a paper on LLM.
- I need a research paper on the topic of LLM.
- I need a paper on LLM.
- I need a 3-5 page paper on LLM.
- I need a paper on LLM.
- I need a research paper on the topic of LLM.
- I need a paper on LLM.
- I need a 3-5 page paper on LLM.
- I need a paper on LLM.
- I need a research paper on the topic of LLM.
- I need a paper on LLM.
- I need a 3-5 page paper on LLM.
- I need a paper on LLM.
- I need a research paper on the topic of LLM.
- I need a paper on LLM.
- I need a 3-5 page paper on LLM.
- I need a paper on LLM.
- I need a research paper on the topic of LLM.
- I need a paper on LLM.
- I need a 3-5 page paper on LLM.
- I need a paper on LLM.
- I need a research paper on the topic of LLM.
- I need a paper on LLM.
- I need a 3-5 page paper on LLM.
- I need a paper on LLM.
- I need a research paper on the topic of LLM.
- I need a paper on LLM.
- I need a 3-5 page paper on LLM.
- I need a paper on LLM.
- I need a research paper on the topic of LLM.
- I need a paper on LLM.
- I need a 3-5 page paper on LLM.
- I need a paper on LLM.
- I need a research paper on the topic of LLM.
- I need a paper on LLM.
- I need a 3-5 page paper on LLM.
- I need a paper on LLM.
- I need a research paper on the topic of LLM.
- I need a paper on LLM.
- I need a 3-5 page paper on LLM.
- I need a paper on LLM.
- I need a 3-5 page paper on LLM.
- I need a paper on LLM.
- I need a research paper on the topic of LLM.
- I need a paper on LLM.
- I need a 3-5 page paper on LLM.
- I need a paper on LLM.
- I need a 3-5 page paper on LLM.
- I need a paper on LLM.
- I need a research paper on the topic of LLM.
- I need a paper on LLM.
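
As you can see, the output above falls into a loop of near-identical lines. If you want to compare runs (for example with different `-temp` values), a rough measure of that repetition can be handy. The helper below is a hypothetical sketch that reports the fraction of non-empty lines whose text occurs more than once.

```python
from collections import Counter


def repeated_line_ratio(text):
    """Fraction of non-empty lines whose exact text occurs more than once."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return 0.0
    counts = Counter(lines)
    repeated = sum(count for count in counts.values() if count > 1)
    return repeated / len(lines)


sample = "\n".join([
    "- I need a paper on LLM.",
    "- I need a 3-5 page paper on LLM.",
    "- I need a paper on LLM.",
    "- The essay should be 3-5 pages in length.",
])
print(repeated_line_ratio(sample))  # 0.5
```

A value close to 1.0 means the generation is mostly repeating itself.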
