Maxim Saplin

Llama 3 8B is better than Llama 2 70B

Llama 3 has just been rolled out, exactly 9 months after the release of Llama 2. It is already available for chat on Meta's web site and can be downloaded from Hugging Face in safetensors or GGUF format.
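
For the Hugging Face route, the safetensors checkpoints load with the stock transformers API. Below is a minimal sketch assuming the gated meta-llama/Meta-Llama-3-8B-Instruct repo; you have to accept Meta's license on Hugging Face and authenticate (e.g. `huggingface-cli login`) before the download will work.

```python
# Minimal sketch: loading the Llama 3 8B instruct checkpoint with transformers.
# Assumes access to the gated "meta-llama/Meta-Llama-3-8B-Instruct" repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # keeps the 8B model in roughly 16 GB of memory
    device_map="auto",
)

# The instruct variant expects its chat template, which the tokenizer applies.
messages = [{"role": "user", "content": "What changed between Llama 2 and Llama 3?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```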

While the previous generation was trained on a dataset of 2 trillion tokens, the new one was trained on 15 trillion tokens.

What is fascinating is how the smaller 8B version outperforms the bigger previous-gen 70B model in every benchmark listed on the model card:

| Benchmark | Llama 3 8B | Llama 2 7B | Llama 2 13B | Llama 3 70B | Llama 2 70B |
|---|---|---|---|---|---|
| GPQA (0-shot) | 34.2 | 21.7 | 22.3 | 39.5 | 21.0 |
| HumanEval (0-shot) | 62.2 | 7.9 | 14.0 | 81.7 | 25.6 |
| GSM-8K (8-shot, CoT) | 79.6 | 25.7 | 77.4 | 93.0 | 57.5 |
| MATH (4-shot, CoT) | 30.0 | 3.8 | 6.7 | 50.4 | 11.6 |
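
If you would rather sanity-check one of these numbers than take the model card's word for it, open benchmarks like GSM-8K can be approximated with EleutherAI's lm-evaluation-harness. This is only a sketch under my own assumptions: Meta has not published the exact evaluation setup behind the card, so the prompting differs and the score will not match the table exactly.

```python
# Sketch: scoring the base Llama 3 8B on GSM-8K with lm-evaluation-harness
# (pip install lm-eval). Assumes access to the gated meta-llama repo; the
# harness's prompt format is not Meta's, so treat the result as approximate.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Meta-Llama-3-8B,dtype=bfloat16",
    tasks=["gsm8k"],
    num_fewshot=8,
    batch_size=8,
)
print(results["results"]["gsm8k"])
```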

Llama 3 has also upped the context window size from 4k to 8k tokens.
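
The context length is easy to confirm on a downloaded checkpoint, since it is exposed in the model config. A small sketch, again assuming the gated meta-llama repo ids:

```python
# Sketch: reading the trained context window straight from the model configs.
from transformers import AutoConfig

for model_id in ["meta-llama/Llama-2-7b-hf", "meta-llama/Meta-Llama-3-8B"]:
    config = AutoConfig.from_pretrained(model_id)
    # max_position_embeddings: 4096 for Llama 2, 8192 for Llama 3
    print(model_id, config.max_position_embeddings)
```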
