This post is a combination of some new results, old results, and reddit.com/u/invectorgator's results (with permission) to help give a clear picture of all testing so far. Links to the relevant posts can be found below.
This was a lot of fun, and has lit a fire under me about benchmarking. I have some ideas for a personal benchmarking tool using Wilmer that will be easier for me to run. Will share more info once I dig into it.
As usual, a few notes about the tests:
- These tests were performed using u/chibop1's MMLU-Pro project. Be sure to swing by and thank them for giving us this fun toy. (A rough sketch of what a harness like this does is included after these notes.)
- With the permission of u/invectorgator, this post will combine all of our results together.
- We both used the same commits of the MMLU-Pro project, only q8 GGUFs (unless otherwise specified), and Text-Generation-WebUI as our backend to guarantee correct prompt templating, so our test results are compatible.
 
- I didn't do these tests expecting them to be super scientific or perfectly accurate assessments of an LLM's knowledge, and I understand the concerns people have about them. But they do test a combination of knowledge AND instruction following. They aren't perfect, but they're better than just perplexity testing.
- Invectorgator is covering Gemma, so I'm not including it here.
- Qwen 2 7b just really does not like this test, at least when running in Text-Generation-WebUI.
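
For anyone curious what the harness is doing under the hood, here is a rough, hypothetical sketch of the flow. This is not the MMLU-Pro project's actual code; the endpoint URL, prompt wording, and answer-extraction regex are all my assumptions. The idea is simply that each multiple-choice question goes to the backend's OpenAI-compatible API and the answer letter gets pulled from the reply:

```python
# Illustrative sketch only -- NOT the MMLU-Pro project's actual code.
# It assumes Text-Generation-WebUI is running with its OpenAI-compatible API
# enabled (the URL below is an assumed default) and shows the general shape
# of asking one multiple-choice question and pulling out the answer letter.
import re
import requests

API_URL = "http://127.0.0.1:5000/v1/chat/completions"  # assumed default port

def ask_question(question, options):
    """Send one MMLU-Pro-style question and return the raw model reply."""
    lettered = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
    prompt = (
        f"{question}\n{lettered}\n\n"
        "Think step by step, then finish with: The answer is (X)."
    )
    resp = requests.post(API_URL, json={
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
        "temperature": 0.0,  # keep scoring as deterministic as possible
    }, timeout=600)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def extract_answer(reply):
    """Pull the final answer letter out of the reply, if the model gave one."""
    match = re.search(r"answer is \(?([A-J])\)?", reply, re.IGNORECASE)
    return match.group(1).upper() if match else None

# The real harness loops over the whole dataset and tallies correct/total
# per category, which is what the tables below report.
reply = ask_question("2 + 2 = ?", ["3", "4", "5", "22"])
print(extract_answer(reply) == "B")
```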
New Models In This Test
This run adds the following new models to the pile; I went with some of my personal favorite fine-tunes. You can find the exact GGUFs that I used below, and see the linked posts for the exact GGUFs used for the other models:
Old Posts Combined Into This One:
Key Takeaway
I am now convinced that Hermes 2 Theta Llama 3 8b is secretly a 30b in disguise. To say it is punching above its weight is an understatement.
All tests below were run on GGUFs (q8 unless otherwise noted) in Text-Generation-WebUI. The tests require more than 4096 tokens of context, so some model versions were chosen to fit that need.
Line breaks are for loose grouping.
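
If you want to reproduce a run outside of Text-Generation-WebUI, keep in mind that the context window is set when the model is loaded. A minimal llama-cpp-python sketch (not the setup used for these tests; the model path is just a placeholder):

```python
# Minimal llama-cpp-python sketch -- only to illustrate that the context
# window has to be raised past 4096 at load time for these prompts.
from llama_cpp import Llama

llm = Llama(
    model_path="Hermes-2-Theta-Llama-3-8B-Q8_0.gguf",  # placeholder path
    n_ctx=8192,       # MMLU-Pro prompts here need > 4096 tokens of context
    n_gpu_layers=-1,  # offload all layers to the GPU if it fits
)
```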
  
  
  Business
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 277/789 | 35.11 | 
| Open-Hermes-2.5-7b | 285/789 | 36.12 | 
| Mistral-7b-Inst-v0.3-q8 | 265/789 | 33.59 | 
| Llama-3-8b-q4_K_M | 148/789 | 18.76 | 
| Llama-3-8b-q8 | 160/789 | 20.28 | 
| Llama-3-8b-SPPO-Iter-3 | 247/789 | 31.31 | 
| Hermes-2-Theta-Llama-3-8b | 330/789 | 41.83 | 
| Yi-1.5-9b-32k-q8 | 240/789 | 30.42 | 
| Phi-Medium-128k-q8 | 260/789 | 32.95 | 
| Mixtral-8x7b-Instruct-Q8 | 310/789 | 39.29 | 
| Dolphin-Mixtral-2.5-8x7b | 350/789 | 44.36 | 
| Nous-Capybara-34b | 313/789 | 39.67 | 
| Yi-1.5-34B-32K-Q8 | 325/789 | 41.19 | 
| Command-R-v01-Q8 | 126/789 | 15.97 | 
| Llama-3-70b-FP16-Q2_KXXS | 254/789 | 32.19 | 
| Llama-3-70b-FP16-Q2_K | 309/789 | 39.16 | 
| Llama-3-70b-FP16-Q4_K_M | 427/789 | 54.12 | 
| Llama-3-70b-FP16-Q5_K_M | 415/789 | 52.60 | 
| Llama-3-70b-FP16-Q6_K | 408/789 | 51.71 | 
| Llama-3-70b-FP16-Q8_0 | 411/789 | 52.09 | 
  
  
  Law
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 282/1101 | 25.61 | 
| Open-Hermes-2.5-7b | 260/1101 | 23.61 | 
| Mistral-7b-Inst-v0.3-q8 | 248/1101 | 22.52 | 
| Yi-1.5-9b-32k-q8 | 191/1101 | 17.35 | 
| Phi-Medium-128k-q8 | 255/1101 | 23.16 | 
| Llama-3-8b-q4_K_M | 161/1101 | 14.62 | 
| Llama-3-8b-q8 | 172/1101 | 15.62 | 
| Llama-3-8b-SPPO-Iter-3 | 200/1101 | 18.17 | 
| Hermes-2-Theta-Llama-3-8b | 280/1101 | 25.43 | 
| Mixtral-8x7b-Instruct-Q8 | 282/1101 | 25.61 | 
| Dolphin-Mixtral-2.5-8x7b | 271/1101 | 24.61 | 
| Nous-Capybara-34b | 369/1101 | 33.51 | 
| Yi-1.5-34B-32K-Q8 | 417/1101 | 37.87 | 
| Command-R-v01-Q8 | 146/1101 | 13.26 | 
| Llama-3-70b-FP16-Q2_KXXS | 362/1101 | 32.88 | 
| Llama-3-70b-FP16-Q2_K | 416/1101 | 37.78 | 
| Llama-3-70b-FP16-Q4_K_M | 471/1101 | 42.78 | 
| Llama-3-70b-FP16-Q5_K_M | 469/1101 | 42.60 | 
| Llama-3-70b-FP16-Q6_K | 469/1101 | 42.60 | 
| Llama-3-70b-FP16-Q8_0 | 464/1101 | 42.14 | 
  
  
  Psychology
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 430/798 | 53.88 | 
| Open-Hermes-2.5-7b | 434/798 | 54.39 | 
| Mistral-7b-Inst-v0.3-q8 | 343/798 | 42.98 | 
| Llama-3-8b-q4_K_M | 328/798 | 41.10 | 
| Llama-3-8b-q8 | 372/798 | 46.62 | 
| Llama-3-8b-SPPO-Iter-3 | 252/798 | 31.58 | 
| Hermes-2-Theta-Llama-3-8b | 452/798 | 56.64 | 
| Yi-1.5-9b-32k-q8 | 173/798 | 21.68 | 
| Phi-Medium-128k-q8 | 358/798 | 44.86 | 
| Mixtral-8x7b-Instruct-Q8 | 365/798 | 45.74 | 
| Dolphin-Mixtral-2.5-8x7b | 468/798 | 58.65 | 
| Nous-Capybara-34b | 474/798 | 59.40 | 
| Yi-1.5-34B-32K-Q8 | 510/798 | 63.91 | 
| Command-R-v01-Q8 | 131/798 | 16.42 | 
| Llama-3-70b-FP16-Q2_KXXS | 493/798 | 61.78 | 
| Llama-3-70b-FP16-Q2_K | 565/798 | 70.80 | 
| Llama-3-70b-FP16-Q4_K_M | 597/798 | 74.81 | 
| Llama-3-70b-FP16-Q5_K_M | 611/798 | 76.57 | 
| Llama-3-70b-FP16-Q6_K | 605/798 | 75.81 | 
| Llama-3-70b-FP16-Q8_0 | 605/798 | 75.81 | 
  
  
  Biology
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 427/717 | 59.55 | 
| Open-Hermes-2.5-7b | 417/717 | 58.16 | 
| Mistral-7b-Inst-v0.3-q8 | 390/717 | 54.39 | 
| Llama-3-8b-q4_K_M | 412/717 | 57.46 | 
| Llama-3-8b-q8 | 424/717 | 59.14 | 
| Llama-3-8b-SPPO-Iter-3 | 316/717 | 44.07 | 
| Hermes-2-Theta-Llama-3-8b | 453/717 | 63.18 | 
| Yi-1.5-9b-32k-q8 | 288/717 | 40.17 | 
| Phi-Medium-128k-q8 | 262/717 | 36.54 | 
| Mixtral-8x7b-Instruct-Q8 | 334/717 | 46.58 | 
| Dolphin-Mixtral-2.5-8x7b | 434/717 | 60.53 | 
| Nous-Capybara-34b | 473/717 | 65.97 | 
| Yi-1.5-34B-32K-Q8 | 521/717 | 72.66 | 
| Command-R-v01-Q8 | 138/717 | 19.25 | 
| Llama-3-70b-FP16-Q2_KXXS | 510/717 | 71.13 | 
| Llama-3-70b-FP16-Q2_K | 556/717 | 77.55 | 
| Llama-3-70b-FP16-Q4_K_M | 581/717 | 81.03 | 
| Llama-3-70b-FP16-Q5_K_M | 579/717 | 80.75 | 
| Llama-3-70b-FP16-Q6_K | 574/717 | 80.06 | 
| Llama-3-70b-FP16-Q8_0 | 581/717 | 81.03 | 
  
  
  Chemistry
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 246/1132 | 21.73 | 
| Open-Hermes-2.5-7b | 298/1132 | 26.33 | 
| Mistral-7b-Inst-v0.3-q8 | 265/1132 | 23.41 | 
| Llama-3-8b-q4_K_M | 163/1132 | 14.40 | 
| Llama-3-8b-q8 | 175/1132 | 15.46 | 
| Llama-3-8b-SPPO-Iter-3 | 236/1132 | 20.85 | 
| Hermes-2-Theta-Llama-3-8b | 330/1132 | 29.15 | 
| Yi-1.5-9b-32k-q8 | 270/1132 | 23.85 | 
| Phi-Medium-128k-q8 | 207/1132 | 18.29 | 
| Mixtral-8x7b-Instruct-Q8 | 338/1132 | 29.86 | 
| Dolphin-Mixtral-2.5-8x7b | 369/1132 | 32.60 | 
| Nous-Capybara-34b | 368/1132 | 32.51 | 
| Yi-1.5-34B-32K-Q8 | 350/1132 | 30.92 | 
| Command-R-v01-Q8 | 129/1132 | 11.40 | 
| Llama-3-70b-FP16-Q2_KXXS | 331/1132 | 29.24 | 
| Llama-3-70b-FP16-Q2_K | 378/1132 | 33.39 | 
| Llama-3-70b-FP16-Q4_K_M | 475/1132 | 41.96 | 
| Llama-3-70b-FP16-Q5_K_M | 493/1132 | 43.55 | 
| Llama-3-70b-FP16-Q6_K | 461/1132 | 40.72 | 
| Llama-3-70b-FP16-Q8_0 | 502/1132 | 44.35 | 
  
  
  History
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 143/381 | 37.53 | 
| Open-Hermes-2.5-7b | 148/381 | 38.85 | 
| Mistral-7b-Inst-v0.3-q8 | 120/381 | 31.50 | 
| Llama-3-8b-q4_K_M | 82/381 | 21.52 | 
| Llama-3-8b-q8 | 94/381 | 24.67 | 
| Llama-3-8b-SPPO-Iter-3 | 70/381 | 18.37 | 
| Hermes-2-Theta-Llama-3-8b | 155/381 | 40.68 | 
| Yi-1.5-9b-32k-q8 | 69/381 | 18.11 | 
| Phi-Medium-128k-q8 | 119/381 | 31.23 | 
| Mixtral-8x7b-Instruct-Q8 | 116/381 | 30.45 | 
| Dolphin-Mixtral-2.5-8x7b | 155/381 | 40.68 | 
| Nous-Capybara-34b | 105/381 | 27.56 | 
| Yi-1.5-34B-32K-Q8 | 174/381 | 45.67 | 
| Command-R-v01-Q8 | 40/381 | 10.50 | 
| Llama-3-70b-FP16-Q2_KXXS | 174/381 | 45.67 | 
| Llama-3-70b-FP16-Q2_K | 213/381 | 55.91 | 
| Llama-3-70b-FP16-Q4_K_M | 232/381 | 60.89 | 
| Llama-3-70b-FP16-Q5_K_M | 231/381 | 60.63 | 
| Llama-3-70b-FP16-Q6_K | 231/381 | 60.63 | 
| Llama-3-70b-FP16-Q8_0 | 231/381 | 60.63 | 
  
  
  Other
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 375/924 | 40.58 | 
| Open-Hermes-2.5-7b | 392/924 | 42.42 | 
| Mistral-7b-Inst-v0.3-q8 | 327/924 | 35.39 | 
| Llama-3-8b-q4_K_M | 269/924 | 29.11 | 
| Llama-3-8b-q8 | 292/924 | 31.60 | 
| Llama-3-8b-SPPO-Iter-3 | 270/924 | 29.22 | 
| Hermes-2-Theta-Llama-3-8b | 429/924 | 46.43 | 
| Yi-1.5-9b-32k-q8 | 227/924 | 24.57 | 
| Phi-Medium-128k-q8 | 388/924 | 41.99 | 
| Mixtral-8x7b-Instruct-Q8 | 355/924 | 38.42 | 
| Dolphin-Mixtral-2.5-8x7b | 448/924 | 48.48 | 
| Nous-Capybara-34b | 451/924 | 48.81 | 
| Yi-1.5-34B-32K-Q8 | 481/924 | 52.06 | 
| Command-R-v01-Q8 | 131/924 | 14.18 | 
| Llama-3-70b-FP16-Q2_KXXS | 395/924 | 42.75 | 
| Llama-3-70b-FP16-Q2_K | 472/924 | 51.08 | 
| Llama-3-70b-FP16-Q4_K_M | 529/924 | 57.25 | 
| Llama-3-70b-FP16-Q5_K_M | 552/924 | 59.74 | 
| Llama-3-70b-FP16-Q6_K | 546/924 | 59.09 | 
| Llama-3-70b-FP16-Q8_0 | 556/924 | 60.17 | 
  
  
  Health
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 376/818 | 45.97 | 
| Open-Hermes-2.5-7b | 356/818 | 43.52 | 
| Mistral-7b-Inst-v0.3-q8 | 294/818 | 35.94 | 
| Llama-3-8b-q4_K_M | 216/818 | 26.41 | 
| Llama-3-8b-q8 | 263/818 | 32.15 | 
| Llama-3-8b-SPPO-Iter-3 | 229/818 | 28.00 | 
| Hermes-2-Theta-Llama-3-8b | 388/818 | 47.43 | 
| Yi-1.5-9b-32k-q8 | 227/818 | 27.75 | 
| Phi-Medium-128k-q8 | 349/818 | 42.67 | 
| Mixtral-8x7b-Instruct-Q8 | 325/818 | 39.73 | 
| Dolphin-Mixtral-2.5-8x7b | 367/818 | 44.87 | 
| Nous-Capybara-34b | 348/818 | 42.54 | 
| Yi-1.5-34B-32K-Q8 | 468/818 | 57.21 | 
| Command-R-v01-Q8 | 110/818 | 13.45 | 
| Llama-3-70b-FP16-Q2_KXXS | 406/818 | 49.63 | 
| Llama-3-70b-FP16-Q2_K | 502/818 | 61.37 | 
| Llama-3-70b-FP16-Q4_K_M | 542/818 | 66.26 | 
| Llama-3-70b-FP16-Q5_K_M | 551/818 | 67.36 | 
| Llama-3-70b-FP16-Q6_K | 546/818 | 66.75 | 
| Llama-3-70b-FP16-Q8_0 | 544/818 | 66.50 | 
  
  
  Economics
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 391/844 | 46.33 | 
| Open-Hermes-2.5-7b | 407/844 | 48.22 | 
| Mistral-7b-Inst-v0.3-q8 | 343/844 | 40.64 | 
| Llama-3-8b-q4_K_M | 307/844 | 36.37 | 
| Llama-3-8b-q8 | 309/844 | 36.61 | 
| Llama-3-8b-SPPO-Iter-3 | 249/844 | 29.50 | 
| Hermes-2-Theta-Llama-3-8b | 448/844 | 53.08 | 
| Yi-1.5-9b-32k-q8 | 290/844 | 34.36 | 
| Phi-Medium-128k-q8 | 369/844 | 43.72 | 
| Mixtral-8x7b-Instruct-Q8 | 415/844 | 49.17 | 
| Dolphin-Mixtral-2.5-8x7b | 462/844 | 54.74 | 
| Nous-Capybara-34b | 451/844 | 53.44 | 
| Yi-1.5-34B-32K-Q8 | 519/844 | 61.49 | 
| Command-R-v01-Q8 | 198/844 | 23.46 | 
| Llama-3-70b-FP16-Q2_KXXS | 494/844 | 58.53 | 
| Llama-3-70b-FP16-Q2_K | 565/844 | 66.94 | 
| Llama-3-70b-FP16-Q4_K_M | 606/844 | 71.80 | 
| Llama-3-70b-FP16-Q5_K_M | 623/844 | 73.82 | 
| Llama-3-70b-FP16-Q6_K | 614/844 | 72.75 | 
| Llama-3-70b-FP16-Q8_0 | 625/844 | 74.05 | 
  
  
  Math
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 379/1351 | 28.05 | 
| Open-Hermes-2.5-7b | 423/1351 | 31.31 | 
| Mistral-7b-Inst-v0.3-q8 | 399/1351 | 29.53 | 
| Llama-3-8b-q4_K_M | 202/1351 | 14.95 | 
| Llama-3-8b-q8 | 167/1351 | 12.36 | 
| Llama-3-8b-SPPO-Iter-3 | 392/1351 | 29.02 | 
| Hermes-2-Theta-Llama-3-8b | 509/1351 | 37.68 | 
| Yi-1.5-9b-32k-q8 | 370/1351 | 27.39 | 
| Phi-Medium-128k-q8 | 299/1351 | 22.13 | 
| Mixtral-8x7b-Instruct-Q8 | 475/1351 | 35.16 | 
| Dolphin-Mixtral-2.5-8x7b | 487/1351 | 36.04 | 
| Nous-Capybara-34b | 347/1351 | 25.68 | 
| Yi-1.5-34B-32K-Q8 | 467/1351 | 34.57 | 
| Command-R-v01-Q8 | 166/1351 | 12.29 | 
| Llama-3-70b-FP16-Q2_KXXS | 336/1351 | 24.87 | 
| Llama-3-70b-FP16-Q2_K | 436/1351 | 32.27 | 
| Llama-3-70b-FP16-Q4_K_M | 529/1351 | 39.16 | 
| Llama-3-70b-FP16-Q5_K_M | 543/1351 | 40.19 | 
| Llama-3-70b-FP16-Q6_K | 547/1351 | 40.49 | 
| Llama-3-70b-FP16-Q8_0 | 532/1351 | 39.38 | 
  
  
  Physics
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 344/1299 | 26.48 | 
| Open-Hermes-2.5-7b | 351/1299 | 27.02 | 
| Mistral-7b-Inst-v0.3-q8 | 338/1299 | 26.02 | 
| Llama-3-8b-q4_K_M | 168/1299 | 12.93 | 
| Llama-3-8b-q8 | 178/1299 | 13.70 | 
| Llama-3-8b-SPPO-Iter-3 | 312/1299 | 24.02 | 
| Hermes-2-Theta-Llama-3-8b | 417/1299 | 32.10 | 
| Yi-1.5-9b-32k-q8 | 321/1299 | 24.71 | 
| Phi-Medium-128k-q8 | 312/1299 | 24.02 | 
| Mixtral-8x7b-Instruct-Q8 | 442/1299 | 34.03 | 
| Dolphin-Mixtral-2.5-8x7b | 410/1299 | 31.56 | 
| Nous-Capybara-34b | 404/1299 | 31.10 | 
| Yi-1.5-34B-32K-Q8 | 483/1299 | 37.18 | 
| Command-R-v01-Q8 | 166/1299 | 12.78 | 
| Llama-3-70b-FP16-Q2_KXXS | 382/1299 | 29.41 | 
| Llama-3-70b-FP16-Q2_K | 478/1299 | 36.80 | 
| Llama-3-70b-FP16-Q4_K_M | 541/1299 | 41.65 | 
| Llama-3-70b-FP16-Q5_K_M | 565/1299 | 43.49 | 
| Llama-3-70b-FP16-Q6_K | 550/1299 | 42.34 | 
| Llama-3-70b-FP16-Q8_0 | 544/1299 | 41.88 | 
  
  
  Computer Science
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 137/410 | 33.41 | 
| Open-Hermes-2.5-7b | 166/410 | 40.49 | 
| Mistral-7b-Inst-v0.3-q8 | 120/410 | 29.27 | 
| Llama-3-8b-q4_K_M | 105/410 | 25.61 | 
| Llama-3-8b-q8 | 125/410 | 30.49 | 
| Llama-3-8b-SPPO-Iter-3 | 130/410 | 31.71 | 
| Hermes-2-Theta-Llama-3-8b | 169/410 | 41.22 | 
| Yi-1.5-9b-32k-q8 | 96/410 | 23.41 | 
| Phi-Medium-128k-q8 | 131/410 | 31.95 | 
| Mixtral-8x7b-Instruct-Q8 | 150/410 | 36.59 | 
| Dolphin-Mixtral-2.5-8x7b | 177/410 | 43.17 | 
| Nous-Capybara-34b | 134/410 | 32.68 | 
| Yi-1.5-34B-32K-Q8 | 191/410 | 46.59 | 
| Command-R-v01-Q8 | 61/410 | 14.88 | 
| Llama-3-70b-FP16-Q2_KXXS | 186/410 | 45.37 | 
| Llama-3-70b-FP16-Q2_K | 199/410 | 48.54 | 
| Llama-3-70b-FP16-Q4_K_M | 239/410 | 58.29 | 
| Llama-3-70b-FP16-Q5_K_M | 241/410 | 58.78 | 
| Llama-3-70b-FP16-Q6_K | 240/410 | 58.54 | 
| Llama-3-70b-FP16-Q8_0 | 238/410 | 58.05 | 
  
  
  Philosophy
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 170/499 | 34.07 | 
| Open-Hermes-2.5-7b | 200/499 | 40.08 | 
| Mistral-7b-Inst-v0.3-q8 | 175/499 | 35.07 | 
| Llama-3-8b-q4_K_M | 152/499 | 30.46 | 
| Llama-3-8b-q8 | 161/499 | 32.26 | 
| Llama-3-8b-SPPO-Iter-3 | 142/499 | 28.46 | 
| Hermes-2-Theta-Llama-3-8b | 194/499 | 38.88 | 
| Yi-1.5-9b-32k-q8 | 114/499 | 22.85 | 
| Phi-Medium-128k-q8 | 187/499 | 37.47 | 
| Mixtral-8x7b-Instruct-Q8 | 194/499 | 38.88 | 
| Dolphin-Mixtral-2.5-8x7b | 212/499 | 42.48 | 
| Nous-Capybara-34b | 197/499 | 39.48 | 
| Yi-1.5-34B-32K-Q8 | 257/499 | 51.50 | 
| Command-R-v01-Q8 | 160/499 | 32.06 | 
| Llama-3-70b-FP16-Q2_KXXS | 200/499 | 40.08 | 
| Llama-3-70b-FP16-Q2_K | 258/499 | 51.70 | 
| Llama-3-70b-FP16-Q4_K_M | 282/499 | 56.51 | 
| Llama-3-70b-FP16-Q5_K_M | 281/499 | 56.31 | 
| Llama-3-70b-FP16-Q6_K | 283/499 | 56.71 | 
| Llama-3-70b-FP16-Q8_0 | 278/499 | 55.71 | 
  
  
  Engineering
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 196/969 | 20.23 | 
| Open-Hermes-2.5-7b | 193/969 | 19.92 | 
| Mistral-7b-Inst-v0.3-q8 | 198/969 | 20.43 | 
| Llama-3-8b-q4_K_M | 149/969 | 15.38 | 
| Llama-3-8b-q8 | 166/969 | 17.13 | 
| Llama-3-8b-SPPO-Iter-3 | 165/969 | 17.03 | 
| Hermes-2-Theta-Llama-3-8b | 245/969 | 25.28 | 
| Yi-1.5-9b-32k-q8 | 190/969 | 19.61 | 
| Phi-Medium-128k-q8 | 183/969 | 18.89 | 
| Mixtral-8x7b-Instruct-Q8 | 234/969 | 24.15 | 
| Dolphin-Mixtral-2.5-8x7b | 236/969 | 24.35 | 
| Nous-Capybara-34b | 393/969 | 40.56 | 
| Yi-1.5-34B-32K-Q8 | 408/969 | 42.11 | 
| Command-R-v01-Q8 | 145/969 | 14.96 | 
| Llama-3-70b-FP16-Q2_KXXS | 326/969 | 33.64 | 
| Llama-3-70b-FP16-Q2_K | 375/969 | 38.70 | 
| Llama-3-70b-FP16-Q4_K_M | 394/969 | 40.66 | 
| Llama-3-70b-FP16-Q5_K_M | 417/969 | 43.03 | 
| Llama-3-70b-FP16-Q6_K | 406/969 | 41.90 | 
| Llama-3-70b-FP16-Q8_0 | 398/969 | 41.07 | 
  
  
  Totals
| Model | Total Correct | Total Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 4173/12032 | 34.68 | 
| Open-Hermes-2.5-7b | 4330/12032 | 35.99 | 
| Mistral-7b-Inst-v0.3-q8 | 3825/12032 | 31.79 | 
| Llama-3-8b-q4_K_M | 2862/12032 | 23.79 | 
| Llama-3-8b-q8 | 3058/12032 | 25.42 | 
| Llama-3-8b-SPPO-Iter-3 | 3210/12032 | 26.68 | 
| Hermes-2-Theta-Llama-3-8b | 4799/12032 | 39.89 | 
| Yi-1.5-9b-32k-q8 | 3066/12032 | 25.48 | 
| Phi-Medium-128k-q8 | 3679/12032 | 30.58 | 
| Mixtral-8x7b-Instruct-Q8 | 4335/12032 | 36.03 | 
| Dolphin-Mixtral-2.5-8x7b | 4846/12032 | 40.27 | 
| Nous-Capybara-34b | 4827/12032 | 40.12 | 
| Yi-1.5-34B-32K-Q8 | 5571/12032 | 46.30 | 
| Command-R-v01-Q8 | 1847/12032 | 15.35 | 
| Llama-3-70b-FP16-Q2_KXXS | 4849/12032 | 40.30 | 
| Llama-3-70b-FP16-Q2_K | 5722/12032 | 47.56 | 
| Llama-3-70b-FP16-Q4_K_M | 6445/12032 | 53.57 | 
| Llama-3-70b-FP16-Q5_K_M | 6571/12032 | 54.61 | 
| Llama-3-70b-FP16-Q6_K | 6480/12032 | 53.86 | 
| Llama-3-70b-FP16-Q8_0 | 6509/12032 | 54.10 | 
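
As a sanity check, the totals are just the per-category correct counts summed over all 12,032 questions. For example, the Hermes-2-Theta row reproduces like this:

```python
# Per-category correct counts and question counts for Hermes-2-Theta-Llama-3-8b,
# copied from the tables above (Business through Engineering, in order).
correct = [330, 280, 452, 453, 330, 155, 429, 388, 448, 509, 417, 169, 194, 245]
totals  = [789, 1101, 798, 717, 1132, 381, 924, 818, 844, 1351, 1299, 410, 499, 969]

print(f"{sum(correct)}/{sum(totals)}")             # 4799/12032
print(f"{100 * sum(correct) / sum(totals):.2f}%")  # 39.89%
```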
 
              