Llama.cpp

Llama.cpp is a C/C++ port of Facebook's LLaMA model developed by Georgi Gerganov. Llama.cpp enables inference of LLaMA and other supported models in C/C++. For CPU inference, Llama.cpp supports AVX2/AVX-512, ARM NEON, and other modern ISAs, along with optional use of OpenBLAS.
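
As a rough illustration of the CPU BLAS configuration benchmarked below, the following is a minimal build sketch. The CMake option names (GGML_BLAS, GGML_BLAS_VENDOR) have changed across llama.cpp releases, so treat them as assumptions to verify against the checked-out version.

    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    # Enable the OpenBLAS-backed CPU BLAS path (option names vary by release)
    cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
    cmake --build build --config Release -j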


Llama.cpp b4397

Backend: CPU BLAS - Model: Mistral-7B-Instruct-v0.3-Q8_0 - Test: Prompt Processing 2048
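
A prompt-processing run of this shape can be approximated with the bundled llama-bench tool. This is only a sketch rather than the exact test profile invocation, and the model filename is an assumption; -p sets the prompt length in tokens and -n 0 skips the generation phase, with results reported in tokens per second.

    # pp2048: process a 2048-token prompt, no token generation
    ./build/bin/llama-bench -m Mistral-7B-Instruct-v0.3-Q8_0.gguf -p 2048 -n 0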

OpenBenchmarking.org metrics for this test profile configuration, based on 113 public results since 29 December 2024, with the latest data as of 7 February 2025.

Below is an overview of the generalized performance for components where there is sufficient statistically significant data based upon user-uploaded results. Keep in mind that, particularly in the Linux/open-source space, OS configurations can vary widely; this overview is intended only as general guidance on performance expectations.

Component Details | Percentile Rank | # Compatible Public Results | Tokens Per Second (Average)
Zen 5 [64 Cores / 128 Threads] | 93rd | 6 | 131 +/- 16
Zen 5 [96 Cores / 192 Threads] | 84th | 8 | 108 +/- 6
Mid-Tier | 75th | - | < 96
Zen 5 [256 Cores / 512 Threads] | 69th | 18 | 93 +/- 4
Zen 4 [192 Cores / 384 Threads] | 58th | 4 | 78 +/- 1
Zen 4 [192 Cores / 384 Threads] | 53rd | 6 | 76 +/- 1
Median | 50th | - | 74
Zen 4 [64 Cores / 128 Threads] | 47th | 4 | 70
Arrow Lake [24 Cores / 24 Threads] | 32nd | 3 | 47
Zen 4 [8 Cores / 16 Threads] | 27th | 3 | 38
Low-Tier | 25th | - | < 35
Zen 5 [10 Cores / 20 Threads] | 23rd | 4 | 33 +/- 1
Zen 5 [12 Cores / 24 Threads] | 19th | 7 | 31 +/- 2
Zen 4 [4 Cores / 8 Threads] | 17th | 4 | 31
Lunar Lake [8 Cores / 8 Threads] | 9th | 4 | 27
Meteor Lake [16 Cores / 22 Threads] | 5th | 3 | 20
Alder Lake [14 Cores / 20 Threads] | 3rd | 3 | 15