slsls Intel Core i9-14900K testing with a ASUS PRIME Z790-P WIFI (1656 BIOS) and XFX AMD Radeon RX 7900 XTX 24GB on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2406022-PTS-SLSLS11146&gru&rdt .
slsls Processor Motherboard Chipset Memory Disk Graphics Audio Monitor OS Kernel Desktop Display Server OpenGL Compiler File-System Screen Resolution a b c Intel Core i9-14900K @ 5.70GHz (24 Cores / 32 Threads) ASUS PRIME Z790-P WIFI (1656 BIOS) Intel Raptor Lake-S PCH 2 x 16GB DDR5-6000MT/s Corsair CMK32GX5M2B6000C36 Western Digital WD_BLACK SN850X 2000GB XFX AMD Radeon RX 7900 XTX 24GB Realtek ALC897 ASUS VP28U Ubuntu 24.04 6.8.0-31-generic (x86_64) GNOME Shell 46.0 X Server + Wayland 4.6 Mesa 24.0.5-1ubuntu1 (LLVM 17.0.6 DRM 3.57) GCC 13.2.0 ext4 3840x2160 OpenBenchmarking.org Kernel Details - Transparent Huge Pages: madvise Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details - Scaling Governor: intel_pstate powersave (EPP: balance_performance) - CPU Microcode: 0x123 - Thermald 2.5.6 Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Mitigation of Clear Register File + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: SW sequence; BHI: BHI_DIS_S + srbds: Not affected + tsx_async_abort: Not affected
slsls llama-cpp: Meta-Llama-3-8B-Instruct-Q8_0.gguf llamafile: Meta-Llama-3-8B-Instruct.F16 - CPU llamafile: TinyLlama-1.1B-Chat-v1.0.BF16 - CPU llamafile: mistral-7b-instruct-v0.2.Q5_K_M - CPU llamafile: wizardcoder-python-34b-v1.0.Q6_K - CPU whisper-cpp: ggml-base.en - 2016 State of the Union whisper-cpp: ggml-small.en - 2016 State of the Union whisper-cpp: ggml-medium.en - 2016 State of the Union a b c 9.49 5.26 34.32 12.61 2.56 162.74520 437.64021 1309.55354 9.48 34.6 12.57 2.56 162.63425 438.27262 1314.75138 9.49 34.61 12.68 2.56 162.97662 438.30525 1313.66412 OpenBenchmarking.org
Llama.cpp Model: Meta-Llama-3-8B-Instruct-Q8_0.gguf OpenBenchmarking.org Tokens Per Second, More Is Better Llama.cpp b3067 Model: Meta-Llama-3-8B-Instruct-Q8_0.gguf a b c 3 6 9 12 15 SE +/- 0.02, N = 3 9.49 9.48 9.49 1. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas
Llamafile Test: Meta-Llama-3-8B-Instruct.F16 - Acceleration: CPU OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.6 Test: Meta-Llama-3-8B-Instruct.F16 - Acceleration: CPU a 1.1835 2.367 3.5505 4.734 5.9175 SE +/- 0.00, N = 2 5.26
Llamafile Test: TinyLlama-1.1B-Chat-v1.0.BF16 - Acceleration: CPU OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.6 Test: TinyLlama-1.1B-Chat-v1.0.BF16 - Acceleration: CPU a b c 8 16 24 32 40 SE +/- 0.37, N = 5 34.32 34.60 34.61
Llamafile Test: mistral-7b-instruct-v0.2.Q5_K_M - Acceleration: CPU OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.6 Test: mistral-7b-instruct-v0.2.Q5_K_M - Acceleration: CPU a b c 3 6 9 12 15 SE +/- 0.01, N = 3 12.61 12.57 12.68
Llamafile Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPU OpenBenchmarking.org Tokens Per Second, More Is Better Llamafile 0.8.6 Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPU a b c 0.576 1.152 1.728 2.304 2.88 SE +/- 0.00, N = 3 2.56 2.56 2.56
Whisper.cpp Model: ggml-base.en - Input: 2016 State of the Union OpenBenchmarking.org Seconds, Fewer Is Better Whisper.cpp 1.6.2 Model: ggml-base.en - Input: 2016 State of the Union a b c 40 80 120 160 200 SE +/- 0.09, N = 3 162.75 162.63 162.98 1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2
Whisper.cpp Model: ggml-small.en - Input: 2016 State of the Union OpenBenchmarking.org Seconds, Fewer Is Better Whisper.cpp 1.6.2 Model: ggml-small.en - Input: 2016 State of the Union a b c 90 180 270 360 450 SE +/- 0.17, N = 3 437.64 438.27 438.31 1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2
Whisper.cpp Model: ggml-medium.en - Input: 2016 State of the Union OpenBenchmarking.org Seconds, Fewer Is Better Whisper.cpp 1.6.2 Model: ggml-medium.en - Input: 2016 State of the Union a b c 300 600 900 1200 1500 SE +/- 3.77, N = 3 1309.55 1314.75 1313.66 1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread -msse3 -mssse3 -mavx -mf16c -mfma -mavx2
Phoronix Test Suite v10.8.5