xeon febby
2 x INTEL XEON PLATINUM 8592+ testing with a Quanta Cloud QuantaGrid D54Q-2U S6Q-MB-MPS (3B05.TEL4P1 BIOS) and ASPEED on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402196-NE-XEONFEBBY11 .
xeon febby: system configuration (runs a, b)
Processor: 2 x INTEL XEON PLATINUM 8592+ @ 3.90GHz (128 Cores / 256 Threads)
Motherboard: Quanta Cloud QuantaGrid D54Q-2U S6Q-MB-MPS (3B05.TEL4P1 BIOS)
Chipset: Intel Device 1bce
Memory: 1008GB
Disk: 3201GB Micron_7450_MTFDKCB3T2TFS
Graphics: ASPEED
Network: 2 x Intel X710 for 10GBASE-T
OS: Ubuntu 23.10
Kernel: 6.6.0-060600-generic (x86_64)
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 1024x768
Kernel Details
  Transparent Huge Pages: madvise
Compiler Details
  --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details
  Scaling Governor: intel_pstate performance (EPP: performance)
  CPU Microcode: 0x21000161
Python Details
  Python 3.11.6
Security Details
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced / Automatic IBRS IBPB: conditional RSB filling PBRSB-eIBRS: SW sequence
  srbds: Not affected
  tsx_async_abort: Not affected
xeon febby: result summary
Test | a | b
quicksilver: CTS2 | 9556000 | 9354000
quicksilver: CORAL2 P1 | 8625000 | 8820000
quicksilver: CORAL2 P2 | 8000000 | 8418000
namd: ATPase with 327,506 Atoms | 5.98308 | 4.02029
namd: STMV with 1,066,628 Atoms | 1.74622 | 1.81638
dav1d: Chimera 1080p | 204.38 | 202.83
dav1d: Summer Nature 4K | 68.43 | 68.36
dav1d: Summer Nature 1080p | 87.78 | 86.79
dav1d: Chimera 1080p 10-bit | 235.42 | 239.01
oidn: RT.hdr_alb_nrm.3840x2160 - CPU-Only | 5.10 | 5.16
oidn: RT.ldr_alb_nrm.3840x2160 - CPU-Only | 5.14 | 5.00
oidn: RTLightmap.hdr.4096x4096 - CPU-Only | 2.46 | 2.47
y-cruncher: 1B | 5.107 | 5.249
y-cruncher: 500M | 2.723 | 2.761
gromacs: MPI CPU - water_GMX50_bare | 17.918 | 18.398
pytorch: CPU - 1 - ResNet-50 | 51.66 | 51.00
pytorch: CPU - 1 - ResNet-152 | 19.08 | 19.21
pytorch: CPU - 1 - Efficientnet_v2_l | 0.42 |
tensorflow: CPU - 1 - VGG-16 | 12.25 | 11.89
tensorflow: CPU - 1 - AlexNet | 39.98 | 37.7
tensorflow: CPU - 1 - GoogLeNet | 18.23 | 17.62
tensorflow: CPU - 1 - ResNet-50 | 7.28 | 7.01
speedb: Rand Read | 613257745 | 490533153
speedb: Update Rand | 157060 | 156458
speedb: Read While Writing | 16943739 | 18187110
speedb: Read Rand Write Rand | 1520436 | 1514331
onnx: GPT-2 - CPU - Standard (inferences/sec) | 205.563 | 208.993
onnx: GPT-2 - CPU - Standard (ms) | 4.86151 | 4.78113
onnx: yolov4 - CPU - Standard (inferences/sec) | 15.3826 | 17.1686
onnx: yolov4 - CPU - Standard (ms) | 65.0065 | 58.2438
onnx: T5 Encoder - CPU - Standard (inferences/sec) | 465.647 | 360.779
onnx: T5 Encoder - CPU - Standard (ms) | 2.14695 | 2.77091
onnx: bertsquad-12 - CPU - Standard (inferences/sec) | 15.9646 | 22.6587
onnx: bertsquad-12 - CPU - Standard (ms) | 62.6361 | 44.1311
onnx: CaffeNet 12-int8 - CPU - Standard (inferences/sec) | 786.262 | 789.312
onnx: CaffeNet 12-int8 - CPU - Standard (ms) | 1.27117 | 1.26631
onnx: fcn-resnet101-11 - CPU - Standard (inferences/sec) | 9.90267 | 9.4317
onnx: fcn-resnet101-11 - CPU - Standard (ms) | 100.98 | 106.023
onnx: ArcFace ResNet-100 - CPU - Standard (inferences/sec) | 35.9174 | 38.5606
onnx: ArcFace ResNet-100 - CPU - Standard (ms) | 27.8405 | 25.9319
onnx: ResNet50 v1-12-int8 - CPU - Standard (inferences/sec) | 170.023 | 170.235
onnx: ResNet50 v1-12-int8 - CPU - Standard (ms) | 5.88091 | 5.87369
onnx: super-resolution-10 - CPU - Standard (inferences/sec) | 256.996 | 247.665
onnx: super-resolution-10 - CPU - Standard (ms) | 3.89053 | 4.03712
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard (inferences/sec) | 38.3302 | 37.8832
onnx: Faster R-CNN R-50-FPN-int8 - CPU - Standard (ms) | 26.0875 | 26.395
llama-cpp: llama-2-7b.Q4_0.gguf | 0.69 | 0.58
llama-cpp: llama-2-13b.Q4_0.gguf | 0.55 | 0.45
llama-cpp: llama-2-70b-chat.Q5_0.gguf | 0.43 | 0.34
llamafile: llava-v1.5-7b-q4 - CPU | 0.53 | 0.53
llamafile: mistral-7b-instruct-v0.2.Q8_0 - CPU | 8.57 | 8.81
llamafile: wizardcoder-python-34b-v1.0.Q6_K - CPU | 3.74 | 3.86
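The gap between runs a and b is easier to judge as a percentage, bearing in mind that some tests are "More Is Better" and others "Fewer Is Better". The following is a minimal Python sketch, not part of the Phoronix Test Suite: the helper names (`percent_change`, `b_is_better`, `summarize`) are hypothetical, and the four rows are copied from the summary values above purely as an illustration.

```python
# Each entry: (test label, run a, run b, higher_is_better).
# Values are copied from the result summary; the direction flag follows
# the "More Is Better" / "Fewer Is Better" note shown for each test.
RESULTS = [
    ("namd: ATPase with 327,506 Atoms", 5.98308, 4.02029, True),
    ("speedb: Rand Read", 613257745, 490533153, True),
    ("y-cruncher: 1B", 5.107, 5.249, False),
    ("onnx: bertsquad-12 (inferences/sec)", 15.9646, 22.6587, True),
]


def percent_change(a, b):
    """Change of run b relative to run a, in percent."""
    return (b - a) / a * 100.0


def b_is_better(a, b, higher_is_better):
    """True when run b beats run a for the given metric direction."""
    return b > a if higher_is_better else b < a


def summarize(results):
    """Return (label, percent change rounded to 0.1, b_is_better) per row."""
    return [
        (label, round(percent_change(a, b), 1), b_is_better(a, b, hib))
        for label, a, b, hib in results
    ]


if __name__ == "__main__":
    for label, pct, better in summarize(RESULTS):
        verdict = "b better" if better else "b not better"
        print(f"{label}: {pct:+.1f}% ({verdict})")
```

A sign alone is not enough to rank the runs: for y-cruncher the positive change means run b took longer, which is why the direction flag is carried alongside each value.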
Quicksilver 20230818, Figure Of Merit, More Is Better
  Input: CTS2 | a: 9556000 | b: 9354000
  Input: CORAL2 P1 | a: 8625000 | b: 8820000
  Input: CORAL2 P2 | a: 8000000 | b: 8418000
  (CXX) g++ options: -fopenmp -O3 -march=native
NAMD 3.0b6, ns/day, More Is Better
  Input: ATPase with 327,506 Atoms | a: 5.98308 | b: 4.02029
  Input: STMV with 1,066,628 Atoms | a: 1.74622 | b: 1.81638
dav1d 1.4, FPS, More Is Better
  Video Input: Chimera 1080p | a: 204.38 | b: 202.83
  Video Input: Summer Nature 4K | a: 68.43 | b: 68.36
  Video Input: Summer Nature 1080p | a: 87.78 | b: 86.79
  Video Input: Chimera 1080p 10-bit | a: 235.42 | b: 239.01
  (CC) gcc options: -pthread
Intel Open Image Denoise 2.2, Images / Sec, More Is Better
  Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only | a: 5.10 | b: 5.16
  Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only | a: 5.14 | b: 5.00
  Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only | a: 2.46 | b: 2.47
Y-Cruncher 0.8.3, Seconds, Fewer Is Better
  Pi Digits To Calculate: 1B | a: 5.107 | b: 5.249
  Pi Digits To Calculate: 500M | a: 2.723 | b: 2.761
GROMACS 2024, Ns Per Day, More Is Better
  Implementation: MPI CPU - Input: water_GMX50_bare | a: 17.92 | b: 18.40
  (CXX) g++ options: -O3 -lm
PyTorch 2.1, batches/sec, More Is Better
  Device: CPU - Batch Size: 1 - Model: ResNet-50 | a: 51.66 (MIN: 21.28 / MAX: 52.92) | b: 51.00 (MIN: 24.67 / MAX: 53.71)
  Device: CPU - Batch Size: 1 - Model: ResNet-152 | a: 19.08 (MIN: 2.63 / MAX: 20.35) | b: 19.21 (MIN: 6.66 / MAX: 20.29)
  Device: CPU - Batch Size: 1 - Model: Efficientnet_v2_l | a: 0.42 (MIN: 0.23 / MAX: 1.28)
TensorFlow 2.12, images/sec, More Is Better
  Device: CPU - Batch Size: 1 - Model: VGG-16 | a: 12.25 | b: 11.89
  Device: CPU - Batch Size: 1 - Model: AlexNet | a: 39.98 | b: 37.70
  Device: CPU - Batch Size: 1 - Model: GoogLeNet | a: 18.23 | b: 17.62
  Device: CPU - Batch Size: 1 - Model: ResNet-50 | a: 7.28 | b: 7.01
Speedb 2.7, Op/s, More Is Better
  Test: Random Read | a: 613257745 | b: 490533153
  Test: Update Random | a: 157060 | b: 156458
  Test: Read While Writing | a: 16943739 | b: 18187110
  Test: Read Random Write Random | a: 1520436 | b: 1514331
  (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
ONNX Runtime 1.17, Device: CPU - Executor: Standard
  Model: GPT-2 | Inferences Per Second, More Is Better | a: 205.56 | b: 208.99
  Model: GPT-2 | Inference Time Cost (ms), Fewer Is Better | a: 4.86151 | b: 4.78113
  Model: yolov4 | Inferences Per Second, More Is Better | a: 15.38 | b: 17.17
  Model: yolov4 | Inference Time Cost (ms), Fewer Is Better | a: 65.01 | b: 58.24
  Model: T5 Encoder | Inferences Per Second, More Is Better | a: 465.65 | b: 360.78
  Model: T5 Encoder | Inference Time Cost (ms), Fewer Is Better | a: 2.14695 | b: 2.77091
  Model: bertsquad-12 | Inferences Per Second, More Is Better | a: 15.96 | b: 22.66
  Model: bertsquad-12 | Inference Time Cost (ms), Fewer Is Better | a: 62.64 | b: 44.13
  Model: CaffeNet 12-int8 | Inferences Per Second, More Is Better | a: 786.26 | b: 789.31
  Model: CaffeNet 12-int8 | Inference Time Cost (ms), Fewer Is Better | a: 1.27117 | b: 1.26631
  Model: fcn-resnet101-11 | Inferences Per Second, More Is Better | a: 9.90267 | b: 9.43170
  Model: fcn-resnet101-11 | Inference Time Cost (ms), Fewer Is Better | a: 100.98 | b: 106.02
  Model: ArcFace ResNet-100 | Inferences Per Second, More Is Better | a: 35.92 | b: 38.56
  Model: ArcFace ResNet-100 | Inference Time Cost (ms), Fewer Is Better | a: 27.84 | b: 25.93
  Model: ResNet50 v1-12-int8 | Inferences Per Second, More Is Better | a: 170.02 | b: 170.24
  Model: ResNet50 v1-12-int8 | Inference Time Cost (ms), Fewer Is Better | a: 5.88091 | b: 5.87369
  Model: super-resolution-10 | Inferences Per Second, More Is Better | a: 257.00 | b: 247.67
  Model: super-resolution-10 | Inference Time Cost (ms), Fewer Is Better | a: 3.89053 | b: 4.03712
  Model: Faster R-CNN R-50-FPN-int8 | Inferences Per Second, More Is Better | a: 38.33 | b: 37.88
  Model: Faster R-CNN R-50-FPN-int8 | Inference Time Cost (ms), Fewer Is Better | a: 26.09 | b: 26.40
  (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt
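Each ONNX Runtime model is reported twice, once as inferences per second and once as inference time cost in milliseconds. The two figures look approximately reciprocal (1000 / ms), though not exactly, presumably because each is averaged separately; that explanation is an assumption, not something the export states. A small Python sketch of the check, using hypothetical helper names and four pairs copied from the summary values:

```python
# Each entry: (model, run, reported inferences/sec, reported latency in ms).
# Values are copied from the result summary for illustration only.
ONNX_PAIRS = [
    ("GPT-2", "a", 205.563, 4.86151),
    ("yolov4", "b", 17.1686, 58.2438),
    ("T5 Encoder", "a", 465.647, 2.14695),
    ("bertsquad-12", "b", 22.6587, 44.1311),
]


def throughput_from_latency(latency_ms):
    """Inferences per second implied by a per-inference latency in ms."""
    return 1000.0 / latency_ms


def relative_gap(reported_ips, latency_ms):
    """Relative difference between reported and latency-derived throughput."""
    derived = throughput_from_latency(latency_ms)
    return abs(derived - reported_ips) / reported_ips


if __name__ == "__main__":
    for model, run, ips, ms in ONNX_PAIRS:
        derived = throughput_from_latency(ms)
        gap = relative_gap(ips, ms)
        print(f"{model} ({run}): reported {ips:.3f}/s, "
              f"1000/latency = {derived:.3f}/s, gap {gap:.3%}")
```

Because the two metrics carry the same information, a run that wins on one should win on the other; a pair that disagreed would be worth re-checking against the source result page.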
Llama.cpp b1808, Tokens Per Second, More Is Better
  Model: llama-2-7b.Q4_0.gguf | a: 0.69 | b: 0.58
  Model: llama-2-13b.Q4_0.gguf | a: 0.55 | b: 0.45
  Model: llama-2-70b-chat.Q5_0.gguf | a: 0.43 | b: 0.34
  (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas
Llamafile 0.6, Tokens Per Second, More Is Better
  Test: llava-v1.5-7b-q4 - Acceleration: CPU | a: 0.53 | b: 0.53
  Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPU | a: 8.57 | b: 8.81
  Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPU | a: 3.74 | b: 3.86
Phoronix Test Suite v10.8.5