2024 year AMD Ryzen Threadripper PRO 5965WX 24-Cores testing with an ASUS Pro WS WRX80E-SAGE SE WIFI (1201 BIOS) and ASUS NVIDIA NV106 2GB on Ubuntu 23.10 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2402040-NE-2024YEAR116&grs&sor .
2024 year: system details (configurations a, b, c, d)
Processor: AMD Ryzen Threadripper PRO 5965WX 24-Cores @ 3.80GHz (24 Cores / 48 Threads)
Motherboard: ASUS Pro WS WRX80E-SAGE SE WIFI (1201 BIOS)
Chipset: AMD Starship/Matisse
Memory: 8 x 16GB DDR4-2133MT/s Corsair CMK32GX4M2E3200C16
Disk: 2048GB SOLIDIGM SSDPFKKW020X7
Graphics: ASUS NVIDIA NV106 2GB
Audio: AMD Starship/Matisse
Monitor: VA2431
Network: 2 x Intel X550 + Intel Wi-Fi 6 AX200
OS: Ubuntu 23.10
Kernel: 6.5.0-13-generic (x86_64)
Desktop: GNOME Shell 45.0
Display Server: X Server + Wayland
Display Driver: nouveau
OpenGL: 4.3 Mesa 23.2.1-1ubuntu3
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 1920x1080
Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-XYspKM/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled) - CPU Microcode: 0xa008205
Python Details: Python 3.11.6
Security Details: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Mitigation of safe RET no microcode + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
2024 year tensorflow: CPU - 1 - GoogLeNet lczero: BLAS lczero: Eigen svt-av1: Preset 13 - Bosphorus 1080p rav1e: 5 rav1e: 10 llama-cpp: llama-2-13b.Q4_0.gguf speedb: Update Rand svt-av1: Preset 4 - Bosphorus 1080p compress-lz4: 9 - Compression Speed pytorch: CPU - 16 - ResNet-50 speedb: Seq Fill deepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Stream deepsparse: ResNet-50, Sparse INT8 - Asynchronous Multi-Stream speedb: Read While Writing deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream tensorflow: CPU - 16 - ResNet-50 deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Synchronous Single-Stream deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Synchronous Single-Stream rav1e: 6 svt-av1: Preset 13 - Bosphorus 4K pytorch: CPU - 256 - ResNet-50 deepsparse: ResNet-50, Sparse INT8 - Synchronous Single-Stream deepsparse: ResNet-50, Sparse INT8 - Synchronous Single-Stream tensorflow: CPU - 16 - GoogLeNet svt-av1: Preset 8 - Bosphorus 4K deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream deepsparse: ResNet-50, Baseline - Synchronous Single-Stream llama-cpp: llama-2-7b.Q4_0.gguf deepsparse: ResNet-50, Baseline - Synchronous Single-Stream speedb: Rand Read deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream svt-av1: Preset 12 - Bosphorus 4K deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream pytorch: CPU - 1 - ResNet-50 deepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream deepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream tensorflow: CPU - 1 - ResNet-50 deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream 
deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Asynchronous Multi-Stream deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream tensorflow: CPU - 1 - VGG-16 deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream svt-av1: Preset 12 - Bosphorus 1080p rav1e: 1 speedb: Rand Fill Sync speedb: Read Rand Write Rand deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Synchronous Single-Stream deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Synchronous Single-Stream deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream tensorflow: CPU - 1 - AlexNet quicksilver: CORAL2 P2 deepsparse: ResNet-50, Baseline - Asynchronous Multi-Stream deepsparse: ResNet-50, Baseline - Asynchronous Multi-Stream deepsparse: NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream deepsparse: NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream y-cruncher: 500M deepsparse: BERT-Large, NLP Question Answering - Synchronous Single-Stream deepsparse: BERT-Large, NLP Question Answering - Synchronous Single-Stream deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream svt-av1: Preset 4 - Bosphorus 4K cachebench: Read / Modify / Write deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream speedb: Rand Fill deepsparse: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Asynchronous Multi-Stream deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream tensorflow: CPU - 16 - VGG-16 deepsparse: BERT-Large, NLP Question Answering, Sparse INT8 - Asynchronous Multi-Stream svt-av1: Preset 8 - Bosphorus 1080p tensorflow: CPU - 16 - AlexNet llama-cpp: 
llama-2-70b-chat.Q5_0.gguf deepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Stream llamafile: llava-v1.5-7b-q4 - CPU deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Stream deepsparse: CV Detection, YOLOv5s COCO, Sparse INT8 - Synchronous Single-Stream deepsparse: BERT-Large, NLP Question Answering - Asynchronous Multi-Stream deepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream deepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream compress-lz4: 3 - Compression Speed quicksilver: CTS2 deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream quicksilver: CORAL2 P1 y-cruncher: 1B deepsparse: CV Detection, YOLOv5s COCO - Synchronous Single-Stream deepsparse: CV Detection, YOLOv5s COCO - Synchronous Single-Stream llamafile: mistral-7b-instruct-v0.2.Q8_0 - CPU compress-lz4: 1 - Compression Speed compress-lz4: 9 - Decompression Speed compress-lz4: 1 - Decompression Speed compress-lz4: 3 - Decompression Speed cachebench: Write cachebench: Read llamafile: wizardcoder-python-34b-v1.0.Q6_K - CPU a b c d 9.94 173 121 543.554 3.747 10.634 11.64 431692 18.803 44.28 32.50 620607 2012.2062 5.9507 7004007 306.9919 39.0493 19.87 5.2624 189.9093 5.261 190.794 31.92 757.1631 1.3175 60.85 61.53 26.9119 157.2286 20.76 6.3518 148134848 150.0421 79.8855 190.916 45.9976 21.7296 40.68 156.9701 6.3619 8.85 151.4666 79.1618 224.1802 2.72 53.4699 501.42 1.044 47488 2327911 76.1394 13.129 445.2565 6.26 24030000 39.0436 307.0146 8.9109 112.0914 26.8394 7.325 58.254 17.1616 30.489 6.677 130857.577562 685.1217 558330 17.495 446.6313 335.3479 8.51 35.762 122.948 100.44 1.94 32.5188 17.22 392.9588 98.9102 10.1046 368.3051 54.1194 18.4727 131.24 20680000 18.3872 
54.3716 24210000 15.545 10.1424 98.516 10.13 828.78 4840.5 5019.5 4595.9 69134.680498 11543.372321 3.25 9.74 219 146 573.042 3.791 10.885 11.32 418848 18.682 44.48 32.10 618776 1962.7047 6.1010 7070407 305.3795 39.2555 19.45 5.1753 193.1106 5.292 192.726 32.15 752.7784 1.3252 60.18 61.830 26.6672 155.4837 20.95 6.4230 147432214 148.2388 80.8465 190.402 46.5453 21.4743 40.42 155.5402 6.4208 8.79 149.7818 80.0470 221.7053 2.70 54.0612 506.596 1.048 47708 2307686 75.5067 13.2388 448.0408 6.23 24026667 39.3339 304.7712 8.9755 111.2892 26.6478 7.349 58.3490 17.1337 30.2809 6.669 130069.848338 681.0038 554997 17.5999 448.9079 333.3674 8.46 35.9726 123.293 100.01 1.94 32.3621 17.26 394.7759 98.7456 10.1213 369.7720 54.3340 18.4000 131.10 20646667 18.3317 54.5365 24230000 15.497 10.1362 98.5635 10.15 829.36 4841.4 5020.0 4597.9 69140.530992 11543.164687 3.25 16.49 225 154 580.467 3.769 10.957 11.27 423788 18.409 45.49 31.65 604758 1970.6106 6.0747 7047502 306.5175 39.1181 19.61 5.1787 192.9801 5.282 189.252 31.59 765.9923 1.3025 60.04 61.47 26.705 155.403 20.74 6.4266 146285738 148.7561 80.5729 192.157 46.0912 21.6853 40.82 155.6182 6.4178 8.85 149.9507 79.8946 223.0515 2.72 53.7417 501.156 1.044 47373 2320670 75.9898 13.1545 448.9031 6.23 23890000 39.1233 306.5028 8.952 111.5864 26.6741 7.301 57.9978 17.2376 30.451 6.678 130806.245683 683.2461 557348 17.5377 449.2956 333.6358 8.48 35.9278 122.95 100.08 1.95 32.4212 17.3 393.8346 98.4979 10.146 369.6228 54.1474 18.4635 131.4 20620000 18.3589 54.4554 24240000 15.532 10.1277 98.6526 10.14 829.15 4842.4 5023.2 4598 69142.435503 11543.096362 3.25 9.73 213 151 565.906 3.891 11.022 11.25 417457 18.922 44.52 32.21 612428 1962.1551 6.1024 6896007 299.8295 39.977 19.81 5.2804 189.2673 5.191 192.603 32.07 757.2715 1.3175 59.82 60.95 26.5435 155.2016 20.68 6.4345 146473036 149.2683 80.2684 192.683 46.1334 21.6655 40.35 155.1926 6.4346 8.89 149.7647 80.052 222.7232 2.69 53.8237 506.174 1.054 47267 2316804 75.7073 13.204 448.0349 6.21 
23840000 39.1902 305.8614 8.9586 111.4965 26.6902 7.297 58.4094 17.1162 30.3159 6.633 130851.30507 681.5303 556675 17.5865 449.1605 334.6424 8.51 35.837 122.616 99.9 1.95 32.3762 17.25 394.3756 98.8912 10.1068 369.53 54.204 18.4443 131.61 20600000 18.3959 54.3471 24290000 15.501 10.1151 98.7671 10.13 830.37 4844.5 5019.5 4596.7 69142.188854 11543.486939 3.25 OpenBenchmarking.org
TensorFlow 2.12, Device: CPU - Batch Size: 1 - Model: GoogLeNet (images/sec, More Is Better): c = 16.49, a = 9.94, b = 9.74, d = 9.73. SE +/- 0.09, N = 3.
LeelaChessZero 0.30, Backend: BLAS (Nodes Per Second, More Is Better): c = 225, b = 219, d = 213, a = 173. SE +/- 0.33, N = 3. (CXX) g++ options: -flto -pthread
LeelaChessZero 0.30, Backend: Eigen (Nodes Per Second, More Is Better): c = 154, d = 151, b = 146, a = 121. SE +/- 2.08, N = 3. (CXX) g++ options: -flto -pthread
SVT-AV1 1.8, Encoder Mode: Preset 13 - Input: Bosphorus 1080p (Frames Per Second, More Is Better): c = 580.47, b = 573.04, d = 565.91, a = 543.55. SE +/- 7.15, N = 3. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
rav1e 0.7, Speed: 5 (Frames Per Second, More Is Better): d = 3.891, b = 3.791, c = 3.769, a = 3.747. SE +/- 0.014, N = 3.
rav1e 0.7, Speed: 10 (Frames Per Second, More Is Better): d = 11.02, c = 10.96, b = 10.89, a = 10.63. SE +/- 0.11, N = 5.
Llama.cpp b1808, Model: llama-2-13b.Q4_0.gguf (Tokens Per Second, More Is Better): a = 11.64, b = 11.32, c = 11.27, d = 11.25. SE +/- 0.06, N = 3. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas
Speedb 2.7, Test: Update Random (Op/s, More Is Better): a = 431692, c = 423788, b = 418848, d = 417457. SE +/- 4060.59, N = 3. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
SVT-AV1 1.8, Encoder Mode: Preset 4 - Input: Bosphorus 1080p (Frames Per Second, More Is Better): d = 18.92, a = 18.80, b = 18.68, c = 18.41. SE +/- 0.08, N = 3. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
LZ4 Compression 1.9.4, Compression Level: 9 - Compression Speed (MB/s, More Is Better): c = 45.49, d = 44.52, b = 44.48, a = 44.28. SE +/- 0.02, N = 3. (CC) gcc options: -O3
PyTorch 2.1, Device: CPU - Batch Size: 16 - Model: ResNet-50 (batches/sec, More Is Better): a = 32.50, d = 32.21, b = 32.10, c = 31.65. SE +/- 0.12, N = 3. MIN: 30.56 / MAX: 32.75; MIN: 30.29 / MAX: 32.43; MIN: 29.1 / MAX: 32.53; MIN: 29.55 / MAX: 31.86
Speedb 2.7, Test: Sequential Fill (Op/s, More Is Better): a = 620607, b = 618776, d = 612428, c = 604758. SE +/- 3239.87, N = 3. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
Neural Magic DeepSparse 1.6, Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): a = 2012.21, c = 1970.61, b = 1962.70, d = 1962.16. SE +/- 10.49, N = 3.
Neural Magic DeepSparse 1.6, Model: ResNet-50, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): a = 5.9507, c = 6.0747, b = 6.1010, d = 6.1024. SE +/- 0.0329, N = 3.
Speedb 2.7, Test: Read While Writing (Op/s, More Is Better): b = 7070407, c = 7047502, a = 7004007, d = 6896007. SE +/- 60887.57, N = 3. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
Neural Magic DeepSparse 1.6, Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): a = 306.99, c = 306.52, b = 305.38, d = 299.83. SE +/- 0.76, N = 3.
Neural Magic DeepSparse 1.6, Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): a = 39.05, c = 39.12, b = 39.26, d = 39.98. SE +/- 0.09, N = 3.
TensorFlow 2.12, Device: CPU - Batch Size: 16 - Model: ResNet-50 (images/sec, More Is Better): a = 19.87, d = 19.81, c = 19.61, b = 19.45. SE +/- 0.08, N = 3.
Neural Magic DeepSparse 1.6, Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): b = 5.1753, c = 5.1787, a = 5.2624, d = 5.2804. SE +/- 0.0173, N = 3.
Neural Magic DeepSparse 1.6, Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Synchronous Single-Stream (items/sec, More Is Better): b = 193.11, c = 192.98, a = 189.91, d = 189.27. SE +/- 0.65, N = 3.
rav1e 0.7, Speed: 6 (Frames Per Second, More Is Better): b = 5.292, c = 5.282, a = 5.261, d = 5.191. SE +/- 0.008, N = 3.
SVT-AV1 1.8, Encoder Mode: Preset 13 - Input: Bosphorus 4K (Frames Per Second, More Is Better): b = 192.73, d = 192.60, a = 190.79, c = 189.25. SE +/- 0.80, N = 3. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
PyTorch 2.1, Device: CPU - Batch Size: 256 - Model: ResNet-50 (batches/sec, More Is Better): b = 32.15, d = 32.07, a = 31.92, c = 31.59. SE +/- 0.11, N = 3. MIN: 30.21 / MAX: 32.69; MIN: 30.1 / MAX: 32.3; MIN: 30 / MAX: 32.18; MIN: 29.73 / MAX: 32.12
Neural Magic DeepSparse 1.6, Model: ResNet-50, Sparse INT8 - Scenario: Synchronous Single-Stream (items/sec, More Is Better): c = 765.99, d = 757.27, a = 757.16, b = 752.78. SE +/- 1.33, N = 3.
Neural Magic DeepSparse 1.6, Model: ResNet-50, Sparse INT8 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): c = 1.3025, a = 1.3175, d = 1.3175, b = 1.3252. SE +/- 0.0024, N = 3.
TensorFlow 2.12, Device: CPU - Batch Size: 16 - Model: GoogLeNet (images/sec, More Is Better): a = 60.85, b = 60.18, c = 60.04, d = 59.82. SE +/- 0.36, N = 3.
SVT-AV1 1.8, Encoder Mode: Preset 8 - Input: Bosphorus 4K (Frames Per Second, More Is Better): b = 61.83, a = 61.53, c = 61.47, d = 60.95. SE +/- 0.07, N = 3. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Neural Magic DeepSparse 1.6, Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): a = 26.91, c = 26.71, b = 26.67, d = 26.54. SE +/- 0.03, N = 3.
Neural Magic DeepSparse 1.6, Model: ResNet-50, Baseline - Scenario: Synchronous Single-Stream (items/sec, More Is Better): a = 157.23, b = 155.48, c = 155.40, d = 155.20. SE +/- 0.26, N = 3.
Llama.cpp b1808, Model: llama-2-7b.Q4_0.gguf (Tokens Per Second, More Is Better): b = 20.95, a = 20.76, c = 20.74, d = 20.68. SE +/- 0.24, N = 4. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas
Neural Magic DeepSparse 1.6, Model: ResNet-50, Baseline - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): a = 6.3518, b = 6.4230, c = 6.4266, d = 6.4345. SE +/- 0.0106, N = 3.
Speedb 2.7, Test: Random Read (Op/s, More Is Better): a = 148134848, b = 147432214, d = 146473036, c = 146285738. SE +/- 81483.06, N = 3. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
Neural Magic DeepSparse 1.6, Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): a = 150.04, d = 149.27, c = 148.76, b = 148.24. SE +/- 0.36, N = 3.
Neural Magic DeepSparse 1.6, Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): a = 79.89, d = 80.27, c = 80.57, b = 80.85. SE +/- 0.20, N = 3.
SVT-AV1 1.8, Encoder Mode: Preset 12 - Input: Bosphorus 4K (Frames Per Second, More Is Better): d = 192.68, c = 192.16, a = 190.92, b = 190.40. SE +/- 1.08, N = 3. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Neural Magic DeepSparse 1.6, Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): a = 46.00, c = 46.09, d = 46.13, b = 46.55. SE +/- 0.12, N = 3.
Neural Magic DeepSparse 1.6, Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Stream (items/sec, More Is Better): a = 21.73, c = 21.69, d = 21.67, b = 21.47. SE +/- 0.05, N = 3.
PyTorch 2.1, Device: CPU - Batch Size: 1 - Model: ResNet-50 (batches/sec, More Is Better): c = 40.82, a = 40.68, b = 40.42, d = 40.35. SE +/- 0.19, N = 3. MIN: 37.73 / MAX: 41.05; MIN: 37.73 / MAX: 40.91; MIN: 37.5 / MAX: 41; MIN: 37.34 / MAX: 40.65
Neural Magic DeepSparse 1.6, Model: CV Classification, ResNet-50 ImageNet - Scenario: Synchronous Single-Stream (items/sec, More Is Better): a = 156.97, c = 155.62, b = 155.54, d = 155.19. SE +/- 0.08, N = 3.
Neural Magic DeepSparse 1.6, Model: CV Classification, ResNet-50 ImageNet - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): a = 6.3619, c = 6.4178, b = 6.4208, d = 6.4346. SE +/- 0.0037, N = 3.
TensorFlow 2.12, Device: CPU - Batch Size: 1 - Model: ResNet-50 (images/sec, More Is Better): d = 8.89, c = 8.85, a = 8.85, b = 8.79. SE +/- 0.05, N = 3.
Neural Magic DeepSparse 1.6, Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): a = 151.47, c = 149.95, b = 149.78, d = 149.76. SE +/- 0.01, N = 3.
Neural Magic DeepSparse 1.6, Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): a = 79.16, c = 79.89, b = 80.05, d = 80.05. SE +/- 0.02, N = 3.
Neural Magic DeepSparse 1.6, Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): a = 224.18, c = 223.05, d = 222.72, b = 221.71. SE +/- 0.12, N = 3.
TensorFlow 2.12, Device: CPU - Batch Size: 1 - Model: VGG-16 (images/sec, More Is Better): c = 2.72, a = 2.72, b = 2.70, d = 2.69. SE +/- 0.01, N = 3.
Neural Magic DeepSparse 1.6, Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): a = 53.47, c = 53.74, d = 53.82, b = 54.06. SE +/- 0.03, N = 3.
SVT-AV1 1.8, Encoder Mode: Preset 12 - Input: Bosphorus 1080p (Frames Per Second, More Is Better): b = 506.60, d = 506.17, a = 501.42, c = 501.16. SE +/- 5.37, N = 5. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
rav1e 0.7, Speed: 1 (Frames Per Second, More Is Better): d = 1.054, b = 1.048, c = 1.044, a = 1.044. SE +/- 0.004, N = 3.
Speedb 2.7, Test: Random Fill Sync (Op/s, More Is Better): b = 47708, a = 47488, c = 47373, d = 47267. SE +/- 66.17, N = 3. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
Speedb 2.7, Test: Read Random Write Random (Op/s, More Is Better): a = 2327911, c = 2320670, d = 2316804, b = 2307686. SE +/- 1258.96, N = 3. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
Neural Magic DeepSparse 1.6, Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Synchronous Single-Stream (items/sec, More Is Better): a = 76.14, c = 75.99, d = 75.71, b = 75.51. SE +/- 0.14, N = 3.
Neural Magic DeepSparse 1.6, Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): a = 13.13, c = 13.15, d = 13.20, b = 13.24. SE +/- 0.02, N = 3.
Neural Magic DeepSparse 1.6, Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): a = 445.26, d = 448.03, b = 448.04, c = 448.90. SE +/- 0.32, N = 3.
TensorFlow 2.12, Device: CPU - Batch Size: 1 - Model: AlexNet (images/sec, More Is Better): a = 6.26, c = 6.23, b = 6.23, d = 6.21. SE +/- 0.01, N = 3.
Quicksilver 20230818, Input: CORAL2 P2 (Figure Of Merit, More Is Better): a = 24030000, b = 24026667, c = 23890000, d = 23840000. SE +/- 3333.33, N = 3. (CXX) g++ options: -fopenmp -O3 -march=native
Neural Magic DeepSparse 1.6, Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): a = 39.04, c = 39.12, d = 39.19, b = 39.33. SE +/- 0.01, N = 3.
Neural Magic DeepSparse 1.6, Model: ResNet-50, Baseline - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): a = 307.01, c = 306.50, d = 305.86, b = 304.77. SE +/- 0.13, N = 3.
Neural Magic DeepSparse 1.6, Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): a = 8.9109, c = 8.9520, d = 8.9586, b = 8.9755. SE +/- 0.0211, N = 3.
Neural Magic DeepSparse 1.6, Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Stream (items/sec, More Is Better): a = 112.09, c = 111.59, d = 111.50, b = 111.29. SE +/- 0.26, N = 3.
Neural Magic DeepSparse 1.6, Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): a = 26.84, d = 26.69, c = 26.67, b = 26.65. SE +/- 0.05, N = 3.
Y-Cruncher 0.8.3, Pi Digits To Calculate: 500M (Seconds, Fewer Is Better): d = 7.297, c = 7.301, a = 7.325, b = 7.349. SE +/- 0.007, N = 3.
Neural Magic DeepSparse 1.6, Model: BERT-Large, NLP Question Answering - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): c = 58.00, a = 58.25, b = 58.35, d = 58.41. SE +/- 0.07, N = 3.
Neural Magic DeepSparse 1.6, Model: BERT-Large, NLP Question Answering - Scenario: Synchronous Single-Stream (items/sec, More Is Better): c = 17.24, a = 17.16, b = 17.13, d = 17.12. SE +/- 0.02, N = 3.
Neural Magic DeepSparse 1.6, Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): a = 30.49, c = 30.45, d = 30.32, b = 30.28. SE +/- 0.02, N = 3.
SVT-AV1 1.8, Encoder Mode: Preset 4 - Input: Bosphorus 4K (Frames Per Second, More Is Better): c = 6.678, a = 6.677, b = 6.669, d = 6.633. SE +/- 0.015, N = 3. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
CacheBench, Test: Read / Modify / Write (MB/s, More Is Better): a = 130857.58, d = 130851.31, c = 130806.25, b = 130069.85. SE +/- 386.79, N = 3. MIN: 112608.55 / MAX: 137126.28; MIN: 112492.8 / MAX: 137124.99; MIN: 112724.52 / MAX: 137125.96; MIN: 101861.72 / MAX: 137133.31. (CC) gcc options: -O3 -lrt
Neural Magic DeepSparse 1.6, Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): a = 685.12, c = 683.25, d = 681.53, b = 681.00. SE +/- 0.83, N = 3.
Speedb 2.7, Test: Random Fill (Op/s, More Is Better): a = 558330, c = 557348, d = 556675, b = 554997. SE +/- 4227.88, N = 3. (CXX) g++ options: -O3 -march=native -pthread -fno-builtin-memcmp -fno-rtti -lpthread
Neural Magic DeepSparse 1.6, Model: NLP Text Classification, BERT base uncased SST2, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): a = 17.50, c = 17.54, d = 17.59, b = 17.60. SE +/- 0.02, N = 3.
Neural Magic DeepSparse 1.6, Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): a = 446.63, b = 448.91, d = 449.16, c = 449.30. SE +/- 0.27, N = 3.
Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): a = 335.35, d = 334.64, c = 333.64, b = 333.37; SE +/- 0.44, N = 3
TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: VGG-16 (images/sec, More Is Better): d = 8.51, a = 8.51, c = 8.48, b = 8.46; SE +/- 0.03, N = 3
Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering, Sparse INT8 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): a = 35.76, d = 35.84, c = 35.93, b = 35.97; SE +/- 0.05, N = 3
SVT-AV1 1.8 - Encoder Mode: Preset 8 - Input: Bosphorus 1080p (Frames Per Second, More Is Better): b = 123.29, c = 122.95, a = 122.95, d = 122.62; SE +/- 0.59, N = 3. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
TensorFlow 2.12 - Device: CPU - Batch Size: 16 - Model: AlexNet (images/sec, More Is Better): a = 100.44, c = 100.08, b = 100.01, d = 99.90; SE +/- 0.20, N = 3
Llama.cpp b1808 - Model: llama-2-70b-chat.Q5_0.gguf (Tokens Per Second, More Is Better): d = 1.95, c = 1.95, b = 1.94, a = 1.94; SE +/- 0.00, N = 3. (CXX) g++ options: -std=c++11 -fPIC -O3 -pthread -march=native -mtune=native -lopenblas
Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): a = 32.52, c = 32.42, d = 32.38, b = 32.36; SE +/- 0.05, N = 3
Llamafile 0.6 - Test: llava-v1.5-7b-q4 - Acceleration: CPU (Tokens Per Second, More Is Better): c = 17.30, b = 17.26, d = 17.25, a = 17.22; SE +/- 0.01, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): a = 392.96, c = 393.83, d = 394.38, b = 394.78; SE +/- 0.41, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Synchronous Single-Stream (items/sec, More Is Better): a = 98.91, d = 98.89, b = 98.75, c = 98.50; SE +/- 0.29, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO, Sparse INT8 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): a = 10.10, d = 10.11, b = 10.12, c = 10.15; SE +/- 0.03, N = 3
Neural Magic DeepSparse 1.6 - Model: BERT-Large, NLP Question Answering - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): a = 368.31, d = 369.53, c = 369.62, b = 369.77; SE +/- 0.27, N = 3
Neural Magic DeepSparse 1.6 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): a = 54.12, c = 54.15, d = 54.20, b = 54.33; SE +/- 0.03, N = 3
Neural Magic DeepSparse 1.6 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream (items/sec, More Is Better): a = 18.47, c = 18.46, d = 18.44, b = 18.40; SE +/- 0.01, N = 3
LZ4 Compression 1.9.4 - Compression Level: 3 - Compression Speed (MB/s, More Is Better): d = 131.61, c = 131.40, a = 131.24, b = 131.10; SE +/- 0.30, N = 3. (CC) gcc options: -O3
Quicksilver 20230818 - Input: CTS2 (Figure Of Merit, More Is Better): a = 20680000, b = 20646667, c = 20620000, d = 20600000; SE +/- 6666.67, N = 3. (CXX) g++ options: -fopenmp -O3 -march=native
Neural Magic DeepSparse 1.6 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream (items/sec, More Is Better): d = 18.40, a = 18.39, c = 18.36, b = 18.33; SE +/- 0.01, N = 3
Neural Magic DeepSparse 1.6 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): d = 54.35, a = 54.37, c = 54.46, b = 54.54; SE +/- 0.02, N = 3
Quicksilver 20230818 - Input: CORAL2 P1 (Figure Of Merit, More Is Better): d = 24290000, c = 24240000, b = 24230000, a = 24210000; SE +/- 11547.01, N = 3. (CXX) g++ options: -fopenmp -O3 -march=native
Y-Cruncher 0.8.3 - Pi Digits To Calculate: 1B (Seconds, Fewer Is Better): b = 15.50, d = 15.50, c = 15.53, a = 15.55; SE +/- 0.01, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): d = 10.12, c = 10.13, b = 10.14, a = 10.14; SE +/- 0.01, N = 3
Neural Magic DeepSparse 1.6 - Model: CV Detection, YOLOv5s COCO - Scenario: Synchronous Single-Stream (items/sec, More Is Better): d = 98.77, c = 98.65, b = 98.56, a = 98.52; SE +/- 0.09, N = 3
Llamafile 0.6 - Test: mistral-7b-instruct-v0.2.Q8_0 - Acceleration: CPU (Tokens Per Second, More Is Better): b = 10.15, c = 10.14, d = 10.13, a = 10.13; SE +/- 0.01, N = 3
LZ4 Compression 1.9.4 - Compression Level: 1 - Compression Speed (MB/s, More Is Better): d = 830.37, b = 829.36, c = 829.15, a = 828.78; SE +/- 0.63, N = 3. (CC) gcc options: -O3
LZ4 Compression 1.9.4 - Compression Level: 9 - Decompression Speed (MB/s, More Is Better): d = 4844.5, c = 4842.4, b = 4841.4, a = 4840.5; SE +/- 1.12, N = 3. (CC) gcc options: -O3
LZ4 Compression 1.9.4 - Compression Level: 1 - Decompression Speed (MB/s, More Is Better): c = 5023.2, b = 5020.0, d = 5019.5, a = 5019.5; SE +/- 1.42, N = 3. (CC) gcc options: -O3
LZ4 Compression 1.9.4 - Compression Level: 3 - Decompression Speed (MB/s, More Is Better): c = 4598.0, b = 4597.9, d = 4596.7, a = 4595.9; SE +/- 0.63, N = 3. (CC) gcc options: -O3
CacheBench - Test: Write (MB/s, More Is Better): c = 69142.44 (MIN: 68884.8 / MAX: 69218.23), d = 69142.19 (MIN: 68886.61 / MAX: 69217.36), b = 69140.53 (MIN: 68883.98 / MAX: 69225.86), a = 69134.68 (MIN: 68881.15 / MAX: 69208.76); SE +/- 3.29, N = 3. (CC) gcc options: -O3 -lrt
CacheBench - Test: Read (MB/s, More Is Better): d = 11543.49 (MIN: 11542.8 / MAX: 11544.64), a = 11543.37 (MIN: 11542.37 / MAX: 11544.55), b = 11543.16 (MIN: 11542.65 / MAX: 11544.48), c = 11543.10 (MIN: 11542.7 / MAX: 11543.41); SE +/- 0.09, N = 3. (CC) gcc options: -O3 -lrt
Llamafile 0.6 - Test: wizardcoder-python-34b-v1.0.Q6_K - Acceleration: CPU (Tokens Per Second, More Is Better): d = 3.25, c = 3.25, b = 3.25, a = 3.25; SE +/- 0.00, N = 3
Phoronix Test Suite v10.8.5