dddas
AMD Ryzen Threadripper 3970X 32-Core testing with an ASUS ROG ZENITH II EXTREME (1603 BIOS) and AMD Radeon RX 5700 8GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2306242-NE-DDDAS565146&sor&grr
dddas: system details for runs a and b
  Processor: AMD Ryzen Threadripper 3970X 32-Core @ 3.70GHz (32 Cores / 64 Threads)
  Motherboard: ASUS ROG ZENITH II EXTREME (1603 BIOS)
  Chipset: AMD Starship/Matisse
  Memory: 64GB
  Disk: Samsung SSD 980 PRO 500GB
  Graphics: AMD Radeon RX 5700 8GB (1750/875MHz)
  Audio: AMD Navi 10 HDMI Audio
  Monitor: ASUS VP28U
  Network: Aquantia AQC107 NBase-T/IEEE + Intel I211 + Intel Wi-Fi 6 AX200
  OS: Ubuntu 22.04
  Kernel: 5.19.0-051900rc7-generic (x86_64)
  Desktop: GNOME Shell 42.2
  Display Server: X Server + Wayland
  OpenGL: 4.6 Mesa 22.0.1 (LLVM 13.0.1 DRM 3.47)
  Vulkan: 1.2.204
  Compiler: GCC 11.3.0
  File-System: ext4
  Screen Resolution: 3840x2160
Kernel Details - Transparent Huge Pages: madvise
Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-aYxV0E/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-aYxV0E/gcc-11-11.3.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Disk Details - NONE / errors=remount-ro,relatime,rw / Block Size: 4096
Processor Details - Scaling Governor: acpi-cpufreq schedutil (Boost: Enabled) - CPU Microcode: 0x830104d
Graphics Details - BAR1 / Visible vRAM Size: 256 MB - vBIOS Version: 113-D1820201-101
Python Details - Python 3.10.6
Security Details
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  retbleed: Mitigation of untrained return thunk; SMT enabled with STIBP protection
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Retpolines IBPB: conditional STIBP: always-on RSB filling
  srbds: Not affected
  tsx_async_abort: Not affected
dddas: result overview
Test | a | b
whisper-cpp: ggml-medium.en - 2016 State of the Union | 1018.28439 | 1003.11362
whisper-cpp: ggml-small.en - 2016 State of the Union | 395.70935 | 363.32431
sqlite: 64 | 681.448 | 680.811
sqlite: 32 | 505.417 | 502.766
libxsmm: 128 | 635.8 | 635.4
sqlite: 16 | 373.820 | 374.33
sqlite: 4 | 266.581 | 262.624
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 935.611 | 932.987
petsc: Streams | 58312.0964 | 58276.7926
sqlite: 8 | 291.254 | 284.865
sqlite: 2 | 243.219 | 237.187
nekrs: Kershaw | 2123046667 | 2109640000
nekrs: TurboPipe Periodic | 3444566667 | 3441770000
hpcg: 104 104 104 - 60 | 10.9645 | 11.0163
qmcpack: FeCO6_b3lyp_gms | 196.98 | 191.15
libxsmm: 256 | 910.4 | 907.4
mocassin: Dust 2D tau100.0 | 181.265 | 180.727
qmcpack: FeCO6_b3lyp_gms | 175.39 | 174.82
palabos: 100 | 121.931 | 122.23
ospray: particle_volume/scivis/real_time | 9.74548 | 9.76771
whisper-cpp: ggml-base.en - 2016 State of the Union | 156.48335 | 151.04973
ospray: particle_volume/pathtracer/real_time | 128.572 | 128.663
palabos: 400 | 139.299 | 140.078
palabos: 500 | 143.850 | 144.062
qmcpack: Li2_STO_ae | 136.22 | 132.76
leveldb: Seq Fill | 254.942 | 255.488
leveldb: Seq Fill | 27.8 | 27.7
xonotic: 3840 x 2160 - Ultimate | 311.3989499 | 311.7695344
leveldb: Rand Delete | 245.160 | 245.182
heffte: c2c - FFTW - double-long - 512 | 15.3464 | 15.3508
heffte: c2c - Stock - double-long - 512 | 15.4082 | 15.4135
stress-ng: Socket Activity | 3072.80 | 9580.27
stress-ng: Pipe | 18809740.35 | 22201912.16
laghos: Sedov Blast Wave, ube_922_hex.mesh | 264.34 | 265.390123291
gpaw: Carbon Nanotube | 110.846 | 110.949
vvenc: Bosphorus 4K - Fast | 5.44 | 5.387
sqlite: 1 | 106.014 | 105
ospray: particle_volume/ao/real_time | 9.86893 | 9.89124
xonotic: 2560 x 1440 - Ultimate | 384.0158631 | 381.3165184
xonotic: 1920 x 1200 - Ultimate | 384.7672496 | 386.8054656
xonotic: 1920 x 1080 - Ultimate | 386.9438277 | 386.2753935
xonotic: 3840 x 2160 - Ultra | 420.7457662 | 423.3297457
onednn: Recurrent Neural Network Training - f32 - CPU | 3252.18 | 3275.76
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 3235.41 | 3226.7
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | 3244.46 | 3227.02
xonotic: 3840 x 2160 - High | 467.6921769 | 469.7486112
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | 938.102 | 958.843
onednn: Recurrent Neural Network Inference - f32 - CPU | 976.467 | 987.992
xonotic: 1920 x 1080 - Ultra | 518.6729034 | 524.5527153
xonotic: 1920 x 1200 - Ultra | 521.4981114 | 521.2002993
xonotic: 2560 x 1440 - Ultra | 520.7893156 | 527.2114812
z3: 2.smt2 | 76.011 | 76.122
ospray: gravity_spheres_volume/dim_512/scivis/real_time | 4.62468 | 4.62434
xonotic: 1920 x 1080 - High | 561.4381904 | 563.7930829
xonotic: 2560 x 1440 - High | 560.9659015 | 567.977016
xonotic: 1920 x 1200 - High | 561.0057748 | 567.9388648
ospray: gravity_spheres_volume/dim_512/ao/real_time | 4.93554 | 4.98028
heffte: r2c - FFTW - double-long - 512 | 27.7110 | 27.7303
leveldb: Seek Rand | 65.839 | 64.562
ospray: gravity_spheres_volume/dim_512/pathtracer/real_time | 7.67668 | 7.70051
heffte: r2c - Stock - double-long - 512 | 30.0147 | 30.036
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream | 185.7387 | 183.8144
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream | 86.1294 | 87.0186
xonotic: 3840 x 2160 - Low | 670.0380428 | 675.8397046
cp2k: Fayalite-FIST | 123.826 | 122.975
xonotic: 1920 x 1080 - Low | 671.4224194 | 669.8590201
xonotic: 1920 x 1200 - Low | 671.9542910 | 676.0787756
xonotic: 2560 x 1440 - Low | 673.1714914 | 687.4861314
deepsparse: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream | 62.4431 | 62.3112
deepsparse: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Asynchronous Multi-Stream | 255.9872 | 256.6381
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream | 552.2812 | 551.16
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Asynchronous Multi-Stream | 28.8773 | 28.9177
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream | 556.6090 | 550.4325
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Asynchronous Multi-Stream | 28.6062 | 28.9881
onednn: IP Shapes 1D - u8s8f32 - CPU | 1.177229 | 1.08944
deepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream | 134.4071 | 134.323
deepsparse: NLP Text Classification, BERT base uncased SST2 - Asynchronous Multi-Stream | 119.0134 | 119.0879
vvenc: Bosphorus 4K - Faster | 10.931 | 10.892
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream | 482.7469 | 485.9478
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Asynchronous Multi-Stream | 33.0990 | 32.9066
kripke: | 148243333 | 146215600
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Synchronous Single-Stream | 34.8038 | 33.4783
deepsparse: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Synchronous Single-Stream | 28.7279 | 29.8626
deepsparse: NLP Text Classification, BERT base uncased SST2 - Synchronous Single-Stream | 22.7831 | 23.2237
deepsparse: NLP Text Classification, BERT base uncased SST2 - Synchronous Single-Stream | 43.8828 | 43.0475
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream | 61.1727 | 60.6621
deepsparse: NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream | 16.3453 | 16.4828
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream | 61.1988 | 60.8593
deepsparse: NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream | 16.3383 | 16.4295
laghos: Triple Point Problem | 220.46 | 219.12
deepsparse: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Synchronous Single-Stream | 12.8830 | 12.7473
deepsparse: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Synchronous Single-Stream | 77.5785 | 78.4003
svt-av1: Preset 4 - Bosphorus 4K | 3.756 | 3.721
vvenc: Bosphorus 1080p - Fast | 13.875 | 13.767
leveldb: Rand Read | 43.493 | 43.547
leveldb: Hot Read | 43.137 | 42.757
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream | 46.3523 | 46.5115
deepsparse: CV Segmentation, 90% Pruned YOLACT Pruned - Synchronous Single-Stream | 21.5675 | 21.4935
encode-opus: WAV To Opus Encode | 28.695 | 28.83
onednn: IP Shapes 1D - f32 - CPU | 1.55099 | 1.30181
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream | 67.8249 | 67.5353
deepsparse: NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream | 235.8139 | 236.8228
deepsparse: NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream | 12.2881 | 11.9799
deepsparse: NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream | 81.3373 | 83.4278
espeak: Text-To-Speech Synthesis | 31.077 | 31.488
deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream | 107.3698 | 106.6629
deepsparse: CV Detection, YOLOv5s COCO - Asynchronous Multi-Stream | 148.9743 | 149.9629
deepsparse: CV Detection, YOLOv5s COCO - Synchronous Single-Stream | 10.9104 | 10.8616
deepsparse: CV Detection, YOLOv5s COCO - Synchronous Single-Stream | 91.5663 | 91.9798
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream | 49.4917 | 49.319
deepsparse: CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream | 323.0917 | 324.2731
stress-ng: Futex | 4610857.40 | 4644715
deepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream | 7.0704 | 6.833
deepsparse: CV Classification, ResNet-50 ImageNet - Synchronous Single-Stream | 141.2991 | 146.1891
libxsmm: 64 | 318.5 | 318.7
libxsmm: 32 | 160.5 | 160.7
oidn: RTLightmap.hdr.4096x4096 - CPU-Only | 0.60 | 0.60
stress-ng: IO_uring | 439798.24 | 440335.12
stress-ng: MMAP | 437.11 | 439.24
stress-ng: Malloc | 92853207.13 | 92812375.37
stress-ng: Cloning | 3354.40 | 3360.52
stress-ng: MEMFD | 395.11 | 394.5
stress-ng: Atomic | 480.06 | 480.51
stress-ng: CPU Cache | 1624118.54 | 1535034.64
liquid-dsp: 64 - 256 - 512 | 506326667 | 506050000
liquid-dsp: 8 - 256 - 512 | 82123667 | 82224000
stress-ng: Zlib | 4517.78 | 4518.88
liquid-dsp: 32 - 256 - 512 | 313753333 | 314560000
stress-ng: Pthread | 128353.64 | 128387.57
liquid-dsp: 8 - 256 - 32 | 354570000 | 355110000
liquid-dsp: 8 - 256 - 57 | 409183333 | 410090000
stress-ng: Memory Copying | 10973.65 | 10984.91
stress-ng: NUMA | 752.30 | 741.66
liquid-dsp: 16 - 256 - 512 | 160113333 | 160130000
stress-ng: Matrix 3D Math | 2806.09 | 2795.85
stress-ng: Vector Shuffle | 22825.44 | 22200.86
stress-ng: Function Call | 24278.34 | 24275.23
stress-ng: Semaphores | 66510329.66 | 71041068.48
stress-ng: Wide Vector Math | 1501239.29 | 1496970.66
stress-ng: Vector Floating Point | 94803.76 | 95693.93
stress-ng: Glibc C String Functions | 33453867.32 | 33092079.25
liquid-dsp: 64 - 256 - 57 | 1836033333 | 1837000000
stress-ng: System V Message Passing | 10692419.88 | 10677047.79
stress-ng: Floating Point | 11201.44 | 11221.55
liquid-dsp: 4 - 256 - 512 | 41277667 | 41475000
stress-ng: Poll | 4084623.29 | 4101817.96
liquid-dsp: 64 - 256 - 32 | 2250733333 | 2269700000
stress-ng: Mutex | 18827346.28 | 18816044.49
stress-ng: AVL Tree | 283.41 | 282.42
stress-ng: Crypto | 78260.17 | 78455.19
liquid-dsp: 32 - 256 - 57 | 1506266667 | 1512100000
stress-ng: Context Switching | 11409509.77 | 11620881.04
stress-ng: Forking | 51344.69 | 51160.22
stress-ng: Vector Math | 224417.23 | 224460.71
stress-ng: Matrix Math | 199178.68 | 200423.6
stress-ng: Hash | 7627578.66 | 7624159.24
stress-ng: Glibc Qsort Data Sorting | 942.22 | 943.84
stress-ng: CPU Stress | 82729.76 | 82887.24
stress-ng: SENDFILE | 515575.47 | 528847.31
stress-ng: Fused Multiply-Add | 33507543.08 | 33539318.97
liquid-dsp: 32 - 256 - 32 | 1343200000 | 1350000000
liquid-dsp: 2 - 256 - 512 | 20851667 | 20989000
liquid-dsp: 16 - 256 - 57 | 795103333 | 799490000
liquid-dsp: 16 - 256 - 32 | 690800000 | 692130000
liquid-dsp: 1 - 256 - 512 | 10537667 | 10560000
liquid-dsp: 1 - 256 - 32 | 45075000 | 45023000
liquid-dsp: 2 - 256 - 32 | 89896333 | 89686000
liquid-dsp: 4 - 256 - 57 | 206086667 | 206140000
liquid-dsp: 4 - 256 - 32 | 178100000 | 179170000
liquid-dsp: 2 - 256 - 57 | 103806667 | 103260000
liquid-dsp: 1 - 256 - 57 | 51993333 | 51814000
z3: 1.smt2 | 29.932 | 29.898
embree: Pathtracer ISPC - Asian Dragon Obj | 33.8311 | 33.9593
qmcpack: simple-H2O | 27.602 | 27.484
embree: Pathtracer - Asian Dragon Obj | 37.3900 | 37.4822
leveldb: Rand Fill | 262.981 | 264.868
leveldb: Rand Fill | 26.9 | 26.7
leveldb: Overwrite | 262.354 | 262.283
leveldb: Overwrite | 27.0 | 27
vvenc: Bosphorus 1080p - Faster | 24.877 | 24.801
dav1d: Chimera 1080p 10-bit | 374.79 | 374.13
remhos: Sample Remap Example | 23.537 | 23.654
dav1d: Chimera 1080p | 398.39 | 398.2
cp2k: H20-64 | 42.966 | 42.112
onednn: Deconvolution Batch shapes_1d - f32 - CPU | 5.69206 | 5.73181
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | 1.36630 | 1.43664
embree: Pathtracer ISPC - Crown | 34.4087 | 34.5263
embree: Pathtracer - Crown | 38.4669 | 38.45
oidn: RT.hdr_alb_nrm.3840x2160 - CPU-Only | 1.22 | 1.22
embree: Pathtracer ISPC - Asian Dragon | 39.3962 | 39.399
oidn: RT.ldr_alb_nrm.3840x2160 - CPU-Only | 1.22 | 1.23
dav1d: Summer Nature 4K | 222.52 | 222.24
svt-av1: Preset 4 - Bosphorus 1080p | 10.868 | 10.845
embree: Pathtracer - Asian Dragon | 41.5850 | 41.782
heffte: c2c - FFTW - double-long - 256 | 13.7638 | 13.79
heffte: c2c - Stock - double-long - 256 | 13.8764 | 13.8536
mocassin: Gas HII40 | 12.684 | 12.598
svt-av1: Preset 8 - Bosphorus 4K | 54.148 | 54.556
leveldb: Fill Sync | 10866.014 | 16348.371
leveldb: Fill Sync | 0.6 | 0.4
onednn: IP Shapes 3D - f32 - CPU | 4.26624 | 4.21682
onednn: IP Shapes 3D - u8s8f32 - CPU | 0.948450 | 0.985098
svt-av1: Preset 8 - Bosphorus 1080p | 85.308 | 85.496
heffte: r2c - FFTW - double-long - 256 | 27.4329 | 27.2699
svt-av1: Preset 12 - Bosphorus 4K | 126.423 | 127.675
heffte: r2c - Stock - double-long - 256 | 30.1443 | 30.2863
svt-av1: Preset 13 - Bosphorus 4K | 127.136 | 127.68
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 5.76769 | 5.70574
onednn: Convolution Batch Shapes Auto - f32 - CPU | 4.81893 | 4.83565
dav1d: Summer Nature 1080p | 597.02 | 597.19
heffte: r2c - FFTW - double-long - 128 | 56.4524 | 55.8515
onednn: Deconvolution Batch shapes_3d - f32 - CPU | 2.68566 | 2.69872
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | 1.57740 | 1.5719
svt-av1: Preset 12 - Bosphorus 1080p | 308.249 | 305.73
svt-av1: Preset 13 - Bosphorus 1080p | 360.924 | 364.28
heffte: c2c - Stock - double-long - 128 | 26.5517 | 26.8097
heffte: c2c - FFTW - double-long - 128 | 30.8024 | 30.8579
heffte: r2c - Stock - double-long - 128 | 51.8810 | 50.934
oidn: RT.hdr_alb_nrm.3840x2160 - Radeon HIP | |
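The a and b figures in the overview are easier to compare as a signed percentage. The snippet below is a minimal sketch, not part of the Phoronix Test Suite: `relative_gain` is a hypothetical helper, and the "more is better" / "fewer is better" flag for each row is taken from the per-test charts in this export. The three sample rows are copied from this result file.

```python
def relative_gain(a, b, higher_is_better):
    """Return run b's gain over run a in percent; positive means b did better.

    Hypothetical helper for illustration, not a Phoronix Test Suite API.
    """
    change = (b - a) / a * 100.0
    # For "fewer is better" metrics a lower value is an improvement,
    # so the sign is flipped.
    return change if higher_is_better else -change


# (test, a, b, higher_is_better) copied from this result file
rows = [
    ("whisper-cpp: ggml-small.en", 395.70935, 363.32431, False),  # Seconds
    ("stress-ng: Socket Activity", 3072.80, 9580.27, True),       # Bogo Ops/s
    ("libxsmm: 128", 635.8, 635.4, True),                          # GFLOPS/s
]

for name, a, b, higher_is_better in rows:
    print(f"{name}: {relative_gain(a, b, higher_is_better):+.2f}%")
```

A signed percentage keeps tests with very different units on one scale, but it says nothing about run-to-run noise, so large swings are worth checking against the SE reported for each test.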
Whisper.cpp 1.4 - Model: ggml-medium.en - Input: 2016 State of the Union
  Seconds, Fewer Is Better
  b: 1003.11 | a: 1018.28 | SE +/- 11.84, N = 3
  1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread
Whisper.cpp 1.4 - Model: ggml-small.en - Input: 2016 State of the Union
  Seconds, Fewer Is Better
  b: 363.32 | a: 395.71 | SE +/- 6.52, N = 9
  1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread
SQLite 3.41.2 - Threads / Copies: 64
  Seconds, Fewer Is Better
  b: 680.81 | a: 681.45 | SE +/- 0.71, N = 3
  1. (CC) gcc options: -O2 -lreadline -ltermcap -lz -lm
SQLite 3.41.2 - Threads / Copies: 32
  Seconds, Fewer Is Better
  b: 502.77 | a: 505.42 | SE +/- 1.08, N = 3
  1. (CC) gcc options: -O2 -lreadline -ltermcap -lz -lm
libxsmm 2-1.17-3645 - M N K: 128
  GFLOPS/s, More Is Better
  a: 635.8 | b: 635.4 | SE +/- 0.22, N = 3
  1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
SQLite 3.41.2 - Threads / Copies: 16
  Seconds, Fewer Is Better
  a: 373.82 | b: 374.33 | SE +/- 1.37, N = 3
  1. (CC) gcc options: -O2 -lreadline -ltermcap -lz -lm
SQLite 3.41.2 - Threads / Copies: 4
  Seconds, Fewer Is Better
  b: 262.62 | a: 266.58 | SE +/- 2.89, N = 4
  1. (CC) gcc options: -O2 -lreadline -ltermcap -lz -lm
oneDNN 3.1 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
  ms, Fewer Is Better
  b: 932.99 (MIN: 924.9) | a: 935.61 (MIN: 895.45) | SE +/- 7.63, N = 15
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
PETSc 3.19 - Test: Streams
  MB/s, More Is Better
  a: 58312.10 | b: 58276.79 | SE +/- 71.95, N = 3
  1. (CC) gcc options: -fPIC -O3 -O2 -lpthread -ludev -lpciaccess -lm
SQLite 3.41.2 - Threads / Copies: 8
  Seconds, Fewer Is Better
  b: 284.87 | a: 291.25 | SE +/- 2.25, N = 3
  1. (CC) gcc options: -O2 -lreadline -ltermcap -lz -lm
SQLite 3.41.2 - Threads / Copies: 2
  Seconds, Fewer Is Better
  b: 237.19 | a: 243.22 | SE +/- 1.34, N = 3
  1. (CC) gcc options: -O2 -lreadline -ltermcap -lz -lm
nekRS 23.0 - Input: Kershaw
  flops/rank, More Is Better
  a: 2123046667 | b: 2109640000 | SE +/- 3171604.92, N = 3
  1. (CXX) g++ options: -fopenmp -O2 -march=native -mtune=native -ftree-vectorize -rdynamic -lmpi_cxx -lmpi
nekRS 23.0 - Input: TurboPipe Periodic
  flops/rank, More Is Better
  a: 3444566667 | b: 3441770000 | SE +/- 1942175.18, N = 3
  1. (CXX) g++ options: -fopenmp -O2 -march=native -mtune=native -ftree-vectorize -rdynamic -lmpi_cxx -lmpi
High Performance Conjugate Gradient 3.1 - X Y Z: 104 104 104 - RT: 60
  GFLOP/s, More Is Better
  b: 11.02 | a: 10.96 | SE +/- 0.02, N = 3
  1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi
QMCPACK 3.16 - Input: FeCO6_b3lyp_gms
  Total Execution Time - Seconds, Fewer Is Better
  b: 191.15 | a: 196.98 | SE +/- 1.72, N = 3
  1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
libxsmm 2-1.17-3645 - M N K: 256
  GFLOPS/s, More Is Better
  a: 910.4 | b: 907.4 | SE +/- 3.58, N = 3
  1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
Monte Carlo Simulations of Ionised Nebulae 2.02.73.3 - Input: Dust 2D tau100.0
  Seconds, Fewer Is Better
  b: 180.73 | a: 181.27 | SE +/- 0.15, N = 3
  1. (F9X) gfortran options: -cpp -Jsource/ -ffree-line-length-0 -lm -std=legacy -O2 -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lz
QMCPACK 3.16 - Input: FeCO6_b3lyp_gms
  Total Execution Time - Seconds, Fewer Is Better
  b: 174.82 | a: 175.39 | SE +/- 0.14, N = 3
  1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
Palabos 2.3 - Grid Size: 100
  Mega Site Updates Per Second, More Is Better
  b: 122.23 | a: 121.93 | SE +/- 0.14, N = 3
  1. (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm
OSPRay 2.12 - Benchmark: particle_volume/scivis/real_time
  Items Per Second, More Is Better
  b: 9.76771 | a: 9.74548 | SE +/- 0.00531, N = 3
Whisper.cpp 1.4 - Model: ggml-base.en - Input: 2016 State of the Union
  Seconds, Fewer Is Better
  b: 151.05 | a: 156.48 | SE +/- 1.99, N = 3
  1. (CXX) g++ options: -O3 -std=c++11 -fPIC -pthread
OSPRay 2.12 - Benchmark: particle_volume/pathtracer/real_time
  Items Per Second, More Is Better
  b: 128.66 | a: 128.57 | SE +/- 0.05, N = 3
Palabos 2.3 - Grid Size: 400
  Mega Site Updates Per Second, More Is Better
  b: 140.08 | a: 139.30 | SE +/- 0.57, N = 3
  1. (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm
Palabos 2.3 - Grid Size: 500
  Mega Site Updates Per Second, More Is Better
  b: 144.06 | a: 143.85 | SE +/- 0.27, N = 3
  1. (CXX) g++ options: -std=c++17 -pedantic -O3 -rdynamic -lcrypto -lcurl -lsz -lz -ldl -lm
QMCPACK 3.16 - Input: Li2_STO_ae
  Total Execution Time - Seconds, Fewer Is Better
  b: 132.76 | a: 136.22 | SE +/- 0.40, N = 3
  1. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
LevelDB 1.23 - Benchmark: Sequential Fill
  Microseconds Per Op, Fewer Is Better
  a: 254.94 | b: 255.49 | SE +/- 0.72, N = 3
  1. (CXX) g++ options: -fno-exceptions -fno-rtti -O3 -lgmock -lgtest -lsnappy -ltcmalloc
LevelDB 1.23 - Benchmark: Sequential Fill
  MB/s, More Is Better
  a: 27.8 | b: 27.7 | SE +/- 0.07, N = 3
  1. (CXX) g++ options: -fno-exceptions -fno-rtti -O3 -lgmock -lgtest -lsnappy -ltcmalloc
Xonotic 0.8.6 - Resolution: 3840 x 2160 - Effects Quality: Ultimate
  Frames Per Second, More Is Better
  b: 311.77 (MIN: 98 / MAX: 488) | a: 311.40 (MIN: 97 / MAX: 487) | SE +/- 0.68, N = 3
LevelDB 1.23 - Benchmark: Random Delete
  Microseconds Per Op, Fewer Is Better
  a: 245.16 | b: 245.18 | SE +/- 0.46, N = 3
  1. (CXX) g++ options: -fno-exceptions -fno-rtti -O3 -lgmock -lgtest -lsnappy -ltcmalloc
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 512
  GFLOP/s, More Is Better
  b: 15.35 | a: 15.35 | SE +/- 0.00, N = 3
  1. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 512
  GFLOP/s, More Is Better
  b: 15.41 | a: 15.41 | SE +/- 0.01, N = 3
  1. (CXX) g++ options: -O3
Stress-NG 0.15.10 - Test: Socket Activity
  Bogo Ops/s, More Is Better
  b: 9580.27 | a: 3072.80 | SE +/- 1064.20, N = 15
  1. (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
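The Socket Activity result combines a very large a/b gap with a large SE over 15 runs, so it helps to put the gap and the spread on the same scale. The sketch below assumes the reported SE is the standard error of the mean over N runs; the export does not state which run the SE belongs to, and both function names are hypothetical helpers rather than anything from the Phoronix Test Suite.

```python
import math


def stddev_from_se(se, n):
    """Recover the sample standard deviation from a standard error of the mean.

    Assumes SE = stddev / sqrt(N), which this export does not confirm.
    """
    return se * math.sqrt(n)


def gap_in_se_units(x, y, se):
    """Express the absolute difference between two results in multiples of SE."""
    return abs(x - y) / se


# Stress-NG Socket Activity figures from this result file
a, b = 3072.80, 9580.27
se, n = 1064.20, 15

print(f"implied stddev: {stddev_from_se(se, n):.1f} Bogo Ops/s")
print(f"a/b gap: {gap_in_se_units(a, b, se):.1f} x SE")
```

A gap of only a few SE on a test this noisy is weak evidence of a real difference between the runs; the same check on a low-variance test such as SQLite or HeFFTe reads very differently.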
Stress-NG 0.15.10 - Test: Pipe
  Bogo Ops/s, More Is Better
  b: 22201912.16 | a: 18809740.35 | SE +/- 858971.94, N = 15
  1. (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Laghos 3.1 - Test: Sedov Blast Wave, ube_922_hex.mesh
  Major Kernels Total Rate, More Is Better
  b: 265.39 | a: 264.34 | SE +/- 0.22, N = 3
  1. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi
GPAW 23.6 - Input: Carbon Nanotube
  Seconds, Fewer Is Better
  a: 110.85 | b: 110.95 | SE +/- 0.26, N = 3
  1. (CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi
VVenC 1.8 - Video Input: Bosphorus 4K - Video Preset: Fast
  Frames Per Second, More Is Better
  a: 5.440 | b: 5.387 | SE +/- 0.015, N = 3
  1. (CXX) g++ options: -O3 -flto -fno-fat-lto-objects -flto=auto
SQLite 3.41.2 - Threads / Copies: 1
  Seconds, Fewer Is Better
  b: 105.00 | a: 106.01 | SE +/- 0.30, N = 3
  1. (CC) gcc options: -O2 -lreadline -ltermcap -lz -lm
OSPRay 2.12 - Benchmark: particle_volume/ao/real_time
  Items Per Second, More Is Better
  b: 9.89124 | a: 9.86893 | SE +/- 0.00341, N = 3
Xonotic 0.8.6 - Resolution: 2560 x 1440 - Effects Quality: Ultimate
  Frames Per Second, More Is Better
  a: 384.02 (MIN: 99 / MAX: 847) | b: 381.32 (MIN: 106 / MAX: 824) | SE +/- 1.62, N = 3
Xonotic 0.8.6 - Resolution: 1920 x 1200 - Effects Quality: Ultimate
  Frames Per Second, More Is Better
  b: 386.81 (MIN: 104 / MAX: 887) | a: 384.77 (MIN: 102 / MAX: 919) | SE +/- 0.49, N = 3
Xonotic 0.8.6 - Resolution: 1920 x 1080 - Effects Quality: Ultimate
  Frames Per Second, More Is Better
  a: 386.94 (MIN: 97 / MAX: 892) | b: 386.28 (MIN: 101 / MAX: 871) | SE +/- 2.17, N = 3
Xonotic 0.8.6 - Resolution: 3840 x 2160 - Effects Quality: Ultra
  Frames Per Second, More Is Better
  b: 423.33 (MIN: 194 / MAX: 581) | a: 420.75 (MIN: 194 / MAX: 579) | SE +/- 0.23, N = 3
oneDNN 3.1 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
  ms, Fewer Is Better
  a: 3252.18 (MIN: 3200.87) | b: 3275.76 (MIN: 3269.28) | SE +/- 29.27, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
oneDNN 3.1 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
  ms, Fewer Is Better
  b: 3226.70 (MIN: 3215.86) | a: 3235.41 (MIN: 3194.45) | SE +/- 25.05, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
oneDNN 3.1 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU
  ms, Fewer Is Better
  b: 3227.02 (MIN: 3219.36) | a: 3244.46 (MIN: 3177.37) | SE +/- 29.09, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
Xonotic 0.8.6 - Resolution: 3840 x 2160 - Effects Quality: High
  Frames Per Second, More Is Better
  b: 469.75 (MIN: 225 / MAX: 637) | a: 467.69 (MIN: 222 / MAX: 635) | SE +/- 0.24, N = 3
oneDNN 3.1 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU
  ms, Fewer Is Better
  a: 938.10 (MIN: 914.11) | b: 958.84 (MIN: 951.92) | SE +/- 8.51, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
oneDNN 3.1 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU
  ms, Fewer Is Better
  a: 976.47 (MIN: 961.99) | b: 987.99 (MIN: 979.32) | SE +/- 3.39, N = 3
  1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
Xonotic 0.8.6 - Resolution: 1920 x 1080 - Effects Quality: Ultra
  Frames Per Second, More Is Better
  b: 524.55 (MIN: 285 / MAX: 905) | a: 518.67 (MIN: 259 / MAX: 910) | SE +/- 1.23, N = 3
Xonotic 0.8.6 - Resolution: 1920 x 1200 - Effects Quality: Ultra
  Frames Per Second, More Is Better
  a: 521.50 (MIN: 282 / MAX: 935) | b: 521.20 (MIN: 285 / MAX: 919) | SE +/- 0.46, N = 3
Xonotic 0.8.6 - Resolution: 2560 x 1440 - Effects Quality: Ultra
  Frames Per Second, More Is Better
  b: 527.21 (MIN: 294 / MAX: 931) | a: 520.79 (MIN: 272 / MAX: 931) | SE +/- 2.20, N = 3
Z3 Theorem Prover 4.12.1 - SMT File: 2.smt2
  Seconds, Fewer Is Better
  a: 76.01 | b: 76.12 | SE +/- 0.12, N = 3
  1. (CXX) g++ options: -lpthread -std=c++17 -fvisibility=hidden -mfpmath=sse -msse -msse2 -O3 -fPIC
OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/scivis/real_time
  Items Per Second, More Is Better
  a: 4.62468 | b: 4.62434 | SE +/- 0.00170, N = 3
Xonotic 0.8.6 - Resolution: 1920 x 1080 - Effects Quality: High
  Frames Per Second, More Is Better
  b: 563.79 (MIN: 337 / MAX: 945) | a: 561.44 (MIN: 330 / MAX: 956) | SE +/- 1.66, N = 3
Xonotic 0.8.6 - Resolution: 2560 x 1440 - Effects Quality: High
  Frames Per Second, More Is Better
  b: 567.98 (MIN: 347 / MAX: 923) | a: 560.97 (MIN: 336 / MAX: 962) | SE +/- 0.81, N = 3
Xonotic 0.8.6 - Resolution: 1920 x 1200 - Effects Quality: High
  Frames Per Second, More Is Better
  b: 567.94 (MIN: 343 / MAX: 932) | a: 561.01 (MIN: 341 / MAX: 967) | SE +/- 3.14, N = 3
OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/ao/real_time
  Items Per Second, More Is Better
  b: 4.98028 | a: 4.93554 | SE +/- 0.00393, N = 3
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 512
  GFLOP/s, More Is Better
  b: 27.73 | a: 27.71 | SE +/- 0.01, N = 3
  1. (CXX) g++ options: -O3
LevelDB 1.23 - Benchmark: Seek Random
  Microseconds Per Op, Fewer Is Better
  b: 64.56 | a: 65.84 | SE +/- 0.19, N = 3
  1. (CXX) g++ options: -fno-exceptions -fno-rtti -O3 -lgmock -lgtest -lsnappy -ltcmalloc
OSPRay 2.12 - Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time
  Items Per Second, More Is Better
  b: 7.70051 | a: 7.67668 | SE +/- 0.01066, N = 3
HeFFTe - Highly Efficient FFT for Exascale 2.3 - Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 512
  GFLOP/s, More Is Better
  b: 30.04 | a: 30.01 | SE +/- 0.04, N = 3
  1. (CXX) g++ options: -O3
Neural Magic DeepSparse 1.5, Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): b = 183.81, a = 185.74; SE +/- 1.22, N = 3
Neural Magic DeepSparse 1.5, Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): b = 87.02, a = 86.13; SE +/- 0.56, N = 3
Xonotic 0.8.6, Resolution: 3840 x 2160 - Effects Quality: Low (Frames Per Second, More Is Better): b = 675.84 (MIN: 413 / MAX: 1166), a = 670.04 (MIN: 387 / MAX: 1175); SE +/- 1.56, N = 3
CP2K Molecular Dynamics 2023.1, Input: Fayalite-FIST (Seconds, Fewer Is Better): b = 122.98, a = 123.83; (F9X) gfortran options: -fopenmp -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kdbm -lcp2kgrid -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -lhdf5 -lhdf5_hl -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -lopenblas -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm
Xonotic 0.8.6, Resolution: 1920 x 1080 - Effects Quality: Low (Frames Per Second, More Is Better): a = 671.42 (MIN: 430 / MAX: 1177), b = 669.86 (MIN: 439 / MAX: 1136); SE +/- 0.98, N = 3
Xonotic 0.8.6, Resolution: 1920 x 1200 - Effects Quality: Low (Frames Per Second, More Is Better): b = 676.08 (MIN: 427 / MAX: 1181), a = 671.95 (MIN: 431 / MAX: 1193); SE +/- 1.00, N = 3
Xonotic 0.8.6, Resolution: 2560 x 1440 - Effects Quality: Low (Frames Per Second, More Is Better): b = 687.49 (MIN: 439 / MAX: 1194), a = 673.17 (MIN: 426 / MAX: 1185); SE +/- 2.49, N = 3
Neural Magic DeepSparse 1.5, Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): b = 62.31, a = 62.44; SE +/- 0.10, N = 3
Neural Magic DeepSparse 1.5, Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): b = 256.64, a = 255.99; SE +/- 0.36, N = 3
Neural Magic DeepSparse 1.5, Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): b = 551.16, a = 552.28; SE +/- 1.59, N = 3
Neural Magic DeepSparse 1.5, Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): b = 28.92, a = 28.88; SE +/- 0.10, N = 3
Neural Magic DeepSparse 1.5, Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): b = 550.43, a = 556.61; SE +/- 0.31, N = 3
Neural Magic DeepSparse 1.5, Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): b = 28.99, a = 28.61; SE +/- 0.06, N = 3
oneDNN 3.1, Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): b = 1.089440 (MIN: 0.97), a = 1.177229 (MIN: 0.89); SE +/- 0.016952, N = 14; (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
Neural Magic DeepSparse 1.5, Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): b = 134.32, a = 134.41; SE +/- 0.21, N = 3
Neural Magic DeepSparse 1.5, Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): b = 119.09, a = 119.01; SE +/- 0.18, N = 3
VVenC 1.8, Video Input: Bosphorus 4K - Video Preset: Faster (Frames Per Second, More Is Better): a = 10.93, b = 10.89; SE +/- 0.04, N = 3; (CXX) g++ options: -O3 -flto -fno-fat-lto-objects -flto=auto
Neural Magic DeepSparse 1.5, Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): a = 482.75, b = 485.95; SE +/- 2.43, N = 3
Neural Magic DeepSparse 1.5, Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): a = 33.10, b = 32.91; SE +/- 0.16, N = 3
Kripke 1.2.6 (Throughput FoM, More Is Better): a = 148243333, b = 146215600; SE +/- 636875.17, N = 3; (CXX) g++ options: -O3 -fopenmp -ldl
Neural Magic DeepSparse 1.5, Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): b = 33.48, a = 34.80; SE +/- 0.22, N = 3
Neural Magic DeepSparse 1.5, Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Synchronous Single-Stream (items/sec, More Is Better): b = 29.86, a = 28.73; SE +/- 0.18, N = 3
Neural Magic DeepSparse 1.5, Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): a = 22.78, b = 23.22; SE +/- 0.14, N = 3
Neural Magic DeepSparse 1.5, Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Synchronous Single-Stream (items/sec, More Is Better): a = 43.88, b = 43.05; SE +/- 0.26, N = 3
Neural Magic DeepSparse 1.5, Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): b = 60.66, a = 61.17; SE +/- 0.04, N = 3
Neural Magic DeepSparse 1.5, Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream (items/sec, More Is Better): b = 16.48, a = 16.35; SE +/- 0.01, N = 3
Neural Magic DeepSparse 1.5, Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): b = 60.86, a = 61.20; SE +/- 0.05, N = 3
Neural Magic DeepSparse 1.5, Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream (items/sec, More Is Better): b = 16.43, a = 16.34; SE +/- 0.01, N = 3
Laghos 3.1, Test: Triple Point Problem (Major Kernels Total Rate, More Is Better): a = 220.46, b = 219.12; SE +/- 0.34, N = 3; (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi
Neural Magic DeepSparse 1.5, Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): b = 12.75, a = 12.88; SE +/- 0.03, N = 3
Neural Magic DeepSparse 1.5, Model: NLP Sentiment Analysis, 80% Pruned Quantized BERT Base Uncased - Scenario: Synchronous Single-Stream (items/sec, More Is Better): b = 78.40, a = 77.58; SE +/- 0.19, N = 3
SVT-AV1 1.6, Encoder Mode: Preset 4 - Input: Bosphorus 4K (Frames Per Second, More Is Better): a = 3.756, b = 3.721; SE +/- 0.010, N = 3; (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
VVenC 1.8, Video Input: Bosphorus 1080p - Video Preset: Fast (Frames Per Second, More Is Better): a = 13.88, b = 13.77; SE +/- 0.04, N = 3; (CXX) g++ options: -O3 -flto -fno-fat-lto-objects -flto=auto
LevelDB 1.23, Benchmark: Random Read (Microseconds Per Op, Fewer Is Better): a = 43.49, b = 43.55; SE +/- 0.19, N = 3; (CXX) g++ options: -fno-exceptions -fno-rtti -O3 -lgmock -lgtest -lsnappy -ltcmalloc
LevelDB 1.23, Benchmark: Hot Read (Microseconds Per Op, Fewer Is Better): b = 42.76, a = 43.14; SE +/- 0.21, N = 3; (CXX) g++ options: -fno-exceptions -fno-rtti -O3 -lgmock -lgtest -lsnappy -ltcmalloc
Neural Magic DeepSparse 1.5, Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): a = 46.35, b = 46.51; SE +/- 0.03, N = 3
Neural Magic DeepSparse 1.5, Model: CV Segmentation, 90% Pruned YOLACT Pruned - Scenario: Synchronous Single-Stream (items/sec, More Is Better): a = 21.57, b = 21.49; SE +/- 0.01, N = 3
Opus Codec Encoding 1.4, WAV To Opus Encode (Seconds, Fewer Is Better): a = 28.70, b = 28.83; SE +/- 0.05, N = 5; (CXX) g++ options: -O3 -fvisibility=hidden -logg -lm
oneDNN 3.1, Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): b = 1.30181 (MIN: 1.19), a = 1.55099 (MIN: 1.33); SE +/- 0.01212, N = 10; (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
Neural Magic DeepSparse 1.5, Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): b = 67.54, a = 67.82; SE +/- 0.04, N = 3
Neural Magic DeepSparse 1.5, Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): b = 236.82, a = 235.81; SE +/- 0.13, N = 3
Neural Magic DeepSparse 1.5, Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): b = 11.98, a = 12.29; SE +/- 0.03, N = 3
Neural Magic DeepSparse 1.5, Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Stream (items/sec, More Is Better): b = 83.43, a = 81.34; SE +/- 0.18, N = 3
eSpeak-NG Speech Engine 1.51, Text-To-Speech Synthesis (Seconds, Fewer Is Better): a = 31.08, b = 31.49; SE +/- 0.34, N = 4; (CXX) g++ options: -O2
Neural Magic DeepSparse 1.5, Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): b = 106.66, a = 107.37; SE +/- 0.10, N = 3
Neural Magic DeepSparse 1.5, Model: CV Detection, YOLOv5s COCO - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): b = 149.96, a = 148.97; SE +/- 0.14, N = 3
Neural Magic DeepSparse 1.5, Model: CV Detection, YOLOv5s COCO - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): b = 10.86, a = 10.91; SE +/- 0.02, N = 3
Neural Magic DeepSparse 1.5, Model: CV Detection, YOLOv5s COCO - Scenario: Synchronous Single-Stream (items/sec, More Is Better): b = 91.98, a = 91.57; SE +/- 0.13, N = 3
Neural Magic DeepSparse 1.5, Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream (ms/batch, Fewer Is Better): b = 49.32, a = 49.49; SE +/- 0.05, N = 3
Neural Magic DeepSparse 1.5, Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream (items/sec, More Is Better): b = 324.27, a = 323.09; SE +/- 0.39, N = 3
Stress-NG 0.15.10, Test: Futex (Bogo Ops/s, More Is Better): b = 4644715.00, a = 4610857.40; SE +/- 56259.21, N = 4; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Neural Magic DeepSparse 1.5, Model: CV Classification, ResNet-50 ImageNet - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better): b = 6.8330, a = 7.0704; SE +/- 0.0378, N = 3
Neural Magic DeepSparse 1.5, Model: CV Classification, ResNet-50 ImageNet - Scenario: Synchronous Single-Stream (items/sec, More Is Better): b = 146.19, a = 141.30; SE +/- 0.75, N = 3
libxsmm 2-1.17-3645, M N K: 64 (GFLOPS/s, More Is Better): b = 318.7, a = 318.5; SE +/- 0.09, N = 3; (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
libxsmm 2-1.17-3645, M N K: 32 (GFLOPS/s, More Is Better): b = 160.7, a = 160.5; SE +/- 0.07, N = 3; (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2
Intel Open Image Denoise 2.0, Run: RTLightmap.hdr.4096x4096 - Device: CPU-Only (Images / Sec, More Is Better): b = 0.60, a = 0.60; SE +/- 0.00, N = 3
Stress-NG 0.15.10, Test: IO_uring (Bogo Ops/s, More Is Better): b = 440335.12, a = 439798.24; SE +/- 726.45, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: MMAP (Bogo Ops/s, More Is Better): b = 439.24, a = 437.11; SE +/- 0.81, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: Malloc (Bogo Ops/s, More Is Better): a = 92853207.13, b = 92812375.37; SE +/- 44041.52, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: Cloning (Bogo Ops/s, More Is Better): b = 3360.52, a = 3354.40; SE +/- 2.74, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: MEMFD (Bogo Ops/s, More Is Better): a = 395.11, b = 394.50; SE +/- 0.62, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: Atomic (Bogo Ops/s, More Is Better): b = 480.51, a = 480.06; SE +/- 0.46, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: CPU Cache (Bogo Ops/s, More Is Better): a = 1624118.54, b = 1535034.64; SE +/- 19308.61, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Liquid-DSP 1.6, Threads: 64 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better): a = 506326667, b = 506050000; SE +/- 148361.42, N = 3; (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6, Threads: 8 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better): b = 82224000, a = 82123667; SE +/- 37834.43, N = 3; (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Stress-NG 0.15.10, Test: Zlib (Bogo Ops/s, More Is Better): b = 4518.88, a = 4517.78; SE +/- 2.96, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Liquid-DSP 1.6, Threads: 32 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better): b = 314560000, a = 313753333; SE +/- 399263.21, N = 3; (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Stress-NG 0.15.10, Test: Pthread (Bogo Ops/s, More Is Better): b = 128387.57, a = 128353.64; SE +/- 521.45, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Liquid-DSP 1.6, Threads: 8 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better): b = 355110000, a = 354570000; SE +/- 120138.81, N = 3; (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6, Threads: 8 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better): b = 410090000, a = 409183333; SE +/- 176099.72, N = 3; (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Stress-NG 0.15.10, Test: Memory Copying (Bogo Ops/s, More Is Better): b = 10984.91, a = 10973.65; SE +/- 6.50, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: NUMA (Bogo Ops/s, More Is Better): a = 752.30, b = 741.66; SE +/- 5.01, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Liquid-DSP 1.6, Threads: 16 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better): b = 160130000, a = 160113333; SE +/- 32829.53, N = 3; (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Stress-NG 0.15.10, Test: Matrix 3D Math (Bogo Ops/s, More Is Better): a = 2806.09, b = 2795.85; SE +/- 1.75, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: Vector Shuffle (Bogo Ops/s, More Is Better): a = 22825.44, b = 22200.86; SE +/- 43.57, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: Function Call (Bogo Ops/s, More Is Better): a = 24278.34, b = 24275.23; SE +/- 38.00, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: Semaphores (Bogo Ops/s, More Is Better): b = 71041068.48, a = 66510329.66; SE +/- 919418.56, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: Wide Vector Math (Bogo Ops/s, More Is Better): a = 1501239.29, b = 1496970.66; SE +/- 3256.42, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: Vector Floating Point (Bogo Ops/s, More Is Better): b = 95693.93, a = 94803.76; SE +/- 139.68, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: Glibc C String Functions (Bogo Ops/s, More Is Better): a = 33453867.32, b = 33092079.25; SE +/- 238790.03, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Liquid-DSP 1.6, Threads: 64 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better): b = 1837000000, a = 1836033333; SE +/- 19718378.34, N = 3; (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Stress-NG 0.15.10, Test: System V Message Passing (Bogo Ops/s, More Is Better): a = 10692419.88, b = 10677047.79; SE +/- 13654.50, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: Floating Point (Bogo Ops/s, More Is Better): b = 11221.55, a = 11201.44; SE +/- 7.25, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Liquid-DSP 1.6, Threads: 4 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better): b = 41475000, a = 41277667; SE +/- 67087.84, N = 3; (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Stress-NG 0.15.10, Test: Poll (Bogo Ops/s, More Is Better): b = 4101817.96, a = 4084623.29; SE +/- 1922.15, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Liquid-DSP 1.6, Threads: 64 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better): b = 2269700000, a = 2250733333; SE +/- 296273.15, N = 3; (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Stress-NG 0.15.10, Test: Mutex (Bogo Ops/s, More Is Better): a = 18827346.28, b = 18816044.49; SE +/- 22386.94, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: AVL Tree (Bogo Ops/s, More Is Better): a = 283.41, b = 282.42; SE +/- 0.22, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: Crypto (Bogo Ops/s, More Is Better): b = 78455.19, a = 78260.17; SE +/- 78.61, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Liquid-DSP 1.6, Threads: 32 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better): b = 1512100000, a = 1506266667; SE +/- 2366666.67, N = 3; (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Stress-NG 0.15.10, Test: Context Switching (Bogo Ops/s, More Is Better): b = 11620881.04, a = 11409509.77; SE +/- 22031.57, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: Forking (Bogo Ops/s, More Is Better): a = 51344.69, b = 51160.22; SE +/- 291.28, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: Vector Math (Bogo Ops/s, More Is Better): b = 224460.71, a = 224417.23; SE +/- 22.96, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: Matrix Math (Bogo Ops/s, More Is Better): b = 200423.60, a = 199178.68; SE +/- 476.06, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: Hash (Bogo Ops/s, More Is Better): a = 7627578.66, b = 7624159.24; SE +/- 2470.62, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: Glibc Qsort Data Sorting (Bogo Ops/s, More Is Better): b = 943.84, a = 942.22; SE +/- 0.47, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: CPU Stress (Bogo Ops/s, More Is Better): b = 82887.24, a = 82729.76; SE +/- 76.38, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: SENDFILE (Bogo Ops/s, More Is Better): b = 528847.31, a = 515575.47; SE +/- 656.59, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Stress-NG 0.15.10, Test: Fused Multiply-Add (Bogo Ops/s, More Is Better): b = 33539318.97, a = 33507543.08; SE +/- 7490.49, N = 3; (CXX) g++ options: -lm -lapparmor -latomic -lc -lcrypt -ldl -lEGL -lGLESv2 -ljpeg -lmpfr -lpthread -lrt -lsctp -lz
Liquid-DSP 1.6, Threads: 32 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better): b = 1350000000, a = 1343200000; SE +/- 2211334.44, N = 3. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6, Threads: 2 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better): b = 20989000, a = 20851667; SE +/- 21712.77, N = 3. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6, Threads: 16 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better): b = 799490000, a = 795103333; SE +/- 377903.57, N = 3. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6, Threads: 16 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better): b = 692130000, a = 690800000; SE +/- 1128996.60, N = 3. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6, Threads: 1 - Buffer Length: 256 - Filter Length: 512 (samples/s, More Is Better): b = 10560000, a = 10537667; SE +/- 21333.33, N = 3. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6, Threads: 1 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better): a = 45075000, b = 45023000; SE +/- 21825.06, N = 3. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6, Threads: 2 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better): a = 89896333, b = 89686000; SE +/- 85545.96, N = 3. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6, Threads: 4 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better): b = 206140000, a = 206086667; SE +/- 210502.05, N = 3. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6, Threads: 4 - Buffer Length: 256 - Filter Length: 32 (samples/s, More Is Better): b = 179170000, a = 178100000; SE +/- 81853.53, N = 3. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6, Threads: 2 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better): a = 103806667, b = 103260000; SE +/- 150591.43, N = 3. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Liquid-DSP 1.6, Threads: 1 - Buffer Length: 256 - Filter Length: 57 (samples/s, More Is Better): a = 51993333, b = 51814000; SE +/- 193694.20, N = 3. (CC) gcc options: -O3 -pthread -lm -lc -lliquid
Z3 Theorem Prover 4.12.1, SMT File: 1.smt2 (Seconds, Fewer Is Better): b = 29.90, a = 29.93; SE +/- 0.01, N = 3. (CXX) g++ options: -lpthread -std=c++17 -fvisibility=hidden -mfpmath=sse -msse -msse2 -O3 -fPIC
Embree 4.1, Binary: Pathtracer ISPC - Model: Asian Dragon Obj (Frames Per Second, More Is Better): b = 33.96 (MIN: 33.75 / MAX: 34.43), a = 33.83 (MIN: 33.53 / MAX: 34.4); SE +/- 0.06, N = 3.
QMCPACK 3.16, Input: simple-H2O (Total Execution Time - Seconds, Fewer Is Better): b = 27.48, a = 27.60; SE +/- 0.04, N = 3. (CXX) g++ options: -fopenmp -foffload=disable -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -ffast-math -march=native -O3 -lm -ldl
Embree 4.1, Binary: Pathtracer - Model: Asian Dragon Obj (Frames Per Second, More Is Better): b = 37.48 (MIN: 37.25 / MAX: 38.16), a = 37.39 (MIN: 37.08 / MAX: 37.99); SE +/- 0.05, N = 3.
LevelDB 1.23, Benchmark: Random Fill (Microseconds Per Op, Fewer Is Better): a = 262.98, b = 264.87; SE +/- 0.90, N = 3. (CXX) g++ options: -fno-exceptions -fno-rtti -O3 -lgmock -lgtest -lsnappy -ltcmalloc
LevelDB 1.23, Benchmark: Random Fill (MB/s, More Is Better): a = 26.9, b = 26.7; SE +/- 0.09, N = 3. (CXX) g++ options: -fno-exceptions -fno-rtti -O3 -lgmock -lgtest -lsnappy -ltcmalloc
LevelDB 1.23, Benchmark: Overwrite (Microseconds Per Op, Fewer Is Better): b = 262.28, a = 262.35; SE +/- 0.92, N = 3. (CXX) g++ options: -fno-exceptions -fno-rtti -O3 -lgmock -lgtest -lsnappy -ltcmalloc
LevelDB 1.23, Benchmark: Overwrite (MB/s, More Is Better): b = 27.0, a = 27.0; SE +/- 0.09, N = 3. (CXX) g++ options: -fno-exceptions -fno-rtti -O3 -lgmock -lgtest -lsnappy -ltcmalloc
VVenC 1.8, Video Input: Bosphorus 1080p - Video Preset: Faster (Frames Per Second, More Is Better): a = 24.88, b = 24.80; SE +/- 0.09, N = 3. (CXX) g++ options: -O3 -flto -fno-fat-lto-objects -flto=auto
dav1d 1.2.1, Video Input: Chimera 1080p 10-bit (FPS, More Is Better): a = 374.79, b = 374.13; SE +/- 0.32, N = 3. (CC) gcc options: -pthread -lm
Remhos 1.0, Test: Sample Remap Example (Seconds, Fewer Is Better): a = 23.54, b = 23.65; SE +/- 0.04, N = 3. (CXX) g++ options: -O3 -std=c++11 -lmfem -lHYPRE -lmetis -lrt -lmpi_cxx -lmpi
dav1d 1.2.1, Video Input: Chimera 1080p (FPS, More Is Better): a = 398.39, b = 398.20; SE +/- 0.11, N = 3. (CC) gcc options: -pthread -lm
CP2K Molecular Dynamics 2023.1, Input: H20-64 (Seconds, Fewer Is Better): b = 42.11, a = 42.97. (F9X) gfortran options: -fopenmp -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kdbm -lcp2kgrid -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -lhdf5 -lhdf5_hl -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -lopenblas -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm
oneDNN 3.1, Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): a = 5.69206 (MIN: 4.03), b = 5.73181 (MIN: 4.01); SE +/- 0.03300, N = 3. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
oneDNN 3.1, Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): a = 1.36630 (MIN: 1.27), b = 1.43664 (MIN: 1.35); SE +/- 0.01766, N = 3. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
Embree 4.1, Binary: Pathtracer ISPC - Model: Crown (Frames Per Second, More Is Better): b = 34.53 (MIN: 34.2 / MAX: 35.09), a = 34.41 (MIN: 33.95 / MAX: 35.09); SE +/- 0.09, N = 3.
Embree 4.1, Binary: Pathtracer - Model: Crown (Frames Per Second, More Is Better): a = 38.47 (MIN: 37.94 / MAX: 39.09), b = 38.45 (MIN: 38.09 / MAX: 38.98); SE +/- 0.08, N = 3.
Intel Open Image Denoise 2.0, Run: RT.hdr_alb_nrm.3840x2160 - Device: CPU-Only (Images / Sec, More Is Better): b = 1.22, a = 1.22; SE +/- 0.00, N = 3.
Embree 4.1, Binary: Pathtracer ISPC - Model: Asian Dragon (Frames Per Second, More Is Better): b = 39.40 (MIN: 39.18 / MAX: 39.86), a = 39.40 (MIN: 39.15 / MAX: 40.06); SE +/- 0.02, N = 3.
Intel Open Image Denoise 2.0, Run: RT.ldr_alb_nrm.3840x2160 - Device: CPU-Only (Images / Sec, More Is Better): b = 1.23, a = 1.22; SE +/- 0.00, N = 3.
dav1d 1.2.1, Video Input: Summer Nature 4K (FPS, More Is Better): a = 222.52, b = 222.24; SE +/- 0.24, N = 3. (CC) gcc options: -pthread -lm
SVT-AV1 1.6, Encoder Mode: Preset 4 - Input: Bosphorus 1080p (Frames Per Second, More Is Better): a = 10.87, b = 10.85; SE +/- 0.05, N = 3. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
Embree 4.1, Binary: Pathtracer - Model: Asian Dragon (Frames Per Second, More Is Better): b = 41.78 (MIN: 41.55 / MAX: 42.41), a = 41.59 (MIN: 41.23 / MAX: 42.19); SE +/- 0.05, N = 3.
HeFFTe - Highly Efficient FFT for Exascale 2.3, Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 256 (GFLOP/s, More Is Better): b = 13.79, a = 13.76; SE +/- 0.01, N = 3. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale 2.3, Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 256 (GFLOP/s, More Is Better): a = 13.88, b = 13.85; SE +/- 0.01, N = 3. (CXX) g++ options: -O3
Monte Carlo Simulations of Ionised Nebulae 2.02.73.3, Input: Gas HII40 (Seconds, Fewer Is Better): b = 12.60, a = 12.68; SE +/- 0.05, N = 3. (F9X) gfortran options: -cpp -Jsource/ -ffree-line-length-0 -lm -std=legacy -O2 -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lz
SVT-AV1 1.6, Encoder Mode: Preset 8 - Input: Bosphorus 4K (Frames Per Second, More Is Better): b = 54.56, a = 54.15; SE +/- 0.30, N = 3. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
LevelDB 1.23, Benchmark: Fill Sync (Microseconds Per Op, Fewer Is Better): a = 10866.01, b = 16348.37; SE +/- 65.20, N = 3. (CXX) g++ options: -fno-exceptions -fno-rtti -O3 -lgmock -lgtest -lsnappy -ltcmalloc
LevelDB 1.23, Benchmark: Fill Sync (MB/s, More Is Better): a = 0.6, b = 0.4; SE +/- 0.00, N = 3. (CXX) g++ options: -fno-exceptions -fno-rtti -O3 -lgmock -lgtest -lsnappy -ltcmalloc
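The LevelDB Fill Sync rows above are the one place in this section where runs a and b clearly diverge. The following is a minimal sketch, not part of the Phoronix Test Suite, of how the gap can be expressed as a percentage while respecting each chart's "More Is Better" / "Fewer Is Better" direction. The helper name `percent_gap` is hypothetical; the input values are copied from the two Fill Sync rows.

```python
def percent_gap(a: float, b: float, more_is_better: bool) -> float:
    """Return how far run b trails run a, as a percentage of a's score.

    A positive result means b is worse than a; a negative result means
    b is better. `percent_gap` is an illustrative helper, not a PTS API.
    """
    if more_is_better:
        # Higher scores win, so b trails when it scores lower than a.
        return (a - b) / a * 100.0
    # Lower scores win, so b trails when it scores higher than a.
    return (b - a) / a * 100.0


# Values copied from the LevelDB 1.23 Fill Sync rows.
latency_gap = percent_gap(10866.01, 16348.37, more_is_better=False)
throughput_gap = percent_gap(0.6, 0.4, more_is_better=True)

print(f"Fill Sync microseconds per op: b trails a by {latency_gap:.1f}%")
print(f"Fill Sync MB/s: b trails a by {throughput_gap:.1f}%")
```

The MB/s figures are reported to a single decimal place, so the throughput gap is much coarser than the one derived from microseconds per op.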
oneDNN 3.1, Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): b = 4.21682 (MIN: 4.1), a = 4.26624 (MIN: 4.12); SE +/- 0.01334, N = 3. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
oneDNN 3.1, Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): a = 0.948450 (MIN: 0.87), b = 0.985098 (MIN: 0.9); SE +/- 0.010152, N = 3. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
SVT-AV1 1.6, Encoder Mode: Preset 8 - Input: Bosphorus 1080p (Frames Per Second, More Is Better): b = 85.50, a = 85.31; SE +/- 0.40, N = 3. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
HeFFTe - Highly Efficient FFT for Exascale 2.3, Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 256 (GFLOP/s, More Is Better): a = 27.43, b = 27.27; SE +/- 0.06, N = 3. (CXX) g++ options: -O3
SVT-AV1 1.6, Encoder Mode: Preset 12 - Input: Bosphorus 4K (Frames Per Second, More Is Better): b = 127.68, a = 126.42; SE +/- 1.42, N = 4. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
HeFFTe - Highly Efficient FFT for Exascale 2.3, Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 256 (GFLOP/s, More Is Better): b = 30.29, a = 30.14; SE +/- 0.11, N = 3. (CXX) g++ options: -O3
SVT-AV1 1.6, Encoder Mode: Preset 13 - Input: Bosphorus 4K (Frames Per Second, More Is Better): b = 127.68, a = 127.14; SE +/- 0.08, N = 3. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
oneDNN 3.1, Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): b = 5.70574 (MIN: 5.64), a = 5.76769 (MIN: 5.69); SE +/- 0.00320, N = 3. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
oneDNN 3.1, Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): a = 4.81893 (MIN: 4.74), b = 4.83565 (MIN: 4.78); SE +/- 0.01212, N = 3. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
dav1d 1.2.1, Video Input: Summer Nature 1080p (FPS, More Is Better): b = 597.19, a = 597.02; SE +/- 0.86, N = 3. (CC) gcc options: -pthread -lm
HeFFTe - Highly Efficient FFT for Exascale 2.3, Test: r2c - Backend: FFTW - Precision: double-long - X Y Z: 128 (GFLOP/s, More Is Better): a = 56.45, b = 55.85; SE +/- 0.45, N = 15. (CXX) g++ options: -O3
oneDNN 3.1, Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better): a = 2.68566 (MIN: 2.61), b = 2.69872 (MIN: 2.64); SE +/- 0.01103, N = 3. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
oneDNN 3.1, Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better): b = 1.57190 (MIN: 1.5), a = 1.57740 (MIN: 1.49); SE +/- 0.00305, N = 3. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl
SVT-AV1 1.6, Encoder Mode: Preset 12 - Input: Bosphorus 1080p (Frames Per Second, More Is Better): a = 308.25, b = 305.73; SE +/- 1.75, N = 3. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
SVT-AV1 1.6, Encoder Mode: Preset 13 - Input: Bosphorus 1080p (Frames Per Second, More Is Better): b = 364.28, a = 360.92; SE +/- 1.35, N = 3. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq
HeFFTe - Highly Efficient FFT for Exascale 2.3, Test: c2c - Backend: Stock - Precision: double-long - X Y Z: 128 (GFLOP/s, More Is Better): b = 26.81, a = 26.55; SE +/- 0.17, N = 3. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale 2.3, Test: c2c - Backend: FFTW - Precision: double-long - X Y Z: 128 (GFLOP/s, More Is Better): b = 30.86, a = 30.80; SE +/- 0.37, N = 3. (CXX) g++ options: -O3
HeFFTe - Highly Efficient FFT for Exascale 2.3, Test: r2c - Backend: Stock - Precision: double-long - X Y Z: 128 (GFLOP/s, More Is Better): a = 51.88, b = 50.93; SE +/- 0.72, N = 3. (CXX) g++ options: -O3
Phoronix Test Suite v10.8.5