AMD EPYC 4th Gen AVX-512 Comparison

AMD EPYC 9654 Genoa AVX-512 benchmark comparison by Michael Larabel for a future article.

Compare your own system(s) to this result file with the Phoronix Test Suite by running the command: phoronix-test-suite benchmark 2212195-NE-AVXCOMPAR69

Run Management

AVX-512 On: run December 18 2022, test duration 20 Hours, 2 Minutes
AVX-512 Off: run December 18 2022, test duration 15 Hours, 29 Minutes
Average test duration: 17 Hours, 45 Minutes

AMD EPYC 4th Gen AVX-512 Comparison (OpenBenchmarking.org, Phoronix Test Suite)

Processor: 2 x AMD EPYC 9654 96-Core @ 2.40GHz (192 Cores / 384 Threads)
Motherboard: AMD Titanite_4G (RTI1002E BIOS)
Chipset: AMD Device 14a4
Memory: 1520GB
Disk: 800GB INTEL SSDPF21Q800GB
Graphics: ASPEED
Monitor: VGA HDMI
Network: Broadcom NetXtreme BCM5720 PCIe
OS: Ubuntu 22.10
Kernel: 6.1.0-phx (x86_64)
Desktop: GNOME Shell 43.0
Display Server: X Server 1.21.1.4
Vulkan: 1.3.224
Compiler: GCC 12.2.0 + Clang 15.0.2-1
File-System: ext4
Screen Resolution: 1920x1080

System Logs

- Transparent Huge Pages: madvise
- AVX-512 On: CXXFLAGS="-O3 -march=native -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mprefer-vector-width=512" CFLAGS="-O3 -march=native -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mprefer-vector-width=512"
- AVX-512 Off: CXXFLAGS="-O3 -march=native -mno-avx512f" CFLAGS="-O3 -march=native -mno-avx512f"
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-U8K4Qv/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
- Scaling Governor: acpi-cpufreq performance (Boost: Enabled)
- CPU Microcode: 0xa10110d
- Python 3.10.7
- itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Retpolines IBPB: conditional IBRS_FW STIBP: always-on RSB filling PBRSB-eIBRS: Not affected + srbds: Not affected + tsx_async_abort: Not affected
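
Before reproducing the AVX-512 On build, it can help to confirm that the target CPU advertises every extension named in the compiler flags. The sketch below is illustrative only: it assumes a Linux system where /proc/cpuinfo lists feature names equal to the GCC flag with the leading -m dropped, and the helper name missing_avx512_features and the sample text are hypothetical.

```python
# GCC -mavx512* flags used for the "AVX-512 On" run in this result file.
AVX512_GCC_FLAGS = [
    "-mavx512f", "-mavx512cd", "-mavx512vl", "-mavx512bw",
    "-mavx512dq", "-mavx512ifma", "-mavx512vbmi",
]


def missing_avx512_features(cpuinfo_text):
    """Return the requested AVX-512 features that the cpuinfo text does not list."""
    cpu_flags = set()
    for line in cpuinfo_text.splitlines():
        # Lines of interest look like "flags : fpu vme ... avx512f ..."
        if line.startswith("flags") and ":" in line:
            cpu_flags.update(line.split(":", 1)[1].split())
    # Assumption: cpuinfo names match the GCC flag without its "-m" prefix.
    wanted = [flag[len("-m"):] for flag in AVX512_GCC_FLAGS]
    return [name for name in wanted if name not in cpu_flags]


# Hypothetical, abbreviated cpuinfo excerpt used only for illustration.
sample = (
    "flags\t\t: fpu sse2 avx2 avx512f avx512dq avx512ifma "
    "avx512cd avx512bw avx512vl avx512vbmi"
)
print(missing_avx512_features(sample))  # prints []
```

On a real machine the text would come from reading /proc/cpuinfo; an empty list suggests the flags in the AVX-512 On configuration are all supported by the hardware.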

[Chart: AVX-512 On vs. AVX-512 Off Comparison (Phoronix Test Suite). Relative difference per test against the baseline, from 185.4% (TensorFlow, CPU - 16 - AlexNet), 155.1% (oneDNN, Recurrent Neural Network Training - bf16bf16bf16 - CPU), 153.4% (AI Benchmark Alpha, Device Training Score) and 152.5% (oneDNN, Recurrent Neural Network Training - f32 - CPU) at the top, down to 2.7% (Darmstadt Automotive Parallel Heterogeneous Suite, OpenMP - NDT Mapping).]
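
The percentages in the comparison above can be reproduced from the per-test results. The sketch below assumes the chart reports the ratio of the two runs minus one, with the ratio inverted for "Fewer Is Better" metrics; that formula is an inference from the numbers, not something the result file states, and the function name percent_gain is hypothetical.

```python
def percent_gain(on, off, more_is_better=True):
    """Percent advantage of the AVX-512 On run over the AVX-512 Off run."""
    # For throughput metrics a larger value wins; for latency metrics the
    # ratio is inverted so that a positive result still favours AVX-512 On.
    ratio = on / off if more_is_better else off / on
    return (ratio - 1.0) * 100.0


# TensorFlow AlexNet, images/sec (More Is Better): 157.29 On vs 55.11 Off.
alexnet = percent_gain(157.29, 55.11)
# oneDNN RNN Training bf16, ms (Fewer Is Better): 1955.39 On vs 4989.13 Off.
rnn_bf16 = percent_gain(1955.39, 4989.13, more_is_better=False)

print(f"{alexnet:.1f}% {rnn_bf16:.1f}%")  # prints 185.4% 155.1%
```

Both values match the top two entries of the comparison, which supports the assumed formula for at least these tests.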

[Table: AMD EPYC 4th Gen AVX-512 Comparison. One row per test across the tensorflow, onednn, ai-benchmark, openvino, cpuminer-opt, ospray, mnn, deepsparse, minibude, simdjson, ospray-studio, embree, openfoam, openvkl, ncnn, lczero, smhasher, onnx, cp2k, numpy, jpegxl, numenta-nab, oidn, daphne, gromacs and svt-av1 test profiles, with one column each for AVX-512 On and AVX-512 Off. Source: OpenBenchmarking.org.]

TensorFlow

This is a benchmark of the TensorFlow deep learning framework using the TensorFlow reference benchmarks (tensorflow/benchmarks with tf_cnn_benchmarks.py). The Phoronix Test Suite also provides pts/tensorflow-lite for benchmarking the TensorFlow Lite binaries. Learn more via the OpenBenchmarking.org test page.

TensorFlow 2.10 - Device: CPU - Batch Size: 16 - Model: AlexNet
images/sec, More Is Better
AVX-512 Off: 55.11 (SE +/- 0.19, N = 3)
AVX-512 On: 157.29 (SE +/- 2.35, N = 12)
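
Each result is reported with an SE value and a run count N. Assuming SE denotes the standard error of the mean (sample standard deviation divided by the square root of N), which the result file does not spell out, it could be computed as sketched below. The per-run samples are hypothetical; the result file only publishes the aggregate figures.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical images/sec values from three runs (not taken from the result file).
runs = [54.8, 55.1, 55.4]

mean = statistics.mean(runs)
se = standard_error(runs)
print(f"{mean:.2f} (SE +/- {se:.2f}, N = {len(runs)})")
```

A larger N with a larger SE, as in the AVX-512 On AlexNet result, would be consistent with extra runs being triggered by higher run-to-run variance, though that is an interpretation rather than something stated here.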

oneDNN

oneDNN 2.7 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
AVX-512 Off: 4989.13 (SE +/- 41.71, N = 3; MIN: 4767.22; -mno-avx512f)
AVX-512 On: 1955.39 (SE +/- 19.67, N = 15; MIN: 1769.7; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

AI Benchmark Alpha

AI Benchmark Alpha is a Python library for evaluating artificial intelligence (AI) performance on diverse hardware platforms and relies upon the TensorFlow machine learning library. Learn more via the OpenBenchmarking.org test page.

AI Benchmark Alpha 0.1.2 - Device Training Score
Score, More Is Better
AVX-512 Off: 1066
AVX-512 On: 2701

oneDNN

oneDNN 2.7 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
AVX-512 Off: 4953.98 (SE +/- 60.12, N = 4; MIN: 4690.85; -mno-avx512f)
AVX-512 On: 1962.06 (SE +/- 20.13, N = 5; MIN: 1899.34; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

OpenVINO

This is a test of Intel OpenVINO, a toolkit for neural networks, using its built-in benchmarking support to analyze the throughput and latency of various models. Learn more via the OpenBenchmarking.org test page.

OpenVINO 2022.2.dev - Model: Weld Porosity Detection FP16 - Device: CPU
ms, Fewer Is Better
AVX-512 Off: 11.66 (SE +/- 0.01, N = 3; MIN: 9.72 / MAX: 43.7; -mno-avx512f)
AVX-512 On: 4.79 (SE +/- 0.01, N = 3; MIN: 3.98 / MAX: 28.79; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

OpenVINO 2022.2.dev - Model: Weld Porosity Detection FP16 - Device: CPU
FPS, More Is Better
AVX-512 Off: 4110.49 (SE +/- 4.29, N = 3; -mno-avx512f)
AVX-512 On: 9988.44 (SE +/- 6.96, N = 3; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

OpenVINO 2022.2.dev - Model: Face Detection FP16 - Device: CPU
FPS, More Is Better
AVX-512 Off: 43.94 (SE +/- 0.01, N = 3; -mno-avx512f)
AVX-512 On: 102.04 (SE +/- 0.07, N = 3; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

OpenVINO 2022.2.dev - Model: Face Detection FP16 - Device: CPU
ms, Fewer Is Better
AVX-512 Off: 1088.50 (SE +/- 0.51, N = 3; MIN: 937.5 / MAX: 1231.55; -mno-avx512f)
AVX-512 On: 469.29 (SE +/- 0.24, N = 3; MIN: 410.93 / MAX: 547.33; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

OpenVINO 2022.2.dev - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU
ms, Fewer Is Better
AVX-512 Off: 1.21 (SE +/- 0.01, N = 4; MIN: 0.98 / MAX: 47.37; -mno-avx512f)
AVX-512 On: 0.55 (SE +/- 0.00, N = 3; MIN: 0.5 / MAX: 29.57; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

Cpuminer-Opt

Cpuminer-Opt is a fork of cpuminer-multi that carries a wide range of CPU performance optimizations for measuring the potential cryptocurrency mining performance of the CPU/processor with a wide variety of cryptocurrencies. The benchmark reports the hash speed for the CPU mining performance for the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.

Cpuminer-Opt 3.20.3 - Algorithm: LBC, LBRY Credits
kH/s, More Is Better
AVX-512 Off: 491740 (SE +/- 667.26, N = 3; -mno-avx512f)
AVX-512 On: 1066950 (SE +/- 7645.47, N = 3; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -O3 -march=native -lcurl -lz -lpthread -lssl -lcrypto -lgmp

AI Benchmark Alpha

AI Benchmark Alpha 0.1.2 - Device AI Score
Score, More Is Better
AVX-512 Off: 2887
AVX-512 On: 6211

OpenVINO

OpenVINO 2022.2.dev - Model: Machine Translation EN To DE FP16 - Device: CPU
ms, Fewer Is Better
AVX-512 Off: 107.62 (SE +/- 0.74, N = 3; MIN: 83.48 / MAX: 216.07; -mno-avx512f)
AVX-512 On: 50.11 (SE +/- 0.21, N = 3; MIN: 38.98 / MAX: 166.69; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

OpenVINO 2022.2.dev - Model: Machine Translation EN To DE FP16 - Device: CPU
FPS, More Is Better
AVX-512 Off: 445.59 (SE +/- 3.04, N = 3; -mno-avx512f)
AVX-512 On: 956.88 (SE +/- 4.00, N = 3; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

OpenVINO 2022.2.dev - Model: Weld Porosity Detection FP16-INT8 - Device: CPU
FPS, More Is Better
AVX-512 Off: 9672.94 (SE +/- 1.81, N = 3; -mno-avx512f)
AVX-512 On: 19800.20 (SE +/- 12.72, N = 3; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

OpenVINO 2022.2.dev - Model: Weld Porosity Detection FP16-INT8 - Device: CPU
ms, Fewer Is Better
AVX-512 Off: 19.69 (SE +/- 0.01, N = 3; MIN: 16.4 / MAX: 74.85; -mno-avx512f)
AVX-512 On: 9.63 (SE +/- 0.01, N = 3; MIN: 8.29 / MAX: 57.53; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

OpenVINO 2022.2.dev - Model: Face Detection FP16-INT8 - Device: CPU
FPS, More Is Better
AVX-512 Off: 94.97 (SE +/- 0.02, N = 3; -mno-avx512f)
AVX-512 On: 193.93 (SE +/- 0.05, N = 3; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

OpenVINO 2022.2.dev - Model: Face Detection FP16-INT8 - Device: CPU
ms, Fewer Is Better
AVX-512 Off: 503.50 (SE +/- 0.07, N = 3; MIN: 402.47 / MAX: 569.65; -mno-avx512f)
AVX-512 On: 246.98 (SE +/- 0.01, N = 3; MIN: 204.5 / MAX: 299.54; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

OpenVINO 2022.2.dev - Model: Vehicle Detection FP16 - Device: CPU
FPS, More Is Better
AVX-512 Off: 3782.08 (SE +/- 10.17, N = 3; -mno-avx512f)
AVX-512 On: 7452.96 (SE +/- 3.52, N = 3; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

OpenVINO 2022.2.dev - Model: Vehicle Detection FP16 - Device: CPU
ms, Fewer Is Better
AVX-512 Off: 12.67 (SE +/- 0.03, N = 3; MIN: 9.58 / MAX: 77.91; -mno-avx512f)
AVX-512 On: 6.43 (SE +/- 0.00, N = 3; MIN: 5.22 / MAX: 61.26; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

AI Benchmark Alpha

AI Benchmark Alpha 0.1.2 - Device Inference Score
Score, More Is Better
AVX-512 Off: 1821
AVX-512 On: 3510

OpenVINO

OpenVINO 2022.2.dev - Model: Vehicle Detection FP16-INT8 - Device: CPU
FPS, More Is Better
AVX-512 Off: 6065.76 (SE +/- 1.39, N = 3; -mno-avx512f)
AVX-512 On: 11202.62 (SE +/- 2.53, N = 3; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

OpenVINO 2022.2.dev - Model: Vehicle Detection FP16-INT8 - Device: CPU
ms, Fewer Is Better
AVX-512 Off: 7.90 (SE +/- 0.00, N = 3; MIN: 6.37 / MAX: 41.53; -mno-avx512f)
AVX-512 On: 4.28 (SE +/- 0.00, N = 3; MIN: 3.5 / MAX: 42.62; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

oneDNN

oneDNN 2.7 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU
ms, Fewer Is Better
AVX-512 Off: 1.763360 (SE +/- 0.004550, N = 9; MIN: 1.55; -mno-avx512f)
AVX-512 On: 0.989591 (SE +/- 0.001906, N = 9; MIN: 0.9; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

OpenVINO

OpenVINO 2022.2.dev - Model: Person Vehicle Bike Detection FP16 - Device: CPU
FPS, More Is Better
AVX-512 Off: 5135.56 (SE +/- 2.17, N = 3; -mno-avx512f)
AVX-512 On: 9065.34 (SE +/- 4.01, N = 3; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

OpenVINO 2022.2.dev - Model: Person Vehicle Bike Detection FP16 - Device: CPU
ms, Fewer Is Better
AVX-512 Off: 9.33 (SE +/- 0.00, N = 3; MIN: 7.39 / MAX: 57.57; -mno-avx512f)
AVX-512 On: 5.29 (SE +/- 0.00, N = 3; MIN: 4.36 / MAX: 45.25; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

TensorFlow

TensorFlow 2.10 - Device: CPU - Batch Size: 16 - Model: ResNet-50
images/sec, More Is Better
AVX-512 Off: 12.75 (SE +/- 0.01, N = 3)
AVX-512 On: 22.15 (SE +/- 0.11, N = 3)

OSPRay

Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay 2.10 - Benchmark: gravity_spheres_volume/dim_512/scivis/real_time
Items Per Second, More Is Better
AVX-512 Off: 25.14 (SE +/- 0.04, N = 3)
AVX-512 On: 43.07 (SE +/- 0.10, N = 3)

Cpuminer-Opt

Cpuminer-Opt 3.20.3 - Algorithm: Quad SHA-256, Pyrite
kH/s, More Is Better
AVX-512 Off: 1322960 (SE +/- 276.83, N = 3; -mno-avx512f)
AVX-512 On: 2260680 (SE +/- 9040.25, N = 3; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -O3 -march=native -lcurl -lz -lpthread -lssl -lcrypto -lgmp

OpenVINO

OpenVINO 2022.2.dev - Model: Person Detection FP16 - Device: CPU
FPS, More Is Better
AVX-512 Off: 25.50 (SE +/- 0.22, N = 3; -mno-avx512f)
AVX-512 On: 43.34 (SE +/- 0.20, N = 3; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

OpenVINO 2022.2.dev - Model: Person Detection FP16 - Device: CPU
ms, Fewer Is Better
AVX-512 Off: 1864.93 (SE +/- 15.65, N = 3; MIN: 1371.29 / MAX: 2532.74; -mno-avx512f)
AVX-512 On: 1100.25 (SE +/- 4.83, N = 3; MIN: 783.9 / MAX: 1824.92; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

OpenVINO 2022.2.dev - Model: Person Detection FP32 - Device: CPU
FPS, More Is Better
AVX-512 Off: 25.74 (SE +/- 0.04, N = 3; -mno-avx512f)
AVX-512 On: 43.34 (SE +/- 0.18, N = 3; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

OpenVINO 2022.2.dev - Model: Person Detection FP32 - Device: CPU
ms, Fewer Is Better
AVX-512 Off: 1845.95 (SE +/- 3.32, N = 3; MIN: 1386.31 / MAX: 2798; -mno-avx512f)
AVX-512 On: 1101.00 (SE +/- 4.44, N = 3; MIN: 840.82 / MAX: 1792.51; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
(CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared
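
To summarize the OpenVINO throughput results reported so far in one number, a geometric mean of the per-model speedups is a common choice for ratio data. The sketch below copies the FPS figures from the results above; the aggregation itself is an illustration and not a statistic published in the result file.

```python
import math

# (AVX-512 Off FPS, AVX-512 On FPS) per OpenVINO model, copied from the results above.
openvino_fps = {
    "Weld Porosity Detection FP16": (4110.49, 9988.44),
    "Face Detection FP16": (43.94, 102.04),
    "Machine Translation EN To DE FP16": (445.59, 956.88),
    "Weld Porosity Detection FP16-INT8": (9672.94, 19800.20),
    "Face Detection FP16-INT8": (94.97, 193.93),
    "Vehicle Detection FP16": (3782.08, 7452.96),
    "Vehicle Detection FP16-INT8": (6065.76, 11202.62),
    "Person Vehicle Bike Detection FP16": (5135.56, 9065.34),
    "Person Detection FP16": (25.50, 43.34),
    "Person Detection FP32": (25.74, 43.34),
}


def geometric_mean(values):
    """Geometric mean computed in log space to avoid overflow on long inputs."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Speedup of AVX-512 On over AVX-512 Off for each model.
speedups = {model: on / off for model, (off, on) in openvino_fps.items()}
overall = geometric_mean(speedups.values())

for model, ratio in sorted(speedups.items(), key=lambda item: -item[1]):
    print(f"{ratio:.2f}x  {model}")
print(f"Geometric mean speedup: {overall:.2f}x")
```

For these ten models the geometric mean comes out close to 2x, with the FP16 Weld Porosity and Face Detection models gaining the most.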

OSPRay

Intel OSPRay is a portable ray-tracing engine for high-performance, high-fidelity scientific visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay 2.10 - Benchmark: gravity_spheres_volume/dim_512/ao/real_time
Items Per Second, More Is Better
AVX-512 Off: 26.38, SE +/- 0.02, N = 3
AVX-512 On: 44.16, SE +/- 0.21, N = 3

Cpuminer-Opt

Cpuminer-Opt is a fork of cpuminer-multi that carries a wide range of CPU performance optimizations for measuring the potential cryptocurrency mining performance of the CPU/processor with a wide variety of cryptocurrencies. The benchmark reports the hash speed for the CPU mining performance for the selected cryptocurrency. Learn more via the OpenBenchmarking.org test page.

Cpuminer-Opt 3.20.3 - Algorithm: scrypt
kH/s, More Is Better
AVX-512 Off (-mno-avx512f): 2946.08, SE +/- 0.31, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 4800.61, SE +/- 0.45, N = 3
1. (CXX) g++ options: -O3 -march=native -lcurl -lz -lpthread -lssl -lcrypto -lgmp

OpenVINO

OpenVINO 2022.2.dev - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 0.57, SE +/- 0.00, N = 3, MIN: 0.52 / MAX: 47.93
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 0.36, SE +/- 0.00, N = 3, MIN: 0.34 / MAX: 33.7
1. (CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

Mobile Neural Network

MNN is the Mobile Neural Network, a highly efficient, lightweight deep learning framework developed by Alibaba. This MNN test profile builds the OpenMP / CPU threaded version for processor benchmarking, not any GPU-accelerated test. MNN can make use of AVX-512 extensions. Learn more via the OpenBenchmarking.org test page.

Mobile Neural Network 2.1 - Model: resnet-v2-50
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 24.10, SE +/- 0.08, N = 8, MIN: 23.44 / MAX: 71.32
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 15.44, SE +/- 0.08, N = 9, MIN: 14.79 / MAX: 54.05
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl

Neural Magic DeepSparse

Neural Magic DeepSparse 1.1 - Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream
items/sec, More Is Better
AVX-512 Off: 490.37, SE +/- 0.61, N = 3
AVX-512 On: 761.22, SE +/- 0.79, N = 3

Neural Magic DeepSparse 1.1 - Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Asynchronous Multi-Stream
ms/batch, Fewer Is Better
AVX-512 Off: 195.17, SE +/- 0.23, N = 3
AVX-512 On: 125.79, SE +/- 0.11, N = 3

Mobile Neural Network

Mobile Neural Network 2.1 - Model: inception-v3
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 30.44, SE +/- 0.13, N = 8, MIN: 28.75 / MAX: 109.22
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 45.81, SE +/- 0.23, N = 9, MIN: 44.13 / MAX: 87.28
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl

Cpuminer-Opt

Cpuminer-Opt 3.20.3 - Algorithm: Garlicoin
kH/s, More Is Better
AVX-512 Off (-mno-avx512f): 48827, SE +/- 13.33, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 71553, SE +/- 89.69, N = 3
1. (CXX) g++ options: -O3 -march=native -lcurl -lz -lpthread -lssl -lcrypto -lgmp

miniBUDE

MiniBUDE is a mini application for the core computation of the Bristol University Docking Engine (BUDE). This test profile currently makes use of the OpenMP implementation of miniBUDE for CPU benchmarking. Learn more via the OpenBenchmarking.org test page.

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM1
GFInst/s, More Is Better
AVX-512 Off: 5065.10, SE +/- 26.42, N = 8
AVX-512 On: 7299.55, SE +/- 5.79, N = 10
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM1
Billion Interactions/s, More Is Better
AVX-512 Off: 202.60, SE +/- 1.06, N = 8
AVX-512 On: 291.98, SE +/- 0.23, N = 10
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

Neural Magic DeepSparse

Neural Magic DeepSparse 1.1 - Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Synchronous Single-Stream
items/sec, More Is Better
AVX-512 Off: 71.04, SE +/- 0.17, N = 3
AVX-512 On: 100.55, SE +/- 0.17, N = 3

Neural Magic DeepSparse 1.1 - Model: NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Scenario: Synchronous Single-Stream
ms/batch, Fewer Is Better
AVX-512 Off: 14.0655, SE +/- 0.0334, N = 3
AVX-512 On: 9.9408, SE +/- 0.0172, N = 3
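The items/sec and ms/batch figures for the Synchronous Single-Stream scenario appear to be two views of the same measurement. As a rough sanity check, the sketch below converts latency to throughput under the assumption (not stated in the result file) that each batch holds one item and only one request is in flight; `implied_items_per_sec` is a hypothetical helper name.

```python
def implied_items_per_sec(ms_per_batch: float, batch_size: int = 1) -> float:
    """Throughput implied by a per-batch latency, assuming one request at a time."""
    return batch_size * 1000.0 / ms_per_batch


# (ms/batch, reported items/sec) for BERT base uncased SQuaD, single-stream.
reported = {
    "AVX-512 Off": (14.0655, 71.04),
    "AVX-512 On": (9.9408, 100.55),
}

for run, (latency_ms, items_per_sec) in reported.items():
    implied = implied_items_per_sec(latency_ms)
    gap = abs(implied - items_per_sec) / items_per_sec
    print(f"{run}: implied {implied:.2f} vs reported {items_per_sec:.2f} items/sec ({gap:.2%} apart)")
```

Under that assumption the two metrics agree to well under one percent. The same check does not carry over to the Asynchronous Multi-Stream results, where several requests overlap.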

Neural Magic DeepSparse 1.1 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream
ms/batch, Fewer Is Better
AVX-512 Off: 67.91, SE +/- 0.02, N = 3
AVX-512 On: 49.03, SE +/- 0.11, N = 3

Neural Magic DeepSparse 1.1 - Model: CV Classification, ResNet-50 ImageNet - Scenario: Asynchronous Multi-Stream
items/sec, More Is Better
AVX-512 Off: 1410.81, SE +/- 0.17, N = 3
AVX-512 On: 1953.03, SE +/- 4.32, N = 3

oneDNN

oneDNN 2.7 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 1715.76, SE +/- 22.32, N = 3, MIN: 1607.13
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 2361.27, SE +/- 24.41, N = 4, MIN: 2265.7
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

OpenVINO

OpenVINO 2022.2.dev - Model: Age Gender Recognition Retail 0013 FP16 - Device: CPU
FPS, More Is Better
AVX-512 Off (-mno-avx512f): 108449.49, SE +/- 1130.85, N = 4
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 148967.98, SE +/- 1328.20, N = 3
1. (CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

Cpuminer-Opt

Cpuminer-Opt 3.20.3 - Algorithm: Skeincoin
kH/s, More Is Better
AVX-512 Off (-mno-avx512f): 1427543, SE +/- 3964.01, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 1951330, SE +/- 13005.55, N = 3
1. (CXX) g++ options: -O3 -march=native -lcurl -lz -lpthread -lssl -lcrypto -lgmp

miniBUDE

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM2
Billion Interactions/s, More Is Better
AVX-512 Off: 255.66, SE +/- 1.27, N = 3
AVX-512 On: 346.08, SE +/- 0.46, N = 4
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE 20210901 - Implementation: OpenMP - Input Deck: BM2
GFInst/s, More Is Better
AVX-512 Off: 6391.52, SE +/- 31.65, N = 3
AVX-512 On: 8652.01, SE +/- 11.62, N = 4
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

OSPRay

OSPRay 2.10 - Benchmark: gravity_spheres_volume/dim_512/pathtracer/real_time
Items Per Second, More Is Better
AVX-512 Off: 40.51, SE +/- 0.03, N = 3
AVX-512 On: 54.22, SE +/- 0.05, N = 3

Cpuminer-Opt

Cpuminer-Opt 3.20.3 - Algorithm: x25x
kH/s, More Is Better
AVX-512 Off (-mno-avx512f): 5784.25, SE +/- 14.41, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 7696.06, SE +/- 16.58, N = 3
1. (CXX) g++ options: -O3 -march=native -lcurl -lz -lpthread -lssl -lcrypto -lgmp

simdjson

This is a benchmark of SIMDJSON, a high performance JSON parser. SIMDJSON aims to be the fastest JSON parser and is used by projects like Microsoft FishStore, Yandex ClickHouse, Shopify, and others. Learn more via the OpenBenchmarking.org test page.

simdjson 2.0 - Throughput Test: DistinctUserID
GB/s, More Is Better
AVX-512 Off (-mno-avx512f): 5.25, SE +/- 0.01, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 6.86, SE +/- 0.02, N = 3
1. (CXX) g++ options: -O3 -march=native

simdjson 2.0 - Throughput Test: PartialTweets
GB/s, More Is Better
AVX-512 Off (-mno-avx512f): 5.14, SE +/- 0.01, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 6.70, SE +/- 0.04, N = 3
1. (CXX) g++ options: -O3 -march=native

Neural Magic DeepSparse

Neural Magic DeepSparse 1.1 - Model: CV Detection,YOLOv5s COCO - Scenario: Asynchronous Multi-Stream
items/sec, More Is Better
AVX-512 Off: 690.87, SE +/- 1.12, N = 3
AVX-512 On: 857.23, SE +/- 0.77, N = 3

Neural Magic DeepSparse 1.1 - Model: CV Detection,YOLOv5s COCO - Scenario: Asynchronous Multi-Stream
ms/batch, Fewer Is Better
AVX-512 Off: 138.59, SE +/- 0.25, N = 3
AVX-512 On: 111.73, SE +/- 0.12, N = 3

simdjson

simdjson 2.0 - Throughput Test: Kostya
GB/s, More Is Better
AVX-512 Off (-mno-avx512f): 3.39, SE +/- 0.00, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 4.18, SE +/- 0.02, N = 3
1. (CXX) g++ options: -O3 -march=native

OSPRay Studio

Intel OSPRay Studio is an open-source, interactive visualization and ray-tracing software package. OSPRay Studio makes use of Intel OSPRay, a portable ray-tracing engine for high-performance, high-fidelity visualizations. OSPRay builds off Intel's Embree and Intel SPMD Program Compiler (ISPC) components as part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OSPRay Studio 0.11 - Camera: 2 - Resolution: 1080p - Samples Per Pixel: 32 - Renderer: Path Tracer
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 5861, SE +/- 8.25, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 4764, SE +/- 6.24, N = 3
1. (CXX) g++ options: -O3 -march=native -ldl

OSPRay Studio 0.11 - Camera: 2 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 11524, SE +/- 15.51, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 9385, SE +/- 2.19, N = 3
1. (CXX) g++ options: -O3 -march=native -ldl

OSPRay Studio 0.11 - Camera: 2 - Resolution: 1080p - Samples Per Pixel: 16 - Renderer: Path Tracer
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 2924, SE +/- 0.88, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 2384, SE +/- 9.17, N = 3
1. (CXX) g++ options: -O3 -march=native -ldl

OSPRay Studio 0.11 - Camera: 1 - Resolution: 1080p - Samples Per Pixel: 32 - Renderer: Path Tracer
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 5749, SE +/- 6.56, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 4698, SE +/- 3.71, N = 3
1. (CXX) g++ options: -O3 -march=native -ldl

OSPRay Studio 0.11 - Camera: 1 - Resolution: 1080p - Samples Per Pixel: 1 - Renderer: Path Tracer
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 181, SE +/- 0.33, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 148, SE +/- 0.33, N = 3
1. (CXX) g++ options: -O3 -march=native -ldl

OSPRay Studio 0.11 - Camera: 2 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 22985, SE +/- 30.02, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 18808, SE +/- 20.34, N = 3
1. (CXX) g++ options: -O3 -march=native -ldl

OSPRay Studio 0.11 - Camera: 1 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 710, SE +/- 2.08, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 581, SE +/- 1.00, N = 3
1. (CXX) g++ options: -O3 -march=native -ldl

OSPRay Studio 0.11 - Camera: 1 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 22647, SE +/- 27.41, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 18546, SE +/- 13.75, N = 3
1. (CXX) g++ options: -O3 -march=native -ldl

OSPRay Studio 0.11 - Camera: 1 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 11321, SE +/- 24.46, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 9271, SE +/- 10.17, N = 3
1. (CXX) g++ options: -O3 -march=native -ldl

OSPRay Studio 0.11 - Camera: 1 - Resolution: 1080p - Samples Per Pixel: 16 - Renderer: Path Tracer
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 2871, SE +/- 1.20, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 2353, SE +/- 2.33, N = 3
1. (CXX) g++ options: -O3 -march=native -ldl

OSPRay Studio 0.11 - Camera: 3 - Resolution: 1080p - Samples Per Pixel: 16 - Renderer: Path Tracer
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 3413, SE +/- 7.00, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 2804, SE +/- 3.06, N = 3
1. (CXX) g++ options: -O3 -march=native -ldl

OSPRay Studio 0.11 - Camera: 3 - Resolution: 1080p - Samples Per Pixel: 32 - Renderer: Path Tracer
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 6820, SE +/- 5.51, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 5608, SE +/- 9.54, N = 3
1. (CXX) g++ options: -O3 -march=native -ldl

simdjson

simdjson 2.0 - Throughput Test: TopTweet
GB/s, More Is Better
AVX-512 Off (-mno-avx512f): 5.38, SE +/- 0.01, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 6.52, SE +/- 0.08, N = 4
1. (CXX) g++ options: -O3 -march=native

OSPRay Studio

OSPRay Studio 0.11 - Camera: 3 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 841, SE +/- 1.15, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 694, SE +/- 1.53, N = 3
1. (CXX) g++ options: -O3 -march=native -ldl

simdjson

simdjson 2.0 - Throughput Test: LargeRandom
GB/s, More Is Better
AVX-512 Off (-mno-avx512f): 1.04, SE +/- 0.00, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 1.26, SE +/- 0.00, N = 3
1. (CXX) g++ options: -O3 -march=native
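The result viewer offers an overall geometric mean, and the same summary can be reproduced by hand for one test family. The sketch below does so for the five simdjson throughput tests listed in this section, using Python's standard `statistics.geometric_mean`. It illustrates the arithmetic only and is not the viewer's own calculation, which may normalize or weight results differently.

```python
from statistics import geometric_mean

# simdjson 2.0 throughput in GB/s as (AVX-512 Off, AVX-512 On).
simdjson_gbps = {
    "DistinctUserID": (5.25, 6.86),
    "PartialTweets": (5.14, 6.70),
    "Kostya": (3.39, 4.18),
    "TopTweet": (5.38, 6.52),
    "LargeRandom": (1.04, 1.26),
}

# Higher is better, so the per-test gain is On divided by Off.
ratios = {name: on / off for name, (off, on) in simdjson_gbps.items()}
overall = geometric_mean(list(ratios.values()))

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
print(f"geometric mean: {overall:.2f}x")
```

A geometric mean is used rather than an arithmetic mean because the inputs are ratios; it treats a 2x gain and a 0.5x regression as cancelling out.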

Neural Magic DeepSparse

Neural Magic DeepSparse 1.1 - Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream
items/sec, More Is Better
AVX-512 Off: 509.45, SE +/- 0.33, N = 3
AVX-512 On: 616.68, SE +/- 1.92, N = 3

OSPRay Studio

OSPRay Studio 0.11 - Camera: 3 - Resolution: 4K - Samples Per Pixel: 16 - Renderer: Path Tracer
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 13398, SE +/- 22.81, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 11070, SE +/- 27.14, N = 3
1. (CXX) g++ options: -O3 -march=native -ldl

Neural Magic DeepSparse

Neural Magic DeepSparse 1.1 - Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Asynchronous Multi-Stream
ms/batch, Fewer Is Better
AVX-512 Off: 187.85, SE +/- 0.10, N = 3
AVX-512 On: 155.35, SE +/- 0.47, N = 3

OSPRay Studio

OSPRay Studio 0.11 - Camera: 3 - Resolution: 1080p - Samples Per Pixel: 1 - Renderer: Path Tracer
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 214, SE +/- 0.58, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 177, SE +/- 0.00, N = 3
1. (CXX) g++ options: -O3 -march=native -ldl

OSPRay Studio 0.11 - Camera: 3 - Resolution: 4K - Samples Per Pixel: 32 - Renderer: Path Tracer
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 26785, SE +/- 15.62, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 22184, SE +/- 21.63, N = 3
1. (CXX) g++ options: -O3 -march=native -ldl

Embree

Intel Embree is a collection of high-performance ray-tracing kernels for execution on CPUs and supporting instruction sets such as SSE, AVX, AVX2, and AVX-512. Embree also supports making use of the Intel SPMD Program Compiler (ISPC). Learn more via the OpenBenchmarking.org test page.

Embree 3.13 - Binary: Pathtracer ISPC - Model: Asian Dragon
Frames Per Second, More Is Better
AVX-512 Off: 176.88, SE +/- 0.21, N = 8, MIN: 170.19 / MAX: 190
AVX-512 On: 212.94, SE +/- 0.22, N = 9, MIN: 207.72 / MAX: 227.11

Embree 3.13 - Binary: Pathtracer ISPC - Model: Asian Dragon Obj
Frames Per Second, More Is Better
AVX-512 Off: 153.45, SE +/- 0.36, N = 4, MIN: 130.31 / MAX: 166.17
AVX-512 On: 184.47, SE +/- 0.62, N = 4, MIN: 178.83 / MAX: 196.34

Neural Magic DeepSparse

Neural Magic DeepSparse 1.1 - Model: CV Detection,YOLOv5s COCO - Scenario: Synchronous Single-Stream
items/sec, More Is Better
AVX-512 Off: 157.82, SE +/- 0.17, N = 3
AVX-512 On: 189.56, SE +/- 0.22, N = 3

Neural Magic DeepSparse 1.1 - Model: CV Detection,YOLOv5s COCO - Scenario: Synchronous Single-Stream
ms/batch, Fewer Is Better
AVX-512 Off: 6.3313, SE +/- 0.0068, N = 3
AVX-512 On: 5.2712, SE +/- 0.0061, N = 3

Embree

Embree 3.13 - Binary: Pathtracer ISPC - Model: Crown
Frames Per Second, More Is Better
AVX-512 Off: 151.06, SE +/- 0.25, N = 7, MIN: 114.4 / MAX: 176.43
AVX-512 On: 180.93, SE +/- 0.62, N = 8, MIN: 124.41 / MAX: 209.74

OSPRay Studio

OSPRay Studio 0.11 - Camera: 2 - Resolution: 4K - Samples Per Pixel: 1 - Renderer: Path Tracer
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 722, SE +/- 0.67, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 604, SE +/- 1.00, N = 3
1. (CXX) g++ options: -O3 -march=native -ldl

OSPRay Studio 0.11 - Camera: 2 - Resolution: 1080p - Samples Per Pixel: 1 - Renderer: Path Tracer
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 184, SE +/- 0.33, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 154, SE +/- 0.00, N = 3
1. (CXX) g++ options: -O3 -march=native -ldl

Neural Magic DeepSparse

Neural Magic DeepSparse 1.1 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream
ms/batch, Fewer Is Better
AVX-512 Off: 95.33, SE +/- 0.05, N = 3
AVX-512 On: 80.08, SE +/- 0.14, N = 3

OpenFOAM

OpenFOAM is the leading free, open-source software for computational fluid dynamics (CFD). This test profile currently uses the drivaerFastback test case for analyzing automotive aerodynamics or alternatively the older motorBike input. Learn more via the OpenBenchmarking.org test page.

OpenFOAM 10 - Input: drivaerFastback, Medium Mesh Size - Execution Time
Seconds, Fewer Is Better
AVX-512 Off: 135.22
AVX-512 On: 113.64
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

Neural Magic DeepSparse

Neural Magic DeepSparse 1.1 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Asynchronous Multi-Stream
items/sec, More Is Better
AVX-512 Off: 1005.16, SE +/- 0.78, N = 3
AVX-512 On: 1195.94, SE +/- 2.39, N = 3

Neural Magic DeepSparse 1.1 - Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Synchronous Single-Stream
items/sec, More Is Better
AVX-512 Off: 73.54, SE +/- 0.27, N = 3
AVX-512 On: 85.00, SE +/- 0.45, N = 3

Neural Magic DeepSparse 1.1 - Model: NLP Text Classification, BERT base uncased SST2 - Scenario: Synchronous Single-Stream
ms/batch, Fewer Is Better
AVX-512 Off: 13.59, SE +/- 0.05, N = 3
AVX-512 On: 11.76, SE +/- 0.06, N = 3

OpenVKL

OpenVKL is the Intel Open Volume Kernel Library, which offers high-performance volume computation kernels and is part of the Intel oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

OpenVKL 1.3.1 - Benchmark: vklBenchmark ISPC
Items / Sec, More Is Better
AVX-512 Off: 1155, SE +/- 8.11, N = 3, MIN: 251 / MAX: 5181
AVX-512 On: 1332, SE +/- 13.58, N = 3, MIN: 329 / MAX: 4770

OpenVINO

OpenVINO 2022.2.dev - Model: Age Gender Recognition Retail 0013 FP16-INT8 - Device: CPU
FPS, More Is Better
AVX-512 Off (-mno-avx512f): 151239.60, SE +/- 1455.37, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 170652.71, SE +/- 127.59, N = 3
1. (CXX) g++ options: -fPIC -O3 -march=native -fsigned-char -ffunction-sections -fdata-sections -fno-strict-overflow -fwrapv -flto -shared

Neural Magic DeepSparse

Neural Magic DeepSparse 1.1 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream
items/sec, More Is Better
AVX-512 Off: 30.61, SE +/- 0.03, N = 3
AVX-512 On: 34.52, SE +/- 0.03, N = 3

Neural Magic DeepSparse 1.1 - Model: NLP Token Classification, BERT base uncased conll2003 - Scenario: Synchronous Single-Stream
ms/batch, Fewer Is Better
AVX-512 Off: 32.66, SE +/- 0.04, N = 3
AVX-512 On: 28.96, SE +/- 0.03, N = 3

Neural Magic DeepSparse 1.1 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream
items/sec, More Is Better
AVX-512 Off: 30.53, SE +/- 0.10, N = 3
AVX-512 On: 34.25, SE +/- 0.08, N = 3

Neural Magic DeepSparse 1.1 - Model: NLP Document Classification, oBERT base uncased on IMDB - Scenario: Synchronous Single-Stream
ms/batch, Fewer Is Better
AVX-512 Off: 32.75, SE +/- 0.11, N = 3
AVX-512 On: 29.19, SE +/- 0.07, N = 3

NCNN

NCNN is a high-performance neural network inference framework developed by Tencent and optimized for mobile and other platforms. Learn more via the OpenBenchmarking.org test page.

NCNN 20220729 - Target: CPU - Model: blazeface
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 29.00, SE +/- 0.54, N = 3, MIN: 26.03 / MAX: 144.12
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 25.88, SE +/- 0.18, N = 8, MIN: 24.45 / MAX: 112.37
1. (CXX) g++ options: -O3 -march=native -rdynamic -lgomp -lpthread

LeelaChessZero

LeelaChessZero (lc0 / lczero) is a chess engine automated via neural networks. This test profile can be used for OpenCL, CUDA + cuDNN, and BLAS (CPU-based) benchmarking. Learn more via the OpenBenchmarking.org test page.

LeelaChessZero 0.28 - Backend: Eigen
Nodes Per Second, More Is Better
AVX-512 Off (-mno-avx512f): 8162, SE +/- 43.59, N = 3
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 9096, SE +/- 45.37, N = 3
1. (CXX) g++ options: -flto -O3 -march=native -pthread

NCNN

NCNN 20220729 - Target: CPU - Model: regnety_400m
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 270.12, SE +/- 3.75, N = 3, MIN: 245.47 / MAX: 498.8
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 247.32, SE +/- 2.25, N = 8, MIN: 232.61 / MAX: 506.55
1. (CXX) g++ options: -O3 -march=native -rdynamic -lgomp -lpthread

SMHasher

SMHasher is a hash function tester supporting various algorithms and able to make use of AVX and other modern CPU instruction set extensions. Learn more via the OpenBenchmarking.org test page.

SMHasher 2022-08-22 - Hash: FarmHash32 x86_64 AVX
MiB/sec, More Is Better
AVX-512 Off (-mno-avx512f): 36483.70, SE +/- 28.78, N = 5
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 39794.44, SE +/- 0.46, N = 5
1. (CXX) g++ options: -O3 -march=native -flto=auto -fno-fat-lto-objects

NCNN

NCNN 20220729 - Target: CPU - Model: efficientnet-b0
ms, Fewer Is Better
AVX-512 Off (-mno-avx512f): 62.86, SE +/- 0.31, N = 3, MIN: 59.46 / MAX: 154.82
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 57.71, SE +/- 0.35, N = 8, MIN: 54.52 / MAX: 522.74
1. (CXX) g++ options: -O3 -march=native -rdynamic -lgomp -lpthread

ONNX Runtime

ONNX Runtime is developed by Microsoft and partners as an open-source, cross-platform, high-performance machine learning inferencing and training accelerator. This test profile runs the ONNX Runtime with various models available from the ONNX Zoo. Learn more via the OpenBenchmarking.org test page.

ONNX Runtime 1.11 - Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
Inferences Per Minute, More Is Better
AVX-512 Off (-mno-avx512f): 250, SE +/- 2.36, N = 12
AVX-512 On (-mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi): 271, SE +/- 0.44, N = 3
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

LeelaChessZero

LeelaChessZero 0.28 - Backend: BLAS (Nodes Per Second, More Is Better)
AVX-512 Off: 8423 (SE +/- 21.53, N = 3; -mno-avx512f)
AVX-512 On: 9077 (SE +/- 103.04, N = 4; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
1. (CXX) g++ options: -flto -O3 -march=native -pthread

CP2K Molecular Dynamics

CP2K is an open-source molecular dynamics software package focused on quantum chemistry and solid-state physics. This test profile currently uses the SSMP (OpenMP) version of cp2k. Learn more via the OpenBenchmarking.org test page.

CP2K Molecular Dynamics 8.2 - Input: Fayalite-FIST (Seconds, Fewer Is Better)
AVX-512 Off: 1198.41
AVX-512 On: 1122.73

Mobile Neural Network

MNN (Mobile Neural Network) is a highly efficient, lightweight deep learning framework developed by Alibaba. This MNN test profile builds the OpenMP / CPU threaded version for processor benchmarking and not any GPU-accelerated test. MNN does allow making use of AVX-512 extensions. Learn more via the OpenBenchmarking.org test page.

Mobile Neural Network 2.1 - Model: SqueezeNetV1.0 (ms, Fewer Is Better)
AVX-512 Off: 9.146 (SE +/- 0.091, N = 8; MIN: 7.72 / MAX: 19.03; -mno-avx512f)
AVX-512 On: 8.579 (SE +/- 0.147, N = 9; MIN: 6.67 / MAX: 21.5; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fvisibility=hidden -fomit-frame-pointer -fstrict-aliasing -ffunction-sections -fdata-sections -ffast-math -fno-rtti -fno-exceptions -rdynamic -pthread -ldl

OpenFOAM

OpenFOAM is the leading free, open-source software for computational fluid dynamics (CFD). This test profile currently uses the drivaerFastback test case for analyzing automotive aerodynamics or alternatively the older motorBike input. Learn more via the OpenBenchmarking.org test page.

OpenFOAM 10 - Input: drivaerFastback, Medium Mesh Size - Mesh Time (Seconds, Fewer Is Better)
AVX-512 Off: 144.35
AVX-512 On: 135.77
1. (CXX) g++ options: -std=c++14 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lfiniteVolume -lmeshTools -lparallel -llagrangian -lregionModels -lgenericPatchFields -lOpenFOAM -ldl -lm

Neural Magic DeepSparse

Neural Magic DeepSparse 1.1 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Stream (items/sec, More Is Better)
AVX-512 Off: 177.54 (SE +/- 0.29, N = 3)
AVX-512 On: 187.08 (SE +/- 0.11, N = 3)

Neural Magic DeepSparse 1.1 - Model: NLP Text Classification, DistilBERT mnli - Scenario: Synchronous Single-Stream (ms/batch, Fewer Is Better)
AVX-512 Off: 5.6282 (SE +/- 0.0091, N = 3)
AVX-512 On: 5.3413 (SE +/- 0.0031, N = 3)

Numpy Benchmark

This test measures general NumPy performance. Learn more via the OpenBenchmarking.org test page.

Numpy Benchmark (Score, More Is Better)
AVX-512 Off: 548.83 (SE +/- 0.59, N = 3)
AVX-512 On: 575.66 (SE +/- 2.18, N = 3)

JPEG XL libjxl

The JPEG XL Image Coding System is designed to provide next-generation JPEG image capabilities with JPEG XL offering better image quality and compression over legacy JPEG. This test profile is currently focused on the multi-threaded JPEG XL image encode performance using the reference libjxl library. Learn more via the OpenBenchmarking.org test page.

JPEG XL libjxl 0.7 - Input: JPEG - Quality: 90 (MP/s, More Is Better)
AVX-512 Off: 9.13 (SE +/- 0.03, N = 3; -mno-avx512f)
AVX-512 On: 9.55 (SE +/- 0.04, N = 3; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
1. (CXX) g++ options: -O3 -march=native -fno-rtti -funwind-tables -O2 -fPIE -pie -lm -latomic

NCNN

NCNN 20220729 - Target: CPU - Model: googlenet (ms, Fewer Is Better)
AVX-512 Off: 76.13 (SE +/- 1.35, N = 3; MIN: 70.13 / MAX: 155.61; -mno-avx512f)
AVX-512 On: 72.80 (SE +/- 0.76, N = 8; MIN: 67.13 / MAX: 388.52; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lgomp -lpthread

JPEG XL libjxl

JPEG XL libjxl 0.7 - Input: PNG - Quality: 90 (MP/s, More Is Better)
AVX-512 Off: 9.52 (SE +/- 0.02, N = 3; -mno-avx512f)
AVX-512 On: 9.95 (SE +/- 0.02, N = 3; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
1. (CXX) g++ options: -O3 -march=native -fno-rtti -funwind-tables -O2 -fPIE -pie -lm -latomic

NCNN

NCNN 20220729 - Target: CPU - Model: mnasnet (ms, Fewer Is Better)
AVX-512 Off: 44.19 (SE +/- 0.49, N = 3; MIN: 41.56 / MAX: 148.98; -mno-avx512f)
AVX-512 On: 42.30 (SE +/- 0.26, N = 8; MIN: 39.64 / MAX: 571.78; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lgomp -lpthread

Numenta Anomaly Benchmark

Numenta Anomaly Benchmark (NAB) is a benchmark for evaluating algorithms for anomaly detection in streaming, real-time applications. It is comprised of over 50 labeled real-world and artificial time-series data files plus a novel scoring mechanism designed for real-time applications. This test profile currently measures the time to run various detectors. Learn more via the OpenBenchmarking.org test page.

Numenta Anomaly Benchmark 1.1 - Detector: Bayesian Changepoint (Seconds, Fewer Is Better)
AVX-512 Off: 17.39 (SE +/- 0.19, N = 5)
AVX-512 On: 16.68 (SE +/- 0.21, N = 4)

oneDNN

oneDNN 2.7 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
AVX-512 Off: 0.537682 (SE +/- 0.004200, N = 15; MIN: 0.42; -mno-avx512f)
AVX-512 On: 0.516167 (SE +/- 0.004067, N = 7; MIN: 0.42; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Intel Open Image Denoise

Open Image Denoise is a denoising library for ray-tracing and part of the oneAPI rendering toolkit. Learn more via the OpenBenchmarking.org test page.

Intel Open Image Denoise 1.4.0 - Run: RTLightmap.hdr.4096x4096 (Images / Sec, More Is Better)
AVX-512 Off: 1.72 (SE +/- 0.01, N = 3)
AVX-512 On: 1.66 (SE +/- 0.01, N = 3)

ONNX Runtime

ONNX Runtime 1.11 - Model: bertsquad-12 - Device: CPU - Executor: Standard (Inferences Per Minute, More Is Better)
AVX-512 Off: 499 (SE +/- 3.11, N = 3; -mno-avx512f)
AVX-512 On: 516 (SE +/- 1.86, N = 3; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

Numenta Anomaly Benchmark

Numenta Anomaly Benchmark 1.1 - Detector: Windowed Gaussian (Seconds, Fewer Is Better)
AVX-512 Off: 4.882 (SE +/- 0.044, N = 7)
AVX-512 On: 4.727 (SE +/- 0.032, N = 15)

JPEG XL libjxl

JPEG XL libjxl 0.7 - Input: JPEG - Quality: 100 (MP/s, More Is Better)
AVX-512 Off: 0.72 (SE +/- 0.01, N = 9; -mno-avx512f)
AVX-512 On: 0.74 (SE +/- 0.01, N = 9; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
1. (CXX) g++ options: -O3 -march=native -fno-rtti -funwind-tables -O2 -fPIE -pie -lm -latomic

Darmstadt Automotive Parallel Heterogeneous Suite

DAPHNE is the Darmstadt Automotive Parallel HeterogeNEous Benchmark Suite, with OpenCL / CUDA / OpenMP test cases for evaluating programming models in the context of autonomous driving capabilities. Learn more via the OpenBenchmarking.org test page.

Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: NDT Mapping (Test Cases Per Minute, More Is Better)
AVX-512 Off: 1335.63 (SE +/- 8.66, N = 3)
AVX-512 On: 1371.18 (SE +/- 5.94, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp

NCNN

NCNN 20220729 - Target: CPU - Model: resnet50 (ms, Fewer Is Better)
AVX-512 Off: 67.48 (SE +/- 0.89, N = 3; MIN: 63.4 / MAX: 170.14; -mno-avx512f)
AVX-512 On: 66.34 (SE +/- 0.50, N = 8; MIN: 62.92 / MAX: 194.74; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lgomp -lpthread

GROMACS

This test runs the GROMACS (GROningen MAchine for Chemical Simulations) molecular dynamics package with the water_GMX50 data. This test profile allows selecting between CPU and GPU-based GROMACS builds. Learn more via the OpenBenchmarking.org test page.

GROMACS 2022.1 - Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, More Is Better)
AVX-512 Off: 18.47 (SE +/- 0.10, N = 3; -mno-avx512f)
AVX-512 On: 18.76 (SE +/- 0.24, N = 3; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
1. (CXX) g++ options: -O3 -march=native

Numenta Anomaly Benchmark

Numenta Anomaly Benchmark 1.1 - Detector: Relative Entropy (Seconds, Fewer Is Better)
AVX-512 Off: 10.035 (SE +/- 0.082, N = 5)
AVX-512 On: 9.898 (SE +/- 0.089, N = 5)

oneDNN

oneDNN 2.7 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
AVX-512 Off: 22.84 (SE +/- 0.06, N = 3; MIN: 20.22; -mno-avx512f)
AVX-512 On: 22.59 (SE +/- 0.16, N = 3; MIN: 19.6; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -pie -ldl -lpthread

Intel Open Image Denoise

Intel Open Image Denoise 1.4.0 - Run: RT.hdr_alb_nrm.3840x2160 (Images / Sec, More Is Better)
AVX-512 Off: 3.50 (SE +/- 0.03, N = 5)
AVX-512 On: 3.51 (SE +/- 0.01, N = 5)

Intel Open Image Denoise 1.4.0 - Run: RT.ldr_alb_nrm.3840x2160 (Images / Sec, More Is Better)
AVX-512 Off: 3.51 (SE +/- 0.01, N = 5)
AVX-512 On: 3.51 (SE +/- 0.01, N = 5)
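When two results sit this close together, the reported standard errors (SE) give a rough sense of whether the gap is distinguishable from run-to-run noise. The sketch below is a back-of-the-envelope heuristic and not a formal significance test: the hypothetical `gap_in_standard_errors` helper assumes the two runs are independent and expresses the difference in units of their combined standard error.

```python
import math


def gap_in_standard_errors(a: float, se_a: float, b: float, se_b: float) -> float:
    """Return |a - b| divided by the combined standard error of both runs.

    Hypothetical helper for illustration; assumes independent runs.
    Values well below ~2 suggest the gap is hard to separate from noise.
    """
    combined_se = math.hypot(se_a, se_b)  # sqrt(se_a**2 + se_b**2)
    if combined_se == 0:
        raise ValueError("at least one standard error must be non-zero")
    return abs(a - b) / combined_se


# Intel Open Image Denoise, RT.hdr_alb_nrm.3840x2160: 3.50 vs 3.51 Images / Sec
hdr = gap_in_standard_errors(3.50, 0.03, 3.51, 0.01)
# Intel Open Image Denoise, RTLightmap.hdr.4096x4096: 1.72 vs 1.66 Images / Sec
lightmap = gap_in_standard_errors(1.72, 0.01, 1.66, 0.01)

print(f"RT.hdr_alb_nrm gap: {hdr:.2f} combined SE")
print(f"RTLightmap gap: {lightmap:.2f} combined SE")
```

By this measure the RT.hdr_alb_nrm difference is well under one combined standard error, while the RTLightmap difference (which favors AVX-512 Off) is several times larger than the noise estimate.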

CPU Temperature Monitor

CPU Temperature Monitor - Phoronix Test Suite System Monitoring (Celsius)
AVX-512 Off: Min: 35.5 / Avg: 51.26 / Max: 73.75
AVX-512 On: Min: 30.13 / Avg: 49.97 / Max: 73.38

CPU Power Consumption Monitor

CPU Power Consumption Monitor - Phoronix Test Suite System Monitoring (Watts)
AVX-512 Off: Min: 106.95 / Avg: 449.58 / Max: 735.32
AVX-512 On: Min: 26.37 / Avg: 434.8 / Max: 766.01

Darmstadt Automotive Parallel Heterogeneous Suite

Darmstadt Automotive Parallel Heterogeneous Suite - CPU Temperature Monitor (Celsius, Fewer Is Better)
AVX-512 Off: Min: 36.75 / Avg: 39.2 / Max: 49.75
AVX-512 On: Min: 31.25 / Avg: 38.03 / Max: 47.5

Darmstadt Automotive Parallel Heterogeneous Suite - CPU Power Consumption Monitor (Watts, Fewer Is Better)
AVX-512 Off: Min: 136.19 / Avg: 273.1 / Max: 289.31
AVX-512 On: Min: 49.36 / Avg: 258.94 / Max: 283.12

Darmstadt Automotive Parallel Heterogeneous Suite - CPU Peak Freq (Highest CPU Core Frequency) Monitor (Megahertz, More Is Better)
AVX-512 Off: Min: 3694 / Avg: 3698.64 / Max: 4768
AVX-512 On: Min: 2400 / Avg: 3647.5 / Max: 3715

Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Points2Image (Test Cases Per Minute Per Watt, More Is Better)
AVX-512 Off: 42.19
AVX-512 On: 52.82

Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Points2Image (Test Cases Per Minute, More Is Better)
AVX-512 Off: 11521.38 (SE +/- 269.82, N = 15)
AVX-512 On: 13677.76 (SE +/- 126.17, N = 15)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp
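The per-Watt figures for Points2Image are numerically consistent with dividing the test result by the average CPU power reading recorded during that test. That this is how the Phoronix Test Suite derives them is an assumption here, since the page does not say so; the short sketch below just reproduces the arithmetic with a hypothetical `per_watt` helper.

```python
def per_watt(result: float, avg_watts: float) -> float:
    """Divide a throughput result by the average power draw in Watts.

    Hypothetical helper; assumes the per-Watt metric is result / average power.
    """
    if avg_watts <= 0:
        raise ValueError("average power must be positive")
    return result / avg_watts


# Points2Image: Test Cases Per Minute and average CPU power for each run
off = per_watt(11521.38, 273.1)
on = per_watt(13677.76, 258.94)

print(f"AVX-512 Off: {off:.2f} Test Cases Per Minute Per Watt")
print(f"AVX-512 On: {on:.2f} Test Cases Per Minute Per Watt")
```

Rounded to two decimals, these come out to the 42.19 and 52.82 reported for the two runs, so the efficiency gain reflects both the higher throughput and the slightly lower average power draw with AVX-512 On.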

SVT-AV1

SVT-AV1 1.4 - CPU Temperature Monitor (Celsius, Fewer Is Better)
AVX-512 Off: Min: 37.8 / Avg: 40.5 / Max: 48.1
AVX-512 On: Min: 30.6 / Avg: 34.6 / Max: 43.0

SVT-AV1 1.4 - CPU Power Consumption Monitor (Watts, Fewer Is Better)
AVX-512 Off: Min: 127.1 / Avg: 241.8 / Max: 338.9
AVX-512 On: Min: 52.7 / Avg: 175.4 / Max: 338.1

SVT-AV1 1.4 - CPU Peak Freq (Highest CPU Core Frequency) Monitor (Megahertz, More Is Better)
AVX-512 Off: Min: 3694 / Avg: 3718 / Max: 3985
AVX-512 On: Min: 2400 / Avg: 3274 / Max: 3770

SVT-AV1 1.4 - Encoder Mode: Preset 12 - Input: Bosphorus 4K (Frames Per Second Per Watt, More Is Better)
AVX-512 Off: 0.968
AVX-512 On: 1.400

SVT-AV1 1.4 - Encoder Mode: Preset 12 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
AVX-512 Off: 233.94 (SE +/- 3.57, N = 15)
AVX-512 On: 245.56 (SE +/- 4.78, N = 15)

SVT-AV1 1.4 - CPU Temperature Monitor (Celsius, Fewer Is Better)
AVX-512 Off: Min: 39.0 / Avg: 42.4 / Max: 49.3
AVX-512 On: Min: 32.4 / Avg: 36.8 / Max: 49.3

SVT-AV1 1.4 - CPU Power Consumption Monitor (Watts, Fewer Is Better)
AVX-512 Off: Min: 108.8 / Avg: 243.3 / Max: 325.0
AVX-512 On: Min: 52.7 / Avg: 162.9 / Max: 323.1

SVT-AV1 1.4 - CPU Peak Freq (Highest CPU Core Frequency) Monitor (Megahertz, More Is Better)
AVX-512 Off: Min: 3694 / Avg: 3728 / Max: 4349
AVX-512 On: Min: 2400 / Avg: 3202 / Max: 3756

SVT-AV1 1.4 - Encoder Mode: Preset 13 - Input: Bosphorus 4K (Frames Per Second Per Watt, More Is Better)
AVX-512 Off: 0.986
AVX-512 On: 1.513

SVT-AV1 1.4 - Encoder Mode: Preset 13 - Input: Bosphorus 4K (Frames Per Second, More Is Better)
AVX-512 Off: 239.88 (SE +/- 4.01, N = 15)
AVX-512 On: 246.57 (SE +/- 4.74, N = 15)

ONNX Runtime

ONNX Runtime 1.11 - CPU Temperature Monitor (Celsius, Fewer Is Better)
AVX-512 Off: Min: 38.13 / Avg: 50.46 / Max: 57.25
AVX-512 On: Min: 32.75 / Avg: 48.03 / Max: 50.63

ONNX Runtime 1.11 - CPU Power Consumption Monitor (Watts, Fewer Is Better)
AVX-512 Off: Min: 107.66 / Avg: 447.6 / Max: 464.67
AVX-512 On: Min: 51.25 / Avg: 441.77 / Max: 464.82

ONNX Runtime 1.11 - CPU Peak Freq (Highest CPU Core Frequency) Monitor (Megahertz, More Is Better)
AVX-512 Off: Min: 3593 / Avg: 3607.6 / Max: 4285
AVX-512 On: Min: 2400 / Avg: 3571.99 / Max: 3770

ONNX Runtime 1.11 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inferences Per Minute Per Watt, More Is Better)
AVX-512 Off: 2.051
AVX-512 On: 2.379

ONNX Runtime 1.11 - Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard (Inferences Per Minute, More Is Better)
AVX-512 Off: 918 (SE +/- 20.83, N = 12; -mno-avx512f)
AVX-512 On: 1051 (SE +/- 39.92, N = 12; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

ONNX Runtime 1.11 - CPU Temperature Monitor (Celsius, Fewer Is Better)
AVX-512 Off: Min: 39.5 / Avg: 43.19 / Max: 53
AVX-512 On: Min: 36.38 / Avg: 39.9 / Max: 42.63

ONNX Runtime 1.11 - CPU Power Consumption Monitor (Watts, Fewer Is Better)
AVX-512 Off: Min: 132.49 / Avg: 266.8 / Max: 278.02
AVX-512 On: Min: 53.19 / Avg: 259.31 / Max: 268.38

ONNX Runtime 1.11 - CPU Peak Freq (Highest CPU Core Frequency) Monitor (Megahertz, More Is Better)
AVX-512 Off: Min: 3693 / Avg: 3701.93 / Max: 4397
AVX-512 On: Min: 2400 / Avg: 3664.92 / Max: 3710

ONNX Runtime 1.11 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inferences Per Minute Per Watt, More Is Better)
AVX-512 Off: 23.57
AVX-512 On: 28.54

ONNX Runtime 1.11 - Model: super-resolution-10 - Device: CPU - Executor: Standard (Inferences Per Minute, More Is Better)
AVX-512 Off: 6288 (SE +/- 260.54, N = 12; -mno-avx512f)
AVX-512 On: 7401 (SE +/- 5.46, N = 3; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
1. (CXX) g++ options: -O3 -march=native -ffunction-sections -fdata-sections -mtune=native -flto=auto -fno-fat-lto-objects -ldl -lrt

NCNN

NCNN 20220729 - CPU Temperature Monitor (Celsius, Fewer Is Better)
AVX-512 Off: Min: 43 / Avg: 46.1 / Max: 50.13
AVX-512 On: Min: 38.63 / Avg: 44.45 / Max: 48

NCNN 20220729 - CPU Power Consumption Monitor (Watts, Fewer Is Better)
AVX-512 Off: Min: 199.1 / Avg: 400.13 / Max: 450.41
AVX-512 On: Min: 54.96 / Avg: 396.44 / Max: 451.5

NCNN 20220729 - CPU Peak Freq (Highest CPU Core Frequency) Monitor (Megahertz, More Is Better)
AVX-512 Off: Min: 3563 / Avg: 3574.38 / Max: 3815
AVX-512 On: Min: 2400 / Avg: 3565.67 / Max: 3699

NCNN 20220729 - Target: CPU - Model: FastestDet (ms, Fewer Is Better)
AVX-512 Off: 60.73 (SE +/- 2.90, N = 3; MIN: 52.89 / MAX: 236.85; -mno-avx512f)
AVX-512 On: 58.95 (SE +/- 0.49, N = 8; MIN: 55.11 / MAX: 282.72; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lgomp -lpthread

NCNN 20220729 - Target: CPU - Model: vision_transformer (ms, Fewer Is Better)
AVX-512 Off: 86.16 (SE +/- 5.82, N = 3; MIN: 73.72 / MAX: 1760.74; -mno-avx512f)
AVX-512 On: 74.93 (SE +/- 1.59, N = 8; MIN: 65.04 / MAX: 2154.62; -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi)
1. (CXX) g++ options: -O3 -march=native -rdynamic -lgomp -lpthread

TensorFlow

TensorFlow 2.10 - CPU Temperature Monitor (Celsius, Fewer Is Better)
AVX-512 Off: Min: 38.6 / Avg: 40.5 / Max: 50.4
AVX-512 On: Min: 34.6 / Avg: 41.7 / Max: 48.9

TensorFlow 2.10 - CPU Power Consumption Monitor (Watts, Fewer Is Better)
AVX-512 Off: Min: 156.1 / Avg: 298.9 / Max: 329.3
AVX-512 On: Min: 53.1 / Avg: 308.0 / Max: 379.5

TensorFlow 2.10 - CPU Peak Freq (Highest CPU Core Frequency) Monitor (Megahertz, More Is Better)
AVX-512 Off: Min: 3694 / Avg: 3727 / Max: 4474
AVX-512 On: Min: 2400 / Avg: 3586 / Max: 4161

TensorFlow 2.10 - Device: CPU - Batch Size: 16 - Model: GoogLeNet (images/sec Per Watt, More Is Better)
AVX-512 Off: 0.104
AVX-512 On: 0.195

TensorFlow 2.10 - Device: CPU - Batch Size: 16 - Model: GoogLeNet (images/sec, More Is Better)
AVX-512 Off: 31.12 (SE +/- 0.25, N = 3)
AVX-512 On: 60.17 (SE +/- 1.01, N = 15)

163 Results Shown

TensorFlow
oneDNN
AI Benchmark Alpha
oneDNN
OpenVINO:
  Weld Porosity Detection FP16 - CPU:
    ms
    FPS
  Face Detection FP16 - CPU:
    FPS
    ms
  Age Gender Recognition Retail 0013 FP16 - CPU:
    ms
Cpuminer-Opt
AI Benchmark Alpha
OpenVINO:
  Machine Translation EN To DE FP16 - CPU:
    ms
    FPS
  Weld Porosity Detection FP16-INT8 - CPU:
    FPS
    ms
  Face Detection FP16-INT8 - CPU:
    FPS
    ms
  Vehicle Detection FP16 - CPU:
    FPS
    ms
AI Benchmark Alpha
OpenVINO:
  Vehicle Detection FP16-INT8 - CPU:
    FPS
    ms
oneDNN
OpenVINO:
  Person Vehicle Bike Detection FP16 - CPU:
    FPS
    ms
TensorFlow
OSPRay
Cpuminer-Opt
OpenVINO:
  Person Detection FP16 - CPU:
    FPS
    ms
  Person Detection FP32 - CPU:
    FPS
    ms
OSPRay
Cpuminer-Opt
OpenVINO
Mobile Neural Network
Neural Magic DeepSparse:
  NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Asynchronous Multi-Stream:
    items/sec
    ms/batch
Mobile Neural Network
Cpuminer-Opt
miniBUDE:
  OpenMP - BM1:
    GFInst/s
    Billion Interactions/s
Neural Magic DeepSparse:
  NLP Question Answering, BERT base uncased SQuaD 12layer Pruned90 - Synchronous Single-Stream:
    items/sec
    ms/batch
  CV Classification, ResNet-50 ImageNet - Asynchronous Multi-Stream:
    ms/batch
    items/sec
oneDNN
OpenVINO
Cpuminer-Opt
miniBUDE:
  OpenMP - BM2:
    Billion Interactions/s
    GFInst/s
OSPRay
Cpuminer-Opt
simdjson:
  DistinctUserID
  PartialTweets
Neural Magic DeepSparse:
  CV Detection,YOLOv5s COCO - Asynchronous Multi-Stream:
    items/sec
    ms/batch
simdjson
OSPRay Studio:
  2 - 1080p - 32 - Path Tracer
  2 - 4K - 16 - Path Tracer
  2 - 1080p - 16 - Path Tracer
  1 - 1080p - 32 - Path Tracer
  1 - 1080p - 1 - Path Tracer
  2 - 4K - 32 - Path Tracer
  1 - 4K - 1 - Path Tracer
  1 - 4K - 32 - Path Tracer
  1 - 4K - 16 - Path Tracer
  1 - 1080p - 16 - Path Tracer
  3 - 1080p - 16 - Path Tracer
  3 - 1080p - 32 - Path Tracer
simdjson
OSPRay Studio
simdjson
Neural Magic DeepSparse
OSPRay Studio
Neural Magic DeepSparse
OSPRay Studio:
  3 - 1080p - 1 - Path Tracer
  3 - 4K - 32 - Path Tracer
Embree:
  Pathtracer ISPC - Asian Dragon
  Pathtracer ISPC - Asian Dragon Obj
Neural Magic DeepSparse:
  CV Detection,YOLOv5s COCO - Synchronous Single-Stream:
    items/sec
    ms/batch
Embree
OSPRay Studio:
  2 - 4K - 1 - Path Tracer
  2 - 1080p - 1 - Path Tracer
Neural Magic DeepSparse
OpenFOAM
Neural Magic DeepSparse:
  NLP Text Classification, DistilBERT mnli - Asynchronous Multi-Stream
  NLP Text Classification, BERT base uncased SST2 - Synchronous Single-Stream
  NLP Text Classification, BERT base uncased SST2 - Synchronous Single-Stream
OpenVKL
OpenVINO
Neural Magic DeepSparse:
  NLP Token Classification, BERT base uncased conll2003 - Synchronous Single-Stream:
    items/sec
    ms/batch
  NLP Document Classification, oBERT base uncased on IMDB - Synchronous Single-Stream:
    items/sec
    ms/batch
NCNN
LeelaChessZero
NCNN
SMHasher
NCNN
ONNX Runtime
LeelaChessZero
CP2K Molecular Dynamics
Mobile Neural Network
OpenFOAM
Neural Magic DeepSparse:
  NLP Text Classification, DistilBERT mnli - Synchronous Single-Stream:
    items/sec
    ms/batch
Numpy Benchmark
JPEG XL libjxl
NCNN
JPEG XL libjxl
NCNN
Numenta Anomaly Benchmark
oneDNN
Intel Open Image Denoise
ONNX Runtime
Numenta Anomaly Benchmark
JPEG XL libjxl
Darmstadt Automotive Parallel Heterogeneous Suite
NCNN
GROMACS
Numenta Anomaly Benchmark
oneDNN
Intel Open Image Denoise:
  RT.hdr_alb_nrm.3840x2160
  RT.ldr_alb_nrm.3840x2160
CPU Temperature Monitor:
  Phoronix Test Suite System Monitoring:
    Celsius
    Watts
  CPU Temp Monitor:
    Celsius
  CPU Power Consumption Monitor:
    Watts
  CPU Peak Freq (Highest CPU Core Frequency) Monitor:
    Megahertz
  OpenMP - Points2Image:
    Test Cases Per Minute Per Watt
Darmstadt Automotive Parallel Heterogeneous Suite
SVT-AV1:
  CPU Temp Monitor
  CPU Power Consumption Monitor
  CPU Peak Freq (Highest CPU Core Frequency) Monitor
  Preset 12 - Bosphorus 4K
SVT-AV1
SVT-AV1:
  CPU Temp Monitor
  CPU Power Consumption Monitor
  CPU Peak Freq (Highest CPU Core Frequency) Monitor
  Preset 13 - Bosphorus 4K
SVT-AV1
ONNX Runtime:
  CPU Temp Monitor
  CPU Power Consumption Monitor
  CPU Peak Freq (Highest CPU Core Frequency) Monitor
  ArcFace ResNet-100 - CPU - Standard
ONNX Runtime
ONNX Runtime:
  CPU Temp Monitor
  CPU Power Consumption Monitor
  CPU Peak Freq (Highest CPU Core Frequency) Monitor
  super-resolution-10 - CPU - Standard
ONNX Runtime
NCNN:
  CPU Temp Monitor
  CPU Power Consumption Monitor
  CPU Peak Freq (Highest CPU Core Frequency) Monitor
NCNN:
  CPU - FastestDet
  CPU - vision_transformer
TensorFlow:
  CPU Temp Monitor
  CPU Power Consumption Monitor
  CPU Peak Freq (Highest CPU Core Frequency) Monitor
  CPU - 16 - GoogLeNet
TensorFlow