12700k HPC+OpenCL AVX512 performance profiling: Intel Core i7-12700K testing with an MSI PRO Z690-A DDR4 (MS-7D25) v1.0 (1.15 BIOS) motherboard and a Gigabyte AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 6GB graphics card on Pop 21.04, via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2112125-TJ-12700KHPC62&sro&grs.
12700k HPC+OpenCL AVX512 performance profiling - system details

Compared configurations:
- "12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt" (below: the march=sapphirerapids run)
- "12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt" (below: the march=native + AVX512 run)

Processor: Intel Core i7-12700K @ 6.30GHz (8 Cores / 16 Threads)
Motherboard: MSI PRO Z690-A DDR4 (MS-7D25) v1.0 (1.15 BIOS)
Chipset: Intel Device 7aa7
Memory: 32GB
Disk (differs between the two runs): 500GB Western Digital WDS500G2B0C-00PXH0 + 3 x 10001GB Seagate ST10000DM0004-1Z + 128GB HP SSD S700 Pro; 500GB Western Digital WDS500G2B0C-00PXH0 + 3 x 10001GB Seagate ST10000DM0004-1Z + 300GB Western Digital WD3000GLFS-0 + 128GB HP SSD S700 Pro
Graphics: Gigabyte AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 6GB (1650/750MHz)
Audio: Realtek ALC897
Monitor: LG HDR WQHD
Network: Intel I225-V
OS: Pop 21.04
Kernel: 5.15.5-76051505-generic (x86_64)
Desktop: GNOME Shell 3.38.4
Display Server: X Server 1.20.11
OpenGL: 4.6 Mesa 21.2.2 (LLVM 12.0.0)
OpenCL: OpenCL 2.2 AMD-APP (3361.0)
Vulkan: 1.2.185
Compiler: GCC 11.1.0
File-System: ext4
Screen Resolution: 3440x1440

Kernel Details: Transparent Huge Pages: madvise
Environment Details:
- march=sapphirerapids run: CXXFLAGS and CFLAGS both set to "-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16"
- march=native + AVX512 run: CXXFLAGS, CFLAGS and FFLAGS all set to "-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect"
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Disk Details: NONE / errors=remount-ro,noatime,rw / Block Size: 4096
Processor Details: Scaling Governor: intel_pstate powersave - CPU Microcode: 0x15 - Thermald 2.4.3
Graphics Details: GLAMOR - BAR1 / Visible vRAM Size: 6128 MB
Python Details: Python 2.7.18 + Python 3.9.5
Security Details: itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling + srbds: Not affected + tsx_async_abort: Not affected

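For readers who want to reproduce the two flag sets compared here, the sketch below shows one way the environment could be prepared before invoking the Phoronix Test Suite. It is a minimal illustration only, assuming that the Phoronix Test Suite picks up exported CFLAGS/CXXFLAGS/FFLAGS when building test profiles (which is how the Environment Details above were recorded); the pts/onednn profile name is merely an example and is not part of this result file.

    # Sketch, not part of the original result file.
    # Configuration 1: target the Sapphire Rapids ISA level but mask the AMX
    # extensions, which the Alder Lake 12700K does not implement.
    export CFLAGS="-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16"
    export CXXFLAGS="$CFLAGS"
    phoronix-test-suite benchmark pts/onednn   # example test profile

    # Configuration 2: -march=native plus every AVX-512 sub-feature enabled
    # explicitly (the 12700K P-cores only expose AVX-512 with E-cores disabled
    # on early BIOS revisions, consistent with the 8 cores / 16 threads reported above).
    AVX512="-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect"
    export CFLAGS="-O3 -march=native $AVX512"
    export CXXFLAGS="$CFLAGS"
    export FFLAGS="$CFLAGS"
    phoronix-test-suite benchmark pts/onednn   # example test profile
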
Result overview. The tests included in this comparison are listed below in result order, followed by each configuration's raw values in the same order; per-test details with standard errors follow further down.

Tests (in result order): caffe: GoogleNet - CPU - 100 caffe: GoogleNet - CPU - 200 onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU shoc: OpenCL - Max SP Flops cp2k: Fayalite-FIST caffe: GoogleNet - CPU - 1000 onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU lczero: BLAS pennant: leblancbig darktable: Boat - OpenCL caffe: AlexNet - CPU - 1000 rnnoise: onednn: Recurrent Neural Network Inference - f32 - CPU shoc: OpenCL - Bus Speed Readback pennant: sedovbig daphne: OpenMP - Points2Image hpl: intel-mpi: IMB-MPI1 PingPong mt-dgemm: Sustained Floating-Point Rate cl-mem: Write onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU fftw: Float + SSE - 2D FFT Size 32 mafft: Multiple Sequence Alignment - LSU RNA shoc: OpenCL - Triad caffe: AlexNet - CPU - 100 fftw: Float + SSE - 2D FFT Size 4096 askap: tConvolve OpenMP - Gridding fftw: Float + SSE - 1D FFT Size 32 fftw: Stock - 2D FFT Size 4096 relion: Basic - CPU cl-mem: Copy intel-mpi: IMB-MPI1 Sendrecv intel-mpi: IMB-MPI1 Sendrecv darktable: Server Rack - OpenCL openfoam: Motorbike 30M onednn: Recurrent Neural Network Training - u8s8f32 - CPU namd: ATPase Simulation - 327,506 Atoms intel-mpi: IMB-P2P PingPong onednn: IP Shapes 3D - f32 - CPU onednn: IP Shapes 3D - bf16bf16bf16 - CPU fftw: Stock - 1D FFT Size 4096 intel-mpi: IMB-MPI1 Exchange numpy: qmcpack: simple-H2O shoc: OpenCL - GEMM SGEMM_N onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU cl-mem: Read caffe: AlexNet - CPU - 200 himeno: Poisson Pressure Solver shoc: OpenCL - Bus Speed Download onednn: IP Shapes 1D - u8s8f32 - CPU onednn: IP Shapes 1D - bf16bf16bf16 - CPU intel-mpi: IMB-MPI1 Exchange mrbayes: Primate Phylogeny Analysis askap: Hogbom Clean OpenMP onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU askap: tConvolve MPI - Degridding rbenchmark: fftw: Stock - 2D FFT Size 32 gromacs: MPI CPU - water_GMX50_bare amg: onednn: Recurrent Neural Network Inference - u8s8f32 - CPU fftw: Float + SSE - 1D FFT Size 4096 darktable: Server Room - OpenCL askap: tConvolve OpenMP - Degridding onednn: IP Shapes 3D - u8s8f32 - CPU tensorflow-lite: SqueezeNet openfoam: Motorbike 60M octave-benchmark: tensorflow-lite: Mobilenet Float onednn: Recurrent Neural Network Training - f32 - CPU onednn: IP Shapes 1D - f32 - CPU askap: tConvolve MT - Gridding onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU daphne: OpenMP - Euclidean Cluster darktable: Masskrug - OpenCL arrayfire: BLAS CPU deepspeech: CPU parboil: OpenMP Stencil parboil: OpenMP CUTCP onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU shoc: OpenCL - FFT SP tensorflow-lite: Inception ResNet V2 hmmer: Pfam Database Search shoc: OpenCL - MD5 Hash tensorflow-lite: Inception V4 onednn: Convolution Batch Shapes Auto - f32 - CPU shoc: OpenCL - Reduction onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU shoc: OpenCL - Texture Read Bandwidth lulesh: minife: Small tensorflow-lite: NASNet Mobile tensorflow-lite: Mobilenet Quant daphne: OpenMP - NDT Mapping parboil: OpenMP LBM askap: tConvolve MT - Degridding fftw: Stock - 1D FFT Size 32 shoc: OpenCL - S3D askap: tConvolve MPI - Gridding onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU onednn: Deconvolution Batch shapes_3d - f32 - CPU onednn: Deconvolution Batch shapes_1d - f32 - CPU parboil: OpenMP MRI Gridding

Results, march=sapphirerapids run (values in test order): 79674 157272 7.43204 8376637 398.345 726541 6.77334 906 49.93021 3.971 247624 16.509 1388.34 20.3871 69.89434 35022.145431097 97.554 10004.46 4.875598 255.3 0.565699 80496 7.703 12.5997 23122 43900 1866.30 32180 13770 1684.702 198.4 12273.54 53.50 0.133 137.71 2596.32 1.16249 8628982 8.78268 3.56837 18293 15189.76 618.63 17.950 1841.75 13.2836 263.6 46624 9471.738369 20.1487 0.685132 2.44545 109.01 73.794 267.389 0.872683 4859.18 0.1044 22948 1.180 303975100 1334.31 103960 3.012 3614.48 1.93347 145830 867.20 5.080 97253.9 2540.28 2.58637 1245.64 2585.15 1671.51 3.827 1205.09 48.85145 15.011457 3.177141 1337.47 2.33613 680.883 1878730 82.482 9.3041 2080110 13.3957 254.126 1.18583 349.340 6872.8297 6411.43 124859 98345.3 1033.44 114.068817 2054.71 22774 125.078 5046.07 4.67789 1.06856 4.13169 6.27787 49.150711

Results, march=native + AVX512 run (values in test order): 67852 136187 6.72455 9031599 372.744 680177 6.34910 959 47.38740 4.151 238435 17.119 1341.61 21.0509 67.77358 36114.808332258 100.593 10294.25 5.016124 248.4 0.550588 82515 7.523 12.3131 23659 42935 1906.52 31560 14020 1656.713 195.2 12471.89 52.66 0.131 135.71 2560.48 1.17871 8513496 8.89450 3.52788 18502 15023.98 612.18 18.126 1859.40 13.4080 261.2 47045 9554.056801 19.9782 0.690873 2.46474 108.21 74.339 269.303 0.878588 4889.74 0.1050 22827 1.186 302617633 1328.42 104417 2.999 3599.35 1.92539 145241 864.00 5.064 97559.5 2532.45 2.57885 1248.93 2578.54 1667.43 3.818 1207.87 48.96052 14.980375 3.183109 1339.83 2.33206 682.017 1881663 82.610 9.3179 2082823 13.4115 254.407 1.18705 349.012 6878.6229 6407.12 124941 98287.0 1033.74 114.096662 2054.38 22777 125.070 5046.07 4.67293 1.07696 4.23668 6.60245 48.567082

Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 100 (Milli-Seconds, Fewer Is Better)
march=native + AVX512 run: 67852 (SE +/- 292.08, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 79674 (SE +/- 640.77, N = 15); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 200 (Milli-Seconds, Fewer Is Better)
march=native + AVX512 run: 136187 (SE +/- 778.54, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 157272 (SE +/- 1277.18, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
march=native + AVX512 run: 6.72455 (SE +/- 0.01945, N = 3); notes: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 6.2
march=sapphirerapids run: 7.43204 (SE +/- 0.04340, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 6.53
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Max SP Flops (GFLOPS, More Is Better)
march=native + AVX512 run: 9031599 (SE +/- 142226.18, N = 9); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 8376637 (SE +/- 65495.62, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

CP2K Molecular Dynamics 8.2 - Input: Fayalite-FIST (Seconds, Fewer Is Better)
march=native + AVX512 run: 372.74
march=sapphirerapids run: 398.35

Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 1000 (Milli-Seconds, Fewer Is Better)
march=native + AVX512 run: 680177 (SE +/- 2693.85, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 726541 (SE +/- 11125.73, N = 9); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
march=native + AVX512 run: 6.34910 (SE +/- 0.00598, N = 3); notes: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 6.03
march=sapphirerapids run: 6.77334 (SE +/- 0.06030, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 6.15
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

LeelaChessZero 0.28 - Backend: BLAS (Nodes Per Second, More Is Better)
march=native + AVX512 run: 959 (SE +/- 11.95, N = 4); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 906 (SE +/- 12.84, N = 9); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -flto -O3 -pthread

Pennant 1.0.1 - Test: leblancbig (Hydro Cycle Time - Seconds, Fewer Is Better)
march=native + AVX512 run: 47.39 (SE +/- 0.03, N = 3)
march=sapphirerapids run: 49.93 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi

Darktable 3.4.1 - Test: Boat - Acceleration: OpenCL (Seconds, Fewer Is Better)
march=native + AVX512 run: 4.151 (SE +/- 0.043, N = 3)
march=sapphirerapids run: 3.971 (SE +/- 0.049, N = 3)

Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 1000 (Milli-Seconds, Fewer Is Better)
march=native + AVX512 run: 238435 (SE +/- 1961.39, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 247624 (SE +/- 3023.73, N = 9); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

RNNoise 2020-06-28 (Seconds, Fewer Is Better)
march=native + AVX512 run: 17.12 (SE +/- 0.01, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 16.51 (SE +/- 0.21, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CC) gcc options: -O3 -pedantic -fvisibility=hidden

oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
march=native + AVX512 run: 1341.61 (SE +/- 3.44, N = 3); notes: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1262.01
march=sapphirerapids run: 1388.34 (SE +/- 11.91, N = 8); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1265.31
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Readback (GB/s, More Is Better)
march=native + AVX512 run: 21.05 (SE +/- 0.21, N = 15); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 20.39 (SE +/- 0.01, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

Pennant 1.0.1 - Test: sedovbig (Hydro Cycle Time - Seconds, Fewer Is Better)
march=native + AVX512 run: 67.77 (SE +/- 0.13, N = 3)
march=sapphirerapids run: 69.89 (SE +/- 0.35, N = 3)
1. (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi

Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Points2Image (Test Cases Per Minute, More Is Better)
march=native + AVX512 run: 36114.81 (SE +/- 233.74, N = 3)
march=sapphirerapids run: 35022.15 (SE +/- 389.77, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp

HPL Linpack 2.3 (GFLOPS, More Is Better)
march=native + AVX512 run: 100.59 (SE +/- 1.07, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 97.55 (SE +/- 0.14, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CC) gcc options: -O3 -lopenblas -lm -pthread -lmpi

Intel MPI Benchmarks 2019.3 - Test: IMB-MPI1 PingPong (Average Mbytes/sec, More Is Better)
march=native + AVX512 run: 10294.25 (SE +/- 131.85, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 10.93 / MAX: 34708.09
march=sapphirerapids run: 10004.46 (SE +/- 140.45, N = 15); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 6.66 / MAX: 34960.72
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

ACES DGEMM 1.0 - Sustained Floating-Point Rate (GFLOP/s, More Is Better)
march=native + AVX512 run: 5.016124 (SE +/- 0.013832, N = 3); notes: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 4.875598 (SE +/- 0.034356, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CC) gcc options: -O3 -march=native -fopenmp

cl-mem 2017-01-13 - Benchmark: Write (GB/s, More Is Better)
march=native + AVX512 run: 248.4 (SE +/- 0.15, N = 3)
march=sapphirerapids run: 255.3 (SE +/- 0.17, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
march=native + AVX512 run: 0.550588 (SE +/- 0.005334, N = 6); notes: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.45
march=sapphirerapids run: 0.565699 (SE +/- 0.005655, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.47
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

FFTW 3.3.6 - Build: Float + SSE - Size: 2D FFT Size 32 (Mflops, More Is Better)
march=native + AVX512 run: 82515 (SE +/- 920.77, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 80496 (SE +/- 272.26, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CC) gcc options: -pthread -O3 -lm

Timed MAFFT Alignment 7.471 - Multiple Sequence Alignment - LSU RNA (Seconds, Fewer Is Better)
march=native + AVX512 run: 7.523 (SE +/- 0.012, N = 3)
march=sapphirerapids run: 7.703 (SE +/- 0.014, N = 3)
1. (CC) gcc options: -std=c99 -O3 -lm -lpthread

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Triad (GB/s, More Is Better)
march=native + AVX512 run: 12.31 (SE +/- 0.13, N = 6); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 12.60 (SE +/- 0.14, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 100 (Milli-Seconds, Fewer Is Better)
march=native + AVX512 run: 23659 (SE +/- 282.19, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 23122 (SE +/- 319.02, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

FFTW 3.3.6 - Build: Float + SSE - Size: 2D FFT Size 4096 (Mflops, More Is Better)
march=native + AVX512 run: 42935 (SE +/- 90.35, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 43900 (SE +/- 484.91, N = 5); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CC) gcc options: -pthread -O3 -lm

ASKAP 1.0 - Test: tConvolve OpenMP - Gridding (Million Grid Points Per Second, More Is Better)
march=native + AVX512 run: 1906.52 (SE +/- 12.08, N = 3)
march=sapphirerapids run: 1866.30 (SE +/- 4.37, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

FFTW 3.3.6 - Build: Float + SSE - Size: 1D FFT Size 32 (Mflops, More Is Better)
march=native + AVX512 run: 31560 (SE +/- 195.07, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 32180 (SE +/- 18.34, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CC) gcc options: -pthread -O3 -lm

FFTW 3.3.6 - Build: Stock - Size: 2D FFT Size 4096 (Mflops, More Is Better)
march=native + AVX512 run: 14020 (SE +/- 33.65, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 13770 (SE +/- 62.76, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CC) gcc options: -pthread -O3 -lm

RELION 3.1.1 - Test: Basic - Device: CPU (Seconds, Fewer Is Better)
march=native + AVX512 run: 1656.71 (SE +/- 0.43, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 1684.70 (SE +/- 3.94, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -fopenmp -std=c++0x -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -pthread -lmpi_cxx -lmpi

cl-mem 2017-01-13 - Benchmark: Copy (GB/s, More Is Better)
march=native + AVX512 run: 195.2 (SE +/- 0.12, N = 3)
march=sapphirerapids run: 198.4 (SE +/- 0.12, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

Intel MPI Benchmarks 2019.3 - Test: IMB-MPI1 Sendrecv (Average Mbytes/sec, More Is Better)
march=native + AVX512 run: 12471.89 (SE +/- 81.05, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MAX: 66000.84
march=sapphirerapids run: 12273.54 (SE +/- 111.25, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MAX: 66577.1
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

Intel MPI Benchmarks 2019.3 - Test: IMB-MPI1 Sendrecv (Average usec, Fewer Is Better)
march=native + AVX512 run: 52.66 (SE +/- 0.28, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.19 / MAX: 1702.76
march=sapphirerapids run: 53.50 (SE +/- 0.35, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.19 / MAX: 1786.28
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

Darktable 3.4.1 - Test: Server Rack - Acceleration: OpenCL (Seconds, Fewer Is Better)
march=native + AVX512 run: 0.131 (SE +/- 0.000, N = 3)
march=sapphirerapids run: 0.133 (SE +/- 0.001, N = 3)

OpenFOAM 8 - Input: Motorbike 30M (Seconds, Fewer Is Better)
march=native + AVX512 run: 135.71 (SE +/- 0.13, N = 3)
march=sapphirerapids run: 137.71 (SE +/- 0.70, N = 3)
Notes: -lspecie -lfiniteVolume -lfvOptions -lmeshTools -lsampling -ldynamicMesh
1. (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lgenericPatchFields -lOpenFOAM -ldl -lm

oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
march=native + AVX512 run: 2560.48 (SE +/- 26.69, N = 3); notes: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2405.01
march=sapphirerapids run: 2596.32 (SE +/- 3.87, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2454.76
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

NAMD 2.14 - ATPase Simulation - 327,506 Atoms (days/ns, Fewer Is Better)
march=native + AVX512 run: 1.17871 (SE +/- 0.00713, N = 3)
march=sapphirerapids run: 1.16249 (SE +/- 0.00085, N = 3)

Intel MPI Benchmarks 2019.3 - Test: IMB-P2P PingPong (Average Msg/sec, More Is Better)
march=native + AVX512 run: 8513496 (SE +/- 102028.44, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1946 / MAX: 22082360
march=sapphirerapids run: 8628982 (SE +/- 23849.34, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1994 / MAX: 22289308
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
march=native + AVX512 run: 8.89450 (SE +/- 0.13446, N = 14); notes: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 8.61
march=sapphirerapids run: 8.78268 (SE +/- 0.01096, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 8.63
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
march=native + AVX512 run: 3.52788 (SE +/- 0.02720, N = 10); notes: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 3.15
march=sapphirerapids run: 3.56837 (SE +/- 0.02891, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 3.16
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

FFTW 3.3.6 - Build: Stock - Size: 1D FFT Size 4096 (Mflops, More Is Better)
march=native + AVX512 run: 18502 (SE +/- 161.26, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 18293 (SE +/- 196.08, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CC) gcc options: -pthread -O3 -lm

Intel MPI Benchmarks 2019.3 - Test: IMB-MPI1 Exchange (Average Mbytes/sec, More Is Better)
march=native + AVX512 run: 15023.98 (SE +/- 218.78, N = 15); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MAX: 64515.64
march=sapphirerapids run: 15189.76 (SE +/- 185.82, N = 15); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MAX: 65915.24
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

Numpy Benchmark (Score, More Is Better)
march=native + AVX512 run: 612.18 (SE +/- 4.12, N = 3)
march=sapphirerapids run: 618.63 (SE +/- 3.76, N = 3)

QMCPACK 3.11 - Input: simple-H2O (Total Execution Time - Seconds, Fewer Is Better)
march=native + AVX512 run: 18.13 (SE +/- 0.10, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 17.95 (SE +/- 0.21, N = 14); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -fopenmp -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -fomit-frame-pointer -ffast-math -pthread -lm -ldl

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: GEMM SGEMM_N (GFLOPS, More Is Better)
march=native + AVX512 run: 1859.40 (SE +/- 17.38, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 1841.75 (SE +/- 8.67, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
march=native + AVX512 run: 13.41 (SE +/- 0.02, N = 3); notes: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 13.07
march=sapphirerapids run: 13.28 (SE +/- 0.01, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 12.99
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

cl-mem 2017-01-13 - Benchmark: Read (GB/s, More Is Better)
march=native + AVX512 run: 261.2 (SE +/- 0.07, N = 3)
march=sapphirerapids run: 263.6 (SE +/- 0.06, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 200 (Milli-Seconds, Fewer Is Better)
march=native + AVX512 run: 47045 (SE +/- 218.90, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 46624 (SE +/- 442.71, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

Himeno Benchmark 3.0 - Poisson Pressure Solver (MFLOPS, More Is Better)
march=native + AVX512 run: 9554.06 (SE +/- 6.17, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 9471.74 (SE +/- 123.70, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CC) gcc options: -O3 -mavx2

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Download (GB/s, More Is Better)
march=native + AVX512 run: 19.98 (SE +/- 0.15, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 20.15 (SE +/- 0.24, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
march=native + AVX512 run: 0.690873 (SE +/- 0.006051, N = 3); notes: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.6
march=sapphirerapids run: 0.685132 (SE +/- 0.006714, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.6
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
march=native + AVX512 run: 2.46474 (SE +/- 0.02589, N = 5); notes: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2.19
march=sapphirerapids run: 2.44545 (SE +/- 0.02457, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2.14
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Intel MPI Benchmarks 2019.3 - Test: IMB-MPI1 Exchange (Average usec, Fewer Is Better)
march=native + AVX512 run: 108.21 (SE +/- 0.88, N = 15); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.28 / MAX: 3672.41
march=sapphirerapids run: 109.01 (SE +/- 0.91, N = 15); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.28 / MAX: 3601.44
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

Timed MrBayes Analysis 3.2.7 - Primate Phylogeny Analysis (Seconds, Fewer Is Better)
march=native + AVX512 run: 74.34 (SE +/- 0.15, N = 3); notes: -march=native -mavx512bf16 -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 73.79 (SE +/- 0.37, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msha -maes -mavx -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -lm -lreadline

ASKAP 1.0 - Test: Hogbom Clean OpenMP (Iterations Per Second, More Is Better)
march=native + AVX512 run: 269.30 (SE +/- 0.64, N = 3)
march=sapphirerapids run: 267.39 (SE +/- 1.10, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
march=native + AVX512 run: 0.878588 (SE +/- 0.007466, N = 3); notes: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.8
march=sapphirerapids run: 0.872683 (SE +/- 0.001641, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.82
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

ASKAP 1.0 - Test: tConvolve MPI - Degridding (Mpix/sec, More Is Better)
march=native + AVX512 run: 4889.74 (SE +/- 30.56, N = 3)
march=sapphirerapids run: 4859.18 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

R Benchmark (Seconds, Fewer Is Better)
march=native + AVX512 run: 0.1050 (SE +/- 0.0008, N = 15)
march=sapphirerapids run: 0.1044 (SE +/- 0.0005, N = 3)
1. R scripting front-end version 4.0.4 (2021-02-15)

FFTW 3.3.6 - Build: Stock - Size: 2D FFT Size 32 (Mflops, More Is Better)
march=native + AVX512 run: 22827 (SE +/- 135.68, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 22948 (SE +/- 297.88, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CC) gcc options: -pthread -O3 -lm

GROMACS 2021.2 - Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, More Is Better)
march=native + AVX512 run: 1.186 (SE +/- 0.005, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 1.180 (SE +/- 0.001, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -pthread

Algebraic Multi-Grid Benchmark 1.2 (Figure Of Merit, More Is Better)
march=native + AVX512 run: 302617633 (SE +/- 414229.41, N = 3)
march=sapphirerapids run: 303975100 (SE +/- 51637.29, N = 3)
1. (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -pthread -lmpi

oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
march=native + AVX512 run: 1328.42 (SE +/- 2.17, N = 3); notes: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1262.56
march=sapphirerapids run: 1334.31 (SE +/- 1.94, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1263.19
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

FFTW 3.3.6 - Build: Float + SSE - Size: 1D FFT Size 4096 (Mflops, More Is Better)
march=native + AVX512 run: 104417 (SE +/- 1211.30, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 103960 (SE +/- 1023.44, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CC) gcc options: -pthread -O3 -lm

Darktable 3.4.1 - Test: Server Room - Acceleration: OpenCL (Seconds, Fewer Is Better)
march=native + AVX512 run: 2.999 (SE +/- 0.007, N = 3)
march=sapphirerapids run: 3.012 (SE +/- 0.004, N = 3)

ASKAP 1.0 - Test: tConvolve OpenMP - Degridding (Million Grid Points Per Second, More Is Better)
march=native + AVX512 run: 3599.35 (SE +/- 47.99, N = 3)
march=sapphirerapids run: 3614.48 (SE +/- 16.43, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
march=native + AVX512 run: 1.92539 (SE +/- 0.00843, N = 3); notes: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1.86
march=sapphirerapids run: 1.93347 (SE +/- 0.00686, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1.85
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

TensorFlow Lite 2020-08-23 - Model: SqueezeNet (Microseconds, Fewer Is Better)
march=native + AVX512 run: 145241 (SE +/- 572.70, N = 3)
march=sapphirerapids run: 145830 (SE +/- 601.51, N = 3)

OpenFOAM 8 - Input: Motorbike 60M (Seconds, Fewer Is Better)
march=native + AVX512 run: 864.00 (SE +/- 0.58, N = 3)
march=sapphirerapids run: 867.20 (SE +/- 2.30, N = 3)
Notes: -lspecie -lfiniteVolume -lfvOptions -lmeshTools -lsampling -ldynamicMesh
1. (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lgenericPatchFields -lOpenFOAM -ldl -lm

GNU Octave Benchmark 6.1.1~hg.2021.01.26 (Seconds, Fewer Is Better)
march=native + AVX512 run: 5.064 (SE +/- 0.026, N = 5)
march=sapphirerapids run: 5.080 (SE +/- 0.018, N = 5)

TensorFlow Lite 2020-08-23 - Model: Mobilenet Float (Microseconds, Fewer Is Better)
march=native + AVX512 run: 97559.5 (SE +/- 138.23, N = 3)
march=sapphirerapids run: 97253.9 (SE +/- 155.46, N = 3)

oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
march=native + AVX512 run: 2532.45 (SE +/- 7.55, N = 3); notes: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2401.58
march=sapphirerapids run: 2540.28 (SE +/- 4.59, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2405.6
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
march=native + AVX512 run: 2.57885 (SE +/- 0.00417, N = 3); notes: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2.34
march=sapphirerapids run: 2.58637 (SE +/- 0.00632, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2.33
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

ASKAP 1.0 - Test: tConvolve MT - Gridding (Million Grid Points Per Second, More Is Better)
march=native + AVX512 run: 1248.93 (SE +/- 0.92, N = 3)
march=sapphirerapids run: 1245.64 (SE +/- 0.56, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
march=native + AVX512 run: 2578.54 (SE +/- 20.40, N = 3); notes: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2395.45
march=sapphirerapids run: 2585.15 (SE +/- 25.72, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2405.47
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Euclidean Cluster (Test Cases Per Minute, More Is Better)
march=native + AVX512 run: 1667.43 (SE +/- 15.22, N = 3)
march=sapphirerapids run: 1671.51 (SE +/- 0.74, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp

Darktable 3.4.1 - Test: Masskrug - Acceleration: OpenCL (Seconds, Fewer Is Better)
march=native + AVX512 run: 3.818 (SE +/- 0.007, N = 3)
march=sapphirerapids run: 3.827 (SE +/- 0.012, N = 3)

ArrayFire 3.7 - Test: BLAS CPU (GFLOPS, More Is Better)
march=native + AVX512 run: 1207.87 (SE +/- 0.54, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 1205.09 (SE +/- 0.76, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -rdynamic

DeepSpeech 0.6 - Acceleration: CPU (Seconds, Fewer Is Better)
march=native + AVX512 run: 48.96 (SE +/- 0.29, N = 3)
march=sapphirerapids run: 48.85 (SE +/- 0.34, N = 3)

Parboil 2.5 - Test: OpenMP Stencil (Seconds, Fewer Is Better)
march=native + AVX512 run: 14.98 (SE +/- 0.05, N = 3)
march=sapphirerapids run: 15.01 (SE +/- 0.07, N = 3)
1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

Parboil 2.5 - Test: OpenMP CUTCP (Seconds, Fewer Is Better)
march=native + AVX512 run: 3.183109 (SE +/- 0.009619, N = 3)
march=sapphirerapids run: 3.177141 (SE +/- 0.006360, N = 3)
1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
march=native + AVX512 run: 1339.83 (SE +/- 5.65, N = 3); notes: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1261.62
march=sapphirerapids run: 1337.47 (SE +/- 2.19, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1270.85
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
march=native + AVX512 run: 2.33206 (SE +/- 0.02354, N = 3); notes: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2.07
march=sapphirerapids run: 2.33613 (SE +/- 0.00302, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2.05
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: FFT SP (GFLOPS, More Is Better)
march=native + AVX512 run: 682.02 (SE +/- 0.08, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 680.88 (SE +/- 0.96, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

TensorFlow Lite 2020-08-23 - Model: Inception ResNet V2 (Microseconds, Fewer Is Better)
march=native + AVX512 run: 1881663 (SE +/- 2515.38, N = 3)
march=sapphirerapids run: 1878730 (SE +/- 4029.02, N = 3)

Timed HMMer Search 3.3.2 - Pfam Database Search (Seconds, Fewer Is Better)
march=native + AVX512 run: 82.61 (SE +/- 0.24, N = 3); notes: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
march=sapphirerapids run: 82.48 (SE +/- 0.11, N = 3); notes: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
1. (CC) gcc options: -O3 -pthread -lhmmer -leasel -lm -lmpi

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: MD5 Hash (GHash/s, More Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 9.3179 (SE +/- 0.0002, N = 3) - -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 9.3041 (SE +/- 0.0009, N = 3) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
TensorFlow Lite 2020-08-23 - Model: Inception V4 (Microseconds, Fewer Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2082823 (SE +/- 4943.42, N = 3)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 2080110 (SE +/- 1250.33, N = 3)
oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 13.41 (SE +/- 0.00, N = 3, MIN: 13.19) - -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 13.40 (SE +/- 0.01, N = 3, MIN: 13.16) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Reduction (GB/s, More Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 254.41 (SE +/- 0.16, N = 3) - -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 254.13 (SE +/- 0.22, N = 3) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1.18705 (SE +/- 0.01157, N = 3, MIN: 1) - -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1.18583 (SE +/- 0.01307, N = 3, MIN: 1.01) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Texture Read Bandwidth (GB/s, More Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 349.01 (SE +/- 1.09, N = 3) - -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 349.34 (SE +/- 1.54, N = 3) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
LULESH 2.0.3 (z/s, More Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 6878.62 (SE +/- 67.35, N = 3)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 6872.83 (SE +/- 82.30, N = 4)
  1. (CXX) g++ options: -O3 -fopenmp -lm -pthread -lmpi_cxx -lmpi
miniFE 2.2 - Problem Size: Small (CG Mflops, More Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 6407.12 (SE +/- 1.22, N = 3)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 6411.43 (SE +/- 0.36, N = 3)
  1. (CXX) g++ options: -O3 -fopenmp -pthread -lmpi_cxx -lmpi
TensorFlow Lite 2020-08-23 - Model: NASNet Mobile (Microseconds, Fewer Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 124941 (SE +/- 607.87, N = 3)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 124859 (SE +/- 671.08, N = 3)
TensorFlow Lite 2020-08-23 - Model: Mobilenet Quant (Microseconds, Fewer Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 98287.0 (SE +/- 93.88, N = 3)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 98345.3 (SE +/- 62.98, N = 3)
Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: NDT Mapping (Test Cases Per Minute, More Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1033.74 (SE +/- 6.53, N = 3)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1033.44 (SE +/- 11.85, N = 3)
  1. (CXX) g++ options: -O3 -std=c++11 -fopenmp
Parboil 2.5 - Test: OpenMP LBM (Seconds, Fewer Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 114.10 (SE +/- 0.02, N = 3)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 114.07 (SE +/- 0.03, N = 3)
  1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
ASKAP 1.0 - Test: tConvolve MT - Degridding (Million Grid Points Per Second, More Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2054.38 (SE +/- 1.75, N = 3)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 2054.71 (SE +/- 1.84, N = 3)
  1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
FFTW 3.3.6 - Build: Stock - Size: 1D FFT Size 32 (Mflops, More Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 22777 (SE +/- 4.36, N = 3) - -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 22774 (SE +/- 0.67, N = 3) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  1. (CC) gcc options: -pthread -O3 -lm
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: S3D (GFLOPS, More Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 125.07 (SE +/- 0.56, N = 3) - -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 125.08 (SE +/- 0.71, N = 3) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
ASKAP 1.0 - Test: tConvolve MPI - Gridding (Mpix/sec, More Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 5046.07 (SE +/- 0.00, N = 3)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 5046.07 (SE +/- 0.00, N = 3)
  1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 4.67293 (SE +/- 0.09231, N = 15, MIN: 4.21) - -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 4.67789 (SE +/- 0.07736, N = 15, MIN: 4.27) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1.07696 (SE +/- 0.02280, N = 12, MIN: 0.99) - -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1.06856 (SE +/- 0.01500, N = 15, MIN: 0.98) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 4.23668 (SE +/- 0.08547, N = 15, MIN: 4.02) - -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 4.13169 (SE +/- 0.01117, N = 3, MIN: 4.05) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 6.60245 (SE +/- 0.13247, N = 12, MIN: 3.53) - -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 6.27787 (SE +/- 0.12013, N = 15, MIN: 3.58) - -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
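This shapes_1d f32 result is the widest spread between the two flag sets in this section, roughly 5%, though with the reported standard errors the gap is only about 1.8 combined SEs and may still fall within run-to-run noise. A quick hedged calculation from the means and SEs above:

  # Compare the two configurations on this test using the reported means and SEs.
  native = (6.60245, 0.13247)    # (mean ms, SE) for march=native + explicit AVX-512 flags
  sapphire = (6.27787, 0.12013)  # (mean ms, SE) for march=sapphirerapids

  diff = native[0] - sapphire[0]
  rel = diff / sapphire[0] * 100
  combined_se = (native[1] ** 2 + sapphire[1] ** 2) ** 0.5  # SE of the difference of two means

  print(f"difference: {diff:.5f} ms ({rel:.1f}% higher for the march=native run)")
  print(f"combined SE of the difference: {combined_se:.5f} ms")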
Parboil 2.5 - Test: OpenMP MRI Gridding (Seconds, Fewer Is Better)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 48.57 (SE +/- 0.99, N = 12)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 49.15 (SE +/- 0.68, N = 15)
  1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
Phoronix Test Suite v10.8.5