12700k HPC+OpenCL AVX512 performance profiling
Intel Core i7-12700K testing with an MSI PRO Z690-A DDR4(MS-7D25) v1.0 (1.15 BIOS) and a Gigabyte AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 6GB on Pop 21.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2112125-TJ-12700KHPC62&grr&rdt
System configuration (shared by both result identifiers unless noted)
  Result identifiers: "12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt" and "12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt"
  Processor: Intel Core i7-12700K @ 6.30GHz (8 Cores / 16 Threads)
  Motherboard: MSI PRO Z690-A DDR4(MS-7D25) v1.0 (1.15 BIOS)
  Chipset: Intel Device 7aa7
  Memory: 32GB
  Disk (12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt): 500GB Western Digital WDS500G2B0C-00PXH0 + 3 x 10001GB Seagate ST10000DM0004-1Z + 128GB HP SSD S700 Pro
  Disk (12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt): 500GB Western Digital WDS500G2B0C-00PXH0 + 3 x 10001GB Seagate ST10000DM0004-1Z + 300GB Western Digital WD3000GLFS-0 + 128GB HP SSD S700 Pro
  Graphics: Gigabyte AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 6GB (1650/750MHz)
  Audio: Realtek ALC897
  Monitor: LG HDR WQHD
  Network: Intel I225-V
  OS: Pop 21.04
  Kernel: 5.15.5-76051505-generic (x86_64)
  Desktop: GNOME Shell 3.38.4
  Display Server: X Server 1.20.11
  OpenGL: 4.6 Mesa 21.2.2 (LLVM 12.0.0)
  OpenCL: OpenCL 2.2 AMD-APP (3361.0)
  Vulkan: 1.2.185
  Compiler: GCC 11.1.0
  File-System: ext4
  Screen Resolution: 3440x1440
Kernel Details
  Transparent Huge Pages: madvise
Environment Details
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt:
    CXXFLAGS="-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16"
    CFLAGS="-O3 -march=sapphirerapids -mno-amx-tile -mno-amx-int8 -mno-amx-bf16"
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt:
    CXXFLAGS="-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect"
    CFLAGS="-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect"
    FFLAGS="-O3 -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect"
Compiler Details
  --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-bootstrap --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-targets=nvptx-none=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-RPS7jb/gcc-11-11.1.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Disk Details
  NONE / errors=remount-ro,noatime,rw / Block Size: 4096
Processor Details
  Scaling Governor: intel_pstate powersave
  CPU Microcode: 0x15
  Thermald 2.4.3
Graphics Details
  GLAMOR
  BAR1 / Visible vRAM Size: 6128 MB
Python Details
  Python 2.7.18 + Python 3.9.5
Security Details
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl and seccomp
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced IBRS IBPB: conditional RSB filling
  srbds: Not affected
  tsx_async_abort: Not affected
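Both builds ask GCC to emit AVX-512 code, so the comparison is only meaningful if the kernel actually reports those instruction-set extensions on this CPU. The Python sketch below is one way to check that before benchmarking. It assumes a Linux /proc/cpuinfo whose "flags" line uses the kernel's feature names (for example avx512f, avx512_vnni); the mapping from -mavx512* switches to those names and the helper names avx512_flags and missing_features are illustrative assumptions, not part of the Phoronix Test Suite.

```python
from pathlib import Path

def avx512_flags(cpuinfo_text: str) -> set[str]:
    """Return the AVX-512 feature names listed on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return {f for f in value.split() if f.startswith("avx512")}
    return set()

# -mavx512* switches from the native + AVX512 environment, mapped to the
# feature names the Linux kernel is assumed to print in /proc/cpuinfo.
REQUESTED = {
    "-mavx512f": "avx512f", "-mavx512dq": "avx512dq", "-mavx512ifma": "avx512ifma",
    "-mavx512cd": "avx512cd", "-mavx512bw": "avx512bw", "-mavx512vl": "avx512vl",
    "-mavx512bf16": "avx512_bf16", "-mavx512vbmi": "avx512vbmi",
    "-mavx512vbmi2": "avx512_vbmi2", "-mavx512vnni": "avx512_vnni",
    "-mavx512bitalg": "avx512_bitalg", "-mavx512vpopcntdq": "avx512_vpopcntdq",
    "-mavx512vp2intersect": "avx512_vp2intersect",
}

def missing_features(cpuinfo_text: str) -> list[str]:
    """List requested -mavx512* switches whose feature the CPU does not report."""
    present = avx512_flags(cpuinfo_text)
    return [opt for opt, name in REQUESTED.items() if name not in present]

if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # only meaningful on Linux
        print("missing:", missing_features(cpuinfo.read_text()) or "none")
```

Any switch listed as missing would let GCC emit instructions the processor cannot execute, which typically ends in an illegal-instruction crash rather than a slower result.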
Result overview (raw values in each test's own units; see the individual result charts for units and whether fewer or more is better)
Test | 12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt | 12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt
relion: Basic - CPU | 1684.702 | 1656.713
caffe: GoogleNet - CPU - 1000 | 726541 | 680177
openfoam: Motorbike 60M | 867.20 | 864.00
hpl | 97.554 | 100.593
shoc: OpenCL - Max SP Flops | 8376637 | 9031599
lczero: BLAS | 906 | 959
caffe: AlexNet - CPU - 1000 | 247624 | 238435
caffe: GoogleNet - CPU - 100 | 79674 | 67852
fftw: Float + SSE - 2D FFT Size 4096 | 43900 | 42935
parboil: OpenMP MRI Gridding | 49.150711 | 48.567082
gromacs: MPI CPU - water_GMX50_bare | 1.180 | 1.186
caffe: GoogleNet - CPU - 200 | 157272 | 136187
openfoam: Motorbike 30M | 137.71 | 135.71
onednn: Recurrent Neural Network Inference - f32 - CPU | 1388.34 | 1341.61
cp2k: Fayalite-FIST | 398.345 | 372.744
numpy | 618.63 | 612.18
parboil: OpenMP LBM | 114.068817 | 114.096662
fftw: Stock - 2D FFT Size 4096 | 13770 | 14020
tensorflow-lite: Inception V4 | 2080110 | 2082823
intel-mpi: IMB-MPI1 Exchange (Average usec) | 109.01 | 108.21
intel-mpi: IMB-MPI1 Exchange (Average Mbytes/sec) | 15189.76 | 15023.98
tensorflow-lite: Inception ResNet V2 | 1878730 | 1881663
onednn: Deconvolution Batch shapes_1d - f32 - CPU | 6.27787 | 6.60245
hmmer: Pfam Database Search | 82.482 | 82.610
onednn: Recurrent Neural Network Training - u8s8f32 - CPU | 2596.32 | 2560.48
onednn: Recurrent Neural Network Training - bf16bf16bf16 - CPU | 2585.15 | 2578.54
onednn: Recurrent Neural Network Training - f32 - CPU | 2540.28 | 2532.45
onednn: Recurrent Neural Network Inference - u8s8f32 - CPU | 1334.31 | 1328.42
onednn: Recurrent Neural Network Inference - bf16bf16bf16 - CPU | 1337.47 | 1339.83
mrbayes: Primate Phylogeny Analysis | 73.794 | 74.339
pennant: sedovbig | 69.89434 | 67.77358
askap: tConvolve MT - Degridding | 2054.71 | 2054.38
askap: tConvolve MT - Gridding | 1245.64 | 1248.93
namd: ATPase Simulation - 327,506 Atoms | 1.16249 | 1.17871
tensorflow-lite: Mobilenet Quant | 98345.3 | 98287.0
tensorflow-lite: SqueezeNet | 145830 | 145241
tensorflow-lite: NASNet Mobile | 124859 | 124941
tensorflow-lite: Mobilenet Float | 97253.9 | 97559.5
daphne: OpenMP - Points2Image | 35022.145431097 | 36114.808332258
himeno: Poisson Pressure Solver | 9471.738369 | 9554.056801
shoc: OpenCL - S3D | 125.078 | 125.070
qmcpack: simple-H2O | 17.950 | 18.126
pennant: leblancbig | 49.93021 | 47.38740
caffe: AlexNet - CPU - 200 | 46624 | 47045
mt-dgemm: Sustained Floating-Point Rate | 4.875598 | 5.016124
minife: Small | 6411.43 | 6407.12
rbenchmark | 0.1044 | 0.1050
deepspeech: CPU | 48.85145 | 48.96052
askap: tConvolve MPI - Gridding | 5046.07 | 5046.07
askap: tConvolve MPI - Degridding | 4859.18 | 4889.74
onednn: IP Shapes 3D - f32 - CPU | 8.78268 | 8.89450
caffe: AlexNet - CPU - 100 | 23122 | 23659
onednn: Deconvolution Batch shapes_1d - bf16bf16bf16 - CPU | 7.43204 | 6.72455
onednn: Deconvolution Batch shapes_1d - u8s8f32 - CPU | 0.872683 | 0.878588
intel-mpi: IMB-MPI1 PingPong | 10004.46 | 10294.25
onednn: IP Shapes 1D - bf16bf16bf16 - CPU | 2.44545 | 2.46474
onednn: IP Shapes 3D - bf16bf16bf16 - CPU | 3.56837 | 3.52788
onednn: Matrix Multiply Batch Shapes Transformer - u8s8f32 - CPU | 0.565699 | 0.550588
daphne: OpenMP - NDT Mapping | 1033.44 | 1033.74
intel-mpi: IMB-MPI1 Sendrecv | 53.50 | 52.66
intel-mpi: IMB-MPI1 Sendrecv | 12273.54 | 12471.89
rnnoise | 16.509 | 17.119
intel-mpi: IMB-P2P PingPong | 8628982 | 8513496
parboil: OpenMP Stencil | 15.011457 | 14.980375
arrayfire: BLAS CPU | 1205.09 | 1207.87
onednn: Deconvolution Batch shapes_3d - bf16bf16bf16 - CPU | 4.67789 | 4.67293
onednn: IP Shapes 1D - f32 - CPU | 2.58637 | 2.57885
onednn: IP Shapes 1D - u8s8f32 - CPU | 0.685132 | 0.690873
amg | 303975100 | 302617633
onednn: Deconvolution Batch shapes_3d - u8s8f32 - CPU | 1.06856 | 1.07696
daphne: OpenMP - Euclidean Cluster | 1671.51 | 1667.43
askap: Hogbom Clean OpenMP | 267.389 | 269.303
onednn: Matrix Multiply Batch Shapes Transformer - f32 - CPU | 2.33613 | 2.33206
onednn: Matrix Multiply Batch Shapes Transformer - bf16bf16bf16 - CPU | 1.18583 | 1.18705
lulesh | 6872.8297 | 6878.6229
onednn: IP Shapes 3D - u8s8f32 - CPU | 1.93347 | 1.92539
onednn: Deconvolution Batch shapes_3d - f32 - CPU | 4.13169 | 4.23668
shoc: OpenCL - Texture Read Bandwidth | 349.340 | 349.012
askap: tConvolve OpenMP - Degridding | 3614.48 | 3599.35
askap: tConvolve OpenMP - Gridding | 1866.30 | 1906.52
octave-benchmark | 5.080 | 5.064
mafft: Multiple Sequence Alignment - LSU RNA | 7.703 | 7.523
fftw: Float + SSE - 1D FFT Size 4096 | 103960 | 104417
cl-mem: Copy | 198.4 | 195.2
cl-mem: Read | 263.6 | 261.2
cl-mem: Write | 255.3 | 248.4
onednn: Convolution Batch Shapes Auto - f32 - CPU | 13.3957 | 13.4115
onednn: Convolution Batch Shapes Auto - u8s8f32 - CPU | 13.2836 | 13.4080
onednn: Convolution Batch Shapes Auto - bf16bf16bf16 - CPU | 6.77334 | 6.34910
darktable: Boat - OpenCL | 3.971 | 4.151
darktable: Masskrug - OpenCL | 3.827 | 3.818
shoc: OpenCL - GEMM SGEMM_N | 1841.75 | 1859.40
darktable: Server Room - OpenCL | 3.012 | 2.999
fftw: Stock - 1D FFT Size 4096 | 18293 | 18502
shoc: OpenCL - Bus Speed Readback | 20.3871 | 21.0509
fftw: Stock - 2D FFT Size 32 | 22948 | 22827
fftw: Stock - 1D FFT Size 32 | 22774 | 22777
parboil: OpenMP CUTCP | 3.177141 | 3.183109
fftw: Float + SSE - 1D FFT Size 32 | 32180 | 31560
fftw: Float + SSE - 2D FFT Size 32 | 80496 | 82515
shoc: OpenCL - Triad | 12.5997 | 12.3131
shoc: OpenCL - MD5 Hash | 9.3041 | 9.3179
shoc: OpenCL - Reduction | 254.126 | 254.407
shoc: OpenCL - FFT SP | 680.883 | 682.017
darktable: Server Rack - OpenCL | 0.133 | 0.131
shoc: OpenCL - Bus Speed Download | 20.1487 | 19.9782
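The overview mixes "fewer is better" results (seconds, milliseconds) with "more is better" ones (GFLOPS, Mflops), so raw values cannot be averaged directly. A minimal Python sketch of a direction-aware comparison follows; the five rows are copied from the overview, the direction of each comes from its result chart, and the helper names speedup and geomean are hypothetical. It shows one common way to summarize such data, not necessarily how OpenBenchmarking.org computes its own summaries.

```python
import math

# (test, march=sapphirerapids value, march=native + AVX512 value, higher_is_better)
RESULTS = [
    ("relion: Basic - CPU", 1684.702, 1656.713, False),          # seconds
    ("hpl", 97.554, 100.593, True),                              # GFLOPS
    ("caffe: GoogleNet - CPU - 100", 79674, 67852, False),       # milliseconds
    ("pennant: leblancbig", 49.93021, 47.38740, False),          # seconds
    ("askap: tConvolve MT - Gridding", 1245.64, 1248.93, True),  # million grid points/s
]

def speedup(a: float, b: float, higher_is_better: bool) -> float:
    """Return > 1.0 when the native + AVX512 build (b) beats the sapphirerapids build (a)."""
    return b / a if higher_is_better else a / b

def geomean(values) -> float:
    """Geometric mean, the usual way to average ratios."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

if __name__ == "__main__":
    for name, a, b, hib in RESULTS:
        print(f"{name:32s} {100 * (speedup(a, b, hib) - 1):+6.2f} %")
    overall = geomean(speedup(a, b, h) for _, a, b, h in RESULTS)
    print(f"geometric mean speedup: {overall:.4f}")
```

Converting every result to a "higher is better" ratio first keeps a 2x slowdown and a 2x speedup symmetric around 1.0 in the geometric mean, which an arithmetic mean of percentages would not.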
RELION 3.1.1 - Test: Basic - Device: CPU (Seconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1684.70 (SE +/- 3.94, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1656.71 (SE +/- 0.43, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CXX) g++ options: -O3 -fopenmp -std=c++0x -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -pthread -lmpi_cxx -lmpi
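Each result reports a standard error ("SE +/- ...") over N runs. Assuming this is the usual standard error of the mean (sample standard deviation divided by the square root of N), it can be reproduced from individual run times as in the Python sketch below. The run times are invented for illustration and are not from this result; the helper names standard_error and separation are hypothetical.

```python
import math
import statistics

def standard_error(samples: list[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    if len(samples) < 2:
        raise ValueError("need at least two runs")
    return statistics.stdev(samples) / math.sqrt(len(samples))

def separation(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Gap between two means expressed in units of their combined standard error."""
    return abs(mean_a - mean_b) / math.hypot(se_a, se_b)

if __name__ == "__main__":
    runs = [1680.0, 1684.0, 1690.0]  # hypothetical RELION run times in seconds
    print(f"{statistics.mean(runs):.2f} s, SE +/- {standard_error(runs):.2f}, N = {len(runs)}")
    # Reported RELION results: 1684.70 +/- 3.94 vs 1656.71 +/- 0.43
    print(f"separation: {separation(1684.70, 3.94, 1656.71, 0.43):.1f} combined SE")
```

For the RELION result the 28 s gap is about seven combined standard errors, while a gap of a small fraction of one combined SE (as in the ASKAP tConvolve MT - Degridding result) is within run-to-run noise.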
Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 1000 (Milliseconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 726541 (SE +/- 11125.73, N = 9); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 680177 (SE +/- 2693.85, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
OpenFOAM 8 - Input: Motorbike 60M (Seconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 867.20 (SE +/- 2.30, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 864.00 (SE +/- 0.58, N = 3)
  Also reported: -ldynamicMesh -lspecie -lfiniteVolume -lfvOptions -lmeshTools -lsampling
  (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lgenericPatchFields -lOpenFOAM -ldl -lm
HPL Linpack 2.3 (GFLOPS, More Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 97.55 (SE +/- 0.14, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 100.59 (SE +/- 1.07, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CC) gcc options: -O3 -lopenblas -lm -pthread -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Max SP Flops (GFLOPS, More Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 8376637 (SE +/- 65495.62, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 9031599 (SE +/- 142226.18, N = 9); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
LeelaChessZero 0.28 - Backend: BLAS (Nodes Per Second, More Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 906 (SE +/- 12.84, N = 9); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 959 (SE +/- 11.95, N = 4); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CXX) g++ options: -flto -O3 -pthread
Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 1000 (Milliseconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 247624 (SE +/- 3023.73, N = 9); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 238435 (SE +/- 1961.39, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 100 (Milliseconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 79674 (SE +/- 640.77, N = 15); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 67852 (SE +/- 292.08, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
FFTW 3.3.6 - Build: Float + SSE - Size: 2D FFT Size 4096 (Mflops, More Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 43900 (SE +/- 484.91, N = 5); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 42935 (SE +/- 90.35, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CC) gcc options: -pthread -O3 -lm
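The benchFFT methodology from the FFTW authors reports "Mflops" as a normalized figure, 5 N log2(N) divided by the time for one transform in microseconds, where N is the total number of points (N = 4096 x 4096 for the 2D size 4096 case); it is not a count of executed instructions. Assuming the FFTW results here follow that convention for complex transforms, the Python sketch below converts between Mflops and time per transform. The function names are hypothetical, and the halving for real-data transforms is likewise part of that assumed convention.

```python
import math

def fft_flop_estimate(dims: tuple[int, ...], real: bool = False) -> float:
    """benchFFT-style nominal flop count: 5 N log2 N, halved for real-data transforms."""
    n = math.prod(dims)
    flops = 5 * n * math.log2(n)
    return flops / 2 if real else flops

def mflops(dims: tuple[int, ...], seconds_per_transform: float, real: bool = False) -> float:
    """Nominal Mflops for one transform that took the given time."""
    return fft_flop_estimate(dims, real) / (seconds_per_transform * 1e6)

def seconds_per_transform(dims: tuple[int, ...], reported_mflops: float, real: bool = False) -> float:
    """Invert a reported Mflops figure back into time per transform."""
    return fft_flop_estimate(dims, real) / (reported_mflops * 1e6)

if __name__ == "__main__":
    # Float + SSE, 2D FFT Size 4096, march=sapphirerapids result: 43900 Mflops
    t = seconds_per_transform((4096, 4096), 43900)
    print(f"implied time per 4096x4096 transform: {t * 1e3:.1f} ms")
```

Under that assumption, 43900 Mflops corresponds to roughly 46 ms per 4096x4096 transform; because the flop count is nominal, the figure is only comparable between runs of the same transform size.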
Parboil 2.5 - Test: OpenMP MRI Gridding (Seconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 49.15 (SE +/- 0.68, N = 15)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 48.57 (SE +/- 0.99, N = 12)
  (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
GROMACS 2021.2 - Implementation: MPI CPU - Input: water_GMX50_bare (Ns Per Day, More Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1.180 (SE +/- 0.001, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1.186 (SE +/- 0.005, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CXX) g++ options: -O3 -pthread
Caffe 2020-02-13 - Model: GoogleNet - Acceleration: CPU - Iterations: 200 (Milliseconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 157272 (SE +/- 1277.18, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 136187 (SE +/- 778.54, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas
OpenFOAM 8 - Input: Motorbike 30M (Seconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 137.71 (SE +/- 0.70, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 135.71 (SE +/- 0.13, N = 3)
  Also reported: -ldynamicMesh -lspecie -lfiniteVolume -lfvOptions -lmeshTools -lsampling
  (CXX) g++ options: -std=c++11 -m64 -O3 -ftemplate-depth-100 -fPIC -fuse-ld=bfd -Xlinker --add-needed --no-as-needed -lgenericPatchFields -lOpenFOAM -ldl -lm
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1388.34 (SE +/- 11.91, N = 8; MIN: 1265.31); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1341.61 (SE +/- 3.44, N = 3; MIN: 1262.01); flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
CP2K Molecular Dynamics 8.2 - Input: Fayalite-FIST (Seconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 398.35
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 372.74
Numpy Benchmark (Score, More Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 618.63 (SE +/- 3.76, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 612.18 (SE +/- 4.12, N = 3)
Parboil 2.5 - Test: OpenMP LBM (Seconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 114.07 (SE +/- 0.03, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 114.10 (SE +/- 0.02, N = 3)
  (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp
FFTW 3.3.6 - Build: Stock - Size: 2D FFT Size 4096 (Mflops, More Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 13770 (SE +/- 62.76, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 14020 (SE +/- 33.65, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CC) gcc options: -pthread -O3 -lm
TensorFlow Lite 2020-08-23 - Model: Inception V4 (Microseconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 2080110 (SE +/- 1250.33, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2082823 (SE +/- 4943.42, N = 3)
Intel MPI Benchmarks 2019.3 - Test: IMB-MPI1 Exchange (Average usec, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 109.01 (SE +/- 0.91, N = 15; MIN: 0.28 / MAX: 3601.44); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 108.21 (SE +/- 0.88, N = 15; MIN: 0.28 / MAX: 3672.41); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
Intel MPI Benchmarks 2019.3 - Test: IMB-MPI1 Exchange (Average Mbytes/sec, More Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 15189.76 (SE +/- 185.82, N = 15; MAX: 65915.24); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 15023.98 (SE +/- 218.78, N = 15; MAX: 64515.64); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi
TensorFlow Lite 2020-08-23 - Model: Inception ResNet V2 (Microseconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1878730 (SE +/- 4029.02, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1881663 (SE +/- 2515.38, N = 3)
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 6.27787 (SE +/- 0.12013, N = 15; MIN: 3.58); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 6.60245 (SE +/- 0.13247, N = 12; MIN: 3.53); flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Timed HMMer Search 3.3.2 - Pfam Database Search (Seconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 82.48 (SE +/- 0.11, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 82.61 (SE +/- 0.24, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CC) gcc options: -O3 -pthread -lhmmer -leasel -lm -lmpi
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 2596.32 (SE +/- 3.87, N = 3; MIN: 2454.76); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2560.48 (SE +/- 26.69, N = 3; MIN: 2405.01); flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 2585.15 (SE +/- 25.72, N = 3; MIN: 2405.47); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2578.54 (SE +/- 20.40, N = 3; MIN: 2395.45); flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Training - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 2540.28 (SE +/- 4.59, N = 3; MIN: 2405.6); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2532.45 (SE +/- 7.55, N = 3; MIN: 2401.58); flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1334.31 (SE +/- 1.94, N = 3; MIN: 1263.19); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1328.42 (SE +/- 2.17, N = 3; MIN: 1262.56); flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
oneDNN 2.1.2 - Harness: Recurrent Neural Network Inference - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1337.47 (SE +/- 2.19, N = 3; MIN: 1270.85); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1339.83 (SE +/- 5.65, N = 3; MIN: 1261.62); flags: -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl
Timed MrBayes Analysis 3.2.7 - Primate Phylogeny Analysis (Seconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 73.79 (SE +/- 0.37, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 74.34 (SE +/- 0.15, N = 3); flags: -march=native -mavx512bf16 -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CC) gcc options: -mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msha -maes -mavx -mfma -mavx2 -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -mavx512ifma -mavx512vbmi -mrdrnd -mbmi -mbmi2 -madx -mabm -O3 -std=c99 -pedantic -lm -lreadline
Pennant 1.0.1 - Test: sedovbig (Hydro Cycle Time - Seconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 69.89 (SE +/- 0.35, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 67.77 (SE +/- 0.13, N = 3)
  (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi
ASKAP 1.0 - Test: tConvolve MT - Degridding (Million Grid Points Per Second, More Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 2054.71 (SE +/- 1.84, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2054.38 (SE +/- 1.75, N = 3)
  (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
ASKAP 1.0 - Test: tConvolve MT - Gridding (Million Grid Points Per Second, More Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1245.64 (SE +/- 0.56, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1248.93 (SE +/- 0.92, N = 3)
  (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp
NAMD 2.14 - ATPase Simulation - 327,506 Atoms (days/ns, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1.16249 (SE +/- 0.00085, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1.17871 (SE +/- 0.00713, N = 3)
TensorFlow Lite 2020-08-23 - Model: Mobilenet Quant (Microseconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 98345.3 (SE +/- 62.98, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 98287.0 (SE +/- 93.88, N = 3)
TensorFlow Lite 2020-08-23 - Model: SqueezeNet (Microseconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 145830 (SE +/- 601.51, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 145241 (SE +/- 572.70, N = 3)
TensorFlow Lite 2020-08-23 - Model: NASNet Mobile (Microseconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 124859 (SE +/- 671.08, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 124941 (SE +/- 607.87, N = 3)
TensorFlow Lite 2020-08-23 - Model: Mobilenet Float (Microseconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 97253.9 (SE +/- 155.46, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 97559.5 (SE +/- 138.23, N = 3)
Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Points2Image (Test Cases Per Minute, More Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 35022.15 (SE +/- 389.77, N = 3)
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 36114.81 (SE +/- 233.74, N = 3)
  (CXX) g++ options: -O3 -std=c++11 -fopenmp
Himeno Benchmark 3.0 - Poisson Pressure Solver (MFLOPS, More Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 9471.74 (SE +/- 123.70, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 9554.06 (SE +/- 6.17, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CC) gcc options: -O3 -mavx2
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: S3D (GFLOPS, More Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 125.08 (SE +/- 0.71, N = 3); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 125.07 (SE +/- 0.56, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
QMCPACK 3.11 - Input: simple-H2O (Total Execution Time - Seconds, Fewer Is Better)
  12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 17.95 (SE +/- 0.21, N = 14); flags: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16
  12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 18.13 (SE +/- 0.10, N = 3); flags: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
  (CXX) g++ options: -O3 -fopenmp -finline-limit=1000 -fstrict-aliasing -funroll-all-loops -fomit-frame-pointer -ffast-math -pthread -lm -ldl
Pennant 1.0.1 - Test: leblancbig (Hydro Cycle Time - Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 49.93 (SE +/- 0.05, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 47.39 (SE +/- 0.03, N = 3)
1. (CXX) g++ options: -fopenmp -pthread -lmpi_cxx -lmpi

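One rough way to judge whether a gap like Pennant's exceeds run-to-run noise is to express it in units of the combined standard error. This is a sketch that assumes independent runs and approximately normal errors. With only N = 3 per configuration, a proper t-test would be more defensible, so treat the score as a screening heuristic. The function name is hypothetical.

```python
import math


def compare(mean_a: float, se_a: float, mean_b: float, se_b: float,
            lower_is_better: bool) -> tuple[float, float]:
    """Return (speedup of B over A, |A - B| in units of combined standard error)."""
    speedup = mean_a / mean_b if lower_is_better else mean_b / mean_a
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    if combined_se == 0:
        # Zero reported error: identical means give no gap, otherwise unbounded.
        score = 0.0 if mean_a == mean_b else math.inf
    else:
        score = abs(mean_a - mean_b) / combined_se
    return speedup, score


# Pennant leblancbig, seconds (fewer is better):
# A = march=sapphirerapids, B = march=native + AVX512
speedup, score = compare(49.93, 0.05, 47.39, 0.03, lower_is_better=True)
print(f"speedup of B over A: {speedup:.3f}, gap = {score:.1f} combined SE")
```

For leblancbig this gives a speedup of about 1.054, or roughly 5% less time for the native + AVX512 build. The gap is over 40 combined standard errors, far outside the reported run-to-run noise.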
Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 200 (Milli-Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 46624 (SE +/- 442.71, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 47045 (SE +/- 218.90, N = 3) [-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect]
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

ACES DGEMM 1.0 - Sustained Floating-Point Rate (GFLOP/s, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 4.875598 (SE +/- 0.034356, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 5.016124 (SE +/- 0.013832, N = 3) [-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect]
1. (CC) gcc options: -O3 -march=native -fopenmp

miniFE 2.2 - Problem Size: Small (CG Mflops, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 6411.43 (SE +/- 0.36, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 6407.12 (SE +/- 1.22, N = 3)
1. (CXX) g++ options: -O3 -fopenmp -pthread -lmpi_cxx -lmpi

R Benchmark (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 0.1044 (SE +/- 0.0005, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 0.1050 (SE +/- 0.0008, N = 15)
1. R scripting front-end version 4.0.4 (2021-02-15)

DeepSpeech 0.6 - Acceleration: CPU (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 48.85 (SE +/- 0.34, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 48.96 (SE +/- 0.29, N = 3)

ASKAP 1.0 - Test: tConvolve MPI - Gridding (Mpix/sec, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 5046.07 (SE +/- 0.00, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 5046.07 (SE +/- 0.00, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP 1.0 - Test: tConvolve MPI - Degridding (Mpix/sec, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 4859.18 (SE +/- 0.00, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 4889.74 (SE +/- 30.56, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 8.78268 (SE +/- 0.01096, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 8.63]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 8.89450 (SE +/- 0.13446, N = 14) [-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 8.61]
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Caffe 2020-02-13 - Model: AlexNet - Acceleration: CPU - Iterations: 100 (Milli-Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 23122 (SE +/- 319.02, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 23659 (SE +/- 282.19, N = 3) [-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect]
1. (CXX) g++ options: -O3 -fPIC -rdynamic -lglog -lgflags -lprotobuf -lpthread -lsz -lz -ldl -lm -llmdb -lopenblas

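The two AlexNet results differ only in iteration count, and the 200-iteration times are close to twice the 100-iteration times. The figures therefore behave like total run time. Assuming that interpretation, dividing by the iteration count gives a per-iteration cost that can be compared across all four runs. The names below are hypothetical.

```python
# Caffe AlexNet CPU results in milliseconds, keyed by iteration count.
RESULTS_MS = {
    "march=sapphirerapids": {200: 46624, 100: 23122},
    "march=native + AVX512": {200: 47045, 100: 23659},
}


def per_iteration_ms(total_ms: float, iterations: int) -> float:
    """Average time per iteration, assuming total_ms covers the whole run."""
    return total_ms / iterations


for config, runs in RESULTS_MS.items():
    costs = ", ".join(
        f"{n} it: {per_iteration_ms(ms, n):.2f} ms/it" for n, ms in sorted(runs.items())
    )
    print(f"{config}: {costs}")
```

Per-iteration cost comes out between about 231 and 237 ms in all four runs. This is consistent with roughly linear scaling in the iteration count.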
oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 7.43204 (SE +/- 0.04340, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 6.53]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 6.72455 (SE +/- 0.01945, N = 3) [-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 6.2]
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_1d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 0.872683 (SE +/- 0.001641, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.82]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 0.878588 (SE +/- 0.007466, N = 3) [-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.8]
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Intel MPI Benchmarks 2019.3 - Test: IMB-MPI1 PingPong (Average Mbytes/sec, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 10004.46 (SE +/- 140.45, N = 15) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 6.66 / MAX: 34960.72]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 10294.25 (SE +/- 131.85, N = 3) [-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 10.93 / MAX: 34708.09]
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 2.44545 (SE +/- 0.02457, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2.14]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2.46474 (SE +/- 0.02589, N = 5) [-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2.19]
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 3.56837 (SE +/- 0.02891, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 3.16]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 3.52788 (SE +/- 0.02720, N = 10) [-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 3.15]
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 0.565699 (SE +/- 0.005655, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.47]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 0.550588 (SE +/- 0.005334, N = 6) [-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.45]
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: NDT Mapping (Test Cases Per Minute, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1033.44 (SE +/- 11.85, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1033.74 (SE +/- 6.53, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp

Intel MPI Benchmarks 2019.3 - Test: IMB-MPI1 Sendrecv (Average usec, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 53.50 (SE +/- 0.35, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.19 / MAX: 1786.28]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 52.66 (SE +/- 0.28, N = 3) [-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.19 / MAX: 1702.76]
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

Intel MPI Benchmarks 2019.3 - Test: IMB-MPI1 Sendrecv (Average Mbytes/sec, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 12273.54 (SE +/- 111.25, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MAX: 66577.1]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 12471.89 (SE +/- 81.05, N = 3) [-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MAX: 66000.84]
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

RNNoise 2020-06-28 (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 16.51 (SE +/- 0.21, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 17.12 (SE +/- 0.01, N = 3) [-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect]
1. (CC) gcc options: -O3 -pedantic -fvisibility=hidden

Intel MPI Benchmarks 2019.3 - Test: IMB-P2P PingPong (Average Msg/sec, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 8628982 (SE +/- 23849.34, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1994 / MAX: 22289308]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 8513496 (SE +/- 102028.44, N = 3) [-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1946 / MAX: 22082360]
1. (CXX) g++ options: -O3 -O0 -pedantic -fopenmp -pthread -lmpi_cxx -lmpi

Parboil 2.5 - Test: OpenMP Stencil (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 15.01 (SE +/- 0.07, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 14.98 (SE +/- 0.05, N = 3)
1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

ArrayFire 3.7 - Test: BLAS CPU (GFLOPS, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1205.09 (SE +/- 0.76, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1207.87 (SE +/- 0.54, N = 3) [-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect]
1. (CXX) g++ options: -O3 -rdynamic

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 4.67789 (SE +/- 0.07736, N = 15) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 4.27]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 4.67293 (SE +/- 0.09231, N = 15) [-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 4.21]
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 2.58637 (SE +/- 0.00632, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2.33]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2.57885 (SE +/- 0.00417, N = 3) [-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2.34]
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: IP Shapes 1D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 0.685132 (SE +/- 0.006714, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.6]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 0.690873 (SE +/- 0.006051, N = 3) [-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.6]
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Algebraic Multi-Grid Benchmark 1.2 (Figure Of Merit, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 303975100 (SE +/- 51637.29, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 302617633 (SE +/- 414229.41, N = 3)
1. (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -pthread -lmpi

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1.06856 (SE +/- 0.01500, N = 15) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 0.98]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1.07696 (SE +/- 0.02280, N = 12) [-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 0.99]
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

Darmstadt Automotive Parallel Heterogeneous Suite - Backend: OpenMP - Kernel: Euclidean Cluster (Test Cases Per Minute, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1671.51 (SE +/- 0.74, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1667.43 (SE +/- 15.22, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp

ASKAP 1.0 - Test: Hogbom Clean OpenMP (Iterations Per Second, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 267.39 (SE +/- 1.10, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 269.30 (SE +/- 0.64, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 2.33613 (SE +/- 0.00302, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 2.05]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2.33206 (SE +/- 0.02354, N = 3) [-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 2.07]
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Matrix Multiply Batch Shapes Transformer - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1.18583 (SE +/- 0.01307, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1.01]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1.18705 (SE +/- 0.01157, N = 3) [-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1]
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

LULESH 2.0.3 (z/s, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 6872.83 (SE +/- 82.30, N = 4)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 6878.62 (SE +/- 67.35, N = 3)
1. (CXX) g++ options: -O3 -fopenmp -lm -pthread -lmpi_cxx -lmpi

oneDNN 2.1.2 - Harness: IP Shapes 3D - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1.93347 (SE +/- 0.00686, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 1.85]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1.92539 (SE +/- 0.00843, N = 3) [-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 1.86]
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Deconvolution Batch shapes_3d - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 4.13169 (SE +/- 0.01117, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 4.05]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 4.23668 (SE +/- 0.08547, N = 15) [-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 4.02]
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Texture Read Bandwidth (GB/s, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 349.34 (SE +/- 1.54, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 349.01 (SE +/- 1.09, N = 3) [-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect]
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

ASKAP 1.0 - Test: tConvolve OpenMP - Degridding (Million Grid Points Per Second, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 3614.48 (SE +/- 16.43, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 3599.35 (SE +/- 47.99, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP 1.0 - Test: tConvolve OpenMP - Gridding (Million Grid Points Per Second, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1866.30 (SE +/- 4.37, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1906.52 (SE +/- 12.08, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

GNU Octave Benchmark 6.1.1~hg.2021.01.26 (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 5.080 (SE +/- 0.018, N = 5)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 5.064 (SE +/- 0.026, N = 5)

Timed MAFFT Alignment 7.471 - Multiple Sequence Alignment - LSU RNA (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 7.703 (SE +/- 0.014, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 7.523 (SE +/- 0.012, N = 3)
1. (CC) gcc options: -std=c99 -O3 -lm -lpthread

FFTW 3.3.6 - Build: Float + SSE - Size: 1D FFT Size 4096 (Mflops, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 103960 (SE +/- 1023.44, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 104417 (SE +/- 1211.30, N = 3) [-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect]
1. (CC) gcc options: -pthread -O3 -lm

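FFTW's own benchmarks report speed as "mflops" using a nominal operation count rather than counted floating-point operations. For complex transforms this is 5 N log2(N) / t, and for real-data transforms it is 2.5 N log2(N) / t, with t in microseconds. Assume this test reports that figure for a complex transform. That is an assumption: if it is a real-data transform, the time below halves. Under that assumption, the time per transform can be recovered from the Mflops value. The function name is hypothetical.

```python
import math


def fftw_time_us(mflops: float, n: int, complex_transform: bool = True) -> float:
    """Invert FFTW's benchmark convention mflops = k * N * log2(N) / t_us,
    with k = 5 for complex transforms and k = 2.5 for real-data transforms."""
    k = 5.0 if complex_transform else 2.5
    return k * n * math.log2(n) / mflops


# FFTW 3.3.6, Float + SSE, 1D size 4096 (values from the result above)
for label, mflops in [("sapphirerapids", 103960), ("native + AVX512", 104417)]:
    print(f"{label}: {fftw_time_us(mflops, 4096):.3f} us per 4096-point transform")
```

With these assumptions, both builds come out at about 2.35 to 2.36 microseconds per 4096-point transform.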
cl-mem 2017-01-13 - Benchmark: Copy (GB/s, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 198.4 (SE +/- 0.12, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 195.2 (SE +/- 0.12, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem 2017-01-13 - Benchmark: Read (GB/s, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 263.6 (SE +/- 0.06, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 261.2 (SE +/- 0.07, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

cl-mem 2017-01-13 - Benchmark: Write (GB/s, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 255.3 (SE +/- 0.17, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 248.4 (SE +/- 0.15, N = 3)
1. (CC) gcc options: -O2 -flto -lOpenCL

oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 13.40 (SE +/- 0.01, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 13.16]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 13.41 (SE +/- 0.00, N = 3) [-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 13.19]
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: u8s8f32 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 13.28 (SE +/- 0.01, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 12.99]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 13.41 (SE +/- 0.02, N = 3) [-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 13.07]
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

oneDNN 2.1.2 - Harness: Convolution Batch Shapes Auto - Data Type: bf16bf16bf16 - Engine: CPU (ms, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 6.77334 (SE +/- 0.06030, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16 - MIN: 6.15]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 6.34910 (SE +/- 0.00598, N = 3) [-mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect - MIN: 6.03]
1. (CXX) g++ options: -O3 -march=native -std=c++11 -fopenmp -msse4.1 -fPIC -pie -lpthread -ldl

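Most oneDNN CPU results differ by only a percent or two between the two builds. There are two outliers in the bf16 Deconvolution shapes_1d and Convolution Batch Shapes Auto harnesses. The geometric mean is the usual way to summarize many per-test ratios, because it treats a 2x gain and a 2x loss symmetrically. The following sketch applies it to the reported oneDNN timings. Since fewer ms is better, a ratio of sapphirerapids time to native time above 1 favors the march=native + AVX512 build. The data layout is illustrative.

```python
from statistics import geometric_mean

# (harness, march=sapphirerapids ms, march=native + AVX512 ms), from the oneDNN results above.
ONEDNN_MS = [
    ("IP Shapes 3D f32", 8.78268, 8.89450),
    ("Deconv shapes_1d bf16", 7.43204, 6.72455),
    ("Deconv shapes_1d u8s8f32", 0.872683, 0.878588),
    ("IP Shapes 1D bf16", 2.44545, 2.46474),
    ("IP Shapes 3D bf16", 3.56837, 3.52788),
    ("MatMul Transformer u8s8f32", 0.565699, 0.550588),
    ("Deconv shapes_3d bf16", 4.67789, 4.67293),
    ("IP Shapes 1D f32", 2.58637, 2.57885),
    ("IP Shapes 1D u8s8f32", 0.685132, 0.690873),
    ("Deconv shapes_3d u8s8f32", 1.06856, 1.07696),
    ("MatMul Transformer f32", 2.33613, 2.33206),
    ("MatMul Transformer bf16", 1.18583, 1.18705),
    ("IP Shapes 3D u8s8f32", 1.93347, 1.92539),
    ("Deconv shapes_3d f32", 4.13169, 4.23668),
    ("Conv Auto f32", 13.40, 13.41),
    ("Conv Auto u8s8f32", 13.28, 13.41),
    ("Conv Auto bf16", 6.77334, 6.34910),
]

# Time ratio A/B per harness: > 1 means the native + AVX512 build was faster.
ratios = [a / b for _, a, b in ONEDNN_MS]
print(f"{len(ratios)} results, geometric mean A/B time ratio: {geometric_mean(ratios):.4f}")
```

By this measure the native + AVX512 build is ahead by under 1% overall. The two bf16 outliers alone account for more than that whole gap.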
Darktable 3.4.1 - Test: Boat - Acceleration: OpenCL (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 3.971 (SE +/- 0.049, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 4.151 (SE +/- 0.043, N = 3)

Darktable 3.4.1 - Test: Masskrug - Acceleration: OpenCL (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 3.827 (SE +/- 0.012, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 3.818 (SE +/- 0.007, N = 3)

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: GEMM SGEMM_N (GFLOPS, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 1841.75 (SE +/- 8.67, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 1859.40 (SE +/- 17.38, N = 3) [-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect]
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

Darktable 3.4.1 - Test: Server Room - Acceleration: OpenCL (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 3.012 (SE +/- 0.004, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 2.999 (SE +/- 0.007, N = 3)

FFTW 3.3.6 - Build: Stock - Size: 1D FFT Size 4096 (Mflops, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 18293 (SE +/- 196.08, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 18502 (SE +/- 161.26, N = 3) [-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect]
1. (CC) gcc options: -pthread -O3 -lm

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Readback (GB/s, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 20.39 (SE +/- 0.01, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 21.05 (SE +/- 0.21, N = 15) [-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect]
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi

FFTW 3.3.6 - Build: Stock - Size: 2D FFT Size 32 (Mflops, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 22948 (SE +/- 297.88, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 22827 (SE +/- 135.68, N = 3) [-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect]
1. (CC) gcc options: -pthread -O3 -lm

FFTW 3.3.6 - Build: Stock - Size: 1D FFT Size 32 (Mflops, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 22774 (SE +/- 0.67, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 22777 (SE +/- 4.36, N = 3) [-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect]
1. (CC) gcc options: -pthread -O3 -lm

Parboil 2.5 - Test: OpenMP CUTCP (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 3.177141 (SE +/- 0.006360, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 3.183109 (SE +/- 0.009619, N = 3)
1. (CXX) g++ options: -lm -lpthread -lgomp -O3 -ffast-math -fopenmp

FFTW 3.3.6 - Build: Float + SSE - Size: 1D FFT Size 32 (Mflops, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 32180 (SE +/- 18.34, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 31560 (SE +/- 195.07, N = 3) [-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect]
1. (CC) gcc options: -pthread -O3 -lm

FFTW 3.3.6 - Build: Float + SSE - Size: 2D FFT Size 32 (Mflops, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 80496 (SE +/- 272.26, N = 3) [-mno-amx-tile -mno-amx-int8 -mno-amx-bf16]
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 82515 (SE +/- 920.77, N = 3) [-march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect]
1. (CC) gcc options: -pthread -O3 -lm

SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Triad (GB/s, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 12.60 (SE +/- 0.14, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 12.31 (SE +/- 0.13, N = 6)
Configuration-specific flags: sapphirerapids config: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16; native config: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: MD5 Hash (GHash/s, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 9.3041 (SE +/- 0.0009, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 9.3179 (SE +/- 0.0002, N = 3)
Configuration-specific flags: sapphirerapids config: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16; native config: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Reduction (GB/s, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 254.13 (SE +/- 0.22, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 254.41 (SE +/- 0.16, N = 3)
Configuration-specific flags: sapphirerapids config: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16; native config: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: FFT SP (GFLOPS, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 680.88 (SE +/- 0.96, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 682.02 (SE +/- 0.08, N = 3)
Configuration-specific flags: sapphirerapids config: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16; native config: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
Darktable 3.4.1 - Test: Server Rack - Acceleration: OpenCL (Seconds, Fewer Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 0.133 (SE +/- 0.001, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 0.131 (SE +/- 0.000, N = 3)
SHOC Scalable HeterOgeneous Computing 2020-04-17 - Target: OpenCL - Benchmark: Bus Speed Download (GB/s, More Is Better)
12700k AVX512 march=sapphirerapids gcc 11.1 rx 5600xt: 20.15 (SE +/- 0.24, N = 3)
12700k AVX512 march=native + AVX512 gcc 11.1 rx 5600xt: 19.98 (SE +/- 0.15, N = 3)
Configuration-specific flags: sapphirerapids config: -mno-amx-tile -mno-amx-int8 -mno-amx-bf16; native config: -march=native -mavx512f -mavx512dq -mavx512ifma -mavx512cd -mavx512bw -mavx512vl -mavx512bf16 -mavx512vbmi -mavx512vbmi2 -mavx512vnni -mavx512bitalg -mavx512vpopcntdq -mavx512vp2intersect
1. (CXX) g++ options: -O3 -lSHOCCommonMPI -lSHOCCommonOpenCL -lSHOCCommon -lOpenCL -lrt -pthread -lmpi_cxx -lmpi
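To help read these results, the Python sketch below transcribes each value and standard error listed in this section. For each benchmark it computes how much better or worse the march=native + AVX512 configuration performs than the march=sapphirerapids configuration, as a percentage. It also flags whether the gap falls within two combined standard errors. That 2-SE threshold is an assumption chosen for illustration. It is not a statistical test performed by the Phoronix Test Suite. With only N = 3 (or 6) runs and SE values rounded as displayed, treat it as a rough heuristic.

```python
import math

# (benchmark, higher_is_better, (value, SE) for march=sapphirerapids, (value, SE) for march=native + AVX512)
# Values and standard errors are transcribed from the results in this section.
RESULTS = [
    ("FFTW Stock 1D 32 (Mflops)", True, (22774, 0.67), (22777, 4.36)),
    ("Parboil OpenMP CUTCP (s)", False, (3.177141, 0.006360), (3.183109, 0.009619)),
    ("FFTW Float+SSE 1D 32 (Mflops)", True, (32180, 18.34), (31560, 195.07)),
    ("FFTW Float+SSE 2D 32 (Mflops)", True, (80496, 272.26), (82515, 920.77)),
    ("SHOC OpenCL Triad (GB/s)", True, (12.60, 0.14), (12.31, 0.13)),
    ("SHOC OpenCL MD5 Hash (GHash/s)", True, (9.3041, 0.0009), (9.3179, 0.0002)),
    ("SHOC OpenCL Reduction (GB/s)", True, (254.13, 0.22), (254.41, 0.16)),
    ("SHOC OpenCL FFT SP (GFLOPS)", True, (680.88, 0.96), (682.02, 0.08)),
    ("Darktable Server Rack OpenCL (s)", False, (0.133, 0.001), (0.131, 0.000)),
    ("SHOC OpenCL Bus Speed Download (GB/s)", True, (20.15, 0.24), (19.98, 0.15)),
]


def relative_delta(a, b, higher_is_better):
    """Percent advantage of config b over config a; positive means b is better."""
    if higher_is_better:
        return (b - a) / a * 100.0
    # Lower is better (run time), so a smaller b counts as a gain.
    return (a - b) / a * 100.0


def within_noise(a, se_a, b, se_b, k=2.0):
    """Heuristic (assumption): the gap is noise if it is within k combined standard errors."""
    return abs(a - b) <= k * math.hypot(se_a, se_b)


def summarize(results=RESULTS):
    """Return (name, percent delta of native vs sapphirerapids, within-noise flag) per benchmark."""
    rows = []
    for name, hib, (a, se_a), (b, se_b) in results:
        rows.append((name, relative_delta(a, b, hib), within_noise(a, se_a, b, se_b)))
    return rows


if __name__ == "__main__":
    for name, delta, noisy in summarize():
        label = "within 2 SE" if noisy else "outside 2 SE"
        print(f"{name:40s} {delta:+6.2f}%  {label}")
```

A formal comparison would instead use a t-test on the raw per-run samples, which this export does not include.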
Phoronix Test Suite v10.8.5