Intel Xeon 6900P - SNC vs. HEX Clustering Mode

Benchmarks by Michael Larabel for a future article.

HTML result view exported from: https://openbenchmarking.org/result/2409257-NE-INTELGNRH28&sro&grs.

Tested configurations: HEX Mode, SNC3 - Default

Processor: 2 x Intel Xeon 6980P @ 3.90GHz (256 Cores / 512 Threads)
Motherboard: Intel BIRCHSTREAM (BHSDCRB1.IPC.0035.D44.2408292336 BIOS)
Chipset: Intel Ice Lake IEH
Memory: 1520GB
Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07
Graphics: ASPEED
Network: Intel I210 + 2 x Intel 10-Gigabit X540-AT2
OS: Ubuntu 24.04
Kernel: 6.8.0-45-generic (x86_64)
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 1920x1200

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details: Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0x10002f0

Java Details: OpenJDK Runtime Environment (build 21.0.4+7-Ubuntu-1ubuntu224.04)

Python Details: Python 3.12.3

Security Details: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: Not affected; BHI: BHI_DIS_S + srbds: Not affected + tsx_async_abort: Not affected

Result overview: HEX Mode vs. SNC3 - Default. The individual results for each test follow.

Timed Linux Kernel Compilation

Build: allmodconfig

Timed Linux Kernel Compilation 6.8 (Seconds, Fewer Is Better)
HEX Mode: 193.11 (SE +/- 0.27, N = 3)
SNC3 - Default: 131.15 (SE +/- 1.64, N = 3)
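The gap between the two clustering modes is easier to compare across tests when it is expressed as a percentage. The following is a minimal Python sketch of that arithmetic, using the allmodconfig timings (193.11 and 131.15 seconds) and the Apache Cassandra write rates (147125 and 100742 Op/s) from this result file. The function name `percent_change` and its sign convention (positive means SNC3 is better) are assumptions made for illustration and are not part of the Phoronix Test Suite.

```python
def percent_change(hex_mode: float, snc3: float, lower_is_better: bool) -> float:
    """Return how much better SNC3 is than HEX Mode, in percent.

    Positive values mean SNC3 is better, negative values mean it is worse.
    This sign convention is an assumption chosen for this sketch.
    """
    if lower_is_better:
        # For timings, a smaller SNC3 value is an improvement.
        return (hex_mode / snc3 - 1.0) * 100.0
    # For throughput, a larger SNC3 value is an improvement.
    return (snc3 / hex_mode - 1.0) * 100.0


# Values copied from this result file.
kernel_build = percent_change(193.11, 131.15, lower_is_better=True)
cassandra = percent_change(147125, 100742, lower_is_better=False)

print(f"Kernel allmodconfig build: {kernel_build:+.1f}%")
print(f"Cassandra writes:          {cassandra:+.1f}%")
```

Passing the "Fewer Is Better" or "More Is Better" direction explicitly keeps timings and throughput figures on the same scale.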

Apache Cassandra

Test: Writes

Apache Cassandra 5.0 (Op/s, More Is Better)
HEX Mode: 147125 (SE +/- 1522.45, N = 3)
SNC3 - Default: 100742 (SE +/- 1022.25, N = 3)

ASKAP

Test: tConvolve MPI - Gridding

ASKAP 1.0 (Mpix/sec, More Is Better)
HEX Mode: 92985.2 (SE +/- 1249.70, N = 3)
SNC3 - Default: 125342.0 (SE +/- 1080.26, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

PETSc

Test: Streams

PETSc 3.19 (MB/s, More Is Better)
HEX Mode: 595124.23 (SE +/- 1960.36, N = 3)
SNC3 - Default: 449771.29 (SE +/- 5879.52, N = 4)
1. (CC) gcc options: -fPIC -O3 -O2 -lpthread -lm

ASKAP

Test: tConvolve MPI - Degridding

ASKAP 1.0 (Mpix/sec, More Is Better)
HEX Mode: 82064.2 (SE +/- 704.03, N = 3)
SNC3 - Default: 107661.0 (SE +/- 797.06, N = 3)
1. (CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

Graph500

Scale: 26

Graph500 3.0 (sssp median_TEPS, More Is Better)
HEX Mode: 673326000
SNC3 - Default: 871769000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

TensorFlow

Device: CPU - Batch Size: 512 - Model: ResNet-50

TensorFlow 2.16.1 (images/sec, More Is Better)
HEX Mode: 214.73 (SE +/- 2.28, N = 4)
SNC3 - Default: 173.99 (SE +/- 1.54, N = 3)

Timed LLVM Compilation

Build System: Ninja

Timed LLVM Compilation 16.0 (Seconds, Fewer Is Better)
HEX Mode: 94.06 (SE +/- 0.68, N = 15)
SNC3 - Default: 76.73 (SE +/- 0.86, N = 5)

NAMD

Input: ATPase with 327,506 Atoms

NAMD 3.0 (ns/day, More Is Better)
HEX Mode: 4.49970 (SE +/- 0.05577, N = 15)
SNC3 - Default: 3.78755 (SE +/- 0.01928, N = 3)

Timed Linux Kernel Compilation

Build: defconfig

Timed Linux Kernel Compilation 6.8 (Seconds, Fewer Is Better)
HEX Mode: 26.98 (SE +/- 0.23, N = 8)
SNC3 - Default: 23.44 (SE +/- 0.20, N = 8)

Graph500

Scale: 26

Graph500 3.0 (sssp max_TEPS, More Is Better)
HEX Mode: 964107000
SNC3 - Default: 1105950000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

Graph500

Scale: 26

Graph500 3.0 (bfs median_TEPS, More Is Better)
HEX Mode: 1530330000
SNC3 - Default: 1748790000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

easyWave

Input: e2Asean Grid + BengkuluSept2007 Source - Time: 1200

easyWave r34 (Seconds, Fewer Is Better)
HEX Mode: 59.60 (SE +/- 0.84, N = 12)
SNC3 - Default: 52.97 (SE +/- 0.59, N = 5)
1. (CXX) g++ options: -O3 -fopenmp

OpenRadioss

Model: Bird Strike on Windshield

OpenRadioss 2023.09.15 (Seconds, Fewer Is Better)
HEX Mode: 172.45 (SE +/- 0.18, N = 3)
SNC3 - Default: 154.56 (SE +/- 0.38, N = 3)

PyHPC Benchmarks

Device: CPU - Backend: Numpy - Project Size: 4194304 - Benchmark: Equation of State

PyHPC Benchmarks 3.0 (Seconds, Fewer Is Better)
HEX Mode: 1.713 (SE +/- 0.012, N = 3)
SNC3 - Default: 1.546 (SE +/- 0.008, N = 3)

Algebraic Multi-Grid Benchmark

Algebraic Multi-Grid Benchmark 1.2 (Figure Of Merit, More Is Better)
HEX Mode: 7708395000 (SE +/- 20523024.36, N = 3)
SNC3 - Default: 8504201333 (SE +/- 11711915.09, N = 3)
1. (CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -lmpi

Xcompact3d Incompact3d

Input: input.i3d 193 Cells Per Direction

Xcompact3d Incompact3d 2021-03-11 (Seconds, Fewer Is Better)
HEX Mode: 2.82190266 (SE +/- 0.02990610, N = 15)
SNC3 - Default: 2.56438231 (SE +/- 0.01327750, N = 3)
1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

RELION

Test: Basic - Device: CPU

RELION 4.0.1 (Seconds, Fewer Is Better)
HEX Mode: 114.40 (SE +/- 0.83, N = 3)
SNC3 - Default: 104.05 (SE +/- 0.54, N = 3)
1. (CXX) g++ options: -fopenmp -std=c++11 -O3 -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -ljpeg -lmpi_cxx -lmpi

ACES DGEMM

Sustained Floating-Point Rate

ACES DGEMM 1.0 (GFLOP/s, More Is Better)
HEX Mode: 3865.66 (SE +/- 5.73, N = 3)
SNC3 - Default: 3545.81 (SE +/- 6.73, N = 3)
1. (CC) gcc options: -ffast-math -mavx2 -O3 -fopenmp -lopenblas

Timed LLVM Compilation

Build System: Unix Makefiles

Timed LLVM Compilation 16.0 (Seconds, Fewer Is Better)
HEX Mode: 214.44 (SE +/- 1.09, N = 3)
SNC3 - Default: 197.13 (SE +/- 0.78, N = 3)

Xcompact3d Incompact3d

Input: input.i3d 129 Cells Per Direction

Xcompact3d Incompact3d 2021-03-11 (Seconds, Fewer Is Better)
HEX Mode: 0.903198322 (SE +/- 0.011942737, N = 3)
SNC3 - Default: 0.831954996 (SE +/- 0.005913486, N = 15)
1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Graph500

Scale: 26

Graph500 3.0 (bfs max_TEPS, More Is Better)
HEX Mode: 1954670000
SNC3 - Default: 2116240000
1. (CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

OpenRadioss

Model: Chrysler Neon 1M

OpenRadioss 2023.09.15 (Seconds, Fewer Is Better)
HEX Mode: 66.46 (SE +/- 0.38, N = 3)
SNC3 - Default: 61.48 (SE +/- 0.37, N = 3)

NWChem

Input: C240 Buckyball

NWChem 7.0.2 (Seconds, Fewer Is Better)
HEX Mode: 1779.7
SNC3 - Default: 1660.6
1. (F9X) gfortran options: -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lga -larmci -lpeigs -l64to32 -lopenblas -lpthread -lrt -llapack -lnwcblas -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz -lcomex -m64 -ffast-math -std=legacy -fdefault-integer-8 -finline-functions -O2

High Performance Conjugate Gradient

X Y Z: 160 160 160 - RT: 60

High Performance Conjugate Gradient 3.1 (GFLOP/s, More Is Better)
HEX Mode: 159.84 (SE +/- 0.29, N = 3)
SNC3 - Default: 170.00 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi

High Performance Conjugate Gradient

X Y Z: 144 144 144 - RT: 60

High Performance Conjugate Gradient 3.1 (GFLOP/s, More Is Better)
HEX Mode: 161.92 (SE +/- 0.22, N = 3)
SNC3 - Default: 171.21 (SE +/- 0.29, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi

DaCapo Benchmark

Java Test: BioJava Biological Data Framework

DaCapo Benchmark 23.11 (msec, Fewer Is Better)
HEX Mode: 5791 (SE +/- 50.95, N = 8)
SNC3 - Default: 5503 (SE +/- 42.20, N = 15)

Darmstadt Automotive Parallel Heterogeneous Suite

Backend: OpenMP - Kernel: NDT Mapping

Darmstadt Automotive Parallel Heterogeneous Suite 2021.11.02 (Test Cases Per Minute, More Is Better)
HEX Mode: 547.43 (SE +/- 1.66, N = 3)
SNC3 - Default: 574.84 (SE +/- 8.56, N = 12)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp

libxsmm

M N K: 64

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
HEX Mode: 5554.0 (SE +/- 68.57, N = 15)
SNC3 - Default: 5826.5 (SE +/- 63.47, N = 15)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

SVT-AV1

Encoder Mode: Preset 8 - Input: Bosphorus 4K

SVT-AV1 2.2 (Frames Per Second, More Is Better)
HEX Mode: 66.99 (SE +/- 0.74, N = 3)
SNC3 - Default: 63.97 (SE +/- 0.65, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Xcompact3d Incompact3d

Input: X3D-benchmarking input.i3d

Xcompact3d Incompact3d 2021-03-11 (Seconds, Fewer Is Better)
HEX Mode: 72.11 (SE +/- 0.09, N = 3)
SNC3 - Default: 68.89 (SE +/- 0.42, N = 3)
1. (F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

High Performance Conjugate Gradient

X Y Z: 104 104 104 - RT: 60

High Performance Conjugate Gradient 3.1 (GFLOP/s, More Is Better)
HEX Mode: 168.78 (SE +/- 0.15, N = 3)
SNC3 - Default: 176.01 (SE +/- 0.92, N = 3)
1. (CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi

libxsmm

M N K: 128

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
HEX Mode: 7862.2 (SE +/- 151.59, N = 9)
SNC3 - Default: 8182.9 (SE +/- 103.51, N = 3)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

OpenRadioss

Model: Cell Phone Drop Test

OpenRadioss 2023.09.15 (Seconds, Fewer Is Better)
HEX Mode: 40.73 (SE +/- 0.29, N = 3)
SNC3 - Default: 39.18 (SE +/- 0.06, N = 3)

SVT-AV1

Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit

SVT-AV1 2.2 (Frames Per Second, More Is Better)
HEX Mode: 13.41 (SE +/- 0.00, N = 3)
SNC3 - Default: 12.98 (SE +/- 0.02, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Darmstadt Automotive Parallel Heterogeneous Suite

Backend: OpenMP - Kernel: Points2Image

Darmstadt Automotive Parallel Heterogeneous Suite 2021.11.02 (Test Cases Per Minute, More Is Better)
HEX Mode: 4243.69 (SE +/- 47.68, N = 15)
SNC3 - Default: 4375.31 (SE +/- 52.66, N = 3)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp

SPECFEM3D

Model: Layered Halfspace

SPECFEM3D 4.1.1 (Seconds, Fewer Is Better)
HEX Mode: 7.422688518 (SE +/- 0.036865236, N = 3)
SNC3 - Default: 7.207569259 (SE +/- 0.060537089, N = 3)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

OpenRadioss

Model: INIVOL and Fluid Structure Interaction Drop Container

OpenRadioss 2023.09.15 (Seconds, Fewer Is Better)
HEX Mode: 93.13 (SE +/- 0.28, N = 3)
SNC3 - Default: 90.63 (SE +/- 0.25, N = 3)

Blender

Blend File: BMW27 - Compute: CPU-Only

Blender 4.2 (Seconds, Fewer Is Better)
HEX Mode: 7.69 (SE +/- 0.05, N = 3)
SNC3 - Default: 7.49 (SE +/- 0.02, N = 3)

SVT-AV1

Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit

SVT-AV1 2.2 (Frames Per Second, More Is Better)
HEX Mode: 8.191 (SE +/- 0.009, N = 3)
SNC3 - Default: 7.991 (SE +/- 0.055, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

GPAW

Input: Carbon Nanotube

GPAW 23.6 (Seconds, Fewer Is Better)
HEX Mode: 88.87 (SE +/- 0.28, N = 3)
SNC3 - Default: 86.88 (SE +/- 0.72, N = 3)
1. (CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi

SPECFEM3D

Model: Tomographic Model

SPECFEM3D 4.1.1 (Seconds, Fewer Is Better)
HEX Mode: 4.448644552 (SE +/- 0.014802330, N = 3)
SNC3 - Default: 4.361234856 (SE +/- 0.014525946, N = 3)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

DaCapo Benchmark

Java Test: Apache Kafka

DaCapo Benchmark 23.11 (msec, Fewer Is Better)
HEX Mode: 5995 (SE +/- 1.20, N = 3)
SNC3 - Default: 6098 (SE +/- 2.60, N = 3)

SVT-AV1

Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit

SVT-AV1 2.2 (Frames Per Second, More Is Better)
HEX Mode: 5.684 (SE +/- 0.018, N = 3)
SNC3 - Default: 5.773 (SE +/- 0.006, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

7-Zip Compression

Test: Compression Rating

7-Zip Compression 24.05 (MIPS, More Is Better)
HEX Mode: 911774 (SE +/- 5282.93, N = 3)
SNC3 - Default: 897725 (SE +/- 8001.79, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

SPECFEM3D

Model: Homogeneous Halfspace

SPECFEM3D 4.1.1 (Seconds, Fewer Is Better)
HEX Mode: 5.513972044 (SE +/- 0.007527831, N = 3)
SNC3 - Default: 5.433174239 (SE +/- 0.012290725, N = 3)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

BYTE Unix Benchmark

Computational Test: System Call

BYTE Unix Benchmark 5.1.3-git (LPS, More Is Better)
HEX Mode: 910572120.1 (SE +/- 144616.99, N = 3)
SNC3 - Default: 897496230.9 (SE +/- 219845.26, N = 3)
1. (CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm

GROMACS

Implementation: MPI CPU - Input: water_GMX50_bare

GROMACS 2024 (Ns Per Day, More Is Better)
HEX Mode: 32.14 (SE +/- 0.05, N = 3)
SNC3 - Default: 32.59 (SE +/- 0.10, N = 3)
1. (CXX) g++ options: -O3 -lm

LAMMPS Molecular Dynamics Simulator

Model: 20k Atoms

LAMMPS Molecular Dynamics Simulator 23Jun2022 (ns/day, More Is Better)
HEX Mode: 92.75 (SE +/- 0.33, N = 3)
SNC3 - Default: 94.05 (SE +/- 0.21, N = 3)
1. (CXX) g++ options: -O3 -lm -ldl

PyHPC Benchmarks

Device: CPU - Backend: Numpy - Project Size: 4194304 - Benchmark: Isoneutral Mixing

PyHPC Benchmarks 3.0 (Seconds, Fewer Is Better)
HEX Mode: 1.931 (SE +/- 0.011, N = 3)
SNC3 - Default: 1.907 (SE +/- 0.020, N = 3)

7-Zip Compression

Test: Decompression Rating

7-Zip Compression 24.05 (MIPS, More Is Better)
HEX Mode: 1377979 (SE +/- 5512.25, N = 3)
SNC3 - Default: 1394947 (SE +/- 10392.40, N = 3)
1. (CXX) g++ options: -lpthread -ldl -O2 -fPIC

Blender

Blend File: Pabellon Barcelona - Compute: CPU-Only

Blender 4.2 (Seconds, Fewer Is Better)
HEX Mode: 21.54 (SE +/- 0.03, N = 3)
SNC3 - Default: 21.28 (SE +/- 0.05, N = 3)

PostgreSQL

Scaling Factor: 100 - Clients: 1000 - Mode: Read Write - Average Latency

PostgreSQL 16 (ms, Fewer Is Better)
HEX Mode: 73.23 (SE +/- 0.01, N = 3)
SNC3 - Default: 74.04 (SE +/- 0.20, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

PostgreSQL

Scaling Factor: 100 - Clients: 1000 - Mode: Read Write

PostgreSQL 16 (TPS, More Is Better)
HEX Mode: 13656 (SE +/- 2.00, N = 3)
SNC3 - Default: 13506 (SE +/- 37.41, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

SVT-AV1

Encoder Mode: Preset 5 - Input: Bosphorus 4K

SVT-AV1 2.2 (Frames Per Second, More Is Better)
HEX Mode: 31.19 (SE +/- 0.19, N = 3)
SNC3 - Default: 30.86 (SE +/- 0.35, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

easyWave

Input: e2Asean Grid + BengkuluSept2007 Source - Time: 2400

easyWave r34 (Seconds, Fewer Is Better)
HEX Mode: 140.44 (SE +/- 1.63, N = 12)
SNC3 - Default: 141.91 (SE +/- 1.82, N = 12)
1. (CXX) g++ options: -O3 -fopenmp

SPECFEM3D

Model: Water-layered Halfspace

SPECFEM3D 4.1.1 (Seconds, Fewer Is Better)
HEX Mode: 9.321282291 (SE +/- 0.016038732, N = 3)
SNC3 - Default: 9.243850981 (SE +/- 0.054952663, N = 3)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

BYTE Unix Benchmark

Computational Test: Dhrystone 2

BYTE Unix Benchmark 5.1.3-git (LPS, More Is Better)
HEX Mode: 18804694499.0 (SE +/- 9501831.17, N = 3)
SNC3 - Default: 18658833934.3 (SE +/- 22131931.33, N = 3)
1. (CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm

Blender

Blend File: Barbershop - Compute: CPU-Only

Blender 4.2 (Seconds, Fewer Is Better)
HEX Mode: 68.40 (SE +/- 0.18, N = 3)
SNC3 - Default: 68.00 (SE +/- 0.33, N = 3)

SPECFEM3D

Model: Mount St. Helens

SPECFEM3D 4.1.1 (Seconds, Fewer Is Better)
HEX Mode: 4.018974773 (SE +/- 0.026444752, N = 3)
SNC3 - Default: 3.995710366 (SE +/- 0.037097693, N = 7)
1. (F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Blender

Blend File: Fishy Cat - Compute: CPU-Only

Blender 4.2 (Seconds, Fewer Is Better)
HEX Mode: 10.49 (SE +/- 0.10, N = 3)
SNC3 - Default: 10.43 (SE +/- 0.12, N = 3)

SVT-AV1

Encoder Mode: Preset 3 - Input: Bosphorus 4K

SVT-AV1 2.2 (Frames Per Second, More Is Better)
HEX Mode: 9.106 (SE +/- 0.031, N = 3)
SNC3 - Default: 9.155 (SE +/- 0.011, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

Blender

Blend File: Junkshop - Compute: CPU-Only

Blender 4.2 (Seconds, Fewer Is Better)
HEX Mode: 10.89 (SE +/- 0.06, N = 3)
SNC3 - Default: 10.85 (SE +/- 0.03, N = 3)

Blender

Blend File: Classroom - Compute: CPU-Only

Blender 4.2 (Seconds, Fewer Is Better)
HEX Mode: 17.51 (SE +/- 0.04, N = 3)
SNC3 - Default: 17.55 (SE +/- 0.14, N = 3)
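Several results, such as this Classroom render (17.51 vs. 17.55 seconds), differ by less than the reported standard errors. One rough way to tell a real gap from run-to-run noise is to express the difference in units of the combined standard error. The Python sketch below does that with the Classroom and BMW27 values from this file. The helper name `difference_in_standard_errors` is hypothetical, the calculation assumes the two sets of runs are independent, and with only N = 3 runs it is a heuristic rather than a formal significance test.

```python
import math


def difference_in_standard_errors(a: float, se_a: float,
                                  b: float, se_b: float) -> float:
    """Gap between two means, in units of their combined standard error.

    Assumes the two measurements are independent, so their standard
    errors add in quadrature.
    """
    combined = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(a - b) / combined


# Blender Classroom: 17.51 (SE 0.04) vs. 17.55 (SE 0.14)
classroom = difference_in_standard_errors(17.51, 0.04, 17.55, 0.14)

# Blender BMW27: 7.69 (SE 0.05) vs. 7.49 (SE 0.02)
bmw27 = difference_in_standard_errors(7.69, 0.05, 7.49, 0.02)

print(f"Classroom gap: {classroom:.2f} combined standard errors")
print(f"BMW27 gap:     {bmw27:.2f} combined standard errors")
```

A value well below 1 suggests the two modes are indistinguishable on that test, while a value of several standard errors points to a consistent difference.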

DaCapo Benchmark

Java Test: Jython

DaCapo Benchmark 23.11 (msec, Fewer Is Better)
HEX Mode: 3568 (SE +/- 22.30, N = 3)
SNC3 - Default: 3562 (SE +/- 41.68, N = 3)

BYTE Unix Benchmark

Computational Test: Whetstone Double

BYTE Unix Benchmark 5.1.3-git (MWIPS, More Is Better)
HEX Mode: 3728535.6 (SE +/- 113.17, N = 3)
SNC3 - Default: 3722311.4 (SE +/- 382.28, N = 3)
1. (CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm

OpenRadioss

Model: Bumper Beam

OpenRadioss 2023.09.15 (Seconds, Fewer Is Better)
HEX Mode: 110.68 (SE +/- 0.36, N = 3)
SNC3 - Default: 110.55 (SE +/- 0.43, N = 3)

PostgreSQL

Scaling Factor: 100 - Clients: 1000 - Mode: Read Only - Average Latency

PostgreSQL 16 (ms, Fewer Is Better)
HEX Mode: 1.645 (SE +/- 0.108, N = 12)
SNC3 - Default: 1.843 (SE +/- 0.023, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

PostgreSQL

Scaling Factor: 100 - Clients: 1000 - Mode: Read Only

PostgreSQL 16 (TPS, More Is Better)
HEX Mode: 638580 (SE +/- 42784.99, N = 12)
SNC3 - Default: 542652 (SE +/- 6739.41, N = 3)
1. (CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

Darmstadt Automotive Parallel Heterogeneous Suite

Backend: OpenMP - Kernel: Euclidean Cluster

Darmstadt Automotive Parallel Heterogeneous Suite 2021.11.02 (Test Cases Per Minute, More Is Better)
HEX Mode: 676.38 (SE +/- 5.87, N = 3)
SNC3 - Default: 612.85 (SE +/- 15.97, N = 12)
1. (CXX) g++ options: -O3 -std=c++11 -fopenmp

Stockfish

Chess Benchmark

Stockfish 17 (Nodes Per Second, More Is Better)
HEX Mode: 566141415 (SE +/- 10082654.90, N = 9)
SNC3 - Default: 587474317 (SE +/- 16310733.41, N = 6)
1. (CXX) g++ options: -lgcov -m64 -lpthread -fno-exceptions -std=c++17 -fno-peel-loops -fno-tracer -pedantic -O3 -funroll-loops -msse -msse3 -mpopcnt -mavx2 -mbmi -mavx512f -mavx512bw -mavx512vnni -mavx512dq -mavx512vl -msse4.1 -mssse3 -msse2 -mbmi2 -flto -flto-partition=one -flto=jobserver

SVT-AV1

Encoder Mode: Preset 13 - Input: Bosphorus 4K

SVT-AV1 2.2 (Frames Per Second, More Is Better)
HEX Mode: 195.99 (SE +/- 4.03, N = 12)
SNC3 - Default: 193.49 (SE +/- 0.39, N = 3)
1. (CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

DaCapo Benchmark

Java Test: H2 Database Engine

DaCapo Benchmark 23.11 (msec, Fewer Is Better)
HEX Mode: 17178 (SE +/- 372.20, N = 15)
SNC3 - Default: 11238 (SE +/- 336.25, N = 15)

DaCapo Benchmark

Java Test: Apache Xalan XSLT

DaCapo Benchmark 23.11 (msec, Fewer Is Better)
HEX Mode: 2500 (SE +/- 40.23, N = 15)
SNC3 - Default: 2595 (SE +/- 50.69, N = 15)

DaCapo Benchmark

Java Test: Apache Tomcat

DaCapo Benchmark 23.11 (msec, Fewer Is Better)
HEX Mode: 8933 (SE +/- 144.35, N = 15)
SNC3 - Default: 9223 (SE +/- 74.15, N = 15)

LAMMPS Molecular Dynamics Simulator

Model: Rhodopsin Protein

LAMMPS Molecular Dynamics Simulator 23Jun2022 (ns/day, More Is Better)
HEX Mode: 70.47 (SE +/- 0.88, N = 3)
SNC3 - Default: 71.40 (SE +/- 1.30, N = 12)
1. (CXX) g++ options: -O3 -lm -ldl

OpenRadioss

Model: Rubber O-Ring Seal Installation

OpenRadioss 2023.09.15 (Seconds, Fewer Is Better)
HEX Mode: 231.10 (SE +/- 6.59, N = 9)
SNC3 - Default: 219.85 (SE +/- 3.87, N = 12)

libxsmm

M N K: 32

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
HEX Mode: 3214.1 (SE +/- 68.51, N = 12)
SNC3 - Default: 3507.4 (SE +/- 58.49, N = 12)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

libxsmm

M N K: 256

libxsmm 2-1.17-3645 (GFLOPS/s, More Is Better)
HEX Mode: 2706.0 (SE +/- 36.38, N = 3)
SNC3 - Default: 4213.9 (SE +/- 124.41, N = 15)
1. (CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

NAMD

Input: STMV with 1,066,628 Atoms

NAMD 3.0 (ns/day, More Is Better)
HEX Mode: 2.55596 (SE +/- 0.06201, N = 15)
SNC3 - Default: 1.97540 (SE +/- 0.01818, N = 13)

miniBUDE

Implementation: OpenMP - Input Deck: BM2

miniBUDE 20210901 (Billion Interactions/s, More Is Better)
HEX Mode: 283.76 (SE +/- 2.64, N = 6)
SNC3 - Default: 268.53 (SE +/- 6.07, N = 12)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE

Implementation: OpenMP - Input Deck: BM2

miniBUDE 20210901 (GFInst/s, More Is Better)
HEX Mode: 7093.93 (SE +/- 66.01, N = 6)
SNC3 - Default: 6713.24 (SE +/- 151.76, N = 12)
1. (CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm
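Because the tests use different units and directions, a single summary figure needs the results normalized first. One common approach is a geometric mean of per-test ratios. The Python sketch below applies it to four results picked from this file purely for illustration (kernel allmodconfig build, Cassandra writes, PETSc Streams, HPCG 160). The selection, the helper name `snc3_ratio`, and the convention that a ratio above 1.0 favors SNC3 are all assumptions. The output describes only these four tests and is not an overall score for the result file.

```python
from statistics import geometric_mean


def snc3_ratio(hex_mode: float, snc3: float, lower_is_better: bool) -> float:
    """Normalize one result so that values above 1.0 favor SNC3."""
    return hex_mode / snc3 if lower_is_better else snc3 / hex_mode


# (test name, HEX Mode, SNC3 - Default, lower_is_better), copied from this file.
subset = [
    ("Kernel build, allmodconfig (s)", 193.11, 131.15, True),
    ("Cassandra writes (Op/s)", 147125, 100742, False),
    ("PETSc Streams (MB/s)", 595124.23, 449771.29, False),
    ("HPCG 160 160 160 (GFLOP/s)", 159.84, 170.00, False),
]

ratios = [snc3_ratio(h, s, lower) for _, h, s, lower in subset]
summary = geometric_mean(ratios)

for (name, *_), ratio in zip(subset, ratios):
    print(f"{name}: {ratio:.3f}")
print(f"Geometric mean over this subset: {summary:.3f}")
```

The geometric mean is used instead of the arithmetic mean so that a 2x gain and a 2x loss cancel out rather than averaging to a net gain.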


Phoronix Test Suite v10.8.5