Intel Xeon 6900P - SNC vs. HEX Clustering Mode

Benchmarks by Michael Larabel for a future article.

HTML result view exported from: https://openbenchmarking.org/result/2409257-NE-INTELGNRH28&grt&sro.

System Details

Tested configurations: HEX Mode, SNC3 - Default

Processor: 2 x Intel Xeon 6980P @ 3.90GHz (256 Cores / 512 Threads)
Motherboard: Intel BIRCHSTREAM (BHSDCRB1.IPC.0035.D44.2408292336 BIOS)
Chipset: Intel Ice Lake IEH
Memory: 1520GB
Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07
Graphics: ASPEED
Network: Intel I210 + 2 x Intel 10-Gigabit X540-AT2
OS: Ubuntu 24.04
Kernel: 6.8.0-45-generic (x86_64)
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 1920x1200

Kernel Details
- Transparent Huge Pages: madvise

Compiler Details
- --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v

Processor Details
- Scaling Governor: intel_pstate performance (EPP: performance)
- CPU Microcode: 0x10002f0

Java Details
- OpenJDK Runtime Environment (build 21.0.4+7-Ubuntu-1ubuntu224.04)

Python Details
- Python 3.12.3

Security Details
- gather_data_sampling: Not affected
- itlb_multihit: Not affected
- l1tf: Not affected
- mds: Not affected
- meltdown: Not affected
- mmio_stale_data: Not affected
- reg_file_data_sampling: Not affected
- retbleed: Not affected
- spec_rstack_overflow: Not affected
- spec_store_bypass: Mitigation of SSB disabled via prctl
- spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
- spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: Not affected; BHI: BHI_DIS_S
- srbds: Not affected
- tsx_async_abort: Not affected


7-Zip Compression

Test: Compression Rating

MIPS, More Is Better (7-Zip Compression 24.05)
HEX Mode: 911774 (SE +/- 5282.93, N = 3)
SNC3 - Default: 897725 (SE +/- 8001.79, N = 3)
(CXX) g++ options: -lpthread -ldl -O2 -fPIC

7-Zip Compression

Test: Decompression Rating

MIPS, More Is Better (7-Zip Compression 24.05)
HEX Mode: 1377979 (SE +/- 5512.25, N = 3)
SNC3 - Default: 1394947 (SE +/- 10392.40, N = 3)
(CXX) g++ options: -lpthread -ldl -O2 -fPIC

ACES DGEMM

Sustained Floating-Point Rate

GFLOP/s, More Is Better (ACES DGEMM 1.0)
HEX Mode: 3865.66 (SE +/- 5.73, N = 3)
SNC3 - Default: 3545.81 (SE +/- 6.73, N = 3)
(CC) gcc options: -ffast-math -mavx2 -O3 -fopenmp -lopenblas

Algebraic Multi-Grid Benchmark

Figure Of Merit, More Is Better (Algebraic Multi-Grid Benchmark 1.2)
HEX Mode: 7708395000 (SE +/- 20523024.36, N = 3)
SNC3 - Default: 8504201333 (SE +/- 11711915.09, N = 3)
(CC) gcc options: -lparcsr_ls -lparcsr_mv -lseq_mv -lIJ_mv -lkrylov -lHYPRE_utilities -lm -fopenmp -lmpi

Apache Cassandra

Test: Writes

Op/s, More Is Better (Apache Cassandra 5.0)
HEX Mode: 147125 (SE +/- 1522.45, N = 3)
SNC3 - Default: 100742 (SE +/- 1022.25, N = 3)

ASKAP

Test: tConvolve MPI - Degridding

Mpix/sec, More Is Better (ASKAP 1.0)
HEX Mode: 82064.2 (SE +/- 704.03, N = 3)
SNC3 - Default: 107661.0 (SE +/- 797.06, N = 3)
(CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

ASKAP

Test: tConvolve MPI - Gridding

Mpix/sec, More Is Better (ASKAP 1.0)
HEX Mode: 92985.2 (SE +/- 1249.70, N = 3)
SNC3 - Default: 125342.0 (SE +/- 1080.26, N = 3)
(CXX) g++ options: -O3 -fstrict-aliasing -fopenmp

Blender

Blend File: BMW27 - Compute: CPU-Only

Seconds, Fewer Is Better (Blender 4.2)
HEX Mode: 7.69 (SE +/- 0.05, N = 3)
SNC3 - Default: 7.49 (SE +/- 0.02, N = 3)

Blender

Blend File: Junkshop - Compute: CPU-Only

Seconds, Fewer Is Better (Blender 4.2)
HEX Mode: 10.89 (SE +/- 0.06, N = 3)
SNC3 - Default: 10.85 (SE +/- 0.03, N = 3)

Blender

Blend File: Classroom - Compute: CPU-Only

Seconds, Fewer Is Better (Blender 4.2)
HEX Mode: 17.51 (SE +/- 0.04, N = 3)
SNC3 - Default: 17.55 (SE +/- 0.14, N = 3)

Blender

Blend File: Fishy Cat - Compute: CPU-Only

Seconds, Fewer Is Better (Blender 4.2)
HEX Mode: 10.49 (SE +/- 0.10, N = 3)
SNC3 - Default: 10.43 (SE +/- 0.12, N = 3)

Blender

Blend File: Barbershop - Compute: CPU-Only

Seconds, Fewer Is Better (Blender 4.2)
HEX Mode: 68.40 (SE +/- 0.18, N = 3)
SNC3 - Default: 68.00 (SE +/- 0.33, N = 3)

Blender

Blend File: Pabellon Barcelona - Compute: CPU-Only

Seconds, Fewer Is Better (Blender 4.2)
HEX Mode: 21.54 (SE +/- 0.03, N = 3)
SNC3 - Default: 21.28 (SE +/- 0.05, N = 3)
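
One rough way to read the Blender results is to compare the gap between the two modes with the reported standard errors. The sketch below is not part of the Phoronix Test Suite output. The function name `gap_in_se_units` is made up for illustration, it assumes the first SE in each chart belongs to HEX Mode and the second to SNC3 - Default, and it treats the two sets of runs as independent so that their standard errors combine in quadrature.

```python
import math


def gap_in_se_units(hex_value, hex_se, snc3_value, snc3_se):
    """Absolute difference of two means divided by their combined standard error.

    Assumes the two results come from independent runs (an assumption,
    not something the result file states).
    """
    combined_se = math.sqrt(hex_se ** 2 + snc3_se ** 2)
    return abs(hex_value - snc3_value) / combined_se


# Values copied from the Blender 4.2 results (seconds, fewer is better).
bmw27 = gap_in_se_units(7.69, 0.05, 7.49, 0.02)
junkshop = gap_in_se_units(10.89, 0.06, 10.85, 0.03)

print(f"BMW27:    {bmw27:.1f} combined SEs apart")
print(f"Junkshop: {junkshop:.1f} combined SEs apart")
```

With these inputs the Junkshop gap comes out below one combined standard error while the BMW27 gap is above three. With only three runs per result, small multiples are probably best read as run-to-run noise rather than a real difference between clustering modes.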

BYTE Unix Benchmark

Computational Test: Dhrystone 2

LPS, More Is Better (BYTE Unix Benchmark 5.1.3-git)
HEX Mode: 18804694499.0 (SE +/- 9501831.17, N = 3)
SNC3 - Default: 18658833934.3 (SE +/- 22131931.33, N = 3)
(CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm

BYTE Unix Benchmark

Computational Test: System Call

LPS, More Is Better (BYTE Unix Benchmark 5.1.3-git)
HEX Mode: 910572120.1 (SE +/- 144616.99, N = 3)
SNC3 - Default: 897496230.9 (SE +/- 219845.26, N = 3)
(CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm

BYTE Unix Benchmark

Computational Test: Whetstone Double

MWIPS, More Is Better (BYTE Unix Benchmark 5.1.3-git)
HEX Mode: 3728535.6 (SE +/- 113.17, N = 3)
SNC3 - Default: 3722311.4 (SE +/- 382.28, N = 3)
(CC) gcc options: -pedantic -O3 -ffast-math -march=native -mtune=native -lm

DaCapo Benchmark

Java Test: Jython

msec, Fewer Is Better (DaCapo Benchmark 23.11)
HEX Mode: 3568 (SE +/- 22.30, N = 3)
SNC3 - Default: 3562 (SE +/- 41.68, N = 3)

DaCapo Benchmark

Java Test: Apache Kafka

msec, Fewer Is Better (DaCapo Benchmark 23.11)
HEX Mode: 5995 (SE +/- 1.20, N = 3)
SNC3 - Default: 6098 (SE +/- 2.60, N = 3)

DaCapo Benchmark

Java Test: Apache Tomcat

msec, Fewer Is Better (DaCapo Benchmark 23.11)
HEX Mode: 8933 (SE +/- 144.35, N = 15)
SNC3 - Default: 9223 (SE +/- 74.15, N = 15)

DaCapo Benchmark

Java Test: Apache Xalan XSLT

msec, Fewer Is Better (DaCapo Benchmark 23.11)
HEX Mode: 2500 (SE +/- 40.23, N = 15)
SNC3 - Default: 2595 (SE +/- 50.69, N = 15)

DaCapo Benchmark

Java Test: H2 Database Engine

msec, Fewer Is Better (DaCapo Benchmark 23.11)
HEX Mode: 17178 (SE +/- 372.20, N = 15)
SNC3 - Default: 11238 (SE +/- 336.25, N = 15)

DaCapo Benchmark

Java Test: BioJava Biological Data Framework

msec, Fewer Is Better (DaCapo Benchmark 23.11)
HEX Mode: 5791 (SE +/- 50.95, N = 8)
SNC3 - Default: 5503 (SE +/- 42.20, N = 15)

Darmstadt Automotive Parallel Heterogeneous Suite

Backend: OpenMP - Kernel: NDT Mapping

Test Cases Per Minute, More Is Better (Darmstadt Automotive Parallel Heterogeneous Suite 2021.11.02)
HEX Mode: 547.43 (SE +/- 1.66, N = 3)
SNC3 - Default: 574.84 (SE +/- 8.56, N = 12)
(CXX) g++ options: -O3 -std=c++11 -fopenmp

Darmstadt Automotive Parallel Heterogeneous Suite

Backend: OpenMP - Kernel: Points2Image

Test Cases Per Minute, More Is Better (Darmstadt Automotive Parallel Heterogeneous Suite 2021.11.02)
HEX Mode: 4243.69 (SE +/- 47.68, N = 15)
SNC3 - Default: 4375.31 (SE +/- 52.66, N = 3)
(CXX) g++ options: -O3 -std=c++11 -fopenmp

Darmstadt Automotive Parallel Heterogeneous Suite

Backend: OpenMP - Kernel: Euclidean Cluster

Test Cases Per Minute, More Is Better (Darmstadt Automotive Parallel Heterogeneous Suite 2021.11.02)
HEX Mode: 676.38 (SE +/- 5.87, N = 3)
SNC3 - Default: 612.85 (SE +/- 15.97, N = 12)
(CXX) g++ options: -O3 -std=c++11 -fopenmp

easyWave

Input: e2Asean Grid + BengkuluSept2007 Source - Time: 1200

Seconds, Fewer Is Better (easyWave r34)
HEX Mode: 59.60 (SE +/- 0.84, N = 12)
SNC3 - Default: 52.97 (SE +/- 0.59, N = 5)
(CXX) g++ options: -O3 -fopenmp

easyWave

Input: e2Asean Grid + BengkuluSept2007 Source - Time: 2400

Seconds, Fewer Is Better (easyWave r34)
HEX Mode: 140.44 (SE +/- 1.63, N = 12)
SNC3 - Default: 141.91 (SE +/- 1.82, N = 12)
(CXX) g++ options: -O3 -fopenmp

GPAW

Input: Carbon Nanotube

Seconds, Fewer Is Better (GPAW 23.6)
HEX Mode: 88.87 (SE +/- 0.28, N = 3)
SNC3 - Default: 86.88 (SE +/- 0.72, N = 3)
(CC) gcc options: -shared -fwrapv -O2 -lxc -lblas -lmpi

Graph500

Scale: 26

bfs median_TEPS, More Is Better (Graph500 3.0)
HEX Mode: 1530330000
SNC3 - Default: 1748790000
(CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

Graph500

Scale: 26

bfs max_TEPS, More Is Better (Graph500 3.0)
HEX Mode: 1954670000
SNC3 - Default: 2116240000
(CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

Graph500

Scale: 26

sssp median_TEPS, More Is Better (Graph500 3.0)
HEX Mode: 673326000
SNC3 - Default: 871769000
(CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

Graph500

Scale: 26

sssp max_TEPS, More Is Better (Graph500 3.0)
HEX Mode: 964107000
SNC3 - Default: 1105950000
(CC) gcc options: -fcommon -O3 -lpthread -lm -lmpi

GROMACS

Implementation: MPI CPU - Input: water_GMX50_bare

Ns Per Day, More Is Better (GROMACS 2024)
HEX Mode: 32.14 (SE +/- 0.05, N = 3)
SNC3 - Default: 32.59 (SE +/- 0.10, N = 3)
(CXX) g++ options: -O3 -lm

High Performance Conjugate Gradient

X Y Z: 104 104 104 - RT: 60

GFLOP/s, More Is Better (High Performance Conjugate Gradient 3.1)
HEX Mode: 168.78 (SE +/- 0.15, N = 3)
SNC3 - Default: 176.01 (SE +/- 0.92, N = 3)
(CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi

High Performance Conjugate Gradient

X Y Z: 144 144 144 - RT: 60

GFLOP/s, More Is Better (High Performance Conjugate Gradient 3.1)
HEX Mode: 161.92 (SE +/- 0.22, N = 3)
SNC3 - Default: 171.21 (SE +/- 0.29, N = 3)
(CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi

High Performance Conjugate Gradient

X Y Z: 160 160 160 - RT: 60

GFLOP/s, More Is Better (High Performance Conjugate Gradient 3.1)
HEX Mode: 159.84 (SE +/- 0.29, N = 3)
SNC3 - Default: 170.00 (SE +/- 0.02, N = 3)
(CXX) g++ options: -O3 -ffast-math -ftree-vectorize -lmpi_cxx -lmpi

LAMMPS Molecular Dynamics Simulator

Model: 20k Atoms

ns/day, More Is Better (LAMMPS Molecular Dynamics Simulator 23Jun2022)
HEX Mode: 92.75 (SE +/- 0.33, N = 3)
SNC3 - Default: 94.05 (SE +/- 0.21, N = 3)
(CXX) g++ options: -O3 -lm -ldl

LAMMPS Molecular Dynamics Simulator

Model: Rhodopsin Protein

ns/day, More Is Better (LAMMPS Molecular Dynamics Simulator 23Jun2022)
HEX Mode: 70.47 (SE +/- 0.88, N = 3)
SNC3 - Default: 71.40 (SE +/- 1.30, N = 12)
(CXX) g++ options: -O3 -lm -ldl

libxsmm

M N K: 128

GFLOPS/s, More Is Better (libxsmm 2-1.17-3645)
HEX Mode: 7862.2 (SE +/- 151.59, N = 9)
SNC3 - Default: 8182.9 (SE +/- 103.51, N = 3)
(CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

libxsmm

M N K: 256

GFLOPS/s, More Is Better (libxsmm 2-1.17-3645)
HEX Mode: 2706.0 (SE +/- 36.38, N = 3)
SNC3 - Default: 4213.9 (SE +/- 124.41, N = 15)
(CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

libxsmm

M N K: 32

GFLOPS/s, More Is Better (libxsmm 2-1.17-3645)
HEX Mode: 3214.1 (SE +/- 68.51, N = 12)
SNC3 - Default: 3507.4 (SE +/- 58.49, N = 12)
(CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

libxsmm

M N K: 64

GFLOPS/s, More Is Better (libxsmm 2-1.17-3645)
HEX Mode: 5554.0 (SE +/- 68.57, N = 15)
SNC3 - Default: 5826.5 (SE +/- 63.47, N = 15)
(CXX) g++ options: -dynamic -Bstatic -static-libgcc -lgomp -lm -lrt -ldl -lquadmath -lstdc++ -pthread -fPIC -std=c++14 -O2 -fopenmp-simd -funroll-loops -ftree-vectorize -fdata-sections -ffunction-sections -fvisibility=hidden -msse4.2

miniBUDE

Implementation: OpenMP - Input Deck: BM2

GFInst/s, More Is Better (miniBUDE 20210901)
HEX Mode: 7093.93 (SE +/- 66.01, N = 6)
SNC3 - Default: 6713.24 (SE +/- 151.76, N = 12)
(CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

miniBUDE

Implementation: OpenMP - Input Deck: BM2

Billion Interactions/s, More Is Better (miniBUDE 20210901)
HEX Mode: 283.76 (SE +/- 2.64, N = 6)
SNC3 - Default: 268.53 (SE +/- 6.07, N = 12)
(CC) gcc options: -std=c99 -Ofast -ffast-math -fopenmp -march=native -lm

NAMD

Input: ATPase with 327,506 Atoms

ns/day, More Is Better (NAMD 3.0)
HEX Mode: 4.49970 (SE +/- 0.05577, N = 15)
SNC3 - Default: 3.78755 (SE +/- 0.01928, N = 3)

NAMD

Input: STMV with 1,066,628 Atoms

ns/day, More Is Better (NAMD 3.0)
HEX Mode: 2.55596 (SE +/- 0.06201, N = 15)
SNC3 - Default: 1.97540 (SE +/- 0.01818, N = 13)

NWChem

Input: C240 Buckyball

Seconds, Fewer Is Better (NWChem 7.0.2)
HEX Mode: 1779.7
SNC3 - Default: 1660.6
(F9X) gfortran options: -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -ldimqm -lga -larmci -lpeigs -l64to32 -lopenblas -lpthread -lrt -llapack -lnwcblas -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz -lcomex -m64 -ffast-math -std=legacy -fdefault-integer-8 -finline-functions -O2

OpenRadioss

Model: Bumper Beam

Seconds, Fewer Is Better (OpenRadioss 2023.09.15)
HEX Mode: 110.68 (SE +/- 0.36, N = 3)
SNC3 - Default: 110.55 (SE +/- 0.43, N = 3)

OpenRadioss

Model: Chrysler Neon 1M

Seconds, Fewer Is Better (OpenRadioss 2023.09.15)
HEX Mode: 66.46 (SE +/- 0.38, N = 3)
SNC3 - Default: 61.48 (SE +/- 0.37, N = 3)

OpenRadioss

Model: Cell Phone Drop Test

Seconds, Fewer Is Better (OpenRadioss 2023.09.15)
HEX Mode: 40.73 (SE +/- 0.29, N = 3)
SNC3 - Default: 39.18 (SE +/- 0.06, N = 3)

OpenRadioss

Model: Bird Strike on Windshield

Seconds, Fewer Is Better (OpenRadioss 2023.09.15)
HEX Mode: 172.45 (SE +/- 0.18, N = 3)
SNC3 - Default: 154.56 (SE +/- 0.38, N = 3)

OpenRadioss

Model: Rubber O-Ring Seal Installation

Seconds, Fewer Is Better (OpenRadioss 2023.09.15)
HEX Mode: 231.10 (SE +/- 6.59, N = 9)
SNC3 - Default: 219.85 (SE +/- 3.87, N = 12)

OpenRadioss

Model: INIVOL and Fluid Structure Interaction Drop Container

Seconds, Fewer Is Better (OpenRadioss 2023.09.15)
HEX Mode: 93.13 (SE +/- 0.28, N = 3)
SNC3 - Default: 90.63 (SE +/- 0.25, N = 3)

PETSc

Test: Streams

MB/s, More Is Better (PETSc 3.19)
HEX Mode: 595124.23 (SE +/- 1960.36, N = 3)
SNC3 - Default: 449771.29 (SE +/- 5879.52, N = 4)
(CC) gcc options: -fPIC -O3 -O2 -lpthread -lm

PostgreSQL

Scaling Factor: 100 - Clients: 1000 - Mode: Read Only

TPS, More Is Better (PostgreSQL 16)
HEX Mode: 638580 (SE +/- 42784.99, N = 12)
SNC3 - Default: 542652 (SE +/- 6739.41, N = 3)
(CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

PostgreSQL

Scaling Factor: 100 - Clients: 1000 - Mode: Read Only - Average Latency

ms, Fewer Is Better (PostgreSQL 16)
HEX Mode: 1.645 (SE +/- 0.108, N = 12)
SNC3 - Default: 1.843 (SE +/- 0.023, N = 3)
(CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

PostgreSQL

Scaling Factor: 100 - Clients: 1000 - Mode: Read Write

TPS, More Is Better (PostgreSQL 16)
HEX Mode: 13656 (SE +/- 2.00, N = 3)
SNC3 - Default: 13506 (SE +/- 37.41, N = 3)
(CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

PostgreSQL

Scaling Factor: 100 - Clients: 1000 - Mode: Read Write - Average Latency

ms, Fewer Is Better (PostgreSQL 16)
HEX Mode: 73.23 (SE +/- 0.01, N = 3)
SNC3 - Default: 74.04 (SE +/- 0.20, N = 3)
(CC) gcc options: -fno-strict-aliasing -fwrapv -O2 -lpgcommon -lpgport -lpq -lm

PyHPC Benchmarks

Device: CPU - Backend: Numpy - Project Size: 4194304 - Benchmark: Equation of State

Seconds, Fewer Is Better (PyHPC Benchmarks 3.0)
HEX Mode: 1.713 (SE +/- 0.012, N = 3)
SNC3 - Default: 1.546 (SE +/- 0.008, N = 3)

PyHPC Benchmarks

Device: CPU - Backend: Numpy - Project Size: 4194304 - Benchmark: Isoneutral Mixing

Seconds, Fewer Is Better (PyHPC Benchmarks 3.0)
HEX Mode: 1.931 (SE +/- 0.011, N = 3)
SNC3 - Default: 1.907 (SE +/- 0.020, N = 3)

RELION

Test: Basic - Device: CPU

Seconds, Fewer Is Better (RELION 4.0.1)
HEX Mode: 114.40 (SE +/- 0.83, N = 3)
SNC3 - Default: 104.05 (SE +/- 0.54, N = 3)
(CXX) g++ options: -fopenmp -std=c++11 -O3 -rdynamic -ldl -ltiff -lfftw3f -lfftw3 -lpng -ljpeg -lmpi_cxx -lmpi

SPECFEM3D

Model: Mount St. Helens

Seconds, Fewer Is Better (SPECFEM3D 4.1.1)
HEX Mode: 4.018974773 (SE +/- 0.026444752, N = 3)
SNC3 - Default: 3.995710366 (SE +/- 0.037097693, N = 7)
(F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

SPECFEM3D

Model: Layered Halfspace

Seconds, Fewer Is Better (SPECFEM3D 4.1.1)
HEX Mode: 7.422688518 (SE +/- 0.036865236, N = 3)
SNC3 - Default: 7.207569259 (SE +/- 0.060537089, N = 3)
(F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

SPECFEM3D

Model: Tomographic Model

Seconds, Fewer Is Better (SPECFEM3D 4.1.1)
HEX Mode: 4.448644552 (SE +/- 0.014802330, N = 3)
SNC3 - Default: 4.361234856 (SE +/- 0.014525946, N = 3)
(F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

SPECFEM3D

Model: Homogeneous Halfspace

Seconds, Fewer Is Better (SPECFEM3D 4.1.1)
HEX Mode: 5.513972044 (SE +/- 0.007527831, N = 3)
SNC3 - Default: 5.433174239 (SE +/- 0.012290725, N = 3)
(F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

SPECFEM3D

Model: Water-layered Halfspace

Seconds, Fewer Is Better (SPECFEM3D 4.1.1)
HEX Mode: 9.321282291 (SE +/- 0.016038732, N = 3)
SNC3 - Default: 9.243850981 (SE +/- 0.054952663, N = 3)
(F9X) gfortran options: -O2 -fopenmp -std=f2008 -fimplicit-none -fmax-errors=10 -pedantic -pedantic-errors -O3 -finline-functions -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Stockfish

Chess Benchmark

Nodes Per Second, More Is Better (Stockfish 17)
HEX Mode: 566141415 (SE +/- 10082654.90, N = 9)
SNC3 - Default: 587474317 (SE +/- 16310733.41, N = 6)
(CXX) g++ options: -lgcov -m64 -lpthread -fno-exceptions -std=c++17 -fno-peel-loops -fno-tracer -pedantic -O3 -funroll-loops -msse -msse3 -mpopcnt -mavx2 -mbmi -mavx512f -mavx512bw -mavx512vnni -mavx512dq -mavx512vl -msse4.1 -mssse3 -msse2 -mbmi2 -flto -flto-partition=one -flto=jobserver

SVT-AV1

Encoder Mode: Preset 3 - Input: Bosphorus 4K

Frames Per Second, More Is Better (SVT-AV1 2.2)
HEX Mode: 9.106 (SE +/- 0.031, N = 3)
SNC3 - Default: 9.155 (SE +/- 0.011, N = 3)
(CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

SVT-AV1

Encoder Mode: Preset 5 - Input: Bosphorus 4K

Frames Per Second, More Is Better (SVT-AV1 2.2)
HEX Mode: 31.19 (SE +/- 0.19, N = 3)
SNC3 - Default: 30.86 (SE +/- 0.35, N = 3)
(CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

SVT-AV1

Encoder Mode: Preset 8 - Input: Bosphorus 4K

Frames Per Second, More Is Better (SVT-AV1 2.2)
HEX Mode: 66.99 (SE +/- 0.74, N = 3)
SNC3 - Default: 63.97 (SE +/- 0.65, N = 3)
(CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

SVT-AV1

Encoder Mode: Preset 13 - Input: Bosphorus 4K

Frames Per Second, More Is Better (SVT-AV1 2.2)
HEX Mode: 195.99 (SE +/- 4.03, N = 12)
SNC3 - Default: 193.49 (SE +/- 0.39, N = 3)
(CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

SVT-AV1

Encoder Mode: Preset 5 - Input: Beauty 4K 10-bit

Frames Per Second, More Is Better (SVT-AV1 2.2)
HEX Mode: 5.684 (SE +/- 0.018, N = 3)
SNC3 - Default: 5.773 (SE +/- 0.006, N = 3)
(CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

SVT-AV1

Encoder Mode: Preset 8 - Input: Beauty 4K 10-bit

Frames Per Second, More Is Better (SVT-AV1 2.2)
HEX Mode: 8.191 (SE +/- 0.009, N = 3)
SNC3 - Default: 7.991 (SE +/- 0.055, N = 3)
(CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

SVT-AV1

Encoder Mode: Preset 13 - Input: Beauty 4K 10-bit

Frames Per Second, More Is Better (SVT-AV1 2.2)
HEX Mode: 13.41 (SE +/- 0.00, N = 3)
SNC3 - Default: 12.98 (SE +/- 0.02, N = 3)
(CXX) g++ options: -march=native -mno-avx -mavx2 -mavx512f -mavx512bw -mavx512dq

TensorFlow

Device: CPU - Batch Size: 512 - Model: ResNet-50

images/sec, More Is Better (TensorFlow 2.16.1)
HEX Mode: 214.73 (SE +/- 2.28, N = 4)
SNC3 - Default: 173.99 (SE +/- 1.54, N = 3)

Timed Linux Kernel Compilation

Build: defconfig

Seconds, Fewer Is Better (Timed Linux Kernel Compilation 6.8)
HEX Mode: 26.98 (SE +/- 0.23, N = 8)
SNC3 - Default: 23.44 (SE +/- 0.20, N = 8)

Timed Linux Kernel Compilation

Build: allmodconfig

Seconds, Fewer Is Better (Timed Linux Kernel Compilation 6.8)
HEX Mode: 193.11 (SE +/- 0.27, N = 3)
SNC3 - Default: 131.15 (SE +/- 1.64, N = 3)
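
Because the results mix "More Is Better" and "Fewer Is Better" metrics, a small helper can put them on one scale. The following is only a sketch and not something the Phoronix Test Suite produces: `snc3_advantage_percent` is a hypothetical name, and the convention that a positive number means SNC3 - Default is ahead of HEX Mode is chosen here for illustration.

```python
def snc3_advantage_percent(hex_value, snc3_value, more_is_better):
    """Percent by which SNC3 - Default leads (+) or trails (-) HEX Mode.

    For throughput-style metrics the ratio is snc3 / hex; for time-style
    metrics it is inverted so that a shorter time counts as a lead.
    """
    if more_is_better:
        ratio = snc3_value / hex_value
    else:
        ratio = hex_value / snc3_value
    return (ratio - 1.0) * 100.0


# Apache Cassandra 5.0, Writes (Op/s, more is better).
cassandra = snc3_advantage_percent(147125, 100742, more_is_better=True)

# Timed Linux Kernel Compilation 6.8, allmodconfig (seconds, fewer is better).
allmodconfig = snc3_advantage_percent(193.11, 131.15, more_is_better=False)

print(f"Cassandra writes:    {cassandra:+.1f}%")
print(f"Kernel allmodconfig: {allmodconfig:+.1f}%")
```

Under this convention the Cassandra write result comes out negative (HEX Mode ahead) and the allmodconfig build comes out positive (SNC3 - Default ahead), which matches the raw numbers in the two charts.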

Timed LLVM Compilation

Build System: Ninja

Seconds, Fewer Is Better (Timed LLVM Compilation 16.0)
HEX Mode: 94.06 (SE +/- 0.68, N = 15)
SNC3 - Default: 76.73 (SE +/- 0.86, N = 5)

Timed LLVM Compilation

Build System: Unix Makefiles

Seconds, Fewer Is Better (Timed LLVM Compilation 16.0)
HEX Mode: 214.44 (SE +/- 1.09, N = 3)
SNC3 - Default: 197.13 (SE +/- 0.78, N = 3)

Xcompact3d Incompact3d

Input: X3D-benchmarking input.i3d

Seconds, Fewer Is Better (Xcompact3d Incompact3d 2021-03-11)
HEX Mode: 72.11 (SE +/- 0.09, N = 3)
SNC3 - Default: 68.89 (SE +/- 0.42, N = 3)
(F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Xcompact3d Incompact3d

Input: input.i3d 129 Cells Per Direction

Seconds, Fewer Is Better (Xcompact3d Incompact3d 2021-03-11)
HEX Mode: 0.903198322 (SE +/- 0.011942737, N = 3)
SNC3 - Default: 0.831954996 (SE +/- 0.005913486, N = 15)
(F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz

Xcompact3d Incompact3d

Input: input.i3d 193 Cells Per Direction

Seconds, Fewer Is Better (Xcompact3d Incompact3d 2021-03-11)
HEX Mode: 2.82190266 (SE +/- 0.02990610, N = 15)
SNC3 - Default: 2.56438231 (SE +/- 0.01327750, N = 3)
(F9X) gfortran options: -cpp -O2 -funroll-loops -floop-optimize -fcray-pointer -fbacktrace -lmpi_usempif08 -lmpi_mpifh -lmpi -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm -lz


Phoronix Test Suite v10.8.5