gnr-new 2 x Intel Xeon 6980P testing with a Intel AvenueCity v0.01 (BHSDCRB1.IPC.0035.D44.2408292336 BIOS) and ASPEED on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2410219-NE-GNRNEW81583&grr .
gnr-new Processor Motherboard Chipset Memory Disk Graphics Network OS Kernel Compiler File-System Screen Resolution a b c d e f g h Intel Xeon 6980P @ 3.90GHz (128 Cores / 256 Threads) Intel AvenueCity v0.01 (BHSDCRB1.IPC.0035.D44.2408292336 BIOS) Intel Ice Lake IEH 768GB 2 x 1920GB KIOXIA KCD8XPUG1T92 + 960GB SAMSUNG MZ1L2960HCJR-00A07 ASPEED Intel I210 + 2 x Intel 10-Gigabit X540-AT2 Ubuntu 24.04 6.10.0-phx (x86_64) GCC 13.2.0 ext4 1920x1200 2 x Intel Xeon 6980P @ 3.90GHz (256 Cores / 512 Threads) 1520GB OpenBenchmarking.org Kernel Details - Transparent Huge Pages: madvise Compiler Details - --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-OiuXZC/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-OiuXZC/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v Processor Details - Scaling Governor: intel_pstate performance (EPP: performance) - CPU Microcode: 0x10002f0 Python Details - Python 3.12.2 Security Details - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization + spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; RSB filling; PBRSB-eIBRS: Not affected; BHI: BHI_DIS_S + srbds: Not affected + tsx_async_abort: Not affected
gnr-new xnnpack: QS8MobileNetV2 xnnpack: FP16MobileNetV3Small xnnpack: FP16MobileNetV3Large xnnpack: FP16MobileNetV2 xnnpack: FP16MobileNetV1 xnnpack: FP32MobileNetV3Small xnnpack: FP32MobileNetV3Large xnnpack: FP32MobileNetV2 xnnpack: FP32MobileNetV1 cp2k: H20-256 cp2k: Fayalite-FIST onednn: Recurrent Neural Network Training - CPU onednn: Recurrent Neural Network Inference - CPU litert: NASNet Mobile litert: Inception ResNet V2 litert: Inception V4 litert: DeepLab V3 litert: Mobilenet Float litert: Mobilenet Quant litert: SqueezeNet litert: Quantized COCO SSD MobileNet v1 cp2k: H20-64 onednn: Deconvolution Batch shapes_1d - CPU onednn: IP Shapes 1D - CPU onednn: IP Shapes 3D - CPU onednn: Convolution Batch Shapes Auto - CPU onednn: Deconvolution Batch shapes_3d - CPU a b c d e f g h 16018 16450 20564 13055 6572 16497 21106 14100 7326 107.609 75.855 922.329 541.57 187424 97036.8 73575.5 45569.5 7701.36 23350.5 14175.2 17895.1 12.181 77.8182 0.533866 0.968782 0.307993 1.12242 16209 147931 24350 13002 6116 17230 22355 14261 6677 109.302 76.241 940.142 554.009 227700 108579 81248.7 34737.6 8300.1 14592.4 13568.9 15577.7 12.527 79.1489 0.588435 0.957475 0.291766 1.10921 16602 16425 20236 13643 6061 16241 20904 14549 6934 109.894 76.13 946.678 554.403 291285 167708 78595.3 36867.6 8544.3 20879.4 13116.8 15569.9 12.53 80.1511 0.578234 0.945705 0.29714 1.12372 480430 157277 290035 54548 34728 185501 83616 62736 26673 78.617 88.095 1300.17 907.42 1182200 918207 365116 120180 66587.9 55675.6 53573.6 92404.1 14.026 176.373 1.36219 1.45456 0.5198 1.08184 79.27 81.103 79.046 81.037 OpenBenchmarking.org
XNNPACK Model: QS8MobileNetV2 OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: QS8MobileNetV2 a b c d 100K 200K 300K 400K 500K 16018 16209 16602 480430 1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK Model: FP16MobileNetV3Small OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP16MobileNetV3Small a b c d 30K 60K 90K 120K 150K 16450 147931 16425 157277 1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK Model: FP16MobileNetV3Large OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP16MobileNetV3Large a b c d 60K 120K 180K 240K 300K 20564 24350 20236 290035 1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK Model: FP16MobileNetV2 OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP16MobileNetV2 a b c d 12K 24K 36K 48K 60K 13055 13002 13643 54548 1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK Model: FP16MobileNetV1 OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP16MobileNetV1 a b c d 7K 14K 21K 28K 35K 6572 6116 6061 34728 1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK Model: FP32MobileNetV3Small OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP32MobileNetV3Small a b c d 40K 80K 120K 160K 200K 16497 17230 16241 185501 1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK Model: FP32MobileNetV3Large OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP32MobileNetV3Large a b c d 20K 40K 60K 80K 100K 21106 22355 20904 83616 1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK Model: FP32MobileNetV2 OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP32MobileNetV2 a b c d 13K 26K 39K 52K 65K 14100 14261 14549 62736 1. (CXX) g++ options: -O3 -lrt -lm
XNNPACK Model: FP32MobileNetV1 OpenBenchmarking.org us, Fewer Is Better XNNPACK b7b048 Model: FP32MobileNetV1 a b c d 6K 12K 18K 24K 30K 7326 6677 6934 26673 1. (CXX) g++ options: -O3 -lrt -lm
CP2K Molecular Dynamics Input: H20-256 OpenBenchmarking.org Seconds, Fewer Is Better CP2K Molecular Dynamics 2024.3 Input: H20-256 a b c d e f g h 20 40 60 80 100 107.61 109.30 109.89 78.62 79.27 81.10 79.05 81.04 1. (F9X) gfortran options: -fopenmp -march=native -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kgrid -lcp2kgriddgemm -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kdbx -lcp2kdbm -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -l:libhdf5_fortran.a -l:libhdf5.a -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -llibgrpp -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -l:libopenblas.a -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm
CP2K Molecular Dynamics Input: Fayalite-FIST OpenBenchmarking.org Seconds, Fewer Is Better CP2K Molecular Dynamics 2024.3 Input: Fayalite-FIST a b c d 20 40 60 80 100 75.86 76.24 76.13 88.10 1. (F9X) gfortran options: -fopenmp -march=native -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kgrid -lcp2kgriddgemm -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kdbx -lcp2kdbm -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -l:libhdf5_fortran.a -l:libhdf5.a -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -llibgrpp -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -l:libopenblas.a -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm
oneDNN Harness: Recurrent Neural Network Training - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.6 Harness: Recurrent Neural Network Training - Engine: CPU a b c d 300 600 900 1200 1500 922.33 940.14 946.68 1300.17 MIN: 878.99 MIN: 910.45 MIN: 920.2 MIN: 1285.87 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread
oneDNN Harness: Recurrent Neural Network Inference - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.6 Harness: Recurrent Neural Network Inference - Engine: CPU a b c d 200 400 600 800 1000 541.57 554.01 554.40 907.42 MIN: 538.82 MIN: 549.09 MIN: 549.73 MIN: 894.75 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread
LiteRT Model: NASNet Mobile OpenBenchmarking.org Microseconds, Fewer Is Better LiteRT 2024-10-15 Model: NASNet Mobile a b c d 300K 600K 900K 1200K 1500K 187424 227700 291285 1182200
LiteRT Model: Inception ResNet V2 OpenBenchmarking.org Microseconds, Fewer Is Better LiteRT 2024-10-15 Model: Inception ResNet V2 a b c d 200K 400K 600K 800K 1000K 97036.8 108579.0 167708.0 918207.0
LiteRT Model: Inception V4 OpenBenchmarking.org Microseconds, Fewer Is Better LiteRT 2024-10-15 Model: Inception V4 a b c d 80K 160K 240K 320K 400K 73575.5 81248.7 78595.3 365116.0
LiteRT Model: DeepLab V3 OpenBenchmarking.org Microseconds, Fewer Is Better LiteRT 2024-10-15 Model: DeepLab V3 a b c d 30K 60K 90K 120K 150K 45569.5 34737.6 36867.6 120180.0
LiteRT Model: Mobilenet Float OpenBenchmarking.org Microseconds, Fewer Is Better LiteRT 2024-10-15 Model: Mobilenet Float a b c d 14K 28K 42K 56K 70K 7701.36 8300.10 8544.30 66587.90
LiteRT Model: Mobilenet Quant OpenBenchmarking.org Microseconds, Fewer Is Better LiteRT 2024-10-15 Model: Mobilenet Quant a b c d 12K 24K 36K 48K 60K 23350.5 14592.4 20879.4 55675.6
LiteRT Model: SqueezeNet OpenBenchmarking.org Microseconds, Fewer Is Better LiteRT 2024-10-15 Model: SqueezeNet a b c d 11K 22K 33K 44K 55K 14175.2 13568.9 13116.8 53573.6
LiteRT Model: Quantized COCO SSD MobileNet v1 OpenBenchmarking.org Microseconds, Fewer Is Better LiteRT 2024-10-15 Model: Quantized COCO SSD MobileNet v1 a b c d 20K 40K 60K 80K 100K 17895.1 15577.7 15569.9 92404.1
CP2K Molecular Dynamics Input: H20-64 OpenBenchmarking.org Seconds, Fewer Is Better CP2K Molecular Dynamics 2024.3 Input: H20-64 a b c d 4 8 12 16 20 12.18 12.53 12.53 14.03 1. (F9X) gfortran options: -fopenmp -march=native -mtune=native -O3 -funroll-loops -fbacktrace -ffree-form -fimplicit-none -std=f2008 -lcp2kstart -lcp2kmc -lcp2kswarm -lcp2kmotion -lcp2kthermostat -lcp2kemd -lcp2ktmc -lcp2kmain -lcp2kdbt -lcp2ktas -lcp2kgrid -lcp2kgriddgemm -lcp2kgridcpu -lcp2kgridref -lcp2kgridcommon -ldbcsrarnoldi -ldbcsrx -lcp2kdbx -lcp2kdbm -lcp2kshg_int -lcp2keri_mme -lcp2kminimax -lcp2khfxbase -lcp2ksubsys -lcp2kxc -lcp2kao -lcp2kpw_env -lcp2kinput -lcp2kpw -lcp2kgpu -lcp2kfft -lcp2kfpga -lcp2kfm -lcp2kcommon -lcp2koffload -lcp2kmpiwrap -lcp2kbase -ldbcsr -lsirius -lspla -lspfft -lsymspg -lvdwxc -l:libhdf5_fortran.a -l:libhdf5.a -lz -lgsl -lelpa_openmp -lcosma -lcosta -lscalapack -lxsmmf -lxsmm -ldl -lpthread -llibgrpp -lxcf03 -lxc -lint2 -lfftw3_mpi -lfftw3 -lfftw3_omp -lmpi_cxx -lmpi -l:libopenblas.a -lvori -lstdc++ -lmpi_usempif08 -lmpi_mpifh -lopen-rte -lopen-pal -lhwloc -levent_core -levent_pthreads -lm
oneDNN Harness: Deconvolution Batch shapes_1d - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.6 Harness: Deconvolution Batch shapes_1d - Engine: CPU a b c d 40 80 120 160 200 77.82 79.15 80.15 176.37 MIN: 11.78 MIN: 12.79 MIN: 12.18 MIN: 37.23 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread
oneDNN Harness: IP Shapes 1D - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.6 Harness: IP Shapes 1D - Engine: CPU a b c d 0.3065 0.613 0.9195 1.226 1.5325 0.533866 0.588435 0.578234 1.362190 MIN: 0.48 MIN: 0.52 MIN: 0.51 MIN: 1.23 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread
oneDNN Harness: IP Shapes 3D - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.6 Harness: IP Shapes 3D - Engine: CPU a b c d 0.3273 0.6546 0.9819 1.3092 1.6365 0.968782 0.957475 0.945705 1.454560 MIN: 0.91 MIN: 0.91 MIN: 0.9 MIN: 1.32 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread
oneDNN Harness: Convolution Batch Shapes Auto - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.6 Harness: Convolution Batch Shapes Auto - Engine: CPU a b c d 0.117 0.234 0.351 0.468 0.585 0.307993 0.291766 0.297140 0.519800 MIN: 0.28 MIN: 0.27 MIN: 0.28 MIN: 0.42 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread
oneDNN Harness: Deconvolution Batch shapes_3d - Engine: CPU OpenBenchmarking.org ms, Fewer Is Better oneDNN 3.6 Harness: Deconvolution Batch shapes_3d - Engine: CPU a b c d 0.2528 0.5056 0.7584 1.0112 1.264 1.12242 1.10921 1.12372 1.08184 MIN: 1.1 MIN: 1.08 MIN: 1.09 MIN: 1 1. (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl -lpthread
Phoronix Test Suite v10.8.5