rt up: AMD Ryzen 9 9950X 16-Core testing with an ASUS ROG STRIX X670E-E GAMING WIFI (2401 BIOS) and AMD Radeon PRO W7900 45GB on Ubuntu 24.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2410151-PTS-RTUP561730&sro&grr
rt up: system details (runs a, b, c, d)
  Processor: AMD Ryzen 9 9950X 16-Core @ 5.75GHz (16 Cores / 32 Threads)
  Motherboard: ASUS ROG STRIX X670E-E GAMING WIFI (2401 BIOS)
  Chipset: AMD Device 14d8
  Memory: 2 x 32GB DDR5-6400MT/s Corsair CMK64GX5M2B6400C32
  Disk: Western Digital WD_BLACK SN850X 2000GB + 257GB Flash Drive
  Graphics: AMD Radeon PRO W7900 45GB (2200/3200MHz)
  Audio: AMD Navi 31 HDMI/DP
  Monitor: DELL U2723QE
  Network: Intel I225-V + Intel Wi-Fi 6E
  OS: Ubuntu 24.04
  Kernel: 6.10.1-061001-generic (x86_64)
  Desktop: GNOME Shell 46.0
  Display Server: X Server 1.21.1.11 + Wayland
  OpenGL: 4.6 Mesa 24.2.0-devel (LLVM 18.1.7 DRM 3.57)
  OpenCL: OpenCL 2.1 AMD-APP (3625.0)
  Compiler: GCC 13.2.0
  File-System: ext4
  Screen Resolution: 3840x2160

Kernel Details: Transparent Huge Pages: madvise
Compiler Details: --build=x86_64-linux-gnu --disable-vtable-verify --disable-werror --enable-cet --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-multilib --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-uJ7kn6/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-plugin --enable-shared --enable-threads=posix --host=x86_64-linux-gnu --program-prefix=x86_64-linux-gnu- --target=x86_64-linux-gnu --with-abi=m64 --with-arch-32=i686 --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-multilib-list=m32,m64,mx32 --with-target-system-zlib=auto --with-tune=generic --without-cuda-driver -v
Processor Details: Scaling Governor: amd-pstate-epp powersave (EPP: balance_performance); CPU Microcode: 0xb404022
Security Details:
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  reg_file_data_sampling: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of usercopy/swapgs barriers and __user pointer sanitization
  spectre_v2: Mitigation of Enhanced / Automatic IBRS; IBPB: conditional; STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
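To compare another Linux machine against the kernel, processor and security details above, most of these values can be read back from sysfs. The following is a minimal Python sketch, assuming a kernel that exposes the usual cpufreq, transparent_hugepage and cpu/vulnerabilities files. The helper names (`read_text`, `selected_option`, `system_summary`, `read_vulnerabilities`) are hypothetical, and the Phoronix Test Suite's own collection logic may differ; for example, its "Scaling Governor" line appears to combine the cpufreq driver and governor into one string.

```python
"""Minimal sketch: read a few of the reported kernel/CPU settings from sysfs.

Assumes a Linux kernel exposing the usual cpufreq, transparent_hugepage and
cpu/vulnerabilities files; any file that cannot be read is reported as None.
"""
from pathlib import Path
from typing import Dict, Optional

SYSFS = Path("/sys")


def read_text(path: Path) -> Optional[str]:
    """Return the stripped file contents, or None if the file cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def selected_option(line: str) -> Optional[str]:
    """Return the bracketed (active) entry, e.g. 'always [madvise] never' -> 'madvise'."""
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def system_summary(root: Path = SYSFS) -> Dict[str, Optional[str]]:
    """Collect the cpufreq driver, governor and EPP for cpu0 plus the THP mode."""
    cpufreq = root / "devices/system/cpu/cpu0/cpufreq"
    thp = read_text(root / "kernel/mm/transparent_hugepage/enabled")
    return {
        "scaling_driver": read_text(cpufreq / "scaling_driver"),
        "scaling_governor": read_text(cpufreq / "scaling_governor"),
        "epp": read_text(cpufreq / "energy_performance_preference"),
        "thp": selected_option(thp) if thp else None,
    }


def read_vulnerabilities(root: Path = SYSFS) -> Dict[str, str]:
    """Map each file in cpu/vulnerabilities to its status string."""
    vuln_dir = root / "devices/system/cpu/vulnerabilities"
    try:
        entries = sorted(vuln_dir.iterdir())
    except OSError:
        return {}  # directory missing or unreadable
    statuses = {}
    for entry in entries:
        status = read_text(entry)
        if status is not None:
            statuses[entry.name] = status
    return statuses


if __name__ == "__main__":
    for key, value in system_summary().items():
        print(f"{key}: {value}")
    for name, status in read_vulnerabilities().items():
        print(f"{name}: {status}")
```

Passing a different `root` makes the functions easy to exercise against a copy of another machine's sysfs values without touching `/sys` itself.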
rt up: results summary (fewer is better for every test; XNNPACK in us, oneDNN in ms, LiteRT in microseconds)
Test                                                           a           b           c           d
xnnpack: QS8MobileNetV2                                    747         759         752         791
xnnpack: FP16MobileNetV3Small                              851         858         861         884
xnnpack: FP16MobileNetV3Large                             1395        1397        1398        1440
xnnpack: FP16MobileNetV2                                  1070        1066        1075        1112
xnnpack: FP16MobileNetV1                                  1008        1008        1000        1038
xnnpack: FP32MobileNetV3Small                              902         913         907         950
xnnpack: FP32MobileNetV3Large                             1601        1622        1618        1693
xnnpack: FP32MobileNetV2                                  1385        1355        1355        1409
xnnpack: FP32MobileNetV1                                  1024         995        1001        1072
onednn: Recurrent Neural Network Training - CPU        752.845     753.326     752.460     795.016
onednn: Recurrent Neural Network Inference - CPU       401.080     401.052     399.152     422.910
litert: SqueezeNet                                     1399.04     1377.42     1390.86     1408.46
litert: Quantized COCO SSD MobileNet v1                1237.78     1265.69     1273.91     1448.45
litert: Inception V4                                   15724.4     15880.3     15836.1     17160.6
litert: Inception ResNet V2                            14335.0     14386.0     14366.4     15403.4
litert: NASNet Mobile                                  10524.1     10303.4     10342.5     12903.4
litert: DeepLab V3                                     1993.85     1916.71     1971.80     2250.63
litert: Mobilenet Float                                1002.97    1003.998     1008.93     1060.57
litert: Mobilenet Quant                                593.870     599.288     585.758     738.067
onednn: Deconvolution Batch shapes_1d - CPU            1.82100     1.81784     1.81460     1.86755
onednn: IP Shapes 1D - CPU                            0.636587    0.637647    0.638661    0.669839
onednn: IP Shapes 3D - CPU                             3.17092     3.18695     3.17892     3.35833
onednn: Convolution Batch Shapes Auto - CPU            5.24008     5.23454     5.24225     5.45077
onednn: Deconvolution Batch shapes_3d - CPU            1.40825     1.41365     1.40659     1.44600
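The summary table above lists four runs (a, b, c, d) per test. A short Python sketch can set run d against the mean of runs a to c for every test and condense the result into one geometric-mean ratio. The values are copied from the table; the helper name `d_vs_baseline` and the choice of the a-c mean as the baseline are illustrative assumptions, not how OpenBenchmarking.org computes its own summaries.

```python
from statistics import fmean, geometric_mean

# (test, a, b, c, d) copied from the summary table above.
# Every metric in this table is a time, so fewer is better.
RESULTS = [
    ("xnnpack: QS8MobileNetV2", 747, 759, 752, 791),
    ("xnnpack: FP16MobileNetV3Small", 851, 858, 861, 884),
    ("xnnpack: FP16MobileNetV3Large", 1395, 1397, 1398, 1440),
    ("xnnpack: FP16MobileNetV2", 1070, 1066, 1075, 1112),
    ("xnnpack: FP16MobileNetV1", 1008, 1008, 1000, 1038),
    ("xnnpack: FP32MobileNetV3Small", 902, 913, 907, 950),
    ("xnnpack: FP32MobileNetV3Large", 1601, 1622, 1618, 1693),
    ("xnnpack: FP32MobileNetV2", 1385, 1355, 1355, 1409),
    ("xnnpack: FP32MobileNetV1", 1024, 995, 1001, 1072),
    ("onednn: RNN Training", 752.845, 753.326, 752.460, 795.016),
    ("onednn: RNN Inference", 401.080, 401.052, 399.152, 422.910),
    ("litert: SqueezeNet", 1399.04, 1377.42, 1390.86, 1408.46),
    ("litert: Quantized COCO SSD MobileNet v1", 1237.78, 1265.69, 1273.91, 1448.45),
    ("litert: Inception V4", 15724.4, 15880.3, 15836.1, 17160.6),
    ("litert: Inception ResNet V2", 14335.0, 14386.0, 14366.4, 15403.4),
    ("litert: NASNet Mobile", 10524.1, 10303.4, 10342.5, 12903.4),
    ("litert: DeepLab V3", 1993.85, 1916.71, 1971.80, 2250.63),
    ("litert: Mobilenet Float", 1002.97, 1003.998, 1008.93, 1060.57),
    ("litert: Mobilenet Quant", 593.870, 599.288, 585.758, 738.067),
    ("onednn: Deconvolution Batch shapes_1d", 1.82100, 1.81784, 1.81460, 1.86755),
    ("onednn: IP Shapes 1D", 0.636587, 0.637647, 0.638661, 0.669839),
    ("onednn: IP Shapes 3D", 3.17092, 3.18695, 3.17892, 3.35833),
    ("onednn: Convolution Batch Shapes Auto", 5.24008, 5.23454, 5.24225, 5.45077),
    ("onednn: Deconvolution Batch shapes_3d", 1.40825, 1.41365, 1.40659, 1.44600),
]


def d_vs_baseline(row):
    """Return d divided by the mean of a, b and c (> 1.0 means d took longer)."""
    _, a, b, c, d = row
    return d / fmean([a, b, c])


ratios = {row[0]: d_vs_baseline(row) for row in RESULTS}
overall = geometric_mean(ratios.values())

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name:42s} {100 * (ratio - 1):+6.2f}%")
    print(f"{'geometric mean':42s} {100 * (overall - 1):+6.2f}%")
```

Because every metric here is a time, a ratio above 1.0 means run d was slower; on this data all 24 ratios are above 1.0. Taking the geometric mean of ratios keeps tests with large absolute times, such as Inception V4, from outweighing tests measured in fractions of a millisecond.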
XNNPACK b7b048, Model: QS8MobileNetV2 (us, fewer is better)
  a: 747 (SE +/- 6.66, N = 3)
  b: 759 (SE +/- 2.19, N = 3)
  c: 752 (SE +/- 4.84, N = 3)
  d: 791 (SE +/- 4.00, N = 3)
  Note: (CXX) g++ options: -O3 -lrt -lm
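Each "SE +/- x, N = y" entry reports the standard error over the N trials of that run. The Python sketch below assumes the conventional definition (sample standard deviation divided by √N) and uses made-up trial values, since this export contains only each run's mean, SE and N. The `separation` helper is a hypothetical rough check that expresses the gap between two runs in units of their combined standard error; it is not a statistic the Phoronix Test Suite reports.

```python
from math import sqrt
from statistics import fmean, stdev


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return stdev(samples) / sqrt(len(samples))


def separation(mean_x, se_x, mean_y, se_y):
    """Gap between two means in units of their combined standard error.

    Rough screening heuristic only; with N = 3 per run a t-test (or more
    trials) is needed before calling a difference significant.
    """
    return (mean_y - mean_x) / sqrt(se_x ** 2 + se_y ** 2)


# Hypothetical per-trial results in us: the result file reports only the
# mean, SE and N for each run, not the individual trials.
samples = [740.0, 747.0, 754.0]
print(fmean(samples), round(standard_error(samples), 2))  # 747.0 4.04

# Reported QS8MobileNetV2 values: a = 747 (SE 6.66), d = 791 (SE 4.00).
print(round(separation(747, 6.66, 791, 4.00), 1))  # 5.7
```

A gap of several combined standard errors, as between runs a and d here, is harder to attribute to trial-to-trial noise than a gap of one or two; with only three trials per run it remains a rough indicator.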
XNNPACK b7b048, Model: FP16MobileNetV3Small (us, fewer is better)
  a: 851 (SE +/- 1.76, N = 3)
  b: 858 (SE +/- 4.70, N = 3)
  c: 861 (SE +/- 1.67, N = 3)
  d: 884 (SE +/- 4.41, N = 3)
  Note: (CXX) g++ options: -O3 -lrt -lm
XNNPACK b7b048, Model: FP16MobileNetV3Large (us, fewer is better)
  a: 1395 (SE +/- 6.84, N = 3)
  b: 1397 (SE +/- 2.65, N = 3)
  c: 1398 (SE +/- 5.21, N = 3)
  d: 1440 (SE +/- 5.51, N = 3)
  Note: (CXX) g++ options: -O3 -lrt -lm
XNNPACK b7b048, Model: FP16MobileNetV2 (us, fewer is better)
  a: 1070 (SE +/- 5.49, N = 3)
  b: 1066 (SE +/- 6.11, N = 3)
  c: 1075 (SE +/- 0.88, N = 3)
  d: 1112 (SE +/- 2.19, N = 3)
  Note: (CXX) g++ options: -O3 -lrt -lm
XNNPACK b7b048, Model: FP16MobileNetV1 (us, fewer is better)
  a: 1008 (SE +/- 6.17, N = 3)
  b: 1008 (SE +/- 5.04, N = 3)
  c: 1000 (SE +/- 7.22, N = 3)
  d: 1038 (SE +/- 6.08, N = 3)
  Note: (CXX) g++ options: -O3 -lrt -lm
XNNPACK b7b048, Model: FP32MobileNetV3Small (us, fewer is better)
  a: 902 (SE +/- 3.61, N = 3)
  b: 913 (SE +/- 1.45, N = 3)
  c: 907 (SE +/- 3.28, N = 3)
  d: 950 (SE +/- 3.18, N = 3)
  Note: (CXX) g++ options: -O3 -lrt -lm
XNNPACK b7b048, Model: FP32MobileNetV3Large (us, fewer is better)
  a: 1601 (SE +/- 14.53, N = 3)
  b: 1622 (SE +/- 2.65, N = 3)
  c: 1618 (SE +/- 5.13, N = 3)
  d: 1693 (SE +/- 6.69, N = 3)
  Note: (CXX) g++ options: -O3 -lrt -lm
XNNPACK b7b048, Model: FP32MobileNetV2 (us, fewer is better)
  a: 1385 (SE +/- 8.45, N = 3)
  b: 1355 (SE +/- 8.33, N = 3)
  c: 1355 (SE +/- 6.39, N = 3)
  d: 1409 (SE +/- 30.28, N = 3)
  Note: (CXX) g++ options: -O3 -lrt -lm
XNNPACK b7b048, Model: FP32MobileNetV1 (us, fewer is better)
  a: 1024 (SE +/- 2.33, N = 3)
  b: 995 (SE +/- 9.53, N = 3)
  c: 1001 (SE +/- 5.51, N = 3)
  d: 1072 (SE +/- 14.74, N = 3)
  Note: (CXX) g++ options: -O3 -lrt -lm
oneDNN 3.6, Harness: Recurrent Neural Network Training, Engine: CPU (ms, fewer is better)
  a: 752.85 (SE +/- 1.48, N = 3; MIN: 726.85)
  b: 753.33 (SE +/- 0.30, N = 3; MIN: 729.83)
  c: 752.46 (SE +/- 0.72, N = 3; MIN: 726.66)
  d: 795.02 (SE +/- 1.11, N = 3; MIN: 763.71)
  Note: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
oneDNN 3.6, Harness: Recurrent Neural Network Inference, Engine: CPU (ms, fewer is better)
  a: 401.08 (SE +/- 0.27, N = 3; MIN: 386.16)
  b: 401.05 (SE +/- 0.66, N = 3; MIN: 385.35)
  c: 399.15 (SE +/- 0.53, N = 3; MIN: 384.47)
  d: 422.91 (SE +/- 0.44, N = 3; MIN: 402.96)
  Note: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
LiteRT 2024-10-15, Model: SqueezeNet (microseconds, fewer is better)
  a: 1399.04 (SE +/- 2.23, N = 3)
  b: 1377.42 (SE +/- 12.91, N = 6)
  c: 1390.86 (SE +/- 5.30, N = 3)
  d: 1408.46 (SE +/- 3.75, N = 3)
LiteRT 2024-10-15, Model: Quantized COCO SSD MobileNet v1 (microseconds, fewer is better)
  a: 1237.78 (SE +/- 1.69, N = 3)
  b: 1265.69 (SE +/- 7.74, N = 3)
  c: 1273.91 (SE +/- 12.41, N = 6)
  d: 1448.45 (SE +/- 10.93, N = 3)
LiteRT 2024-10-15, Model: Inception V4 (microseconds, fewer is better)
  a: 15724.4 (SE +/- 91.27, N = 3)
  b: 15880.3 (SE +/- 106.26, N = 3)
  c: 15836.1 (SE +/- 31.83, N = 3)
  d: 17160.6 (SE +/- 90.13, N = 3)
LiteRT 2024-10-15, Model: Inception ResNet V2 (microseconds, fewer is better)
  a: 14335.0 (SE +/- 163.02, N = 3)
  b: 14386.0 (SE +/- 120.32, N = 3)
  c: 14366.4 (SE +/- 120.76, N = 3)
  d: 15403.4 (SE +/- 38.80, N = 3)
LiteRT 2024-10-15, Model: NASNet Mobile (microseconds, fewer is better)
  a: 10524.1 (SE +/- 42.72, N = 3)
  b: 10303.4 (SE +/- 51.10, N = 3)
  c: 10342.5 (SE +/- 86.24, N = 3)
  d: 12903.4 (SE +/- 102.57, N = 3)
LiteRT 2024-10-15, Model: DeepLab V3 (microseconds, fewer is better)
  a: 1993.85 (SE +/- 10.75, N = 3)
  b: 1916.71 (SE +/- 19.56, N = 3)
  c: 1971.80 (SE +/- 10.88, N = 3)
  d: 2250.63 (SE +/- 30.70, N = 3)
LiteRT 2024-10-15, Model: Mobilenet Float (microseconds, fewer is better)
  a: 1002.97 (SE +/- 1.06, N = 3)
  b: 1004.00 (SE +/- 3.97, N = 3)
  c: 1008.93 (SE +/- 3.98, N = 3)
  d: 1060.57 (SE +/- 3.94, N = 3)
LiteRT 2024-10-15, Model: Mobilenet Quant (microseconds, fewer is better)
  a: 593.87 (SE +/- 2.40, N = 3)
  b: 599.29 (SE +/- 0.53, N = 3)
  c: 585.76 (SE +/- 6.33, N = 3)
  d: 738.07 (SE +/- 10.36, N = 3)
oneDNN 3.6, Harness: Deconvolution Batch shapes_1d, Engine: CPU (ms, fewer is better)
  a: 1.82100 (SE +/- 0.01078, N = 3; MIN: 1.4)
  b: 1.81784 (SE +/- 0.00686, N = 3; MIN: 1.4)
  c: 1.81460 (SE +/- 0.00929, N = 3; MIN: 1.38)
  d: 1.86755 (SE +/- 0.01009, N = 3; MIN: 1.39)
  Note: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
oneDNN 3.6, Harness: IP Shapes 1D, Engine: CPU (ms, fewer is better)
  a: 0.636587 (SE +/- 0.002186, N = 3; MIN: 0.6)
  b: 0.637647 (SE +/- 0.003981, N = 3; MIN: 0.6)
  c: 0.638661 (SE +/- 0.003543, N = 3; MIN: 0.59)
  d: 0.669839 (SE +/- 0.004097, N = 3; MIN: 0.59)
  Note: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
oneDNN 3.6, Harness: IP Shapes 3D, Engine: CPU (ms, fewer is better)
  a: 3.17092 (SE +/- 0.01148, N = 3; MIN: 3.01)
  b: 3.18695 (SE +/- 0.01628, N = 3; MIN: 3.01)
  c: 3.17892 (SE +/- 0.01639, N = 3; MIN: 3.01)
  d: 3.35833 (SE +/- 0.01850, N = 3; MIN: 3.02)
  Note: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
oneDNN 3.6, Harness: Convolution Batch Shapes Auto, Engine: CPU (ms, fewer is better)
  a: 5.24008 (SE +/- 0.01241, N = 3; MIN: 4.93)
  b: 5.23454 (SE +/- 0.01667, N = 3; MIN: 4.93)
  c: 5.24225 (SE +/- 0.01347, N = 3; MIN: 4.94)
  d: 5.45077 (SE +/- 0.00366, N = 3; MIN: 4.96)
  Note: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
oneDNN 3.6, Harness: Deconvolution Batch shapes_3d, Engine: CPU (ms, fewer is better)
  a: 1.40825 (SE +/- 0.00511, N = 3; MIN: 1.33)
  b: 1.41365 (SE +/- 0.00648, N = 3; MIN: 1.33)
  c: 1.40659 (SE +/- 0.00405, N = 3; MIN: 1.33)
  d: 1.44600 (SE +/- 0.00140, N = 3; MIN: 1.33)
  Note: (CXX) g++ options: -O3 -march=native -fopenmp -msse4.1 -fPIC -fcf-protection=full -pie -ldl
Phoronix Test Suite v10.8.5