rt gh200

ARMv8 Neoverse-V2 testing with a Pegatron JIMBO P4352 (00022432 BIOS) and NVIDIA GH200 144G HBM3e 143GB on Ubuntu 24.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2410163-NE-RTGH2003260&sor.
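The comparison below can in principle be reproduced locally with the Phoronix Test Suite. A minimal sketch, assuming a working `phoronix-test-suite` install; passing the OpenBenchmarking.org result ID re-runs the same test profiles and compares your numbers against the published run:

```shell
# Re-run this result set and compare against the published data.
# The result ID comes from the export URL above.
phoronix-test-suite benchmark 2410163-NE-RTGH2003260
```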

System details (identical for configs a, b, c, wd):

  Processor:          ARMv8 Neoverse-V2 @ 3.47GHz (72 Cores)
  Motherboard:        Pegatron JIMBO P4352 (00022432 BIOS)
  Memory:             1 x 480GB LPDDR5-6400MT/s NVIDIA 699-2G530-0236-RC1
  Disk:               1000GB CT1000T700SSD3
  Graphics:           NVIDIA GH200 144G HBM3e 143GB
  Network:            2 x Intel X550
  OS:                 Ubuntu 24.04
  Kernel:             6.8.0-45-generic-64k (aarch64)
  Display Driver:     NVIDIA
  OpenCL:             OpenCL 3.0 CUDA 12.6.65
  Compiler:           GCC 13.2.0
  File-System:        ext4
  Screen Resolution:  1920x1200

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-dIwDw0/gcc-13-13.2.0/debian/tmp-nvptx/usr --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto --without-cuda-driver -v

Processor Details: Scaling Governor: cppc_cpufreq ondemand (Boost: Disabled)

Security Details:
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  reg_file_data_sampling: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of __user pointer sanitization
  spectre_v2: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected

Results overview (a / b / c / wd):

oneDNN (ms, fewer is better)
  IP Shapes 1D - CPU:                        37.2072 / 36.4477 / 36.8695 / 36.8375
  IP Shapes 3D - CPU:                        21.7549 / 21.4312 / 21.8150 / 21.9376
  Convolution Batch Shapes Auto - CPU:       33.6255 / 33.2663 / 33.4928 / 33.6383
  Deconvolution Batch shapes_1d - CPU:       64.0426 / 64.7940 / 63.0108 / 63.4895
  Deconvolution Batch shapes_3d - CPU:       64.9043 / 65.3973 / 64.9016 / 64.3782
  Recurrent Neural Network Training - CPU:   33136.4 / 33108.0 / 32626.6 / 33137.9
  Recurrent Neural Network Inference - CPU:  12605.2 / 12820.2 / 12756.3 / 12676.2

LiteRT (microseconds, fewer is better)
  DeepLab V3:                        1554.91 / 1560.02 / 1587.52 / 1584.85
  SqueezeNet:                        974.490 / 976.666 / 971.853 / 980.625
  Inception V4:                      8385.10 / 8417.11 / 8472.32 / 8378.40
  NASNet Mobile:                     12144.3 / 11628.0 / 11446.3 / 12010.2
  Mobilenet Float:                   582.531 / 586.657 / 583.996 / 585.475
  Mobilenet Quant:                   885.312 / 878.918 / 892.209 / 896.029
  Inception ResNet V2:               9751.92 / 9364.56 / 9731.31 / 9626.92
  Quantized COCO SSD MobileNet v1:   1509.54 / 1493.54 / 1506.16 / 1508.69

XNNPACK (us, fewer is better)
  FP32MobileNetV1:        639 / 641 / 633 / 639
  FP32MobileNetV2:        945 / 937 / 934 / 946
  FP32MobileNetV3Large:   1513 / 1495 / 1493 / 1503
  FP32MobileNetV3Small:   1042 / 1029 / 1045 / 1049
  FP16MobileNetV1:        470 / 464 / 470 / 470
  FP16MobileNetV2:        849 / 836 / 853 / 853
  FP16MobileNetV3Large:   1326 / 1316 / 1309 / 1325
  FP16MobileNetV3Small:   1005 / 993 / 985 / 1004
  QS8MobileNetV2:         949 / 943 / 933 / 945
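The four configurations land close together on most tests; a quick way to quantify that is the percent gap between the slowest and fastest result for a given test. A minimal sketch (the helper name `spread_pct` is my own; the values are the Recurrent Neural Network Inference row from the overview):

```python
def spread_pct(values):
    """Percent gap between the slowest and fastest result.
    All tests in this report are lower-is-better (ms or us)."""
    return (max(values) - min(values)) / min(values) * 100

# Recurrent Neural Network Inference (ms), configs a / b / c / wd.
rnn_inference = [12605.2, 12820.2, 12756.3, 12676.2]
print(f"{spread_pct(rnn_inference):.1f}%")  # prints 1.7%
```

A spread under ~2% across runs of the same hardware is within typical run-to-run noise for this kind of CPU benchmark.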

oneDNN

Harness: IP Shapes 1D - Engine: CPU

oneDNN 3.6 (ms, fewer is better):
  b:  36.45   SE +/- 0.19, N = 3    MIN: 6.72
  wd: 36.84   SE +/- 0.34, N = 7    MIN: 9.15
  c:  36.87   SE +/- 0.25, N = 13   MIN: 7.15
  a:  37.21   SE +/- 0.46, N = 15   MIN: 8.56
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl
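Each result above is a mean over N runs with a standard error. The per-run samples are not included in this export, but the SE figures follow the usual definition (sample standard deviation over the square root of N). A sketch with hypothetical run times:

```python
import math

def standard_error(samples):
    """Standard error of the mean: sample stddev / sqrt(N),
    the same quantity reported as 'SE +/-' next to each result."""
    n = len(samples)
    mean = sum(samples) / n
    # Sample variance with Bessel's correction (N - 1 denominator).
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return math.sqrt(var) / math.sqrt(n)

# Hypothetical per-run times (ms); the real samples behind the
# published averages are not part of this export.
runs = [36.3, 36.5, 36.6]
print(round(sum(runs) / len(runs), 2), round(standard_error(runs), 2))
```

Note that tests with larger N (up to 15 here) were automatically re-run by the Phoronix Test Suite until the result variance fell within its threshold.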

oneDNN

Harness: IP Shapes 3D - Engine: CPU

oneDNN 3.6 (ms, fewer is better):
  b:  21.43   SE +/- 0.17, N = 3    MIN: 2.88
  a:  21.75   SE +/- 0.20, N = 3    MIN: 6.42
  c:  21.82   SE +/- 0.27, N = 15   MIN: 2.84
  wd: 21.94   SE +/- 0.24, N = 15   MIN: 2.41
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Engine: CPU

oneDNN 3.6 (ms, fewer is better):
  b:  33.27   SE +/- 0.34, N = 3   MIN: 10.06
  c:  33.49   SE +/- 0.42, N = 3   MIN: 11.93
  a:  33.63   SE +/- 0.21, N = 3   MIN: 13.44
  wd: 33.64   SE +/- 0.26, N = 3   MIN: 13.01
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Engine: CPU

oneDNN 3.6 (ms, fewer is better):
  c:  63.01   SE +/- 0.18, N = 3   MIN: 26.95
  wd: 63.49   SE +/- 0.64, N = 3   MIN: 27.37
  a:  64.04   SE +/- 0.29, N = 3   MIN: 26.96
  b:  64.79   SE +/- 0.68, N = 5   MIN: 27.29
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Engine: CPU

oneDNN 3.6 (ms, fewer is better):
  wd: 64.38   SE +/- 0.63, N = 3   MIN: 10.33
  c:  64.90   SE +/- 0.18, N = 3   MIN: 51.95
  a:  64.90   SE +/- 0.18, N = 3   MIN: 52.66
  b:  65.40   SE +/- 0.38, N = 3   MIN: 56.8
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Recurrent Neural Network Training - Engine: CPU

oneDNN 3.6 (ms, fewer is better):
  c:  32626.6   SE +/- 383.55, N = 12   MIN: 25969.2
  b:  33108.0   SE +/- 408.18, N = 3    MIN: 30321.8
  a:  33136.4   SE +/- 285.08, N = 3    MIN: 30879.8
  wd: 33137.9   SE +/- 203.73, N = 3    MIN: 30535
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Engine: CPU

oneDNN 3.6 (ms, fewer is better):
  a:  12605.2   SE +/- 124.52, N = 15   MIN: 5050.89
  wd: 12676.2   SE +/- 175.20, N = 15   MIN: 5982.7
  c:  12756.3   SE +/- 151.52, N = 12   MIN: 5719.24
  b:  12820.2   SE +/- 139.24, N = 3    MIN: 6727.36
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

LiteRT

Model: DeepLab V3

LiteRT 2024-10-15 (microseconds, fewer is better):
  a:  1554.91   SE +/- 19.05, N = 4
  b:  1560.02   SE +/- 14.51, N = 15
  wd: 1584.85   SE +/- 24.49, N = 15
  c:  1587.52   SE +/- 25.75, N = 15

LiteRT

Model: SqueezeNet

LiteRT 2024-10-15 (microseconds, fewer is better):
  c:  971.85   SE +/- 6.75, N = 3
  a:  974.49   SE +/- 4.48, N = 3
  b:  976.67   SE +/- 1.93, N = 3
  wd: 980.63   SE +/- 7.24, N = 3

LiteRT

Model: Inception V4

LiteRT 2024-10-15 (microseconds, fewer is better):
  wd: 8378.40   SE +/- 4.64, N = 3
  a:  8385.10   SE +/- 22.61, N = 3
  b:  8417.11   SE +/- 13.66, N = 3
  c:  8472.32   SE +/- 45.37, N = 3

LiteRT

Model: NASNet Mobile

LiteRT 2024-10-15 (microseconds, fewer is better):
  c:  11446.3   SE +/- 99.01, N = 15
  b:  11628.0   SE +/- 107.12, N = 15
  wd: 12010.2   SE +/- 111.66, N = 15
  a:  12144.3   SE +/- 103.46, N = 15

LiteRT

Model: Mobilenet Float

LiteRT 2024-10-15 (microseconds, fewer is better):
  a:  582.53   SE +/- 4.90, N = 3
  c:  584.00   SE +/- 5.75, N = 3
  wd: 585.48   SE +/- 5.73, N = 3
  b:  586.66   SE +/- 5.14, N = 3

LiteRT

Model: Mobilenet Quant

LiteRT 2024-10-15 (microseconds, fewer is better):
  b:  878.92   SE +/- 12.25, N = 3
  a:  885.31   SE +/- 5.58, N = 3
  c:  892.21   SE +/- 7.20, N = 9
  wd: 896.03   SE +/- 1.36, N = 3

LiteRT

Model: Inception ResNet V2

LiteRT 2024-10-15 (microseconds, fewer is better):
  b:  9364.56   SE +/- 70.78, N = 3
  wd: 9626.92   SE +/- 112.95, N = 3
  c:  9731.31   SE +/- 40.36, N = 3
  a:  9751.92   SE +/- 135.40, N = 3

LiteRT

Model: Quantized COCO SSD MobileNet v1

LiteRT 2024-10-15 (microseconds, fewer is better):
  b:  1493.54   SE +/- 8.13, N = 3
  c:  1506.16   SE +/- 6.56, N = 3
  wd: 1508.69   SE +/- 7.46, N = 3
  a:  1509.54   SE +/- 5.30, N = 3

XNNPACK

Model: FP32MobileNetV1

XNNPACK b7b048 (us, fewer is better):
  c:  633   SE +/- 5.00, N = 3
  a:  639   SE +/- 6.08, N = 3
  wd: 639   SE +/- 2.03, N = 3
  b:  641   SE +/- 3.61, N = 3
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP32MobileNetV2

XNNPACK b7b048 (us, fewer is better):
  c:  934   SE +/- 1.20, N = 3
  b:  937   SE +/- 4.37, N = 3
  a:  945   SE +/- 12.91, N = 3
  wd: 946   SE +/- 12.44, N = 3
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP32MobileNetV3Large

XNNPACK b7b048 (us, fewer is better):
  c:  1493   SE +/- 2.67, N = 3
  b:  1495   SE +/- 2.52, N = 3
  wd: 1503   SE +/- 14.05, N = 3
  a:  1513   SE +/- 21.95, N = 3
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP32MobileNetV3Small

XNNPACK b7b048 (us, fewer is better):
  b:  1029   SE +/- 7.09, N = 3
  a:  1042   SE +/- 14.43, N = 3
  c:  1045   SE +/- 12.99, N = 3
  wd: 1049   SE +/- 9.24, N = 3
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP16MobileNetV1

XNNPACK b7b048 (us, fewer is better):
  b:  464   SE +/- 2.52, N = 3
  a:  470   SE +/- 4.33, N = 3
  c:  470   SE +/- 6.36, N = 3
  wd: 470   SE +/- 3.38, N = 3
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP16MobileNetV2

XNNPACK b7b048 (us, fewer is better):
  b:  836   SE +/- 6.11, N = 3
  a:  849   SE +/- 10.58, N = 3
  c:  853   SE +/- 11.67, N = 3
  wd: 853   SE +/- 7.26, N = 3
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP16MobileNetV3Large

XNNPACK b7b048 (us, fewer is better):
  c:  1309   SE +/- 5.21, N = 3
  b:  1316   SE +/- 7.75, N = 3
  wd: 1325   SE +/- 16.92, N = 3
  a:  1326   SE +/- 23.90, N = 3
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP16MobileNetV3Small

XNNPACK b7b048 (us, fewer is better):
  c:  985    SE +/- 5.03, N = 3
  b:  993    SE +/- 5.90, N = 3
  wd: 1004   SE +/- 13.35, N = 3
  a:  1005   SE +/- 14.26, N = 3
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: QS8MobileNetV2

XNNPACK b7b048 (us, fewer is better):
  c:  933   SE +/- 3.33, N = 3
  b:  943   SE +/- 9.50, N = 2
  wd: 945   SE +/- 12.57, N = 3
  a:  949   SE +/- 14.57, N = 3
1. (CXX) g++ options: -O3 -lrt -lm


Phoronix Test Suite v10.8.5