rt gh200

ARMv8 Neoverse-V2 testing with a Pegatron JIMBO P4352 (00022432 BIOS) and NVIDIA GH200 144G HBM3e 143GB on Ubuntu 24.04 via the Phoronix Test Suite.

HTML result view exported from: https://openbenchmarking.org/result/2410163-NE-RTGH2003260&sro&grw.

System Details (identical across runs a, b, c, and wd):

Processor: ARMv8 Neoverse-V2 @ 3.47GHz (72 Cores)
Motherboard: Pegatron JIMBO P4352 (00022432 BIOS)
Memory: 1 x 480GB LPDDR5-6400MT/s NVIDIA 699-2G530-0236-RC1
Disk: 1000GB CT1000T700SSD3
Graphics: NVIDIA GH200 144G HBM3e 143GB
Network: 2 x Intel X550
OS: Ubuntu 24.04
Kernel: 6.8.0-45-generic-64k (aarch64)
Display Driver: NVIDIA
OpenCL: OpenCL 3.0 CUDA 12.6.65
Compiler: GCC 13.2.0
File-System: ext4
Screen Resolution: 1920x1200

Kernel Details: Transparent Huge Pages: madvise

Compiler Details: --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-backtrace --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-offload-defaulted --enable-offload-targets=nvptx-none=/build/gcc-13-dIwDw0/gcc-13-13.2.0/debian/tmp-nvptx/usr --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto --without-cuda-driver -v

Processor Details: Scaling Governor: cppc_cpufreq ondemand (Boost: Disabled)

Security Details: gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + reg_file_data_sampling: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Not affected + srbds: Not affected + tsx_async_abort: Not affected

Results Summary (fewer is better for all tests; runs a, b, c, wd):

Test                                                 |        a |        b |        c |       wd
XNNPACK FP32MobileNetV1 (us)                         |      639 |      641 |      633 |      639
LiteRT Inception ResNet V2 (us)                      |  9751.92 |  9364.56 |  9731.31 |  9626.92
LiteRT Quantized COCO SSD MobileNet v1 (us)          |  1509.54 |  1493.54 |  1506.16 |  1508.69
LiteRT Mobilenet Float (us)                          |   582.53 |   586.66 |   584.00 |   585.48
LiteRT Inception V4 (us)                             |  8385.10 |  8417.11 |  8472.32 |  8378.40
LiteRT NASNet Mobile (us)                            |  12144.3 |  11628.0 |  11446.3 |  12010.2
LiteRT DeepLab V3 (us)                               |  1554.91 |  1560.02 |  1587.52 |  1584.85
LiteRT SqueezeNet (us)                               |   974.49 |   976.67 |   971.85 |   980.63
XNNPACK FP32MobileNetV2 (us)                         |      945 |      937 |      934 |      946
XNNPACK FP32MobileNetV3Large (us)                    |     1513 |     1495 |     1493 |     1503
XNNPACK FP32MobileNetV3Small (us)                    |     1042 |     1029 |     1045 |     1049
XNNPACK FP16MobileNetV1 (us)                         |      470 |      464 |      470 |      470
XNNPACK FP16MobileNetV2 (us)                         |      849 |      836 |      853 |      853
XNNPACK FP16MobileNetV3Large (us)                    |     1326 |     1316 |     1309 |     1325
XNNPACK FP16MobileNetV3Small (us)                    |     1005 |      993 |      985 |     1004
LiteRT Mobilenet Quant (us)                          |   885.31 |   878.92 |   892.21 |   896.03
XNNPACK QS8MobileNetV2 (us)                          |      949 |      943 |      933 |      945
oneDNN IP Shapes 1D - CPU (ms)                       |  37.2072 |  36.4477 |  36.8695 |  36.8375
oneDNN IP Shapes 3D - CPU (ms)                       |  21.7549 |  21.4312 |  21.8150 |  21.9376
oneDNN Convolution Batch Shapes Auto - CPU (ms)      |  33.6255 |  33.2663 |  33.4928 |  33.6383
oneDNN Deconvolution Batch shapes_1d - CPU (ms)      |  64.0426 |  64.7940 |  63.0108 |  63.4895
oneDNN Deconvolution Batch shapes_3d - CPU (ms)      |  64.9043 |  65.3973 |  64.9016 |  64.3782
oneDNN Recurrent Neural Network Training - CPU (ms)  |  33136.4 |  33108.0 |  32626.6 |  33137.9
oneDNN Recurrent Neural Network Inference - CPU (ms) |  12605.2 |  12820.2 |  12756.3 |  12676.2

XNNPACK

Model: FP32MobileNetV1

XNNPACK b7b048 - us, fewer is better
a:  639 (SE +/- 6.08, N = 3)
b:  641 (SE +/- 3.61, N = 3)
c:  633 (SE +/- 5.00, N = 3)
wd: 639 (SE +/- 2.03, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

LiteRT

Model: Inception ResNet V2

LiteRT 2024-10-15 - Microseconds, fewer is better
a:  9751.92 (SE +/- 135.40, N = 3)
b:  9364.56 (SE +/- 70.78, N = 3)
c:  9731.31 (SE +/- 40.36, N = 3)
wd: 9626.92 (SE +/- 112.95, N = 3)

LiteRT

Model: Quantized COCO SSD MobileNet v1

LiteRT 2024-10-15 - Microseconds, fewer is better
a:  1509.54 (SE +/- 5.30, N = 3)
b:  1493.54 (SE +/- 8.13, N = 3)
c:  1506.16 (SE +/- 6.56, N = 3)
wd: 1508.69 (SE +/- 7.46, N = 3)

LiteRT

Model: Mobilenet Float

LiteRT 2024-10-15 - Microseconds, fewer is better
a:  582.53 (SE +/- 4.90, N = 3)
b:  586.66 (SE +/- 5.14, N = 3)
c:  584.00 (SE +/- 5.75, N = 3)
wd: 585.48 (SE +/- 5.73, N = 3)

LiteRT

Model: Inception V4

LiteRT 2024-10-15 - Microseconds, fewer is better
a:  8385.10 (SE +/- 22.61, N = 3)
b:  8417.11 (SE +/- 13.66, N = 3)
c:  8472.32 (SE +/- 45.37, N = 3)
wd: 8378.40 (SE +/- 4.64, N = 3)

LiteRT

Model: NASNet Mobile

LiteRT 2024-10-15 - Microseconds, fewer is better
a:  12144.3 (SE +/- 103.46, N = 15)
b:  11628.0 (SE +/- 107.12, N = 15)
c:  11446.3 (SE +/- 99.01, N = 15)
wd: 12010.2 (SE +/- 111.66, N = 15)

LiteRT

Model: DeepLab V3

LiteRT 2024-10-15 - Microseconds, fewer is better
a:  1554.91 (SE +/- 19.05, N = 4)
b:  1560.02 (SE +/- 14.51, N = 15)
c:  1587.52 (SE +/- 25.75, N = 15)
wd: 1584.85 (SE +/- 24.49, N = 15)

LiteRT

Model: SqueezeNet

LiteRT 2024-10-15 - Microseconds, fewer is better
a:  974.49 (SE +/- 4.48, N = 3)
b:  976.67 (SE +/- 1.93, N = 3)
c:  971.85 (SE +/- 6.75, N = 3)
wd: 980.63 (SE +/- 7.24, N = 3)

XNNPACK

Model: FP32MobileNetV2

XNNPACK b7b048 - us, fewer is better
a:  945 (SE +/- 12.91, N = 3)
b:  937 (SE +/- 4.37, N = 3)
c:  934 (SE +/- 1.20, N = 3)
wd: 946 (SE +/- 12.44, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP32MobileNetV3Large

XNNPACK b7b048 - us, fewer is better
a:  1513 (SE +/- 21.95, N = 3)
b:  1495 (SE +/- 2.52, N = 3)
c:  1493 (SE +/- 2.67, N = 3)
wd: 1503 (SE +/- 14.05, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP32MobileNetV3Small

XNNPACK b7b048 - us, fewer is better
a:  1042 (SE +/- 14.43, N = 3)
b:  1029 (SE +/- 7.09, N = 3)
c:  1045 (SE +/- 12.99, N = 3)
wd: 1049 (SE +/- 9.24, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP16MobileNetV1

XNNPACK b7b048 - us, fewer is better
a:  470 (SE +/- 4.33, N = 3)
b:  464 (SE +/- 2.52, N = 3)
c:  470 (SE +/- 6.36, N = 3)
wd: 470 (SE +/- 3.38, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP16MobileNetV2

XNNPACK b7b048 - us, fewer is better
a:  849 (SE +/- 10.58, N = 3)
b:  836 (SE +/- 6.11, N = 3)
c:  853 (SE +/- 11.67, N = 3)
wd: 853 (SE +/- 7.26, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP16MobileNetV3Large

XNNPACK b7b048 - us, fewer is better
a:  1326 (SE +/- 23.90, N = 3)
b:  1316 (SE +/- 7.75, N = 3)
c:  1309 (SE +/- 5.21, N = 3)
wd: 1325 (SE +/- 16.92, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

XNNPACK

Model: FP16MobileNetV3Small

XNNPACK b7b048 - us, fewer is better
a:  1005 (SE +/- 14.26, N = 3)
b:  993 (SE +/- 5.90, N = 3)
c:  985 (SE +/- 5.03, N = 3)
wd: 1004 (SE +/- 13.35, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

LiteRT

Model: Mobilenet Quant

LiteRT 2024-10-15 - Microseconds, fewer is better
a:  885.31 (SE +/- 5.58, N = 3)
b:  878.92 (SE +/- 12.25, N = 3)
c:  892.21 (SE +/- 7.20, N = 9)
wd: 896.03 (SE +/- 1.36, N = 3)

XNNPACK

Model: QS8MobileNetV2

XNNPACK b7b048 - us, fewer is better
a:  949 (SE +/- 14.57, N = 3)
b:  943 (SE +/- 9.50, N = 2)
c:  933 (SE +/- 3.33, N = 3)
wd: 945 (SE +/- 12.57, N = 3)
1. (CXX) g++ options: -O3 -lrt -lm

oneDNN

Harness: IP Shapes 1D - Engine: CPU

oneDNN 3.6 - ms, fewer is better
a:  37.21 (SE +/- 0.46, N = 15, MIN: 8.56)
b:  36.45 (SE +/- 0.19, N = 3, MIN: 6.72)
c:  36.87 (SE +/- 0.25, N = 13, MIN: 7.15)
wd: 36.84 (SE +/- 0.34, N = 7, MIN: 9.15)
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: IP Shapes 3D - Engine: CPU

oneDNN 3.6 - ms, fewer is better
a:  21.75 (SE +/- 0.20, N = 3, MIN: 6.42)
b:  21.43 (SE +/- 0.17, N = 3, MIN: 2.88)
c:  21.82 (SE +/- 0.27, N = 15, MIN: 2.84)
wd: 21.94 (SE +/- 0.24, N = 15, MIN: 2.41)
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Convolution Batch Shapes Auto - Engine: CPU

oneDNN 3.6 - ms, fewer is better
a:  33.63 (SE +/- 0.21, N = 3, MIN: 13.44)
b:  33.27 (SE +/- 0.34, N = 3, MIN: 10.06)
c:  33.49 (SE +/- 0.42, N = 3, MIN: 11.93)
wd: 33.64 (SE +/- 0.26, N = 3, MIN: 13.01)
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Deconvolution Batch shapes_1d - Engine: CPU

oneDNN 3.6 - ms, fewer is better
a:  64.04 (SE +/- 0.29, N = 3, MIN: 26.96)
b:  64.79 (SE +/- 0.68, N = 5, MIN: 27.29)
c:  63.01 (SE +/- 0.18, N = 3, MIN: 26.95)
wd: 63.49 (SE +/- 0.64, N = 3, MIN: 27.37)
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Deconvolution Batch shapes_3d - Engine: CPU

oneDNN 3.6 - ms, fewer is better
a:  64.90 (SE +/- 0.18, N = 3, MIN: 52.66)
b:  65.40 (SE +/- 0.38, N = 3, MIN: 56.8)
c:  64.90 (SE +/- 0.18, N = 3, MIN: 51.95)
wd: 64.38 (SE +/- 0.63, N = 3, MIN: 10.33)
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Recurrent Neural Network Training - Engine: CPU

oneDNN 3.6 - ms, fewer is better
a:  33136.4 (SE +/- 285.08, N = 3, MIN: 30879.8)
b:  33108.0 (SE +/- 408.18, N = 3, MIN: 30321.8)
c:  32626.6 (SE +/- 383.55, N = 12, MIN: 25969.2)
wd: 33137.9 (SE +/- 203.73, N = 3, MIN: 30535)
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl

oneDNN

Harness: Recurrent Neural Network Inference - Engine: CPU

oneDNN 3.6 - ms, fewer is better
a:  12605.2 (SE +/- 124.52, N = 15, MIN: 5050.89)
b:  12820.2 (SE +/- 139.24, N = 3, MIN: 6727.36)
c:  12756.3 (SE +/- 151.52, N = 12, MIN: 5719.24)
wd: 12676.2 (SE +/- 175.20, N = 15, MIN: 5982.7)
1. (CXX) g++ options: -O3 -march=native -fopenmp -mcpu=generic -fPIC -pie -ldl


Phoronix Test Suite v10.8.5