gh200: ARMv8 Neoverse-V2 testing with a Pegatron JIMBO P4352 (00022432 BIOS) and NVIDIA GH200 144G HBM3e 143GB on Ubuntu 24.04 via the Phoronix Test Suite.

Test configuration "a":
  Processor: ARMv8 Neoverse-V2 @ 3.47GHz (72 Cores)
  Motherboard: Pegatron JIMBO P4352 (00022432 BIOS)
  Memory: 1 x 480GB LPDDR5-6400MT/s NVIDIA 699-2G530-0236-RC1
  Disk: 1000GB CT1000T700SSD3
  Graphics: NVIDIA GH200 144G HBM3e 143GB
  Network: 2 x Intel X550
  OS: Ubuntu 24.04
  Kernel: 6.8.0-45-generic-64k (aarch64)
  Display Driver: NVIDIA
  OpenCL: OpenCL 3.0 CUDA 12.6.65
  Compiler: GCC 13.2.0
  File-System: ext4
  Screen Resolution: 1920x1200

All results below are from this single system configuration ("a").

x265 - Frames Per Second > Higher Is Better
  Video Input: Bosphorus 4K:    8.81
  Video Input: Bosphorus 1080p: 12.61

simdjson 3.10 - Throughput (GB/s) > Higher Is Better
  Kostya:         3.11
  TopTweet:       4.14
  LargeRandom:    1.15
  PartialTweets:  4.06
  DistinctUserID: 4.16

ONNX Runtime 1.19 - Device: CPU - Inferences Per Second > Higher Is Better
  GPT-2 - Parallel:                      (no result)
  GPT-2 - Standard:                      (no result)
  yolov4 - Parallel:                     5.80711
  yolov4 - Standard:                     5.14685
  ZFNet-512 - Parallel:                  45.23
  ZFNet-512 - Standard:                  210.12
  T5 Encoder - Parallel:                 109.24
  T5 Encoder - Standard:                 390.07
  bertsquad-12 - Parallel:               (no result)
  bertsquad-12 - Standard:               (no result)
  CaffeNet 12-int8 - Parallel:           321.24
  CaffeNet 12-int8 - Standard:           1262.14
  fcn-resnet101-11 - Parallel:           0.582902
  fcn-resnet101-11 - Standard:           0.462943
  ArcFace ResNet-100 - Parallel:         (no result)
  ArcFace ResNet-100 - Standard:         (no result)
  ResNet50 v1-12-int8 - Parallel:        161.33
  ResNet50 v1-12-int8 - Standard:        317.87
  super-resolution-10 - Parallel:        18.65
  super-resolution-10 - Standard:        159.07
  ResNet101_DUC_HDC-12 - Parallel:       0.360196
  ResNet101_DUC_HDC-12 - Standard:       0.303222
  Faster R-CNN R-50-FPN-int8 - Parallel: (no result)
  Faster R-CNN R-50-FPN-int8 - Standard: (no result)

GraphicsMagick 1.3.43 - Iterations Per Minute > Higher Is Better
  Swirl:           657
  Rotate:          331
  Sharpen:         411
  Enhanced:        359
  Resizing:        442
  Noise-Gaussian:  301
  HWB Color Space: 656

GraphicsMagick (version not reported) - Iterations Per Minute > Higher Is Better
  Swirl:           605
  Rotate:          209
  Sharpen:         171
  Enhanced:        351
  Resizing:        282
  Noise-Gaussian:  217
  HWB Color Space: 430

BYTE Unix Benchmark 5.1.3-git > Higher Is Better
  Pipe (LPS):               202565282.2
  Dhrystone 2 (LPS):        4998587529.8
  System Call (LPS):        145868649.3
  Whetstone Double (MWIPS): 721978.0

7-Zip Compression 24.05 - MIPS > Higher Is Better
  Compression Rating:   384775
  Decompression Rating: 420524

7-Zip Compression (version not reported) - MIPS > Higher Is Better
  Compression Rating:   393523
  Decompression Rating: 418819

Etcpak 2.0 - Mpx/s > Higher Is Better
  Multi-Threaded - Configuration: ETC2: 471.19

LeelaChessZero 0.31.1 - Nodes Per Second > Higher Is Better
  Backend: BLAS:  (no result)
  Backend: Eigen: 360

Stockfish 17 Chess Benchmark - Nodes Per Second > Higher Is Better
  168428763

Stockfish (version not reported) Chess Benchmark - Nodes Per Second > Higher Is Better
  58496753

GROMACS 2024 - Input: water_GMX50_bare - Ns Per Day > Higher Is Better
  Implementation: MPI CPU:         6.001
  Implementation: NVIDIA CUDA GPU: (no result)

GROMACS (version not reported) - Input: water_GMX50_bare - Ns Per Day > Higher Is Better
  7.156

Apache Cassandra 5.0 - Test: Writes - Op/s > Higher Is Better
  (no result)

PostgreSQL 17 - TPS > Higher Is Better
  No results were reported for any tested combination of Scaling Factor (1, 100, 1000), Clients (500, 800, 1000), and Mode (Read Only, Read Write).

ONNX Runtime 1.19 - Device: CPU - Inference Time Cost (ms) < Lower Is Better
  yolov4 - Parallel:               172.23
  yolov4 - Standard:               194.36
  ZFNet-512 - Parallel:            22.15
  ZFNet-512 - Standard:            4.75865
  T5 Encoder - Parallel:           9.15193
  T5 Encoder - Standard:           2.56003
  CaffeNet 12-int8 - Parallel:     3.11158
  CaffeNet 12-int8 - Standard:     0.791624
  fcn-resnet101-11 - Parallel:     1715.64
  fcn-resnet101-11 - Standard:     2160.11
  ResNet50 v1-12-int8 - Parallel:  6.19726
  ResNet50 v1-12-int8 - Standard:  3.14523
  super-resolution-10 - Parallel:  53.63
  super-resolution-10 - Standard:  6.28553
  ResNet101_DUC_HDC-12 - Parallel: 2776.33
  ResNet101_DUC_HDC-12 - Standard: 3298.19

PyPerformance 1.11 - Milliseconds < Lower Is Better
  go:                 98.2
  chaos:              47.4
  float:              56.8
  nbody:              64.5
  pathlib:            15.5
  raytrace:           217
  xml_etree:          45.8
  gc_collect:         1.08
  json_loads:         17.5
  crypto_pyaes:       54.8
  async_tree_io:      748
  regex_compile:      82.3
  python_startup:     18.7
  asyncio_tcp_ssl:    1.49
  django_template:    26.3
  asyncio_websockets: 510
  pickle_pure_python: 205

Mobile Neural Network 2.9.b11b7037d - ms < Lower Is Better
  nasnet:           5.008
  mobilenetV3:      1.134
  squeezenetv1.1:   1.824
  resnet-v2-50:     11.34
  SqueezeNetV1.0:   3.396
  MobileNetV2_224:  1.502
  mobilenet-v1-1.0: 1.793
  inception-v3:     13.69

Epoch 4.19.4 - Epoch3D Deck: Cone - Seconds < Lower Is Better
  188.20

WarpX 24.10 - Seconds < Lower Is Better
  Input: Uniform Plasma:      16.90
  Input: Plasma Acceleration: 20.38

Timed Linux Kernel Compilation 6.8 - Seconds < Lower Is Better
  Build: defconfig:    66.71
  Build: allmodconfig: 285.13

Timed LLVM Compilation 16.0 - Seconds < Lower Is Better
  Build System: Ninja:          175.03
  Build System: Unix Makefiles: 276.93

Build2 0.17 - Time To Compile - Seconds < Lower Is Better
  84.79

C-Ray 2.0 - Rays Per Pixel: 16 - Seconds < Lower Is Better
  Resolution: 4K:    20.36
  Resolution: 5K:    36.21
  Resolution: 1080p: 5.195

POV-Ray - Trace Time - Seconds < Lower Is Better
  7.786

Blender 4.0.2 - Compute: CPU-Only - Seconds < Lower Is Better
  BMW27:              38.06
  Classroom:          78.37
  Fishy Cat:          73.02
  Barbershop:         381.45
  Pabellon Barcelona: 154.46

XNNPACK 2cd86b - us < Lower Is Better
  FP32MobileNetV2:      967
  FP32MobileNetV3Large: 1426
  FP32MobileNetV3Small: 945
  FP16MobileNetV2:      840
  FP16MobileNetV3Large: 1226
  FP16MobileNetV3Small: 881
  QU8MobileNetV2:       945
  QU8MobileNetV3Large:  1484
  QU8MobileNetV3Small:  1083
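Reports in this format come from the Phoronix Test Suite. As a minimal sketch of how one of the results above could be regenerated and re-exported as text (the apt package name and the "pts/x265" profile identifier are assumptions for illustration; only the suite itself and the "gh200" result name appear in the report):

```shell
# Install the Phoronix Test Suite (assuming a Debian/Ubuntu host, as in this report)
sudo apt-get install phoronix-test-suite

# Install and run a single test profile; "pts/x265" is assumed to be
# the profile behind the x265 results above
phoronix-test-suite install pts/x265
phoronix-test-suite benchmark pts/x265

# Dump a saved result file in the plain-text format of this report;
# "gh200" is the result name from the header above
phoronix-test-suite result-file-to-text gh200
```

When prompted at the end of a benchmark run, saving the result under a name is what makes it available to `result-file-to-text` and to later side-by-side comparison runs.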