Graviton4 192 Core Benchmarks for a future article. Amazon testing on Ubuntu 24.04 via the Phoronix Test Suite.

ARMv8 Neoverse-V2, b:
  Processor: ARMv8 Neoverse-V2 (192 Cores), Motherboard: Amazon EC2 r8g.48xlarge (1.0 BIOS), Chipset: Amazon Device 0200, Memory: 1520GB, Disk: 429GB Amazon Elastic Block Store, Network: Amazon Elastic
  OS: Ubuntu 24.04, Kernel: 6.8.0-41-generic-64k (aarch64), Compiler: GCC 13.2.0, File-System: ext4, System Layer: amazon
  (The "ARMv8 Neoverse-V2" and "b" result identifiers share this same configuration.)

Mobile Neural Network 2.9.b11b7037d
Model: nasnet
ms < Lower Is Better
ARMv8 Neoverse-V2 . 7.697 |====================================================

Mobile Neural Network 2.9.b11b7037d
Model: mobilenetV3
ms < Lower Is Better
ARMv8 Neoverse-V2 . 1.171 |====================================================

Mobile Neural Network 2.9.b11b7037d
Model: squeezenetv1.1
ms < Lower Is Better
ARMv8 Neoverse-V2 . 1.787 |====================================================

Mobile Neural Network 2.9.b11b7037d
Model: resnet-v2-50
ms < Lower Is Better
ARMv8 Neoverse-V2 . 12.08 |====================================================

Mobile Neural Network 2.9.b11b7037d
Model: SqueezeNetV1.0
ms < Lower Is Better
ARMv8 Neoverse-V2 . 3.462 |====================================================

Mobile Neural Network 2.9.b11b7037d
Model: MobileNetV2_224
ms < Lower Is Better
ARMv8 Neoverse-V2 . 1.617 |====================================================

OpenSSL
Algorithm: AES-128-GCM
byte/s > Higher Is Better
ARMv8 Neoverse-V2 . 832526756250 |=============================================

Mobile Neural Network 2.9.b11b7037d
Model: mobilenet-v1-1.0
ms < Lower Is Better
ARMv8 Neoverse-V2 . 1.428 |====================================================

OpenSSL
Algorithm: ChaCha20
byte/s > Higher Is Better
ARMv8 Neoverse-V2 . 269383786770 |=============================================

OpenSSL
Algorithm: SHA512
byte/s > Higher Is Better
ARMv8 Neoverse-V2 . 102206077080 |=============================================
b ................. 102218737610 |=============================================

OpenSSL
Algorithm: RSA4096

POV-Ray
Trace Time
Seconds < Lower Is Better
ARMv8 Neoverse-V2 . 5.609 |====================================================
b ................. 5.430 |==================================================

OpenSSL
Algorithm: SHA256
byte/s > Higher Is Better
ARMv8 Neoverse-V2 . 163267255410 |=============================================
b ................. 163273663510 |=============================================

Stockfish Chess Benchmark
Nodes Per Second > Higher Is Better
ARMv8 Neoverse-V2 . 147286327 |================================================
b ................. 138804644 |=============================================

Mobile Neural Network 2.9.b11b7037d
Model: inception-v3
ms < Lower Is Better
ARMv8 Neoverse-V2 . 16.37 |====================================================

7-Zip Compression
Test: Decompression Rating
MIPS > Higher Is Better
ARMv8 Neoverse-V2 . 843885 |===================================================
b ................. 850700 |===================================================
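The OpenSSL results above are reported in raw byte/s, which is hard to eyeball. A minimal conversion sketch, using the AES-128-GCM and SHA256 figures from the listing above:

    # Convert the raw byte/s results above into GB/s for readability.
    awk 'BEGIN { printf "AES-128-GCM: %.1f GB/s\n", 832526756250 / 1e9 }'   # ~832.5 GB/s
    awk 'BEGIN { printf "SHA256:      %.1f GB/s\n", 163267255410 / 1e9 }'   # ~163.3 GB/s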
7-Zip Compression
Test: Compression Rating
MIPS > Higher Is Better
ARMv8 Neoverse-V2 . 945362 |==================================================
b ................. 964131 |===================================================

x265
Video Input: Bosphorus 1080p
Frames Per Second > Higher Is Better
ARMv8 Neoverse-V2 . 18.15 |====================================================
b ................. 18.15 |====================================================

ONNX Runtime 1.19
Model: GPT-2 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better

OpenSSL
Algorithm: AES-256-GCM
byte/s > Higher Is Better
ARMv8 Neoverse-V2 . 768310026790 |=============================================

ONNX Runtime 1.19
Model: GPT-2 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better

ONNX Runtime 1.19
Model: yolov4 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
ARMv8 Neoverse-V2 . 6.99432 |==================================================

ONNX Runtime 1.19
Model: yolov4 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
ARMv8 Neoverse-V2 . 5.58698 |==================================================

ONNX Runtime 1.19
Model: ZFNet-512 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
ARMv8 Neoverse-V2 . 69.32 |====================================================

ONNX Runtime 1.19
Model: ZFNet-512 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
ARMv8 Neoverse-V2 . 144.59 |===================================================

x265
Video Input: Bosphorus 4K
Frames Per Second > Higher Is Better
ARMv8 Neoverse-V2 . 9.52 |=====================================================
b ................. 9.52 |=====================================================

GROMACS
Input: water_GMX50_bare
Ns Per Day > Higher Is Better
ARMv8 Neoverse-V2 . 13.71 |====================================================

OpenSSL
Algorithm: ChaCha20-Poly1305
byte/s > Higher Is Better
ARMv8 Neoverse-V2 . 203934370470 |=============================================

ONNX Runtime 1.19
Model: T5 Encoder - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
ARMv8 Neoverse-V2 . 275.96 |===================================================

ONNX Runtime 1.19
Model: T5 Encoder - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
ARMv8 Neoverse-V2 . 351.73 |===================================================

ONNX Runtime 1.19
Model: bertsquad-12 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better

ONNX Runtime 1.19
Model: bertsquad-12 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better

ONNX Runtime 1.19
Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
ARMv8 Neoverse-V2 . 209.51 |===================================================

ONNX Runtime 1.19
Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
ARMv8 Neoverse-V2 . 716.31 |===================================================

ONNX Runtime 1.19
Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
ARMv8 Neoverse-V2 . 1.31038 |==================================================
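For anyone wanting to repeat these runs on their own r8g instance, a minimal sketch of the workflow follows. The pts/ profile names are assumptions based on the tests in this listing as published on OpenBenchmarking.org; confirm them locally before running.

    # Sketch of repeating part of this comparison on an Ubuntu 24.04 r8g.48xlarge.
    # The pts/ profile names below are assumptions; verify them first.
    phoronix-test-suite list-available-tests | grep -Ei 'onnx|openssl|mnn|xnnpack'
    phoronix-test-suite benchmark pts/openssl pts/onnx pts/mnn pts/xnnpack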
ONNX Runtime 1.19
Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
ARMv8 Neoverse-V2 . 1.47366 |==================================================

ONNX Runtime 1.19
Model: ArcFace ResNet-100 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better

ONNX Runtime 1.19
Model: ArcFace ResNet-100 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better

ONNX Runtime 1.19
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
ARMv8 Neoverse-V2 . 54.16 |====================================================

ONNX Runtime 1.19
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
ARMv8 Neoverse-V2 . 139.36 |===================================================

ONNX Runtime 1.19
Model: super-resolution-10 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
ARMv8 Neoverse-V2 . 91.39 |====================================================

ONNX Runtime 1.19
Model: super-resolution-10 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
ARMv8 Neoverse-V2 . 92.44 |====================================================

ONNX Runtime 1.19
Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better
ARMv8 Neoverse-V2 . 0.498905 |=================================================

ONNX Runtime 1.19
Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better
ARMv8 Neoverse-V2 . 0.472144 |=================================================

ONNX Runtime 1.19
Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Parallel
Inferences Per Second > Higher Is Better

ONNX Runtime 1.19
Model: Faster R-CNN R-50-FPN-int8 - Device: CPU - Executor: Standard
Inferences Per Second > Higher Is Better

XNNPACK 2cd86b
Model: FP32MobileNetV2
us < Lower Is Better
ARMv8 Neoverse-V2 . 16258 |====================================================

XNNPACK 2cd86b
Model: FP32MobileNetV3Large
us < Lower Is Better
ARMv8 Neoverse-V2 . 18004 |====================================================

XNNPACK 2cd86b
Model: FP32MobileNetV3Small
us < Lower Is Better
ARMv8 Neoverse-V2 . 12214 |====================================================

XNNPACK 2cd86b
Model: FP16MobileNetV2
us < Lower Is Better
ARMv8 Neoverse-V2 . 18310 |====================================================

XNNPACK 2cd86b
Model: FP16MobileNetV3Large
us < Lower Is Better
ARMv8 Neoverse-V2 . 26209 |====================================================

XNNPACK 2cd86b
Model: FP16MobileNetV3Small
us < Lower Is Better
ARMv8 Neoverse-V2 . 12365 |====================================================

XNNPACK 2cd86b
Model: QU8MobileNetV2
us < Lower Is Better
ARMv8 Neoverse-V2 . 11780 |====================================================

XNNPACK 2cd86b
Model: QU8MobileNetV3Large
us < Lower Is Better
ARMv8 Neoverse-V2 . 17439 |====================================================

XNNPACK 2cd86b
Model: QU8MobileNetV3Small
us < Lower Is Better
ARMv8 Neoverse-V2 . 13815 |====================================================

Build2 0.17
Time To Compile
Seconds < Lower Is Better
ARMv8 Neoverse-V2 . 70.10 |===================================================
b ................. 70.96 |====================================================
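The two-bar entries like Build2 above come from rendering two saved result files side by side. A sketch of that workflow with the PTS CLI; the saved-result names here ("armv8-neoverse-v2", "b", "merged-result") are hypothetical:

    # Merge two saved runs into one result file, then dump it as plain text
    # like this listing. Result-file names are hypothetical placeholders.
    phoronix-test-suite merge-results armv8-neoverse-v2 b
    phoronix-test-suite result-file-to-text merged-result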
simdjson 3.10
Throughput Test: Kostya
GB/s > Higher Is Better
ARMv8 Neoverse-V2 . 2.16 |=====================================================
b ................. 2.16 |=====================================================

simdjson 3.10
Throughput Test: TopTweet
GB/s > Higher Is Better
ARMv8 Neoverse-V2 . 2.71 |=====================================================
b ................. 2.71 |=====================================================

simdjson 3.10
Throughput Test: LargeRandom
GB/s > Higher Is Better
ARMv8 Neoverse-V2 . 0.89 |=====================================================
b ................. 0.89 |=====================================================

simdjson 3.10
Throughput Test: PartialTweets
GB/s > Higher Is Better
ARMv8 Neoverse-V2 . 2.67 |=====================================================
b ................. 2.68 |=====================================================

simdjson 3.10
Throughput Test: DistinctUserID
GB/s > Higher Is Better
ARMv8 Neoverse-V2 . 2.73 |=====================================================
b ................. 2.72 |=====================================================

BYTE Unix Benchmark 5.1.3-git
Computational Test: Pipe
LPS > Higher Is Better
ARMv8 Neoverse-V2 . 399837391.3 |==============================================
b ................. 400090420.1 |==============================================

BYTE Unix Benchmark 5.1.3-git
Computational Test: Dhrystone 2
LPS > Higher Is Better
ARMv8 Neoverse-V2 . 10384571085 |==============================================
b ................. 10381629718 |==============================================

BYTE Unix Benchmark 5.1.3-git
Computational Test: System Call
LPS > Higher Is Better
ARMv8 Neoverse-V2 . 289337684.7 |==============================================
b ................. 289327337.4 |==============================================

BYTE Unix Benchmark 5.1.3-git
Computational Test: Whetstone Double
MWIPS > Higher Is Better
ARMv8 Neoverse-V2 . 1559589.3 |================================================
b ................. 1559332.5 |================================================

ONNX Runtime 1.19
Model: yolov4 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
ARMv8 Neoverse-V2 . 142.97 |===================================================

ONNX Runtime 1.19
Model: yolov4 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
ARMv8 Neoverse-V2 . 178.98 |===================================================

ONNX Runtime 1.19
Model: ZFNet-512 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
ARMv8 Neoverse-V2 . 14.42 |====================================================

ONNX Runtime 1.19
Model: ZFNet-512 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
ARMv8 Neoverse-V2 . 6.91273 |==================================================

ONNX Runtime 1.19
Model: T5 Encoder - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
ARMv8 Neoverse-V2 . 3.62224 |==================================================

ONNX Runtime 1.19
Model: T5 Encoder - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
ARMv8 Neoverse-V2 . 2.84162 |==================================================

ONNX Runtime 1.19
Model: CaffeNet 12-int8 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
ARMv8 Neoverse-V2 . 4.7706 |===================================================
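The ONNX Runtime inference-time figures are roughly the reciprocal of the inferences-per-second figures reported earlier, which is a useful consistency check. Using the T5 Encoder (Standard) numbers from above:

    # 351.73 inferences/s implies roughly 1000/351.73 ms per inference.
    awk 'BEGIN { printf "%.5f ms\n", 1000 / 351.73 }'   # ~2.84309 ms vs the measured 2.84162 ms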
ONNX Runtime 1.19
Model: CaffeNet 12-int8 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
ARMv8 Neoverse-V2 . 1.39501 |==================================================

ONNX Runtime 1.19
Model: fcn-resnet101-11 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
ARMv8 Neoverse-V2 . 763.13 |===================================================

ONNX Runtime 1.19
Model: fcn-resnet101-11 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
ARMv8 Neoverse-V2 . 678.58 |===================================================

ONNX Runtime 1.19
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
ARMv8 Neoverse-V2 . 18.46 |====================================================

ONNX Runtime 1.19
Model: ResNet50 v1-12-int8 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
ARMv8 Neoverse-V2 . 7.1738 |===================================================

ONNX Runtime 1.19
Model: super-resolution-10 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
ARMv8 Neoverse-V2 . 10.94 |====================================================

ONNX Runtime 1.19
Model: super-resolution-10 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
ARMv8 Neoverse-V2 . 10.82 |====================================================

ONNX Runtime 1.19
Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Parallel
Inference Time Cost (ms) < Lower Is Better
ARMv8 Neoverse-V2 . 2004.39 |==================================================

ONNX Runtime 1.19
Model: ResNet101_DUC_HDC-12 - Device: CPU - Executor: Standard
Inference Time Cost (ms) < Lower Is Better
ARMv8 Neoverse-V2 . 2117.99 |==================================================
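Where both runs reported a result, the spread between them is small; Stockfish shows the largest gap in this listing. A quick way to quantify it from the numbers above:

    # Relative difference between the two Stockfish runs.
    awk 'BEGIN { a = 147286327; b = 138804644; printf "%.1f%%\n", (a - b) / a * 100 }'   # b is ~5.8% slower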