nvidia-gpu-compute-oring-agx-32g
ARMv8 Cortex-A78E testing with an NVIDIA Jetson AGX Orin Developer Kit (36.3.0-gcid-36191598 BIOS) and Orin graphics on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2408272-NE-NVIDIAGPU37&grr .
nvidia-gpu-compute-oring-agx-32g

baseline
Processor: ARMv8 Cortex-A78E @ 2.20GHz (12 Cores)
Motherboard: NVIDIA Jetson AGX Orin Developer Kit (36.3.0-gcid-36191598 BIOS)
Memory: 30GB
Disk: 1000GB Samsung SSD 960 EVO 1TB + 64GB G1M15M
Graphics: Orin
Network: Realtek RTL8822CE 802.11ac PCIe
OS: Ubuntu 22.04
Kernel: 5.15.136-tegra (aarch64)
Desktop: GNOME Shell 42.9
Display Server: X Server 1.21.1.4
Display Driver: NVIDIA
Vulkan: 1.3.251
Compiler: GCC 11.4.0 + CUDA 12.2
File-System: ext4
Screen Resolution: 6582x1234

- Transparent Huge Pages: always
- --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
- Scaling Governor: tegra194 performance
- BAR1 / Visible vRAM Size: N/A
- Python 3.10.12
- gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Mitigation of CSV2 but not BHB + srbds: Not affected + tsx_async_abort: Not affected
nvidia-gpu-compute-oring-agx-32g

Test | baseline
vkfft: FFT + iFFT C2C 1D batched in double precision | 3569
vkfft: FFT + iFFT C2C Bluestein benchmark in double precision | 734
ncnn: Vulkan GPU - FastestDet | 3.56
ncnn: Vulkan GPU - vision_transformer | 267.88
ncnn: Vulkan GPU - regnety_400m | 9.84
ncnn: Vulkan GPU - squeezenet_ssd | 10.82
ncnn: Vulkan GPU - yolov4-tiny | 25.06
ncnn: Vulkan GPU - mobilenetv2-yolov3 | 16.10
ncnn: Vulkan GPU - resnet50 | 16.13
ncnn: Vulkan GPU - alexnet | 6.42
ncnn: Vulkan GPU - resnet18 | 6.14
ncnn: Vulkan GPU - vgg16 | 34.68
ncnn: Vulkan GPU - googlenet | 8.71
ncnn: Vulkan GPU - blazeface | 1.43
ncnn: Vulkan GPU - efficientnet-b0 | 5.00
ncnn: Vulkan GPU - mnasnet | 2.99
ncnn: Vulkan GPU - shufflenet-v2 | 2.75
ncnn: Vulkan GPU - mobilenet-v3 | 3.10
ncnn: Vulkan GPU - mobilenet-v2 | 3.23
ncnn: Vulkan GPU - mobilenet | 16.10
vkresample: 2x - Double | 512.018
vkfft: FFT + iFFT C2C Bluestein in single precision | 3867
vkfft: FFT + iFFT C2C 1D batched in single precision | 26448
vkfft: FFT + iFFT C2C 1D batched in single precision, no reshuffling | 28743
vkfft: FFT + iFFT C2C multidimensional in single precision | 10844
vkfft: FFT + iFFT C2C 1D batched in half precision | 31914
vkfft: FFT + iFFT R2C / C2R | 11818
viennacl: CPU BLAS - dGEMM-TT | 26.6
viennacl: CPU BLAS - dGEMM-TN | 26.3
viennacl: CPU BLAS - dGEMM-NT | 23.5
viennacl: CPU BLAS - dGEMM-NN | 27.3
viennacl: CPU BLAS - dGEMV-T | 62.1
viennacl: CPU BLAS - dGEMV-N | 47.4
viennacl: CPU BLAS - dDOT | 61.8
viennacl: CPU BLAS - dAXPY | 76.0
viennacl: CPU BLAS - dCOPY | 64.1
viennacl: CPU BLAS - sDOT | 63.9
viennacl: CPU BLAS - sAXPY | 79.4
viennacl: CPU BLAS - sCOPY | 64.4
vkresample: 2x - Single | 56.738
libplacebo: | (no value listed)
VkFFT 1.3.4, Test: FFT + iFFT C2C 1D batched in double precision (Benchmark Score, More Is Better): baseline 3569, SE +/- 17.23, N = 3. (CXX) g++ options: -O3
VkFFT 1.3.4, Test: FFT + iFFT C2C Bluestein benchmark in double precision (Benchmark Score, More Is Better): baseline 734, SE +/- 0.58, N = 3. (CXX) g++ options: -O3
NCNN 20230517, Target: Vulkan GPU - Model: FastestDet (ms, Fewer Is Better): baseline 3.56, SE +/- 0.06, N = 12, MIN: 3.22 / MAX: 68.52. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517, Target: Vulkan GPU - Model: vision_transformer (ms, Fewer Is Better): baseline 267.88, SE +/- 0.75, N = 12, MIN: 228.72 / MAX: 358.85. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517, Target: Vulkan GPU - Model: regnety_400m (ms, Fewer Is Better): baseline 9.84, SE +/- 0.07, N = 12, MIN: 9.23 / MAX: 218.63. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517, Target: Vulkan GPU - Model: squeezenet_ssd (ms, Fewer Is Better): baseline 10.82, SE +/- 0.29, N = 12, MIN: 9.19 / MAX: 92.97. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517, Target: Vulkan GPU - Model: yolov4-tiny (ms, Fewer Is Better): baseline 25.06, SE +/- 0.44, N = 12, MIN: 17.44 / MAX: 68.09. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517, Target: Vulkan GPU - Model: mobilenetv2-yolov3 (ms, Fewer Is Better): baseline 16.10, SE +/- 0.54, N = 12, MIN: 10.31 / MAX: 86.73. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517, Target: Vulkan GPU - Model: resnet50 (ms, Fewer Is Better): baseline 16.13, SE +/- 0.51, N = 12, MIN: 13.81 / MAX: 89.7. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517, Target: Vulkan GPU - Model: alexnet (ms, Fewer Is Better): baseline 6.42, SE +/- 0.16, N = 12, MIN: 5.86 / MAX: 24.85. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517, Target: Vulkan GPU - Model: resnet18 (ms, Fewer Is Better): baseline 6.14, SE +/- 0.05, N = 12, MIN: 5.81 / MAX: 40.11. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517, Target: Vulkan GPU - Model: vgg16 (ms, Fewer Is Better): baseline 34.68, SE +/- 0.35, N = 12, MIN: 27.78 / MAX: 73.92. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517, Target: Vulkan GPU - Model: googlenet (ms, Fewer Is Better): baseline 8.71, SE +/- 0.10, N = 12, MIN: 8.13 / MAX: 94.74. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517, Target: Vulkan GPU - Model: blazeface (ms, Fewer Is Better): baseline 1.43, SE +/- 0.05, N = 12, MIN: 1.25 / MAX: 38.68. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517, Target: Vulkan GPU - Model: efficientnet-b0 (ms, Fewer Is Better): baseline 5.00, SE +/- 0.02, N = 12, MIN: 4.8 / MAX: 53.16. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517, Target: Vulkan GPU - Model: mnasnet (ms, Fewer Is Better): baseline 2.99, SE +/- 0.03, N = 12, MIN: 2.82 / MAX: 36.77. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517, Target: Vulkan GPU - Model: shufflenet-v2 (ms, Fewer Is Better): baseline 2.75, SE +/- 0.03, N = 12, MIN: 2.58 / MAX: 54.9. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517, Target: Vulkan GPU - Model: mobilenet-v3 (ms, Fewer Is Better): baseline 3.10, SE +/- 0.02, N = 12, MIN: 2.93 / MAX: 27.15. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517, Target: Vulkan GPU - Model: mobilenet-v2 (ms, Fewer Is Better): baseline 3.23, SE +/- 0.03, N = 12, MIN: 3.07 / MAX: 49.98. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517, Target: Vulkan GPU - Model: mobilenet (ms, Fewer Is Better): baseline 16.10, SE +/- 0.54, N = 12, MIN: 10.31 / MAX: 86.73. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
VkResample 1.0, Upscale: 2x - Precision: Double (ms, Fewer Is Better): baseline 512.02, SE +/- 0.02, N = 3. (CXX) g++ options: -O3
VkFFT 1.3.4, Test: FFT + iFFT C2C Bluestein in single precision (Benchmark Score, More Is Better): baseline 3867, SE +/- 2.91, N = 3. (CXX) g++ options: -O3
VkFFT 1.3.4, Test: FFT + iFFT C2C 1D batched in single precision (Benchmark Score, More Is Better): baseline 26448, SE +/- 47.76, N = 3. (CXX) g++ options: -O3
VkFFT 1.3.4, Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling (Benchmark Score, More Is Better): baseline 28743, SE +/- 23.30, N = 3. (CXX) g++ options: -O3
VkFFT 1.3.4, Test: FFT + iFFT C2C multidimensional in single precision (Benchmark Score, More Is Better): baseline 10844, SE +/- 26.42, N = 3. (CXX) g++ options: -O3
VkFFT 1.3.4, Test: FFT + iFFT C2C 1D batched in half precision (Benchmark Score, More Is Better): baseline 31914, SE +/- 122.59, N = 3. (CXX) g++ options: -O3
VkFFT 1.3.4, Test: FFT + iFFT R2C / C2R (Benchmark Score, More Is Better): baseline 11818, SE +/- 34.23, N = 3. (CXX) g++ options: -O3
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-TT (GFLOPs/s, More Is Better): baseline 26.6, SE +/- 0.15, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-TN (GFLOPs/s, More Is Better): baseline 26.3, SE +/- 0.43, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-NT (GFLOPs/s, More Is Better): baseline 23.5, SE +/- 0.00, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dGEMM-NN (GFLOPs/s, More Is Better): baseline 27.3, SE +/- 0.03, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dGEMV-T (GB/s, More Is Better): baseline 62.1, SE +/- 0.98, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dGEMV-N (GB/s, More Is Better): baseline 47.4, SE +/- 2.85, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dDOT (GB/s, More Is Better): baseline 61.8, SE +/- 0.36, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dAXPY (GB/s, More Is Better): baseline 76.0, SE +/- 0.26, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - dCOPY (GB/s, More Is Better): baseline 64.1, SE +/- 0.73, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - sDOT (GB/s, More Is Better): baseline 63.9, SE +/- 0.43, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - sAXPY (GB/s, More Is Better): baseline 79.4, SE +/- 0.74, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1, Test: CPU BLAS - sCOPY (GB/s, More Is Better): baseline 64.4, SE +/- 0.19, N = 3. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
VkResample 1.0, Upscale: 2x - Precision: Single (ms, Fewer Is Better): baseline 56.74, SE +/- 0.04, N = 3. (CXX) g++ options: -O3
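
Each result above is reported with an "SE +/-" figure and a run count N. As a rough aid for judging run-to-run noise, the following Python sketch expresses the SE as a percentage of the reported value and back-computes an implied standard deviation. It assumes that "SE" is the standard error of the mean (standard deviation divided by the square root of N); that is an assumption about the export, not something this report states. The function names `relative_se_percent` and `implied_std_dev` are hypothetical helpers, and the three sample rows are copied from the results in this report.

```python
import math

# (reported value, SE, N) copied from three results in this report.
results = {
    "VkFFT C2C 1D batched, double precision (score)": (3569, 17.23, 3),
    "NCNN Vulkan GPU, squeezenet_ssd (ms)": (10.82, 0.29, 12),
    "ViennaCL CPU BLAS dGEMV-N (GB/s)": (47.4, 2.85, 3),
}


def relative_se_percent(mean, se):
    """Standard error expressed as a percentage of the reported mean."""
    return 100.0 * se / mean


def implied_std_dev(se, n):
    """Sample standard deviation implied by SE, ASSUMING SE = s / sqrt(N)."""
    return se * math.sqrt(n)


for name, (mean, se, n) in results.items():
    rel = relative_se_percent(mean, se)
    sd = implied_std_dev(se, n)
    print(f"{name}: SE is {rel:.2f}% of the mean, implied std dev ~ {sd:.2f}")
```

A larger percentage (dGEMV-N here, compared with the VkFFT score) suggests that result varied more between runs and deserves more caution in any comparison.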
Phoronix Test Suite v10.8.5