nvidia-gpu-compute-orin-nano-8g ARMv8 Cortex-A78E testing with a NVIDIA Jetson Orin Nano Developer Kit (36.3.0-gcid-36191598 BIOS) and Orin on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2408278-NE-NVIDIAGPU25&gru .
nvidia-gpu-compute-orin-nano-8g Processor Motherboard Memory Disk Graphics Network OS Kernel Desktop Display Server Display Driver Vulkan Compiler File-System Screen Resolution baseline ARMv8 Cortex-A78E @ 1.51GHz (6 Cores) NVIDIA Jetson Orin Nano Developer Kit (36.3.0-gcid-36191598 BIOS) 8GB 1000GB Western Digital WD_BLACK SN770 1TB Orin Realtek RTL8111/8168/8411 + Realtek RTL8822CE 802.11ac PCIe Ubuntu 22.04 5.15.136-tegra (aarch64) GNOME Shell 42.9 X Server 1.21.1.4 NVIDIA 1.3.251 GCC 11.4.0 + CUDA 12.2 ext4 6582x1234 OpenBenchmarking.org - Transparent Huge Pages: always - --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v - Scaling Governor: tegra194 performance - BAR1 / Visible vRAM Size: N/A - Python 3.10.12 - gather_data_sampling: Not affected + itlb_multihit: Not affected + l1tf: Not affected + mds: Not affected + meltdown: Not affected + mmio_stale_data: Not affected + retbleed: Not affected + spec_rstack_overflow: Not affected + spec_store_bypass: Mitigation of SSB disabled via prctl + spectre_v1: Mitigation of __user pointer sanitization + spectre_v2: Mitigation of CSV2 but not BHB + srbds: Not affected + tsx_async_abort: Not affected
nvidia-gpu-compute-orin-nano-8g vkfft: FFT + iFFT R2C / C2R vkfft: FFT + iFFT C2C 1D batched in half precision vkfft: FFT + iFFT C2C Bluestein in single precision vkfft: FFT + iFFT C2C 1D batched in double precision vkfft: FFT + iFFT C2C 1D batched in single precision vkfft: FFT + iFFT C2C multidimensional in single precision vkfft: FFT + iFFT C2C Bluestein benchmark in double precision vkfft: FFT + iFFT C2C 1D batched in single precision, no reshuffling viennacl: CPU BLAS - sCOPY viennacl: CPU BLAS - sAXPY viennacl: CPU BLAS - sDOT viennacl: CPU BLAS - dCOPY viennacl: CPU BLAS - dAXPY viennacl: CPU BLAS - dDOT viennacl: CPU BLAS - dGEMV-N viennacl: CPU BLAS - dGEMV-T viennacl: CPU BLAS - dGEMM-NN viennacl: CPU BLAS - dGEMM-NT viennacl: CPU BLAS - dGEMM-TN viennacl: CPU BLAS - dGEMM-TT vkresample: 2x - Double vkresample: 2x - Single ncnn: Vulkan GPU - mobilenet ncnn: Vulkan GPU-v2-v2 - mobilenet-v2 ncnn: Vulkan GPU-v3-v3 - mobilenet-v3 ncnn: Vulkan GPU - shufflenet-v2 ncnn: Vulkan GPU - mnasnet ncnn: Vulkan GPU - efficientnet-b0 ncnn: Vulkan GPU - blazeface ncnn: Vulkan GPU - googlenet ncnn: Vulkan GPU - vgg16 ncnn: Vulkan GPU - resnet18 ncnn: Vulkan GPU - alexnet ncnn: Vulkan GPU - resnet50 ncnn: Vulkan GPUv2-yolov3v2-yolov3 - mobilenetv2-yolov3 ncnn: Vulkan GPU - yolov4-tiny ncnn: Vulkan GPU - squeezenet_ssd ncnn: Vulkan GPU - regnety_400m ncnn: Vulkan GPU - vision_transformer ncnn: Vulkan GPU - FastestDet betsy: ETC2 RGB - Highest baseline 4137 12138 1439 1020 9269 4121 228 10170 27 28.5 21.6 27.7 28.3 21.5 20.8 21.7 9.55 8.27 9.71 9.71 511.949 195.858 22.96 5.25 4.99 4.06 4.85 8.55 1.60 16.46 64.25 12.84 14.64 31.88 22.96 29.87 14.89 12.21 555.20 4.50 OpenBenchmarking.org
VkFFT Test: FFT + iFFT R2C / C2R OpenBenchmarking.org Benchmark Score, More Is Better VkFFT 1.3.4 Test: FFT + iFFT R2C / C2R baseline 900 1800 2700 3600 4500 SE +/- 4.33, N = 3 4137 1. (CXX) g++ options: -O3
VkFFT Test: FFT + iFFT C2C 1D batched in half precision OpenBenchmarking.org Benchmark Score, More Is Better VkFFT 1.3.4 Test: FFT + iFFT C2C 1D batched in half precision baseline 3K 6K 9K 12K 15K SE +/- 7.69, N = 3 12138 1. (CXX) g++ options: -O3
VkFFT Test: FFT + iFFT C2C Bluestein in single precision OpenBenchmarking.org Benchmark Score, More Is Better VkFFT 1.3.4 Test: FFT + iFFT C2C Bluestein in single precision baseline 300 600 900 1200 1500 SE +/- 0.88, N = 3 1439 1. (CXX) g++ options: -O3
VkFFT Test: FFT + iFFT C2C 1D batched in double precision OpenBenchmarking.org Benchmark Score, More Is Better VkFFT 1.3.4 Test: FFT + iFFT C2C 1D batched in double precision baseline 200 400 600 800 1000 SE +/- 0.88, N = 3 1020 1. (CXX) g++ options: -O3
VkFFT Test: FFT + iFFT C2C 1D batched in single precision OpenBenchmarking.org Benchmark Score, More Is Better VkFFT 1.3.4 Test: FFT + iFFT C2C 1D batched in single precision baseline 2K 4K 6K 8K 10K SE +/- 6.36, N = 3 9269 1. (CXX) g++ options: -O3
VkFFT Test: FFT + iFFT C2C multidimensional in single precision OpenBenchmarking.org Benchmark Score, More Is Better VkFFT 1.3.4 Test: FFT + iFFT C2C multidimensional in single precision baseline 900 1800 2700 3600 4500 SE +/- 7.21, N = 3 4121 1. (CXX) g++ options: -O3
VkFFT Test: FFT + iFFT C2C Bluestein benchmark in double precision OpenBenchmarking.org Benchmark Score, More Is Better VkFFT 1.3.4 Test: FFT + iFFT C2C Bluestein benchmark in double precision baseline 50 100 150 200 250 SE +/- 0.00, N = 3 228 1. (CXX) g++ options: -O3
VkFFT Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling OpenBenchmarking.org Benchmark Score, More Is Better VkFFT 1.3.4 Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling baseline 2K 4K 6K 8K 10K SE +/- 6.66, N = 3 10170 1. (CXX) g++ options: -O3
ViennaCL Test: CPU BLAS - sCOPY OpenBenchmarking.org GB/s, More Is Better ViennaCL 1.7.1 Test: CPU BLAS - sCOPY baseline 6 12 18 24 30 SE +/- 0.00, N = 3 27 1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL Test: CPU BLAS - sAXPY OpenBenchmarking.org GB/s, More Is Better ViennaCL 1.7.1 Test: CPU BLAS - sAXPY baseline 7 14 21 28 35 SE +/- 0.21, N = 3 28.5 1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL Test: CPU BLAS - sDOT OpenBenchmarking.org GB/s, More Is Better ViennaCL 1.7.1 Test: CPU BLAS - sDOT baseline 5 10 15 20 25 SE +/- 0.06, N = 3 21.6 1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL Test: CPU BLAS - dCOPY OpenBenchmarking.org GB/s, More Is Better ViennaCL 1.7.1 Test: CPU BLAS - dCOPY baseline 7 14 21 28 35 SE +/- 0.27, N = 3 27.7 1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL Test: CPU BLAS - dAXPY OpenBenchmarking.org GB/s, More Is Better ViennaCL 1.7.1 Test: CPU BLAS - dAXPY baseline 7 14 21 28 35 SE +/- 0.06, N = 3 28.3 1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL Test: CPU BLAS - dDOT OpenBenchmarking.org GB/s, More Is Better ViennaCL 1.7.1 Test: CPU BLAS - dDOT baseline 5 10 15 20 25 SE +/- 0.06, N = 3 21.5 1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL Test: CPU BLAS - dGEMV-N OpenBenchmarking.org GB/s, More Is Better ViennaCL 1.7.1 Test: CPU BLAS - dGEMV-N baseline 5 10 15 20 25 SE +/- 0.03, N = 3 20.8 1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL Test: CPU BLAS - dGEMV-T OpenBenchmarking.org GB/s, More Is Better ViennaCL 1.7.1 Test: CPU BLAS - dGEMV-T baseline 5 10 15 20 25 SE +/- 0.06, N = 3 21.7 1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL Test: CPU BLAS - dGEMM-NN OpenBenchmarking.org GFLOPs/s, More Is Better ViennaCL 1.7.1 Test: CPU BLAS - dGEMM-NN baseline 3 6 9 12 15 SE +/- 0.01, N = 3 9.55 1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL Test: CPU BLAS - dGEMM-NT OpenBenchmarking.org GFLOPs/s, More Is Better ViennaCL 1.7.1 Test: CPU BLAS - dGEMM-NT baseline 2 4 6 8 10 SE +/- 0.02, N = 3 8.27 1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL Test: CPU BLAS - dGEMM-TN OpenBenchmarking.org GFLOPs/s, More Is Better ViennaCL 1.7.1 Test: CPU BLAS - dGEMM-TN baseline 3 6 9 12 15 SE +/- 0.10, N = 3 9.71 1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL Test: CPU BLAS - dGEMM-TT OpenBenchmarking.org GFLOPs/s, More Is Better ViennaCL 1.7.1 Test: CPU BLAS - dGEMM-TT baseline 3 6 9 12 15 SE +/- 0.06, N = 3 9.71 1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
VkResample Upscale: 2x - Precision: Double OpenBenchmarking.org ms, Fewer Is Better VkResample 1.0 Upscale: 2x - Precision: Double baseline 110 220 330 440 550 SE +/- 0.04, N = 3 511.95 1. (CXX) g++ options: -O3
VkResample Upscale: 2x - Precision: Single OpenBenchmarking.org ms, Fewer Is Better VkResample 1.0 Upscale: 2x - Precision: Single baseline 40 80 120 160 200 SE +/- 0.01, N = 3 195.86 1. (CXX) g++ options: -O3
NCNN Target: Vulkan GPU - Model: mobilenet OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: mobilenet baseline 6 12 18 24 30 SE +/- 0.76, N = 12 22.96 MIN: 18.82 / MAX: 104.41 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2 OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU-v2-v2 - Model: mobilenet-v2 baseline 1.1813 2.3626 3.5439 4.7252 5.9065 SE +/- 0.03, N = 12 5.25 MIN: 4.99 / MAX: 69.14 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3 OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU-v3-v3 - Model: mobilenet-v3 baseline 1.1228 2.2456 3.3684 4.4912 5.614 SE +/- 0.03, N = 12 4.99 MIN: 4.67 / MAX: 58.58 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: shufflenet-v2 OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: shufflenet-v2 baseline 0.9135 1.827 2.7405 3.654 4.5675 SE +/- 0.01, N = 12 4.06 MIN: 3.91 / MAX: 34.67 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: mnasnet OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: mnasnet baseline 1.0913 2.1826 3.2739 4.3652 5.4565 SE +/- 0.02, N = 12 4.85 MIN: 4.62 / MAX: 38.77 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: efficientnet-b0 OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: efficientnet-b0 baseline 2 4 6 8 10 SE +/- 0.04, N = 12 8.55 MIN: 8.17 / MAX: 64.81 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: blazeface OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: blazeface baseline 0.36 0.72 1.08 1.44 1.8 SE +/- 0.01, N = 12 1.60 MIN: 1.52 / MAX: 28.82 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: googlenet OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: googlenet baseline 4 8 12 16 20 SE +/- 0.05, N = 12 16.46 MIN: 15.72 / MAX: 91.46 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: vgg16 OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: vgg16 baseline 14 28 42 56 70 SE +/- 0.24, N = 12 64.25 MIN: 59.43 / MAX: 115.51 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: resnet18 OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: resnet18 baseline 3 6 9 12 15 SE +/- 0.10, N = 12 12.84 MIN: 11.75 / MAX: 58.16 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: alexnet OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: alexnet baseline 4 8 12 16 20 SE +/- 0.12, N = 12 14.64 MIN: 12.35 / MAX: 42.61 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: resnet50 OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: resnet50 baseline 7 14 21 28 35 SE +/- 0.20, N = 12 31.88 MIN: 29.64 / MAX: 124.06 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3 OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPUv2-yolov3v2-yolov3 - Model: mobilenetv2-yolov3 baseline 6 12 18 24 30 SE +/- 0.76, N = 12 22.96 MIN: 18.82 / MAX: 104.41 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: yolov4-tiny OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: yolov4-tiny baseline 7 14 21 28 35 SE +/- 0.61, N = 12 29.87 MIN: 27.28 / MAX: 72.83 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: squeezenet_ssd OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: squeezenet_ssd baseline 4 8 12 16 20 SE +/- 0.04, N = 12 14.89 MIN: 14.34 / MAX: 51.85 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: regnety_400m OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: regnety_400m baseline 3 6 9 12 15 SE +/- 0.06, N = 12 12.21 MIN: 11.54 / MAX: 50.03 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: vision_transformer OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: vision_transformer baseline 120 240 360 480 600 SE +/- 0.53, N = 12 555.20 MIN: 509.68 / MAX: 621.35 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN Target: Vulkan GPU - Model: FastestDet OpenBenchmarking.org ms, Fewer Is Better NCNN 20230517 Target: Vulkan GPU - Model: FastestDet baseline 1.0125 2.025 3.0375 4.05 5.0625 SE +/- 0.05, N = 12 4.50 MIN: 4.21 / MAX: 50.51 1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
Phoronix Test Suite v10.8.5