ngc smoke run
ARMv8 Neoverse-V2 testing with a Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS) and NVIDIA GH200 480GB on Ubuntu 22.04 via the Phoronix Test Suite.
HTML result view exported from: https://openbenchmarking.org/result/2403013-NE-NGCSMOKER54&grr&rdt .
ngc smoke run: system details (runs a, b, c, d)
  Processor: ARMv8 Neoverse-V2 @ 3.39GHz (72 Cores)
  Motherboard: Quanta Cloud QuantaGrid S74G-2U 1S7GZ9Z0000 S7G MB (CG1) (3A06 BIOS)
  Memory: 1 x 480GB DRAM-6400MT/s
  Disk: 960GB SAMSUNG MZ1L2960HCJR-00A07 + 1920GB SAMSUNG MZTL21T9
  Graphics: NVIDIA GH200 480GB
  Network: 2 x Mellanox MT2910 + 2 x QLogic FastLinQ QL41000 10/25/40/50GbE
  OS: Ubuntu 22.04
  Kernel: 6.5.0-1007-NVIDIA-64k (aarch64)
  Display Driver: NVIDIA
  OpenCL: OpenCL 3.0 CUDA 12.4.89
  Compiler: GCC 11.4.0 + CUDA 11.5
  File-System: ext4
  Screen Resolution: 1920x1200
Kernel Details
  Transparent Huge Pages: madvise
Compiler Details
  --build=aarch64-linux-gnu --disable-libquadmath --disable-libquadmath-support --disable-werror --enable-bootstrap --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-fix-cortex-a53-843419 --enable-gnu-unique-object --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --enable-libphobos-checking=release --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-link-serialization=2 --enable-multiarch --enable-nls --enable-objc-gc=auto --enable-plugin --enable-shared --enable-threads=posix --host=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --with-default-libstdcxx-abi=new --with-gcc-major-version-only --with-target-system-zlib=auto -v
Processor Details
  Scaling Governor: cppc_cpufreq performance (Boost: Disabled)
Graphics Details
  BAR1 / Visible vRAM Size: N/A
  vBIOS Version: 96.00.7e.00.02
Python Details
  Python 3.10.12
Security Details
  gather_data_sampling: Not affected
  itlb_multihit: Not affected
  l1tf: Not affected
  mds: Not affected
  meltdown: Not affected
  mmio_stale_data: Not affected
  retbleed: Not affected
  spec_rstack_overflow: Not affected
  spec_store_bypass: Mitigation of SSB disabled via prctl
  spectre_v1: Mitigation of __user pointer sanitization
  spectre_v2: Not affected
  srbds: Not affected
  tsx_async_abort: Not affected
ngc smoke run: results overview
Test | a | b | c | d
vkfft: FFT + iFFT C2C 1D batched in double precision | 58405 | 58253 | 58256 | 58299
vkfft: FFT + iFFT C2C Bluestein benchmark in double precision | 20810 | 21000 | 21094 | 21320
vkfft: FFT + iFFT C2C Bluestein in single precision | 17867 | 17967 | 17886 | 17942
vkfft: FFT + iFFT C2C multidimensional in single precision | 44489 | 43731 | 45071 | 45007
vkfft: FFT + iFFT R2C / C2R | 42397 | 41809 | 42581 | 43048
vkfft: FFT + iFFT C2C 1D batched in single precision | 185774 | 186082 | 189944 | 190310
vkfft: FFT + iFFT C2C 1D batched in single precision, no reshuffling | 194497 | 190037 | 190909 | 192507
ncnn: Vulkan GPU - FastestDet | 3.09 | 3.10 | 3.08 | 3.12
ncnn: Vulkan GPU - vision_transformer | 31.52 | 32.32 | 31.92 | 31.13
ncnn: Vulkan GPU - regnety_400m | 14.78 | 14.74 | 14.77 | 15.22
ncnn: Vulkan GPU - squeezenet_ssd | 5.43 | 5.47 | 5.48 | 5.43
ncnn: Vulkan GPU - yolov4-tiny | 6.79 | 6.80 | 6.81 | 6.82
ncnn: Vulkan GPU - mobilenetv2-yolov3 | 4.89 | 4.92 | 4.92 | 4.91
ncnn: Vulkan GPU - resnet50 | 4.27 | 4.28 | 4.32 | 4.32
ncnn: Vulkan GPU - alexnet | 1.63 | 1.63 | 1.65 | 1.62
ncnn: Vulkan GPU - resnet18 | 2.16 | 2.18 | 2.17 | 2.20
ncnn: Vulkan GPU - vgg16 | 5.26 | 5.25 | 5.25 | 5.26
ncnn: Vulkan GPU - googlenet | 4.16 | 4.23 | 4.23 | 4.21
ncnn: Vulkan GPU - blazeface | 1.75 | 1.78 | 1.74 | 1.77
ncnn: Vulkan GPU - efficientnet-b0 | 3.49 | 3.55 | 3.52 | 3.53
ncnn: Vulkan GPU - mnasnet | 2.04 | 2.03 | 2.04 | 2.04
ncnn: Vulkan GPU - shufflenet-v2 | 2.29 | 2.29 | 2.27 | 2.27
ncnn: Vulkan GPU - mobilenet-v3 | 2.26 | 2.27 | 2.30 | 2.27
ncnn: Vulkan GPU - mobilenet-v2 | 2.13 | 2.16 | 2.12 | 2.12
ncnn: Vulkan GPU - mobilenet | 4.89 | 4.92 | 4.92 | 4.91
vkfft: FFT + iFFT C2C 1D batched in half precision | 151912 | 151910 | 152866 | 151969
viennacl: CPU BLAS - dGEMM-TT | 137 | 138 | 140 | 136
viennacl: CPU BLAS - dGEMM-TN | 141 | 140 | 141 | 140
viennacl: CPU BLAS - dGEMM-NT | 125 | 125 | 124 | 124
viennacl: CPU BLAS - dGEMM-NN | 135 | 137 | 139 | 141
viennacl: CPU BLAS - dGEMV-T | 686 | 699 | 691 | 696
viennacl: CPU BLAS - dGEMV-N | 411 | 408 | 405 | 418
viennacl: CPU BLAS - dDOT | 1247 | 1238 | 1243 | 1247
viennacl: CPU BLAS - dAXPY | 1803 | 1806 | 1837 | 1830
viennacl: CPU BLAS - dCOPY | 2027 | 1948 | 1920 | 1917
viennacl: CPU BLAS - sDOT | 667 | 664 | 666 | 663
viennacl: CPU BLAS - sAXPY | 3943 | 3924 | 3917 | 3920
viennacl: CPU BLAS - sCOPY | 2920 | 2892 | 2907 | 2857
viennacl: OpenCL BLAS - dGEMM-TT | 7070 | 7070 | 7057 | 7070
viennacl: OpenCL BLAS - dGEMM-TN | 7027 | 7067 | 7000 | 7070
viennacl: OpenCL BLAS - dGEMM-NT | 7527 | 7537 | 7537 | 7540
viennacl: OpenCL BLAS - dGEMM-NN | 7057 | 7093 | 7037 | 7053
viennacl: OpenCL BLAS - dGEMV-T | 308 | 308 | 308 | 307
viennacl: OpenCL BLAS - dGEMV-N | 81.2 | 81.5 | 81.2 | 81.4
viennacl: OpenCL BLAS - dDOT | 550 | 552 | 552 | 553
viennacl: OpenCL BLAS - dAXPY | 799 | 798 | 799 | 799
viennacl: OpenCL BLAS - dCOPY | 603 | 604 | 604 | 604
viennacl: OpenCL BLAS - sDOT | 282 | 282 | 283 | 283
viennacl: OpenCL BLAS - sAXPY | 420 | 427 | 427 | 426
viennacl: OpenCL BLAS - sCOPY | 316 | 316 | 316 | 316
vkresample: 2x - Double | 24.296 | 24.294 | 24.290 | 24.297
vkresample: 2x - Single | 5.230 | 5.231 | 5.230 | 5.230
cl-mem: Write | 2354.9 | 2353.4 | 2354.9 | 2352.1
cl-mem: Read | 1045.9 | 1045.9 | 1046.1 | 1046.0
cl-mem: Copy | 308.6 | 308.5 | 308.6 | 308.5
arrayfire: Conjugate Gradient OpenCL | 2.997 | 2.983 | 2.998 | 2.997
clpeak: Global Memory Bandwidth | 3483.99 | 3484.06 | 3483.95 | 3484.32
clpeak: Single-Precision Float | 64545.62 | 64547.74 | 64547.25 | 64520.97
clpeak: Integer Compute INT | 33119.10 | 33144.74 | 33146.12 | 33129.34
financebench: Black-Scholes OpenCL | 4.347 | 4.373 | 4.351 | 4.339
clpeak: Double-Precision Double | 32959.17 | 32961.21 | 32941.99 | 32933.63
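Because runs a through d repeat the same tests, one quick way to read the overview is to check how far the four runs spread for a given test. The sketch below is only an illustration: the function name spread_percent and the choice of (max - min) / mean as the spread metric are assumptions of this example, not something the Phoronix Test Suite reports. The three rows of values are copied from the overview.

```python
from statistics import mean

# Values copied from the results overview (runs a, b, c, d).
results = {
    "vkfft: FFT + iFFT C2C 1D batched in double precision": [58405, 58253, 58256, 58299],
    "ncnn: Vulkan GPU - regnety_400m": [14.78, 14.74, 14.77, 15.22],
    "viennacl: CPU BLAS - dCOPY": [2027, 1948, 1920, 1917],
}


def spread_percent(values):
    """Return (max - min) / mean as a percentage (hypothetical helper)."""
    return (max(values) - min(values)) / mean(values) * 100.0


for name, values in results.items():
    print(f"{name}: {spread_percent(values):.2f}% spread across runs")
```

A small spread suggests the four runs agree closely for that test; a larger one (as for the CPU dCOPY row) is a hint to look at the per-run standard errors before drawing conclusions.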
VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in double precision (Benchmark Score, More Is Better)
  a: 58405 (SE +/- 150.19, N = 3)
  b: 58253 (SE +/- 46.92, N = 3)
  c: 58256 (SE +/- 17.34, N = 3)
  d: 58299 (SE +/- 21.83, N = 3)
  1. (CXX) g++ options: -O3
VkFFT 1.3.4 - Test: FFT + iFFT C2C Bluestein benchmark in double precision (Benchmark Score, More Is Better)
  a: 20810 (SE +/- 188.78, N = 3)
  b: 21000 (SE +/- 195.51, N = 3)
  c: 21094 (SE +/- 282.81, N = 3)
  d: 21320 (SE +/- 152.14, N = 15)
  1. (CXX) g++ options: -O3
VkFFT 1.3.4 - Test: FFT + iFFT C2C Bluestein in single precision (Benchmark Score, More Is Better)
  a: 17867 (SE +/- 131.79, N = 3)
  b: 17967 (SE +/- 196.01, N = 5)
  c: 17886 (SE +/- 147.24, N = 3)
  d: 17942 (SE +/- 168.45, N = 7)
  1. (CXX) g++ options: -O3
VkFFT 1.3.4 - Test: FFT + iFFT C2C multidimensional in single precision (Benchmark Score, More Is Better)
  a: 44489 (SE +/- 479.16, N = 15)
  b: 43731 (SE +/- 441.36, N = 3)
  c: 45071 (SE +/- 475.72, N = 3)
  d: 45007 (SE +/- 571.37, N = 3)
  1. (CXX) g++ options: -O3
VkFFT 1.3.4 - Test: FFT + iFFT R2C / C2R (Benchmark Score, More Is Better)
  a: 42397 (SE +/- 298.99, N = 3)
  b: 41809 (SE +/- 460.34, N = 3)
  c: 42581 (SE +/- 552.29, N = 3)
  d: 43048 (SE +/- 289.59, N = 15)
  1. (CXX) g++ options: -O3
VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in single precision (Benchmark Score, More Is Better)
  a: 185774 (SE +/- 1557.78, N = 3)
  b: 186082 (SE +/- 1095.80, N = 3)
  c: 189944 (SE +/- 1666.00, N = 3)
  d: 190310 (SE +/- 479.86, N = 3)
  1. (CXX) g++ options: -O3
VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in single precision, no reshuffling (Benchmark Score, More Is Better)
  a: 194497 (SE +/- 2261.14, N = 3)
  b: 190037 (SE +/- 521.00, N = 3)
  c: 190909 (SE +/- 583.76, N = 3)
  d: 192507 (SE +/- 720.67, N = 3)
  1. (CXX) g++ options: -O3
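Each result carries an "SE +/- ..., N = ..." figure. Assuming SE denotes the standard error of the mean over N trials (sample standard deviation divided by the square root of N), the sketch below shows how such a figure could be computed and how two runs might be compared against their combined standard error. The trial values are hypothetical, since the export does not list individual trials; the helper names and the independence assumption behind the combined-error formula are also assumptions of this example. The means and SE values for runs a and b come from the double-precision 1D batched VkFFT result.

```python
import math
from statistics import stdev


def standard_error(samples):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return stdev(samples) / math.sqrt(len(samples))


def difference_in_se_units(mean_x, se_x, mean_y, se_y):
    """Difference of two means divided by their combined standard error.

    Assumes the two runs are independent (an assumption of this sketch).
    """
    return (mean_x - mean_y) / math.sqrt(se_x ** 2 + se_y ** 2)


# Hypothetical per-trial scores; the export only reports the mean and SE.
trials = [100.0, 102.0, 104.0]
print(f"SE of hypothetical trials: {standard_error(trials):.3f}")

# Reported means and SE values for runs a and b
# (VkFFT, FFT + iFFT C2C 1D batched in double precision).
ratio = difference_in_se_units(58405, 150.19, 58253, 46.92)
print(f"a vs b differ by {ratio:.2f} combined standard errors")
```

A ratio around or below 1 means the gap between the two runs is about the size of the measurement noise, so it should not be read as a real performance difference.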
NCNN 20230517 - Target: Vulkan GPU - Model: FastestDet (ms, Fewer Is Better)
  a: 3.09 (SE +/- 0.02, N = 3; MIN: 2.95 / MAX: 4.64)
  b: 3.10 (SE +/- 0.04, N = 3; MIN: 2.92 / MAX: 4.71)
  c: 3.08 (SE +/- 0.06, N = 3; MIN: 2.91 / MAX: 4.57)
  d: 3.12 (SE +/- 0.03, N = 3; MIN: 2.97 / MAX: 4.56)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: Vulkan GPU - Model: vision_transformer (ms, Fewer Is Better)
  a: 31.52 (SE +/- 0.28, N = 3; MIN: 30.23 / MAX: 62.77)
  b: 32.32 (SE +/- 0.98, N = 3; MIN: 30.14 / MAX: 67.5)
  c: 31.92 (SE +/- 0.60, N = 3; MIN: 30.27 / MAX: 57.34)
  d: 31.13 (SE +/- 0.08, N = 3; MIN: 30.31 / MAX: 64.21)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: Vulkan GPU - Model: regnety_400m (ms, Fewer Is Better)
  a: 14.78 (SE +/- 0.13, N = 3; MIN: 13.74 / MAX: 17.76)
  b: 14.74 (SE +/- 0.15, N = 3; MIN: 14 / MAX: 20.37)
  c: 14.77 (SE +/- 0.24, N = 3; MIN: 13.51 / MAX: 18.11)
  d: 15.22 (SE +/- 0.19, N = 3; MIN: 14.15 / MAX: 21.54)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: Vulkan GPU - Model: squeezenet_ssd (ms, Fewer Is Better)
  a: 5.43 (SE +/- 0.03, N = 3; MIN: 5.18 / MAX: 8.53)
  b: 5.47 (SE +/- 0.07, N = 3; MIN: 5.22 / MAX: 11.66)
  c: 5.48 (SE +/- 0.07, N = 3; MIN: 5.16 / MAX: 11.04)
  d: 5.43 (SE +/- 0.04, N = 3; MIN: 5.17 / MAX: 7.39)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: Vulkan GPU - Model: yolov4-tiny (ms, Fewer Is Better)
  a: 6.79 (SE +/- 0.01, N = 3; MIN: 6.66 / MAX: 8.49)
  b: 6.80 (SE +/- 0.03, N = 3; MIN: 6.64 / MAX: 8.25)
  c: 6.81 (SE +/- 0.03, N = 3; MIN: 6.42 / MAX: 12.5)
  d: 6.82 (SE +/- 0.02, N = 3; MIN: 6.69 / MAX: 8.1)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: Vulkan GPU - Model: mobilenetv2-yolov3 (ms, Fewer Is Better)
  a: 4.89 (SE +/- 0.02, N = 3; MIN: 4.76 / MAX: 7.99)
  b: 4.92 (SE +/- 0.03, N = 3; MIN: 4.77 / MAX: 7.57)
  c: 4.92 (SE +/- 0.03, N = 3; MIN: 4.74 / MAX: 6.99)
  d: 4.91 (SE +/- 0.01, N = 3; MIN: 4.79 / MAX: 6.53)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: Vulkan GPU - Model: resnet50 (ms, Fewer Is Better)
  a: 4.27 (SE +/- 0.02, N = 3; MIN: 4.05 / MAX: 6.65)
  b: 4.28 (SE +/- 0.03, N = 3; MIN: 4.05 / MAX: 7.65)
  c: 4.32 (SE +/- 0.04, N = 3; MIN: 4.05 / MAX: 8.1)
  d: 4.32 (SE +/- 0.02, N = 3; MIN: 4.1 / MAX: 7.58)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: Vulkan GPU - Model: alexnet (ms, Fewer Is Better)
  a: 1.63 (SE +/- 0.02, N = 3; MIN: 1.49 / MAX: 2.81)
  b: 1.63 (SE +/- 0.02, N = 3; MIN: 1.5 / MAX: 2.94)
  c: 1.65 (SE +/- 0.04, N = 3; MIN: 1.44 / MAX: 4.74)
  d: 1.62 (SE +/- 0.00, N = 3; MIN: 1.5 / MAX: 4.84)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: Vulkan GPU - Model: resnet18 (ms, Fewer Is Better)
  a: 2.16 (SE +/- 0.02, N = 3; MIN: 2.04 / MAX: 3.59)
  b: 2.18 (SE +/- 0.01, N = 3; MIN: 2.05 / MAX: 5.44)
  c: 2.17 (SE +/- 0.02, N = 3; MIN: 2.04 / MAX: 3.54)
  d: 2.20 (SE +/- 0.01, N = 3; MIN: 2.08 / MAX: 3.65)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: Vulkan GPU - Model: vgg16 (ms, Fewer Is Better)
  a: 5.26 (SE +/- 0.02, N = 3; MIN: 5.08 / MAX: 8.11)
  b: 5.25 (SE +/- 0.01, N = 3; MIN: 5.07 / MAX: 7.45)
  c: 5.25 (SE +/- 0.03, N = 3; MIN: 4.94 / MAX: 11.68)
  d: 5.26 (SE +/- 0.02, N = 3; MIN: 5.07 / MAX: 7.08)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: Vulkan GPU - Model: googlenet (ms, Fewer Is Better)
  a: 4.16 (SE +/- 0.02, N = 3; MIN: 3.99 / MAX: 5.76)
  b: 4.23 (SE +/- 0.05, N = 3; MIN: 4.01 / MAX: 6.82)
  c: 4.23 (SE +/- 0.06, N = 3; MIN: 4 / MAX: 6.41)
  d: 4.21 (SE +/- 0.01, N = 3; MIN: 4.03 / MAX: 5.75)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: Vulkan GPU - Model: blazeface (ms, Fewer Is Better)
  a: 1.75 (SE +/- 0.01, N = 3; MIN: 1.68 / MAX: 3.09)
  b: 1.78 (SE +/- 0.02, N = 3; MIN: 1.67 / MAX: 7.1)
  c: 1.74 (SE +/- 0.03, N = 3; MIN: 1.6 / MAX: 2.93)
  d: 1.77 (SE +/- 0.02, N = 3; MIN: 1.64 / MAX: 3.01)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: Vulkan GPU - Model: efficientnet-b0 (ms, Fewer Is Better)
  a: 3.49 (SE +/- 0.02, N = 3; MIN: 3.27 / MAX: 5.07)
  b: 3.55 (SE +/- 0.03, N = 3; MIN: 3.33 / MAX: 8.63)
  c: 3.52 (SE +/- 0.04, N = 3; MIN: 3.22 / MAX: 6.65)
  d: 3.53 (SE +/- 0.04, N = 3; MIN: 3.25 / MAX: 6.52)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: Vulkan GPU - Model: mnasnet (ms, Fewer Is Better)
  a: 2.04 (SE +/- 0.02, N = 3; MIN: 1.89 / MAX: 3.56)
  b: 2.03 (SE +/- 0.01, N = 3; MIN: 1.94 / MAX: 3.48)
  c: 2.04 (SE +/- 0.04, N = 3; MIN: 1.86 / MAX: 3.85)
  d: 2.04 (SE +/- 0.03, N = 3; MIN: 1.87 / MAX: 6.48)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: Vulkan GPU - Model: shufflenet-v2 (ms, Fewer Is Better)
  a: 2.29 (SE +/- 0.01, N = 3; MIN: 2.13 / MAX: 3.94)
  b: 2.29 (SE +/- 0.02, N = 3; MIN: 2.12 / MAX: 3.59)
  c: 2.27 (SE +/- 0.02, N = 3; MIN: 2.13 / MAX: 5.55)
  d: 2.27 (SE +/- 0.04, N = 3; MIN: 2.1 / MAX: 5.57)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: Vulkan GPU - Model: mobilenet-v3 (ms, Fewer Is Better)
  a: 2.26 (SE +/- 0.03, N = 3; MIN: 2.11 / MAX: 3.89)
  b: 2.27 (SE +/- 0.01, N = 3; MIN: 2.16 / MAX: 5.3)
  c: 2.30 (SE +/- 0.05, N = 3; MIN: 2.1 / MAX: 3.74)
  d: 2.27 (SE +/- 0.04, N = 3; MIN: 2.11 / MAX: 4.59)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: Vulkan GPU - Model: mobilenet-v2 (ms, Fewer Is Better)
  a: 2.13 (SE +/- 0.03, N = 3; MIN: 1.99 / MAX: 3.7)
  b: 2.16 (SE +/- 0.02, N = 3; MIN: 2.04 / MAX: 8.11)
  c: 2.12 (SE +/- 0.04, N = 3; MIN: 1.96 / MAX: 3.59)
  d: 2.12 (SE +/- 0.05, N = 3; MIN: 1.96 / MAX: 5.5)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
NCNN 20230517 - Target: Vulkan GPU - Model: mobilenet (ms, Fewer Is Better)
  a: 4.89 (SE +/- 0.02, N = 3; MIN: 4.76 / MAX: 7.99)
  b: 4.92 (SE +/- 0.03, N = 3; MIN: 4.77 / MAX: 7.57)
  c: 4.92 (SE +/- 0.03, N = 3; MIN: 4.74 / MAX: 6.99)
  d: 4.91 (SE +/- 0.01, N = 3; MIN: 4.79 / MAX: 6.53)
  1. (CXX) g++ options: -O3 -rdynamic -lgomp -lpthread
VkFFT 1.3.4 - Test: FFT + iFFT C2C 1D batched in half precision (Benchmark Score, More Is Better)
  a: 151912 (SE +/- 190.55, N = 3)
  b: 151910 (SE +/- 506.81, N = 3)
  c: 152866 (SE +/- 136.79, N = 3)
  d: 151969 (SE +/- 377.55, N = 3)
  1. (CXX) g++ options: -O3
ViennaCL 1.7.1 - Test: CPU BLAS - dGEMM-TT (GFLOPs/s, More Is Better)
  a: 137 (SE +/- 1.53, N = 3)
  b: 138 (SE +/- 2.18, N = 5)
  c: 140 (SE +/- 4.73, N = 3)
  d: 136 (SE +/- 1.20, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: CPU BLAS - dGEMM-TN (GFLOPs/s, More Is Better)
  a: 141 (SE +/- 1.20, N = 3)
  b: 140 (SE +/- 0.93, N = 5)
  c: 141 (SE +/- 0.58, N = 3)
  d: 140 (SE +/- 0.58, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: CPU BLAS - dGEMM-NT (GFLOPs/s, More Is Better)
  a: 125 (SE +/- 0.88, N = 3)
  b: 125 (SE +/- 0.77, N = 5)
  c: 124 (SE +/- 0.33, N = 3)
  d: 124 (SE +/- 0.33, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: CPU BLAS - dGEMM-NN (GFLOPs/s, More Is Better)
  a: 135 (SE +/- 0.88, N = 3)
  b: 137 (SE +/- 1.29, N = 5)
  c: 139 (SE +/- 2.67, N = 3)
  d: 141 (SE +/- 2.65, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: CPU BLAS - dGEMV-T (GB/s, More Is Better)
  a: 686 (SE +/- 17.19, N = 3)
  b: 699 (SE +/- 3.65, N = 5)
  c: 691 (SE +/- 1.20, N = 3)
  d: 696 (SE +/- 10.68, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: CPU BLAS - dGEMV-N (GB/s, More Is Better)
  a: 411 (SE +/- 0.33, N = 3)
  b: 408 (SE +/- 2.90, N = 5)
  c: 405 (SE +/- 1.86, N = 3)
  d: 418 (SE +/- 8.51, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: CPU BLAS - dDOT (GB/s, More Is Better)
  a: 1247 (SE +/- 3.33, N = 3)
  b: 1238 (SE +/- 2.00, N = 5)
  c: 1243 (SE +/- 3.33, N = 3)
  d: 1247 (SE +/- 3.33, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: CPU BLAS - dAXPY (GB/s, More Is Better)
  a: 1803 (SE +/- 23.33, N = 3)
  b: 1806 (SE +/- 29.93, N = 5)
  c: 1837 (SE +/- 3.33, N = 3)
  d: 1830 (SE +/- 10.00, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: CPU BLAS - dCOPY (GB/s, More Is Better)
  a: 2027 (SE +/- 44.85, N = 3)
  b: 1948 (SE +/- 17.15, N = 5)
  c: 1920 (SE +/- 41.63, N = 3)
  d: 1917 (SE +/- 14.53, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: CPU BLAS - sDOT (GB/s, More Is Better)
  a: 667 (SE +/- 4.18, N = 3)
  b: 664 (SE +/- 5.77, N = 5)
  c: 666 (SE +/- 4.33, N = 3)
  d: 663 (SE +/- 3.51, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: CPU BLAS - sAXPY (GB/s, More Is Better)
  a: 3943 (SE +/- 14.53, N = 3)
  b: 3924 (SE +/- 9.80, N = 5)
  c: 3917 (SE +/- 16.67, N = 3)
  d: 3920 (SE +/- 15.28, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: CPU BLAS - sCOPY (GB/s, More Is Better)
  a: 2920 (SE +/- 20.00, N = 3)
  b: 2892 (SE +/- 29.56, N = 5)
  c: 2907 (SE +/- 3.33, N = 3)
  d: 2857 (SE +/- 23.33, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: OpenCL BLAS - dGEMM-TT (GFLOPs/s, More Is Better)
  a: 7070 (SE +/- 0.00, N = 3)
  b: 7070 (SE +/- 11.55, N = 3)
  c: 7057 (SE +/- 3.33, N = 3)
  d: 7070 (SE +/- 0.00, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: OpenCL BLAS - dGEMM-TN (GFLOPs/s, More Is Better)
  a: 7027 (SE +/- 17.64, N = 3)
  b: 7067 (SE +/- 46.31, N = 3)
  c: 7000 (SE +/- 15.28, N = 3)
  d: 7070 (SE +/- 45.09, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: OpenCL BLAS - dGEMM-NT (GFLOPs/s, More Is Better)
  a: 7527 (SE +/- 3.33, N = 3)
  b: 7537 (SE +/- 8.82, N = 3)
  c: 7537 (SE +/- 3.33, N = 3)
  d: 7540 (SE +/- 0.00, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: OpenCL BLAS - dGEMM-NN (GFLOPs/s, More Is Better)
  a: 7057 (SE +/- 31.80, N = 3)
  b: 7093 (SE +/- 58.97, N = 3)
  c: 7037 (SE +/- 31.80, N = 3)
  d: 7053 (SE +/- 35.28, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: OpenCL BLAS - dGEMV-T (GB/s, More Is Better)
  a: 308 (SE +/- 0.00, N = 3)
  b: 308 (SE +/- 0.00, N = 3)
  c: 308 (SE +/- 0.33, N = 3)
  d: 307 (SE +/- 0.33, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: OpenCL BLAS - dGEMV-N (GB/s, More Is Better)
  a: 81.2 (SE +/- 0.13, N = 3)
  b: 81.5 (SE +/- 0.12, N = 3)
  c: 81.2 (SE +/- 0.26, N = 3)
  d: 81.4 (SE +/- 0.21, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: OpenCL BLAS - dDOT (GB/s, More Is Better)
  a: 550 (SE +/- 0.88, N = 3)
  b: 552 (SE +/- 0.33, N = 3)
  c: 552 (SE +/- 1.15, N = 3)
  d: 553 (SE +/- 1.00, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: OpenCL BLAS - dAXPY (GB/s, More Is Better)
  a: 799 (SE +/- 0.33, N = 3)
  b: 798 (SE +/- 0.33, N = 3)
  c: 799 (SE +/- 1.20, N = 3)
  d: 799 (SE +/- 0.88, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: OpenCL BLAS - dCOPY (GB/s, More Is Better)
  a: 603 (SE +/- 0.33, N = 3)
  b: 604 (SE +/- 0.58, N = 3)
  c: 604 (SE +/- 0.88, N = 3)
  d: 604 (SE +/- 0.58, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: OpenCL BLAS - sDOT (GB/s, More Is Better)
  a: 282 (SE +/- 0.00, N = 3)
  b: 282 (SE +/- 0.33, N = 3)
  c: 283 (SE +/- 0.33, N = 3)
  d: 283 (SE +/- 0.67, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: OpenCL BLAS - sAXPY (GB/s, More Is Better)
  a: 420 (SE +/- 2.60, N = 3)
  b: 427 (SE +/- 1.33, N = 3)
  c: 427 (SE +/- 2.33, N = 3)
  d: 426 (SE +/- 3.93, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
ViennaCL 1.7.1 - Test: OpenCL BLAS - sCOPY (GB/s, More Is Better)
  a: 316 (SE +/- 0.33, N = 3)
  b: 316 (SE +/- 0.33, N = 3)
  c: 316 (SE +/- 0.33, N = 3)
  d: 316 (SE +/- 0.33, N = 3)
  1. (CXX) g++ options: -fopenmp -O3 -rdynamic -lOpenCL
VkResample 1.0 - Upscale: 2x - Precision: Double (ms, Fewer Is Better)
  a: 24.30 (SE +/- 0.00, N = 3)
  b: 24.29 (SE +/- 0.00, N = 3)
  c: 24.29 (SE +/- 0.00, N = 3)
  d: 24.30 (SE +/- 0.00, N = 3)
  1. (CXX) g++ options: -O3
VkResample 1.0 - Upscale: 2x - Precision: Single (ms, Fewer Is Better)
  a: 5.230 (SE +/- 0.000, N = 3)
  b: 5.231 (SE +/- 0.001, N = 3)
  c: 5.230 (SE +/- 0.001, N = 3)
  d: 5.230 (SE +/- 0.002, N = 3)
  1. (CXX) g++ options: -O3
cl-mem 2017-01-13 - Benchmark: Write (GB/s, More Is Better)
  a: 2354.9 (SE +/- 1.31, N = 3)
  b: 2353.4 (SE +/- 0.88, N = 3)
  c: 2354.9 (SE +/- 0.99, N = 3)
  d: 2352.1 (SE +/- 3.80, N = 3)
  1. (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13 - Benchmark: Read (GB/s, More Is Better)
  a: 1045.9 (SE +/- 0.00, N = 3)
  b: 1045.9 (SE +/- 0.20, N = 3)
  c: 1046.1 (SE +/- 0.03, N = 3)
  d: 1046.0 (SE +/- 0.03, N = 3)
  1. (CC) gcc options: -O2 -flto -lOpenCL
cl-mem 2017-01-13 - Benchmark: Copy (GB/s, More Is Better)
  a: 308.6 (SE +/- 0.03, N = 3)
  b: 308.5 (SE +/- 0.12, N = 3)
  c: 308.6 (SE +/- 0.07, N = 3)
  d: 308.5 (SE +/- 0.03, N = 3)
  1. (CC) gcc options: -O2 -flto -lOpenCL
ArrayFire 3.9 - Test: Conjugate Gradient OpenCL (ms, Fewer Is Better)
  a: 2.997 (SE +/- 0.003, N = 3)
  b: 2.983 (SE +/- 0.005, N = 3)
  c: 2.998 (SE +/- 0.005, N = 3)
  d: 2.997 (SE +/- 0.003, N = 3)
  1. (CXX) g++ options: -O3
clpeak 1.1.2 - OpenCL Test: Global Memory Bandwidth (GBPS, More Is Better)
  a: 3483.99 (SE +/- 0.33, N = 3)
  b: 3484.06 (SE +/- 0.20, N = 3)
  c: 3483.95 (SE +/- 0.27, N = 3)
  d: 3484.32 (SE +/- 0.04, N = 3)
  1. (CXX) g++ options: -O3
clpeak 1.1.2 - OpenCL Test: Single-Precision Float (GFLOPS, More Is Better)
  a: 64545.62 (SE +/- 0.43, N = 3)
  b: 64547.74 (SE +/- 0.85, N = 3)
  c: 64547.25 (SE +/- 0.56, N = 3)
  d: 64520.97 (SE +/- 0.91, N = 3)
  1. (CXX) g++ options: -O3
clpeak 1.1.2 - OpenCL Test: Integer Compute INT (GIOPS, More Is Better)
  a: 33119.10 (SE +/- 2.54, N = 3)
  b: 33144.74 (SE +/- 0.09, N = 3)
  c: 33146.12 (SE +/- 0.26, N = 3)
  d: 33129.34 (SE +/- 8.15, N = 3)
  1. (CXX) g++ options: -O3
FinanceBench 2016-07-25 - Benchmark: Black-Scholes OpenCL (ms, Fewer Is Better)
  a: 4.347 (SE +/- 0.010, N = 3)
  b: 4.373 (SE +/- 0.004, N = 3)
  c: 4.351 (SE +/- 0.010, N = 3)
  d: 4.339 (SE +/- 0.016, N = 3)
  1. (CXX) g++ options: -O3 -march=native -fopenmp
clpeak 1.1.2 - OpenCL Test: Double-Precision Double (GFLOPS, More Is Better)
  a: 32959.17 (SE +/- 0.74, N = 3)
  b: 32961.21 (SE +/- 0.74, N = 3)
  c: 32941.99 (SE +/- 18.62, N = 3)
  d: 32933.63 (SE +/- 1.51, N = 3)
  1. (CXX) g++ options: -O3
Phoronix Test Suite v10.8.5